Compare commits


52 Commits

Author SHA1 Message Date
lxb 20355b24bf Show notices in active-connection mode; in client-push mode the server-side notice is shown even when the client cannot be reached 5 months ago
lxb 15690321ee Add CPU info collection in active mode 5 months ago
lxb f3314388ea Add an active server-connection mode that can be used alongside client-push mode 5 months ago
lxb ca5d2fa7b2 Storage: also show remaining free space 6 months ago
lxb c4cddfb165 Update the README and add a footer to the bottom of the page 9 months ago
lxb 65ee80f50c Update README and version 9 months ago
lxb fac97ee26d Fix the issue where every error_dict entry was keyed as memory 9 months ago
lxb a326bfc793 Also show the notice when the data is stale 9 months ago
lxb 80b4cd711b Add notice display 10 months ago
lxb fe0c86f85c update 12 months ago
lxb e36ad1739a update 1 year ago
lxb ab4c849d21 update 1 year ago
lxb 6fe811458e update: the API name can now be changed via the config file 1 year ago
lxb bc3d4d5d70 update 1 year ago
lxb 3578785df6 fixbug 1 year ago
lxb 768edbf8fe Allow the config file path to be set via a command-line argument 1 year ago
lxb 052f1d5612 Package the server and client Python programs into standalone executables 1 year ago
lxb aa07e0d592 update 1 year ago
lxb 0e5e8e9b56 update 1 year ago
lxb 4cdd195a1a update 1 year ago
lxb d4c40042d0 Add CPU display 1 year ago
lxb f3d8b1298a Add a simple network display 1 year ago
lxb fd3d5b1959 Basic display of memory, storage, and GPUs 1 year ago
lxb 75164c0b4c Restructure the code 1 year ago
lxb 61b56e8339 update 1 year ago
lxb cf61103462 update 1 year ago
lxb 73d215bcf4 Everything except network data can now be collected 1 year ago
lxb 11c7158d53 Rework the implementation 1 year ago
lxb cec4b4028a fixbug: keep propagating SSH exceptions to fix the no-reconnect issue 1 year ago
lxb be437429f3 Merge branch 'master' of http://git.lxblxb.top/lxb/Tool_CheckGPUsWeb 1 year ago
lxb 05a85a09d8 Add an option to hide GPUs 1 year ago
lxbhahaha 718b26cfca Sort GPU usage by memory size and filter out entries under 50 MB 1 year ago
lxbhahaha c6570b007b Add per-user GPU usage 1 year ago
lxbhahaha d1509a5609 Update README 1 year ago
lxbhahaha 33c96824d4 Update README 1 year ago
lxbhahaha 4d483a2c8c Tweak the network speed display 1 year ago
lxbhahaha ca9682a4af Fix: remove the trailing divider; refresh immediately after a checkbox is clicked 1 year ago
lxbhahaha 07d0422095 Add toggles for the display options 1 year ago
lxbhahaha 9c3127595e Update the template 1 year ago
lxbhahaha 7280ca69d3 Add network speed view 1 year ago
lxbhahaha c116a81c2c update 1 year ago
lxbhahaha a5631fe369 Change the implementation 1 year ago
鱼骨剪 4b940a89af Change the implementation 1 year ago
鱼骨剪 7014aa37d2 Change how the title is displayed 1 year ago
鱼骨剪 fad4dce56a Add memory display 1 year ago
鱼骨剪 ddb945fd5d Improve the storage space display 1 year ago
鱼骨剪 a0f28e1a84 Initial storage space display 1 year ago
鱼骨剪 f0904e3893 Show the timestamp of the current data and more detailed error info 1 year ago
鱼骨剪 3e4ee65cc0 update 1 year ago
鱼骨剪 f1d627f718 Add color coding 1 year ago
鱼骨剪 62aad057cb Initial display of server data 1 year ago
鱼骨剪 8b26843851 Initial collection of server GPU data 1 year ago
  1. .gitignore (10)
  2. README.md (153)
  3. active_connector.py (321)
  4. app.py (38)
  5. client.py (221)
  6. client_config.json (14)
  7. data_define/client_data_example.json (58)
  8. data_define/server_data_example.json (62)
  9. index.html (28)
  10. pics/demo.png (BIN)
  11. server.py (85)
  12. server_config.json (27)
  13. version.py (1)
  14. web/css/style_1.css (103)
  15. web/index.html (27)
  16. web/js/script.js (291)

10
.gitignore

@ -0,0 +1,10 @@
serverList.json
**/__pycache__/
__pycache__/
.vscode/
# build
build/
dist/
*.spec

153
README.md

@ -0,0 +1,153 @@
# 1. Introduction
View the status of multiple servers (CPU, network, memory, disks, GPUs) on a single web page.
The project is split into a server and clients: each client sends its local data to the server, and the server aggregates the data from all clients, so **the server no longer needs to store the clients' keys**.
![](pics/demo.png)
**Tip:** hover over `Network`, `CPU usage`, or a user under `GPU usage` to see more detailed information.
# 2. Development environment
A virtual environment can be created with conda; both Linux and Windows work.
```bash
pip install flask flask-cors psutil -i https://pypi.tuna.tsinghua.edu.cn/simple
```
# 3. Running and deployment
The client machines **may** also need `gpustat` installed separately (`pip install gpustat`); version `1.1.1` is recommended, since `0.6.0` seems to cause problems.
You can package the Python programs with `pyinstaller` to get standalone client and server executables, in which case no Python environment is needed at run time. If you run the scripts directly with Python instead, install the development environment described above.
```bash
pyinstaller --onefile client.py
pyinstaller --onefile server.py
```
After running these commands, the two executables can be found in the `dist` directory. Put the `client` binary somewhere suitable on each client machine and the `server` binary on the server machine; "client" means a machine whose data is collected, and "server" means the machine that hosts the web page.
Also place the corresponding `client_config.json` and `server_config.json` next to them.
## 3.1. Server
Run the following command, replacing the `server` and JSON paths with the actual ones. Use screen or systemctl to keep it running in the background; systemctl is recommended so that it starts on boot.
```bash
/home/lxb/projects/Tool_CheckGPUsWeb/dist/server --cfg /home/lxb/projects/Tool_CheckGPUsWeb/server_config.json
```
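A minimal sketch of the systemd approach, assuming the example paths above, a hypothetical unit name `checkgpus-server.service`, and the `lxb` user (adjust all of these; the client can be wrapped the same way with its own paths):
```
[Unit]
Description=CheckGPUsWeb server
After=network-online.target

[Service]
ExecStart=/home/lxb/projects/Tool_CheckGPUsWeb/dist/server --cfg /home/lxb/projects/Tool_CheckGPUsWeb/server_config.json
Restart=always
RestartSec=5
User=lxb

[Install]
WantedBy=multi-user.target
```
Save it as `/etc/systemd/system/checkgpus-server.service`, then `sudo systemctl daemon-reload && sudo systemctl enable --now checkgpus-server` starts it and enables start on boot.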
The content of `server_config.json` looks like this:
```json
{
    "host": "0.0.0.0",
    "port": 15002,
    "server_list": ["76", "174", "233", "222"],
    "note_dict": {
        "76": "This is a notice"
    },
    "api_name": "api"
}
```
- host: no need to change.
- port: pick a suitable port, and remember to open it on the server.
- server_list: the list of server names to track; a client request is only processed if its title is in this list.
- note_dict: the notice dictionary; a notice can be shown for the corresponding server.
- api_name: the API name; keep it consistent across the server, the clients, and nginx.
The program must be **restarted** for config changes to take effect.
## 3.2. Client
Run the following command, replacing the `client` and JSON paths with the actual ones. As with the server, screen or systemctl can keep it running in the background (systemctl is recommended for start on boot; the unit sketch above can be adapted).
```bash
/home/lxb/projects/ServerInfo-client/client --cfg /home/lxb/projects/ServerInfo-client/client_config.json
```
The content of `client_config.json` is as follows:
```json
{
    "server_url": "http://10.1.16.174:15001",
    "title": "174",
    "interval": 3.0,
    "note": "",
    "enable": ["gpu", "cpu", "memory", "storage", "network"],
    "storage_list": [
        "/",
        "/media/D",
        "/media/E",
        "/media/F"
    ],
    "api_name": "api"
}
```
- server_url: the URL of the server, i.e. IP + port; adjust as needed. (In principle this would be port 15002 directly, but with nginx in front, API requests are forwarded to 15002 according to the nginx config; nginx listens on 15001, which is why 15001 is used here.)
- title: the name of this client machine; its data is only processed if this name appears in the server's server_list. (So can a client that sets its title to another machine's name interfere with that machine's data? **Yes** - nothing is validated beyond the title, so you have to make sure manually that every client uses a unique title.)
- interval: how often data is collected, in seconds.
- note: a notice; it is merged with the notice configured on the server side and displayed together.
- enable: which metrics to collect; remove an entry to disable it. Currently only `gpu`, `cpu`, `memory`, `storage`, and `network` are supported.
- storage_list: the disk paths to monitor; add whichever paths you want checked.
- api_name: keep it consistent with the server and nginx settings, otherwise requests will not get through.
The program must be **restarted** for config changes to take effect.
## 3.3. Web deployment
> **Note:** this part of the documentation may still have quite a few problems. It was put together with help from GPT-style tools, and enough time has passed that the details are fuzzy. It also describes running nginx in a container, although running nginx directly (without a container) is probably the better option.
>
> This part of the documentation still needs work.
A simple way to deploy the page is to run nginx in a Docker container.
First install Docker, then run `docker run -d -p 80:80 -v /home/lxb/nginx_gpus:/usr/share/nginx/html --name nginx_gpus nginx:latest`, **adjusting the command as needed**; the adjustable parts are shown below.
```bash
docker run -d \
    -p <host port>:80 \
    -v <host data volume path>:/usr/share/nginx/html \
    --name <container name> \
    nginx:latest
```
Copy the contents of the `web` directory into the corresponding location inside the data volume.
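As a minimal sketch, assuming the volume path from the example `docker run` command and the `serverInfo_web` root used in the nginx config below:
```bash
# Copy the front-end files into the nginx data volume (example paths, adjust to your setup)
mkdir -p /home/lxb/nginx_gpus/serverInfo_web
cp -r web/* /home/lxb/nginx_gpus/serverInfo_web/
```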
The nginx configuration is as follows:
```
server {
    listen 15001;
    listen [::]:15001;

    # server_name *;

    location / {
        root /usr/share/nginx/html/serverInfo_web;
        index index.html index.htm;
        try_files $uri $uri/ =404;
    }

    location /api/ {
        proxy_pass http://localhost:15002;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    set $env "internal";
    add_header X-Environment $env;
}
```
In other words, the web page is served directly on port 15001 of the server, and requests to the corresponding `api` path are forwarded to port 15002, which is the Flask port.
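A quick way to sanity-check the forwarding (placeholder host, and assuming `api_name` is left at the default `api`):
```bash
# Served by nginx directly on 15001
curl http://<server-ip>:15001/
# Forwarded by nginx to Flask on 15002; should return the aggregated JSON
curl http://<server-ip>:15001/api/get_data
```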
Finally, if you have a domain name you can also set up a reverse proxy; see [服务器上使用Nginx部署网页+反向代理](http://blog.lxblxb.top/archives/1723257245091) (in hindsight that post may have some issues).
**In particular**, `web/js/script.js` contains the following code, which uses the `internal` marker to switch the request URL so that access from inside the LAN uses the internal address instead of going through the public network. Adjust this as needed; if the page is only ever accessed from the LAN, simply set `apiURL` to the internal address.
```js
// Set the API URL based on the environment header
if (environment === 'internal') {
    apiURL = 'http://10.1.16.174:15001';
} else {
    apiURL = 'http://gpus.lxblxb.top';
}
```
The nginx instance that does the reverse proxy also needs some configuration, roughly the following; it pairs with the `internal` value set by the internal nginx config above.
```
set $env "public";
add_header X-Environment $env;
```
# 4. Other
- `永辉` helped sort out the layout of the checkbox row at the top (the checkboxes are not included in this version).
- The per-GPU user usage display follows `治鹏`'s approach.

321
active_connector.py

@ -0,0 +1,321 @@
from version import version
import threading
import paramiko
import time
import json
import re
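# Connector: actively connects to each configured server over SSH (paramiko), polls
# CPU/memory/storage/network/GPU info in one thread per server, and periodically syncs
# the results into the shared data dict passed in by server.py.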
class Connector:
def __init__(self, data_dict : dict, server_cfg : dict, note_dict : dict, lock : threading.Lock, connect_check_interval : float, reconnect_interval : float, multiple_of_timeout : float = 2):
self.data_dict = data_dict
self.tmp_data_dict = dict()
self.server_cfg = server_cfg
self.note_dict = note_dict
self.lock = lock
self.tmp_lock = threading.Lock()
self.connect_check_interval = connect_check_interval
self.reconnect_interval = reconnect_interval
self.multiple_of_timeout = multiple_of_timeout
def run(self):
# Start the polling threads
for i, server_data in enumerate(self.server_cfg):
self.tmp_data_dict[server_data['title']] = {}
# self.tmp_data_dict[server_data['title']]['server_data'] = server_data
thread_check = threading.Thread(target=self.__keep_check_one, args=(server_data, self.tmp_data_dict, server_data['title'], self.connect_check_interval, self.reconnect_interval))
thread_check.daemon = True
thread_check.start()
# Start the data-sync thread
thread_transmit = threading.Thread(target=self.__transmit_data, args=(self.connect_check_interval,))
thread_transmit.daemon = True
thread_transmit.start()
# Continuously collect information from one server
def __keep_check_one(self, server: dict, shared_data_list: dict, server_title: str, interval: float, re_connect_time: float=5):
# Normalize the list of storage paths to check
if not 'storage_list' in server:
server['storage_list'] = []
if not '/' in server['storage_list']:
server['storage_list'].insert(0, '/')
re_try_count = 0
# Connection loop
while True:
try:
# Establish the SSH connection
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(server['ip'], port=server['port'], username=server['username'], password=server.get('password', None), key_filename=server.get('key_filename', None), timeout=interval*self.multiple_of_timeout)
with self.tmp_lock:
if 'error_dict' in shared_data_list[server_title]:
shared_data_list[server_title].pop('error_dict')
re_try_count = 0
# Polling loop
keep_run = True
while keep_run:
try:
error_info_dict = dict()
# Network info
network_info = self.__get_network_info(client, interval*self.multiple_of_timeout, server.get('network_interface_name', None), error_info_dict)
# CPU info
cpu_info = self.__get_cpu_info(client, interval*self.multiple_of_timeout, error_info_dict)
# Memory info
memory_info = self.__get_memory_info(client, interval*self.multiple_of_timeout, error_info_dict)
# Storage info
storage_info = self.__get_storage_info(client, interval*self.multiple_of_timeout, server['storage_list'], error_info_dict)
# GPU info
gpu_info = self.__get_gpus_info(client, interval*self.multiple_of_timeout, error_info_dict, ignore_gpu=server.get('ignore_gpu', False))
# Record the collected info
with self.tmp_lock:
# Attach the notice
if server_title in self.note_dict:
shared_data_list[server_title]['note'] = self.note_dict[server_title]
shared_data_list[server_title]['interval'] = interval
shared_data_list[server_title]['title'] = server_title
shared_data_list[server_title]['version'] = version
shared_data_list[server_title]['update_time_stamp'] = int(time.time())
if cpu_info is not None:
shared_data_list[server_title]['cpu'] = cpu_info
if gpu_info is not None:
shared_data_list[server_title]['gpu_list'] = gpu_info
if memory_info is not None:
shared_data_list[server_title]['memory'] = memory_info
if storage_info is not None:
shared_data_list[server_title]['storage_list'] = storage_info
if network_info is not None:
shared_data_list[server_title]['network_list'] = [{
'default' : False,
'name' : server['network_interface_name'],
'in' : network_info['in'],
'out' : network_info['out'],
}]
if len(error_info_dict) > 0:
shared_data_list[server_title]['error_dict'] = error_info_dict
except Exception as e:
keep_run = False
with self.tmp_lock:
shared_data_list[server_title]['error_dict']['connect'] = f'{e}'
if 'gpu_list' in shared_data_list[server_title]:
shared_data_list[server_title].pop('gpu_list')
time.sleep(interval)
# Close the connection
client.close()
except Exception as e:
if 'error_dict' not in shared_data_list[server_title]:
shared_data_list[server_title]['error_dict'] = dict()
shared_data_list[server_title]['error_dict']['connect'] = f'retry:{re_try_count}, {e}'
time.sleep(re_connect_time)
re_try_count += 1
# Continuously sync the actively collected data into the final data dict
def __transmit_data(self, interval):
# Wait a little before starting
time.sleep(interval * 0.5)
while True:
time.sleep(interval)
# Move the contents of the temporary dict into the official data
with self.tmp_lock:
with self.lock:
for k, v in self.tmp_data_dict.items():
self.data_dict[k] = v
#region Data-collection methods
def __get_cpu_info(self, client, timeout, info_dict:dict=None):
def get_cpu_temp_via_sysfs(client):
try:
command = r'''
for zone in /sys/class/thermal/thermal_zone*; do
if [ -f "$zone/type" ] && [ -f "$zone/temp" ]; then
type=$(cat "$zone/type")
temp=$(cat "$zone/temp")
echo "$type:$temp"
fi
done
'''
stdin, stdout, stderr = client.exec_command(command)
output = stdout.read().decode().strip()
temperatures = []
for line in output.splitlines():
if ':' in line:
type_str, temp_str = line.split(':', 1)
if 'cpu' in type_str.lower() or 'pkg' in type_str.lower():
try:
temp = int(temp_str.strip())
temperatures.append(round(temp / 1000, 1)) # Convert millidegrees to Celsius
except ValueError:
continue
return temperatures
except Exception as e:
return []
def get_cpu_avg_usage_via_procstat(client, interval_sec=0.3):
def read_cpu_stat_line():
stdin, stdout, _ = client.exec_command("grep '^cpu ' /proc/stat")
line = stdout.read().decode().strip()
parts = list(map(int, line.split()[1:]))
idle = parts[3] + parts[4] # idle + iowait
total = sum(parts)
return idle, total
try:
idle1, total1 = read_cpu_stat_line()
time.sleep(interval_sec)
idle2, total2 = read_cpu_stat_line()
total_delta = total2 - total1
idle_delta = idle2 - idle1
if total_delta == 0:
return 0.0
usage = 100.0 * (1.0 - idle_delta / total_delta)
return round(usage, 1)
except Exception as e:
return None
try:
result = dict()
# 1. Get the CPU model
stdin, stdout, stderr = client.exec_command("lscpu")
lscpu_output = stdout.read().decode()
model_match = re.search(r"Model name\s*:\s*(.+)", lscpu_output)
if model_match:
result["name"] = model_match.group(1).strip()
else:
# If "Model name" is not found, try the Chinese label "型号名称"
model_name_match_cn = re.search(r'型号名称\s*:\s*(.+)', lscpu_output)
if model_name_match_cn:
result["name"] = model_name_match_cn.group(1).strip()
else:
result["name"] = "未知CPU"
# 2. Get the CPU temperatures
result["temperature_list"] = get_cpu_temp_via_sysfs(client)
# 3. Get the CPU usage
result["core_avg_occupy"] = get_cpu_avg_usage_via_procstat(client)
result["core_occupy_list"] = [-1] # TODO: querying every core this way is expensive and not very useful, so skip it for now
return result
except paramiko.ssh_exception.SSHException as e:
# Re-raise SSH exceptions
raise
except Exception as e:
if info_dict is not None:
info_dict['cpu'] = f'{e}'
return None
def __get_memory_info(self, client, timeout, info_dict:dict=None):
try:
stdin, stdout, stderr = client.exec_command('free', timeout=timeout)
output = stdout.read().decode().split('\n')[1]
if output == "":
return None
data = output.split()
result = {
"total": int(data[1]),
"used": int(data[2])
}
return result
except paramiko.ssh_exception.SSHException as e:
# Re-raise SSH exceptions
raise
except Exception as e:
if info_dict is not None:
info_dict['memory'] = f'{e}'
return None
def __get_storage_info(self, client, timeout, path_list, info_dict:dict=None):
try:
result = []
for target_path in path_list:
stdin, stdout, stderr = client.exec_command(f'df {target_path} | grep \'{target_path}\'', timeout=timeout)
output = stdout.read().decode()
if output == "":
continue
data = output.split()
tmp_res = {
"path": target_path,
"total": int(data[1]),
"available": int(data[3])
}
result.append(tmp_res)
return result
except paramiko.ssh_exception.SSHException as e:
# Re-raise SSH exceptions
raise
except Exception as e:
if info_dict is not None:
info_dict['storage'] = f'{e}'
return None
def __get_network_info(self, client, timeout, interface_name, info_dict:dict=None):
try:
if interface_name is None:
return None
stdin, stdout, stderr = client.exec_command(f'ifstat -i {interface_name} 0.1 1', timeout=timeout)
output = stdout.read().decode().split('\n')[2]
data = output.split()
result = {
"in": float(data[0]),
"out": float(data[1])
}
return result
except paramiko.ssh_exception.SSHException as e:
# Re-raise SSH exceptions
raise
except Exception as e:
if info_dict is not None:
info_dict['network'] = f'{e}'
return None
def __get_gpus_info(self, client, timeout, info_dict:dict=None, ignore_gpu=False):
if ignore_gpu:
return None
try:
stdin, stdout, stderr = client.exec_command("gpustat --json")
output = stdout.read().decode()
gpus_info = json.loads(output)
result = []
for gpu_info in gpus_info['gpus']:
# Clean up the GPU name
gpu_name = gpu_info['name'].replace('NVIDIA ', '').replace('GeForce ', '')
process_list = []
for process_info in gpu_info.get('processes', []):
cmd = process_info.get('command', '')
if 'full_command' in process_info:
cmd = ' '.join(process_info["full_command"])
process_list.append({
"user": process_info.get('username'),
"memory": process_info.get('gpu_memory_usage'),
"cmd": cmd
})
# Append to the list
result.append({
"idx": gpu_info['index'],
"name": gpu_name,
"temperature": gpu_info['temperature.gpu'],
"used_memory": gpu_info['memory.used'],
"total_memory": gpu_info['memory.total'],
"utilization": gpu_info['utilization.gpu'],
"process_list": process_list
})
return result
except paramiko.ssh_exception.SSHException as e:
# Re-raise SSH exceptions
raise
except Exception as e:
if info_dict is not None:
info_dict['gpu'] = f'{e}'
return None
#endregion

38
app.py

@ -1,38 +0,0 @@
from flask import Flask, jsonify
from flask_cors import CORS
import threading
import paramiko
import time
#region 全局
app = Flask(__name__)
CORS(app)
port = 15002
#endregion
#region 接口
# 测试用
@app.route('/')
def hello():
return 'hi. —— CheckGPUsWeb'
@app.route('/data', methods=['GET'])
def get_data():
data = {'name': 'John', 'age': 25, 'city': 'New York'}
return jsonify(data)
# 开始连接服务器
def connect_server():
pass
#endregion
# 测试
def test():
app.run(debug=True, port=port)
if __name__ == '__main__':
test()

221
client.py

@ -0,0 +1,221 @@
import os
import json
import time
import psutil
import argparse
import requests
import subprocess
from version import version
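# Collects local machine info (GPU via gpustat, CPU/memory/network via psutil, storage via df)
# and periodically POSTs it to the server's /<api_name>/update_data endpoint.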
# region get data
# Get GPU info
def get_gpus_info(error_dict):
result_list = list()
try:
gpus_info = json.load(os.popen('gpustat --json'))
for gpu_info in gpus_info['gpus']:
# Clean up the GPU name
gpu_name = gpu_info['name']
gpu_name = gpu_name.replace('NVIDIA ', '').replace('GeForce ', '')
process_list = list()
for process_info in gpu_info['processes']:
cmd = process_info['command']
if 'full_command' in process_info:
cmd = ' '.join(process_info["full_command"])
process_list.append({
"user": process_info['username'],
"memory": process_info['gpu_memory_usage'],
"cmd": cmd
})
# Append to the list
result_list.append({
"idx": gpu_info['index'],
"name": gpu_name,
"temperature": gpu_info['temperature.gpu'],
"used_memory": gpu_info['memory.used'],
"total_memory": gpu_info['memory.total'],
"utilization": gpu_info['utilization.gpu'],
"process_list": process_list
})
except Exception as e:
error_dict['gpu'] = f'{e}'
return result_list
# Get CPU info
cpu_name = None
def get_cpu_info(error_dict):
result_dict = dict()
try:
# Get the CPU model
global cpu_name
def get_cpu_name():
if cpu_name == None:
import re
# Run lscpu and capture its output
result = subprocess.run(['lscpu'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
output = result.stdout
# Match "Model name" with a regex
model_name_match = re.search(r'Model name\s*:\s*(.+)', output)
if model_name_match:
return model_name_match.group(1).strip()
else:
# If "Model name" is not found, try the Chinese label "型号名称"
model_name_match_cn = re.search(r'型号名称\s*:\s*(.+)', output)
if model_name_match_cn:
return model_name_match_cn.group(1).strip()
else:
return "CPU型号信息未找到"
else:
return cpu_name
cpu_name = get_cpu_name()
# Get the temperature of each CPU package
temperature_list = list()
temperatures = psutil.sensors_temperatures()
if 'coretemp' in temperatures:
for entry in temperatures['coretemp']:
if entry.label.startswith('Package'):
temperature_list.append(entry.current)
# Record the info
result_dict["name"] = cpu_name
result_dict["temperature_list"] = temperature_list
result_dict["core_avg_occupy"] = psutil.cpu_percent(interval=None, percpu=False)
result_dict["core_occupy_list"] = psutil.cpu_percent(interval=None, percpu=True)
except Exception as e:
error_dict['cpu'] = f'{e}'
return result_dict
# Get storage info
def get_storages_info(error_dict, path_list):
result_list = list()
try:
for target_path in path_list:
data = subprocess.run(['df', target_path, '|', 'grep', target_path], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True).stdout
data = data.split('\n')[1].split()
tmp_res = {
"path": target_path,
"total": int(data[1]),
"available": int(data[3])
}
result_list.append(tmp_res)
except Exception as e:
error_dict['storage'] = f'{e}'
return result_list
# Get memory info
def get_memory_info(error_dict):
result_dict = dict()
try:
mem = psutil.virtual_memory()
result_dict["total"] = mem.total / 1024
result_dict["used"] = mem.used / 1024
except Exception as e:
error_dict['memory'] = f'{e}'
return result_dict
# Get network info
last_network_stats = None
last_network_time = None
def get_networks_info(error_dict):
result_list = list()
try:
global last_network_stats
global last_network_time
current_stats = psutil.net_io_counters(pernic=True)
if last_network_stats is None:
# First sample, no previous data to diff against
for k in current_stats.keys():
if k == 'lo':
continue
result_list.append({
"name": k,
"default": False,
"in": 0,
"out": 0
})
else:
time_interval = time.time() - last_network_time
for k in current_stats.keys():
if k == 'lo':
continue
result_list.append({
"name": k,
"default": False,
"in": (current_stats[k].bytes_recv - last_network_stats[k].bytes_recv) / time_interval / 1000,
"out": (current_stats[k].bytes_sent - last_network_stats[k].bytes_sent) / time_interval / 1000
})
# Save the stats for the next sample
last_network_stats = current_stats
last_network_time = time.time()
except Exception as e:
error_dict['network'] = f'{e}'
return result_list
# endregion
client_cfg = None
def collect_data():
result_dict = dict()
error_dict = dict()
# Collect data according to the config
if 'gpu' in client_cfg['enable']:
result_dict['gpu_list'] = get_gpus_info(error_dict)
if 'cpu' in client_cfg['enable']:
result_dict['cpu'] = get_cpu_info(error_dict)
if 'storage' in client_cfg['enable']:
result_dict['storage_list'] = get_storages_info(error_dict, client_cfg['storage_list'])
if 'memory' in client_cfg['enable']:
result_dict['memory'] = get_memory_info(error_dict)
if 'network' in client_cfg['enable']:
result_dict['network_list'] = get_networks_info(error_dict)
# Record other info
result_dict['update_time_stamp'] = int(time.time())
result_dict['error_dict'] = error_dict
result_dict['note'] = client_cfg['note']
result_dict['title'] = client_cfg['title']
result_dict['interval'] = client_cfg['interval']
result_dict['version'] = version
return result_dict
def main():
parser = argparse.ArgumentParser()
parser.add_argument('--cfg', default='client_config.json', type=str, help='the path of config json.')
args = parser.parse_args()
# Load the config file
cfg_path = args.cfg
global client_cfg
with open(cfg_path, 'r') as f:
client_cfg = json.load(f)
# Send continuously
send_interval = client_cfg['interval']
api_name = client_cfg['api_name']
api_url = client_cfg['server_url'] + f'/{api_name}/update_data'
while True:
data = collect_data()
try:
result = requests.post(api_url, json=data)
except Exception as e:
print(e)
time.sleep(send_interval)
if __name__ == '__main__':
main()

14
client_config.json

@ -0,0 +1,14 @@
{
"server_url": "http://10.1.16.174:15001",
"title": "174",
"interval": 3.0,
"note": "",
"enable": ["gpu", "cpu", "memory", "storage", "network"],
"storage_list":[
"/",
"/media/D",
"/media/E",
"/media/F"
],
"api_name": "api"
}

58
data_define/client_data_example.json

@ -0,0 +1,58 @@
{
"title": "server title",
"update_time_stamp": "1673082950",
"note": "some note",
"interval": 3.0,
"error_dict":{
"gpu": "some error",
"cpu": "some error"
},
"gpu_list":[
{
"idx": 0,
"name": "RTX 3090",
"temperature": 100,
"used_memory": 1000,
"total_memory": 10240,
"utilization": 34,
"process_list":[
{
"user": "lxb",
"memory": 100,
"cmd": "python run.py"
}
]
}
],
"cpu":
{
"name": "i5 6500",
"temperature_list": [50, 30],
"core_avg_occupy": 31.25,
"core_occupy_list":[
12,
23,
0,
90
]
},
"storage_list":[
{
"path": "/media/F",
"available": 211108624,
"total": 5813178480
}
],
"memory":{
"total": 1935468,
"used": 1382196
},
"network_list":[
{
"name": "eth0",
"default": true,
"in": 67.8,
"out": 12.3
}
]
}

62
data_define/server_data_example.json

@ -0,0 +1,62 @@
{
"server_dict":{
"174":{
"title": "server title",
"update_time_stamp": "1673082950",
"note": "some note",
"interval": 3.0,
"error_dict":{
"gpu": "some error",
"cpu": "some error"
},
"gpu_list":[
{
"idx": 0,
"name": "RTX 3090",
"temperature": 100,
"used_memory": 1000,
"total_memory": 10240,
"utilization": 34,
"process_list":[
{
"user": "lxb",
"memory": 100,
"cmd": "python run.py"
}
]
}
],
"cpu":
{
"name": "i5 6500",
"temperature_list": [50, 30],
"core_avg_occupy": 31.25,
"core_occupy_list":[
12,
23,
0,
90
]
},
"storage_list":[
{
"path": "/media/F",
"available": 211108624,
"total": 5813178480
}
],
"memory":{
"total": 1935468,
"used": 1382196
},
"network_list":[
{
"name": "eth0",
"default": true,
"in": 67.8,
"out": 12.3
}
]
}
}
}

28
index.html

@ -1,28 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Fetch JSON from Flask</title>
</head>
<body>
<h1>Fetch JSON from Flask Example</h1>
<button onclick="fetchData()">Fetch Data</button>
<div id="output"></div>
<script>
function fetchData() {
fetch('http://lxblxb.top:15002/data') // 发起 GET 请求到 Flask 服务器的 '/get_data' 路径
.then(response => response.json()) // 解析 JSON 响应
.then(data => {
// 处理 JSON 数据
console.log(data);
document.getElementById('output').innerHTML = '<pre>' + JSON.stringify(data, null, 2) + '</pre>';
})
.catch(error => {
console.error('Error fetching data:', error);
});
}
</script>
</body>
</html>

BIN
pics/demo.png

Binary file not shown (754 KiB).

85
server.py

@ -0,0 +1,85 @@
from flask import Flask, jsonify, request
from flask_cors import CORS
from version import version
from active_connector import Connector
import json
import argparse
import threading
#region Globals
app = Flask(__name__)
CORS(app)
server_cfg = None
data_dict = dict()
# Thread lock
data_lock = threading.Lock()
parser = argparse.ArgumentParser()
parser.add_argument('--cfg', default='server_config.json', type=str, help='the path of config json.')
args = parser.parse_args()
# Load the config file
cfg_path = args.cfg
with open(cfg_path, 'r') as f:
server_cfg = json.load(f)
api_name = server_cfg['api_name']
#endregion
#region Endpoints
# For testing
@app.route(f'/{api_name}')
def hello():
return 'hi. —— CheckGPUsWeb'
@app.route(f'/{api_name}/get_data', methods=['GET'])
def get_data():
with data_lock:
return jsonify(data_dict)
@app.route(f'/{api_name}/update_data', methods=['POST'])
def receive_data():
data = request.json
# Update the record if the title is in the configured server list
if data['title'] in server_cfg['server_list']:
with data_lock:
data_dict['server_dict'][data['title']] = data
# Merge the server-side and client-side notices for display
if data['title'] in server_cfg['note_dict']:
client_note = data_dict['server_dict'][data['title']]['note']
server_note = server_cfg['note_dict'][data['title']]
note = server_note if client_note == '' \
else server_note + '\n' + client_note
data_dict['server_dict'][data['title']]['note'] = note
return jsonify({"status": "success"})
#endregion
def init():
with data_lock:
data_dict['server_dict'] = dict()
data_dict['version'] = version
for server_name in server_cfg['server_list']:
if server_name in server_cfg['note_dict']:
data_dict['server_dict'][server_name] = dict()
data_dict['server_dict'][server_name]['note'] = server_cfg['note_dict'][server_name]
else:
data_dict['server_dict'][server_name] = None
def main():
init()
# Actively connect to servers over SSH
if 'connect_server' in server_cfg and len(server_cfg['connect_server']) > 0:
connector = Connector(data_dict['server_dict'], server_cfg['connect_server'], server_cfg['note_dict'], data_lock, server_cfg['connect_check_interval'], server_cfg['reconnect_interval'])
connector.run()
print('开启主动连接的服务器 : ' + ', '.join([s['title'] for s in server_cfg['connect_server']]))
else:
print('未设置主动连接的服务器')
# flask
app.run(debug=False, host=server_cfg['host'], port=server_cfg['port'])
if __name__ == '__main__':
main()

27
server_config.json

@ -0,0 +1,27 @@
{
"host": "0.0.0.0",
"port": 15002,
"server_list":["76", "174", "233", "222"],
"note_dict":{
"76" : "test1",
"SERVER_76" : "test2"
},
"api_name": "api",
"reconnect_interval" : 10,
"connect_check_interval" : 3,
"connect_server" : [
{
"title": "SERVER_76",
"ip": "lxblxb.top",
"port": 66666,
"username": "lxb",
"key_filename": "/home/lxb/.ssh/id_rsa",
"network_interface_name": "eno2",
"storage_list": [
"/media/D",
"/media/F"
]
}
]
}

1
version.py

@ -0,0 +1 @@
version = "0.2.2.20250626_beta"

103
web/css/style_1.css

@ -0,0 +1,103 @@
#header-container {
font-size: 32px;
font-weight: bold;
padding-left: 20px;
padding-top: 10px;
/* padding-bottom: 5px; */
}
/* Make html and body fill the full height */
html, body {
height: 100%;
margin: 0;
padding: 0;
}
/* Use a flexbox layout */
body {
display: flex;
flex-direction: column;
}
/* Main content area; flex-grow: 1 makes it take up the remaining space */
.content {
flex-grow: 1;
/* padding: 20px; */
}
/* Footer styles */
footer {
background-color: #f1f1f1;
color: rgb(172, 172, 172);
text-align: center;
padding: 10px 0;
}
.card {
padding: 5px 10px;
margin: 5px;
border-radius: 8px;
box-shadow: 0px 1px 10px rgba(0, 0, 0, 0.3);
width: 300px;
display: inline-block;
vertical-align: top;
margin: 12px;
}
.note-info {
border-style: solid;
border-width: 4px;
border-color: #a10000;
border-radius: 8px;
padding: 6px 10px;
margin-top: 4px;
margin-bottom: 6px;
}
.server-name {
background-color: rgb(0, 0, 0);
color: white;
border-radius: 8px;
padding: 6px 10px;
font-size: 26px;
margin-top: 4px;
margin-bottom: 6px;
}
.gpu-info {
background-color: aqua;
border-style: solid;
border-width: 1px;
border-color: #ccc;
border-radius: 8px;
margin-top: 5px;
padding: 4px 8px;
margin-bottom: 12px;
background-color: #f9f9f9;
box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1);
}
.process-item {
color: rgb(26, 92, 247);
font-weight: bold;
}
/* Occupancy states */
.state-free {
color: green;
}
.state-occupy {
color: red;
}
.state-light-occupy {
color: orange;
}
.state-super-occupy {
color: rgb(255, 0, 0);
font-weight: bold;
font-size: 20px;
background-color: rgb(255, 255, 0);
border-radius: 8px;
padding: 1px 6px;
}

27
web/index.html

@ -0,0 +1,27 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>服务器信息</title>
<link rel="stylesheet" href="./css/style_1.css">
</head>
<body>
<div class="content">
<div id="header-container">
服务器信息
</div>
<div id="server-data">
</div>
</div>
<footer>
<p>项目源码在<a href="http://git.lxblxb.top/lxb/Tool_CheckGPUsWeb/src/branch/v2">这里</a>,有问题可联系<span title="xiongbin_lin@163.com">lxb</span></p>
</footer>
<script src="./js/script.js"></script>
</body>
</html>

291
web/js/script.js

@ -0,0 +1,291 @@
// Decide whether we are on the internal or the public network
let apiURL = '';
fetch('/index.html') // Request any resource just to read its response headers
.then(response => {
// Read the X-Environment response header
const environment = response.headers.get('X-Environment');
// Set the API URL based on the environment
if (environment === 'internal') {
apiURL = 'http://10.1.16.174:15001';
} else {
apiURL = 'http://gpus.lxblxb.top';
}
console.log('访问地址: ' + apiURL);
})
// Fetch data from the server
function fetchData() {
fetch(apiURL + '/api/get_data')
// Get the server and GPU data
.then(response => response.json()) // Parse the JSON response
.then(data => {
// Handle the JSON data
// console.log(data);
displayServerData(data); // Render the data
})
.catch(error => {
// console.error('Error fetching data:', error);
displayError(error + " (多半是没有正确连接服务器端,可能是没开、网络错误)");
});
}
// Display an error message
function displayError(err_info){
let serverDataContainer = document.getElementById('server-data');
serverDataContainer.innerHTML = ''; // Clear the container
let errDiv = document.createElement('div');
errDiv.classList.add('error-info');
errDiv.innerText = err_info;
serverDataContainer.appendChild(errDiv);
}
// Convert a KB value into a human-readable unit
function parse_data_unit(num, fixedLen=2){
if (num < 1024){
return num.toFixed(fixedLen) + " KB";
}
num /= 1024;
if (num < 1024){
return num.toFixed(fixedLen) + " MB";
}
num /= 1024;
if (num < 1024){
return num.toFixed(fixedLen) + " GB";
}
num /= 1024;
if (num < 1024){
return num.toFixed(fixedLen) + " TB";
}
}
function add_bar(serverCard){
let bar = document.createElement('hr');
serverCard.appendChild(bar);
}
// Build the server info elements
function displayServerData(data){
let serverDataContainer = document.getElementById('server-data');
serverDataContainer.innerHTML = ''; // Clear the container
let serverDataDict = data['server_dict'];
// Create a card for each server
for (let serverTitle in serverDataDict){
let serverCard = document.createElement('div');
serverCard.className = 'card';
// Title
let serverName = document.createElement('div');
serverName.className = 'server-name';
serverName.textContent = serverTitle;
serverCard.appendChild(serverName);
serverData = serverDataDict[serverTitle];
// Skip if there is no data
if (serverData == null){
let errText = document.createElement('div');
errText.className = 'error-text';
errText.textContent = "No data.";
serverCard.appendChild(errText);
serverDataContainer.appendChild(serverCard);
continue;
}
// Add the notice
if ('note' in serverData && serverData['note'] != ''){
let noteInfo = document.createElement('div');
noteInfo.className = 'note-info';
noteInfo.innerHTML = '<div style="text-align: center;"><strong>公告</strong></div>' + serverData['note'];
serverCard.appendChild(noteInfo);
}
// Check the data age
let lastTime = new Date(serverData['update_time_stamp'] * 1000);
let timeFromUpdate = Date.now() - lastTime;
if (timeFromUpdate > serverData['interval'] * 1000 * 4){
let errText = document.createElement('div');
errText.className = 'error-text';
errText.textContent = "长时间未更新,上次更新时间: " + lastTime.toLocaleString();
serverCard.appendChild(errText);
serverDataContainer.appendChild(serverCard);
continue;
}else if (timeFromUpdate > serverData['interval'] * 1000 * 2.5){
serverName.textContent = serverTitle + " - Not update -";
}
// Network speed
if ('network_list' in serverData){
let networkInfo = document.createElement('div');
networkInfo.className = 'network-info';
// TODO: for now, just sum over all network interfaces
let inSum = 0;
let outSum = 0;
let tmpTitle = "";
serverData.network_list.forEach(function(network){
inSum += network['in'];
outSum += network['out'];
tmpTitle += network['name'] + " in: " + parse_data_unit(network['in']) + "/s out: " + parse_data_unit(network['out']) + "/s\n";
});
let inStr = parse_data_unit(inSum);
let outStr = parse_data_unit(outSum);
networkInfo.innerHTML += "<strong> 网络 : </strong> <span title=\"" + tmpTitle + "\">in:" + inStr + "/s out:" + outStr + "/s</span><br>";
serverCard.appendChild(networkInfo);
// Divider
add_bar(serverCard);
}
// CPU
if ('cpu' in serverData){
let cpuInfo = document.createElement('div');
cpuInfo.className = 'cpu-info';
temperature_list_str = "";
serverData.cpu['temperature_list'].forEach(function(v){
temperature_list_str += v + " ℃ ";
});
cpuInfo.innerHTML = "<strong>" + serverData.cpu['name'] + "</strong><br>" +
"<strong>温度 : </strong>" + temperature_list_str + "<br>" +
"<strong>占用率 : </strong><span title=\"" + serverData.cpu['core_occupy_list'] + "\">" + serverData.cpu['core_avg_occupy'] + "%";
serverCard.appendChild(cpuInfo);
// Divider
add_bar(serverCard);
}
// Memory
if ('memory' in serverData){
let memoryInfo = document.createElement('div');
memoryInfo.className = 'memory-info';
let totalNum = serverData.memory.total
let usedNum = serverData.memory.used
let totalMem = parse_data_unit(totalNum);
let usedMem = parse_data_unit(usedNum);
let tmpClass = "state-free";
if (usedNum / totalNum > 0.95)
tmpClass = "state-super-occupy";
else if (usedNum / totalNum > 0.8)
tmpClass = "state-occupy";
else if (usedNum / totalNum > 0.6)
tmpClass = "state-light-occupy";
memoryInfo.innerHTML += "<strong> 内存 : </strong> <span class=\"" + tmpClass + "\">" + usedMem + " / " + totalMem + "</span><br>";
serverCard.appendChild(memoryInfo);
// Divider
add_bar(serverCard);
}
// Storage
if ('storage_list' in serverData){
let storageInfo = document.createElement('div');
storageInfo.className = 'storage-info';
for (let i = 0; i < serverData.storage_list.length; i++) {
let targetPath = serverData.storage_list[i].path;
let totalNum = serverData.storage_list[i].total
let availableNum = serverData.storage_list[i].available
let totalStorage = parse_data_unit(totalNum);
let availableStorage = parse_data_unit(totalNum - availableNum);
let tmpClass = "state-free";
if (availableNum / totalNum < 0.01)
tmpClass = "state-super-occupy";
else if (availableNum / totalNum < 0.1)
tmpClass = "state-occupy";
else if (availableNum / totalNum < 0.3)
tmpClass = "state-light-occupy";
storageInfo.innerHTML += '<strong>' + targetPath + " :</strong> <span title=\"剩余可用:" + parse_data_unit(availableNum) + "\" class=\"" + tmpClass
+ "\">" + availableStorage + " / " + totalStorage + "</span><br>";
}
serverCard.appendChild(storageInfo);
// Divider
add_bar(serverCard);
}
// GPU
if ('gpu_list' in serverData){
serverData.gpu_list.forEach(function(gpu){
let gpuInfo = document.createElement('div');
gpuInfo.className = 'gpu-info';
let markFree = '<span class="state-free"> 空闲</span>';
let markLightOccupy = '<span class="state-light-occupy"> 占用</span>';
let markOccupy = '<span class="state-occupy"> 占用</span>';
let tmpMark = markFree;
let memory_used_ratio = gpu.used_memory / gpu.total_memory;
if (memory_used_ratio > 0.25 && gpu.utilization > 50){
tmpMark = markOccupy;
}
else if (memory_used_ratio > 0.25 || gpu.utilization > 50){
tmpMark = markLightOccupy;
}else{
tmpMark = markFree;
}
gpuInfo.innerHTML = '<strong>' + gpu.idx + ' - ' + gpu.name + tmpMark + '</strong><br>'
+ '温度: ' + gpu.temperature + '°C<br>'
+ '显存: ' + gpu.used_memory + ' / ' + gpu.total_memory + " MB" + '<br>'
+ '利用率: ' + gpu.utilization + '%';
// Add process info
let processInfo = document.createElement('div');
processInfo.classList.add('process-info');
processInfo.innerHTML = "使用情况: ";
gpu.process_list.sort((a, b) => b.memory - a.memory);
gpu.process_list.forEach(function(item, index){
if (item.memory > 40)
processInfo.innerHTML += `<span class="process-item" title="${item.cmd}">${item.user} (${item.memory}) </span>`;
});
gpuInfo.appendChild(processInfo); // Append the process info to the GPU block
serverCard.appendChild(gpuInfo);
});
// Divider
add_bar(serverCard);
}
// Error info
if ('error_dict' in serverData){
let errorInfo = document.createElement('div');
errorInfo.className = 'storage-info';
if (Object.keys(serverData.error_dict).length > 0){
for (let k in serverData.error_dict){
errorInfo.innerHTML += '<strong>' + k + " :</strong>" + serverData.error_dict[k] + "<br>";
}
serverCard.appendChild(errorInfo);
// Divider
add_bar(serverCard);
}
}
// Remove the trailing divider
if (serverCard.lastElementChild && serverCard.lastElementChild.tagName === 'HR') {
serverCard.removeChild(serverCard.lastElementChild);
}
// Append this server's card to the container
serverDataContainer.appendChild(serverCard);
}
}
// TODO test
// fetchData()
// Fetch data on page load and refresh periodically
document.addEventListener('DOMContentLoaded', function() {
fetchData();
setInterval(fetchData, 4000); // Refresh every 4 seconds
});