Compare commits
52 Commits
9807754d5f
...
20355b24bf
Author | SHA1 | Date |
---|---|---|
|
20355b24bf | 3 weeks ago |
|
15690321ee | 3 weeks ago |
|
f3314388ea | 3 weeks ago |
|
ca5d2fa7b2 | 2 months ago |
|
c4cddfb165 | 4 months ago |
|
65ee80f50c | 4 months ago |
|
fac97ee26d | 4 months ago |
|
a326bfc793 | 4 months ago |
|
80b4cd711b | 5 months ago |
|
fe0c86f85c | 7 months ago |
|
e36ad1739a | 7 months ago |
|
ab4c849d21 | 7 months ago |
|
6fe811458e | 7 months ago |
|
bc3d4d5d70 | 8 months ago |
|
3578785df6 | 8 months ago |
|
768edbf8fe | 8 months ago |
|
052f1d5612 | 8 months ago |
|
aa07e0d592 | 8 months ago |
|
0e5e8e9b56 | 8 months ago |
|
4cdd195a1a | 8 months ago |
|
d4c40042d0 | 8 months ago |
|
f3d8b1298a | 8 months ago |
|
fd3d5b1959 | 8 months ago |
|
75164c0b4c | 8 months ago |
|
61b56e8339 | 8 months ago |
|
cf61103462 | 8 months ago |
|
73d215bcf4 | 8 months ago |
|
11c7158d53 | 8 months ago |
|
cec4b4028a | 8 months ago |
|
be437429f3 | 9 months ago |
|
05a85a09d8 | 9 months ago |
|
718b26cfca | 9 months ago |
|
c6570b007b | 9 months ago |
|
d1509a5609 | 9 months ago |
|
33c96824d4 | 9 months ago |
|
4d483a2c8c | 10 months ago |
|
ca9682a4af | 10 months ago |
|
07d0422095 | 10 months ago |
|
9c3127595e | 10 months ago |
|
7280ca69d3 | 10 months ago |
|
c116a81c2c | 10 months ago |
|
a5631fe369 | 10 months ago |
|
4b940a89af | 10 months ago |
|
7014aa37d2 | 10 months ago |
|
fad4dce56a | 10 months ago |
|
ddb945fd5d | 10 months ago |
|
a0f28e1a84 | 10 months ago |
|
f0904e3893 | 11 months ago |
|
3e4ee65cc0 | 11 months ago |
|
f1d627f718 | 11 months ago |
|
62aad057cb | 11 months ago |
|
8b26843851 | 11 months ago |
16 changed files with 1373 additions and 66 deletions
@ -0,0 +1,10 @@ |
|||||
|
serverList.json |
||||
|
|
||||
|
**/__pycache__/ |
||||
|
__pycache__/ |
||||
|
.vscode/ |
||||
|
|
||||
|
# build |
||||
|
build/ |
||||
|
dist/ |
||||
|
*.spec |
@ -0,0 +1,153 @@ |
|||||
|
# 1. Introduction |
||||
|
View the status of multiple servers (CPU, network, memory, disk, GPU) on a single web page. |
||||
|
|
||||
|
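The project consists of a server and clients. Each client sends its own machine's data to the server, and the server aggregates the data from all clients. **The server no longer needs to store the clients' keys.** |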
||||
|
|
||||
|
 |
||||
|
|
||||
|
**Tips:** Hover over the network, CPU usage, or per-GPU user entries (`网络`, `CPU占用率`, `GPU使用情况的用户`) to see more detailed information. |
||||
|
# 2. Development environment |
||||
|
|
||||
|
A virtual environment can be set up with conda; both Linux and Windows work. |
||||
|
```bash |
||||
|
pip install flask flask-cors psutil requests paramiko -i https://pypi.tuna.tsinghua.edu.cn/simple |
||||
|
``` |
||||
|
|
||||
|
# 3. 运行部署 |
||||
|
On client machines you **may** also need to install `gpustat` via APT, preferably version `1.1.1`; version `0.6.0` seems to cause problems. |
||||
|
|
||||
|
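You can use `pyinstaller` to package the Python programs into standalone client and server executables, so the runtime environment no longer needs to be installed. If you run the scripts directly with Python instead, install the development environment described above. |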
||||
|
```bash |
||||
|
pyinstaller --onefile client.py |
||||
|
pyinstaller --onefile server.py |
||||
|
``` |
||||
|
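After running these commands, the two executables can be found in the `dist` directory. Put the `client` file in a suitable location on each client and the `server` file in a suitable location on the server. A client is a machine whose data is collected; the server is the machine that hosts the web page. |
||||
|
Also place the corresponding `client_config.json` and `server_config.json` alongside them. |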
||||
|
|
||||
|
## 3.1. Server |
||||
|
|
||||
|
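Run the following command, replacing the server and JSON paths with the actual ones. You can use screen or systemctl to keep it running in the background; systemctl is recommended because it also allows starting on boot. |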
||||
|
```bash |
||||
|
/home/lxb/projects/Tool_CheckGPUsWeb/dist/server --cfg /home/lxb/projects/Tool_CheckGPUsWeb/server_config.json |
||||
|
``` |
||||
|
|
||||
|
The contents of `server_config.json` are as follows: |
||||
|
|
||||
|
```json |
||||
|
{ |
||||
|
"host": "0.0.0.0", |
||||
|
"port": 15002, |
||||
|
"server_list":["76", "174", "233", "222"], |
||||
|
"note_dict":{ |
||||
|
"76" : "This is an announcement" |
||||
|
}, |
||||
|
"api_name": "api" |
||||
|
} |
||||
|
``` |
||||
|
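- host: no need to change. |
||||
|
- port: set to a suitable port; remember to open this port on the server. |
||||
|
- server_list: the list of server names to monitor; data from a client is processed only if its name appears in this list. |
||||
|
- note_dict: announcement dictionary; shows an announcement for the corresponding server. |
||||
|
- api_name: the name of the API; just keep it consistent across the server, client, and nginx settings. |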
||||
|
|
||||
|
After modifying the configuration file, you must **restart** the program for the changes to take effect. |
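
Because a restart is needed for every change, it can help to check the file first. The following is only a sketch: `validate_server_config` and `REQUIRED_KEYS` are hypothetical names that are not part of the project, and the checks simply mirror the field descriptions above.

```python
import json

# Hypothetical helper (not part of the project): checks the fields of
# server_config.json that this section describes.
REQUIRED_KEYS = ("host", "port", "server_list", "note_dict", "api_name")


def validate_server_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if problems:
        return problems
    if not isinstance(cfg["port"], int) or not 0 < cfg["port"] < 65536:
        problems.append("port must be an integer between 1 and 65535")
    if not all(isinstance(name, str) for name in cfg["server_list"]):
        problems.append("server_list entries must be strings")
    # The server only handles titles from server_list, so a note for any
    # other name would never be displayed.
    for name in cfg["note_dict"]:
        if name not in cfg["server_list"]:
            problems.append(f"note_dict entry '{name}' is not in server_list")
    return problems


example = json.loads("""
{
    "host": "0.0.0.0",
    "port": 15002,
    "server_list": ["76", "174", "233", "222"],
    "note_dict": {"76": "This is an announcement"},
    "api_name": "api"
}
""")
print(validate_server_config(example))
```

An empty list means the file matches the description in this section; it does not guarantee that the port is actually reachable.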
||||
|
|
||||
|
## 3.2. Client |
||||
|
|
||||
|
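Run the following command, replacing the client and JSON paths with the actual ones. You can use screen or systemctl to keep it running in the background; systemctl is recommended because it also allows starting on boot. |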
||||
|
```bash |
||||
|
/home/lxb/projects/ServerInfo-client/client --cfg /home/lxb/projects/ServerInfo-client/client_config.json |
||||
|
``` |
||||
|
|
||||
|
The contents of `client_config.json` are as follows: |
||||
|
```json |
||||
|
{ |
||||
|
"server_url": "http://10.1.16.174:15001", |
||||
|
"title": "174", |
||||
|
"interval": 3.0, |
||||
|
"note": "", |
||||
|
"enable": ["gpu", "cpu", "memory", "storage", "network"], |
||||
|
"storage_list":[ |
||||
|
"/", |
||||
|
"/media/D", |
||||
|
"/media/E", |
||||
|
"/media/F" |
||||
|
], |
||||
|
"api_name": "api" |
||||
|
} |
||||
|
``` |
||||
|
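- server_url: the address used to reach the server, i.e. IP plus port; adjust as needed. (In principle this would be 15002 directly, but with nginx in front, API requests are forwarded to 15002 according to its configuration. nginx listens on 15001, so 15001 is used here.) |
||||
|
- title: the name of this client machine; its data is processed only if this name is in the server's server_list. (Does that mean any client configured with another machine's title would interfere with that server's displayed data? **Yes.** Nothing other than the title is validated, so you have to make sure manually that each client has its own title.) |
||||
|
- interval: the interval between data collections, in seconds. |
||||
|
- note: announcement; it is merged with the announcement set on the server side when displayed. |
||||
|
- enable: the items to monitor; remove any item you do not need. Currently only `gpu`, `cpu`, `memory`, `storage`, and `network` are supported. |
||||
|
- storage_list: the disk paths to monitor; add any path you want checked. |
||||
|
- api_name: keep it consistent with the server and nginx settings, otherwise requests will fail. |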
||||
|
|
||||
|
After modifying the configuration file, you must **restart** the program for the changes to take effect. |
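
To make the interplay of `server_url`, `api_name`, and `title` concrete, here is a minimal sketch. The helper names `build_update_url` and `is_accepted` are hypothetical; the string concatenation and the membership check mirror what `client.py` and `server.py` in this diff do.

```python
def build_update_url(client_cfg: dict) -> str:
    """Join server_url and api_name the same way client.py does."""
    api_name = client_cfg["api_name"]
    return client_cfg["server_url"] + f"/{api_name}/update_data"


def is_accepted(client_cfg: dict, server_cfg: dict) -> bool:
    """The server only keeps data whose title is in its server_list."""
    return client_cfg["title"] in server_cfg["server_list"]


# Values taken from the sample configuration files in this README.
client_cfg = {
    "server_url": "http://10.1.16.174:15001",
    "title": "174",
    "api_name": "api",
}
server_cfg = {"server_list": ["76", "174", "233", "222"]}

print(build_update_url(client_cfg))
print(is_accepted(client_cfg, server_cfg))
```

If `api_name` differs between the client and nginx, the built URL no longer falls under the proxied location, which is why the three settings have to agree.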
||||
|
|
||||
|
## 3.3. Web deployment |
||||
|
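> **Note:** this web deployment section may still contain quite a few problems. It was originally put together with the help of GPT and similar tools, and by the time this document was written the details had faded. The nginx setup here also uses a container, although running nginx without a container would probably be better. |
||||
|
> |
||||
|
> This part of the documentation still needs improvement. |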
||||
|
|
||||
|
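You can run an nginx container with docker to deploy the web page easily. |
||||
|
First install docker. Once it is installed, you can run `docker run -d -p 80:80 -v /home/lxb/nginx_gpus:/usr/share/nginx/html --name nginx_gpus nginx:latest`; **adjust the command as needed**. The parts that can be changed are shown below. |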
||||
|
```bash |
||||
|
docker run -d \ |
||||
|
-p <host port to map>:80 \ |
||||
|
-v <host path of the data volume>:/usr/share/nginx/html \ |
||||
|
--name <container name> \ |
||||
|
nginx:latest |
||||
|
``` |
||||
|
|
||||
|
Copy the contents of the web directory to the corresponding location in the data volume. |
||||
|
|
||||
|
The nginx configuration is as follows: |
||||
|
``` |
||||
|
server { |
||||
|
listen 15001; |
||||
|
listen [::]:15001; |
||||
|
# server_name *; |
||||
|
|
||||
|
location / { |
||||
|
root /usr/share/nginx/html/serverInfo_web; |
||||
|
index index.html index.htm; |
||||
|
try_files $uri $uri/ =404; |
||||
|
} |
||||
|
|
||||
|
location /api/ { |
||||
|
proxy_pass http://localhost:15002; |
||||
|
proxy_set_header Host $host; |
||||
|
proxy_set_header X-Real-IP $remote_addr; |
||||
|
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; |
||||
|
proxy_set_header X-Forwarded-Proto $scheme; |
||||
|
} |
||||
|
|
||||
|
set $env "internal"; |
||||
|
add_header X-Environment $env; |
||||
|
|
||||
|
} |
||||
|
``` |
||||
|
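In other words, the web page is served directly on port 15001 of the server, and requests to the `api` endpoints are forwarded to port 15002, which is the Flask port. |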
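
The routing described above can be pictured with a deliberately simplified sketch. It models only the two prefix locations from the configuration; real nginx location matching has more rules, and `upstream_for`, `FLASK_UPSTREAM`, and `STATIC_ROOT` are hypothetical names used for illustration.

```python
# Simplified model of the two `location` blocks in the nginx config above.
FLASK_UPSTREAM = "http://localhost:15002"
STATIC_ROOT = "/usr/share/nginx/html/serverInfo_web"


def upstream_for(path: str) -> str:
    """Return where a request arriving on port 15001 ends up."""
    if path.startswith("/api/"):
        # `location /api/` proxies the request to Flask.
        return FLASK_UPSTREAM
    # Everything else is served as a static file by `location /`.
    return STATIC_ROOT


for request_path in ("/index.html", "/api/get_data", "/api/update_data"):
    print(request_path, "->", upstream_for(request_path))
```

This is why clients and the browser both talk to port 15001 even though Flask itself listens on 15002.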
||||
|
|
||||
|
Finally, if you have a domain name you can also set up a reverse proxy; see [服务器上使用Nginx部署网页+反向代理](http://blog.lxblxb.top/archives/1723257245091) (in hindsight, that blog post may contain some mistakes). |
||||
|
|
||||
|
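**In particular**, `web/js/script.js` contains the following code, which uses `internal` to choose the URL it accesses: when the page is opened from the internal network, it reaches the server through the internal address instead of going out to the public network. Adjust this part as needed. If the page is only accessed from the internal network, simply set `apiURL` to the internal address. |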
||||
|
```js |
||||
|
// Set the API URL based on the environment variable |
||||
|
if (environment === 'internal') { |
||||
|
apiURL = 'http://10.1.16.174:15001'; |
||||
|
} else { |
||||
|
apiURL = 'http://gpus.lxblxb.top'; |
||||
|
} |
||||
|
``` |
||||
|
The reverse-proxy nginx also needs some configuration, roughly the following. It is the counterpart of the `internal` value set in the internal nginx. |
||||
|
``` |
||||
|
set $env "public"; |
||||
|
add_header X-Environment $env; |
||||
|
``` |
||||
|
|
||||
|
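# 4. Miscellaneous |
||||
|
- `永辉` helped fix the layout of the checkboxes at the top (the checkboxes are not included in this version). |
||||
|
- The per-GPU user usage display was added following `治鹏`'s approach. |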
@ -0,0 +1,321 @@ |
|||||
|
from version import version |
||||
|
import threading |
||||
|
import paramiko |
||||
|
import time |
||||
|
import json |
||||
|
import re |
||||
|
|
||||
|
class Connector: |
||||
|
def __init__(self, data_dict : dict, server_cfg : dict, note_dict : dict, lock : threading.Lock, connect_check_interval : float, reconnect_interval : float, multiple_of_timeout : float = 2): |
||||
|
self.data_dict = data_dict |
||||
|
self.tmp_data_dict = dict() |
||||
|
self.server_cfg = server_cfg |
||||
|
self.note_dict = note_dict |
||||
|
self.lock = lock |
||||
|
self.tmp_lock = threading.Lock() |
||||
|
self.connect_check_interval = connect_check_interval |
||||
|
self.reconnect_interval = reconnect_interval |
||||
|
self.multiple_of_timeout = multiple_of_timeout |
||||
|
|
||||
|
def run(self): |
||||
|
# Start one polling thread per server |
||||
|
for i, server_data in enumerate(self.server_cfg): |
||||
|
self.tmp_data_dict[server_data['title']] = {} |
||||
|
# self.tmp_data_dict[server_data['title']]['server_data'] = server_data |
||||
|
thread_check = threading.Thread(target=self.__keep_check_one, args=(server_data, self.tmp_data_dict, server_data['title'], self.connect_check_interval, self.reconnect_interval)) |
||||
|
thread_check.daemon = True |
||||
|
thread_check.start() |
||||
|
|
||||
|
# Start the data synchronization thread |
||||
|
thread_transmit = threading.Thread(target=self.__transmit_data, args=(self.connect_check_interval,)) |
||||
|
thread_transmit.daemon = True |
||||
|
thread_transmit.start() |
||||
|
|
||||
|
# Continuously poll a single server |
||||
|
def __keep_check_one(self, server: dict, shared_data_list: dict, server_title: str, interval: float, re_connect_time: float=5): |
||||
|
# Normalize the list of storage paths to check |
||||
|
if not 'storage_list' in server: |
||||
|
server['storage_list'] = [] |
||||
|
if not '/' in server['storage_list']: |
||||
|
server['storage_list'].insert(0, '/') |
||||
|
|
||||
|
re_try_count = 0 |
||||
|
# Connection loop: reconnect whenever the session drops |
||||
|
while True: |
||||
|
try: |
||||
|
# Establish the SSH connection |
||||
|
client = paramiko.SSHClient() |
||||
|
client.set_missing_host_key_policy(paramiko.AutoAddPolicy()) |
||||
|
client.connect(server['ip'], port=server['port'], username=server['username'], password=server.get('password', None), key_filename=server.get('key_filename', None), timeout=interval*self.multiple_of_timeout) |
||||
|
|
||||
|
with self.tmp_lock: |
||||
|
if 'error_dict' in shared_data_list[server_title]: |
||||
|
shared_data_list[server_title].pop('error_dict') |
||||
|
re_try_count = 0 |
||||
|
|
||||
|
# Polling loop |
||||
|
keep_run = True |
||||
|
while keep_run: |
||||
|
try: |
||||
|
error_info_dict = dict() |
||||
|
# Network info |
||||
|
network_info = self.__get_network_info(client, interval*self.multiple_of_timeout, server.get('network_interface_name', None), error_info_dict) |
||||
|
# CPU info |
||||
|
cpu_info = self.__get_cpu_info(client, interval*self.multiple_of_timeout, error_info_dict) |
||||
|
# Memory info |
||||
|
memory_info = self.__get_memory_info(client, interval*self.multiple_of_timeout, error_info_dict) |
||||
|
# Storage info |
||||
|
storage_info = self.__get_storage_info(client, interval*self.multiple_of_timeout, server['storage_list'], error_info_dict) |
||||
|
# GPU info |
||||
|
gpu_info = self.__get_gpus_info(client, interval*self.multiple_of_timeout, error_info_dict, ignore_gpu=server.get('ignore_gpu', False)) |
||||
|
|
||||
|
# Record the results |
||||
|
with self.tmp_lock: |
||||
|
# Attach the announcement |
||||
|
if server_title in self.note_dict: |
||||
|
shared_data_list[server_title]['note'] = self.note_dict[server_title] |
||||
|
shared_data_list[server_title]['interval'] = interval |
||||
|
shared_data_list[server_title]['title'] = server_title |
||||
|
shared_data_list[server_title]['version'] = version |
||||
|
shared_data_list[server_title]['update_time_stamp'] = int(time.time()) |
||||
|
if cpu_info is not None: |
||||
|
shared_data_list[server_title]['cpu'] = cpu_info |
||||
|
if gpu_info is not None: |
||||
|
shared_data_list[server_title]['gpu_list'] = gpu_info |
||||
|
if memory_info is not None: |
||||
|
shared_data_list[server_title]['memory'] = memory_info |
||||
|
if storage_info is not None: |
||||
|
shared_data_list[server_title]['storage_list'] = storage_info |
||||
|
if network_info is not None: |
||||
|
shared_data_list[server_title]['network_list'] = [{ |
||||
|
'default' : False, |
||||
|
'name' : server['network_interface_name'], |
||||
|
'in' : network_info['in'], |
||||
|
'out' : network_info['out'], |
||||
|
}] |
||||
|
if len(error_info_dict) > 0: |
||||
|
shared_data_list[server_title]['error_dict'] = error_info_dict |
||||
|
except Exception as e: |
||||
|
keep_run = False |
||||
|
with self.tmp_lock: |
||||
|
shared_data_list[server_title].setdefault('error_dict', dict())['connect'] = f'{e}' |
||||
|
if 'gpu_list' in shared_data_list[server_title]: |
||||
|
shared_data_list[server_title].pop('gpu_list') |
||||
|
|
||||
|
time.sleep(interval) |
||||
|
|
||||
|
# Close the connection |
||||
|
client.close() |
||||
|
except Exception as e: |
||||
|
if 'error_dict' not in shared_data_list[server_title]: |
||||
|
shared_data_list[server_title]['error_dict'] = dict() |
||||
|
shared_data_list[server_title]['error_dict']['connect'] = f'retry:{re_try_count}, {e}' |
||||
|
time.sleep(re_connect_time) |
||||
|
re_try_count += 1 |
||||
|
|
||||
|
# Periodically copy the actively polled server data into the final data dict |
||||
|
def __transmit_data(self, interval): |
||||
|
# Wait a little before starting |
||||
|
time.sleep(interval * 0.5) |
||||
|
while True: |
||||
|
time.sleep(interval) |
||||
|
# Copy the contents of the temporary dict into the shared data |
||||
|
with self.tmp_lock: |
||||
|
with self.lock: |
||||
|
for k, v in self.tmp_data_dict.items(): |
||||
|
self.data_dict[k] = v |
||||
|
|
||||
|
#region Information-gathering methods |
||||
|
|
||||
|
def __get_cpu_info(self, client, timeout, info_dict:dict=None): |
||||
|
def get_cpu_temp_via_sysfs(client): |
||||
|
try: |
||||
|
command = r''' |
||||
|
for zone in /sys/class/thermal/thermal_zone*; do |
||||
|
if [ -f "$zone/type" ] && [ -f "$zone/temp" ]; then |
||||
|
type=$(cat "$zone/type") |
||||
|
temp=$(cat "$zone/temp") |
||||
|
echo "$type:$temp" |
||||
|
fi |
||||
|
done |
||||
|
''' |
||||
|
stdin, stdout, stderr = client.exec_command(command) |
||||
|
output = stdout.read().decode().strip() |
||||
|
temperatures = [] |
||||
|
for line in output.splitlines(): |
||||
|
if ':' in line: |
||||
|
type_str, temp_str = line.split(':', 1) |
||||
|
if 'cpu' in type_str.lower() or 'pkg' in type_str.lower(): |
||||
|
try: |
||||
|
temp = int(temp_str.strip()) |
||||
|
temperatures.append(round(temp / 1000, 1)) # convert millidegrees to degrees Celsius |
||||
|
except ValueError: |
||||
|
continue |
||||
|
return temperatures |
||||
|
except Exception as e: |
||||
|
return [] |
||||
|
|
||||
|
def get_cpu_avg_usage_via_procstat(client, interval_sec=0.3): |
||||
|
def read_cpu_stat_line(): |
||||
|
stdin, stdout, _ = client.exec_command("grep '^cpu ' /proc/stat") |
||||
|
line = stdout.read().decode().strip() |
||||
|
parts = list(map(int, line.split()[1:])) |
||||
|
idle = parts[3] + parts[4] # idle + iowait |
||||
|
total = sum(parts) |
||||
|
return idle, total |
||||
|
try: |
||||
|
idle1, total1 = read_cpu_stat_line() |
||||
|
time.sleep(interval_sec) |
||||
|
idle2, total2 = read_cpu_stat_line() |
||||
|
|
||||
|
total_delta = total2 - total1 |
||||
|
idle_delta = idle2 - idle1 |
||||
|
|
||||
|
if total_delta == 0: |
||||
|
return 0.0 |
||||
|
usage = 100.0 * (1.0 - idle_delta / total_delta) |
||||
|
return round(usage, 1) |
||||
|
except Exception as e: |
||||
|
return None |
||||
|
|
||||
|
try: |
||||
|
result = dict() |
||||
|
# 1. Get the CPU model |
||||
|
stdin, stdout, stderr = client.exec_command("lscpu") |
||||
|
lscpu_output = stdout.read().decode() |
||||
|
model_match = re.search(r"Model name\s*:\s*(.+)", lscpu_output) |
||||
|
if model_match: |
||||
|
result["name"] = model_match.group(1).strip() |
||||
|
else: |
||||
|
# If "Model name" is not found, try the Chinese-locale label "型号名称" |
||||
|
model_name_match_cn = re.search(r'型号名称\s*:\s*(.+)', lscpu_output) |
||||
|
if model_name_match_cn: |
||||
|
result["name"] = model_name_match_cn.group(1).strip() |
||||
|
else: |
||||
|
result["name"] = "未知CPU" |
||||
|
|
||||
|
# 2. Get the CPU temperature |
||||
|
result["temperature_list"] = get_cpu_temp_via_sysfs(client) |
||||
|
|
||||
|
# 3. Get the CPU usage |
||||
|
result["core_avg_occupy"] = get_cpu_avg_usage_via_procstat(client) |
||||
|
result["core_occupy_list"] = [-1] # TODO: querying every core this way is fairly expensive and not very useful, so it is skipped for now |
||||
|
|
||||
|
return result |
||||
|
except paramiko.ssh_exception.SSHException as e: |
||||
|
# Re-raise SSH exceptions so the caller can reconnect |
||||
|
raise |
||||
|
except Exception as e: |
||||
|
if info_dict is not None: |
||||
|
info_dict['cpu'] = f'{e}' |
||||
|
return None |
||||
|
|
||||
|
def __get_memory_info(self, client, timeout, info_dict:dict=None): |
||||
|
try: |
||||
|
stdin, stdout, stderr = client.exec_command('free', timeout=timeout) |
||||
|
output = stdout.read().decode().split('\n')[1] |
||||
|
if output == "": |
||||
|
return None |
||||
|
data = output.split() |
||||
|
result = { |
||||
|
"total": int(data[1]), |
||||
|
"used": int(data[2]) |
||||
|
} |
||||
|
|
||||
|
return result |
||||
|
except paramiko.ssh_exception.SSHException as e: |
||||
|
# Re-raise SSH exceptions so the caller can reconnect |
||||
|
raise |
||||
|
except Exception as e: |
||||
|
if info_dict is not None: |
||||
|
info_dict['memory'] = f'{e}' |
||||
|
return None |
||||
|
|
||||
|
def __get_storage_info(self, client, timeout, path_list, info_dict:dict=None): |
||||
|
try: |
||||
|
result = [] |
||||
|
for target_path in path_list: |
||||
|
stdin, stdout, stderr = client.exec_command(f'df {target_path} | grep \'{target_path}\'', timeout=timeout) |
||||
|
output = stdout.read().decode() |
||||
|
if output == "": |
||||
|
continue |
||||
|
data = output.split() |
||||
|
tmp_res = { |
||||
|
"path": target_path, |
||||
|
"total": int(data[1]), |
||||
|
"available": int(data[3]) |
||||
|
} |
||||
|
result.append(tmp_res) |
||||
|
return result |
||||
|
except paramiko.ssh_exception.SSHException as e: |
||||
|
# Re-raise SSH exceptions so the caller can reconnect |
||||
|
raise |
||||
|
except Exception as e: |
||||
|
if info_dict is not None: |
||||
|
info_dict['storage'] = f'{e}' |
||||
|
return None |
||||
|
|
||||
|
def __get_network_info(self, client, timeout, interface_name, info_dict:dict=None): |
||||
|
try: |
||||
|
if interface_name is None: |
||||
|
return None |
||||
|
stdin, stdout, stderr = client.exec_command(f'ifstat -i {interface_name} 0.1 1', timeout=timeout) |
||||
|
output = stdout.read().decode().split('\n')[2] |
||||
|
data = output.split() |
||||
|
result = { |
||||
|
"in": float(data[0]), |
||||
|
"out": float(data[1]) |
||||
|
} |
||||
|
return result |
||||
|
except paramiko.ssh_exception.SSHException as e: |
||||
|
# Re-raise SSH exceptions so the caller can reconnect |
||||
|
raise |
||||
|
except Exception as e: |
||||
|
if info_dict is not None: |
||||
|
info_dict['network'] = f'{e}' |
||||
|
return None |
||||
|
|
||||
|
def __get_gpus_info(self, client, timeout, info_dict:dict=None, ignore_gpu=False): |
||||
|
if ignore_gpu: |
||||
|
return None |
||||
|
|
||||
|
try: |
||||
|
stdin, stdout, stderr = client.exec_command("gpustat --json") |
||||
|
output = stdout.read().decode() |
||||
|
gpus_info = json.loads(output) |
||||
|
|
||||
|
result = [] |
||||
|
for gpu_info in gpus_info['gpus']: |
||||
|
# Shorten the GPU name |
||||
|
gpu_name = gpu_info['name'].replace('NVIDIA ', '').replace('GeForce ', '') |
||||
|
process_list = [] |
||||
|
for process_info in gpu_info.get('processes', []): |
||||
|
cmd = process_info.get('command', '') |
||||
|
if 'full_command' in process_info: |
||||
|
cmd = ' '.join(process_info["full_command"]) |
||||
|
process_list.append({ |
||||
|
"user": process_info.get('username'), |
||||
|
"memory": process_info.get('gpu_memory_usage'), |
||||
|
"cmd": cmd |
||||
|
}) |
||||
|
# Append to the result list |
||||
|
result.append({ |
||||
|
"idx": gpu_info['index'], |
||||
|
"name": gpu_name, |
||||
|
"temperature": gpu_info['temperature.gpu'], |
||||
|
"used_memory": gpu_info['memory.used'], |
||||
|
"total_memory": gpu_info['memory.total'], |
||||
|
"utilization": gpu_info['utilization.gpu'], |
||||
|
"process_list": process_list |
||||
|
}) |
||||
|
|
||||
|
return result |
||||
|
except paramiko.ssh_exception.SSHException as e: |
||||
|
# Re-raise SSH exceptions so the caller can reconnect |
||||
|
raise |
||||
|
except Exception as e: |
||||
|
if info_dict is not None: |
||||
|
info_dict['gpu'] = f'{e}' |
||||
|
return None |
||||
|
|
||||
|
#endregion |
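
The CPU-usage calculation in `get_cpu_avg_usage_via_procstat` above can be checked without an SSH connection. The sketch below repeats the same arithmetic on two sample lines; `parse_cpu_line` and `cpu_usage_percent` are hypothetical names, and the sample values are made up for illustration.

```python
def parse_cpu_line(line: str):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    parts = list(map(int, line.split()[1:]))
    idle = parts[3] + parts[4]  # idle + iowait, as in the method above
    return idle, sum(parts)


def cpu_usage_percent(first: str, second: str) -> float:
    """Usage between two samples: the share of elapsed jiffies that was not idle."""
    idle1, total1 = parse_cpu_line(first)
    idle2, total2 = parse_cpu_line(second)
    total_delta = total2 - total1
    if total_delta == 0:
        return 0.0
    return round(100.0 * (1.0 - (idle2 - idle1) / total_delta), 1)


# Made-up sample lines, for illustration only.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  160 0 70 860 60 0 0 0 0 0"
print(cpu_usage_percent(before, after))  # 53.3
```

Here 150 jiffies elapsed in total, 70 of them idle, so the usage is 100 * (1 - 70/150), which rounds to 53.3.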
@ -1,38 +0,0 @@ |
|||||
from flask import Flask, jsonify |
|
||||
from flask_cors import CORS |
|
||||
import threading |
|
||||
import paramiko |
|
||||
import time |
|
||||
|
|
||||
#region Globals
|
||||
|
|
||||
app = Flask(__name__) |
|
||||
CORS(app) |
|
||||
port = 15002 |
|
||||
|
|
||||
#endregion |
|
||||
|
|
||||
#region Endpoints
|
||||
|
|
||||
# For testing
|
||||
@app.route('/') |
|
||||
def hello(): |
|
||||
return 'hi. —— CheckGPUsWeb' |
|
||||
|
|
||||
@app.route('/data', methods=['GET']) |
|
||||
def get_data(): |
|
||||
data = {'name': 'John', 'age': 25, 'city': 'New York'} |
|
||||
return jsonify(data) |
|
||||
|
|
||||
# Start connecting to the servers
|
||||
def connect_server(): |
|
||||
pass |
|
||||
|
|
||||
#endregion |
|
||||
|
|
||||
# Test
|
||||
def test(): |
|
||||
app.run(debug=True, port=port) |
|
||||
|
|
||||
if __name__ == '__main__': |
|
||||
test() |
|
@ -0,0 +1,221 @@ |
|||||
|
import os |
||||
|
import json |
||||
|
import time |
||||
|
import psutil |
||||
|
import argparse |
||||
|
import requests |
||||
|
import subprocess |
||||
|
from version import version |
||||
|
|
||||
|
# region get data |
||||
|
|
||||
|
# Collect GPU information |
||||
|
def get_gpus_info(error_dict): |
||||
|
result_list = list() |
||||
|
|
||||
|
try: |
||||
|
gpus_info = json.load(os.popen('gpustat --json')) |
||||
|
for gpu_info in gpus_info['gpus']: |
||||
|
# Shorten the GPU name |
||||
|
gpu_name = gpu_info['name'] |
||||
|
gpu_name = gpu_name.replace('NVIDIA ', '').replace('GeForce ', '') |
||||
|
process_list = list() |
||||
|
for process_info in gpu_info['processes']: |
||||
|
cmd = process_info['command'] |
||||
|
if 'full_command' in process_info: |
||||
|
cmd = ' '.join(process_info["full_command"]) |
||||
|
process_list.append({ |
||||
|
"user": process_info['username'], |
||||
|
"memory": process_info['gpu_memory_usage'], |
||||
|
"cmd": cmd |
||||
|
}) |
||||
|
|
||||
|
# Append to the result list |
||||
|
result_list.append({ |
||||
|
"idx": gpu_info['index'], |
||||
|
"name": gpu_name, |
||||
|
"temperature": gpu_info['temperature.gpu'], |
||||
|
"used_memory": gpu_info['memory.used'], |
||||
|
"total_memory": gpu_info['memory.total'], |
||||
|
"utilization": gpu_info['utilization.gpu'], |
||||
|
"process_list": process_list |
||||
|
}) |
||||
|
except Exception as e: |
||||
|
error_dict['gpu'] = f'{e}' |
||||
|
|
||||
|
return result_list |
||||
|
|
||||
|
# Collect CPU information |
||||
|
cpu_name = None |
||||
|
def get_cpu_info(error_dict): |
||||
|
result_dict = dict() |
||||
|
|
||||
|
try: |
||||
|
# Get the CPU model |
||||
|
global cpu_name |
||||
|
def get_cpu_name(): |
||||
|
if cpu_name is None: |
||||
|
import re |
||||
|
# Run the lscpu command and capture its output |
||||
|
result = subprocess.run(['lscpu'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) |
||||
|
output = result.stdout |
||||
|
|
||||
|
# Use a regular expression to match "Model name" or the Chinese-locale label "型号名称" |
||||
|
model_name_match = re.search(r'Model name\s*:\s*(.+)', output) |
||||
|
if model_name_match: |
||||
|
return model_name_match.group(1).strip() |
||||
|
else: |
||||
|
# If "Model name" is not found, try the Chinese-locale label "型号名称" |
||||
|
model_name_match_cn = re.search(r'型号名称\s*:\s*(.+)', output) |
||||
|
if model_name_match_cn: |
||||
|
return model_name_match_cn.group(1).strip() |
||||
|
else: |
||||
|
return "CPU型号信息未找到" |
||||
|
else: |
||||
|
return cpu_name |
||||
|
cpu_name = get_cpu_name() |
||||
|
|
||||
|
# Get the temperature of each CPU package |
||||
|
temperature_list = list() |
||||
|
temperatures = psutil.sensors_temperatures() |
||||
|
if 'coretemp' in temperatures: |
||||
|
for entry in temperatures['coretemp']: |
||||
|
if entry.label.startswith('Package'): |
||||
|
temperature_list.append(entry.current) |
||||
|
|
||||
|
# Record the results |
||||
|
result_dict["name"] = cpu_name |
||||
|
result_dict["temperature_list"] = temperature_list |
||||
|
result_dict["core_avg_occupy"] = psutil.cpu_percent(interval=None, percpu=False) |
||||
|
result_dict["core_occupy_list"] = psutil.cpu_percent(interval=None, percpu=True) |
||||
|
|
||||
|
except Exception as e: |
||||
|
error_dict['cpu'] = f'{e}' |
||||
|
|
||||
|
return result_dict |
||||
|
|
||||
|
# Collect storage information |
||||
|
def get_storages_info(error_dict, path_list): |
||||
|
result_list = list() |
||||
|
try: |
||||
|
for target_path in path_list: |
||||
|
data = subprocess.run(['df', target_path], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True).stdout  # run df directly; a shell pipe cannot be passed as arguments |
||||
|
data = data.split('\n')[1].split() |
||||
|
tmp_res = { |
||||
|
"path": target_path, |
||||
|
"total": int(data[1]), |
||||
|
"available": int(data[3]) |
||||
|
} |
||||
|
result_list.append(tmp_res) |
||||
|
except Exception as e: |
||||
|
error_dict['storage'] = f'{e}' |
||||
|
|
||||
|
return result_list |
||||
|
|
||||
|
# Collect memory information |
||||
|
def get_memory_info(error_dict): |
||||
|
result_dict = dict() |
||||
|
try: |
||||
|
mem = psutil.virtual_memory() |
||||
|
result_dict["total"] = mem.total / 1024 |
||||
|
result_dict["used"] = mem.used / 1024 |
||||
|
except Exception as e: |
||||
|
error_dict['memory'] = f'{e}' |
||||
|
|
||||
|
return result_dict |
||||
|
|
||||
|
# Collect network information |
||||
|
last_network_stats = None |
||||
|
last_network_time = None |
||||
|
def get_networks_info(error_dict): |
||||
|
result_list = list() |
||||
|
try: |
||||
|
global last_network_stats |
||||
|
global last_network_time |
||||
|
current_stats = psutil.net_io_counters(pernic=True) |
||||
|
|
||||
|
if last_network_stats is None: |
||||
|
# First sample: no previous counters yet, so report zero traffic |
||||
|
for k in current_stats.keys(): |
||||
|
if k == 'lo': |
||||
|
continue |
||||
|
result_list.append({ |
||||
|
"name": k, |
||||
|
"default": False, |
||||
|
"in": 0, |
||||
|
"out": 0 |
||||
|
}) |
||||
|
else: |
||||
|
time_interval = time.time() - last_network_time |
||||
|
for k in current_stats.keys(): |
||||
|
if k == 'lo': |
||||
|
continue |
||||
|
result_list.append({ |
||||
|
"name": k, |
||||
|
"default": False, |
||||
|
"in": (current_stats[k].bytes_recv - last_network_stats[k].bytes_recv) / time_interval / 1000, |
||||
|
"out": (current_stats[k].bytes_sent - last_network_stats[k].bytes_sent) / time_interval / 1000 |
||||
|
}) |
||||
|
|
||||
|
# Remember the counters for the next call |
||||
|
last_network_stats = current_stats |
||||
|
last_network_time = time.time() |
||||
|
except Exception as e: |
||||
|
error_dict['network'] = f'{e}' |
||||
|
|
||||
|
return result_list |
||||
|
|
||||
|
# endregion |
||||
|
|
||||
|
client_cfg = None |
||||
|
|
||||
|
def collect_data(): |
||||
|
result_dict = dict() |
||||
|
error_dict = dict() |
||||
|
|
||||
|
# Collect the items enabled in the configuration |
||||
|
if 'gpu' in client_cfg['enable']: |
||||
|
result_dict['gpu_list'] = get_gpus_info(error_dict) |
||||
|
if 'cpu' in client_cfg['enable']: |
||||
|
result_dict['cpu'] = get_cpu_info(error_dict) |
||||
|
if 'storage' in client_cfg['enable']: |
||||
|
result_dict['storage_list'] = get_storages_info(error_dict, client_cfg['storage_list']) |
||||
|
if 'memory' in client_cfg['enable']: |
||||
|
result_dict['memory'] = get_memory_info(error_dict) |
||||
|
if 'network' in client_cfg['enable']: |
||||
|
result_dict['network_list'] = get_networks_info(error_dict) |
||||
|
|
||||
|
# Record the remaining metadata |
||||
|
result_dict['update_time_stamp'] = int(time.time()) |
||||
|
result_dict['error_dict'] = error_dict |
||||
|
result_dict['note'] = client_cfg['note'] |
||||
|
result_dict['title'] = client_cfg['title'] |
||||
|
result_dict['interval'] = client_cfg['interval'] |
||||
|
result_dict['version'] = version |
||||
|
|
||||
|
return result_dict |
||||
|
|
||||
|
def main(): |
||||
|
parser = argparse.ArgumentParser() |
||||
|
parser.add_argument('--cfg', default='client_config.json', type=str, help='the path of config json.') |
||||
|
args = parser.parse_args() |
||||
|
# Load the configuration file |
||||
|
cfg_path = args.cfg |
||||
|
global client_cfg |
||||
|
with open(cfg_path, 'r') as f: |
||||
|
client_cfg = json.load(f) |
||||
|
|
||||
|
# Send data in a loop |
||||
|
send_interval = client_cfg['interval'] |
||||
|
api_name = client_cfg['api_name'] |
||||
|
api_url = client_cfg['server_url'] + f'/{api_name}/update_data' |
||||
|
while True: |
||||
|
data = collect_data() |
||||
|
try: |
||||
|
result = requests.post(api_url, json=data) |
||||
|
except Exception as e: |
||||
|
print(e) |
||||
|
time.sleep(send_interval) |
||||
|
|
||||
|
if __name__ == '__main__': |
||||
|
main() |
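
The network figures reported by `get_networks_info` above are byte-counter differences divided by the elapsed time and by 1000. A minimal sketch of that arithmetic follows; `rate_kb_per_s` is a hypothetical name, and the guard against a non-positive interval is an addition that the original code does not have.

```python
def rate_kb_per_s(prev_bytes: int, curr_bytes: int, elapsed_s: float) -> float:
    """Average transfer rate between two counter readings, in kB/s (1 kB = 1000 bytes)."""
    if elapsed_s <= 0:
        # Assumption: treat a zero or negative interval as "no data" instead of dividing.
        return 0.0
    return (curr_bytes - prev_bytes) / elapsed_s / 1000


# 300 000 bytes received over a 3-second interval.
print(rate_kb_per_s(1_000_000, 1_300_000, 3.0))  # 100.0
```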
@ -0,0 +1,14 @@ |
|||||
|
{ |
||||
|
"server_url": "http://10.1.16.174:15001", |
||||
|
"title": "174", |
||||
|
"interval": 3.0, |
||||
|
"note": "", |
||||
|
"enable": ["gpu", "cpu", "memory", "storage", "network"], |
||||
|
"storage_list":[ |
||||
|
"/", |
||||
|
"/media/D", |
||||
|
"/media/E", |
||||
|
"/media/F" |
||||
|
], |
||||
|
"api_name": "api" |
||||
|
} |
@ -0,0 +1,58 @@ |
|||||
|
{ |
||||
|
"title": "server title", |
||||
|
"update_time_stamp": "1673082950", |
||||
|
"note": "some note", |
||||
|
"interval": 3.0, |
||||
|
"error_dict":{ |
||||
|
"gpu": "some error", |
||||
|
"cpu": "some error" |
||||
|
}, |
||||
|
"gpu_list":[ |
||||
|
{ |
||||
|
"idx": 0, |
||||
|
"name": "RTX 3090", |
||||
|
"temperature": 100, |
||||
|
"used_memory": 1000, |
||||
|
"total_memory": 10240, |
||||
|
"utilization": 34, |
||||
|
"process_list":[ |
||||
|
{ |
||||
|
"user": "lxb", |
||||
|
"memory": 100, |
||||
|
"cmd": "python run.py" |
||||
|
} |
||||
|
] |
||||
|
} |
||||
|
], |
||||
|
"cpu": |
||||
|
{ |
||||
|
"name": "i5 6500", |
||||
|
"temperature_list": [50, 30], |
||||
|
"core_avg_occupy": 31.25, |
||||
|
"core_occupy_list":[ |
||||
|
12, |
||||
|
23, |
||||
|
0, |
||||
|
90 |
||||
|
] |
||||
|
}, |
||||
|
"storage_list":[ |
||||
|
{ |
||||
|
"path": "/media/F", |
||||
|
"available": 211108624, |
||||
|
"total": 5813178480 |
||||
|
} |
||||
|
], |
||||
|
"memory":{ |
||||
|
"total": 1935468, |
||||
|
"used": 1382196 |
||||
|
}, |
||||
|
"network_list":[ |
||||
|
{ |
||||
|
"name": "eth0", |
||||
|
"default": true, |
||||
|
"in": 67.8, |
||||
|
"out": 12.3 |
||||
|
} |
||||
|
] |
||||
|
} |
@ -0,0 +1,62 @@ |
|||||
|
{ |
||||
|
"server_dict":{ |
||||
|
"174":{ |
||||
|
"title": "server title", |
||||
|
"update_time_stamp": "1673082950", |
||||
|
"note": "some note", |
||||
|
"interval": 3.0, |
||||
|
"error_dict":{ |
||||
|
"gpu": "some error", |
||||
|
"cpu": "some error" |
||||
|
}, |
||||
|
"gpu_list":[ |
||||
|
{ |
||||
|
"idx": 0, |
||||
|
"name": "RTX 3090", |
||||
|
"temperature": 100, |
||||
|
"used_memory": 1000, |
||||
|
"total_memory": 10240, |
||||
|
"utilization": 34, |
||||
|
"process_list":[ |
||||
|
{ |
||||
|
"user": "lxb", |
||||
|
"memory": 100, |
||||
|
"cmd": "python run.py" |
||||
|
} |
||||
|
] |
||||
|
} |
||||
|
], |
||||
|
"cpu": |
||||
|
{ |
||||
|
"name": "i5 6500", |
||||
|
"temperature_list": [50, 30], |
||||
|
"core_avg_occupy": 31.25, |
||||
|
"core_occupy_list":[ |
||||
|
12, |
||||
|
23, |
||||
|
0, |
||||
|
90 |
||||
|
] |
||||
|
}, |
||||
|
"storage_list":[ |
||||
|
{ |
||||
|
"path": "/media/F", |
||||
|
"available": 211108624, |
||||
|
"total": 5813178480 |
||||
|
} |
||||
|
], |
||||
|
"memory":{ |
||||
|
"total": 1935468, |
||||
|
"used": 1382196 |
||||
|
}, |
||||
|
"network_list":[ |
||||
|
{ |
||||
|
"name": "eth0", |
||||
|
"default": true, |
||||
|
"in": 67.8, |
||||
|
"out": 12.3 |
||||
|
} |
||||
|
] |
||||
|
} |
||||
|
} |
||||
|
} |
@ -1,28 +0,0 @@ |
|||||
<!DOCTYPE html> |
|
||||
<html lang="en"> |
|
||||
<head> |
|
||||
<meta charset="UTF-8"> |
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0"> |
|
||||
<title>Fetch JSON from Flask</title> |
|
||||
</head> |
|
||||
<body> |
|
||||
<h1>Fetch JSON from Flask Example</h1> |
|
||||
<button onclick="fetchData()">Fetch Data</button> |
|
||||
<div id="output"></div> |
|
||||
|
|
||||
<script> |
|
||||
function fetchData() { |
|
||||
fetch('http://lxblxb.top:15002/data') // send a GET request to the '/data' route of the Flask server
|
||||
.then(response => response.json()) // parse the JSON response
|
||||
.then(data => { |
|
||||
// Handle the JSON data
|
||||
console.log(data); |
|
||||
document.getElementById('output').innerHTML = '<pre>' + JSON.stringify(data, null, 2) + '</pre>'; |
|
||||
}) |
|
||||
.catch(error => { |
|
||||
console.error('Error fetching data:', error); |
|
||||
}); |
|
||||
} |
|
||||
</script> |
|
||||
</body> |
|
||||
</html> |
|
Before Width: | Height: | Size: 754 KiB After Width: | Height: | Size: 754 KiB |
@ -0,0 +1,85 @@ |
|||||
|
from flask import Flask, jsonify, request |
||||
|
from flask_cors import CORS |
||||
|
from version import version |
||||
|
from active_connector import Connector |
||||
|
import json |
||||
|
import argparse |
||||
|
import threading |
||||
|
|
||||
|
#region Globals |
||||
|
|
||||
|
app = Flask(__name__) |
||||
|
CORS(app) |
||||
|
server_cfg = None |
||||
|
data_dict = dict() |
||||
|
# Thread lock guarding data_dict |
||||
|
data_lock = threading.Lock() |
||||
|
|
||||
|
parser = argparse.ArgumentParser() |
||||
|
parser.add_argument('--cfg', default='server_config.json', type=str, help='the path of config json.') |
||||
|
args = parser.parse_args() |
||||
|
# Load the config file |
||||
|
cfg_path = args.cfg |
||||
|
with open(cfg_path, 'r') as f: |
||||
|
server_cfg = json.load(f) |
||||
|
api_name = server_cfg['api_name'] |
||||
|
|
||||
|
#endregion |
||||
|
|
||||
|
#region API endpoints |
||||
|
|
||||
|
# For testing |
||||
|
@app.route(f'/{api_name}') |
||||
|
def hello(): |
||||
|
return 'hi. —— CheckGPUsWeb' |
||||
|
|
||||
|
@app.route(f'/{api_name}/get_data', methods=['GET']) |
||||
|
def get_data(): |
||||
|
with data_lock: |
||||
|
return jsonify(data_dict) |
||||
|
|
||||
|
@app.route(f'/{api_name}/update_data', methods=['POST']) |
||||
|
def receive_data(): |
||||
|
data = request.json |
||||
|
# Update the record if a server with this title is configured |
||||
|
if data['title'] in server_cfg['server_list']: |
||||
|
with data_lock: |
||||
|
data_dict['server_dict'][data['title']] = data |
||||
|
# Merge the server-side note with the client-side note |
||||
|
if data['title'] in server_cfg['note_dict']: |
||||
|
client_note = data.get('note', '')  # the client may omit 'note'; avoid a KeyError |
||||
|
server_note = server_cfg['note_dict'][data['title']] |
||||
|
note = server_note if client_note == '' \ |
||||
|
else server_note + '\n' + client_note |
||||
|
data_dict['server_dict'][data['title']]['note'] = note |
||||
|
return jsonify({"status": "success"}) |
||||
|
|
||||
|
#endregion |
||||
|
|
||||
|
def init(): |
||||
|
with data_lock: |
||||
|
data_dict['server_dict'] = dict() |
||||
|
data_dict['version'] = version |
||||
|
for server_name in server_cfg['server_list']: |
||||
|
if server_name in server_cfg['note_dict']: |
||||
|
data_dict['server_dict'][server_name] = dict() |
||||
|
data_dict['server_dict'][server_name]['note'] = server_cfg['note_dict'][server_name] |
||||
|
else: |
||||
|
data_dict['server_dict'][server_name] = None |
||||
|
|
||||
|
def main(): |
||||
|
init() |
||||
|
|
||||
|
# Active connections |
||||
|
if 'connect_server' in server_cfg and len(server_cfg['connect_server']) > 0: |
||||
|
connector = Connector(data_dict['server_dict'], server_cfg['connect_server'], server_cfg['note_dict'], data_lock, server_cfg['connect_check_interval'], server_cfg['reconnect_interval']) |
||||
|
connector.run() |
||||
|
print('Active connection enabled for: ' + ', '.join([s['title'] for s in server_cfg['connect_server']])) |
||||
|
else: |
||||
|
print('No servers configured for active connection') |
||||
|
|
||||
|
# flask |
||||
|
app.run(debug=False, host=server_cfg['host'], port=server_cfg['port']) |
||||
|
|
||||
|
if __name__ == '__main__': |
||||
|
main() |
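The note-merging rule inside `receive_data` can be isolated as a pure function, which makes it easier to test without running Flask. This is a sketch only: `merge_note` and `apply_note` are hypothetical helper names that do not exist in the repository, and treating a missing `note` key as an empty string is an assumption.

```python
def merge_note(server_note: str, client_note: str) -> str:
    """Combine the configured note with the note sent by the client.

    Mirrors the rule in receive_data: the server-side note comes first,
    and the client note is appended on a new line when it is non-empty.
    """
    if client_note == "":
        return server_note
    return server_note + "\n" + client_note


def apply_note(record: dict, note_dict: dict) -> dict:
    """Return a copy of `record` with the merged note filled in."""
    merged = dict(record)
    title = record["title"]
    if title in note_dict:
        # Assumption: a record without a 'note' key behaves like an empty note.
        merged["note"] = merge_note(note_dict[title], record.get("note", ""))
    return merged


notes = {"76": "test1"}
print(apply_note({"title": "76", "note": "GPU 0 reserved"}, notes))
print(apply_note({"title": "76"}, notes))
```

Keeping the rule in one place also means the passive (`update_data`) and active (`Connector`) paths could share it, if the connector applies the same merge.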
@ -0,0 +1,27 @@ |
|||||
|
{ |
||||
|
"host": "0.0.0.0", |
||||
|
"port": 15002, |
||||
|
"server_list":["76", "174", "233", "222"], |
||||
|
"note_dict":{ |
||||
|
"76" : "test1", |
||||
|
"SERVER_76" : "test2" |
||||
|
}, |
||||
|
"api_name": "api", |
||||
|
|
||||
|
"reconnect_interval" : 10, |
||||
|
"connect_check_interval" : 3, |
||||
|
"connect_server" : [ |
||||
|
{ |
||||
|
"title": "SERVER_76", |
||||
|
"ip": "lxblxb.top", |
||||
|
"port": 66666, |
||||
|
"username": "lxb", |
||||
|
"key_filename": "/home/lxb/.ssh/id_rsa", |
||||
|
"network_interface_name": "eno2", |
||||
|
"storage_list": [ |
||||
|
"/media/D", |
||||
|
"/media/F" |
||||
|
] |
||||
|
} |
||||
|
] |
||||
|
} |
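A config like the one above can be validated before the server starts. The sketch below assumes the key names shown in the sample and the keys that the server script reads unconditionally (`host`, `port`, `server_list`, `note_dict`, `api_name`); `validate_config` is a hypothetical helper, not part of the repository.

```python
REQUIRED_KEYS = ("host", "port", "server_list", "note_dict", "api_name")


def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    for key in REQUIRED_KEYS:
        if key not in cfg:
            problems.append(f"missing key: {key}")

    def check_port(label, port):
        # Valid TCP ports are integers in the range 1-65535.
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append(f"{label}: port {port!r} is outside 1-65535")

    if "port" in cfg:
        check_port("server", cfg["port"])

    connect_server = cfg.get("connect_server", [])
    for entry in connect_server:
        check_port(entry.get("title", "<untitled>"), entry.get("port"))

    # The interval keys are only read when active connections are configured.
    if connect_server:
        for key in ("connect_check_interval", "reconnect_interval"):
            if key not in cfg:
                problems.append(f"missing key: {key}")

    return problems


# Trimmed copy of the sample config above.
sample_cfg = {
    "host": "0.0.0.0",
    "port": 15002,
    "server_list": ["76", "174", "233", "222"],
    "note_dict": {"76": "test1", "SERVER_76": "test2"},
    "api_name": "api",
    "reconnect_interval": 10,
    "connect_check_interval": 3,
    "connect_server": [{"title": "SERVER_76", "ip": "lxblxb.top", "port": 66666}],
}

problems = validate_config(sample_cfg)
print(problems)
```

Run against the sample, this reports that the `connect_server` port 66666 exceeds the maximum TCP port of 65535. The source does not state the intended value.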
@ -0,0 +1 @@ |
|||||
|
version = "0.2.2.20250626_beta" |
@ -0,0 +1,103 @@ |
|||||
|
#header-container { |
||||
|
font-size: 32px; |
||||
|
font-weight: bold; |
||||
|
padding-left: 20px; |
||||
|
padding-top: 10px; |
||||
|
/* padding-bottom: 5px; */ |
||||
|
} |
||||
|
|
||||
|
/* Set the height of html and body to 100% */ |
||||
|
html, body { |
||||
|
height: 100%; |
||||
|
margin: 0; |
||||
|
padding: 0; |
||||
|
} |
||||
|
|
||||
|
/* Use a flexbox layout */ |
||||
|
body { |
||||
|
display: flex; |
||||
|
flex-direction: column; |
||||
|
} |
||||
|
|
||||
|
/* Main content area; flex-grow: 1 makes it take up the remaining space */ |
||||
|
.content { |
||||
|
flex-grow: 1; |
||||
|
/* padding: 20px; */ |
||||
|
} |
||||
|
|
||||
|
|
||||
|
/* Footer styles */ |
||||
|
footer { |
||||
|
background-color: #f1f1f1; |
||||
|
color: rgb(172, 172, 172); |
||||
|
text-align: center; |
||||
|
padding: 10px 0; |
||||
|
} |
||||
|
|
||||
|
.card { |
||||
|
padding: 5px 10px; |
||||
|
margin: 5px; |
||||
|
border-radius: 8px; |
||||
|
box-shadow: 0px 1px 10px rgba(0, 0, 0, 0.3); |
||||
|
width: 300px; |
||||
|
display: inline-block; |
||||
|
vertical-align: top; |
||||
|
margin: 12px; |
||||
|
} |
||||
|
|
||||
|
.note-info { |
||||
|
border-style: solid; |
||||
|
border-width: 4px; |
||||
|
border-color: #a10000; |
||||
|
border-radius: 8px; |
||||
|
padding: 6px 10px; |
||||
|
margin-top: 4px; |
||||
|
margin-bottom: 6px; |
||||
|
} |
||||
|
|
||||
|
.server-name { |
||||
|
background-color: rgb(0, 0, 0); |
||||
|
color: white; |
||||
|
border-radius: 8px; |
||||
|
padding: 6px 10px; |
||||
|
font-size: 26px; |
||||
|
margin-top: 4px; |
||||
|
margin-bottom: 6px; |
||||
|
} |
||||
|
|
||||
|
.gpu-info { |
||||
|
background-color: aqua; |
||||
|
border-style: solid; |
||||
|
border-width: 1px; |
||||
|
border-color: #ccc; |
||||
|
border-radius: 8px; |
||||
|
margin-top: 5px; |
||||
|
padding: 4px 8px; |
||||
|
margin-bottom: 12px; |
||||
|
background-color: #f9f9f9; |
||||
|
box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1); |
||||
|
} |
||||
|
|
||||
|
.process-item { |
||||
|
color: rgb(26, 92, 247); |
||||
|
font-weight: bold; |
||||
|
} |
||||
|
|
||||
|
/* Occupancy states */ |
||||
|
.state-free { |
||||
|
color: green; |
||||
|
} |
||||
|
.state-occupy { |
||||
|
color: red; |
||||
|
} |
||||
|
.state-light-occupy { |
||||
|
color: orange; |
||||
|
} |
||||
|
.state-super-occupy { |
||||
|
color: rgb(255, 0, 0); |
||||
|
font-weight: bold; |
||||
|
font-size: 20px; |
||||
|
background-color: rgb(255, 255, 0); |
||||
|
border-radius: 8px; |
||||
|
padding: 1px 6px; |
||||
|
} |
@ -0,0 +1,27 @@ |
|||||
|
<!DOCTYPE html> |
||||
|
<html lang="en"> |
||||
|
|
||||
|
<head> |
||||
|
<meta charset="UTF-8"> |
||||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0"> |
||||
|
<title>Server Info</title> |
||||
|
<link rel="stylesheet" href="./css/style_1.css"> |
||||
|
</head> |
||||
|
|
||||
|
<body> |
||||
|
<div class="content"> |
||||
|
<div id="header-container"> |
||||
|
Server Info |
||||
|
</div> |
||||
|
<div id="server-data"> |
||||
|
</div> |
||||
|
</div> |
||||
|
|
||||
|
<footer> |
||||
|
<p>The project source is <a href="http://git.lxblxb.top/lxb/Tool_CheckGPUsWeb/src/branch/v2">here</a>; for questions, contact <span title="xiongbin_lin@163.com">lxb</span>.</p> |
||||
|
</footer> |
||||
|
|
||||
|
<script src="./js/script.js"></script> |
||||
|
</body> |
||||
|
|
||||
|
</html> |
@ -0,0 +1,291 @@ |
|||||
|
// Determine whether the page is served from the internal network or the public internet
|
||||
|
let apiURL = ''; |
||||
|
fetch('/index.html') // Request any resource, just to read the response headers
|
||||
|
.then(response => { |
||||
|
// Read the X-Environment response header
|
||||
|
const environment = response.headers.get('X-Environment'); |
||||
|
|
||||
|
// Choose the API URL based on the environment
|
||||
|
if (environment === 'internal') { |
||||
|
apiURL = 'http://10.1.16.174:15001'; |
||||
|
} else { |
||||
|
apiURL = 'http://gpus.lxblxb.top'; |
||||
|
} |
||||
|
|
||||
|
console.log('API URL: ' + apiURL); |
||||
|
}) |
||||
|
|
||||
|
// Request data from the server
|
||||
|
function fetchData() { |
||||
|
fetch(apiURL + '/api/get_data') |
||||
|
// Fetch server and GPU data
|
||||
|
.then(response => response.json()) // Parse the JSON response
|
||||
|
.then(data => { |
||||
|
// Process the JSON data
|
||||
|
// console.log(data);
|
||||
|
displayServerData(data); // Render the data
|
||||
|
}) |
||||
|
.catch(error => { |
||||
|
// console.error('Error fetching data:', error);
|
||||
|
displayError(error + " (most likely the server could not be reached: it may be down, or there may be a network error)"); |
||||
|
}); |
||||
|
} |
||||
|
|
||||
|
// Show an error
|
||||
|
function displayError(err_info){ |
||||
|
let serverDataContainer = document.getElementById('server-data'); |
||||
|
serverDataContainer.innerHTML = ''; // Clear the container
|
||||
|
|
||||
|
let errDiv = document.createElement('div'); |
||||
|
errDiv.classList.add('error-info'); |
||||
|
errDiv.innerText = err_info; |
||||
|
serverDataContainer.appendChild(errDiv); |
||||
|
} |
||||
|
|
||||
|
// Convert a value in KB to a human-readable unit
|
||||
|
function parse_data_unit(num, fixedLen=2){ |
||||
|
if (num < 1024){ |
||||
|
return num.toFixed(fixedLen) + " KB"; |
||||
|
} |
||||
|
|
||||
|
num /= 1024; |
||||
|
if (num < 1024){ |
||||
|
return num.toFixed(fixedLen) + " MB"; |
||||
|
} |
||||
|
|
||||
|
num /= 1024; |
||||
|
if (num < 1024){ |
||||
|
return num.toFixed(fixedLen) + " GB"; |
||||
|
} |
||||
|
|
||||
|
num /= 1024; |
||||
|
// TB is the largest unit, so return unconditionally; values of 1024 TB or more previously fell through and returned undefined
|
||||
|
return num.toFixed(fixedLen) + " TB"; |
||||
|
} |
||||
|
|
||||
|
|
||||
|
function add_bar(serverCard){ |
||||
|
let bar = document.createElement('hr'); |
||||
|
serverCard.appendChild(bar); |
||||
|
} |
||||
|
|
||||
|
// Build the elements that show each server's information
|
||||
|
function displayServerData(data){ |
||||
|
let serverDataContainer = document.getElementById('server-data'); |
||||
|
serverDataContainer.innerHTML = ''; // Clear the container
|
||||
|
|
||||
|
let serverDataDict = data['server_dict']; |
||||
|
|
||||
|
// Create a card for each server
|
||||
|
for (let serverTitle in serverDataDict){ |
||||
|
let serverCard = document.createElement('div'); |
||||
|
serverCard.className = 'card'; |
||||
|
|
||||
|
// Title
|
||||
|
let serverName = document.createElement('div'); |
||||
|
serverName.className = 'server-name'; |
||||
|
serverName.textContent = serverTitle; |
||||
|
serverCard.appendChild(serverName); |
||||
|
|
||||
|
let serverData = serverDataDict[serverTitle]; |
||||
|
// If there is no data, show a placeholder and skip this server
|
||||
|
if (serverData == null){ |
||||
|
let errText = document.createElement('div'); |
||||
|
errText.className = 'error-text'; |
||||
|
errText.textContent = "No data."; |
||||
|
serverCard.appendChild(errText); |
||||
|
serverDataContainer.appendChild(serverCard); |
||||
|
continue; |
||||
|
} |
||||
|
|
||||
|
// Add the notice
|
||||
|
if ('note' in serverData && serverData['note'] != ''){ |
||||
|
let noteInfo = document.createElement('div'); |
||||
|
noteInfo.className = 'note-info'; |
||||
|
|
||||
|
noteInfo.innerHTML = '<div style="text-align: center;"><strong>Notice</strong></div>' + serverData['note']; |
||||
|
|
||||
|
serverCard.appendChild(noteInfo); |
||||
|
} |
||||
|
|
||||
|
// Check the time elapsed since the last update
|
||||
|
let lastTime = new Date(serverData['update_time_stamp'] * 1000); |
||||
|
let timeFromUpdate = Date.now() - lastTime; |
||||
|
if (timeFromUpdate > serverData['interval'] * 1000 * 4){ |
||||
|
let errText = document.createElement('div'); |
||||
|
errText.className = 'error-text'; |
||||
|
errText.textContent = "No update for a long time; last updated: " + lastTime.toLocaleString(); |
||||
|
serverCard.appendChild(errText); |
||||
|
serverDataContainer.appendChild(serverCard); |
||||
|
continue; |
||||
|
}else if (timeFromUpdate > serverData['interval'] * 1000 * 2.5){ |
||||
|
serverName.textContent = serverTitle + " - Not update -"; |
||||
|
} |
||||
|
|
||||
|
// Network speed
|
||||
|
if ('network_list' in serverData){ |
||||
|
let networkInfo = document.createElement('div'); |
||||
|
networkInfo.className = 'network-info'; |
||||
|
|
||||
|
// TODO: for now, show the sum over all network interfaces
|
||||
|
let inSum = 0; |
||||
|
let outSum = 0; |
||||
|
let tmpTitle = ""; |
||||
|
serverData.network_list.forEach(function(network){ |
||||
|
inSum += network['in']; |
||||
|
outSum += network['out']; |
||||
|
tmpTitle += network['name'] + " in: " + parse_data_unit(network['in']) + "/s out: " + parse_data_unit(network['out']) + "/s\n"; |
||||
|
}); |
||||
|
|
||||
|
let inStr = parse_data_unit(inSum); |
||||
|
let outStr = parse_data_unit(outSum); |
||||
|
|
||||
|
networkInfo.innerHTML += "<strong> Network : </strong> <span title=\"" + tmpTitle + "\">in:" + inStr + "/s out:" + outStr + "/s</span><br>"; |
||||
|
|
||||
|
serverCard.appendChild(networkInfo); |
||||
|
// Divider
|
||||
|
add_bar(serverCard); |
||||
|
} |
||||
|
|
||||
|
// CPU
|
||||
|
if ('cpu' in serverData){ |
||||
|
let cpuInfo = document.createElement('div'); |
||||
|
cpuInfo.className = 'cpu-info'; |
||||
|
|
||||
|
let temperature_list_str = ""; |
||||
|
serverData.cpu['temperature_list'].forEach(function(v){ |
||||
|
temperature_list_str += v + " ℃ "; |
||||
|
}); |
||||
|
cpuInfo.innerHTML = "<strong>" + serverData.cpu['name'] + "</strong><br>" + |
||||
|
"<strong>Temperature : </strong>" + temperature_list_str + "<br>" + |
||||
|
"<strong>Usage : </strong><span title=\"" + serverData.cpu['core_occupy_list'] + "\">" + serverData.cpu['core_avg_occupy'] + "%</span>"; |
||||
|
|
||||
|
serverCard.appendChild(cpuInfo); |
||||
|
// Divider
|
||||
|
add_bar(serverCard); |
||||
|
} |
||||
|
|
||||
|
// Memory
|
||||
|
if ('memory' in serverData){ |
||||
|
let memoryInfo = document.createElement('div'); |
||||
|
memoryInfo.className = 'memory-info'; |
||||
|
let totalNum = serverData.memory.total; |
||||
|
let usedNum = serverData.memory.used; |
||||
|
let totalMem = parse_data_unit(totalNum); |
||||
|
let usedMem = parse_data_unit(usedNum); |
||||
|
let tmpClass = "state-free"; |
||||
|
if (usedNum / totalNum > 0.95) |
||||
|
tmpClass = "state-super-occupy"; |
||||
|
else if (usedNum / totalNum > 0.8) |
||||
|
tmpClass = "state-occupy"; |
||||
|
else if (usedNum / totalNum > 0.6) |
||||
|
tmpClass = "state-light-occupy"; |
||||
|
memoryInfo.innerHTML += "<strong> Memory : </strong> <span class=\"" + tmpClass + "\">" + usedMem + " / " + totalMem + "</span><br>"; |
||||
|
serverCard.appendChild(memoryInfo); |
||||
|
// Divider
|
||||
|
add_bar(serverCard); |
||||
|
} |
||||
|
|
||||
|
// Storage
|
||||
|
if ('storage_list' in serverData){ |
||||
|
let storageInfo = document.createElement('div'); |
||||
|
storageInfo.className = 'storage-info'; |
||||
|
|
||||
|
for (let i = 0; i < serverData.storage_list.length; i++) { |
||||
|
let targetPath = serverData.storage_list[i].path; |
||||
|
let totalNum = serverData.storage_list[i].total; |
||||
|
let availableNum = serverData.storage_list[i].available; |
||||
|
let totalStorage = parse_data_unit(totalNum); |
||||
|
let availableStorage = parse_data_unit(totalNum - availableNum); // despite the name, this holds the used space (total - available) |
||||
|
let tmpClass = "state-free"; |
||||
|
if (availableNum / totalNum < 0.01) |
||||
|
tmpClass = "state-super-occupy"; |
||||
|
else if (availableNum / totalNum < 0.1) |
||||
|
tmpClass = "state-occupy"; |
||||
|
else if (availableNum / totalNum < 0.3) |
||||
|
tmpClass = "state-light-occupy"; |
||||
|
storageInfo.innerHTML += '<strong>' + targetPath + " :</strong> <span title=\"Available: " + parse_data_unit(availableNum) + "\" class=\"" + tmpClass |
||||
|
+ "\">" + availableStorage + " / " + totalStorage + "</span><br>"; |
||||
|
} |
||||
|
|
||||
|
serverCard.appendChild(storageInfo); |
||||
|
// Divider
|
||||
|
add_bar(serverCard); |
||||
|
} |
||||
|
|
||||
|
// gpu
|
||||
|
if ('gpu_list' in serverData){ |
||||
|
serverData.gpu_list.forEach(function(gpu){ |
||||
|
let gpuInfo = document.createElement('div'); |
||||
|
gpuInfo.className = 'gpu-info'; |
||||
|
|
||||
|
let markFree = '<span class="state-free"> Free</span>'; |
||||
|
let markLightOccupy = '<span class="state-light-occupy"> In use</span>'; |
||||
|
let markOccupy = '<span class="state-occupy"> In use</span>'; |
||||
|
let tmpMark = markFree; |
||||
|
let memory_used_ratio = gpu.used_memory / gpu.total_memory; |
||||
|
if (memory_used_ratio > 0.25 && gpu.utilization > 50){ |
||||
|
tmpMark = markOccupy; |
||||
|
} |
||||
|
else if (memory_used_ratio > 0.25 || gpu.utilization > 50){ |
||||
|
tmpMark = markLightOccupy; |
||||
|
}else{ |
||||
|
tmpMark = markFree; |
||||
|
} |
||||
|
gpuInfo.innerHTML = '<strong>' + gpu.idx + ' - ' + gpu.name + tmpMark + '</strong><br>' |
||||
|
+ 'Temperature: ' + gpu.temperature + '°C<br>' |
||||
|
+ 'GPU memory: ' + gpu.used_memory + ' / ' + gpu.total_memory + " MB" + '<br>' |
||||
|
+ 'Utilization: ' + gpu.utilization + '%'; |
||||
|
|
||||
|
// Add process information
|
||||
|
let processInfo = document.createElement('div'); |
||||
|
processInfo.classList.add('process-info'); |
||||
|
processInfo.innerHTML = "Usage: "; |
||||
|
|
||||
|
gpu.process_list.sort((a, b) => b.memory - a.memory); |
||||
|
gpu.process_list.forEach(function(item, index){ |
||||
|
if (item.memory > 40) |
||||
|
processInfo.innerHTML += `<span class="process-item" title="${item.cmd}">${item.user} (${item.memory}) </span>`; |
||||
|
}); |
||||
|
gpuInfo.appendChild(processInfo); // Append the process information to the GPU block
|
||||
|
|
||||
|
serverCard.appendChild(gpuInfo); |
||||
|
}); |
||||
|
// Divider
|
||||
|
add_bar(serverCard); |
||||
|
} |
||||
|
|
||||
|
// Error messages
|
||||
|
if ('error_dict' in serverData){ |
||||
|
let errorInfo = document.createElement('div'); |
||||
|
errorInfo.className = 'error-info'; |
||||
|
|
||||
|
if (Object.keys(serverData.error_dict).length > 0){ |
||||
|
for (let k in serverData.error_dict){ |
||||
|
errorInfo.innerHTML += '<strong>' + k + " :</strong>" + serverData.error_dict[k] + "<br>"; |
||||
|
} |
||||
|
|
||||
|
serverCard.appendChild(errorInfo); |
||||
|
// Divider
|
||||
|
add_bar(serverCard); |
||||
|
} |
||||
|
} |
||||
|
|
||||
|
// Remove the trailing divider
|
||||
|
if (serverCard.lastElementChild && serverCard.lastElementChild.tagName === 'HR') { |
||||
|
serverCard.removeChild(serverCard.lastElementChild); |
||||
|
} |
||||
|
|
||||
|
// Append this server's card to the container
|
||||
|
serverDataContainer.appendChild(serverCard); |
||||
|
} |
||||
|
} |
||||
|
|
||||
|
// TODO test
|
||||
|
// fetchData()
|
||||
|
// Fetch data when the page loads, then refresh periodically
|
||||
|
document.addEventListener('DOMContentLoaded', function() { |
||||
|
fetchData(); |
||||
|
setInterval(fetchData, 4000); // Refresh the data every 4 seconds
|
||||
|
}); |
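The unit conversion done by `parse_data_unit` in the script above can be mirrored in Python, for example to format the same figures in server-side logs or tests. This is a sketch under assumptions: the name `format_kb` is hypothetical, the input is taken to be in KB as the original comment states, and values of 1024 TB or more are reported in TB rather than left undefined.

```python
def format_kb(value_kb: float, digits: int = 2) -> str:
    """Format a size given in KB using the largest unit below 1024."""
    units = ("KB", "MB", "GB", "TB")
    value = float(value_kb)
    # Step up through the units until the value drops below 1024.
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.{digits}f} {unit}"
        value /= 1024
    # TB is the largest unit: always return, however large the value is.
    return f"{value:.{digits}f} {units[-1]}"


for size in (512, 1935468, 5813178480):
    print(size, "->", format_kb(size))
```

Looping over a unit tuple avoids the repeated `if` blocks and guarantees a return value on every path.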