
[Bug/Help] portal: In a multi-node cluster where only one node has GPU resources, the web UI reports the wrong GPU count #1495

Description

@397325475

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

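What happened

In a 7-node cluster, only one node has GPUs (4 cards), but the web UI reports 28 GPUs. When the GPU node is moved into a partition of its own, the GPU count is reported correctly, but when creating an interactive job only GPUs can be selected; CPU resources cannot be set.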
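
The report does not identify the cause, but the numbers fit one possible explanation: 28 = 4 × 7, which is what you would get if the GPU count of one node were applied to every node in the partition instead of summing what each node actually has. The sketch below only illustrates that arithmetic. The node names and both function names are hypothetical, and nothing here is taken from SCOW or slurm-adapter code.

```python
# Hypothetical per-node GPU counts for the cluster described in this issue:
# seven nodes, only one of which has 4 GPUs. Node names are made up.
gpus_per_node = {
    "node1": 4,
    "node2": 0,
    "node3": 0,
    "node4": 0,
    "node5": 0,
    "node6": 0,
    "node7": 0,
}


def total_gpus_summed(nodes):
    """Sum the GPUs actually present on each node."""
    return sum(nodes.values())


def total_gpus_multiplied(nodes):
    """Assumed faulty aggregation: treat every node as if it had the
    largest per-node GPU count found in the partition."""
    return max(nodes.values(), default=0) * len(nodes)


print(total_gpus_summed(gpus_per_node))      # 4
print(total_gpus_multiplied(gpus_per_node))  # 28

# With the GPU node alone in its own partition, both methods agree,
# which matches the behaviour described above.
gpu_only_partition = {"node1": 4}
print(total_gpus_summed(gpu_only_partition))      # 4
print(total_gpus_multiplied(gpu_only_partition))  # 4
```

If this is indeed the cause, it would also explain why the count is correct only when every node in a partition has the same GPU configuration.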

What did you expect to happen

With all nodes in a single partition, the system should report the correct number of GPUs.

Did this work before?

I have not tried this setup before; this is the first time I have run into the problem.

Steps To Reproduce

1. Set up a Slurm cluster in which only one node has GPUs.
2. Log in to the web UI; the home page shows an incorrect GPU count.

Environment

- OS: CentOS 7.9
- Scheduler: Slurm 22.05.8
- Docker: 26.1.4
- Docker-compose: v2.7.0
- SCOW CLI: 1.6.4
- SCOW: v1.6.4
- Adapter: slurm-adapter v1.6

Anything else?

No response


Labels: bug (Something isn't working)
