update: quick-start and apps usage
Cloudac7 committed Oct 31, 2024
1 parent 3a3fbb5 commit 636784c
Showing 30 changed files with 835 additions and 146 deletions.
Binary file removed docs/_images/app1.png
Binary file not shown.
Binary file removed docs/_images/app10.png
Binary file not shown.
Binary file removed docs/_images/app11.png
Binary file not shown.
Binary file removed docs/_images/app12.png
Binary file not shown.
Binary file removed docs/_images/app13.png
Binary file not shown.
Binary file removed docs/_images/app14.png
Binary file not shown.
Binary file removed docs/_images/app15.png
Binary file not shown.
Binary file removed docs/_images/app2.png
Binary file not shown.
Binary file removed docs/_images/app3.png
Binary file not shown.
Binary file removed docs/_images/app4.png
Binary file not shown.
Binary file removed docs/_images/app5.png
Binary file not shown.
Binary file removed docs/_images/app6.png
Binary file not shown.
Binary file removed docs/_images/app9.png
Binary file not shown.
21 changes: 10 additions & 11 deletions docs/index.md
@@ -7,20 +7,19 @@

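Owing to limits on our expertise and time, errors and oversights are inevitable. Corrections and suggestions for improvement are welcome, and we will do our best to improve the documentation.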

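## Contents

This documentation covers the following main topics:

1. [Platform overview](introduction/index.md): introduces the platform's resources
2. [Major update announcements](introduction/updates.md): collects announcements of major updates to the computing center
3. [Account registration](introduction/register.md): describes the account registration process
4. [Login and file transfer](usage/login.md): how to log in to the cluster and transfer files
5. [Partition (queue) management](usage/partition.md): partition and queue settings and pricing
6. [SCOW computing platform](usage/scow.md): how to use the SCOW platform
7. [Slurm job scheduler](slurm/index.md): a detailed guide to Slurm's features and usage
8. [Applications and scripts](./usage/app.md): how to use the applications on the platform and write scripts
9. [Notes](./information/notes.md): things to keep in mind when using the platform
10. [Troubleshooting](./information/troubleshooting.md): a procedure for diagnosing problems users may encounter
11. [FAQ](./information/faq.md): a summary of frequently encountered problems
3. [Quick start](usage/quick-start.md): a quick-start walkthrough for users
4. [Account registration](introduction/register.md): describes the account registration process
5. [Login and file transfer](usage/login.md): how to log in to the cluster and transfer files
6. [Partition (queue) management](usage/partition.md): partition and queue settings and pricing
7. [SCOW computing platform](usage/scow.md): how to use the SCOW platform
8. [Slurm job scheduler](slurm/index.md): a detailed guide to Slurm's features and usage
9. [Applications and scripts](./usage/apps/index.md): how to use the applications on the platform, with example job scripts
10. [Notes](./information/notes.md): things to keep in mind when using the platform
11. [Troubleshooting](./information/troubleshooting.md): a procedure for diagnosing problems users may encounter
12. [FAQ](./information/faq.md): a summary of frequently encountered problems

We hope this table of contents helps you find what you need quickly and makes the platform easier to use.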
6 changes: 3 additions & 3 deletions docs/introduction/register.md
@@ -7,7 +7,7 @@

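The Jiageng Intelligent Computing Center (嘉庚智算中心) lets internal users open a "user group" account for each research group or project group, and multiple "user" accounts can be opened under one "user group" account. The "user" accounts under a "user group" account share the funds deposited in it.

To open a "user group" account, or to add a "user" account to an existing "user group" account, internal users must send scanned copies of the following two documents from a Xiamen University email address to [hpc@xmu.edu.cn](mailto:hpc@xmu.edu.cn), with a copy (CC) to the person responsible for funding:
To open a "user group" account, or to add a "user" account to an existing "user group" account, internal users must send scanned copies of the following two documents from a Xiamen University email address to [ikkemhpc@xmu.edu.cn](mailto:ikkemhpc@xmu.edu.cn), with a copy (CC) to the person responsible for funding:

1. **Account application form**: Fill in the required information first. The user then signs in the "Applicant" (申请人) field, and the person responsible for funding signs in the "Head of unit/project group" (单位/项目组负责人) field.
2. **User commitment letter**: The user of each newly opened "user" account signs in the "Undersigned" (承诺人) field.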
@@ -34,7 +34,7 @@ sbatch: error: Job submit/allocate failed: Invalid account or account/partition

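### Renewal

Users who need to renew can email [hpc@xmu.edu.cn](mailto:hpc@xmu.edu.cn) or let us know in the WeChat service group; the computing center will then contact you and help complete the renewal.
Users who need to renew can email [ikkemhpc@xmu.edu.cn](mailto:ikkemhpc@xmu.edu.cn) or let us know in the WeChat service group; the computing center will then contact you and help complete the renewal.

## Initial password and reset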

@@ -47,7 +47,7 @@ sbatch: error: Job submit/allocate failed: Invalid account or account/partition

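## Account closure

Users can request closure of a user account by writing to [hpc@xmu.edu.cn](mailto:hpc@xmu.edu.cn) from the email address used in the application. We recommend backing up your data before sending the request, to avoid losses caused by the account closure.
Users can request closure of a user account by writing to [ikkemhpc@xmu.edu.cn](mailto:ikkemhpc@xmu.edu.cn) from the email address used in the application. We recommend backing up your data before sending the request, to avoid losses caused by the account closure.

!!! warning "Special reminder"
    To use resources more efficiently, the Jiageng Intelligent Computing Center closes accounts that have gone unused for a year.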
128 changes: 0 additions & 128 deletions docs/usage/app.md

This file was deleted.

70 changes: 70 additions & 0 deletions docs/usage/apps/abaqus.md
@@ -0,0 +1,70 @@
# Abaqus

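Abaqus is a finite element analysis package used for structural and field analysis in mechanical, civil, electronic, and other engineering fields.

!!! info
    Abaqus is commercially licensed software; you must apply for access before using it.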

```bash title="/public/slurmscript_demo/abaqus.slurm"
#!/bin/bash
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks-per-node=8 # number of cores used on each node
#SBATCH --error=%j.err
#SBATCH --output=%j.out
#SBATCH --account=[budget] # Account name
#SBATCH --partition=cpu # Partition name
#SBATCH --qos=[qos] # QOS name

CURDIR=`pwd`
rm -rf $CURDIR/nodelist.$SLURM_JOB_ID
NODES=`scontrol show hostnames $SLURM_JOB_NODELIST`
for i in $NODES
do
echo "$i:$SLURM_NTASKS_PER_NODE" >> $CURDIR/nodelist.$SLURM_JOB_ID
done
echo $SLURM_NPROCS

echo "process will start at : "
date
echo "++++++++++++++++++++++++++++++++++++++++"

##setting environment for abaqus-2019
export PATH=/public/software/abaqus/abaqus-2019/DassaultSystemes/SIMULIA/Commands/:$PATH

cd $CURDIR
rm -rf *.lck*
rm -rf $CURDIR/nodefile
np=$SLURM_NPROCS
nu=$SLURM_NNODES
cpuspernode=$SLURM_NTASKS_PER_NODE
echo $cpuspernode
echo $nu
echo $np

for i in $NODES
do
echo "$i" >> $CURDIR/nodefile
done

pie="'"
machinelist=$(awk '{if( NR != '$nu' ) printf "['$pie'"$0"'$pie',"'$cpuspernode'"],"} {if(NR=='$nu') printf "['$pie'"$0"'$pie', "'$cpuspernode'"]"}' nodefile)
echo "mp_host_list=[$machinelist]"
echo "mp_rsh_command='ssh -n -l %U %H %C'" > abaqus_v6.env
echo "mp_host_list=[$machinelist]" >> abaqus_v6.env

export MPI_IB_STRINGS=mlx5_0:1
export MPIRUN_OPTIONS="-prot"


unset SLURM_GTIDS
inputfile=abaqus_suanli.inp
abaqus job=ABAQUS cpus=$SLURM_NPROCS input=$inputfile interactive ask_delete=off > ./log


echo "++++++++++++++++++++++++++++++++++++++++"
echo "process will sleep 30s"
sleep 30
echo "process end at : "
date
rm -rf $CURDIR/nodelist.$SLURM_JOB_ID
```
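
The `awk` one-liner in the script above builds the `mp_host_list` value that is written to `abaqus_v6.env`: one `['host',cores]` pair per allocated node. As a minimal sketch of the same idea in plain bash, assuming a hypothetical helper named `build_mp_host_list` (it is not part of the demo script) and example host names `node01` and `node02`:

```bash
#!/bin/bash
# Hypothetical helper, not part of /public/slurmscript_demo/abaqus.slurm.
# Builds the comma-separated ['host',cores] pairs used for mp_host_list.
# Usage: build_mp_host_list CORES_PER_NODE HOST [HOST...]
build_mp_host_list() {
    local cpus_per_node=$1
    shift
    local entries=()
    local host
    for host in "$@"; do
        entries+=("['${host}',${cpus_per_node}]")
    done
    # "${entries[*]}" joins the elements with the first character of IFS.
    local IFS=,
    printf '%s\n' "${entries[*]}"
}

# Example with made-up host names; inside a job the hosts would come from
# `scontrol show hostnames "$SLURM_JOB_NODELIST"`.
echo "mp_host_list=[$(build_mp_host_list 8 node01 node02)]"
```

Keeping the list construction in a function makes it easy to check the generated line outside a job before it is written to the environment file.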
29 changes: 29 additions & 0 deletions docs/usage/apps/amber.md
@@ -0,0 +1,29 @@
# Amber

```bash title="/public/slurmscript_demo/amber-intel.slurm"
#!/bin/bash -l
#SBATCH --job-name=mpi_job_test # Job name
#SBATCH --output=testSlurmJob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=testSlurmJob.%j.err # Stderr (%j expands to jobId)
#SBATCH -N 2 # Number of nodes
#SBATCH --ntasks-per-node=16 # Number of tasks (CPU cores) per node
#SBATCH --account=[budget] # Account name
#SBATCH --partition=cpu # Partition name
#SBATCH --qos=[qos] # QOS name

module load intel/oneapi2021.1
module load amber/20

srun hostname >./hostfile
echo $SLURM_NTASKS
echo "Date = $(date)"
echo "Hostname = $(hostname -s)"
echo "Working Directory = $(pwd)"
echo ""
echo "Number of Nodes Allocated = $SLURM_JOB_NUM_NODES"
echo "Number of Tasks Allocated = $SLURM_NTASKS"
# SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given; fall back to 1
echo "Number of Cores/Task Allocated = ${SLURM_CPUS_PER_TASK:-1}"
echo $SLURM_NPROCS

mpirun -machinefile hostfile -np $SLURM_NTASKS pmemd.MPI -O -i mdin -o mdout -p prmtop -c inpcrd
```
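
The script above writes one line per task to `hostfile` (via `srun hostname`) and then passes that file to `mpirun` together with `$SLURM_NTASKS`. A minimal sketch of a sanity check that could run before `mpirun`, assuming a hypothetical helper named `check_hostfile` and made-up host names in the demo:

```bash
#!/bin/bash
# Hypothetical helper, not part of the demo script.
# Succeeds only if HOSTFILE has exactly EXPECTED non-empty lines.
# Usage: check_hostfile HOSTFILE EXPECTED
check_hostfile() {
    local hostfile=$1 expected=$2
    if [ ! -r "$hostfile" ]; then
        echo "cannot read hostfile: $hostfile" >&2
        return 1
    fi
    local actual
    actual=$(grep -c . "$hostfile")   # count non-empty lines
    if [ "$actual" -ne "$expected" ]; then
        echo "hostfile has $actual entries, expected $expected" >&2
        return 1
    fi
    echo "hostfile OK: $actual entries"
}

# Demo with a temporary file standing in for the real hostfile.
demo_file=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$demo_file"
check_hostfile "$demo_file" 4
rm -f "$demo_file"
```

In a job script the call would be something like `check_hostfile hostfile "$SLURM_NTASKS" || exit 1`, so that a short or missing host list stops the job early instead of surfacing as an MPI startup error.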
33 changes: 33 additions & 0 deletions docs/usage/apps/comsol.md
@@ -0,0 +1,33 @@
# COMSOL

!!! info
    COMSOL is commercially licensed software; you must apply for access before using it.

```bash title="/public/slurmscript_demo/comsol.slurm"
#!/bin/bash
#SBATCH --job-name="COMSOL"
#SBATCH --output=testSlurmJob.%j.out # Stdout (%j expands to jobId)
#SBATCH --error=testSlurmJob.%j.err # Stderr (%j expands to jobId)
#SBATCH -N 2 # Number of nodes
#SBATCH --ntasks-per-node=1 # Number of tasks per node
#SBATCH --account=[budget] # Account name
#SBATCH --partition=cpu # Partition name
#SBATCH --qos=[qos] # QOS name

# Set Comsol ENV
module load comsol/5.6
module load intel/2020.2

srun hostname >./hostfile
echo $SLURM_NTASKS
echo "Date = $(date)"
echo "Hostname = $(hostname -s)"
echo "Working Directory = $(pwd)"
echo ""
echo "Number of Nodes Allocated = $SLURM_JOB_NUM_NODES"
echo "Number of Tasks Allocated = $SLURM_NTASKS"
# SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given; fall back to 1
echo "Number of Cores/Task Allocated = ${SLURM_CPUS_PER_TASK:-1}"
echo $SLURM_NPROCS

comsol batch -nnhost 1 -np $SLURM_NTASKS -inputfile test.mph -outputfile outtest.mph -batchlog in.log
```
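
The demo scripts use `[budget]` and `[qos]` as placeholders for the account and QOS names, so they have to be replaced before a script is submitted. A minimal sketch of doing that with `sed`, assuming a hypothetical helper named `fill_placeholders` and made-up values `proj01` and `normal` (real account and QOS names come from the computing center):

```bash
#!/bin/bash
# Hypothetical helper, not part of the demo scripts.
# Reads a job script on stdin and replaces the [budget] and [qos]
# placeholders; the result is written to stdout.
# Assumes the values contain no "/" or other sed-special characters.
# Usage: fill_placeholders ACCOUNT QOS < template.slurm > job.slurm
fill_placeholders() {
    local account=$1 qos=$2
    # "\[" matches a literal "["; a bare "]" is already literal here.
    sed -e "s/\[budget]/${account}/" -e "s/\[qos]/${qos}/"
}

# Demo on two header lines with made-up values.
printf '#SBATCH --account=[budget]\n#SBATCH --qos=[qos]\n' \
    | fill_placeholders proj01 normal
```

Writing the result to a new file keeps the shared template under `/public/slurmscript_demo/` untouched.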
34 changes: 34 additions & 0 deletions docs/usage/apps/cp2k.md
@@ -0,0 +1,34 @@
# CP2K

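CP2K is a powerful software package for DFT calculations and molecular dynamics simulations. It supports multiple computational methods, including density functional theory (DFT), semi-empirical methods, and classical force fields. CP2K is known for its efficient parallel computing and flexible input file format, and is suited to simulations ranging from small molecules to large-scale material systems. Users can run a variety of tasks with CP2K, such as energy calculations, geometry optimization, and molecular dynamics simulations.

## CP2K on the Jiageng Intelligent Computing Center

!!! failure
    Because of a known issue with OpenMPI after the upgrade, only the `cp2k/2024.3` and `cp2k/2024.3-generic` versions of CP2K are currently usable on the cluster.
    The former failed the regtest because of core dumps and similar problems (but showed no numerical issues); the latter also failed, because 17 test cases produced mismatched numerical values.

    Please therefore check your results carefully when using CP2K; we are actively working on the compatibility problems between the upgraded cluster and CP2K.
    If you need high reliability, we recommend the [official CP2K Singularity container images](https://github.com/cp2k/cp2k-containers#apptainer-singularity).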

```bash title="/public/slurmscript_demo/cp2k-2024.3.slurm"
#!/bin/bash
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=64 # number of cores per node
#SBATCH --job-name=hello # job name
#SBATCH --output=%j.out # standard output log (%j expands to the job ID)
#SBATCH --error=%j.err # error log (%j expands to the job ID)
#SBATCH --account=[budget] # Account name
#SBATCH --partition=cpu # Partition name
#SBATCH --qos=[qos] # QOS name
#SBATCH --mem=251G # use full memory of node to avoid OOM

##############################################
# Software Environment #
##############################################
module load cp2k/2024.3
##############################################
# Run job #
##############################################
mpirun cp2k.psmp cp2k.inp >> output
```
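
Since the note above asks users to check their own results, one simple check is to compare the final total energy of a run against a trusted reference, for example the same input run in the official container. The following is only a sketch: it assumes the energy lines in the CP2K output contain the text `ENERGY| Total FORCE_EVAL` and end with the value, and the helper names and the default tolerance are hypothetical choices.

```bash
#!/bin/bash
# Hypothetical helpers, not part of the demo script.

# Print the last total energy found in a CP2K output file.
# Assumption: energy lines contain "ENERGY| Total FORCE_EVAL" and the
# value is the last whitespace-separated field.
last_energy() {
    awk 'index($0, "ENERGY| Total FORCE_EVAL") { e = $NF }
         END { if (e != "") print e }' "$1"
}

# Succeed if two energies agree within a tolerance.
# The default of 1e-8 is an arbitrary illustrative choice.
# Usage: energies_agree A B [TOLERANCE]
energies_agree() {
    awk -v a="$1" -v b="$2" -v tol="${3:-1e-8}" \
        'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d <= tol) }'
}

# Demo with a made-up output excerpt.
demo_out=$(mktemp)
printf ' ENERGY| Total FORCE_EVAL ( QS ) energy [a.u.]:   -17.10\n' > "$demo_out"
last_energy "$demo_out"
rm -f "$demo_out"
```

A job could then end with something like `energies_agree "$(last_energy output)" "$reference" || echo "energy differs from reference" >&2`, where `$reference` is a value obtained from a trusted run.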