
Commit 036b9d5

Update quick_start.md
1 parent 0e7ef5c commit 036b9d5

File tree

1 file changed: +86 -0 lines changed

Diff for: docs/quick_start.md

@@ -0,0 +1,86 @@
# Quick start

This page provides a quick start guide on how to use the IEETA cluster (Pleiades).

## 1. Access IEETA cluster (Pleiades)

Access the cluster via SSH using the credentials provided to you by email. If you do not have access yet, please refer to the `how_to_access.md` page.

```bash
# Replace the placeholders with the username and cluster address provided by email
$ ssh <username>@<cluster-address>
```

By default, upon logging in, you will land on our **login** node in your home directory, which is located at `/data/home`. This is a network storage partition visible to all cluster nodes.
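
For example, right after logging in you can confirm where you are; the per-user path shown below is an assumption about how home directories are organised under `/data/home`:

```bash
$ pwd
/data/home/<username>
```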

The **login** node is where you should prepare your code in order to submit jobs to run on the **worker** nodes of the cluster. The worker nodes are equipped with powerful resources. Currently, we have:

- **CPU nodes**: Nodes with a high amount of RAM and faster CPUs. *Not yet added to the cluster.*
- **GPU nodes**: Nodes equipped with GPUs and more modest CPU/RAM configurations.

For more information about each node, check the [nodes page](detail_material/nodes.md).

## 2. Prepare your software environment

The next step is to prepare the environment in which you will build and run your application. We recommend using a virtual environment so that you can install any packages you need locally. First, load the Python module.

```bash
$ module load python
```
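
If you want to see which Python modules and versions are available before loading one, the standard `module avail` listing can help; the exact module names depend on how the cluster is configured:

```bash
$ module avail python
```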

Then create and activate your virtual environment.

```bash
$ python -m venv virtual-venv
$ source virtual-venv/bin/activate
```

You can then install your package dependencies with pip.

```bash
(virtual-venv)$ pip install --upgrade pip
(virtual-venv)$ pip install torch transformers
```
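
If your project already lists its dependencies in a file such as `requirements.txt` (a hypothetical file name used here for illustration), you can install them all at once:

```bash
(virtual-venv)$ pip install -r requirements.txt
```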
## 3. Create your SLURM job script

After setting up your runtime environment, you should create a SLURM job script to submit your job. For example:

```bash
#!/bin/bash
#SBATCH --job-name=trainer          # create a short name for your job
#SBATCH --output="trainer-%j.out"   # %j will be replaced by the slurm jobID
#SBATCH --nodes=1                   # node count
#SBATCH --ntasks=1                  # total number of tasks across all nodes
#SBATCH --cpus-per-task=2           # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --gres=gpu:1                # number of gpus per node
#SBATCH --mem=4G                    # total amount of RAM requested

source ~/virtual-venv/bin/activate  # assumes the venv was created in your home directory; not needed if it is already active when you submit the job

python your_trainer_script.py

deactivate
```

The script is made of two parts:
1. Specification of the resources needed and some job information;
2. Commands that will be executed on the destination node.

As an example, in the first part of the script, we define the job name, the output file and the requested resources (1 GPU, 2 CPUs and 4 GB of RAM). Then, in the second part, we define the tasks of the job.

Since no partition was specified, the job will run on the default partition, which in this cluster is the `gpu` partition. You can check which partitions and nodes are available with:

```bash
$ sinfo
```
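
If you want to target a partition explicitly rather than relying on the default, you can add a `--partition` line to the job script above; it is shown here for the `gpu` partition mentioned in the text, so adapt the name to whatever `sinfo` reports:

```bash
#SBATCH --partition=gpu   # run the job on the gpu partition
```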
## 4. Submit the job

To submit the job, you should run the following command:

```bash
$ sbatch script_trainer.sh
Submitted batch job 144
```

You can check the job status using the following command:

```bash
$ squeue
```
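
To narrow the listing to your own jobs, and to follow the output file defined in the job script (`trainer-%j.out`), something like the following should work; the job ID `144` is just the example value printed above:

```bash
$ squeue -u $USER
$ tail -f trainer-144.out
```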
