
Commit 50a0518

Add usage on aurora (#406)
* update spark to 3.3.3 Signed-off-by: minmingzhu <minming.zhu@intel.com> * add usage doc * update * update * Update Spark job submit tool usage on aurora.md --------- Signed-off-by: minmingzhu <minming.zhu@intel.com>
1 parent 5449079 commit 50a0518

1 file changed

Lines changed: 276 additions & 0 deletions

@@ -0,0 +1,276 @@
1+
**Spark Job Submit Tool (SJST)** is a tool for submitting Spark jobs on **Aurora**. SJST enables [OAP MLlib](https://github.com/oap-project/oap-mllib), which takes advantage of [Intel® oneAPI Data Analytics Library (oneDAL)](https://github.com/oneapi-src/oneDAL) and [Intel® oneAPI Collective Communications Library (oneCCL)](https://github.com/oneapi-src/oneCCL) to implement highly optimized machine learning algorithms. It makes the most of CPU and GPU capabilities and uses efficient communication patterns in multi-node, multi-GPU clusters.
2+
3+
This document gives a quick start for SJST, with an example of how to run a K-Means program on GPUs, followed by a detailed description of the configuration options.
4+
5+
- [Quick Start](#quick-start)
6+
- [Introduction To SJST](#introduction-to-sjst)
7+
- [Create Spark Configuration Files](#create-spark-configuration-files)
8+
- [Submit Spark Job](#submit-spark-job)
9+
- [Submit Spark Job (interactive model)](#submit-spark-job-interactive-model)
10+
- [Check Your Job Status](#check-your-job-status)
11+
- [Check Output of Your Job](#check-output-of-your-job)
12+
- [Command Line Help](#command-line-help)
13+
- [Bash Mode](#bash-mode)
14+
- [Interactive Mode](#interactive-mode)
15+
- [Configurations](#configurations)
16+
- [env_aurora.sh](#env_aurorash)
17+
- [env_local.sh](#env_localsh)
18+
- [Example Configuration](#example-configuration)
19+
20+
## Quick Start
21+
### Introduction To SJST
22+
SJST is available at /lus/flare/projects/Aurora_deployment/spark/spark-job. Its components are listed in the directory tree below.
23+
```text
24+
spark-job
25+
├── bin/ // The scripts that setup spark cluster and submit jobs
26+
├── conf-to-your-submit-dir/ // Configurations about workers and executors
27+
│ // You'll mainly work with this directory.
28+
├── example/ // Example for testing and reference
29+
├── jars/ // Files for loading OAP MLlib
30+
├── LICENSE
31+
├── README.docx
32+
└── README.txt
33+
34+
```
35+
36+
The following sections give a detailed example of how to submit a job.
37+
38+
### Create Spark Configuration Files
39+
The spark-job/conf-to-your-submit-dir/ directory contains the configuration files you need. Copy them into your own working directory before submitting jobs.
40+
41+
For example:
42+
```shell
43+
$ mkdir ~/spark_work_home
44+
$ cp /lus/flare/projects/Aurora_deployment/spark/spark-job/conf-to-your-submit-dir/* ~/spark_work_home
45+
```
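If you rerun the copy later, a plain `cp` overwrites any configuration files you have already edited. The following is a minimal sketch of a safer copy; `prepare_workdir` is a hypothetical helper written for this illustration and is not part of SJST.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SJST): copy configuration templates into a
# working directory without overwriting files that already exist there.
prepare_workdir() {
    local src="$1" dst="$2" f name
    if [ ! -d "$src" ]; then
        echo "template directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -e "$f" ] || continue              # template directory is empty
        name=$(basename "$f")
        if [ -e "$dst/$name" ]; then
            echo "keeping existing $name" >&2
        else
            cp -r "$f" "$dst"/ || return 1
        fi
    done
}

# Usage with the paths from the example above:
# prepare_workdir /lus/flare/projects/Aurora_deployment/spark/spark-job/conf-to-your-submit-dir ~/spark_work_home
```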
46+
47+
### Submit Spark Job
48+
After setting up the configuration files, you can use the scripts in spark-job/bin/ to submit jobs from your working directory. Make sure your configuration files are in that directory.
49+
50+
Here are the steps to submit the dense K-Means job with 2 nodes to the queue “lustre_scaling” from an Aurora login node. The data path in the first command points to a 192G CSV file on DAOS.
51+
52+
```shell
53+
$ cd ~/spark_work_home
54+
# Submit a Spark job with 2 nodes to read data from DAOS and run K-Means example.
55+
$ /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/submit-spark.sh -A Aurora_deployment -l walltime=2:00:00 -l select=2 -l filesystems=flare:daos_user -q lustre_scaling /lus/flare/projects/Aurora_deployment/spark/spark-job/example/kmeans-pyspark.py daos://Intel2/hadoop_fs/HiBench/Kmeans/Input/36000000
56+
# Submit a Spark job with 2 nodes to read data from Lustre Filesystem and run K-Means example.
57+
$ /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/submit-spark.sh -A Aurora_deployment -l walltime=2:00:00 -l select=2 -l filesystems=flare:daos_user -q lustre_scaling /lus/flare/projects/Aurora_deployment/spark/spark-job/example/kmeans-pyspark.py file:///lus/flare/projects/Aurora_deployment/spark/DataRoot/HiBench/Kmeans/Input/36000000
58+
59+
```
60+
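After a successful submission, the output reports the job ID generated by Aurora on the `SPARKJOB_JOBID` line. The sample output below was captured with job ID 1286330.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov; the status and output examples later in this document use the shorter ID 27643.amn-0001. The following shows a successful submission: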
61+
62+
```shell
63+
# Submitting job: /lus/flare/projects/Aurora_deployment/spark/spark-job/example/kmeans-pyspark.py daos://Intel2/hadoop_fs/HiBench/Kmeans/Input/36000000
64+
debug 1286330.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
65+
debug -A Aurora_deployment -l select=2 -l walltime=2:00:00 -l filesystems=flare:daos_user -q lustre_scaling -v SPARKJOB_SCRIPTS_DIR=/lus/flare/projects/Aurora_deployment/spark/spark-job/bin,SPARKJOB_CONFIG_DIR=/home/damon/spark_work_home,SPARKJOB_INTERACTIVE=0,SPARKJOB_SCRIPTMODE=0,SPARKJOB_OUTPUT_DIR=/home/damon/spark_work_home,SPARKJOB_SEPARATE_MASTER=0,SPARKJOB_OAPML=1,SPARKJOB_DAOS=1,SPARKJOB_ARG=/lus/flare/projects/Aurora_deployment/spark/spark-job/example/kmeans-pyspark.py^daos://Intel2/hadoop_fs/HiBench/Kmeans/Input/36000000 -o /home/damon/spark_work_home -e /home/damon/spark_work_home /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/start-spark.sh
66+
# Submitted
67+
SPARKJOB_JOBID=1286330.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
68+
```
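Because the job ID is needed for the later `qstat` and output checks, it can be convenient to capture it in a script. The sketch below assumes only what the sample output shows, namely that a successful submission prints a line starting with `SPARKJOB_JOBID=`. The `extract_jobid` function is a hypothetical helper, not part of SJST.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SJST): read submit-spark.sh output on stdin
# and print the value of the last SPARKJOB_JOBID=... line.
# Returns non-zero when no such line is present (e.g. a failed submission).
extract_jobid() {
    local id
    id=$(sed -n 's/^SPARKJOB_JOBID=//p' | tail -n 1)
    [ -n "$id" ] && printf '%s\n' "$id"
}

# Usage sketch; 2>&1 is used because it is an assumption which stream the
# tool writes to:
# jobid=$(submit-spark.sh ... 2>&1 | tee submit.log | extract_jobid)
```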
69+
If the submission fails, the output looks like the following:
70+
```shell
71+
# Submitting job: /lus/flare/projects/Aurora_deployment/spark/spark-job/example/kmeans-pyspark.py daos://Intel2/hadoop_fs/HiBench/Kmeans/Input/36000000
72+
qsub: would exceed queue generic's per-user limit of jobs in 'Q' state
73+
debug
74+
75+
debug -A Aurora_deployment -l select=2 -l walltime=2:00:00 -l filesystems=flare:daos_user -q lustre_scaling -v SPARKJOB_SCRIPTS_DIR=/lus/flare/projects/Aurora_deployment/spark/spark-job/bin,SPARKJOB_CONFIG_DIR=/home/damon/spark_work_home,SPARKJOB_INTERACTIVE=0,SPARKJOB_SCRIPTMODE=0,SPARKJOB_OUTPUT_DIR=/home/damon/spark_work_home,SPARKJOB_SEPARATE_MASTER=0,SPARKJOB_OAPML=1,SPARKJOB_DAOS=1,SPARKJOB_ARG=/lus/flare/projects/Aurora_deployment/spark/spark-job/example/kmeans-pyspark.py^daos://Intel2/hadoop_fs/HiBench/Kmeans/Input/36000000 -o /home/damon/spark_work_home -e /home/damon/spark_work_home /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/start-spark.sh
76+
77+
# Submitting failed.
78+
```
79+
80+
### Submit Spark Job (interactive model)
81+
After setting up the configuration files, you can use the scripts in spark-job/bin/ to submit jobs from your working directory. Make sure your configuration files are in that directory.
82+
83+
Here are the steps to start an interactive session with 2 nodes in the queue “lustre_scaling” from an Aurora login node.
84+
```shell
85+
$ cd ~/spark_work_home
86+
$ /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/submit-spark.sh -A Aurora_deployment -l walltime=2:00:00 -l select=2 -l filesystems=flare:daos_user -q lustre_scaling -I
87+
```
88+
After a successful submission, the output reports the job ID generated by Aurora, which is 67690.amn-0001 in this example. The following shows a successful submission:
89+
```shell
90+
Submitting an interactive job and wait for at most 1800 sec.
91+
debug 67690.amn-0001
92+
debug -A Aurora_deployment -l select=2 -l walltime=2:00:00 -l filesystems=flare:daos_user -q lustre_scaling -v SPARKJOB_SCRIPTS_DIR=/lus/flare/projects/Aurora_deployment/spark/spark-job/bin,SPARKJOB_CONFIG_DIR=/home/damon/spark_work_home,SPARKJOB_INTERACTIVE=1,SPARKJOB_SCRIPTMODE=0,SPARKJOB_OUTPUT_DIR=/home/damon/spark_work_home,SPARKJOB_SEPARATE_MASTER=0,SPARKJOB_OAPML=1,SPARKJOB_DAOS=1,SPARKJOB_ARG= -o /home/damon/spark_work_home -e /home/damon/spark_work_home /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/start-spark.sh
93+
# Submitted
94+
SPARKJOB_JOBID=67690.amn-0001
95+
need spark job host: 0
96+
Waiting for Spark to launch by checking /home/damon/spark_work_home/67690.amn-0001
97+
# Spark is now running (SPARKJOB_JOBID=67690.amn-0001) on:
98+
# x1921c1s2b0n0.hostmgmt2000.cm.americas.sgi.com
99+
# x1921c1s5b0n0.hostmgmt2000.cm.americas.sgi.com
100+
declare -x SPARK_MASTER="spark://x1921c1s2b0n0.hostmgmt.cm.americas.sgi.com:7077"
101+
# Spawning bash on host: x1921c1s2b0n0
102+
Adding oap mllib to additional class path
103+
need spark job host: 1
104+
105+
sourced /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/env_gpu.sh
106+
need spark job host: 1
107+
loading DAOS module
108+
sourced /home/damon/spark_work_home/env_aurora.sh
109+
sourced /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/env_spark_daos.sh
110+
GPU Options: --conf spark.oap.mllib.device=GPU
111+
sourced /home/damon/spark_work_home/env_local.sh
112+
```
113+
Once inside the interactive session, you can run commands from the command line, for example:
114+
```shell
115+
$ spark-submit /lus/flare/projects/Aurora_deployment/spark/spark-job/example/kmeans-pyspark.py daos://Intel2/hadoop_fs/HiBench/Kmeans/Input/36000000
116+
```
117+
118+
### Check Your Job Status
119+
After you submit a job, Aurora allocates resources and schedules it. You can check the job's status with the qstat command.
120+
121+
For the example above, the output will be like this:
122+
```shell
123+
# The job is waiting for resources.
124+
$ qstat 27643.amn-0001
125+
Job id Name User Time Use S Queue
126+
---------------- ---------------- ---------------- -------- - -----
127+
27643.amn-0001 start-spark.sh damon 0 Q lustre_scaling
128+
129+
# The job is running.
130+
$ qstat 27643.amn-0001
131+
Job id Name User Time Use S Queue
132+
---------------- ---------------- ---------------- -------- - -----
133+
27643.amn-0001 start-spark.sh damon 0 R lustre_scaling
134+
135+
# The job is exiting after having run.
136+
$ qstat 27643.amn-0001
137+
Job id Name User Time Use S Queue
138+
---------------- ---------------- ---------------- -------- - -----
139+
27643.amn-0001 start-spark.sh damon 0 E lustre_scaling
140+
```
141+
To check your job status periodically, use the watch command. After the job ends, press Ctrl-C to stop watch.
142+
```shell
143+
$ watch -n 2 qstat 27643.amn-0001
144+
Job id Name User Time Use S Queue
145+
---------------- ---------------- ---------------- -------- - -----
146+
27643.amn-0001 start-spark.sh damon 0 Q lustre_scaling
147+
```
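For scripted use, the state column can be pulled out of the `qstat` output instead of being read by eye. The sketch below assumes the default output layout shown above (two header lines, state in the second-to-last column) and assumes that `qstat` stops listing a job once it has finished. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of SJST.

# Print the state column (Q, R, E, ...) of the first job row read on stdin.
# Assumes two header lines and the state in the second-to-last column.
job_state() {
    awk 'NR > 2 && NF >= 2 { print $(NF - 1); exit }'
}

# Poll qstat until it no longer reports a state for the job.
wait_for_job() {
    local jobid="$1" interval="${2:-10}" state
    while state=$(qstat "$jobid" 2>/dev/null | job_state) && [ -n "$state" ]; do
        echo "$(date +%T) $jobid state: $state"
        sleep "$interval"
    done
    echo "$jobid is no longer listed by qstat"
}

# Usage sketch:
# wait_for_job 27643.amn-0001 30
```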
148+
149+
### Check Output of Your Job
150+
The output of your job is stored in your working directory. Here is an example:
151+
```text
152+
spark_work_home
153+
├── 27643.amn-0001/
154+
│ └── conf/ // Spark conf like spark-env.sh, spark-default.xml and slaves
155+
│ └── logs/ // Spark master and worker logs
156+
│ └── app-20230301080732-0000/ // Created after the Spark application is submitted; named after the application ID and contains the stderr and stdout logs of each executor.
157+
├── 27643.amn-0001.ER // All stderr output for your job.
158+
└── 27643.amn-0001.OU // All stdout output for your job.
159+
```
160+
161+
The results of the dense K-Means example appear at the end of **27643.amn-0001.OU**.
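Since the results sit at the end of the `.OU` file, a small wrapper around `tail` is enough to view them. This is a sketch that relies only on the `<jobid>.OU` naming shown in the tree above; `show_job_result` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SJST): print the last lines of a job's
# stdout file, <workdir>/<jobid>.OU. The line count defaults to 20.
show_job_result() {
    local workdir="$1" jobid="$2" lines="${3:-20}"
    local out="$workdir/$jobid.OU"
    if [ ! -f "$out" ]; then
        echo "no output file yet: $out" >&2
        return 1
    fi
    tail -n "$lines" "$out"
}

# Usage sketch:
# show_job_result ~/spark_work_home 27643.amn-0001
```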
162+
163+
### Command Line Help
164+
submit-spark.sh accepts several options. To see detailed information about them, along with examples, run the command below.
165+
```shell
166+
$ /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/submit-spark.sh -h
167+
Spark Job Submit Tool v1.0.3 for Aurora
168+
169+
Usage:
170+
submit-spark.sh [options] JOBFILE [arguments ...]
171+
172+
JOBFILE can be:
173+
script.py pyspark scripts
174+
run-example examplename run Spark examples, like "SparkPi"
175+
--class classname example.jar run Spark job "classname" wrapped in example.jar
176+
shell-script.sh when option "-s" is specified
177+
178+
Required options:
179+
-l walltime=<v> Max run time, e.g., -l walltime=30:00 for running 30 minutes at most
180+
-l select=<v> Job node count, e.g., -l select=2 for requesting 2 nodes
181+
-l filesystems=<v> Filesystem type, e.g., -l filesystems=flare:daos_user for requesting flare and daos filesystems
182+
-q QUEUE Queue name
183+
184+
Optional options:
185+
-A PROJECT Allocation name
186+
-o OUTPUTDIR Directory for job output files (default: current dir)
187+
-s Enable shell script mode
188+
-y Spark master uses a separate node
189+
-I Start an interactive ssh session
190+
-b Prefer Spark built-in mllib to default OAP mllib
191+
-n Run without DAOS loaded
192+
-h Print this help messages
193+
Example:
194+
submit-spark.sh -A Aurora_deployment -l walltime=60 -l select=2 -l filesystems=flare:daos_user -q workq kmeans-pyspark.py daos://pool0/cont1/kmeans/input/libsvm 10
195+
submit-spark.sh -A Aurora_deployment -l walltime=30 -l select=1 -l filesystems=flare:daos_user -q workq -o output-dir run-example SparkPi
196+
submit-spark.sh -A Aurora_deployment -I -l walltime=30 -l select=2 -l filesystems=flare:daos_user -q workq --class com.intel.jlse.ml.KMeansExample example/jlse-ml-1.0-SNAPSHOT.jar daos://pool0/cont1/kmeans/input/csv csv 10
197+
submit-spark.sh -A Aurora_deployment -l walltime=30 -l select=1 -l filesystems=flare:daos_user -q workq -s example/test_script.sh job.log
198+
```
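The help text lists four required options (`-l walltime`, `-l select`, `-l filesystems`, and `-q`). As a sketch, the helper below assembles a batch-mode command line from those values and prints it for review rather than running it. `build_submit_cmd` and the `SJST_BIN` variable are hypothetical names; the default path is the install location mentioned in this document. Optional flags such as `-A` are left out and would need to be added by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SJST): print a submit-spark.sh command line
# built from the four required options plus a job file and its arguments.
SJST_BIN="${SJST_BIN:-/lus/flare/projects/Aurora_deployment/spark/spark-job/bin}"

build_submit_cmd() {
    if [ "$#" -lt 5 ]; then
        echo "usage: build_submit_cmd WALLTIME NODES FILESYSTEMS QUEUE JOBFILE [ARGS...]" >&2
        return 1
    fi
    local walltime="$1" nodes="$2" filesystems="$3" queue="$4"
    shift 4
    # Only prints the command; copy it or pipe it to a shell to run it.
    echo "$SJST_BIN/submit-spark.sh -l walltime=$walltime -l select=$nodes" \
         "-l filesystems=$filesystems -q $queue $*"
}

build_submit_cmd 2:00:00 2 flare:daos_user lustre_scaling kmeans-pyspark.py
```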
199+
#### Bash Mode:
200+
```shell
201+
$ /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/submit-spark.sh -A Aurora_deployment -l walltime=2:00:00 -l select=1 -l filesystems=flare:daos_user -q lustre_scaling -s test_script.sh
202+
```
203+
#### Interactive Mode:
204+
```shell
205+
$ /lus/flare/projects/Aurora_deployment/spark/spark-job/bin/submit-spark.sh -A Aurora_deployment -l walltime=2:00:00 -l select=1 -l filesystems=flare:daos_user -q lustre_scaling -I
206+
$ ./test_script.sh
207+
```
208+
209+
## Configurations
210+
211+
This section describes how to configure SJST so that Spark works the way you want. By default, SJST provides **env_aurora.sh** and **env_local.sh**, which already configure resources properly for Aurora. If you want to customize resources, the following information may help.
212+
213+
### env_aurora.sh
214+
In this file you can change the Spark worker's CPU, memory, and GPU resources.
215+
| Configuration Option | Description |
216+
|----------------------|-------------|
217+
| SPARK_WORKER_CORES | Maximum number of cores for each worker |
218+
| SPARK_WORKER_MEMORY | Maximum memory for each worker |
219+
| GPU_RESOURCE_FILE | Path to the resources file used to discover resources when the worker starts up |
220+
| GPU_WORKER_AMOUNT | Number of GPUs each worker can use |
221+
### env_local.sh
222+
In this file you can change Spark and OAP MLlib configurations. For more details on Spark configuration, refer to the Spark configuration documentation.
223+
| Configuration Option | Description |
224+
|----------------------|-------------|
225+
| spark.oap.mllib.device | Compute device to use, CPU or GPU. The default is GPU. (Currently the OAP MLlib JAR only supports GPU; CPU support will be added in future releases.) |
226+
| spark.executor.cores | The number of cores to use on each executor. |
227+
| spark.driver.memory | Amount of memory to use for the driver process. |
228+
| spark.executor.memory | Amount of memory to use per executor process. |
229+
| spark.executor.instances | The number of executors for static allocation. With dynamic allocation enabled, the initial set of executors will be at least this large. |
230+
| spark.worker.resourcesFile | Path to the resources file used to discover resources when the worker starts up. The default is $GPU_RESOURCE_FILE. |
231+
| spark.worker.resource.gpu.amount | Number of GPUs available on the worker. The default is $GPU_WORKER_AMOUNT. |
232+
| spark.executor.resource.gpu.amount | Number of GPUs for each executor. |
233+
| spark.task.resource.gpu.amount | Amount of GPU resource for each task; may be a fraction of one GPU. |
234+
| spark.driver.extraJavaOptions | A string of extra JVM options to pass to driver. |
235+
| spark.executor.extraJavaOptions | A string of extra JVM options to pass to executors. |
236+
| spark.executor.extraClassPath | Extra classpath entries to prepend to the classpath of executors. OAP MLlib Jar path is added here. |
237+
| spark.driver.extraClassPath | Extra classpath entries to prepend to the classpath of the driver. OAP MLlib Jar path is added here. |
238+
| spark.eventLog.enabled | Whether to log Spark events. |
239+
| spark.eventLog.dir | Base directory in which Spark events are logged. |
240+
241+
## Example Configuration
242+
Here is an example configuration tuned to run the dense K-Means example efficiently.
243+
244+
Suppose that we have a cluster as described below.
245+
| Cluster | Parameters |
246+
|----------------------|-------------|
247+
| Nodes | 3 |
248+
| Memory per node | 1024G |
249+
| Physical cores per node | 104 |
250+
| Logical cores per node | 208 |
251+
| GPUs per node | 12 |
252+
253+
We want each Spark executor to correspond to one GPU, which is the most efficient arrangement. Each node therefore runs 12 executors (36 in total across the 3 nodes), and each executor must get enough resources to run the dense K-Means example. We give each worker 96 cores and 840G of memory, leaving some cores and memory for the OS. Each executor then uses 8 (96/12) cores and 70G (840G/12) of memory. This leads to the configuration files below.
254+
255+
#### env_aurora.sh
256+
```shell
257+
# set Spark Worker resources
258+
export SPARK_WORKER_CORES=96 # Maximum number of cores that each worker can get
259+
export SPARK_WORKER_MEMORY=840G # Maximum memory that each worker can get
260+
261+
# set GPU options
262+
export GPU_RESOURCE_FILE=$SPARKJOB_CONFIG_DIR/gpuResourceFile_aurora.json # Path to resources file which is used to find various resources while worker starting up
263+
export GPU_WORKER_AMOUNT=12
264+
265+
```
266+
267+
#### env_local.sh
268+
```shell
269+
spark.driver.memory 20g # each driver has 20g memory
270+
spark.executor.cores 8 # each executor has 8 (96/12) cores
271+
spark.executor.memory 70g # each executor has 70g (840G/12) memory
272+
spark.worker.resource.gpu.amount 12 # each node has 1 worker and each worker has 12 GPUs
273+
spark.executor.resource.gpu.amount 1 # each executor has 1 GPU
274+
spark.executor.instances 36 # 36 (12 x 3) executors in total
275+
spark.task.resource.gpu.amount 0.125 # each task uses 1/8 of a GPU, so 8 tasks share each GPU. Choose this value from your target task concurrency: with 12 GPUs per node and 96 concurrent GPU tasks per node, 12 / 96 = 0.125. Sharing a GPU allows better parallelism for lightweight stages like preprocessing.
276+
```
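The arithmetic behind these values can be reproduced with a short script, which is handy when the node count or the target concurrency changes. This is a sketch only: `size_executors` and its argument names are illustrative and are not SJST options. It assumes one executor per GPU, as in the example above.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of SJST): derive per-executor settings from
# per-node worker resources, assuming one executor per GPU.
size_executors() {
    local nodes="$1" worker_cores="$2" worker_mem_gb="$3"
    local gpus_per_node="$4" tasks_per_node="$5"
    echo "spark.executor.cores $(( worker_cores / gpus_per_node ))"
    echo "spark.executor.memory $(( worker_mem_gb / gpus_per_node ))g"
    echo "spark.executor.instances $(( nodes * gpus_per_node ))"
    # The per-task GPU share is fractional, so compute it with awk.
    awk -v g="$gpus_per_node" -v t="$tasks_per_node" \
        'BEGIN { printf "spark.task.resource.gpu.amount %g\n", g / t }'
}

# Values from the example: 3 nodes, 96 worker cores, 840G worker memory,
# 12 GPUs per node, 96 concurrent GPU tasks per node.
size_executors 3 96 840 12 96
```

With the example inputs this prints 8 cores, 70g, 36 instances, and 0.125, matching the configuration above.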
