---
title: "A simple(r) demo R workflow to run OpenMalaria"
format: gfm
execute:
  eval: false
---
This is a demo workflow for running OpenMalaria from within RStudio, based on [Aurélien's repo](https://github.com/acavelan/OpenMalariaRWorkflow). This repo is meant to be instructional: it lets someone new to OpenMalaria run a relatively simple workflow within RStudio, though one that goes a step beyond the single-scenario command given in the wiki:
```{bash}
./OpenMalaria --scenario example_scenario.xml --output output.txt
```
In other words, we run several scenarios covering combinations of parameters, instead of passing a single scenario to OpenMalaria.
This README will walk you through:
- Creating an SSH tunnel to set up an RStudio container on Pawsey
- Running the workflow on the RStudio container
## Set up RStudio
Open the script [`start_rstudio_mac.sh`](https://github.com/chitrams/demo-simple-om/blob/main/start_rstudio_mac.sh) if you use a Mac, or [`start_rstudio_linux.sh`](https://github.com/chitrams/demo-simple-om/blob/main/start_rstudio_linux.sh) if you're on Linux. This script creates an SSH tunnel to Setonix on your port 8787.
In this script, change the `USERNAME` on line 4.
Then run the following (on Linux, substitute `start_rstudio_linux.sh`):
```{bash}
# Change the file permission so you can execute the script:
chmod +x start_rstudio_mac.sh
# After the file permission has been changed, run:
./start_rstudio_mac.sh
```
You only need to run `chmod +x start_rstudio_mac.sh` once. Every other time you want to start RStudio on Pawsey you can simply execute it by using the command `./start_rstudio_mac.sh` in your terminal.
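For readers unfamiliar with SSH tunnels, here is a minimal sketch of the kind of port forwarding involved. It is not the content of `start_rstudio_mac.sh`, which may do more (for example, start the container first). The `tunnel_cmd` helper, the placeholder username, and the host name `setonix.pawsey.org.au` are assumptions for illustration; only port 8787 comes from the text above.

```bash
# Hypothetical helper: print (not run) an SSH command that forwards a local
# port to the same port on the remote host.
tunnel_cmd() {
  user="$1"; host="$2"; port="$3"
  # -N: do not run a remote command; -L: forward the local port to the remote port
  printf 'ssh -N -L %s:localhost:%s %s@%s\n' "$port" "$port" "$user" "$host"
}

# Placeholder values (assumptions); replace with your own details.
tunnel_cmd "your_username" "setonix.pawsey.org.au" 8787
```

With a tunnel of this kind open, a server listening on port 8787 at the remote end would be reachable in your browser at `http://localhost:8787`.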
## Run the workflow
When you open `launch.R`, everything you need to change is in the first 33 lines. I will walk you through them one by one.
### Parameters to change
```{R}
# OpenMalaria
om = list(
  version = 47,
  path = "/software/projects/pawsey1104/GROUP/OpenMalaria/OM_schema47/OpenMalaria_47",
  exe = "/software/projects/pawsey1104/GROUP/OpenMalaria/OM_schema47/OpenMalaria_47/openmalaria_47.0.sif"
)
```
This bit of code specifies the path to the OpenMalaria support files and the executable. For version 47, the support files that must be accessible to run OpenMalaria are `densities.csv` and `scenario_47.xsd`. The executable here is a Singularity Image File that Aurélien has built, `openmalaria_47.0.sif`.
For any version changes, we will be updating the executables within that `GROUP` folder.
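Before running anything, it can help to confirm that these files are visible from where you are working. The sketch below assumes the support files and the `.sif` image sit directly inside the `path` folder, as the paths above suggest; `check_om_files` is a hypothetical helper, not part of the workflow.

```bash
# Hypothetical helper: report which OpenMalaria files exist in a folder.
# $1 = folder (the `path` value in launch.R), $2 = schema version (e.g. 47)
check_om_files() {
  om_path="$1"; version="$2"; missing=0
  for f in densities.csv "scenario_${version}.xsd" "openmalaria_${version}.0.sif"; do
    if [ -f "$om_path/$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  # Return 0 only if every file was found
  return "$missing"
}

OM_PATH="/software/projects/pawsey1104/GROUP/OpenMalaria/OM_schema47/OpenMalaria_47"
check_om_files "$OM_PATH" 47 || echo "Some files are missing or not accessible from this machine."
```

Run this on Setonix rather than on your laptop, since the `/software/projects/...` path only exists there.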
```{R}
# run scenarios, extract the data, or both
do = list(
  run = TRUE,
  extract = TRUE,
  example = FALSE
)
```
Here is where it gets a bit gnarly. When you first run the scenarios, you want to set `run = TRUE` and everything else to `FALSE`. This is because we can't yet access Slurm from within RStudio. I will walk you through the run vs extract steps in the next sub-section. Before I get to that, there are a few more parameters I need to explain.
```{R}
experiment = 'local-experiment' # name of the experiment folder
```
This is the name of your experiment folder; change it to whatever you wish. I've set mine to `local-experiment`. I've also added `local*/` to the `.gitignore` file because these output folders tend to be quite large.
The next few lines after that are the malaria parameters. You can change these as you wish; I've selected the ones in the current script for a relatively quick run (less than five minutes on a local machine without parallelisation).
### Running the workflow on an HPC
To run the scenarios, set the parameters in launch.R to the following:
```{R}
# run scenarios, extract the data, or both
do = list(
  run = TRUE,
  extract = FALSE,
  example = FALSE
)
```
Then click on the 'Source' button to execute the script. It will come up with an error; this is expected! Now open a separate terminal window and do the following:
```{bash}
# Log on to setonix
# Navigate to the folder where the outputs are saved
cd experiment-folder
# Change file permissions so you can execute the script
chmod +x start_array_job.sh
# Execute the script
sbatch start_array_job.sh
```
OpenMalaria will then run on your XML files and save all the output in the `txt/` folder.
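While the array job runs, you can check on it from the same terminal: `squeue -u "$USER"` lists your queued and running Slurm jobs. The sketch below counts files to give a rough sense of progress. It assumes the scenarios live in an `xml/` subfolder of the experiment folder and that output files end in `.txt`; neither is confirmed here, and `count_files` is a hypothetical helper.

```bash
# Hypothetical helper: count files matching pattern $2 directly under
# directory $1. Prints 0 if the directory does not exist.
count_files() {
  if [ -d "$1" ]; then
    find "$1" -maxdepth 1 -type f -name "$2" | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# Placeholder folder name, as in the commands above
EXP="experiment-folder"
echo "scenarios: $(count_files "$EXP/xml" '*.xml'), outputs so far: $(count_files "$EXP/txt" '*.txt')"
```

The two counts need not match one-to-one if OpenMalaria writes more than one file per scenario, so treat the numbers as a rough indicator only.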
Now go back to your RStudio IDE and change the parameters in launch.R to the following:
```{R}
# run scenarios, extract the data, or both
do = list(
  run = FALSE, # This is now FALSE!
  extract = TRUE, # This is now TRUE!
  example = FALSE
)
```
Click on the 'Source' button again to execute the script. You should now have `output.csv` and `scenarios.csv` files in your output folder. Well done!
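A quick way to sanity-check the result from a terminal is to count the data rows in each file. This sketch assumes each CSV starts with a single header row and has no quoted fields spanning several lines; `csv_rows` is a hypothetical helper, and `experiment-folder` is a placeholder for your own folder name.

```bash
# Hypothetical helper: count data rows in a CSV, skipping the header line.
csv_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Placeholder folder name; use the value of `experiment` from launch.R
EXP="experiment-folder"
for f in output.csv scenarios.csv; do
  if [ -f "$EXP/$f" ]; then
    echo "$f: $(csv_rows "$EXP/$f") rows"
  else
    echo "$f: not found in $EXP"
  fi
done
```

If either file is missing or has zero rows, the extract step probably ran before the Slurm job had finished writing to `txt/`.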
### Option to run the workflow locally
To run the workflow locally, go to the end of the `if (do$run == TRUE) { ... }` block of the script:
```{R}
message("Running scenarios...")
run_HPC(scenarios, experiment, om, NTASKS)
# run_local(scenarios, experiment, om)
```
Then uncomment the `run_local()` line and comment out the `run_HPC()` line.
## Technical notes
- I run macOS 14.5.