This is a demo workflow for running OpenMalaria from within RStudio, based on Aurélien’s repo. The repo is meant to be instructional: someone new to OpenMalaria can use it to run a relatively simple workflow within RStudio, though one that goes a step beyond the single command given in the wiki:
./OpenMalaria --scenario example_scenario.xml --output output.txt
In other words, we run several scenarios covering combinations of parameters, rather than passing a single scenario to OpenMalaria.
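To make the contrast concrete, here is a rough sketch of what "several scenarios" looks like at the command line: one OpenMalaria call per scenario file. It is a dry run that only prints the commands. The directory name, the output file naming, and the helper print_om_commands are all assumptions for illustration; they are not the layout that the R scripts in this repo produce.

```shell
#!/bin/sh
# Dry run: print one OpenMalaria command per scenario XML in a directory.
# The directory layout and output names below are hypothetical.
print_om_commands() {
    scenario_dir=$1
    for xml in "$scenario_dir"/*.xml; do
        # Skip the unmatched glob when the directory has no XML files
        [ -e "$xml" ] || continue
        name=$(basename "$xml" .xml)
        echo "./OpenMalaria --scenario $xml --output ${name}_out.txt"
    done
}

# Example (hypothetical folder name):
#   print_om_commands scenarios
```

The workflow in this repo does the equivalent for you from launch.R, so you never have to write these commands by hand.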
This README will walk you through:
- Creating an SSH tunnel to set up an RStudio container on Pawsey
- Running the workflow on the RStudio container
Open the script
start_rstudio_mac.sh
if you use a Mac, or
start_rstudio_linux.sh
if you’re on Linux. This script creates an SSH tunnel to Setonix on port 8787 of your machine.
The only thing you need to change in this script is the USERNAME on line 4.
Then run:
# Change the file permission so you can execute the script:
chmod +x start_rstudio_mac.sh
# After the file permission has been changed, run:
./start_rstudio_mac.sh
You only need to run chmod +x start_rstudio_mac.sh once. Every other time you want to start RStudio on Pawsey, you can simply run ./start_rstudio_mac.sh in your terminal.
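If you are curious what an SSH tunnel boils down to, the core is a local port forward. The sketch below only prints such a command and does not run it. It assumes RStudio listens on port 8787 on the remote side and that the login host is setonix.pawsey.org.au; build_tunnel_cmd is a hypothetical helper, and the actual start script may do more than this (for example, launching the container).

```shell
#!/bin/sh
# Print (not run) a local port-forward command for RStudio.
# Assumptions: RStudio listens on remote port 8787 and the login host is
# setonix.pawsey.org.au. Adjust both to match your start script.
build_tunnel_cmd() {
    username=$1
    port=${2:-8787}
    host=${3:-setonix.pawsey.org.au}
    # -N: do not run a remote command; -L: forward a local port to the remote side
    echo "ssh -N -L ${port}:localhost:${port} ${username}@${host}"
}

# Example:
#   build_tunnel_cmd USERNAME
```

With a forward like this in place, anything listening on the remote port 8787 becomes reachable at localhost:8787 on your own machine.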
When you open launch.R, everything you need to change is in the first 33 lines. I will walk you through them one by one.
# OpenMalaria
om = list(
version = 47,
path = "/software/projects/pawsey1104/GROUP/OpenMalaria/OM_schema47/OpenMalaria_47",
exe = "/software/projects/pawsey1104/GROUP/OpenMalaria/OM_schema47/OpenMalaria_47/openmalaria_47.0.sif"
)
This bit of code specifies the paths to the OpenMalaria support files and the executable. The support files that must be accessible to run OpenMalaria are densities.csv and scenario_47.xsd (for version 47). The executable here is a Singularity Image File that Aurélien has built, openmalaria_47.0.sif.
When the version changes, we will update the executables within that GROUP folder.
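If OpenMalaria complains about missing files, a quick check from a terminal on Setonix can confirm that the support files named above are where launch.R expects them. This is a minimal sketch; check_om_files is a hypothetical helper and not part of the repo.

```shell
#!/bin/sh
# Check that the OpenMalaria support files exist in a given folder.
# check_om_files is a hypothetical helper, not part of the repo.
check_om_files() {
    om_path=$1
    version=$2
    missing=0
    for f in densities.csv "scenario_${version}.xsd"; do
        if [ -f "$om_path/$f" ]; then
            echo "found:   $om_path/$f"
        else
            echo "missing: $om_path/$f"
            missing=1
        fi
    done
    # Returns 0 when both files are present, 1 otherwise
    return "$missing"
}

# Example, using the path from launch.R:
#   check_om_files /software/projects/pawsey1104/GROUP/OpenMalaria/OM_schema47/OpenMalaria_47 47
```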
# run scenarios, extract the data, or both
do = list(
run = TRUE,
extract = TRUE,
example = FALSE
)
Here is where it gets a bit gnarly. When you first run the scenarios, set run = TRUE and everything else to FALSE. This is because we can’t yet access Slurm from within RStudio, so running and extracting have to be done as separate steps. I will walk you through the run and extract steps in the next sub-section. Before I get to that, there are a few more parameters I need to explain.
experiment = 'local-experiment' # name of the experiment folder
This is the name of your experiment folder. Change it to whatever you wish; I’ve set mine to local-experiment. I’ve also added local*/ to the .gitignore file because these output folders tend to be quite large.
The next few lines after that are the malaria parameters. You can change these as you wish; I’ve selected the ones in the current script for a relatively quick run (less than five minutes on a local machine without parallelisation).
To run the scenarios, set the parameters in launch.R to the following:
# run scenarios, extract the data, or both
do = list(
run = TRUE,
extract = FALSE,
example = FALSE
)
Then click on the ‘Source’ button to execute the script. It will come up with an error; this is expected! Now open a separate terminal window and do the following:
# Log on to setonix
ssh USERNAME@setonix.pawsey.org.au
# Navigate to the folder where the outputs are saved
cd experiment-folder
# Change file permissions so you can execute the script
chmod +x start_array_job.sh
# Execute the script
sbatch start_array_job.sh
OpenMalaria will then run on your XML files and save all the output in the txt/ folder.
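While the array job is running, you may want to check on its progress before moving to the extract step. The sketch below counts the files written to txt/ so far. It assumes the outputs land in the txt/ folder inside your experiment folder, as described above; count_outputs is a hypothetical helper.

```shell
#!/bin/sh
# Count the output files OpenMalaria has written so far.
# Assumes outputs land in <experiment folder>/txt/.
count_outputs() {
    experiment_dir=$1
    if [ -d "$experiment_dir/txt" ]; then
        # tr strips the padding some wc implementations add
        find "$experiment_dir/txt" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# On Setonix, list your queued and running jobs (standard Slurm command):
#   squeue -u "$USER"
# Then, from the directory that contains the experiment folder:
#   count_outputs experiment-folder
```

Once squeue no longer lists your job and the count has stopped growing, the run step should be finished.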
Now go back to your RStudio IDE and change the parameters in launch.R to the following:
# run scenarios, extract the data, or both
do = list(
run = FALSE, # This is now FALSE!
extract = TRUE, # This is now TRUE!
example = FALSE
)
Click on the ‘Source’ button again to execute the script. You should now have output.csv and scenarios.csv files in your output folder. Well done!
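To confirm the extract step worked without opening the files, you could run a small check like the one below from a terminal. It is only a sketch: it assumes both CSV files sit directly in the folder you pass in, and summarise_results is a hypothetical helper.

```shell
#!/bin/sh
# Confirm both extraction results exist and report their line counts.
# Assumes the files sit directly in the folder you pass in.
summarise_results() {
    out_dir=$1
    status=0
    for f in output.csv scenarios.csv; do
        if [ -f "$out_dir/$f" ]; then
            rows=$(wc -l < "$out_dir/$f" | tr -d ' ')
            echo "$f: $rows lines (including any header)"
        else
            echo "$f: not found in $out_dir"
            status=1
        fi
    done
    # Returns 0 only when both files were found
    return "$status"
}

# Example (hypothetical folder name):
#   summarise_results local-experiment
```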
To run the workflow locally, go to the end of the if (do$run == TRUE) { ... } block in the script and find the last section:
message("Running scenarios...")
run_HPC(scenarios, experiment, om, NTASKS)
# run_local(scenarios, experiment, om)
Then uncomment the run_local() line and comment out the run_HPC() line.
- I run macOS 14.5.