Commit a932ea1
replace the snow example for MPI with R
1 parent db99a41

File tree: mkdocs-project-dir/docs/Software_Guides/R.md
1 file changed: 75 additions & 111 deletions
@@ -127,151 +127,115 @@ qsub run-R.sh
## Example multi-node parallel job using Rmpi and doMPI

This script uses Rmpi and doMPI to allow it to run across multiple nodes using MPI.

To try this example, save this job script and the following R script (`doMPI_example.R`) in your home directory.
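Before submitting to the cluster, the doMPI pattern itself (register a backend, then hand iterations to `foreach`) can be tried locally with `foreach`'s built-in serial backend. This is a minimal sketch, assuming only the `foreach` package is installed; no MPI is needed to run it:

```r
library(foreach)

# registerDoSEQ() runs iterations serially; on the cluster the full
# script instead uses registerDoMPI(startMPIcluster()).
registerDoSEQ()

# %dopar% hands each iteration to whichever backend is registered,
# and .combine='c' concatenates the per-iteration results.
squares <- foreach(i = 1:4, .combine = 'c') %dopar% {
  i^2
}
print(squares)  # 1 4 9 16
```

Because the backend is chosen by `registerDo...()`, the loop body is unchanged whether it runs serially or across MPI workers.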

132134
```
133135
#!/bin/bash -l
134136
135-
# Example jobscript to run an R MPI parallel job
137+
# Batch script to run an MPI parallel R job using the doMPI package
138+
# with the upgraded software stack under SGE with OpenMPI.
139+
140+
# R Version 4.4.2 with doMPI/Rmpi
136141
137142
# Request ten minutes of wallclock time (format hours:minutes:seconds).
138143
#$ -l h_rt=0:10:0
139144
140145
# Request 1 gigabyte of RAM per process.
141146
#$ -l mem=1G
142147
143-
# Request 15 gigabytes of TMPDIR space per node (default is 10 GB)
148+
# Set tmpfs to 15 gigabyte of TMPDIR space (default is 10 GB)
149+
# Remove this for clusters without temporary filesystems, e.g. Kathleen
144150
#$ -l tmpfs=15G
145151
146152
# Set the name of the job.
147-
#$ -N snow_monte_carlo
153+
#$ -N doMPI-Ex1-12
148154
149-
# Select the MPI parallel environment with 32 processes
150-
#$ -pe mpi 32
155+
# Select the MPI parallel environment with 12 processes, the maximum possible
156+
# on Myriad would be 36. On Kathleen, request at least 41 processes.
157+
#$ -pe mpi 12
151158
152-
# Set the working directory to somewhere in your scratch space. This is
153-
# necessary because the compute nodes cannot write to your $HOME
154-
# NOTE: this directory must exist.
159+
# Set the working directory. In this case, we use the home directory.
155160
# Replace "<your_UCL_id>" with your UCL user ID
156-
#$ -wd /home/<your_UCL_id>/Scratch/R_output
161+
#$ -wd /home/<your_UCL_id>
157162
158-
# Load the R module
159163
module -f unload compilers mpi gcc-libs
160-
module load r/recommended
164+
module load r/r-4.4.2_bc-3.20
161165
162-
# Copy example files in to the working directory (not necessary if already there)
163-
cp ~/R/Examples/snow_example.R .
164-
cp ~/R/Examples/monte_carlo.R .
166+
# Run our MPI job. GERun is a wrapper that launches MPI jobs on UCL clusters.
165167
166-
# Run our MPI job. GERun is our wrapper for mpirun, which launches MPI jobs
167-
gerun RMPISNOW < snow_example.R > snow.out.${JOB_ID}
168+
gerun Rscript doMPI_example.R
168169
```
The output is saved in `$HOME/doMPI-Ex1-12.o${JOB_ID}`.

If your jobscript is called `run-R-doMPI.sh` then your job submission command would be:
```
qsub run-R-doMPI.sh
```

### Example R script using Rmpi and doMPI

This R script has been written to use Rmpi and doMPI and can be used with the above jobscript. It is the `doMPI_example.R` script referred to above.

```
# This example uses one of the historic datasets from the HistData package
# and is based on a Princeton University example.

# Load the Rmpi, doMPI and HistData packages - already installed on UCL clusters.
library(Rmpi)
library(doMPI)
library(HistData)

# This is Galton's data mapping the heights of sons to their fathers:
# ~900 rows of 2 columns.
data(Galton)

# Set up the cluster.
cl <- startMPIcluster()
registerDoMPI(cl)

# Split the Galton data frame (mapped to df) into
# chunks of 100 rows max.
df <- Galton
n <- 100
nr <- nrow(df)

# Use rep to specify the break points without having to list each one manually.
split_df <- split(df, rep(1:ceiling(nr/n), each=n, length.out=nr))

# We might want to know the ratio of the parent's height to the child's.

# foreach passes each chunk index to the MPI worker processes.
# .combine= specifies a function that will be used to combine the results, e.g.
# cbind, rbind, c, etc.
df$results <- foreach(i=1:length(split_df), .combine='rbind') %dopar% {

  # Take one chunk of the split data frame, compute the parent/child
  # height ratio, and return it just as we would in non-parallelized code.
  result <- split_df[[i]]$parent/split_df[[i]]$child
  as.data.frame(result)

  # The chunks' results get rbind-ed together and stored as a column on df.
}

# Take a look at the df we got back, which we could continue working on if we wanted to.
head(df)

# Close the cluster to properly free up the MPI resources so GE can see that
# the job has finished.
closeCluster(cl)
Rmpi::mpi.quit()
```
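The `split`/`rep` chunking used in the script can be checked on its own, without MPI. A standalone base-R sketch, in which the 905-row data frame is a hypothetical stand-in for the Galton data:

```r
# Hypothetical stand-in for the Galton data: ~900 rows, 2 columns.
df <- data.frame(parent = rnorm(905, mean = 68), child = rnorm(905, mean = 68))
n  <- 100
nr <- nrow(df)

# rep() labels the rows 1,1,...,1,2,2,... so split() cuts df into
# chunks of at most n rows each.
split_df <- split(df, rep(1:ceiling(nr/n), each = n, length.out = nr))

print(length(split_df))        # 10 chunks
print(sapply(split_df, nrow))  # nine chunks of 100 rows and one of 5
```

Each element of `split_df` is what one `foreach` iteration sees, so the chunk size `n` controls how much work each MPI worker receives at a time.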

This example was based on Princeton's [Using R on the Research Computing Clusters](https://github.com/PrincetonUniversity/HPC_R_Workshop/blob/6ddac56324021277f163789f7f501fa82d92deca/04_doMPI/04_doMPI.R) repository.

## Using your own R packages
