Parallel CV in sjSDM_cv: sub_tune_samples not passed correctly to worker nodes #155

jannebor · 2025-02-04T16:14:37Z

When running sjSDM_cv in parallel (using parallel::parLapply), it appears that the sub_tune_samples object is not passed to the workers. Instead, each worker seems to index tune_samples inside tune_func(), with t always being 1 (i.e. nrow(sub_tune_samples)).
Lines 230-235:

for(i in 1:length(unique(blocks_run))){
      ind = blocks_run == unique(blocks_run)[i]
      sub_tune_samples = tune_samples[ind, ]
      
      result_list[[i]] = parallel::parLapply(cl, 1:nrow(sub_tune_samples), tune_func)
    }

As a result, it seems to me that the models are trained with identical tuning parameters within each cross-validation fold, so the CV results within each fold are identical regardless of the specified tuning grid.

Reproducible example:

library(sjSDM) # needed for simulate_SDM()
set.seed(42)
community <- simulate_SDM(sites = 100, species = 10, env = 3, se = TRUE)
Env <- community$env_weights
Occ <- community$response
SP <- matrix(rnorm(200, 0, 0.3), 100, 2) # spatial coordinates (no effect on species occurrences)

tune_cv = sjSDM::sjSDM_cv(Y = Occ,
                        env = Env,
                        CV = 5,
                        tune = "random",
                        tune_steps = 10L,
                        step_size = 5L,
                        device = "cpu",
                        n_cores = 10L,
                        sampling = 100L,
                        learning_rate = 0.01,
                        iter = 100L)

tune_cv

In sjSDM_cv, I suggest adding tune_samples as an additional argument to tune_func() as a quick fix.

Line 118:

tune_func = function(t, tune_samples){...}

and when applying the function:
Lines 215-217:

for(t in 1:nrow(tune_samples)){
      result[[t]] = tune_func(t, tune_samples)
    }

Lines 230-235:

for(i in 1:length(unique(blocks_run))){
      ind = blocks_run == unique(blocks_run)[i]
      sub_tune_samples = tune_samples[ind, ]
      message(i)
      result_list[[i]] = parallel::parLapply(cl, 1:nrow(sub_tune_samples), tune_func, tune_samples = sub_tune_samples)
    }
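
To illustrate the idea outside the package, here is a minimal, self-contained sketch of passing the per-block subset to the workers through the `...` argument of parallel::parLapply. The grid, the alpha column, the block vector and the body of tune_func are hypothetical stand-ins, not the actual sjSDM_cv internals; drop = FALSE is a defensive addition of mine so that a one-row subset stays a data frame.

```r
library(parallel)

# Hypothetical tuning grid: 6 parameter rows split into 2 blocks.
tune_samples <- data.frame(id = 1:6,
                           alpha = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))
blocks_run <- c(1, 1, 1, 2, 2, 2)

# The worker function receives the grid explicitly as an argument
# instead of looking it up in an enclosing environment.
tune_func <- function(t, tune_samples) {
  tune_samples$alpha[t]
}

cl <- makeCluster(2L)
result_list <- vector("list", length(unique(blocks_run)))
for (i in seq_along(unique(blocks_run))) {
  ind <- blocks_run == unique(blocks_run)[i]
  sub_tune_samples <- tune_samples[ind, , drop = FALSE]
  # Extra named arguments to parLapply are forwarded to tune_func,
  # so every worker sees the subset that belongs to this block.
  result_list[[i]] <- parLapply(cl, seq_len(nrow(sub_tune_samples)),
                                tune_func,
                                tune_samples = sub_tune_samples)
}
stopCluster(cl)

# One alpha per grid row, in the original order.
alphas <- unlist(result_list)
```

In this toy setup, each block returns the alpha values of its own rows, so alphas covers all six grid rows rather than repeating the first few.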
