
[JOSS] Comments on the functionality #175

Closed
@rcannood

Description

This issue concerns simChef's functionality and is part of my review of the JOSS submission of this tool at openjournals/joss-reviews#6156.

I apologize for being quite critical of certain statements in the paper (discussed below), but I do think that explicitly showcasing each of the topics below would be of immense value to an inexperienced user wishing to perform the kinds of studies supported by simChef.

Functionality

Have the functional claims of the software been confirmed?

The paper states:

While simChef’s core functionality focuses on computability (C) – encompassing efficient usage of computational resources, ease of user interaction, reproducibility, and documentation – we emphasize the importance of predictability (P) and stability (S) in data science simulations.

The documentation also states:

Flexible and seamless integration with distributed computation, caching, checkpointing, and debugging tools

There is too little information on how to enable distributed computation, reproducibility, caching, checkpointing, and debugging when using simChef.

I found some information in Setting Up Your Simulation Study suggesting that hpc = TRUE and init_renv = TRUE enable distributed computation and reproducibility, respectively. However, at this stage I find it hard to argue that the package seamlessly integrates with any of these concepts, since none of them is covered in enough depth in the documentation.
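
To make the ask concrete: from that page, this is roughly how I understand project setup is meant to work. The call below is my reconstruction from the docs, not a verified example:

```r
library(simChef)

# Scaffold a new simulation project; `hpc` and `init_renv` are the flags
# referenced in "Setting Up Your Simulation Study" (I have not verified
# the full signature of create_sim()).
create_sim(
  path = "my-simulation-study",
  init_renv = TRUE,  # initialize an renv lockfile for reproducibility
  hpc = TRUE         # add HPC/batch submission boilerplate
)
```

If this is indeed the intended workflow, spelling it out like this in a dedicated article would already help a lot.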

Would it be possible to create separate articles in the documentation showcasing how to set up distributed computation, how to set up reproducibility, how to use caching and checkpointing, and how to debug runs? Or would you have an alternative solution?

I'm certain that you as developers know exactly how to do this with simChef, but currently the documentation does little to help novice simChef users learn how to do any of these things from scratch.

In the context of reproducibility, I would expect the rendered report to contain information on the software versions of the environments used.
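
As a minimal sketch of what I have in mind (illustrative only; I am not claiming the report rendering supports this today), the report could embed something like:

```r
# Pin package versions alongside the results ...
renv::snapshot()

# ... and record the session details so the rendered report can include them.
writeLines(capture.output(sessionInfo()), "session-info.txt")
```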

With respect to caching and debugging tools, I'd expect to be able to see execution time, CPU usage, memory usage, and error messages when using an HPC backend. What happens when one of the executions fails? How can I debug what went wrong during a failed run?
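
For example, this is the kind of diagnostic access I would hope for. The snippet is purely illustrative of the questions above, since I could not find a corresponding simChef API; `experiment` stands for an Experiment object built earlier:

```r
# Time the run and capture any error with enough context to debug it.
timing <- system.time(
  result <- tryCatch(
    run_experiment(experiment, n_reps = 100),
    error = function(e) {
      message("run failed: ", conditionMessage(e))
      message("call: ", deparse(conditionCall(e)))
      NULL
    }
  )
)
print(timing)  # elapsed/user/system time; CPU and memory usage remain open questions
```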

Performance

If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

This ties in with the previous comment. The paper mentions efficient usage of computational resources, but some simulation studies will require an HPC to run the analysis within a reasonable time frame.
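
Assuming replicates are dispatched through the future framework (which is what hpc = TRUE above hints at; I could not confirm this from the documentation), a distributed-computation article could start from something like:

```r
library(simChef)
library(future)

# Local parallelism across 4 worker processes.
plan(multisession, workers = 4)

# Or, on a SLURM cluster (requires the future.batchtools package and a
# cluster-specific template file):
# plan(future.batchtools::batchtools_slurm, template = "slurm.tmpl")

# `experiment` is assumed to be a simChef Experiment built earlier.
results <- run_experiment(experiment, n_reps = 100)
```

Even a short article confirming (or correcting) this pattern would substantiate the "seamless integration" claim.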

Checklist

  • Functionality - Showcase distributed computation
  • Functionality - Showcase reproducibility with renv
  • Functionality - Showcase debugging tools when something goes wrong
