Commit f1974f8

Looking into seeds, chat with Tom about it, deleted NormalParams and UniformParams
1 parent dc0f403 commit f1974f8

10 files changed: +38954 -38997 lines changed

evaluation/posts/2024_05_23/index.qmd

Lines changed: 41 additions & 2 deletions
@@ -180,7 +180,7 @@ But can't decide how best to display it - or if it is worth trying to display it

 Updated the files accordingly.

-### 14.19- Reproduction
+### 14.19-15.12 Reproduction

 Original:

@@ -271,6 +271,42 @@ However, this definitely has **not** fixed the issue! Still varying - and I woul

 <mark>do we need to focus on the interpretations and whether they hold? but we didn't really want to do that as that is not our focus? but in this case, is it, if we want to know if we've reproduced, but can't actually necessarily get the exact same results due to randomness?</mark>

+<mark>look at Tom's more recent examples where he has added seeds</mark>
+
+<mark>test! idea is to check that you are getting the same results between runs</mark>
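The check described in the note above could be as simple as running the model several times with the same seed and comparing the outputs. A minimal sketch, assuming a hypothetical `run_model(seed)` function that stands in for the real model (the function, its parameters and the toy sampling are illustrative, not taken from the study code):

```python
import random


def run_model(seed, n_patients=100):
    """Hypothetical stand-in for one model run: returns a mean of sampled times."""
    rng = random.Random(seed)  # generator local to this run, fixed by the seed
    times = [rng.uniform(3, 7) for _ in range(n_patients)]
    return sum(times) / len(times)


def is_reproducible(model, seed, n_runs=3):
    """Return True if repeated runs with the same seed give identical results."""
    results = [model(seed) for _ in range(n_runs)]
    return all(result == results[0] for result in results)


same_seed_matches = is_reproducible(run_model, seed=1)
different_seeds_differ = run_model(1) != run_model(2)
```

If `same_seed_matches` is `False` for the real model, some source of randomness is not being controlled by the seed.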
+
+### 15.23- Reproduction
+
+::: {.callout-tip}
+## Random seeds
+
+At the moment, I would describe this model as reusable but not reproducible. It was relatively quick to get the code up and running and see similar results to the paper. But getting it to match the paper exactly is pretty much impossible, although I will try to get there by setting random seeds and then running it lots of times to try and get a close match.
+
+This is important for STARS framework improvement: controlling randomness is important for reproducibility. It can also be handy for someone reusing a model, as they may wish to reproduce results just to verify that it's running properly for them.
+
+So for each of the studies, if this is a recurring thing that comes up, it's about seeing where and how to add random seeds in different models and languages, to enable reproducibility.
+:::
+
+From this [Stack Overflow post](https://stackoverflow.com/questions/59105921/why-is-numpy-random-seed-not-remaining-fixed-but-randomstate-is-when-run-in-para), I'm suspicious that perhaps the issue is that I am setting the random state as 1 and 2, which (a) would imply it's making it the same between each run, but (b) all using the same stream in parallel processing. But it's set using `RandomState`.
+
+Trying to google around use of seeds with parallel processing.
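One common NumPy approach to seeding parallel replications is to spawn independent child seeds from a single base seed with `SeedSequence`, so that each replication gets its own stream rather than sharing one. A rough sketch, assuming the newer `Generator` API rather than the `RandomState` used in the model; `replication_rngs` and `run_replication` are hypothetical names and the sampling is a toy stand-in for a model run:

```python
import numpy as np


def replication_rngs(base_seed, n_reps):
    """Create one independent generator per replication from a single base seed."""
    children = np.random.SeedSequence(base_seed).spawn(n_reps)
    return [np.random.default_rng(child) for child in children]


def run_replication(rng, n_patients=5):
    """Hypothetical stand-in for one replication: sample some uniform times."""
    return rng.uniform(3, 7, size=n_patients)


# Each generator could be handed to a separate worker process; here the
# replications run in a plain loop to keep the sketch simple.
first = [run_replication(rng) for rng in replication_rngs(42, 3)]
second = [run_replication(rng) for rng in replication_rngs(42, 3)]
```

With this set-up, repeating the whole experiment with the same base seed gives the same results per replication, while the replications themselves draw from different streams.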
+
+Had a chat with Tom about it and he suggested:
+
+* He pointed out that `NormalParams` is not being used, and that a seed would need to be set in the class `Normal()` when it is used in `Scenario`, e.g. as an extra parameter at the end of these:
+* `requiring_inpatient_random: Distribution = Uniform(0.0, 1.0)`
+* `time_pos_before_inpatient: Distribution = Uniform(3,7)`
+* Good example of how to set this up, would recommend this: https://pythonhealthdatascience.github.io/stars-simpy-example-docs/content/02_model_code/04_model.html#distribution-classes
+* LLM model generation of seeds
+
+Need separate random number streams each time you make a distribution to use.
+
+I thought the best option is to switch to using it how it is used in the [treat-sim model docs](https://pythonhealthdatascience.github.io/stars-simpy-example-docs/content/02_model_code/04_model.html#distribution-classes), as the focus here is just modifying the code to allow it to reproduce each run.
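As an illustration of the idea of one seeded stream per distribution, here is a minimal sketch. It is not the study's code or the treat-sim implementation: the `random_seed` parameter, the `sample` method, the `base_seed` argument and the simplified `Scenario` class are all assumptions made for this example, with only the two attribute names and their `Uniform` arguments taken from the notes above.

```python
import numpy as np


class Uniform:
    """Uniform distribution that owns its own random number stream."""

    def __init__(self, low, high, random_seed=None):
        # A separate generator per instance means sampling from one
        # distribution does not advance the stream of another.
        self.rng = np.random.default_rng(random_seed)
        self.low = low
        self.high = high

    def sample(self, size=None):
        return self.rng.uniform(self.low, self.high, size=size)


class Scenario:
    """Simplified, hypothetical scenario holding two seeded distributions."""

    def __init__(self, base_seed=None):
        # Derive one independent child seed per distribution from the base seed.
        seeds = np.random.SeedSequence(base_seed).spawn(2)
        self.requiring_inpatient_random = Uniform(0.0, 1.0, random_seed=seeds[0])
        self.time_pos_before_inpatient = Uniform(3, 7, random_seed=seeds[1])


scenario_a = Scenario(base_seed=1)
scenario_b = Scenario(base_seed=1)
sample_a = scenario_a.time_pos_before_inpatient.sample(size=5)
sample_b = scenario_b.time_pos_before_inpatient.sample(size=5)
```

Two scenarios built from the same `base_seed` produce identical samples, which is the property needed to reproduce a run.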
+
+So the next things I did:
+
+* Deleted the `NormalParams` and `UniformParams` classes as they are not used, then checked the model still runs fine, which it did.
+
 ## Timings

 ```{python}
@@ -286,7 +322,9 @@ used_to_date = 73
 times = [
 ('10.24', '11.00'),
 ('12.10', '12.16'),
-('12.19', '12.29')]
+('12.19', '12.29'),
+('13.26', '13.52'),
+('14.19', '15.12')]
 # --------------------------------------------------------------

 FMT = '%H.%M'
@@ -333,6 +371,7 @@ Protocol:
 * Still feels unclear on when we are setting up the website (showing article, showing code). Decision I have made from trying to display the code is that actually, the simplest and clearest thing is to let people explore the code themselves (just direct them to the right folder on the GitHub), whilst for the article, it takes one minute to embed the PDFs, so just have that step (plus adding a link to where the scripts are) when uploading the articles, and call it a day.
 * To do: move download sources from logbook to original study page (and modify as appropriate in protocol)
 * Add suggestion to save outputs as you go, as and when appropriate, as it's helpful to be able to include images in the logbook, for example. So perhaps, copying images from output into the logbook folder images. **Yes.** I've started copying over and storing within the logbook folder, and just focusing on e.g. the figure I was looking at and not copying over all the data associated.
+* Can I ask the rest of the team for advice on issues with reproduction? Would presume so, and that I include that in the timing and record what is discussed and said.

 Thoughts as reading through code:

reproduction/example.ipynb

Lines changed: 4 additions & 25 deletions
Large diffs are not rendered by default.
