Capture profiling statistics #1110

Merged: 48 commits merged into master from wgraham/capture-profiling-stats on Nov 27, 2023
Conversation

willGraham01 (Collaborator) commented Sep 15, 2023

Almost closes issue #686 - we still need to decide on the parameters to pass to the scale_run script itself, and to capture the extra statistics listed in that issue.

.pyisession files are no longer produced

This alleviates the issue of the file sizes of the .pyisession outputs. They are no longer pushed to the profiling repository - instead we push the .stats.json files (discussed below) and, optionally, the rendered HTML files of the session results.

The HTML outputs are significantly smaller than the raw .pyisession outputs (on the order of tens to hundreds of kB rather than hundreds to thousands of MB).

Profiling captures additional information on top of the profiling output

The profiling runs are now set up to output a .stats.json file upon completion, which captures information from the profiling session as well as additional information about the simulation itself. The output is a plain JSON file and can be parsed as such. Currently we are capturing (an illustrative sketch of how these values might be gathered follows the list):

  • Start time of the profiling session
  • Duration of the profiling session
  • CPU time used by the profiled program
  • Information about the final state of the population DataFrame from the simulation:
    • Number of rows and columns
    • Size in MB
    • Number of times it was extended
  • Disk I/O stats:
    • Number of read/writes
    • Size (in MB) of read/writes
    • Time spent reading/writing from disk
  • The name of the HTML file, if one was produced
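
As a rough illustration of where such values might come from, the sketch below uses psutil, the process clock, and pandas. This is not the repository's actual implementation and the helper names are placeholders, although the dictionary keys mirror the sample output further down.

# Illustrative sketch only: one possible way to gather the statistics listed
# above; helper names are placeholders, not the functions in run_profiling.py.
import time

import pandas as pd
import psutil


def record_disk_statistics(before, after) -> dict:
    """Difference of two psutil.disk_io_counters() snapshots (times are reported in ms)."""
    return {
        "disk_reads": after.read_count - before.read_count,
        "disk_writes": after.write_count - before.write_count,
        "disk_read_MB": (after.read_bytes - before.read_bytes) / 1e6,
        "disk_write_MB": (after.write_bytes - before.write_bytes) / 1e6,
        "disk_read_s": (after.read_time - before.read_time) / 1e3,
        "disk_write_s": (after.write_time - before.write_time) / 1e3,
    }


def record_dataframe_statistics(df: pd.DataFrame) -> dict:
    """Summarise the final population DataFrame."""
    return {
        "pop_df_rows": len(df),
        "pop_df_cols": len(df.columns),
        "pop_df_mem_mb": df.memory_usage(index=True, deep=True).sum() / 1e6,
    }


disk_before = psutil.disk_io_counters()
start_wall, start_cpu = time.time(), time.process_time()
# ... run the profiled simulation here ...
stats = {
    "start_time": start_wall,
    "duration": time.time() - start_wall,
    "cpu_time": time.process_time() - start_cpu,
    **record_disk_statistics(disk_before, psutil.disk_io_counters()),
}
# Merge in the DataFrame summary (shown here on a toy population frame):
stats.update(record_dataframe_statistics(pd.DataFrame({"age": [0, 1, 2]})))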

There is scope to include further statistics of interest by adding extra fields to the dictionaries produced in the record_XXX_statistics functions.

The run_profiling.py script can also be passed the --html flag to additionally produce an HTML file containing the results of the profiling session (as rendered by pyinstrument), if we so choose.
run_profiling.py can also be called with an --additional-stats flag, which passes shell or workflow variables to the program so that they are included in the .stats.json output. For example,

python run_profiling.py --additional-stats foo="bar bar" "fave number"=5

will result in the following key/value pairs appearing in the output:

"foo": "bar bar",
"fave number": "5",

Sample output:

Run on commit 3f3507a23483fa5e373d9b518968aefb8c84cbff, manually passing the sha and trigger keys.

tox -vv -e profile -- --additional-stats \
          sha=3f3507a23483fa5e373d9b518968aefb8c84cbff \
          trigger="manual trigger"

producing the .stats.json file:

{
  "sha": "3f3507a23483fa5e373d9b518968aefb8c84cbff",
  "trigger": "manual test",
  "html_output": "manual_test_0_3f3507a23483fa5e373d9b518968aefb8c84cbff.html",
  "start_time": 1695219849.2109582,
  "duration": 425.1599588394165,
  "cpu_time": 424.63759634,
  "pop_df_rows": 51000,
  "pop_df_cols": 448,
  "pop_df_mem_mb": 107.226341,
  "pop_df_times_extended": 1,
  "disk_reads": 96,
  "disk_writes": 6097,
  "disk_read_MB": 5.726208,
  "disk_write_MB": 441.458688,
  "disk_read_s": 0.031,
  "disk_write_s": 2.343
}
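
Since the output is plain JSON it can be inspected with the standard library alone; for example (the file name below is hypothetical, the real name depends on the run):

import json

# Load a .stats.json file produced by a profiling run (hypothetical file name).
with open("manual_test_0_3f3507a.stats.json") as f:
    stats = json.load(f)

print(
    f"Wall time {stats['duration']:.1f}s, CPU time {stats['cpu_time']:.1f}s; "
    f"population DataFrame ended at {stats['pop_df_rows']} rows "
    f"x {stats['pop_df_cols']} columns ({stats['pop_df_mem_mb']:.1f} MB)."
)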

willGraham01 force-pushed the wgraham/capture-profiling-stats branch 2 times, most recently from 65bcd66 to a9b3c55, on September 15, 2023 15:21
willGraham01 changed the base branch from master to wgraham/profiling-trigger-on-comment on September 18, 2023 08:59
willGraham01 changed the base branch from wgraham/profiling-trigger-on-comment to master on September 18, 2023 08:59
willGraham01 marked this pull request as ready for review on September 18, 2023 12:37
willGraham01 force-pushed the wgraham/capture-profiling-stats branch 3 times, most recently from 274a55a to 3f3507a, on September 20, 2023 14:04
willGraham01 force-pushed the wgraham/capture-profiling-stats branch from 177b3a4 to 72d9cd6 on September 21, 2023 07:52
Resolve conflicts in requirements/dev.in and
regenerate requirements/dev.txt using pip-compile
matt-graham (Collaborator) commented Nov 20, 2023

I've updated the branch to change the default values of the profiling run parameters to 5 years and a 50k initial population in mode 2, and the arguments to scale_run are now saved to the profiling results directory as a JSON file along with the actual simulation outputs (by default only the log output, with the logging level set to WARNING to keep this minimal). I've also done a bit of refactoring to make the run_profiling script more flexible to use at the command line by exposing some of the key simulation parameters (e.g. simulation period, initial population), and added back an option (disabled by default) to save the raw .pyisession output when running locally. From my perspective, assuming tests pass, this is now good to go in. @tamuri tagging you in case you want to review, given I've made a series of changes on top of Will's; if not, we can merge this once tests pass.
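
As a minimal sketch of the "arguments saved as JSON" part (argument names, defaults and the output file name here are placeholders, not necessarily those used on the branch, though the 5-year / 50k defaults match the comment above):

# Sketch only: dump the parsed scale_run arguments to the results directory.
import argparse
import json
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--years", type=int, default=5)
parser.add_argument("--initial-population", type=int, default=50_000)
parser.add_argument("--output-dir", type=Path, default=Path("profiling_results"))
args = parser.parse_args()

args.output_dir.mkdir(parents=True, exist_ok=True)
with open(args.output_dir / "scale_run_arguments.json", "w") as f:
    # Cast values to str so non-serialisable types (e.g. Path) round-trip cleanly.
    json.dump({k: str(v) for k, v in vars(args).items()}, f, indent=2)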

matt-graham merged commit e7cb080 into master on Nov 27, 2023
matt-graham deleted the wgraham/capture-profiling-stats branch on November 27, 2023 11:59