Capture profiling statistics #1110

Merged: 48 commits merged into master from wgraham/capture-profiling-stats on Nov 27, 2023
Conversation

willGraham01 (Collaborator) commented Sep 15, 2023

Almost closes issue #686 - we still need to decide on the parameters to pass to the scale_run script itself, and to capture the extra statistics listed in that issue.

.pyisession files are no longer produced

This alleviates the issue of the file sizes of the .pyisession outputs. They are no longer pushed to the profiling repository - instead we push the .stats.json files (discussed below) and, optionally, the rendered HTML files of the session results.

The HTML outputs are significantly smaller than the raw .pyisession outputs (on the order of tens to hundreds of kB rather than hundreds to thousands of MB).

Profiling captures additional information on top of the profiling output

The profiling runs are now set up to output a .stats.json file upon completion, which captures information from the profiling session as well as additional information about the simulation itself. The output is a plain JSON file and can be parsed as such. Currently we are capturing (an illustrative sketch of how these values might be gathered follows the list):

  • Start time of the profiling session
  • Duration of the profiling session
  • CPU time used by the profiled program
  • Information about the final state of the population DataFrame from the simulation:
    • Number of rows and columns
    • Size in MB
    • Number of times it was extended
  • Disk I/O stats:
    • Number of read/writes
    • Size (in MB) of read/writes
    • Time spent reading/writing from disk
  • The name of the HTML file, if one was produced
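
As a rough illustration of where such values might come from, the sketch below uses psutil, the process clock, and pandas. This is not the repository's actual implementation and the helper names are placeholders, although the dictionary keys mirror the sample output further down.

# Illustrative sketch only: one possible way to gather the statistics listed
# above; helper names are placeholders, not the functions in run_profiling.py.
import time

import pandas as pd
import psutil


def record_disk_statistics(before, after) -> dict:
    """Difference of two psutil.disk_io_counters() snapshots (times are reported in ms)."""
    return {
        "disk_reads": after.read_count - before.read_count,
        "disk_writes": after.write_count - before.write_count,
        "disk_read_MB": (after.read_bytes - before.read_bytes) / 1e6,
        "disk_write_MB": (after.write_bytes - before.write_bytes) / 1e6,
        "disk_read_s": (after.read_time - before.read_time) / 1e3,
        "disk_write_s": (after.write_time - before.write_time) / 1e3,
    }


def record_dataframe_statistics(df: pd.DataFrame) -> dict:
    """Summarise the final population DataFrame."""
    return {
        "pop_df_rows": len(df),
        "pop_df_cols": len(df.columns),
        "pop_df_mem_mb": df.memory_usage(index=True, deep=True).sum() / 1e6,
    }


disk_before = psutil.disk_io_counters()
start_wall, start_cpu = time.time(), time.process_time()
# ... run the profiled simulation here ...
stats = {
    "start_time": start_wall,
    "duration": time.time() - start_wall,
    "cpu_time": time.process_time() - start_cpu,
    **record_disk_statistics(disk_before, psutil.disk_io_counters()),
}
# Merge in the DataFrame summary (shown here on a toy population frame):
stats.update(record_dataframe_statistics(pd.DataFrame({"age": [0, 1, 2]})))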

There is scope to include further statistics of interest by adding extra fields to the dictionaries produced in the record_XXX_statistics functions.

The run_profiling.py script can also be passed the --html flag to additionally produce an HTML file containing the results of the profiling session (as rendered by pyinstrument), if we so choose.
run_profiling.py can also be called with an --additional-stats flag, which passes shell or workflow variables to the program so that they are included in the .stats.json output. For example,

python run_profiling.py --additional-stats foo="bar bar" "fave number"=5

will result in the following key/value pairs appearing in the output:

"foo": "bar bar",
"fave number": "5",

Sample output:

Run on commit 3f3507a23483fa5e373d9b518968aefb8c84cbff, manually passing the sha and trigger keys.

tox -vv -e profile -- --additional-stats \
          sha=3f3507a23483fa5e373d9b518968aefb8c84cbff \
          trigger="manual trigger"

producing the .stats.json file:

{
  "sha": "3f3507a23483fa5e373d9b518968aefb8c84cbff",
  "trigger": "manual test",
  "html_output": "manual_test_0_3f3507a23483fa5e373d9b518968aefb8c84cbff.html",
  "start_time": 1695219849.2109582,
  "duration": 425.1599588394165,
  "cpu_time": 424.63759634,
  "pop_df_rows": 51000,
  "pop_df_cols": 448,
  "pop_df_mem_mb": 107.226341,
  "pop_df_times_extended": 1,
  "disk_reads": 96,
  "disk_writes": 6097,
  "disk_read_MB": 5.726208,
  "disk_write_MB": 441.458688,
  "disk_read_s": 0.031,
  "disk_write_s": 2.343
}
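
Since the output is plain JSON it can be inspected with the standard library alone; for example (the file name below is hypothetical, the real name depends on the run):

import json

# Load a .stats.json file produced by a profiling run (hypothetical file name).
with open("manual_test_0_3f3507a.stats.json") as f:
    stats = json.load(f)

print(
    f"Wall time {stats['duration']:.1f}s, CPU time {stats['cpu_time']:.1f}s; "
    f"population DataFrame ended at {stats['pop_df_rows']} rows "
    f"x {stats['pop_df_cols']} columns ({stats['pop_df_mem_mb']:.1f} MB)."
)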

willGraham01 force-pushed the wgraham/capture-profiling-stats branch 2 times, most recently from 65bcd66 to a9b3c55, on September 15, 2023 15:21
willGraham01 changed the base branch from master to wgraham/profiling-trigger-on-comment on September 18, 2023 08:59
willGraham01 changed the base branch from wgraham/profiling-trigger-on-comment to master on September 18, 2023 08:59
willGraham01 marked this pull request as ready for review on September 18, 2023 12:37
willGraham01 force-pushed the wgraham/capture-profiling-stats branch 3 times, most recently from 274a55a to 3f3507a, on September 20, 2023 14:04
willGraham01 force-pushed the wgraham/capture-profiling-stats branch from 177b3a4 to 72d9cd6 on September 21, 2023 07:52
Resolve conflicts in requirements/dev.in and
regenerate requirements/dev.txt using pip-compile
matt-graham (Collaborator) commented Nov 20, 2023

I've updated the branch to change the default values of the profiling run parameters to 5 years and a 50k initial population in mode 2, and the arguments to scale_run are now saved to the profiling results directory as a JSON file along with the actual simulation outputs (by default only the log output, with the logging level set to WARNING to keep this minimal). I've also done a bit of refactoring to make the run_profiling script more flexible to use at the command line by exposing some of the key simulation parameters (e.g. simulation period, initial population), and added back an option (disabled by default) to save the raw .pyisession output when running locally. From my perspective, assuming tests pass, this is now good to go in. @tamuri tagging you in case you want to review, given I've made a series of changes on top of Will's; if not, we can merge this once tests pass.
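
As a minimal sketch of the "arguments saved as JSON" part (argument names, defaults and the output file name here are placeholders, not necessarily those used on the branch, though the 5-year / 50k defaults match the comment above):

# Sketch only: dump the parsed scale_run arguments to the results directory.
import argparse
import json
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--years", type=int, default=5)
parser.add_argument("--initial-population", type=int, default=50_000)
parser.add_argument("--output-dir", type=Path, default=Path("profiling_results"))
args = parser.parse_args()

args.output_dir.mkdir(parents=True, exist_ok=True)
with open(args.output_dir / "scale_run_arguments.json", "w") as f:
    # Cast values to str so non-serialisable types (e.g. Path) round-trip cleanly.
    json.dump({k: str(v) for k, v in vars(args).items()}, f, indent=2)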

matt-graham merged commit e7cb080 into master on Nov 27, 2023
matt-graham deleted the wgraham/capture-profiling-stats branch on November 27, 2023 11:59