
Inconsistent performance @ production #498

@MTCam

This is an issue-in-the-making. First, the automated timings on Lassen are catching some inconsistent results:

[Screenshot, 2021-08-30: automated timing results on Lassen]

Note that starting around last Friday, the timing results began to vary considerably between runs, which is not normal for this code.

Update: The issue seems to have been resolved by switching to the batch queue, which suggests that the problem lay with bad nodes or bad devices in the debug queue.

The issue does not appear to be tied to any particular Lassen node; spikes were observed on both lassen34 and lassen36 in the debug queue.

  • program capture (@inducer)
    • Added stdout capture to timing data
    • arraycontext branch to capture the pytato program
    • TODO: capture the pytato program during timing runs
  • code history checks
    • MIRGE-Com-level development does not appear to be the cause: the spikes were also observed with historical versions of MIRGE-Com
    • TODO: development in the sub-packages still needs to be checked
  • TODO: create a small example to see whether the issue can be reproduced quickly (turnaround time is about 30 minutes for the nozzle-proper).
