Coordinate Sampler Thread across a Checkpoint/Restore #18541

Merged: 2 commits, Dec 13, 2023

Conversation


@dsouzai dsouzai commented Nov 30, 2023

  • Coordinate the Sampler Thread across a Checkpoint/Restore
  • Reset the Start and Elapsed Time in the Restore Hook

Part of #16853

@dsouzai added the comp:jit and criu (used to track CRIU snapshot related work) labels Nov 30, 2023
@dsouzai dsouzai mentioned this pull request Nov 30, 2023
30 tasks

dsouzai commented Nov 30, 2023

@mpirvu could you please review?

@mpirvu mpirvu self-assigned this Nov 30, 2023

dsouzai commented Dec 8, 2023

We will need to update the documentation regarding -XsamplingExpirationTime, but because the CRIU feature hasn't GA'd yet, it should be fairly straightforward. Given that 0.43 has already split, this is something we'll need to do for 0.44.

@mpirvu good for review again. Testing shows everything behaves as expected:

#CHECKPOINT RESTORE: Preparing for checkpoint
#CHECKPOINT RESTORE: Preparing to compile methods for checkpoint
...
#CHECKPOINT RESTORE: Done compiling methods for checkpoint
#CHECKPOINT RESTORE: Preparing to suspend threads for checkpoint
#CHECKPOINT RESTORE: Finished suspending threads for checkpoint
#CHECKPOINT RESTORE: Suspending Sampler Thread for Checkpoint
#CHECKPOINT RESTORE: Ready for checkpoint
#CHECKPOINT RESTORE: Preparing for restore
#CHECKPOINT RESTORE: Start and elapsed time: startTime=1245049424, elapsedTime=  2740
#CHECKPOINT RESTORE: Reset start and elapsed time: startTime=1245074324, elapsedTime=  2740
#CHECKPOINT RESTORE: Resuming Sampler Thread from Checkpoint
#CHECKPOINT RESTORE: Resetting Sampling Thread Lifetime State
#CHECKPOINT RESTORE: Ready for restore
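
In the log above, `startTime` moves forward across the restore while `elapsedTime` stays at 2740. A minimal sketch of one way such a reset could be done, assuming the start time is simply re-derived from the current clock and the preserved elapsed time. The struct, the function name, and the millisecond unit are hypothetical and are not taken from the OpenJ9 sources:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the two values printed in the log; the real
// JIT data structures are not shown in this conversation.
struct SamplerClock {
    uint64_t startTime;    // clock reading at which counting began (assumed ms)
    uint64_t elapsedTime;  // time accounted for so far (assumed ms)
};

// Re-anchor startTime after a restore so that (now - startTime) equals the
// elapsed time recorded before the checkpoint. The time spent checkpointed
// is therefore not counted as elapsed time.
inline void resetStartTime(SamplerClock &clock, uint64_t nowMs) {
    assert(nowMs >= clock.elapsedTime);  // guard against unsigned underflow
    clock.startTime = nowMs - clock.elapsedTime;
}
```

Under this assumption, a clock of `{1245049424, 2740}` restored at a made-up current time of `1245077064` ends up with `startTime=1245074324` and an unchanged `elapsedTime`, which matches the shape of the two log lines.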


@mpirvu mpirvu left a comment

LGTM. I have a couple of comments inline.
Also, the checkpoint may end up waiting for an entire sampler period. If the checkpointing thread sends the interrupt while the sampler is not sleeping, the sampler finishes the current pass through its loop and then goes to sleep; it only picks up the checkpoint request once that sleep is over. Given that the sleep time is ~10 ms, this should not be a big problem.
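
To make the timing concern concrete, here is a rough model of the behaviour described above. It is only a sketch: `SamplerModel` and its members are hypothetical names, and a condition variable stands in for whatever interrupt mechanism the sampler thread actually uses.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of a sampler loop; not the OpenJ9 implementation.
class SamplerModel {
public:
    explicit SamplerModel(std::chrono::milliseconds period) : period_(period) {}

    // Sampler thread body: work, sleep one period, then look for a request.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!checkpointRequested_) {
            lock.unlock();
            std::this_thread::yield();  // stand-in for sampling work
            lock.lock();
            // No predicate here: a notify sent while the sampler was working
            // is lost, so the request is seen only after this sleep times out.
            wake_.wait_for(lock, period_);
        }
        suspended_ = true;  // sampler acknowledges the checkpoint request
        acked_.notify_all();
    }

    // Checkpointing thread: request suspension and wait for the acknowledgement.
    void requestCheckpointAndWait() {
        std::unique_lock<std::mutex> lock(mutex_);
        checkpointRequested_ = true;
        wake_.notify_one();  // cuts the sleep short only if the sampler is sleeping
        acked_.wait(lock, [this] { return suspended_; });
    }

    bool suspended() {
        std::lock_guard<std::mutex> lock(mutex_);
        return suspended_;
    }

private:
    std::chrono::milliseconds period_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable acked_;
    bool checkpointRequested_ = false;
    bool suspended_ = false;
};
```

In this model the wait in `requestCheckpointAndWait` is bounded by roughly one sampler period plus the remaining work in the current loop pass, which is the worst case the review comment refers to.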

Inline comments:
  • doc/compiler/control/OptionsPostRestore.md (outdated, resolved)
  • runtime/compiler/control/HookedByTheJit.cpp (resolved)


mpirvu commented Dec 12, 2023

jenkins test sanity all jdk17


dsouzai commented Dec 12, 2023

I believe AIX failure is #8625


dsouzai commented Dec 13, 2023

jenkins test sanity aix jdk17

@mpirvu mpirvu merged commit c51c0a0 into eclipse-openj9:master Dec 13, 2023
@dsouzai dsouzai deleted the coordsampler branch April 3, 2024 13:55
Labels
comp:jit criu Used to track CRIU snapshot related work
2 participants