Trigger GC when code cache is exhausted #21059

Open
tajila opened this issue Feb 3, 2025 · 11 comments

Comments

@tajila
Contributor

tajila commented Feb 3, 2025

We have seen cases in customer workloads where apps that generate a lot of classes cause code cache exhaustion. When this occurs, no new compilations can succeed, which may degrade application performance.

One way to work around this is to increase the size of the code cache, but this may not be the best approach.

An alternative could be to trigger a class unloading GC when the code cache is exhausted. There are a few ways of going about this:

  1. JIT sets a flag once the last code cache is allocated. The GC then does class-unloading on the next GC.

  2. GC monitors the amount of free memory in the code cache segments; once it is below a threshold, class unloading is triggered.
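
As a rough illustration of option 2, the check could look something like the sketch below. It is not OpenJ9 code: `SegmentSketch`, `freeFraction`, `shouldTriggerClassUnloading`, and the 10% default threshold are all hypothetical names and values chosen for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical view of one code cache segment (not an OpenJ9 type).
struct SegmentSketch {
    std::size_t sizeBytes;
    std::size_t freeBytes;
};

// Fraction of the total code cache capacity that is still free, in [0, 1].
// Returns 1.0 when there are no segments, so an empty list never triggers.
inline double freeFraction(const std::vector<SegmentSketch>& segments) {
    std::size_t totalBytes = 0;
    std::size_t totalFree = 0;
    for (const SegmentSketch& s : segments) {
        totalBytes += s.sizeBytes;
        totalFree += s.freeBytes;
    }
    if (totalBytes == 0) {
        return 1.0;
    }
    return static_cast<double>(totalFree) / static_cast<double>(totalBytes);
}

// Option 2: request class unloading once free space drops below the threshold.
// The default threshold is an assumption made for this sketch only.
inline bool shouldTriggerClassUnloading(const std::vector<SegmentSketch>& segments,
                                        double thresholdFraction = 0.10) {
    return freeFraction(segments) < thresholdFraction;
}
```

Option 1 would replace the free-space computation with a single flag owned by the JIT, which keeps the GC side simpler at the cost of a coarser signal.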

@tajila
Contributor Author

tajila commented Feb 3, 2025

@mpirvu
Contributor

mpirvu commented Feb 3, 2025

J9CodeCacheManager has this method:

   /**
    * @brief Answers whether the total code cache capacity is nearly used up.
    *        This code is executed by compilation thread and by application
    *        threads when trying to induce a profiling compilation. Acquires
    *        codeCacheList.mutex.
    *
    * @return true if capacity nearly reached; false otherwise.
    */
   bool almostOutOfCodeCache();

Its main purpose was to avoid profiling compilations when the code cache is running very low, because we don't want to be stuck in profiling mode forever.
This flag is set once and never reset.
It is set when all code caches have already been allocated and none of them has a "hole" bigger than 256 KB.

I can change the code to add a similar flag in J9JITConfig. GC can check this flag in the code that decides whether class unloading should be performed.
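
To make the condition concrete, here is a minimal model of the sticky flag and of a GC-side check. It is a sketch under assumptions, not the real `J9CodeCacheManager` or `J9JITConfig`: `CodeCacheStateSketch`, `addCache`, `setLargestHole`, and `gcShouldUnloadClasses` are hypothetical, and the mutex mentioned in the doc comment is omitted for brevity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the condition described above; not the OpenJ9 class.
class CodeCacheStateSketch {
public:
    // "Hole" size taken from the description above (256 KB).
    static constexpr std::size_t kMinHoleBytes = 256 * 1024;

    explicit CodeCacheStateSketch(std::size_t maxCaches) : _maxCaches(maxCaches) {}

    // Record one allocated code cache by the size of its largest free hole.
    void addCache(std::size_t largestHoleBytes) { _largestHoles.push_back(largestHoleBytes); }

    // Update the largest free hole of an existing cache.
    void setLargestHole(std::size_t index, std::size_t bytes) { _largestHoles.at(index) = bytes; }

    // True once all caches are allocated and none has a hole > 256 KB.
    // The result is sticky: once set, it is never reset.
    bool almostOutOfCodeCache() {
        if (_lowFreeSpace) {
            return true;
        }
        if (_largestHoles.size() < _maxCaches) {
            return false; // more code caches can still be allocated
        }
        for (std::size_t hole : _largestHoles) {
            if (hole > kMinHoleBytes) {
                return false; // at least one cache still has a usable hole
            }
        }
        _lowFreeSpace = true;
        return true;
    }

private:
    std::size_t _maxCaches;
    std::vector<std::size_t> _largestHoles;
    bool _lowFreeSpace = false;
};

// Hypothetical GC-side check: unload classes if the normal policy asks for it
// or if the JIT has reported low code cache space.
inline bool gcShouldUnloadClasses(bool policyWantsUnload, bool lowCodeCacheFlag) {
    return policyWantsUnload || lowCodeCacheFlag;
}
```

Because the flag is sticky in this model, the GC would keep seeing it even after a successful class unloading.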

@dmitripivkine
Contributor

There is a dilemma:

  • Aggressive Global GC can be issued immediately
  • Alternatively, set a Class Unloading event to trigger Concurrent GC Kickoff. This also forces Class Unloading for the next GC cycle, so every Global GC will select Class Unloading and every Local GC will be percolated to a Global GC. The catch with this option is that there is a time interval between the Concurrent GC Kickoff and the actual STW Global GC cycle. This interval can be long, so code cache exhaustion cannot be cleared immediately.
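
The percolation behavior in the second option can be sketched as a small decision function. This is only an illustration: `GcKind`, `GcDecision`, `decideCollection`, and the `codeCacheLow` parameter are hypothetical, and the concurrent kickoff (and therefore the delay before the STW cycle) is not modeled.

```cpp
// Hypothetical collection kinds; the real GC has more states than this.
enum class GcKind { Local, Global };

struct GcDecision {
    GcKind kind;
    bool classUnloading;
};

// While the code cache is reported low, every Local GC percolates to a
// Global GC and every Global GC selects class unloading. Otherwise the
// requested kind is kept and class unloading follows the default policy.
inline GcDecision decideCollection(GcKind requested,
                                   bool defaultClassUnloading,
                                   bool codeCacheLow) {
    if (codeCacheLow) {
        return GcDecision{GcKind::Global, true};
    }
    bool unload = (requested == GcKind::Global) && defaultClassUnloading;
    return GcDecision{requested, unload};
}
```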

@mpirvu
Contributor

mpirvu commented Feb 6, 2025

I analyzed a pair of verboseLog+GClog. Between the moment the code cache manager declares that it's almost out of code cache space and the moment it declares that all code caches are full there are 24 seconds. Between these two points I see 4 scavenge operations taking place. If these scavenge operations are upgraded to GlobalGC+classUnloading, we might be able to clear some space in the code cache.

@mpirvu
Contributor

mpirvu commented Feb 7, 2025

I have created #21083 to record the lowCodeCacheFreeSpace situation in J9JITConfig.

@dmitripivkine
Contributor

> I have created #21083 to record the lowCodeCacheFreeSpace situation in J9JITConfig.

Great. I will add the GC part of the code as soon as this PR is merged.

@dmitripivkine
Contributor

Is it right conceptually that once lowCodeCacheFreeSpace is set, it is never reset? That means every Local GC is going to percolate to a Global GC from the moment it is set. Should it be reset if class unloading helped and the low code cache free space condition is no longer true?

@mpirvu
Contributor

mpirvu commented Feb 7, 2025

I think that for this kind of application the code caches are going to become close to full again very soon, so doing only GlobalGC might not be a bad idea. We'll have to test it.
The scenario that could become problematic is when the application simply uses too many classes/methods and code caches become completely full. In that case GlobalGCs have no advantage.

@amicic
Contributor

amicic commented Feb 10, 2025

We have to be careful not to let Global GCs occur too often. Even the former case you describe is not much less dangerous than the latter, and it can potentially cause more performance degradation than suboptimally compiled code otherwise would.

Once we perform class unloading (based on which JIT should reset this flag), we could perhaps record how effective it was (in terms of the relative drop in code cache occupancy) and also take that into account when deciding when to raise the flag again. The easiest for GC, of course, is for JIT to keep that complexity to itself, but perhaps things other than code cache occupancy could be taken into account (for example, time, the number of local GCs, or the number of classes loaded since the last class unloading was performed).
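
One possible shape for such a policy is sketched below. It is only an illustration of the idea: `UnloadPolicySketch`, its method names, and the doubling back-off are assumptions made for the example, and it uses the number of local GCs as the only extra input.

```cpp
#include <cstddef>

// Hypothetical policy: after an ineffective class unloading, wait for more
// local GCs before the low-code-cache flag may be raised again.
class UnloadPolicySketch {
public:
    UnloadPolicySketch(double minRelativeDrop, std::size_t baseLocalGcInterval)
        : _minRelativeDrop(minRelativeDrop),
          _baseInterval(baseLocalGcInterval),
          _requiredLocalGcs(baseLocalGcInterval) {}

    // Call after a class-unloading GC with code cache occupancy (0..1)
    // measured before and after it. The flag is considered reset here.
    void recordUnload(double occupancyBefore, double occupancyAfter) {
        double drop = 0.0;
        if (occupancyBefore > 0.0) {
            drop = (occupancyBefore - occupancyAfter) / occupancyBefore;
        }
        if (drop >= _minRelativeDrop) {
            _requiredLocalGcs = _baseInterval; // effective: back to the base interval
        } else {
            _requiredLocalGcs *= 2;            // ineffective: back off
        }
        _unloadedBefore = true;
        _localGcsSinceUnload = 0;
    }

    void recordLocalGc() { ++_localGcsSinceUnload; }

    // The flag may be raised only if the code cache is low and enough local
    // GCs have passed since the last class unloading (or none happened yet).
    bool mayRaiseFlag(bool codeCacheLow) const {
        if (!codeCacheLow) {
            return false;
        }
        return !_unloadedBefore || _localGcsSinceUnload >= _requiredLocalGcs;
    }

private:
    double _minRelativeDrop;
    std::size_t _baseInterval;
    std::size_t _requiredLocalGcs;
    std::size_t _localGcsSinceUnload = 0;
    bool _unloadedBefore = false;
};
```

Time or the number of classes loaded since the last unloading could be added as further inputs in the same way.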

@mpirvu
Contributor

mpirvu commented Feb 10, 2025

I do plan to investigate the behavior of this application further if I can get my hands on it. The obvious goal is to determine whether unloading classes under the direction of the JIT is effective in freeing code cache space.
If I cannot get my hands on the application, the next best thing is to give the customer a private build. If it comes to that, could you deliver a change to trigger a global GC based on the JITConfig flag under an option (i.e. not as default behavior)? Thanks

@dmitripivkine
Contributor

dmitripivkine commented Feb 10, 2025

Do we have a GC verbose log? Do we know the frequency of Global GCs? Do we know the frequency of class unloading events?
Maybe there is a simpler way to investigate the problem.
For example, try -Xalwaysclassgc with Gencon or, ultimately, with Optthruput.
