_posts/2025-02-21-new-write-barriers.md
Lines changed: 16 additions & 16 deletions
@@ -13,8 +13,8 @@ Another sources with more information are the corresponding [draft JEP](https://
G1 is an incremental garbage collector. During garbage collection, a large part of the time can be spent trying to find references to live objects in the areas of the application heap (just *heap* in the following) to evacuate them. The simplest and probably slowest way would be to look through (scan) the entire heap that is not going to be evacuated for such references. The stop-the-world Hotspot collectors, Serial, Parallel and G1, in principle employ a fairly old technique called Card Marking [[Hölzle93](https://bibliography.selflanguage.org/_static/write-barrier.pdf)] to limit the area to scan for references during the garbage collection pause.
-
In its attempt to better keep pause times, G1 extended this card marking mechanism:
-
* concurrent to the application G1 re-examines (**refines**) the cards marked by the application and classifies them. This classification helps in the garbage collection pause to only need to scan cards important for that particular garbage collection.
+
In an attempt to better keep pause time goals, G1 extended this card marking mechanism:
+
* concurrent to the application, G1 re-examines (**refines**) the cards marked by the application and classifies them. This classification helps in the garbage collection pause to only need to scan cards important for that particular garbage collection.
* extra code compiled into the application (**write barriers**) removes unnecessary card marks, reducing the amount of cards to scan further.
This comes at additional cost as the next few sections will show.
@@ -29,10 +29,10 @@ When the application modifies an object reference, additional code compiled into
Figure 1 above shows an example execution of a hypothetical assignment of the field `a` in an object `x` of type `X` with a value of `y`. After writing the value into the field, the write barrier code (to be exact, *post write barrier* code, i.e. code added after setting the value) marks the card.
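The unconditional card mark described for Figure 1 can be sketched as follows. This is a minimal illustration, not HotSpot code: the 512-byte card size, the `CardTable` class and all of its method names are assumptions made for the example.

```python
CARD_SHIFT = 9  # assumption: each card covers 2**9 = 512 bytes of heap
CLEAN, DIRTY = 0, 1  # hypothetical encodings, chosen for readability


class CardTable:
    """One byte per card; a toy stand-in for the real card table."""

    def __init__(self, heap_base, heap_size):
        self.heap_base = heap_base
        num_cards = (heap_size + (1 << CARD_SHIFT) - 1) >> CARD_SHIFT
        self.cards = bytearray(num_cards)  # all entries start as CLEAN (0)

    def card_index(self, address):
        # Map a heap address to the index of the card that covers it.
        return (address - self.heap_base) >> CARD_SHIFT

    def post_write_barrier(self, field_address):
        # Runs after the reference store `x.a = y`: mark the card
        # covering the written field, without any filtering.
        self.cards[self.card_index(field_address)] = DIRTY

    def dirty_cards(self):
        # What a collector would have to scan in the pause.
        return [i for i, c in enumerate(self.cards) if c == DIRTY]


table = CardTable(heap_base=0x1000, heap_size=4096)
table.post_write_barrier(0x1000 + 520)  # two writes that fall into
table.post_write_barrier(0x1000 + 600)  # the same 512-byte card
print(table.dirty_cards())  # [1]
```

Repeated writes to the same card leave a single mark, which is why the number of unique cards, not the number of writes, determines the scanning work.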
-
This is where the Serial and Parallel garbage collectors stop at: they let the application accumulate card marks until garbage collection occurs. At that time all of the heap corresponding to the marked cards is scanned for references into the evacuated area. In most applications this is okay effort: the amount of unique cards that need to be scanned during garbage collection is very limited. In other applications scanning the heap corresponding to cards (**scanning the cards**) can take a very significant amount of total garbage collection time.
+
This is where the Serial and Parallel garbage collectors stop at: they let the application accumulate card marks until garbage collection occurs. At that time all of the heap corresponding to the marked cards is scanned for references into the evacuated area. In most applications this is okay effort: the number of unique cards that need to be scanned during garbage collection is very limited. In other applications, scanning the heap corresponding to cards (**scanning the cards**) can take a very significant amount of total garbage collection time.
G1 tries to reduce this amount of card scanning in the garbage collection pause by several means. The first is using extra garbage collection threads running concurrent to the application clearing, re-examining and classifying card marks because:
-
* references are often written over and over again between garbage collections. A card mark caused by a reference write may, by the time the next garbage collection occurs, not contain any interesting reference any more.
+
* references are often written over and over again between garbage collections. A card mark caused by a reference write may, by the time the next garbage collection occurs, not contain any interesting reference anymore.
* by classifying card marks according to where they originate from, it is possible to only scan marked cards that are relevant for this particular garbage collection during the garbage collection.
Figures 2, 3 and 4 give details about this re-examination (refinement) process.
@@ -41,13 +41,13 @@ Figure 2, 3 and 4 give details about this re-examination (refinement) process.
First, Figure 2 shows that in addition to the actual card mark mentioned above, the write barrier stores (**enqueues**) the card location (in this case `0xabc`) in an internal buffer (**refinement buffer**) shown in green so that the re-examination garbage collector threads (**refinement threads**) can later easily find them again.
-
There is an artificial delay based on available time in the pause for scanning cards and the rate the application generates new marked cards between card mark and refinement. This delay helps decreasing the overhead of an application repeatedly marking the same cards, avoiding that the same cards will be repeatedly enqueued for refinement (as they are already marked). The delay also increases the probability that the references in the card itself are not interesting any more.
+
There is an artificial delay based on available time in the pause for scanning cards and the rate the application generates new marked cards between card mark and refinement. This delay helps decreasing the overhead of an application repeatedly marking the same cards, avoiding that the same cards will be repeatedly enqueued for refinement (as they are already marked). The delay also increases the probability that the references in the card itself are not interesting anymore.
In the next step, shown in Figure 3, refinement threads will pick up previously enqueued cards for re-examination. In this case, the card at location `0xdef` will be refined. The figure also shows the **remembered sets** in light blue: for every area to evacuate, G1 stores the set of interesting card locations for this area. In this figure every area has such a remembered set attached to it, but there may not be one currently for some. Areas may also be discontiguous in the heap ([JDK-8343782](https://bugs.openjdk.org/browse/JDK-8343782) and [JDK-8336086](https://bugs.openjdk.org/browse/JDK-8336086)). The refinement thread also unmarks the card before looking at the corresponding contents of the Java heap.
Figure 4 finally shows the step where the refinement threads put the examined card (`0xdef`) into the remembered sets of the areas for which that card contains an interesting reference at this point in time. Since the heap covered by a card may contain multiple interesting references, multiple remembered sets may receive that particular card location.
+
Finally, Figure 4 shows the step where the refinement threads put the examined card (`0xdef`) into the remembered sets of the areas for which that card contains an interesting reference at this point in time. Since the heap covered by a card may contain multiple interesting references, multiple remembered sets may receive that particular card location.
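The refinement steps from Figures 2 to 4 (pick up an enqueued card, unmark it, then record it in the remembered set of every area it points into) could look roughly like this. It is a simplified sketch: `refs_in_card` replaces the actual heap scan, and `CARDS_PER_REGION` and the function names are hypothetical.

```python
from collections import defaultdict

CLEAN, DIRTY = 0, 1
CARDS_PER_REGION = 4  # hypothetical: tiny areas keep the example readable


def region_of_card(card):
    return card // CARDS_PER_REGION


def refine(card_table, refinement_buffer, refs_in_card, remembered_sets):
    """Drain the buffer of enqueued card indices, as a refinement thread might.

    refs_in_card maps a card index to the area numbers that the references
    currently stored in that card point to (a stand-in for scanning the heap).
    """
    while refinement_buffer:
        card = refinement_buffer.pop()
        # Unmark the card first, then look at the heap contents.
        card_table[card] = CLEAN
        source_region = region_of_card(card)
        for target_region in refs_in_card.get(card, ()):
            # Only references that cross areas are interesting.
            if target_region != source_region:
                remembered_sets[target_region].add(card)


card_table = bytearray(12)
card_table[5] = DIRTY
remembered_sets = defaultdict(set)
# Card 5 lies in area 1 and holds references into areas 0, 1 and 2.
refine(card_table, [5], {5: [0, 1, 2]}, remembered_sets)
print(dict(remembered_sets))  # {0: {5}, 2: {5}}
```

As in the text, one card can end up in several remembered sets, while the reference that stays within area 1 is ignored.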
{:style="display:block; margin-left:auto; margin-right:auto"}
@@ -75,14 +75,14 @@ This sounds straightforward, but unfortunately there is some complication that F
Additionally, concurrent refining of cards can be expensive, particularly if there are no extra processing resources available. So the G1 barrier contains extra filtering code to avoid card marks that are generated by reference writes that make no difference for garbage collection.
+
Additionally, concurrent refining of cards can be expensive, particularly if there are no extra processing resources available. So, the G1 barrier contains extra filtering code to avoid card marks that are generated by reference writes that make no difference for garbage collection.
The G1 write barrier will not mark a card if
-
* the reference assignment write a reference that is not interesting, not crossing areas.
+
* the reference assignment writes a reference that is not interesting, not crossing areas.
* the code assigns a `null` value: these do not generate links between objects, so the corresponding marked cards are unnecessary.
* the card is already marked, which means that it is already scheduled for refinement.
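The three filters listed above can be sketched as a function over plain integer addresses. This models the original barrier, which also enqueues the card as described for Figure 2. The 1 MiB area size, the zero-based heap and the function name are assumptions for illustration only.

```python
REGION_SHIFT = 20  # hypothetical: 1 MiB areas
CARD_SHIFT = 9     # assumption: 512-byte cards
CLEAN, DIRTY = 0, 1
NULL = 0


def g1_post_write_barrier(card_table, refinement_buffer, field_address, new_value):
    """Return True if the card was marked, False if a filter dropped the write."""
    # Filter 1: field and referenced object are in the same area,
    # so the reference does not cross areas and is not interesting.
    if (field_address ^ new_value) >> REGION_SHIFT == 0:
        return False
    # Filter 2: a null store creates no link between objects.
    if new_value == NULL:
        return False
    # Filter 3: the card is already marked and scheduled for refinement.
    card = field_address >> CARD_SHIFT
    if card_table[card] != CLEAN:
        return False
    card_table[card] = DIRTY
    refinement_buffer.append(card)  # remember the card location for refinement
    return True


card_table = bytearray((4 << REGION_SHIFT) >> CARD_SHIFT)  # toy 4 MiB heap
buffer = []
g1_post_write_barrier(card_table, buffer, 0x100040, 0x200000)  # marks the card
g1_post_write_barrier(card_table, buffer, 0x100040, 0x200000)  # filtered out
```

Only the first cross-area write reaches the card table and the buffer; every later write to the same card stops at the third filter.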
-
Figure 6 compares the sizes of the resulting, directly inlined part of the G1 write barrier (there is an additional part not shown here which is executed somewhat rarely) on the left with the whole Serial and Parallel GC write barrier on the right ([[Protopopovs23](https://ssw.jku.at/Teaching/MasterTheses/Protopopovs/Thesis.pdf)]).
+
Figure 6 compares the sizes of the resulting write barriers, directly inlined part of the G1 write barrier (there is an additional part not shown here which is executed somewhat rarely) on the left with the whole Serial and Parallel GC write barrier on the right ([[Protopopovs23](https://ssw.jku.at/Teaching/MasterTheses/Protopopovs/Thesis.pdf)]).
@@ -139,7 +139,7 @@ The final current post write barrier for G1 reduces to the filters and the actua
Lines (1) to (3) implement the filters. They are almost the same as before, with a slightly different condition for the check due to a memory optimization.
-
Without the filters, there were some regressions compared to the original write barrier with the filters; the filters also decrease the number of cards that have not been scanned during the garbage collection pause, and the amount of cards to be re-examined, so they were kept for now.
+
Without the filters, there were some regressions compared to the original write barrier with the filters; the filters also decrease the number of cards that have not been scanned during the garbage collection pause, and the number of cards to be re-examined, so they were kept for now.
Line (5) actually marks the card with a "Dirty" color value.
@@ -148,7 +148,7 @@ The original card marking paper uses two different values for card table entries
* **clean** - the card does not contain an interesting reference.
* **dirty** - the card may contain an interesting reference.
* **already-scanned** - used during garbage collection to indicate that this card has already been scanned.
-
* **to-collection-set** - the card may contain an interesting reference to the heap areas that are going to be collected in the next garbage collection (the **collection set**, hence the name). This collection set always contains the young generation.
+
* **to-collection-set** - the card may contain an interesting reference to the areas of the heap that are going to be collected in the next garbage collection (the **collection set**, hence the name). This collection set always contains the young generation.
Refinement can skip scanning these cards because they will always be scanned during garbage collection, as G1 always collects the young generation. Adding these cards to the remembered sets is not needed; it would be duplicate information.
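A small sketch of how the four card colors could drive the two consumers of the card table. The numeric values and function names are hypothetical and say nothing about the actual encoding in G1; only the four colors and the skip rule come from the text.

```python
from enum import IntEnum


class CardColor(IntEnum):
    # Hypothetical numeric values; only the four colors come from the text.
    CLEAN = 0
    DIRTY = 1
    ALREADY_SCANNED = 2
    TO_COLLECTION_SET = 3


def cards_to_refine(card_table):
    # Refinement only re-examines dirty cards; to-collection-set cards are
    # skipped because the pause scans them anyway.
    return [i for i, c in enumerate(card_table) if c == CardColor.DIRTY]


def cards_to_scan_in_pause(card_table):
    # The pause must look at everything that may hold an interesting
    # reference: not-yet-refined dirty cards and to-collection-set cards.
    interesting = (CardColor.DIRTY, CardColor.TO_COLLECTION_SET)
    return [i for i, c in enumerate(card_table) if c in interesting]


table = [
    CardColor.CLEAN,
    CardColor.DIRTY,
    CardColor.TO_COLLECTION_SET,
    CardColor.CLEAN,
    CardColor.DIRTY,
]
print(cards_to_refine(table))         # [1, 4]
print(cards_to_scan_in_pause(table))  # [1, 2, 4]
```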
@@ -161,7 +161,7 @@ The last two card colors are new. The use of the to-collection-set color explain
The goal of the card table switching process is to make sure that all threads in the system agree that the refinement table is now the application card table and the other way around, to avoid the problematic situation described earlier in Figure 5.
-
The process is initiated by a special background thread, the refinement control thread. It regularly estimates whether the currently estimated amount of cards at the start of the next garbage collection would exceed the allowed number of not re-examined cards given card examination rate. If a refinement round is necessary, it also calculates the number of refinement worker threads, which do the actual work, needed to complete before the garbage collection.
+
The process is initiated by a special background thread, the refinement control thread. It regularly estimates whether the currently estimated number of cards at the start of the next garbage collection would exceed the allowed number of not re-examined cards given card examination rate. If a refinement round is necessary, it also calculates the number of refinement worker threads, which do the actual work, needed to complete before the garbage collection.
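The decision made by the refinement control thread can be illustrated with a deliberately simple model. The linear prediction, the parameter names and the thread-count formula are all assumptions for this sketch; the text does not specify the actual heuristics.

```python
import math


def plan_refinement(pending_cards, dirtying_rate, time_to_gc,
                    allowed_cards, refine_rate_per_thread):
    """Return how many refinement worker threads to start (0 = no round needed).

    All inputs are hypothetical estimates: cards currently marked, cards
    marked per second by the application, seconds until the next garbage
    collection, the number of unrefined cards the pause can afford, and
    cards refined per second by one worker thread (must be positive).
    """
    predicted = pending_cards + dirtying_rate * time_to_gc
    excess = predicted - allowed_cards
    if excess <= 0 or time_to_gc <= 0:
        return 0
    cards_per_thread = refine_rate_per_thread * time_to_gc
    return math.ceil(excess / cards_per_thread)


print(plan_refinement(1000, 100, 10, 1500, 25))  # 2
print(plan_refinement(1000, 10, 10, 1500, 25))   # 0
```

In the first call 2000 cards are predicted against a budget of 1500, and each worker can refine 250 cards before the pause, so two workers are needed. In the second call the prediction stays under the budget and no round is started.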
This refinement round consists of
@@ -192,7 +192,7 @@ Any of these steps may be interrupted by a safepoint, which may be a garbage col
Refinement heuristics try to avoid having garbage collection interrupt refinement. In this case, the refinement table is all unmarked at the start of the garbage collection, and all the not-yet examined marked cards are on the main card table where the following card table scan phase expects them. No further action except putting the remembered sets of areas to be collected on the main card table must be taken to be able to search for marked cards efficiently on the card table.
-
Previously G1 had information about the location of all marked cards, they were either in the remembered sets, or in the refinement buffers to refine cards. Based on this, G1 could create a more detailed map of where marked cards were located, and only search those areas for marked cards instead of searching the whole card table. However searching for marked cards is linear access to a relatively little area of memory, so very fast.
+
Previously G1 had information about the location of all marked cards, they were either in the remembered sets, or in the refinement buffers to refine cards. Based on this, G1 could create a more detailed map of where marked cards were located, and only search those areas for marked cards instead of searching the whole card table. However, searching for marked cards is linear access to a relatively little area of memory, so very fast.
The absence of more precise location information for marked cards is also offset by not needing to calculate this information.
@@ -205,7 +205,7 @@ In this case the G1 garbage collector there is a new `Merge Refinement Table` ph
1. (Optionally) **Snapshot the heap** as above, if the refinement had been interrupted in phase 1 of the process.
-
1. **Merge the refinement table** into the card table. This steps combines card marks from both card tables into the main card table. This is a logical or of both cards. All marks on the refinement table are removed.
+
1. **Merge the refinement table** into the card table. This step combines card marks from both card tables into the main card table. This is a logical or of both cards. All marks on the refinement table are removed.
1. **Calculate statistics** as above.
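The merge step in the list above amounts to an element-wise or followed by clearing the refinement table. A minimal sketch, assuming marks are encoded as 0 (unmarked) and 1 (marked), which is a simplification chosen for this example:

```python
UNMARKED = 0  # assumption: 0 = unmarked, 1 = marked


def merge_refinement_table(card_table, refinement_table):
    """Or every refinement-table mark into the main card table, then clear it."""
    for i, mark in enumerate(refinement_table):
        card_table[i] |= mark
        refinement_table[i] = UNMARKED


card_table = bytearray([0, 1, 0, 1])
refinement_table = bytearray([1, 1, 0, 0])
merge_refinement_table(card_table, refinement_table)
print(list(card_table))        # [1, 1, 0, 1]
print(list(refinement_table))  # [0, 0, 0, 0]
```

After the merge, all marks live on the main card table and the refinement table is completely unmarked, which is the state the garbage collection expects.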
The reason why the refinement table needs to be completely unmarked at the start of the garbage collection is that G1 uses it to collect card marks containing interesting references for objects evacuated during the garbage collection in the heap areas the objects are evacuated to. This is similar to previously used extra refinement buffers to store those.
@@ -238,7 +238,7 @@ The optimization to color these remembered set entries specially keeps duplicate
In some applications these memory reductions completely offset the additional card table memory usage, but this is fairly rare. Particularly applications that did not have large remembered sets for the young generation, which are mostly very throughput-oriented applications, show the above mentioned additional memory usage.
-
The refinement table is only required if the applications needs to do any refinement. So the refinement table could be allocated lazily, i.e. only if there is some refinement. There is a large overlap between such applications and above very throughput-oriented applications. This is not implemented in the current version.
+
The refinement table is only required if the application needs to do any refinement. So, the refinement table could be allocated lazily, i.e. only if there is some refinement. There is a large overlap between such applications and above very throughput-oriented applications. This is not implemented in the current version.
### Latency, Pause Times
@@ -252,7 +252,7 @@ The cards created during garbage collection do not need to be redirtied, so that
This change removes the need for a large part of G1's write barrier using a dual card table approach to avoid fine-grained synchronization, increasing throughput significantly for applications.
-
Overall I'm quite satisfied with the change - after many years thinking about and prototyping solutions to the problem without introducing some "G1 throughput mode" that would have huge implications on maintainability (basically another garbage collector) or making G1 unnecessarily complex this seems a very good solution, taking the advantages of these throughput barriers without too many drawbacks.
+
Overall, I'm quite satisfied with the change - after many years thinking about and prototyping solutions to the problem without introducing some "G1 throughput mode" that would have huge implications on maintainability (basically another garbage collector) or making G1 unnecessarily complex this seems a very good solution, taking the advantages of these throughput barriers without too many drawbacks.
A lot of people helped with this change, my thanks.