
Commit 14c7caa

runtime: add 24 byte allocation size class
This CL introduces a 24 byte allocation size class which fits 3 pointers on 64 bit and 6 pointers on 32 bit architectures. Notably, this new size class fits a slice header on 64 bit architectures exactly, whereas previously a 32 byte size class would have been used for allocating a slice header on the heap.

The main complexity added with this CL is that heapBitsSetType needs to handle objects that aren't 16-byte aligned but contain more than a single pointer on 64-bit architectures.

Due to having a non 16 byte aligned size class on 32 bit, a h.shift of 2 is now possible, which means a heap bitmap byte might only be partially written. Since this was already possible on 64 bit, the heap bitmap code only needed minor adjustments for 32 bit doublecheck code paths.

Note that this CL changes the slice capacity allocated by append for slice growth to a target capacity of 17 to 24 bytes. On 64 bit architectures the capacity of the slice returned by append([]byte{}, make([]byte, 24)...) is 32 bytes before and 24 bytes after this CL. Depending on the allocation patterns of the specific Go program, this can increase the total number of allocations, as subsequent appends to the slice can trigger slice growth earlier than before. On the other hand, if the slice is never appended to again above its capacity, this will lower heap usage by 8 bytes.

This CL changes the set of size classes reported in the runtime.MemStats.BySize array because the array is limited to a total of 61 size classes. The new 24 byte size class is now included and the 20480 byte size class is no longer included.
Fixes #8885

name                      old time/op       new time/op       delta
Template                  196ms ± 3%        194ms ± 2%          ~     (p=0.247 n=10+10)
Unicode                   85.6ms ±16%       88.1ms ± 1%         ~     (p=0.165 n=10+10)
GoTypes                   673ms ± 2%        668ms ± 2%          ~     (p=0.258 n=9+9)
Compiler                  3.14s ± 6%        3.08s ± 1%          ~     (p=0.243 n=10+9)
SSA                       6.82s ± 1%        6.76s ± 1%        -0.87%  (p=0.006 n=9+10)
Flate                     128ms ± 7%        127ms ± 3%          ~     (p=0.739 n=10+10)
GoParser                  154ms ± 3%        153ms ± 4%          ~     (p=0.730 n=9+9)
Reflect                   404ms ± 1%        412ms ± 4%        +1.99%  (p=0.022 n=9+10)
Tar                       172ms ± 4%        170ms ± 4%          ~     (p=0.065 n=10+9)
XML                       231ms ± 4%        230ms ± 3%          ~     (p=0.912 n=10+10)
LinkCompiler              341ms ± 1%        339ms ± 1%          ~     (p=0.243 n=9+10)
ExternalLinkCompiler      1.72s ± 1%        1.72s ± 1%          ~     (p=0.661 n=9+10)
LinkWithoutDebugCompiler  221ms ± 2%        221ms ± 2%          ~     (p=0.529 n=10+10)
StdCmd                    18.4s ± 3%        18.2s ± 1%          ~     (p=0.515 n=10+8)

name                      old user-time/op  new user-time/op  delta
Template                  238ms ± 4%        243ms ± 6%          ~     (p=0.661 n=9+10)
Unicode                   116ms ± 6%        113ms ± 3%        -3.37%  (p=0.035 n=9+10)
GoTypes                   854ms ± 2%        848ms ± 2%          ~     (p=0.604 n=9+10)
Compiler                  4.10s ± 1%        4.11s ± 1%          ~     (p=0.481 n=8+9)
SSA                       9.49s ± 1%        9.41s ± 1%        -0.92%  (p=0.001 n=9+10)
Flate                     149ms ± 6%        151ms ± 7%          ~     (p=0.481 n=10+10)
GoParser                  189ms ± 2%        190ms ± 2%          ~     (p=0.497 n=9+10)
Reflect                   511ms ± 2%        508ms ± 2%          ~     (p=0.211 n=9+10)
Tar                       215ms ± 4%        212ms ± 3%          ~     (p=0.105 n=10+10)
XML                       288ms ± 2%        288ms ± 2%          ~     (p=0.971 n=10+10)
LinkCompiler              559ms ± 4%        557ms ± 1%          ~     (p=0.968 n=9+10)
ExternalLinkCompiler      1.78s ± 1%        1.77s ± 1%          ~     (p=0.055 n=8+10)
LinkWithoutDebugCompiler  245ms ± 3%        245ms ± 2%          ~     (p=0.684 n=10+10)

name                      old alloc/op      new alloc/op      delta
Template                  34.8MB ± 0%       34.4MB ± 0%       -0.95%  (p=0.000 n=9+10)
Unicode                   28.6MB ± 0%       28.3MB ± 0%       -0.95%  (p=0.000 n=10+10)
GoTypes                   115MB ± 0%        114MB ± 0%        -1.02%  (p=0.000 n=10+9)
Compiler                  554MB ± 0%        549MB ± 0%        -0.86%  (p=0.000 n=9+10)
SSA                       1.28GB ± 0%       1.27GB ± 0%       -0.83%  (p=0.000 n=10+10)
Flate                     21.8MB ± 0%       21.6MB ± 0%       -0.87%  (p=0.000 n=8+10)
GoParser                  26.7MB ± 0%       26.4MB ± 0%       -0.97%  (p=0.000 n=10+9)
Reflect                   75.0MB ± 0%       74.1MB ± 0%       -1.18%  (p=0.000 n=10+10)
Tar                       32.6MB ± 0%       32.3MB ± 0%       -0.94%  (p=0.000 n=10+7)
XML                       41.5MB ± 0%       41.2MB ± 0%       -0.90%  (p=0.000 n=10+8)
LinkCompiler              105MB ± 0%        104MB ± 0%        -0.94%  (p=0.000 n=10+10)
ExternalLinkCompiler      153MB ± 0%        152MB ± 0%        -0.69%  (p=0.000 n=10+10)
LinkWithoutDebugCompiler  63.7MB ± 0%       63.6MB ± 0%       -0.13%  (p=0.000 n=10+10)

name                      old allocs/op     new allocs/op     delta
Template                  336k ± 0%         336k ± 0%         +0.02%  (p=0.002 n=10+10)
Unicode                   332k ± 0%         332k ± 0%           ~     (p=0.447 n=10+10)
GoTypes                   1.16M ± 0%        1.16M ± 0%        +0.01%  (p=0.001 n=10+10)
Compiler                  4.92M ± 0%        4.92M ± 0%        +0.01%  (p=0.000 n=10+10)
SSA                       11.9M ± 0%        11.9M ± 0%        +0.02%  (p=0.000 n=9+10)
Flate                     214k ± 0%         214k ± 0%         +0.02%  (p=0.032 n=10+8)
GoParser                  270k ± 0%         270k ± 0%         +0.02%  (p=0.004 n=10+9)
Reflect                   877k ± 0%         877k ± 0%         +0.01%  (p=0.000 n=10+10)
Tar                       313k ± 0%         313k ± 0%           ~     (p=0.075 n=9+10)
XML                       387k ± 0%         387k ± 0%         +0.02%  (p=0.007 n=10+10)
LinkCompiler              455k ± 0%         456k ± 0%         +0.08%  (p=0.000 n=10+9)
ExternalLinkCompiler      670k ± 0%         671k ± 0%         +0.06%  (p=0.000 n=10+10)
LinkWithoutDebugCompiler  113k ± 0%         113k ± 0%           ~     (p=0.149 n=10+10)

name                      old maxRSS/op     new maxRSS/op     delta
Template                  34.1M ± 1%        34.1M ± 1%          ~     (p=0.853 n=10+10)
Unicode                   35.1M ± 1%        34.6M ± 1%        -1.43%  (p=0.000 n=10+10)
GoTypes                   72.8M ± 3%        73.3M ± 2%          ~     (p=0.724 n=10+10)
Compiler                  288M ± 3%         295M ± 4%           ~     (p=0.393 n=10+10)
SSA                       630M ± 1%         622M ± 1%         -1.18%  (p=0.001 n=10+10)
Flate                     26.0M ± 1%        26.2M ± 2%          ~     (p=0.493 n=10+10)
GoParser                  28.6M ± 1%        28.5M ± 2%          ~     (p=0.256 n=10+10)
Reflect                   55.5M ± 2%        55.4M ± 1%          ~     (p=0.436 n=10+10)
Tar                       33.0M ± 1%        32.8M ± 2%          ~     (p=0.075 n=10+10)
XML                       38.7M ± 1%        39.0M ± 1%          ~     (p=0.053 n=9+10)
LinkCompiler              164M ± 1%         164M ± 1%         -0.27%  (p=0.029 n=10+10)
ExternalLinkCompiler      174M ± 0%         173M ± 0%         -0.33%  (p=0.002 n=9+10)
LinkWithoutDebugCompiler  137M ± 0%         136M ± 2%           ~     (p=0.825 n=9+10)

Change-Id: I9ecf2a10024513abef8fbfbe519e44e0b29b6167
Reviewed-on: https://go-review.googlesource.com/c/go/+/242258
Trust: Martin Möhrmann <[email protected]>
Trust: Michael Knyszek <[email protected]>
Run-TryBot: Martin Möhrmann <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
1 parent c0c396b commit 14c7caa

4 files changed: +174 -89 lines changed

4 files changed

+174
-89
lines changed

src/runtime/mbitmap.go

+96-16
@@ -30,10 +30,9 @@
 // indicates scanning can ignore the rest of the allocation.
 //
 // The 2-bit entries are split when written into the byte, so that the top half
-// of the byte contains 4 high bits and the bottom half contains 4 low (pointer)
-// bits.
-// This form allows a copy from the 1-bit to the 4-bit form to keep the
-// pointer bits contiguous, instead of having to space them out.
+// of the byte contains 4 high (scan) bits and the bottom half contains 4 low
+// (pointer) bits. This form allows a copy from the 1-bit to the 4-bit form to
+// keep the pointer bits contiguous, instead of having to space them out.
 //
 // The code makes use of the fact that the zero value for a heap
 // bitmap means scalar/dead. This property must be preserved when
@@ -816,6 +815,12 @@ func (s *mspan) countAlloc() int {
 func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
 	const doubleCheck = false // slow but helpful; enable to test modifications to this code
 
+	const (
+		mask1 = bitPointer | bitScan                        // 00010001
+		mask2 = bitPointer | bitScan | mask1<<heapBitsShift // 00110011
+		mask3 = bitPointer | bitScan | mask2<<heapBitsShift // 01110111
+	)
+
 	// dataSize is always size rounded up to the next malloc size class,
 	// except in the case of allocating a defer block, in which case
 	// size is sizeof(_defer{}) (at least 6 words) and dataSize may be
@@ -844,11 +849,12 @@ func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
 	h := heapBitsForAddr(x)
 	ptrmask := typ.gcdata // start of 1-bit pointer mask (or GC program, handled below)
 
-	// Heap bitmap bits for 2-word object are only 4 bits,
-	// so also shared with objects next to it.
-	// This is called out as a special case primarily for 32-bit systems,
-	// so that on 32-bit systems the code below can assume all objects
-	// are 4-word aligned (because they're all 16-byte aligned).
+	// 2-word objects only have 4 bitmap bits and 3-word objects only have 6 bitmap bits.
+	// Therefore, these objects share a heap bitmap byte with the objects next to them.
+	// These are called out as a special case primarily so the code below can assume all
+	// objects are at least 4 words long and that their bitmaps start either at the beginning
+	// of a bitmap byte, or half-way in (h.shift of 0 and 2 respectively).
+
 	if size == 2*sys.PtrSize {
 		if typ.size == sys.PtrSize {
 			// We're allocating a block big enough to hold two pointers.
@@ -865,7 +871,7 @@ func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
 				*h.bitp &^= (bitPointer | bitScan | (bitPointer|bitScan)<<heapBitsShift) << h.shift
 				*h.bitp |= (bitPointer | bitScan) << h.shift
 			} else {
-				// 2-element slice of pointer.
+				// 2-element array of pointer.
 				*h.bitp |= (bitPointer | bitScan | (bitPointer|bitScan)<<heapBitsShift) << h.shift
 			}
 			return
@@ -886,6 +892,70 @@ func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
 		*h.bitp &^= (bitPointer | bitScan | ((bitPointer | bitScan) << heapBitsShift)) << h.shift
 		*h.bitp |= uint8(hb << h.shift)
 		return
+	} else if size == 3*sys.PtrSize {
+		b := uint8(*ptrmask)
+		if doubleCheck {
+			if b == 0 {
+				println("runtime: invalid type ", typ.string())
+				throw("heapBitsSetType: called with non-pointer type")
+			}
+			if sys.PtrSize != 8 {
+				throw("heapBitsSetType: unexpected 3 pointer wide size class on 32 bit")
+			}
+			if typ.kind&kindGCProg != 0 {
+				throw("heapBitsSetType: unexpected GC prog for 3 pointer wide size class")
+			}
+			if typ.size == 2*sys.PtrSize {
+				print("runtime: heapBitsSetType size=", size, " but typ.size=", typ.size, "\n")
+				throw("heapBitsSetType: inconsistent object sizes")
+			}
+		}
+		if typ.size == sys.PtrSize {
+			// The type contains a pointer otherwise heapBitsSetType wouldn't have been called.
+			// Since the type is only 1 pointer wide and contains a pointer, its gcdata must be exactly 1.
+			if doubleCheck && *typ.gcdata != 1 {
+				print("runtime: heapBitsSetType size=", size, " typ.size=", typ.size, "but *typ.gcdata", *typ.gcdata, "\n")
+				throw("heapBitsSetType: unexpected gcdata for 1 pointer wide type size in 3 pointer wide size class")
+			}
+			// 3 element array of pointers. Unrolling ptrmask 3 times into p yields 00000111.
+			b = 7
+		}
+
+		hb := b & 7
+		// Set bitScan bits for all pointers.
+		hb |= hb << wordsPerBitmapByte
+		// First bitScan bit is always set since the type contains pointers.
+		hb |= bitScan
+		// Second bitScan bit needs to also be set if the third bitScan bit is set.
+		hb |= hb & (bitScan << (2 * heapBitsShift)) >> 1
+
+		// For h.shift > 1 heap bits cross a byte boundary and need to be written part
+		// to h.bitp and part to the next h.bitp.
+		switch h.shift {
+		case 0:
+			*h.bitp &^= mask3 << 0
+			*h.bitp |= hb << 0
+		case 1:
+			*h.bitp &^= mask3 << 1
+			*h.bitp |= hb << 1
+		case 2:
+			*h.bitp &^= mask2 << 2
+			*h.bitp |= (hb & mask2) << 2
+			// Two words written to the first byte.
+			// Advance two words to get to the next byte.
+			h = h.next().next()
+			*h.bitp &^= mask1
+			*h.bitp |= (hb >> 2) & mask1
+		case 3:
+			*h.bitp &^= mask1 << 3
+			*h.bitp |= (hb & mask1) << 3
+			// One word written to the first byte.
+			// Advance one word to get to the next byte.
+			h = h.next()
+			*h.bitp &^= mask2
+			*h.bitp |= (hb >> 1) & mask2
+		}
+		return
 	}
 
 	// Copy from 1-bit ptrmask into 2-bit bitmap.
@@ -1079,7 +1149,7 @@ func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
 		// word must be set to scan since there are pointers
 		// somewhere in the object.
 		// In all following words, we set the scan/dead
-		// appropriately to indicate that the object contains
+		// appropriately to indicate that the object continues
 		// to the next 2-bit entry in the bitmap.
 		//
 		// We set four bits at a time here, but if the object
@@ -1095,12 +1165,22 @@ func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
 		b >>= 4
 		nb -= 4
 
-	case sys.PtrSize == 8 && h.shift == 2:
+	case h.shift == 2:
 		// Ptrmask and heap bitmap are misaligned.
+		//
+		// On 32 bit architectures only the 6-word object that corresponds
+		// to a 24 bytes size class can start with h.shift of 2 here since
+		// all other non 16 byte aligned size classes have been handled by
+		// special code paths at the beginning of heapBitsSetType on 32 bit.
+		//
+		// Many size classes are only 16 byte aligned. On 64 bit architectures
+		// this results in a heap bitmap position starting with a h.shift of 2.
+		//
 		// The bits for the first two words are in a byte shared
 		// with another object, so we must be careful with the bits
 		// already there.
-		// We took care of 1-word and 2-word objects above,
+		//
+		// We took care of 1-word, 2-word, and 3-word objects above,
 		// so this is at least a 6-word object.
 		hb = (b & (bitPointer | bitPointer<<heapBitsShift)) << (2 * heapBitsShift)
 		hb |= bitScan << (2 * heapBitsShift)
@@ -1113,7 +1193,7 @@ func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
 		*hbitp |= uint8(hb)
 		hbitp = add1(hbitp)
 		if w += 2; w >= nw {
-			// We know that there is more data, because we handled 2-word objects above.
+			// We know that there is more data, because we handled 2-word and 3-word objects above.
 			// This must be at least a 6-word object. If we're out of pointer words,
 			// mark no scan in next bitmap byte and finish.
 			hb = 0
@@ -1248,12 +1328,12 @@ Phase4:
 		// Handle the first byte specially if it's shared. See
 		// Phase 1 for why this is the only special case we need.
 		if doubleCheck {
-			if !(h.shift == 0 || (sys.PtrSize == 8 && h.shift == 2)) {
+			if !(h.shift == 0 || h.shift == 2) {
 				print("x=", x, " size=", size, " cnw=", h.shift, "\n")
 				throw("bad start shift")
 			}
 		}
-		if sys.PtrSize == 8 && h.shift == 2 {
+		if h.shift == 2 {
 			*h.bitp = *h.bitp&^((bitPointer|bitScan|(bitPointer|bitScan)<<heapBitsShift)<<(2*heapBitsShift)) | *src
 			h = h.next().next()
 			cnw -= 2

src/runtime/mksizeclasses.go

+3-3
@@ -110,8 +110,8 @@ func makeClasses() []class {
 				align = 256
 			} else if size >= 128 {
 				align = size / 8
-			} else if size >= 16 {
-				align = 16 // required for x86 SSE instructions, if we want to use them
+			} else if size >= 32 {
+				align = 16 // heap bitmaps assume 16 byte alignment for allocations >= 32 bytes.
 			}
 		}
 		if !powerOfTwo(align) {
@@ -157,7 +157,7 @@ func makeClasses() []class {
 		}
 	}
 
-	if len(classes) != 67 {
+	if len(classes) != 68 {
 		panic("number of size classes has changed")
 	}
 

src/runtime/mstats.go

+4
@@ -78,6 +78,10 @@ type mstats struct {
 		nfree   uint64
 	}
 
+	// Add an uint32 for even number of size classes to align below fields
+	// to 64 bits for atomic operations on 32 bit platforms.
+	_ [1 - _NumSizeClasses%2]uint32
+
 	// Statistics below here are not exported to MemStats directly.
 
 	last_gc_nanotime uint64 // last gc (monotonic time)
