Java GC, HotSpot's CMS promotion buffers
2012-12-31 15:45
Recently, I unfairly blamed promotion local allocation buffers (PLABs) for fragmentation of the old space under the concurrent mark sweep garbage collector. I was very wrong. In this article, I'm going to explain how PLABs really work, in full detail.
PLABs
PLAB stands for promotion local allocation buffer. PLABs are used during young collection. Young collection in CMS (and in all other garbage collectors in HotSpot JVM) is a stop-the-world copy collection. CMS may use multiple threads for young collection, and each of these threads may need to allocate space for objects being copied into either survivor or old space. PLABs are required to avoid contention between threads for the shared data structures that manage free memory. Each thread has one PLAB for survivor space and one for old space. Free memory in survivor space is contiguous, and so are survivor PLABs, which are simply contiguous blocks. On the other hand, free memory in old space (using the CMS collector) is fragmented and managed via a sophisticated dictionary of free chunks...
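To show why per-thread buffers reduce contention, here is a minimal sketch of a bump-pointer PLAB carved out of a contiguous space such as a survivor space. It assumes that the shared space is advanced by a single compare-and-swap per buffer refill. The class names `ContiguousSpace` and `Plab` are hypothetical, and HotSpot's real implementation is C++ and handles alignment and much more.

```java
import java.util.concurrent.atomic.AtomicLong;

public class PlabDemo {
    // Hypothetical shared contiguous space (e.g. a survivor space), addressed in heap words.
    static final class ContiguousSpace {
        private final AtomicLong top;
        private final long end;

        ContiguousSpace(long start, long end) {
            this.top = new AtomicLong(start);
            this.end = end;
        }

        // Claims `words` words with a CAS loop; this is the only contended operation.
        // Returns the start of the claimed block, or -1 if the space is full.
        long claim(long words) {
            while (true) {
                long cur = top.get();
                if (cur + words > end) {
                    return -1;
                }
                if (top.compareAndSet(cur, cur + words)) {
                    return cur;
                }
            }
        }
    }

    // Hypothetical per-thread PLAB: a private [top, end) range carved out of the shared space.
    static final class Plab {
        private final ContiguousSpace space;
        private final long plabWords;
        private long top = 0;
        private long end = 0;
        int refills = 0; // number of times this thread went to the shared space

        Plab(ContiguousSpace space, long plabWords) {
            this.space = space;
            this.plabWords = plabWords;
        }

        // Bump-pointer allocation; only a refill touches shared state.
        // The unused tail of the old buffer is simply abandoned in this sketch
        // (a real collector typically plugs it with a filler object).
        long allocate(long words) {
            if (top + words > end) {
                long size = Math.max(plabWords, words);
                long start = space.claim(size);
                if (start < 0) {
                    return -1; // shared space exhausted
                }
                refills++;
                top = start;
                end = start + size;
            }
            long result = top;
            top += words;
            return result;
        }
    }

    public static void main(String[] args) {
        ContiguousSpace survivor = new ContiguousSpace(0, 10_000);
        Plab plab = new Plab(survivor, 1_000);
        for (int i = 0; i < 250; i++) {
            plab.allocate(8); // "copy" 250 objects of 8 words each
        }
        // 2000 words served from 1000-word buffers: prints refills=2
        System.out.println("refills=" + plab.refills);
    }
}
```

In this example, 250 copies touch shared state only twice. That is the whole point of giving each GC thread its own buffer.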
Free list space (FLS)
The CMS collector cannot compact the old space (actually it can, but compaction involves a long stop-the-world pause, often referred to as a GC freeze). The memory manager operates on lists of free chunks to manage fragmented free space. As a countermeasure against fragmentation, chunks of free space are grouped by size. If available, a free chunk of the exact required size will be used to serve an allocation request. If chunks of a given size are exhausted, the memory manager will split a larger chunk into several smaller ones to satisfy demand. Consecutive free chunks can also be coalesced to create larger ones (coalescing is done along with sweeping during the concurrent GC cycle). This splitting/coalescing logic is controlled by complex heuristics and per-size chunk demand statistics.
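A toy model of size-segregated free lists follows. It uses an exact fit when possible and otherwise splits the smallest larger chunk. It also includes a coalescing pass that merges address-adjacent chunks, roughly what happens during sweeping. This is a sketch under simplifying assumptions: chunks are (start, size) pairs in heap words, "smallest larger chunk" is a policy chosen for illustration, and none of the per-size demand heuristics mentioned above are modeled. It is not HotSpot's free list space. It needs Java 16+ for records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegregatedFreeList {
    // A free chunk: start address and size, both in heap words.
    record Chunk(long start, long size) {}

    // size -> free chunks of exactly that size; sorted by size so a split
    // can find the smallest larger chunk. Empty lists are removed eagerly.
    private final TreeMap<Long, ArrayDeque<Chunk>> lists = new TreeMap<>();

    void free(Chunk c) {
        lists.computeIfAbsent(c.size(), k -> new ArrayDeque<>()).push(c);
    }

    // Exact fit if possible; otherwise split the smallest larger chunk.
    Chunk allocate(long size) {
        Map.Entry<Long, ArrayDeque<Chunk>> e = lists.ceilingEntry(size);
        if (e == null) {
            return null; // nothing large enough
        }
        Chunk c = e.getValue().pop();
        if (e.getValue().isEmpty()) {
            lists.remove(e.getKey());
        }
        if (c.size() == size) {
            return c; // exact fit
        }
        // Split: hand out the front part, return the remainder to its own size list.
        free(new Chunk(c.start() + size, c.size() - size));
        return new Chunk(c.start(), size);
    }

    // Merge address-adjacent free chunks into larger ones.
    void coalesce() {
        List<Chunk> all = new ArrayList<>();
        for (ArrayDeque<Chunk> q : lists.values()) {
            all.addAll(q);
        }
        all.sort(Comparator.comparingLong(Chunk::start));
        lists.clear();
        Chunk cur = null;
        for (Chunk c : all) {
            if (cur != null && cur.start() + cur.size() == c.start()) {
                cur = new Chunk(cur.start(), cur.size() + c.size());
            } else {
                if (cur != null) {
                    free(cur);
                }
                cur = c;
            }
        }
        if (cur != null) {
            free(cur);
        }
    }

    long countFree(long size) {
        ArrayDeque<Chunk> q = lists.get(size);
        return q == null ? 0 : q.size();
    }

    public static void main(String[] args) {
        SegregatedFreeList fls = new SegregatedFreeList();
        fls.free(new Chunk(0, 10));
        fls.free(new Chunk(100, 40));
        Chunk a = fls.allocate(10); // exact fit
        Chunk b = fls.allocate(12); // no 12-word chunk: split 40 into 12 + 28
        System.out.println(a + " " + b + " free28=" + fls.countFree(28));
        fls.free(a);
        fls.free(b);
        fls.coalesce(); // [100,112) and [112,140) merge back into one 40-word chunk
        System.out.println("free40=" + fls.countFree(40));
    }
}
```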
Old space PLABs
Naturally, old space PLABs mimic the structure of the indexed free list space. Each thread preallocates a certain number of chunks of each size below 257 heap words (larger chunks are allocated from the global space). The number of chunks of each size to be preallocated is controlled by statistics. The following JVM flag enables verbose reporting of old space PLAB sizing (too verbose for production, though).
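The structure just described can be sketched as a per-thread map from chunk size to a local list of chunks. When a list runs dry, it is refilled in a batch from a global pool. Sizes of 257 words and above go straight to the global pool. All names here are hypothetical (`GlobalPool`, `OldPlab`, the `defaultClaim` fallback). The `synchronized` method merely stands in for the global free-list lock, and the per-size batch sizes are plain inputs rather than the statistics-driven values HotSpot uses.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class OldPlabSketch {
    // Sizes below this many heap words are buffered per thread (per the text).
    static final long SMALL_LIMIT = 257;

    // Hypothetical global pool; `synchronized` stands in for the global free-list lock.
    static final class GlobalPool {
        private long nextAddress = 0;
        int lockAcquisitions = 0;

        // Hands out `count` chunks of `size` words each (addresses only, no real memory).
        synchronized long[] take(long size, int count) {
            lockAcquisitions++;
            long[] chunks = new long[count];
            for (int i = 0; i < count; i++) {
                chunks[i] = nextAddress;
                nextAddress += size;
            }
            return chunks;
        }
    }

    // Hypothetical per-thread old-space PLAB: one local chunk list per small size.
    static final class OldPlab {
        private final GlobalPool pool;
        private final Map<Long, Integer> blocksToClaim; // refill batch per chunk size
        private final int defaultClaim;                  // batch for sizes not in the map
        private final Map<Long, ArrayDeque<Long>> local = new HashMap<>();

        OldPlab(GlobalPool pool, Map<Long, Integer> blocksToClaim, int defaultClaim) {
            this.pool = pool;
            this.blocksToClaim = blocksToClaim;
            this.defaultClaim = defaultClaim;
        }

        long allocate(long size) {
            if (size >= SMALL_LIMIT) {
                return pool.take(size, 1)[0]; // large chunk: straight from the global space
            }
            ArrayDeque<Long> list = local.computeIfAbsent(size, k -> new ArrayDeque<>());
            if (list.isEmpty()) {
                // Refill: one trip to the global pool claims a whole batch of this size.
                int n = blocksToClaim.getOrDefault(size, defaultClaim);
                for (long addr : pool.take(size, n)) {
                    list.add(addr);
                }
            }
            return list.poll();
        }

        // Free chunks still sitting in the local lists, summed over all sizes.
        int unused() {
            return local.values().stream().mapToInt(ArrayDeque::size).sum();
        }
    }

    public static void main(String[] args) {
        GlobalPool pool = new GlobalPool();
        OldPlab plab = new OldPlab(pool, Map.of(10L, 100), 16);
        for (int i = 0; i < 250; i++) {
            plab.allocate(10); // 250 promotions of 10-word objects: 3 refills of 100
        }
        plab.allocate(300);    // large: bypasses the PLAB
        System.out.println("locks=" + pool.lockAcquisitions + " unused=" + plab.unused());
    }
}
```

The trade-off is visible here. A larger batch means fewer lock acquisitions but more chunks left unused in the thread's lists at the end of the collection, which is why the batch size is tuned per chunk size.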
-XX:+PrintOldPLAB
At the beginning of each young collection, you will see the following lines in the GC log:
6.347: [ParNew ...
...
0[10]: 722/5239/897
0[12]: 846/5922/987
0[14]: 666/5100/850
...
1[12]: 229/3296/987
1[14]: 2/2621/850
1[16]: 69/1812/564
1[18]: 247/1160/290
...
[10]: 905
[12]: 1002
[14]: 865
[16]: 567
...
The first lines are statistics from each scavenger (young collector) thread in the following format:
<tid>[<chunk size>]: <num_retire>/<num_blocks>/<blocks_to_claim>
tid - GC thread ID,
chunk size - chunk size in heap words,
num_retire - number of free chunks in the PLAB at the end of the young GC,
num_blocks - number of chunks allocated from the FLS to the PLAB during the young GC,
blocks_to_claim - desired number of blocks to refill the PLAB.
The next few lines show the estimated number of chunks (per size) to be preallocated (per GC thread) at the beginning of the next young collection.
[<chunk size>]: <blocks_to_claim>
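If you want to analyze -XX:+PrintOldPLAB output rather than eyeball it, a small parser for the two line formats above is enough. This is a sketch that assumes lines look exactly like the excerpt (other GC log content is simply skipped). The record and method names are hypothetical, and it needs Java 16+ for records. As a usage example, the `main` method sums `num_blocks` per chunk size over the threads that appear in the excerpt. Remember that the excerpt elides the other threads.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OldPlabLogParser {
    // Per-thread line: <tid>[<chunk size>]: <num_retire>/<num_blocks>/<blocks_to_claim>
    static final Pattern THREAD_LINE = Pattern.compile("(\\d+)\\[(\\d+)\\]:\\s*(\\d+)/(\\d+)/(\\d+)");
    // Summary line: [<chunk size>]: <blocks_to_claim>
    static final Pattern SUMMARY_LINE = Pattern.compile("\\[(\\d+)\\]:\\s*(\\d+)");

    record ThreadStat(int tid, int chunkSize, long numRetire, long numBlocks, long blocksToClaim) {}
    record Summary(int chunkSize, long blocksToClaim) {}

    static Optional<ThreadStat> parseThreadLine(String line) {
        Matcher m = THREAD_LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ThreadStat(
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Long.parseLong(m.group(3)), Long.parseLong(m.group(4)), Long.parseLong(m.group(5))));
    }

    static Optional<Summary> parseSummaryLine(String line) {
        Matcher m = SUMMARY_LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Summary(Integer.parseInt(m.group(1)), Long.parseLong(m.group(2))));
    }

    public static void main(String[] args) {
        String[] log = {
            "0[10]: 722/5239/897",
            "0[12]: 846/5922/987",
            "1[12]: 229/3296/987",
            "[12]: 1002",
        };
        // Total chunks taken from the FLS per chunk size, summed over the threads shown.
        Map<Integer, Long> totals = new TreeMap<>();
        for (String line : log) {
            parseThreadLine(line).ifPresent(s -> totals.merge(s.chunkSize(), s.numBlocks(), Long::sum));
            parseSummaryLine(line).ifPresent(s ->
                    System.out.println("next blocks_to_claim[" + s.chunkSize() + "] = " + s.blocksToClaim()));
        }
        // prints: num_blocks per chunk size: {10=5239, 12=9218}
        System.out.println("num_blocks per chunk size: " + totals);
    }
}
```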
Calculating desired blocks to claim
The initial number of blocks (chunks) per chunk size is configured via the -XX:CMSParPromoteBlocksToClaim=<n> JVM command line option (-XX:OldPLABSize=<n> is an alias for this option when CMS GC is used). If resizing of the old PLAB is not disabled by the -XX:-ResizeOldPLAB option, the desired PLAB size will be adjusted after each young GC.
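The ideal desired number per chunk size is calculated by the following formula:
$$\mathit{blocks\_to\_claim}_{\mathrm{ideal}} = \min\left(\mathrm{CMSOldPLABMax},\ \max\left(\mathrm{CMSOldPLABMin},\ \frac{\mathit{num\_blocks}}{\mathrm{ParallelGCThreads} \cdot \mathrm{CMSOldPLABNumRefills}}\right)\right)$$
where CMSOldPLABMax, CMSOldPLABMin, ParallelGCThreads and CMSOldPLABNumRefills are the values of the corresponding -XX options. The effective value, however, is exponentially smoothed over time:
$$\mathit{blocks\_to\_claim}_{\mathrm{next}} = (1 - w) \cdot \mathit{blocks\_to\_claim}_{\mathrm{prev}} + w \cdot \mathit{blocks\_to\_claim}_{\mathrm{ideal}}$$
where w is configured via -XX:OldPLABWeight (0.5 by default; the flag itself is given as a percentage, so the default corresponds to -XX:OldPLABWeight=50).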
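The two formulas above translate directly into code. The sketch below computes the clamped ideal batch and then applies exponential smoothing. It uses integer division for the per-refill estimate, which is an assumption of this sketch. Here w is passed as a percentage, so 50 corresponds to w = 0.5; treat that parameter shape as another assumption. The numbers in `main` are made-up example values, not taken from a real log.

```java
public class PlabSizing {
    // Ideal per-size refill batch: clamp num_blocks / (threads * refills) to [plabMin, plabMax].
    static long idealBlocksToClaim(long numBlocks, int parallelGcThreads, int numRefills,
                                   long plabMin, long plabMax) {
        long perRefill = numBlocks / ((long) parallelGcThreads * numRefills);
        return Math.min(plabMax, Math.max(plabMin, perRefill));
    }

    // Exponential smoothing: next = (1 - w) * prev + w * ideal, with w = weightPercent / 100.
    static double smooth(double prev, long ideal, int weightPercent) {
        double w = weightPercent / 100.0;
        return (1.0 - w) * prev + w * ideal;
    }

    public static void main(String[] args) {
        // Example values only: 32000 blocks, 4 GC threads, 4 refills, clamp to [16, 1024].
        long ideal = idealBlocksToClaim(32_000, 4, 4, 16, 1024); // 2000 clamped to 1024
        double next = smooth(1000.0, ideal, 50);                 // 0.5 * 1000 + 0.5 * 1024
        System.out.println("ideal=" + ideal + " next=" + next);  // prints ideal=1024 next=1012.0
    }
}
```

The smoothing keeps one unusual young collection from swinging the batch size too far. With w = 0.5, each new sample moves the estimate only halfway toward the new ideal.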
On-the-fly PLAB resizing
During a young collection, if the chunk list of a certain size gets exhausted, the thread refills it from the global free space pool (allocating the same number of chunks as at the beginning of the collection). Normally, a thread will have to refill a chunk list a few times during a collection (-XX:CMSOldPLABNumRefills sets the desired number of refills). However, if the initial estimate was too small, the GC thread will refill its chunk list too often (a refill requires a global lock on the memory manager, so it may be slow). If on-the-fly PLAB resizing is enabled, the JVM will try to detect such conditions and resize the PLAB in the middle of a young collection.
-XX:+CMSOldPLABResizeQuicker will enable on-the-fly PLAB resizing (disabled by default).
A few more options offer additional tuning:
-XX:CMSOldPLABToleranceFactor=4 – tolerance of the phase-change detector for on-the-fly PLAB resizing during a scavenge.
-XX:CMSOldPLABReactivityFactor=2 – gain in the feedback loop for on-the-fly PLAB resizing during a scavenge.
-XX:CMSOldPLABReactivityCeiling=10 – clamping of the gain in the feedback loop for on-the-fly PLAB resizing during a scavenge.
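The option descriptions above name a tolerance, a gain and a gain ceiling, but not how they combine. The following is therefore a hypothetical model of such a feedback loop, not HotSpot's code. It assumes that the thread compares the blocks it has taken so far against a tolerated budget (tolerance factor × expected refills × current batch). When that budget is exceeded, it assumes the batch grows in proportion to the overshoot (reactivity factor), with the gain clamped by the ceiling and the result capped at CMSOldPLABMax. The parameter values in `main` mirror the option values quoted above; numRefills = 4 and plabMax = 1024 are example values.

```java
public class OnTheFlyResize {
    // Hypothetical feedback controller for growing a refill batch mid-collection.
    static long nextBatch(long blocksTakenSoFar, long batch, int numRefills,
                          int toleranceFactor, int reactivityFactor, int reactivityCeiling,
                          long plabMax) {
        if (batch <= 0 || numRefills <= 0 || toleranceFactor <= 0) {
            throw new IllegalArgumentException("batch, numRefills and toleranceFactor must be positive");
        }
        // Phase-change detector: how many tolerated budgets has demand already consumed?
        long budget = (long) toleranceFactor * numRefills * batch;
        long overshoot = blocksTakenSoFar / budget; // 0 while demand is within tolerance
        // Gain of the feedback loop, clamped by the ceiling.
        long gain = Math.min((long) reactivityFactor * overshoot, reactivityCeiling);
        return Math.min(batch + gain * batch, plabMax);
    }

    public static void main(String[] args) {
        System.out.println(nextBatch(100, 10, 4, 4, 2, 10, 1024));  // within tolerance: 10
        System.out.println(nextBatch(400, 10, 4, 4, 2, 10, 1024));  // 2.5x the budget: 50
        System.out.println(nextBatch(4000, 10, 4, 4, 2, 10, 1024)); // gain hits the ceiling: 110
    }
}
```

The ceiling matters in the last case. Without it, a large overshoot would multiply the batch by 51 in one step instead of 11.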
Conclusion
I have spent some time digging through the OpenJDK code to make sure that I get this thing right now. It was educational. This article has brought up and explained a few more arcane JVM options, though I doubt that I will ever use them in practice. The problem with heap fragmentation is that you have to run an application for a really long time before fragmentation manifests itself. Most of the options above require a trial-and-error approach (even though -XX:+PrintOldPLAB might give you some insight into your application).
It is much easier just to give the damn JVM a little more memory (hey, RAM is cheap nowadays) than to spend a day tuning arcane options.
Anyway, I hope it was as educational for you as it was for me.
See also
Java GC, HotSpot's CMS and heap fragmentation
CMS heap fragmentation. Follow up 1
Other articles about garbage collection in this blog