
Java GC, HotSpot's CMS promotion buffers

2012-12-31 15:45
Recently, I unfairly blamed promotion local allocation buffers (PLABs) for fragmentation of the old space under the concurrent mark sweep garbage collector. I was very wrong. In this article, I'm going to explain how PLABs really work, in full detail.

PLABs

PLAB stands for promotion local allocation buffer. PLABs are used during young collection. Young collection in CMS (and in all other garbage collectors in the HotSpot JVM) is a stop-the-world copy collection. CMS may use multiple threads for young collection, and each of these threads may need to allocate space for objects being copied to either the survivor or the old space. PLABs are required to avoid contention between threads over the shared data structures that manage free memory. Each thread has one PLAB for the survivor space and one for the old space. Free memory in the survivor space is contiguous, and so are survivor PLABs, which are simply contiguous blocks of memory. On the other hand, free memory in the old space (under the CMS collector) is fragmented and managed via a sophisticated dictionary of free chunks.
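To make the contention point concrete, here is a minimal sketch in plain Java (not HotSpot code; SharedSpace and all other names are hypothetical) of a bump-pointer buffer in the spirit of a survivor PLAB: the shared space is touched only on refill, so individual copy allocations need no synchronization.

// Sketch only: illustrates why per-thread buffers avoid contention.
class SurvivorPlabSketch {
    private long top; // next free word within this thread's buffer
    private long end; // first word past this thread's buffer

    // Called rarely: grab a new contiguous block from the shared survivor space.
    void refill(SharedSpace space, long words) {
        top = space.allocateBlock(words); // the only synchronized step
        end = top + words;
    }

    // Called for every copied object: pure bump-pointer, no locking.
    long allocate(long words) {
        if (top + words > end) return -1; // buffer exhausted, caller must refill
        long result = top;
        top += words;
        return result;
    }

    // Hypothetical stand-in for the shared, synchronized space.
    interface SharedSpace { long allocateBlock(long words); }
}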

Free list space (FLS)

The CMS collector cannot compact the old space (actually it can, but compaction involves a long stop-the-world pause, often referred to as a GC freeze). The memory manager operates with lists of free chunks to manage fragmented free space. As a countermeasure against fragmentation, chunks of free space are grouped by size. If available, a free chunk of exactly the required size will be used to serve an allocation request. If chunks of a given size are exhausted, the memory manager will split a larger chunk into several smaller ones to satisfy demand. Consecutive free chunks can also be coalesced to create larger ones (coalescing is done along with sweeping during the concurrent GC cycle). This splitting/coalescing logic is controlled by complex heuristics and per-size chunk demand statistics.
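The following is a minimal sketch in plain Java (not HotSpot code; all names are hypothetical) of the exact-fit-then-split idea behind an indexed free list. The real CMS dictionary additionally coalesces neighbouring chunks during sweeping and keeps per-size demand statistics, both omitted here.

import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: free chunks grouped by exact size, split a larger chunk on miss.
class IndexedFreeListSketch {
    private final Deque<Long>[] freeLists; // freeLists[s] holds chunks of exactly s words

    @SuppressWarnings("unchecked")
    IndexedFreeListSketch(int maxSize) {
        freeLists = new Deque[maxSize + 1];
        for (int i = 0; i <= maxSize; i++) freeLists[i] = new ArrayDeque<>();
    }

    void free(long addr, int size) { freeLists[size].push(addr); }

    long allocate(int size) {
        // Exact fit if available.
        if (!freeLists[size].isEmpty()) return freeLists[size].pop();
        // Otherwise split the smallest sufficiently large chunk.
        for (int larger = size + 1; larger < freeLists.length; larger++) {
            if (!freeLists[larger].isEmpty()) {
                long addr = freeLists[larger].pop();
                free(addr + size, larger - size); // remainder becomes a smaller free chunk
                return addr;
            }
        }
        return -1; // no chunk large enough
    }
}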

Old space PLABs

Naturally, old space PLABs mimic the structure of the indexed free list space. Each thread preallocates a certain number of chunks of each size below 257 heap words (larger chunks are allocated from the global space). The number of chunks of each size to be preallocated is controlled by statistics. The following JVM flag enables verbose reporting of old space PLAB sizing (too verbose for production, though):

-XX:+PrintOldPLAB

At the beginning of each young collection we will see lines like the following in the GC log:

6.347: [ParNew ...
...
0[10]: 722/5239/897
0[12]: 846/5922/987
0[14]: 666/5100/850
...
1[12]: 229/3296/987
1[14]: 2/2621/850
1[16]: 69/1812/564
1[18]: 247/1160/290
...
[10]: 905
[12]: 1002
[14]: 865
[16]: 567
...

The first lines are statistics from each scavenger (young collection) thread, in the following format:

<tid>[<chunk size>]:<num_retire>/<num_blocks>/<blocks_to_claim>

tid - GC thread ID,
chunk size - chunk size in heap words,
num_retire - number of free chunks left in the PLAB at the end of the young GC,
num_blocks - number of chunks allocated from the FLS to the PLAB during the young GC,
blocks_to_claim - desired number of blocks to refill the PLAB.
The next few lines show the estimated number of chunks (per size) to be preallocated (per GC thread) at the beginning of the next young collection:

[<chunk size>]: <blocks_to_claim>
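As a worked example, the line 0[12]: 846/5922/987 from the sample above reads: GC thread 0, chunk size 12 heap words, 846 free chunks left in the PLAB at the end of the young GC, 5922 chunks allocated from the FLS during the GC, and a desired refill size of 987 chunks. A minimal Java sketch (a hypothetical helper, not part of any JVM tooling) that parses such lines:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses per-thread -XX:+PrintOldPLAB lines such as "0[12]: 846/5922/987". */
public class OldPlabLineParser {

    // <tid>[<chunk size>]: <num_retire>/<num_blocks>/<blocks_to_claim>
    private static final Pattern LINE =
            Pattern.compile("(\\d+)\\[(\\d+)\\]:\\s*(\\d+)/(\\d+)/(\\d+)");

    public static void main(String[] args) {
        Matcher m = LINE.matcher("0[12]: 846/5922/987");
        if (m.matches()) {
            System.out.printf(
                "thread %s, chunk size %s words: %s retired, %s from FLS, %s to claim%n",
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
        }
    }
}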

Calculating desired blocks to claim

The initial number of blocks (chunks) per chunk size is configured via the -XX:CMSParPromoteBlocksToClaim=<n> JVM command line option (-XX:OldPLABSize=<n> is an alias for this option when the CMS GC is used). Unless resizing of old PLABs is disabled via the -XX:-ResizeOldPLAB option, the desired PLAB size will be adjusted after each young GC.
The ideal desired number per chunk size is calculated by the following formula:

blocks_to_claim_ideal = MIN(-XX:CMSOldPLABMax, MAX(-XX:CMSOldPLABMin, num_blocks / (-XX:ParallelGCThreads * -XX:CMSOldPLABNumRefills)))

but the effective value is exponentially smoothed over time:

blocks_to_claim_next = (1 - w) * blocks_to_claim_prev + w * blocks_to_claim_ideal

where w is configured via -XX:OldPLABWeight (0.5 by default).
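A minimal arithmetic sketch of the two formulas above, using num_blocks and the previous blocks_to_claim from the 0[12] sample line earlier, and assumed (illustrative only, not authoritative) values for all other flags:

/** Sketch of the old PLAB sizing math above; all flag values are assumptions. */
public class OldPlabSizing {

    static double clamp(double v, double min, double max) {
        return Math.min(max, Math.max(min, v));
    }

    public static void main(String[] args) {
        double cmsOldPLABMin = 16;      // assumed -XX:CMSOldPLABMin
        double cmsOldPLABMax = 1024;    // assumed -XX:CMSOldPLABMax
        int parallelGCThreads = 4;      // assumed -XX:ParallelGCThreads
        int cmsOldPLABNumRefills = 4;   // assumed -XX:CMSOldPLABNumRefills
        double w = 0.5;                 // -XX:OldPLABWeight default

        double numBlocks = 5922;        // num_blocks from the "0[12]" sample line
        double prev = 987;              // previous blocks_to_claim from the same line

        double ideal = clamp(numBlocks / (parallelGCThreads * cmsOldPLABNumRefills),
                             cmsOldPLABMin, cmsOldPLABMax);
        double next = (1 - w) * prev + w * ideal;

        // Prints: ideal=370, next=679 (values rounded by %.0f)
        System.out.printf("ideal=%.0f, next=%.0f%n", ideal, next);
    }
}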

On-the-fly PLAB resizing

During a young collection, if the chunk list for a certain size gets exhausted, the thread will refill it from the global free space pool (allocating the same number of chunks as at the beginning of the collection). Normally a thread will have to refill each chunk list a few times during a collection (-XX:CMSOldPLABNumRefills sets the desired number of refills). However, if the initial estimate was too small, the GC thread will refill its chunk lists too often (a refill requires a global lock on the memory manager, so it may be slow). If on-the-fly PLAB resizing is enabled, the JVM will try to detect such conditions and resize the PLAB in the middle of a young collection.
-XX:+CMSOldPLABResizeQuicker enables on-the-fly PLAB resizing (disabled by default).
A few more options offer additional tuning:
-XX:CMSOldPLABToleranceFactor=4 - tolerance of the phase-change detector for on-the-fly PLAB resizing during a scavenge.
-XX:CMSOldPLABReactivityFactor=2 - gain in the feedback loop for on-the-fly PLAB resizing during a scavenge.
-XX:CMSOldPLABReactivityCeiling=10 - clamping of the gain in the feedback loop for on-the-fly PLAB resizing during a scavenge.
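For illustration only, such a tuning experiment might look like this on a pre-Java-9 HotSpot command line (app.jar and all numeric values are placeholders, not recommendations):

java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:+PrintOldPLAB \
     -XX:+CMSOldPLABResizeQuicker \
     -XX:CMSOldPLABToleranceFactor=4 \
     -XX:CMSOldPLABReactivityFactor=2 \
     -XX:CMSOldPLABReactivityCeiling=10 \
     -jar app.jar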

Conclusion

I have spent some time digging through OpenJDK code to make sure that I am getting this right now. It was educational. This article has brought up and explained a few more arcane JVM options, though I doubt that I will ever use them in practice. The problem with heap fragmentation is that you have to run an application for a really long time before fragmentation manifests itself. Most of the options above require a trial-and-error approach (even though -XX:+PrintOldPLAB might give you some insight into your application). It is much easier to just give the damn JVM a little more memory (hey, RAM is cheap nowadays) than to spend days tuning arcane options.
Anyway, I hope this was as educational for you as it was for me.

See also

Java GC, HotSpot's CMS and heap fragmentation
CMS heap fragmentation. Follow up 1
Other articles about garbage collection in this blog