
[Java GC]Algorithm For GC

2016-03-14 23:42






Reference Counting






In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program.

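2. Which objects can be collected by the GC (this is really the question of which objects are reachable: Reachability of an object):

The Root Set consists of: objects referenced from the call stacks, including the local variables and parameters of every function; global variables (including static variables); and live threads.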


Person p = new Person();
p.car = new Car(RED);
p.car.engine = new Engine();
p.car.horn = new AnnoyingHorn();


Person [p]
Car (red)
/           \
Engine    AnnoyingHorn


p.car = new Car(BLUE);


Person [p]
Car (blue)       Car (red)
/           \
Engine    AnnoyingHorn

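Now the red Car (along with its Engine and AnnoyingHorn) can be reclaimed. (The object p points to is referenced from the Root Set, but it is not itself part of the Root Set, because the Root Set only contains the kinds of entries described above.)

In other words, everything that can be reached directly or indirectly from the Root Set is called reachable (or a live object). StackOverflow and YourKit explain this very clearly.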
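To make the reachability rule concrete, here is a minimal sketch that replays the Person/Car example on a toy object graph. The `Node` and `Reachability` classes are hypothetical names invented for this illustration; a real JVM does not expose its object graph like this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of a heap object: a name plus its outgoing references.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class Reachability {
    // Returns every node reachable, directly or indirectly, from the roots.
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (seen.add(n)) {          // first visit: follow its references
                work.addAll(n.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node p = new Node("Person");
        Node red = new Node("Car (red)");
        red.refs.add(new Node("Engine"));
        red.refs.add(new Node("AnnoyingHorn"));
        p.refs.add(red);
        p.refs.set(0, new Node("Car (blue)"));   // p.car = new Car(BLUE)
        Set<Node> live = reachable(Collections.singletonList(p));
        System.out.println(live.contains(red));  // prints false
    }
}
```

After the reassignment only the Person and the blue Car are reachable, so the red Car and everything hanging off it are garbage.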

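3. Strong Reference, Weak Reference, Soft Reference

Strong Reference: an object that is strongly reachable is never reclaimed by the JVM; the JVM would rather throw an OutOfMemoryError.

Soft Reference: normally not reclaimed during a GC, so it is only slightly weaker than a Strong Reference. When memory is about to run out, however, softly referenced objects are reclaimed first. Soft references are a good fit for building caches, for example an image cache.

Weak Reference: an object that is only weakly reachable is reclaimed at the next GC, so it is slightly weaker than a Soft Reference. WeakHashMap uses weak references to address the memory problem of collections (as long as the collection itself is long-lived, none of its elements would otherwise be reclaimed).

The last kind is the phantom reference, which I do not understand well yet. (// Todo)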


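import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

/**
 * Created on 2016/3/17.
 * @author 王启航
 * @version 1.0
 */
public class Reference {
    public static void main(String args[]) {
        WeakReferenceTest();
        // SoftReferenceTest(); // left disabled here (an assumption): a soft reference is usually
        //                      // not cleared while memory is plentiful, so its loop may never end
    }

    static void WeakReferenceTest() {
        String s = new String("WQH");
        WeakReference<String> wr = new WeakReference<>(s);
        s = null; // drop the only strong reference
        while (wr.get() != null) {
            System.out.println("WeakReference get :" + wr.get());
            System.gc(); // only a request; the JVM may ignore it
            System.out.println("System.gc() " + wr.get());
        }
    }

    static void SoftReferenceTest() {
        String s = new String("WQH"); // must be new String(); the literal "WQH" is interned and stays strongly reachable
        SoftReference<String> wr = new SoftReference<>(s);
        s = null;
        while (wr.get() != null) {
            System.out.println("SoftReference get :" + wr.get());
            System.gc();
            System.out.println("System.gc()" + wr.get());
        }
    }
}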
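Since soft references are described here as a good fit for caches, a minimal sketch of such a cache may help. `SoftCache` and its `loader` parameter are hypothetical names for this illustration, not a JDK class; when (or whether) the JVM clears an entry depends on the collector and on memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: values are held only through SoftReference, so the JVM
// may clear them under memory pressure instead of running out of memory.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;   // recomputes a value on a miss

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {               // never loaded, or cleared by the GC
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

A caller would write something like `new SoftCache<String, byte[]>(path -> loadImage(path))`, where `loadImage` stands for whatever expensive load the cache is meant to avoid. This sketch is not thread-safe and never removes stale map entries.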



White: not alive, or not yet visited by the collector

Grey: alive and already visited by the collector, but its children are not alive or have not been visited by the collector yet



2.2 Infant mortality, or the generational hypothesis: the weak generational hypothesis, which says that most objects die young

Reference Counting

As a collection algorithm, reference counting tracks, for each object, a count of the number of references to it held by other objects. If an object’s reference count reaches zero, the object has become inaccessible, and can be destroyed.

When an object is destroyed, any objects referenced by that object also have their reference counts decreased. Because of this, removing a single reference can potentially lead to a large number of objects being freed. A common modification allows reference counting to be made incremental: instead of destroying an object as soon as its reference count becomes zero, it is added to a list of unreferenced objects, and periodically (or as needed) one or more items from this list are destroyed.

Simple reference counts require frequent updates. Whenever a reference is destroyed or overwritten, the reference count of the object it references is decremented, and whenever one is created or copied, the reference count of the object it references is incremented.

Reference counting is also used in disk operating systems and distributed systems, where full non-incremental tracing garbage collection is too time consuming because of the size of the object graph and slow access speed.


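Reference counting: for each Object, record the number of references to it held by other Objects; call this number the counter. When an Object's counter reaches 0, the Object can be destroyed by the GC. When the Object is destroyed, the counter of every Object it references is decremented by one.

As a result, an object is reclaimed the moment its counter == 0, and that can cascade into freeing many other objects at once. One remedy is to put objects with counter == 0 on a list of unreferenced objects and destroy items from that list periodically or as needed.

Reference counting is also used in disk operating systems and distributed systems, and the Zend engine of PHP uses reference counting too (I think...).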

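B is referenced by A, and D is referenced by C. So counterB == 1 and counterD == 1.

Now point A's reference at D instead: counterD == 2 and counterB == 0, so B can be reclaimed by the GC.

Now reclaim A (counterA == 0, so it can be reclaimed); then counterD == 1, because A's reference to D has been removed.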
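The A/B/C/D walk-through can be replayed in code. This is only a sketch under simple assumptions: `RcObject` is a hypothetical class, counters are updated by explicit method calls rather than by a runtime, and cycles are not handled (a known weakness of plain reference counting).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference-counted object, mirroring the A/B/C/D example.
final class RcObject {
    final String name;
    int counter = 0;          // number of references held by other objects
    boolean freed = false;
    private final List<RcObject> fields = new ArrayList<>();

    RcObject(String name) { this.name = name; }

    // Creating a reference increments the target's counter.
    void addRef(RcObject target) {
        fields.add(target);
        target.counter++;
    }

    // Destroying a reference decrements the target's counter.
    void removeRef(RcObject target) {
        if (fields.remove(target)) {
            target.release();
        }
    }

    // Overwriting a reference: count the new target first, then drop the old one.
    void replaceRef(RcObject oldTarget, RcObject newTarget) {
        addRef(newTarget);
        removeRef(oldTarget);
    }

    private void release() {
        if (--counter == 0) {
            destroy();        // counter hit zero: reclaim immediately
        }
    }

    // Freeing an object drops every reference it holds, which may cascade.
    void destroy() {
        if (freed) return;
        freed = true;
        for (RcObject child : fields) {
            child.release();
        }
        fields.clear();
    }
}
```

With `a.addRef(b); c.addRef(d); a.replaceRef(b, d);` the counters end up as counterD == 2 and counterB == 0 (B is freed), and a following `a.destroy()` brings counterD back to 1.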




The main representative is Cheney’s algorithm, the typical example of a semi-space collector.

In this moving GC scheme, memory is partitioned into a “from space” and “to space”. Initially, objects are allocated into “to space” until it becomes full and a collection is triggered. At the start of a collection, the “to space” becomes the “from space”, and vice versa. The objects reachable from the root set are copied from the “from space” to the “to space”. These objects are scanned in turn, and all objects that they point to are copied into “to space”, until all reachable objects have been copied into “to space”. Once the program continues execution, new objects are once again allocated in the “to space” until it is once again full and the process is repeated. This approach has the advantage of conceptual simplicity (the three object color sets are implicitly constructed during the copying process), but the disadvantage that a (possibly) very large contiguous region of free memory is necessarily required on every collection cycle. This technique is also known as stop-and-copy.

The algorithm needs no stack and only two pointers outside of the from-space and to-space: a pointer to the beginning of free space in the to-space, and a pointer to the next word in to-space that needs to be examined. For this reason, it’s sometimes called a “two-finger” collector — it only needs “two fingers” pointing into the to-space to keep track of its state. The data between the two fingers represents work remaining for it to do.



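The traversal starts from the root set (Cheney’s algorithm traverses breadth-first, using the to-space itself as the queue, rather than depth-first); every reachable Object is copied from one space to the other.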
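The copying scheme can be sketched on a toy heap of `int` words. Everything about the layout is an assumption made for this illustration: an object is stored as `[fieldCount, field_1 .. field_n]`, a field is an address or `NULL`, and a negative header in from-space encodes a forwarding pointer. `CheneyHeap` is a hypothetical class name.

```java
import java.util.Arrays;

// Toy semi-space heap; each object is [fieldCount, field_1 .. field_n].
final class CheneyHeap {
    static final int NULL = -1;
    private int[] fromSpace;
    private int[] toSpace;
    private int free = 0;   // first finger: start of free space in to-space

    CheneyHeap(int words) {
        fromSpace = new int[words];
        toSpace = new int[words];
    }

    // Bump-allocates an object with the given number of pointer fields.
    int allocate(int fieldCount) {
        if (free + 1 + fieldCount > toSpace.length) {
            throw new IllegalStateException("to-space is full; collect first");
        }
        int addr = free;
        toSpace[addr] = fieldCount;
        Arrays.fill(toSpace, addr + 1, addr + 1 + fieldCount, NULL);
        free += 1 + fieldCount;
        return addr;
    }

    void setField(int obj, int index, int target) { toSpace[obj + 1 + index] = target; }
    int getField(int obj, int index) { return toSpace[obj + 1 + index]; }
    int used() { return free; }

    // Copies everything reachable from the roots; roots are updated in place.
    void collect(int[] roots) {
        int[] tmp = fromSpace; fromSpace = toSpace; toSpace = tmp;   // flip
        free = 0;
        int scan = 0;       // second finger: next word that needs examining
        for (int i = 0; i < roots.length; i++) {
            roots[i] = forward(roots[i]);
        }
        while (scan < free) {   // words between the fingers are remaining work
            int fieldCount = toSpace[scan];
            for (int f = 1; f <= fieldCount; f++) {
                toSpace[scan + f] = forward(toSpace[scan + f]);
            }
            scan += 1 + fieldCount;
        }
    }

    // Copies one object unless already copied; returns its to-space address.
    private int forward(int addr) {
        if (addr == NULL) return NULL;
        int header = fromSpace[addr];
        if (header < 0) return -1 - header;   // forwarding pointer already set
        int newAddr = free;
        System.arraycopy(fromSpace, addr, toSpace, newAddr, 1 + header);
        free += 1 + header;
        fromSpace[addr] = -1 - newAddr;       // leave a forwarding pointer behind
        return newAddr;
    }
}
```

No stack or separate queue appears anywhere: the region of to-space between `scan` and `free` holds the objects that were copied but not yet scanned, which is what makes the traversal breadth-first.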




From garbage collection algorithms to Object Pool


Tracing collectors are so called because they trace through the working set of memory. These garbage collectors perform collection in cycles. A cycle is started when the collector decides (or is notified) that it needs to reclaim memory, which happens most often when the system is low on memory. The original method involves a naïve mark-and-sweep in which the entire memory set is touched several times.

In the naive mark-and-sweep method, each object in memory has a flag (typically a single bit) reserved for garbage collection use only. This flag is always cleared, except during the collection cycle. The first stage of collection does a tree traversal of the entire ‘root set’, marking each object that is pointed to as being ‘in-use’. All objects that those objects point to, and so on, are marked as well, so that every object that is ultimately pointed to from the root set is marked. Finally, all memory is scanned from start to finish, examining all free or used blocks; those with the in-use flag still cleared are not reachable by any program or data, and their memory is freed. (For objects which are marked in-use, the in-use flag is cleared again, preparing for the next cycle.)

This method has several disadvantages, the most notable being that the entire system must be suspended during collection; no mutation of the working set can be allowed. This will cause programs to ‘freeze’ periodically (and generally unpredictably), making real-time and time-critical applications impossible. In addition, the entire working memory must be examined, much of it twice, potentially causing problems in paged memory systems.
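The two stages of naive mark-and-sweep can be sketched as follows. `MarkSweepHeap` and `Obj` are hypothetical names for this illustration, and "all memory" is modelled as a plain list of allocated objects rather than real memory blocks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

final class MarkSweepHeap {
    // Hypothetical heap object carrying the single in-use flag reserved for the GC.
    static final class Obj {
        boolean inUse = false;
        final List<Obj> refs = new ArrayList<>();
    }

    private final List<Obj> allObjects = new ArrayList<>();

    Obj allocate() {
        Obj o = new Obj();
        allObjects.add(o);
        return o;
    }

    int size() { return allObjects.size(); }

    // Runs one full stop-the-world cycle and returns the number of objects freed.
    int collect(List<Obj> roots) {
        // Mark stage: flag everything ultimately pointed to from the root set.
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!o.inUse) {
                o.inUse = true;
                pending.addAll(o.refs);
            }
        }
        // Sweep stage: scan all objects; free the unmarked ones and clear
        // the flag on survivors so it is ready for the next cycle.
        int freed = 0;
        for (Iterator<Obj> it = allObjects.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.inUse) {
                o.inUse = false;
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }
}
```

Unlike reference counting, an unreachable cycle is collected here, because nothing in the cycle gets marked. The sweep still visits every object, live or dead, which is the "entire working memory must be examined" cost mentioned in the text.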










6.1 Table-based compaction

A table-based algorithm was first described by Haddon and Waite in 1967.[1] It preserves the relative placement of the live objects in the heap, and requires only a constant amount of overhead.

Compaction proceeds from the bottom of the heap (low addresses) to the top (high addresses). As live (that is, marked) objects are encountered, they are moved to the first available low address, and a record is appended to a break table of relocation information. For each live object, a record in the break table consists of the object’s original address before the compaction and the difference between the original address and the new address after compaction. The break table is stored in the heap that is being compacted, but in an area that is marked as unused. To ensure that compaction will always succeed, the minimum object size in the heap must be larger than or the same size as a break table record.

As compaction progresses, relocated objects are copied towards the bottom of the heap. Eventually an object will need to be copied to the space occupied by the break table, which now must be relocated elsewhere. These movements of the break table (called rolling the table by the authors) cause the relocation records to become disordered, requiring the break table to be sorted after the compaction is complete. The cost of sorting the break table is O(n log n), where n is the number of live objects that were found in the mark stage of the algorithm.

Finally, the break table relocation records are used to adjust pointer fields inside the relocated objects. The live objects are examined for pointers, which can be looked up in the sorted break table of size n in O(log n) time if the break table is sorted, for a total running time of O(n log n). Pointers are then adjusted by the amount specified in the relocation table.


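First, a pointer sweeps the heap from the bottom (low addresses) to the top (high addresses).

During the sweep, whenever a live object is encountered, it is moved towards the bottom of the heap, and a record is added to a break table kept inside the same heap. Each record stores two things: the object's original address, and the difference between its original address and its address after the move.

In the end all live objects have been moved to the bottom. After the move, every pointer from the stack or from the heap into the heap is stale, so all of them have to be relocated to get the final result, and relocating means looking each pointer up in the break table. (Note: because the break table is stored in the heap, the minimum object size in the heap must be larger than or the same size as a break table record.)

One last note: the time complexity of the algorithm is O(n log n), where n is the number of live objects that were found in the mark stage of the algorithm. Where does that come from? Rolling leaves the break table disordered, so it has to be sorted, and sorting n records costs O(n log n); each pointer lookup in the sorted table then costs O(log n).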
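The sort-then-look-up step is where the O(n log n) comes from, and it can be sketched in isolation. `BreakTable` is a hypothetical class; each record is assumed to be an `{originalAddress, delta}` pair with delta = original address minus new address, and pointers are assumed to point at the start of a live object. Moving the objects and rolling the table are left out.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical break table used only for the pointer-adjustment step.
final class BreakTable {
    private final int[] originalAddress;   // sorted ascending
    private final int[] delta;             // original address minus new address

    // records may arrive disordered (as after rolling); each is {original, delta}.
    BreakTable(int[][] records) {
        int[][] sorted = records.clone();
        Arrays.sort(sorted, Comparator.comparingInt((int[] r) -> r[0]));   // O(n log n)
        originalAddress = new int[sorted.length];
        delta = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            originalAddress[i] = sorted[i][0];
            delta[i] = sorted[i][1];
        }
    }

    // Adjusts one stale pointer with a binary search: O(log n) per lookup.
    int relocate(int pointer) {
        int i = Arrays.binarySearch(originalAddress, pointer);
        if (i < 0) {
            throw new IllegalArgumentException("not the start of a live object: " + pointer);
        }
        return pointer - delta[i];
    }
}
```

For example, with records `{40, 25}`, `{10, 10}` and `{25, 20}`, a pointer to the object that used to live at address 25 is rewritten to 5.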

6.2 LISP2 Algorithm

In order to avoid O(n log n) complexity, the LISP2 algorithm uses 3 different passes over the heap. In addition, heap objects must have a separate forwarding pointer slot that is not used outside of garbage collection.

After standard marking, the algorithm proceeds in the following 3 passes:

1. Compute the forwarding location for live objects. Keep track of a free and live pointer and initialize both to the start of heap. If the live pointer points to a live object, update that object’s forwarding pointer to the current free pointer and increment the free pointer according to the object’s size. Move the live pointer to the next object. End when the live pointer reaches the end of heap.

2. Update all pointers. For each live object, update its pointers according to the forwarding pointers of the objects they point to.

3. Move objects. For each live object, move its data to its forwarding location.

This algorithm is O(n) on the size of the heap; it has a better complexity than the table-based approach, but the table-based approach’s n is the size of the used space only, not the entire heap space as in the LISP2 algorithm. However, the LISP2 algorithm is simpler to implement.
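The three passes can be sketched on a toy heap of `int` words. The object layout is an assumption made for this illustration: `[size, mark, forwarding, pointerCount, pointers..., payload...]`, where size is the total number of words. `Lisp2Compactor` and its `put` helper are hypothetical names, marking is assumed to have happened already, and roots are updated together with the object fields in pass 2.

```java
// Toy LISP2 compactor over an int[] heap with the layout described above.
final class Lisp2Compactor {
    static final int NULL = -1;
    static final int SIZE = 0, MARK = 1, FORWARD = 2, NPTR = 3, HEADER = 4;

    // Writes an object at addr and returns the address just past it.
    static int put(int[] heap, int addr, int size, boolean marked, int... pointers) {
        heap[addr + SIZE] = size;
        heap[addr + MARK] = marked ? 1 : 0;
        heap[addr + FORWARD] = NULL;
        heap[addr + NPTR] = pointers.length;
        System.arraycopy(pointers, 0, heap, addr + HEADER, pointers.length);
        return addr + size;
    }

    // Compacts heap[0..top) in place and returns the new top of the heap.
    static int compact(int[] heap, int top, int[] roots) {
        // Pass 1: compute the forwarding location of every live object.
        int free = 0;
        for (int live = 0; live < top; live += heap[live + SIZE]) {
            if (heap[live + MARK] == 1) {
                heap[live + FORWARD] = free;
                free += heap[live + SIZE];
            }
        }
        // Pass 2: update roots and the pointer fields of live objects.
        for (int i = 0; i < roots.length; i++) {
            if (roots[i] != NULL) roots[i] = heap[roots[i] + FORWARD];
        }
        for (int obj = 0; obj < top; obj += heap[obj + SIZE]) {
            if (heap[obj + MARK] != 1) continue;
            for (int f = 0; f < heap[obj + NPTR]; f++) {
                int target = heap[obj + HEADER + f];
                if (target != NULL) heap[obj + HEADER + f] = heap[target + FORWARD];
            }
        }
        // Pass 3: move objects in address order; destinations never lie above
        // the source, so headers that are still unread are not overwritten.
        for (int obj = 0; obj < top; ) {
            int size = heap[obj + SIZE];       // read before the move
            if (heap[obj + MARK] == 1) {
                int dest = heap[obj + FORWARD];
                System.arraycopy(heap, obj, heap, dest, size);
                heap[dest + MARK] = 0;         // clear the mark for the next cycle
            }
            obj += size;
        }
        return free;
    }
}
```

Each pass walks the whole heap, dead objects included, which is why the cost is linear in the heap size rather than in the number of live objects.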











Moving vs. non-moving




Stop-the-world vs. concurrent

