
HashMap Implementation Analysis: resize() in Detail

2017-06-29 14:29

Why does resize() exist?

Before looking at the resize() method itself, it helps to understand why Java needs it and what it does. We usually take for granted that HashMap's get runs in O(1). Suppose the initial table size is 16 and the load factor is 0.75. If far more entries are stored (say 256), then under separate chaining the linked list at each bucket index grows long (and eventually triggers the red-black-tree conversion shown in the code from the previous section). Searching a singly linked list is O(n), nowhere near O(1). In HashMap's case, the way to optimize that O(n) is to keep n small, i.e. to make bucket-index collisions rare. That means enlarging the hash table so that the bucket indexes computed from the keys' hash codes spread out evenly, which is exactly what Java's resize() method does.
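To make the "spreading" concrete, here is a small standalone sketch (the class and method names are mine, not HashMap source) of the masking trick a power-of-two table allows: `hash & (n - 1)` selects a bucket, and a larger n keeps more bits of the hash, so entries that collided in a small table can land in distinct buckets of a bigger one.

```java
// Hypothetical demo, not JDK code: how a power-of-two-sized table
// maps a hash to a bucket index, and how doubling spreads entries.
public class BucketIndexDemo {
    // The same masking trick HashMap uses; it is only equivalent to
    // (hash % capacity) when capacity is a power of two.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int h = 0x5A; // 90, an arbitrary example hash
        // With 16 buckets only the low 4 bits matter: 90 & 15 = 10.
        System.out.println(indexFor(h, 16));
        // With 32 buckets one more bit participates: 90 & 31 = 26,
        // i.e. the old index plus the old capacity (10 + 16).
        System.out.println(indexFor(h, 32));
    }
}
```

Note that a key whose relevant extra bit is 0 would have stayed at index 10; resize() exploits exactly this property, as shown later.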

*When hash collisions pile up, Java 8 replaces the bucket's linked list with a red-black tree (via TreeNode, not a TreeMap), improving lookup within that bucket from O(n) to O(log n). The threshold is 8: once a bucket's list reaches length 8, it becomes a candidate for treeification.

Let's first look at HashMap's fields, ignoring the tree-related parts for now.

/**
 * The default initial capacity - MUST be a power of two.
 * The bin count should be neither too small nor too large: too small
 * and resizes are triggered easily; too large and iterating the table
 * becomes slow.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 * The maximum capacity is 2^30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 * The default load factor is 0.75.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 * If the hash function is poor, even resizing may not shorten the
 * per-bin lists, so Java's fallback is to convert an over-long list
 * into a red-black tree. This value means that once a bin's list
 * grows beyond 8 nodes, it may be treeified.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 * During a resize, a tree bin that shrinks to 6 nodes or fewer is
 * converted back into a linked list.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 * Before treeifying there is one more check: the table capacity must
 * be at least 64, otherwise the table is resized instead. This avoids
 * needless conversions early in a table's life, when several keys may
 * happen to land in the same bucket.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 * The hash table itself: the array of buckets.
 */
transient Node<K,V>[] table;

/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
transient Set<Map.Entry<K,V>> entrySet;

/**
* The number of key-value mappings contained in this map.
*/
transient int size;

/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash).  This field is used to make iterators on Collection-views of
* the HashMap fail-fast.  (See ConcurrentModificationException).
*/
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold; // the current resize threshold of the table
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor; // the load factor in use
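To tie these fields together, here is a tiny standalone sketch (the class and method are my own, not JDK code) of the threshold arithmetic: the table resizes once size exceeds capacity * loadFactor.

```java
// Hypothetical demo of HashMap's threshold arithmetic:
// threshold = capacity * loadFactor (truncated to int).
public class ThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default map: capacity 16, load factor 0.75 -> threshold 12,
        // so the 13th put triggers a resize to capacity 32.
        System.out.println(threshold(16, 0.75f));
        // After that resize the threshold doubles to 24.
        System.out.println(threshold(32, 0.75f));
    }
}
```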

The code of the resize() method is as follows:

/**
* Initializes or doubles table size.  If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;  // the old table; null on the first put after construction
    int oldCap = (oldTab == null) ? 0 : oldTab.length; // old table length
    int oldThr = threshold;  // old resize threshold
    int newCap, newThr = 0;  // the new values to compute
    if (oldCap > 0) { // the table already exists
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Already at the 2^30 cap: pin the threshold at
            // Integer.MAX_VALUE and return the old table unchanged,
            // performing no further resizing.
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // The doubled capacity stays below the maximum and the old
            // capacity is at least the default 16:
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // The first put after the no-arg constructor lands here:
        // initial capacity 16, initial threshold 12.
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr; // install the newly computed threshold
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // allocate a new table of the new size
    table = newTab; // publish the new table to the field
    if (oldTab != null) {
        // The old table holds data: move every entry over, recomputing
        // its bucket index against the new capacity.
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null) // a lone node: place it directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // a tree bin: split it
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Split the list into a "lo" list that stays at index j
                    // and a "hi" list that moves to index j + oldCap.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
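The lo/hi split above relies on an invariant worth checking for yourself: when the capacity doubles, `hash & (newCap - 1)` equals either the old index or the old index plus oldCap, and the single bit `hash & oldCap` decides which. A standalone sketch (my own code, not JDK source) that verifies this over random hashes:

```java
import java.util.Random;

// Checks the invariant resize() exploits: when capacity doubles from
// oldCap to 2*oldCap, an entry either stays at its old index (lo list)
// or moves exactly oldCap slots up (hi list).
public class SplitInvariantDemo {
    static boolean invariantHolds(int hash, int oldCap) {
        int newCap = oldCap << 1;
        int oldIdx = hash & (oldCap - 1);
        int newIdx = hash & (newCap - 1);
        // The bit tested by resize() decides lo vs. hi placement.
        int expected = ((hash & oldCap) == 0) ? oldIdx : oldIdx + oldCap;
        return newIdx == expected;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < 1_000; i++) {
            if (!invariantHolds(rnd.nextInt(), 16))
                throw new AssertionError("invariant broken");
        }
        System.out.println("invariant holds for 1000 random hashes");
    }
}
```

This is also why resize() never needs to recompute any hash: one bit test routes each node, and the "preserve order" branch can rebuild both lists in a single pass.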

Summary:

1. On resize, HashMap replaces the old array with a new one, recomputes each element's index from its hash, and re-places every entry; resize is therefore an expensive operation.

2. Each resize doubles the length of the hash table.

3. Once the table length reaches 2^30 (MAXIMUM_CAPACITY), the array is not grown any further; the threshold is simply set to Integer.MAX_VALUE.

4. When collisions pile up and a bucket's list reaches length 8, the list is converted to a red-black tree (a Java 8 optimization).

Optimizing HashMap

The code analysis above shows that a Map implementation resizes itself so it can handle any number of entries efficiently, but resizing is expensive: every element must be reinserted into the new array, because a different array size maps objects to different indexes. Keys that previously collided may no longer collide, and keys that did not collide before may now do so. It follows that if the Map is sized large enough up front, resizing can be reduced or avoided entirely, which can significantly improve performance.

How can we improve performance?

1. When you are going to create a fairly large HashMap, take advantage of the two-argument constructor:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param  initialCapacity the initial capacity
* @param  loadFactor      the load factor
* @throws IllegalArgumentException if the initial capacity is negative
*         or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor)

Here initialCapacity is the initial capacity and loadFactor the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at creation time. The load factor is a measure of how full the table may get before its capacity is automatically increased: when the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed into an array of twice the size.

You should avoid making a HashMap rehash repeatedly, since resizing is a costly operation. By default initialCapacity is only 16 and loadFactor is 0.75, so if you know roughly how much capacity you need, it is best to estimate the optimal size up front. The same reasoning applies to Hashtable and Vector.