
HashMap Implementation Analysis: resize() in Detail

2017-06-29 14:29

Why does resize() exist?

Before looking at the resize() method itself, it helps to understand why Java needs it and what it does. We usually take for granted that HashMap's get runs in O(1). Suppose the initial table size is 16 and the load factor is 0.75. If far more entries are stored (say 256), then under separate chaining the linked list at each bucket index grows long (and eventually triggers the red-black-tree conversion shown in the code from the previous section). Searching a singly linked list is O(n), nowhere near O(1). In HashMap's case, the way to optimize that O(n) is to keep n small, i.e. to make bucket-index collisions rare. That means enlarging the hash table so that the bucket indexes computed from the keys' hash codes spread out evenly, which is exactly what Java's resize() method does.
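To make the "spreading" concrete, here is a small standalone sketch (the class and method names are mine, not HashMap source) of the masking trick a power-of-two table allows: `hash & (n - 1)` selects a bucket, and a larger n keeps more bits of the hash, so entries that collided in a small table can land in distinct buckets of a bigger one.

```java
// Hypothetical demo, not JDK code: how a power-of-two-sized table
// maps a hash to a bucket index, and how doubling spreads entries.
public class BucketIndexDemo {
    // The same masking trick HashMap uses; it is only equivalent to
    // (hash % capacity) when capacity is a power of two.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int h = 0x5A; // 90, an arbitrary example hash
        // With 16 buckets only the low 4 bits matter: 90 & 15 = 10.
        System.out.println(indexFor(h, 16));
        // With 32 buckets one more bit participates: 90 & 31 = 26,
        // i.e. the old index plus the old capacity (10 + 16).
        System.out.println(indexFor(h, 32));
    }
}
```

Note that a key whose relevant extra bit is 0 would have stayed at index 10; resize() exploits exactly this property, as shown later.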

*When hash collisions pile up, Java 8 replaces the bucket's linked list with a red-black tree (via TreeNode, not a TreeMap), improving lookup within that bucket from O(n) to O(log n). The threshold is 8: once a bucket's list reaches length 8, it becomes a candidate for treeification.

Let's first look at HashMap's fields, ignoring the tree-related parts for now.

/**
 * The default initial capacity - MUST be a power of two.
 * The bin count should be neither too small nor too large: too small
 * and resizes are triggered easily; too large and iterating the table
 * becomes slow.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 * The maximum capacity is 2^30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 * The default load factor is 0.75.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 * If the hash function is poor, even resizing may not shorten the
 * per-bin lists, so Java's fallback is to convert an over-long list
 * into a red-black tree. This value means that once a bin's list
 * grows beyond 8 nodes, it may be treeified.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 * During a resize, a tree bin that shrinks to 6 nodes or fewer is
 * converted back into a linked list.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 * Before treeifying there is one more check: the table capacity must
 * be at least 64, otherwise the table is resized instead. This avoids
 * needless conversions early in a table's life, when several keys may
 * happen to land in the same bucket.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 * The hash table itself: the array of buckets.
 */
transient Node<K,V>[] table;

/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
transient Set<Map.Entry<K,V>> entrySet;

/**
* The number of key-value mappings contained in this map.
*/
transient int size;

/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash).  This field is used to make iterators on Collection-views of
* the HashMap fail-fast.  (See ConcurrentModificationException).
*/
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold; // the current resize threshold of the table
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor; // the load factor in use
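To tie these fields together, here is a tiny standalone sketch (the class and method are my own, not JDK code) of the threshold arithmetic: the table resizes once size exceeds capacity * loadFactor.

```java
// Hypothetical demo of HashMap's threshold arithmetic:
// threshold = capacity * loadFactor (truncated to int).
public class ThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default map: capacity 16, load factor 0.75 -> threshold 12,
        // so the 13th put triggers a resize to capacity 32.
        System.out.println(threshold(16, 0.75f));
        // After that resize the threshold doubles to 24.
        System.out.println(threshold(32, 0.75f));
    }
}
```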

The code of the resize() method is as follows:

/**
* Initializes or doubles table size.  If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;  // the old table; null on the first put after construction
    int oldCap = (oldTab == null) ? 0 : oldTab.length; // old table length
    int oldThr = threshold;  // old resize threshold
    int newCap, newThr = 0;  // the new values to compute
    if (oldCap > 0) { // the table already exists
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Already at the 2^30 cap: pin the threshold at
            // Integer.MAX_VALUE and return the old table unchanged,
            // performing no further resizing.
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // The doubled capacity stays below the maximum and the old
            // capacity is at least the default 16:
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // The first put after the no-arg constructor lands here:
        // initial capacity 16, initial threshold 12.
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr; // install the newly computed threshold
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // allocate a new table of the new size
    table = newTab; // publish the new table to the field
    if (oldTab != null) {
        // The old table holds data: move every entry over, recomputing
        // its bucket index against the new capacity.
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null) // a lone node: place it directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // a tree bin: split it
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Split the list into a "lo" list that stays at index j
                    // and a "hi" list that moves to index j + oldCap.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
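The lo/hi split above relies on an invariant worth checking for yourself: when the capacity doubles, `hash & (newCap - 1)` equals either the old index or the old index plus oldCap, and the single bit `hash & oldCap` decides which. A standalone sketch (my own code, not JDK source) that verifies this over random hashes:

```java
import java.util.Random;

// Checks the invariant resize() exploits: when capacity doubles from
// oldCap to 2*oldCap, an entry either stays at its old index (lo list)
// or moves exactly oldCap slots up (hi list).
public class SplitInvariantDemo {
    static boolean invariantHolds(int hash, int oldCap) {
        int newCap = oldCap << 1;
        int oldIdx = hash & (oldCap - 1);
        int newIdx = hash & (newCap - 1);
        // The bit tested by resize() decides lo vs. hi placement.
        int expected = ((hash & oldCap) == 0) ? oldIdx : oldIdx + oldCap;
        return newIdx == expected;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed for reproducibility
        for (int i = 0; i < 1_000; i++) {
            if (!invariantHolds(rnd.nextInt(), 16))
                throw new AssertionError("invariant broken");
        }
        System.out.println("invariant holds for 1000 random hashes");
    }
}
```

This is also why resize() never needs to recompute any hash: one bit test routes each node, and the "preserve order" branch can rebuild both lists in a single pass.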

Summary:

1. On resize, HashMap replaces the old array with a new one, recomputes each element's index from its hash, and re-places every entry; resize is therefore an expensive operation.

2. Each resize doubles the length of the hash table.

3. Once the table length reaches 2^30 (MAXIMUM_CAPACITY), the array is not grown any further; the threshold is simply set to Integer.MAX_VALUE.

4. When collisions pile up and a bucket's list reaches length 8, the list is converted to a red-black tree (a Java 8 optimization).

Optimizing HashMap

The code analysis above shows that a Map implementation resizes itself so it can handle any number of entries efficiently, but resizing is expensive: every element must be reinserted into the new array, because a different array size maps objects to different indexes. Keys that previously collided may no longer collide, and keys that did not collide before may now do so. It follows that if the Map is sized large enough up front, resizing can be reduced or avoided entirely, which can significantly improve performance.

How can we improve performance?

1. When you are going to create a fairly large HashMap, take advantage of the two-argument constructor:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param  initialCapacity the initial capacity
* @param  loadFactor      the load factor
* @throws IllegalArgumentException if the initial capacity is negative
*         or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor)

Here initialCapacity is the initial capacity and loadFactor the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at creation time. The load factor is a measure of how full the table may get before its capacity is automatically increased: when the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed into an array of twice the size.

You should avoid making a HashMap rehash repeatedly, since resizing is a costly operation. By default initialCapacity is only 16 and loadFactor is 0.75, so if you know roughly how much capacity you need, it is best to estimate the optimal size up front. The same reasoning applies to Hashtable and Vector.