HashMap Implementation Analysis: resize() in Detail
2017-06-29 14:29
Why does resize() exist?
Before walking through resize(), it helps to understand why Java needs it and what it does. We usually take for granted that HashMap.get() runs in O(1). But suppose the table starts at its default 16 buckets with a load factor of 0.75, and far more entries arrive (say 256): with separate chaining, the linked list at each bucket index grows long (eventually triggering the red-black-tree conversion shown in the code of the previous section), and searching a linked list costs O(n), nowhere near O(1). In HashMap's setting, the way to tame that O(n) is to keep n small, that is, to make bucket-index collisions rare. That requires enlarging the hash table so that the bucket indices computed from the keys' hash codes spread out evenly, and resize() is how Java grows the table.
*When collisions pile up, Java 8 replaces the bin's linked list with a red-black tree, improving lookup within that bin to O(log n). The threshold is 8: when a bin's list reaches 8 nodes, it becomes a candidate for conversion to a tree.
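To make the "spread bucket indices evenly" idea concrete, here is a small standalone sketch of how a power-of-two table turns a hash code into a bucket index. The helper names are mine, not the JDK's, though the bit-spreading trick mirrors what HashMap.hash() does:

```java
public class BucketIndexDemo {
    // Same spreading trick HashMap.hash() uses: XOR the high 16 bits
    // of hashCode() into the low 16, so high bits also influence the index.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // With a power-of-two capacity, (capacity - 1) is an all-ones bit mask,
    // so this AND is a cheap equivalent of hash % capacity.
    static int bucketIndex(Object key, int capacity) {
        return hash(key) & (capacity - 1);
    }

    public static void main(String[] args) {
        System.out.println("capacity 16 -> bucket " + bucketIndex("resize", 16));
        System.out.println("capacity 32 -> bucket " + bucketIndex("resize", 32));
    }
}
```

Doubling the capacity adds one more bit to the mask, which is exactly why growing the table spreads colliding keys apart.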
Let's first look at HashMap's member fields (ignoring the tree-related parts for now).
/**
 * The default initial capacity - MUST be a power of two.
 * The number of bins should be neither too small nor too large: too few
 * bins and resizing triggers easily; too many and traversing the table
 * becomes slow.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 * If the hash function is poor, even resizing cannot shorten the
 * per-bin lists, so Java's fallback is to convert a long list into a
 * red-black tree: a bin whose list grows past 8 nodes may be treeified.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 * During a resize, a tree bin that drops below 6 nodes degrades back
 * into a linked list.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 * One more check happens before treeification: the table capacity must
 * be at least 64, otherwise the table is resized instead. This avoids
 * needless conversions early in the table's life, when several mappings
 * happen to land in the same bin.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 * This is the hash table itself: the array of buckets.
 */
transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified.
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash). This field is used to make iterators on Collection-views of
 * the HashMap fail-fast. (See ConcurrentModificationException).
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold; // the current resize threshold

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor; // the load factor in effect for this map
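The relationship between capacity, load factor, and threshold is simple enough to tabulate. This is a standalone illustration (the helper name is mine), reproducing the arithmetic that resize() performs:

```java
public class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to int, as resize() computes it
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the default load factor, the put that pushes size past the
        // threshold triggers a resize; each doubling doubles the threshold too.
        for (int cap = 16; cap <= 128; cap <<= 1)
            System.out.println("capacity " + cap + " -> threshold "
                    + threshold(cap, 0.75f));
    }
}
```

So a default HashMap (capacity 16, threshold 12) resizes on the 13th insertion, then again at 24 entries, 48, and so on.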
The code of resize() is as follows:
/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table; // the old table; null on the first put after construction
    int oldCap = (oldTab == null) ? 0 : oldTab.length; // old table length
    int oldThr = threshold;     // the current resize threshold
    int newCap, newThr = 0;     // the new values to be computed
    if (oldCap > 0) {           // the table already exists
        if (oldCap >= MAXIMUM_CAPACITY) {
            // already at maximum capacity: raise the threshold to
            // Integer.MAX_VALUE and return the old table unchanged,
            // i.e. no further expansion ever happens
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // doubling stays below the maximum and the old capacity is at
            // least the default 16, so double the threshold as well
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // the first put after the no-arg constructor lands here:
        // capacity 16, threshold 0.75 * 16 = 12
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr; // publish the newly computed threshold
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // allocate a table of the new size
    table = newTab; // install the new table
    if (oldTab != null) {
        // the old table held data: recompute each node's bucket index
        // against the new length and place it into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // split the list in two: "lo" nodes keep index j,
                    // "hi" nodes move to index j + oldCap
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
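The lo/hi split above relies on a property of power-of-two expansion: a node at old index j either stays at j or moves to exactly j + oldCap, decided by the single hash bit that (e.hash & oldCap) tests. A small standalone check (helper name mine) confirms that this shortcut agrees with recomputing the index from scratch:

```java
public class ResizeSplitDemo {
    // Mirrors the split in resize(): one bit of the hash decides whether
    // a node keeps its bucket or moves exactly oldCap slots up.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = oldCap << 1;
        int[] hashes = {5, 21, 37, 53}; // all collide in bucket 5 at capacity 16
        for (int h : hashes) {
            // the predicted index must match a full recomputation against newCap
            assert newIndex(h, oldCap) == (h & (newCap - 1));
            System.out.println("hash " + h + ": bucket 5 -> bucket "
                    + newIndex(h, oldCap));
        }
    }
}
```

This is also why resize() never needs to call the hash function again, and why a colliding list splits into at most two shorter lists.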
Summary:
1. On resize, HashMap replaces the old array with a new one, recomputes each element's index from its hash, and reinserts every node; resize is an expensive operation.
2. Each resize makes the new table twice the length of the old one.
3. Once the table length reaches 2^30 (MAXIMUM_CAPACITY), the array is never grown again; the threshold is simply set to Integer.MAX_VALUE.
4. When collisions are frequent and a bin's list reaches 8 nodes, the list is converted to a red-black tree (a Java 8 optimization).
Optimizing HashMap
The code analysis above shows that a Map implementation resizes itself so it can handle any number of entries, but resizing is costly: every element must be reinserted into the new array, because a different array length maps objects to different indices. Keys that previously collided may no longer collide, and keys that previously did not collide may now collide. It follows that if the Map is sized generously enough up front, resizing can be reduced or avoided altogether, which is likely to improve speed noticeably. So how do we get that performance?
1. When creating a large HashMap, take advantage of the two-argument constructor:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor)

Here the capacity is the number of buckets in the hash table, and initialCapacity is simply that capacity at creation time. The load factor measures how full the table may become before it grows: once the number of entries exceeds the load factor times the current capacity, the table is rehashed into one of twice the size.
You should avoid forcing a HashMap through repeated rehashes, since expansion is expensive. The defaults give an initialCapacity of only 16 with a loadFactor of 0.75, so if you know roughly how much capacity you need, estimate the best size as accurately as you can. The same reasoning applies to Hashtable and to Vector.
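Putting that advice into practice: the sketch below (helper name mine) presizes a HashMap so that the expected number of entries stays under the threshold, which avoids every intermediate resize from 16 upward:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // A capacity large enough that `expected` entries stay under
    // the threshold (capacity * loadFactor), so no resize is needed.
    static int capacityFor(int expected, float loadFactor) {
        return (int) (expected / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 10_000;
        Map<Integer, Integer> map =
                new HashMap<>(capacityFor(expected, 0.75f), 0.75f);
        for (int i = 0; i < expected; i++)
            map.put(i, i); // no intermediate rehashes along the way
        System.out.println(map.size()); // 10000
    }
}
```

Note that HashMap rounds the requested capacity up to the next power of two internally, so the estimate does not need to be exact, only not too small.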