HashSet Source Code Analysis (Based on JDK 1.8)
2016-06-24 16:40
The source code of HashSet is actually quite simple, but it contains a few points that I think are worth discussing together.
HashSet Is Backed by HashMap
As we know, the keys in a HashMap cannot be "duplicated" (whether two keys count as duplicates is decided by hashCode() and equals(), which is a point worth examining). HashSet borrows exactly this property of HashMap keys to build a collection that cannot contain duplicate elements.

Source Code Analysis
Since the source is not long, I have posted it together with its comments (the lengthy JDK Javadoc is condensed to its key points):

```java
package java.util;

import java.io.InvalidObjectException;

/**
 * This class implements the Set interface, backed by a hash table
 * (actually a HashMap instance). It makes no guarantees as to the
 * iteration order of the set; in particular, it does not guarantee
 * that the order will remain constant over time. This class permits
 * the null element.
 *
 * This implementation offers constant-time performance for the basic
 * operations (add, remove, contains and size), assuming the hash
 * function disperses the elements properly among the buckets.
 *
 * Note that this implementation is not synchronized. If multiple
 * threads access a hash set concurrently, and at least one of the
 * threads modifies the set, it must be synchronized externally, best
 * done at creation time:
 *   Set s = Collections.synchronizedSet(new HashSet(...));
 *
 * The iterators returned by this class's iterator method are
 * fail-fast: if the set is modified after the iterator is created,
 * in any way except through the iterator's own remove method, the
 * iterator throws a ConcurrentModificationException. Fail-fast
 * behavior is best-effort only and should be used solely to detect
 * bugs, never for program correctness.
 *
 * @author Josh Bloch
 * @author Neal Gafter
 * @since 1.2
 */
public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    private transient HashMap<E,Object> map; // the backing HashMap

    // Dummy value to associate with an Object in the backing Map.
    // PRESENT is a placeholder value that lets a HashMap act as a HashSet.
    private static final Object PRESENT = new Object();

    // Note HashMap's default initial capacity (16) and load factor (0.75).
    public HashSet() {
        map = new HashMap<>();
    }

    // Sized so that the given collection fits without an immediate resize;
    // throws NullPointerException if the specified collection is null.
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }

    public HashSet(int initialCapacity) {
        map = new HashMap<>(initialCapacity);
    }

    // Package-private constructor used only by LinkedHashSet; the dummy
    // parameter distinguishes it from the (int, float) constructor.
    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<>(initialCapacity, loadFactor);
    }

    // Returns an iterator over the elements, in no particular order.
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

    public int size() {
        return map.size();
    }

    public boolean isEmpty() {
        return map.isEmpty();
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    // Adds e if not already present; returns false (leaving the set
    // unchanged) if the set already contained an equal element.
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

    // Returns true if the set contained the specified element.
    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }

    public void clear() {
        map.clear();
    }

    // Shallow copy: the elements themselves are not cloned.
    @SuppressWarnings("unchecked")
    public Object clone() {
        try {
            HashSet<E> newSet = (HashSet<E>) super.clone();
            newSet.map = (HashMap<E, Object>) map.clone();
            return newSet;
        } catch (CloneNotSupportedException e) {
            throw new InternalError(e);
        }
    }

    // Serialization is supported as well: the backing map's capacity and
    // load factor are written, then the size, then each element in turn.
    private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException {
        // Write out any hidden serialization magic
        s.defaultWriteObject();

        // Write out HashMap capacity and load factor
        s.writeInt(map.capacity());
        s.writeFloat(map.loadFactor());

        // Write out size
        s.writeInt(map.size());

        // Write out all elements in the proper order.
        for (E e : map.keySet())
            s.writeObject(e);
    }

    // Reconstitute the HashSet instance from a stream (deserialize it).
    private void readObject(java.io.ObjectInputStream s)
        throws java.io.IOException, ClassNotFoundException {
        // Read in any hidden serialization magic
        s.defaultReadObject();

        // Read capacity and verify non-negative.
        int capacity = s.readInt();
        if (capacity < 0) {
            throw new InvalidObjectException("Illegal capacity: " + capacity);
        }

        // Read load factor and verify positive and non-NaN.
        float loadFactor = s.readFloat();
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new InvalidObjectException("Illegal load factor: " + loadFactor);
        }

        // Read size and verify non-negative.
        int size = s.readInt();
        if (size < 0) {
            throw new InvalidObjectException("Illegal size: " + size);
        }

        // Set the capacity according to the size and load factor, ensuring
        // the HashMap is at least 25% full but clamping to maximum capacity.
        capacity = (int) Math.min(size * Math.min(1 / loadFactor, 4.0f),
                HashMap.MAXIMUM_CAPACITY);

        // Create backing HashMap
        map = (((HashSet<?>)this) instanceof LinkedHashSet ?
               new LinkedHashMap<E,Object>(capacity, loadFactor) :
               new HashMap<E,Object>(capacity, loadFactor));

        // Read in all elements in the proper order.
        for (int i=0; i<size; i++) {
            @SuppressWarnings("unchecked")
            E e = (E) s.readObject();
            map.put(e, PRESENT);
        }
    }

    // Creates a late-binding, fail-fast Spliterator that reports
    // Spliterator.SIZED and Spliterator.DISTINCT (since 1.8).
    public Spliterator<E> spliterator() {
        return new HashMap.KeySpliterator<E,Object>(map, 0, -1, 0, 0);
    }
}
```
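To see the PRESENT trick in action, here is a minimal sketch (the class name HashSetAddDemo is mine; everything else is the standard library) showing how add() delegates to the backing map's put():

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetAddDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        // First insertion: map.put("a", PRESENT) returns null, so add() returns true.
        System.out.println(set.add("a")); // true
        // Second insertion of an equal element: put() returns the old PRESENT
        // value (non-null), so add() returns false and the set is unchanged.
        System.out.println(set.add("a")); // false
        System.out.println(set.size());   // 1
    }
}
```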
Noteworthy Methods in HashSet
Since a HashSet cannot contain duplicate elements, let us focus on the add() method. From the source above we can see that add() actually calls the backing map's put() method, and put() in turn calls putVal(), so below I post HashMap's putVal() method for a closer look.
```java
// Insertion falls into three cases:
// 1. the target bucket is empty: store the node directly;
// 2. the bucket holds a short chain that still qualifies for linked-list
//    storage: insert with linked-list operations;
// 3. the bucket has been treeified: insert with red-black-tree operations.
// Note that the hash argument below is computed from the key.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        // 1. If the HashMap has not been initialized yet, initialize it.
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        // The slot at the computed index is null: handle that case directly.
        tab[i] = newNode(hash, key, value, null);
    else {
        // Below handles the case where the slot is occupied.
        Node<K,V> e; K k;
        // p was assigned above: p = tab[i = (n - 1) & hash] is the node
        // already stored at the key's index, and p.hash was computed from
        // that node's key.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Note the condition carefully: the hashes must match, AND either
            // the key references are identical ("==" compares addresses) or
            // key.equals(k) returns true; only then do we get e = p.
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Below handles the case where a mapping for the key already exists.
        if (e != null) { // existing mapping for key
            V oldValue = e.value; // hand e's value to oldValue
            // In a HashSet, the existing e.value is PRESENT. The if below
            // runs and re-assigns PRESENT, but putVal() returns the old
            // (non-null) value, which is exactly why HashSet.add() returns
            // false for a duplicate element.
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
Now the analysis of HashMap's put action (that is, HashSet's add action) is complete. Many readers may still find it fuzzy, so let me tie it to some concrete situations.
1. What is required of the key object?
Answer: For the key's class we generally need to override both hashCode() and equals(), because Java requires that if two objects compare equal via equals(), their hashCode() methods must return the same hash value.
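A minimal sketch of that contract (the Point class here is hypothetical, purely for illustration): hashCode() is derived from the same fields that equals() compares, so equal objects are guaranteed to hash alike.

```java
import java.util.Objects;

// Hypothetical key class that honors the equals()/hashCode() contract.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Derived from the same fields used by equals(),
        // so equal objects return the same hash value.
        return Objects.hash(x, y);
    }
}
```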
2. If we add a key and value when the collection already contains that key, what exactly happens?
Answer: If the key already exists, execution reaches the line "e = p;" in the putVal() source above, then falls through to the branch handling an existing mapping, where the previous value is overwritten.
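The overwrite behavior is easy to observe directly on a HashMap (PutOverwriteDemo is a hypothetical class name; the calls are plain JDK API): put() returns null for a fresh key, and returns the displaced old value when the key already existed.

```java
import java.util.HashMap;
import java.util.Map;

public class PutOverwriteDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1)); // null: the key was absent
        System.out.println(map.put("k", 2)); // 1: old value returned, new value stored
        System.out.println(map.get("k"));    // 2
    }
}
```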
3. What happens if equals() returns true for two objects but their hashCode() results differ?
Answer: As the source above shows, for a given key the program first computes its hash value. Here is the hash computation from the source:
```java
static final int hash(Object key) {
    int h;
    // Spread the high bits of hashCode() into the low bits via XOR.
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
Clearly this value depends on hashCode(). Normally, two objects that compare equal via equals() are treated as the same object. But if their hashCode() methods return different values, hash() returns different values for them as well. So once A1 has been stored in the HashMap, putting A2 (which equals A1) will not locate A1 via a matching hash; instead it probes a different position. If that position happens to be empty, the insertion succeeds, and we end up with two "equal" keys both stored in the same HashMap.
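A short demonstration of this failure mode (BrokenKey and BrokenKeyDemo are hypothetical classes, written only to illustrate the broken contract): BrokenKey overrides equals() but inherits the identity-based Object.hashCode(), so two equal instances almost always carry different hashes. Since putVal() checks e.hash == hash before ever calling equals(), the duplicate slips in.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class that violates the contract: equals() is overridden,
// but hashCode() is inherited from Object (identity-based).
final class BrokenKey {
    final int id;
    BrokenKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenKey && ((BrokenKey) o).id == id;
    }
    // No hashCode() override: equal objects get unrelated identity hashes.
}

public class BrokenKeyDemo {
    public static void main(String[] args) {
        Set<BrokenKey> set = new HashSet<>();
        set.add(new BrokenKey(1));
        set.add(new BrokenKey(1)); // equal via equals(), but a different hash
        // Prints 2 in practice (barring an astronomically unlikely
        // identity-hash collision): the set now holds two "equal" elements.
        System.out.println(set.size());
    }
}
```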