HashMap Source Code Analysis

Original work. Reprints must cite the original address: https://www.cnblogs.com/fanerwei222/p/11918123.html

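Let's begin!

HashMap comes up constantly in daily work and just as often in interviews: hash collisions, table resizing, the overall data structure behind the hash table. Using it myself I have also run into problems. Sharing a HashMap across multiple threads produces strange behavior that can be baffling, even when I half understand the cause. I have read the source several times before, but forgot most of it within a few days, so I am writing it down here for easy re-reading later.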

JDK version: 1.8

HashMap data structure: array + linked list + red-black tree (JDK 1.8)

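[Figure: the author's hand-made text diagram of the HashMap data structure. It is a little rough, but readable.]

In JDK 1.8 a HashMap consists of an array plus linked lists plus red-black trees. Each element of the array is called a bucket (bin). A bucket can hold multiple nodes, organized either as a linked list or as a red-black tree. When a list is used and when a tree is used is explained by the fields below.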
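
To make the "array + linked list" part of that description concrete, here is a minimal sketch of a chained hash table. ToyChainedMap and everything in it are hypothetical names for illustration, not JDK code; it has a fixed 16-slot table, never resizes, never builds trees, and prepends new nodes where JDK 1.8 appends them.

```java
import java.util.Objects;

final class ToyChainedMap<K, V> {
    /** One entry in a bucket's singly linked list. */
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // The bucket array. Each slot is null or the head of a linked list.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private int indexFor(Object key) {
        // The length is a power of two, so (length - 1) acts as a bit mask.
        return (key == null) ? 0 : (table.length - 1) & key.hashCode();
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { // key already present: replace the value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // new key: prepend (simplification)
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyChainedMap<String, Integer> m = new ToyChainedMap<>();
        m.put("Aa", 1); // "Aa" and "BB" have the same hashCode (2112),
        m.put("BB", 2); // so both nodes end up in the same bucket
        System.out.println(m.get("Aa") + " " + m.get("BB"));
    }
}
```

The two keys in main collide on purpose, so both lookups have to walk the same bucket's list.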

Let's first go through the fields. Some are default constants, others come into play later.

★ Three values related to the red-black tree

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
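 * Treeify threshold of a bucket: when an element is added to a bucket that
 * already holds at least this many nodes, the bucket is converted from a
 * linked list to a red-black tree, and every list node becomes a tree node.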
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
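 * Untreeify threshold: the point at which a tree bucket is turned back into a list.
 * During a resize, a tree bucket that is split and left with no more than this
 * many nodes is converted back to a linked list.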
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
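 * Minimum table capacity for treeification.
 * A bucket may be treeified only when the table capacity is at least this value;
 * below it, an overlong bucket causes the table to be resized instead.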
 */
static final int MIN_TREEIFY_CAPACITY = 64;
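
To see how the three constants work together, the decision they drive can be sketched as follows. BinPolicy and its two methods are hypothetical names used only for illustration; the constant values are the ones listed above, and the comparisons follow the comments above rather than reproducing the JDK code.

```java
final class BinPolicy {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    /** Outcome of adding a node to a list bucket that already holds nodesBeforeInsert nodes. */
    static String onInsert(int nodesBeforeInsert, int tableCapacity) {
        if (nodesBeforeInsert < TREEIFY_THRESHOLD) {
            return "keep list";
        }
        // A long bucket in a small table is handled by resizing, not by treeifying.
        return tableCapacity < MIN_TREEIFY_CAPACITY ? "resize table" : "treeify bucket";
    }

    /** Outcome for a tree bucket left with nodesAfterSplit nodes after a resize split. */
    static String onSplit(int nodesAfterSplit) {
        return nodesAfterSplit <= UNTREEIFY_THRESHOLD ? "untreeify to list" : "keep tree";
    }

    public static void main(String[] args) {
        System.out.println(onInsert(7, 64));
        System.out.println(onInsert(8, 16));
        System.out.println(onInsert(8, 64));
        System.out.println(onSplit(6));
        System.out.println(onSplit(7));
    }
}
```

The gap between 8 and 6 avoids flipping a bucket back and forth between the two forms when its size hovers around a single boundary.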

Some of the other fields:

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 * The bucket array
 */
transient HashMap.Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 * The cached entry set
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 * The current number of mappings in the map
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
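 * modCount records how many times the HashMap has been structurally modified.
 * Methods that change the structure, such as put() with a new key, remove() and clear(),
 * increment it; the iterators read it.
 * HashMap is not thread-safe, so an iterator copies modCount into its own expectedModCount field when it is created.
 * If the map is structurally modified during iteration, whether by another thread or by the
 * iterating thread itself (other than through the iterator's own remove()), modCount changes,
 * expectedModCount and modCount no longer match,
 * and the iterator throws ConcurrentModificationException.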
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor).
 * The size at which the next resize happens; it equals capacity x load factor.
 * @serial
 */
int threshold;

/**
 * The load factor for the hash table.
 * The load factor that decides when the table is resized
 * @serial
 */
final float loadFactor;
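
The fail-fast behavior described for modCount can be observed from outside the class. The following is a small sketch with a hypothetical class name, FailFastDemo; it uses a single thread, which shows that the check is not limited to modifications made by other threads.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    /** Adds a mapping while iterating; returns true if the iterator detected it. */
    static boolean putDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put(key + "x", 0); // new key, so modCount is incremented
            }
        } catch (ConcurrentModificationException e) {
            return true; // expectedModCount no longer matched modCount
        }
        return false;
    }

    /** Removes through the iterator itself, which keeps expectedModCount in sync. */
    static int removeThroughIterator() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().equals("a")) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(putDuringIteration());
        System.out.println(removeThroughIterator());
    }
}
```

The check is best-effort: it is meant to surface bugs early, not to make a HashMap safe to share between threads.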

We usually create a HashMap like this:

Map<String, Object> mymap = new HashMap<String, Object>();

Here is the constructor it calls:

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 * The no-argument constructor only assigns the default load factor to mymap;
 * the table itself is not allocated yet.
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
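
As a quick worked example of "capacity x load factor" with the defaults named in the Javadoc above (16 and 0.75): the sketch below only does the arithmetic. ThresholdMath is a hypothetical helper, not part of the JDK.

```java
final class ThresholdMath {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** The size beyond which a table of the given capacity is resized. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int t = threshold(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
        System.out.println("threshold for 16 buckets: " + t);
        // putVal resizes when ++size > threshold, so the insertion after this many triggers it.
        System.out.println("insertion that triggers the resize: " + (t + 1));
        System.out.println("threshold after doubling to 32: " + threshold(32, DEFAULT_LOAD_FACTOR));
    }
}
```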

That gives us a HashMap instance, mymap. Let's put something into it.

mymap.put("hello", "world");

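Now look inside put(). Clicking through from the call site lands on the method declared in the Map interface, so go straight to the implementation in HashMap: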

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

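First, the hash() method computes the hash of the key. It XORs the upper 16 bits of hashCode() into the lower 16 bits, so that the upper bits also influence the bucket index: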

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
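
The effect of the XOR is easiest to see with keys whose hash codes differ only in the upper bits. The sketch below copies hash() from the listing above; the class name HashSpread and the helper indexFor are hypothetical, with indexFor using the same (length - 1) & hash masking that putVal applies.

```java
final class HashSpread {
    /** Same computation as HashMap.hash(). */
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        Integer a = 0x10000; // an Integer's hashCode() is its value
        Integer b = 0x20000;
        // Raw hash codes: the low 4 bits are all zero, so both keys hit bucket 0.
        System.out.println(indexFor(a.hashCode(), 16) + " " + indexFor(b.hashCode(), 16));
        // Spread hashes: the upper bits are folded in, so the keys separate.
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16));
    }
}
```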

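Then putVal() is called, with false and true as its last two arguments. Let's step in. (In this listing the local variables are renamed for readability: the JDK's p, n, e and k appear here as param, tabLength, okNode and tmpK.)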

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key (the hash of the key passed in)
 * @param key the key (the key passed in)
 * @param value the value to put (the value passed in)
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode. (Forwarded to the post-insertion hook so it can decide what to do.)
 * @return previous value, or null if none (the value previously mapped to the key)
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    /**
     * The arguments passed in for this walkthrough:
     * key : hello
     * value : world
     * onlyIfAbsent : false
     * evict : true
     */
    /**
     * Local variables
     * tab : local reference to the bucket array
     * param : the Node found at index i
     * tabLength : length of the local bucket array
     * i : index into tab
     */
    HashMap.Node<K,V>[] tab;
    HashMap.Node<K,V> param;
    int tabLength, i;
    /**
     * At this point table has not been allocated yet,
     * so tab is null and resize() is called (analyzed later).
     */
    if ((tab = table) == null || (tabLength = tab.length) == 0){
        /**
         * Assign the newly allocated table to tab
         */
        tabLength = (tab = resize()).length;
    }
    /**
     * (tab length - 1) ANDed with the key's hash gives the index i.
     * Read the element of tab at that index; if it is null,
     * call newNode() to create a Node and store it in tab[i].
     */
    if ((param = tab[i = (tabLength - 1) & hash]) == null){
        /**
         * Create a Node
         */
        tab[i] = newNode(hash, key, value, null);
    }
    /**
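     * The else branch handles a bucket that is already occupied:
     * either the key exists and its value is replaced, or a new node is added to the bucket.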
     */
    else {
        /**
         * Temporary node: the existing node for this key, if one is found
         */
        HashMap.Node<K,V> okNode;
        /**
         * Temporary key
         */
        K tmpK;
        /**
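         * First check whether the hash of the node found equals the hash passed in.
         * If it does, go on to compare the keys:
         * == tests whether both keys are the same reference.
         * If they are not, test whether the key passed in equals() the node's key.
         * By default equals() also compares references, but many classes such as String,
         * as well as user-defined classes, override equals().
         * The extra equals() test is what lets logically equal keys match.
         * Take a custom class that overrides both equals() and hashCode() so that two
         * instances with the same id count as equal: this check is what makes them hit the
         * same mapping, because two separate instances normally have different references
         * unless one reference was simply assigned to the other.
         *
         * If the hashes are equal and the keys are equal, the key is judged to exist already.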
         */
        if (param.hash == hash &&
                ((tmpK = param.key) == key || (key != null && key.equals(tmpK))))
        {
            /**
             * Assign the node found to the temporary node
             */
            okNode = param;
        }
        /**
         * Otherwise check whether the node found is a tree node, i.e. an instance of the red-black tree node class
         */
        else if (param instanceof HashMap.TreeNode)
        {
            /**
             * For a red-black tree node, putTreeVal() inserts the mapping into the tree.
             * It returns the existing tree node if the key is already present, otherwise null.
             */
            okNode = ((HashMap.TreeNode<K,V>)param).putTreeVal(this, tab, hash, key, value);
        }
        else {
            /**
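             * Linked-list bucket: walk the list. If the end is reached, append a new node
             * (and treeify if the list has become long enough); if a node with an equal key
             * is found first, stop there.
             * I won't explain the list handling any further. I am leaving myself
             * something to think through when I come back to this later.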
             */
            for (int binCount = 0; ; ++binCount) {
                if ((okNode = param.next) == null) {
                    param.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                    {
                        /**
                         * Convert the linked list to a tree
                         */
                        treeifyBin(tab, hash);
                    }
                    break;
                }
                if (okNode.hash == hash &&
                        ((tmpK = okNode.key) == key || (key != null && key.equals(tmpK))))
                    break;
                param = okNode;
            }
        }
        /**
         * A non-null okNode is the existing node for this key, either a tree node or a list node
         */
        if (okNode != null) { // existing mapping for key
            /**
             * The old node's value
             */
            V oldValue = okNode.value;
            /**
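             * onlyIfAbsent decides whether the value is overwritten:
             * it is replaced when onlyIfAbsent is false or the old value is null.
             * Either way, the old value is what gets returned.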
             */
            if (!onlyIfAbsent || oldValue == null)
            {
                okNode.value = value;
            }
            /**
             * Empty in HashMap; a hook that LinkedHashMap overrides
             */
            afterNodeAccess(okNode);
            return oldValue;
        }
    }
    /**
     * Increment the structural modification count
     */
    ++modCount;
    /**
     * Increment size (the number of mappings); if it now exceeds the resize threshold, resize the table
     */
    if (++size > threshold)
    {
        resize();
    }
    /**
     * Empty in HashMap; a package-private hook that LinkedHashMap overrides
     */
    afterNodeInsertion(evict);
    /**
     * The mapping is new, so there is no old value; return null
     */
    return null;
}
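
The return-value rules in putVal() can be checked through the public API. The sketch below uses a hypothetical class name, PutSemantics; put() and putIfAbsent() are the real java.util.HashMap methods, and HashMap.putIfAbsent() delegates to putVal() with onlyIfAbsent set to true.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutSemantics {
    /** Runs a few puts and collects what each call returned, followed by the final state. */
    static List<Object> run() {
        Map<String, Object> mymap = new HashMap<>();
        List<Object> out = new ArrayList<>();

        out.add(mymap.put("hello", "world"));             // new key: returns null
        out.add(mymap.put(new String("hello"), "java"));  // equal but not identical key: returns "world"
        out.add(mymap.putIfAbsent("hello", "ignored"));   // existing non-null value is kept: returns "java"

        mymap.put("k", null);
        out.add(mymap.putIfAbsent("k", "v"));             // old value is null: overwritten, returns null

        out.add(mymap.get("hello"));
        out.add(mymap.get("k"));
        out.add(mymap.size());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that a null return from put() is ambiguous: it can mean "no previous mapping" or "previously mapped to null", exactly as the Javadoc of put() says.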

The empty hook methods mentioned above:

// Callbacks to allow LinkedHashMap post-actions
void afterNodeAccess(HashMap.Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(HashMap.Node<K,V> p) { }
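
These hooks are package-private, so application code cannot override them directly, but their effect is visible through LinkedHashMap. As a hedged illustration: LinkedHashMap's afterNodeInsertion consults the protected method removeEldestEntry(), and its access-order mode relies on afterNodeAccess. The LruCache class below is a hypothetical example built on those two public extension points.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so get() moves an entry to the end
        this.maxEntries = maxEntries;
    }

    /** Asked after every insertion; returning true drops the least recently used entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```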

That more or less covers the put operation.

Now let's look at the other 3 constructors!

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 * Takes an initial capacity and uses the default load factor
 */
public HashMap(int initialCapacity) {
    /**
     * Delegates to the two-argument constructor below
     */
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 * Takes a custom initial capacity and a custom load factor
 */
public HashMap(int initialCapacity, float loadFactor) {
    /**
     * The initial capacity must not be negative;
     * otherwise an exception is thrown
     */
    if (initialCapacity < 0)
    {
        throw new IllegalArgumentException("Illegal initial capacity: " +
                initialCapacity);
    }
    /**
     * If the initial capacity exceeds the maximum allowed capacity, use the maximum
     */
    if (initialCapacity > MAXIMUM_CAPACITY)
    {
        initialCapacity = MAXIMUM_CAPACITY;
    }
    /**
     * The load factor must be a floating-point number greater than 0;
     * an illegal load factor causes an exception
     */
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
    {
        throw new IllegalArgumentException("Illegal load factor: " +
                loadFactor);
    }
    this.loadFactor = loadFactor;
    /**
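     * Round the initial capacity up to a power of two via tableSizeFor().
     * The table capacity is always a power of two (not merely a multiple of 2).
     * The result is parked in threshold until the first resize() allocates the table.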
     */
    this.threshold = tableSizeFor(initialCapacity);
}

/**
 * Returns a power of two size for the given target capacity.
 * For the given capacity, compute the smallest power of two that is greater than or equal to it
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
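
A few concrete inputs make the bit-smearing in tableSizeFor() easier to follow. The sketch below copies the method from the listing above into a hypothetical class, TableSize, and defines MAXIMUM_CAPACITY as 1 << 30, the value used by the JDK.

```java
final class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Same computation as HashMap.tableSizeFor() in JDK 1.8. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;      // each step copies the highest set bit further right,
        n |= n >>> 2;      // until every bit below it is 1
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 10, 16, 17, 1000};
        for (int cap : samples) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

For example, 10 becomes n = 9 (binary 1001), the shifts fill it to 1111 (15), and adding 1 gives 16.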

/**
 * Constructs a new <tt>HashMap</tt> with the same mappings as the
 * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
 * default load factor (0.75) and an initial capacity sufficient to
 * hold the mappings in the specified <tt>Map</tt>.
 *
 * @param   m the map whose mappings are to be placed in this map
 * @throws  NullPointerException if the specified map is null
 * Initializes directly from another map, using the default load factor
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    /**
     * Copy the entries of the map passed in into the new map
     */
    putMapEntries(m, false);
}

/**
 * Implements Map.putAll and Map constructor.
 *
 * @param m the map
 * @param evict false when initially constructing this map, else
 * true (relayed to method afterNodeInsertion).
 */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    /**
     * Size of the map passed in
     */
    int size = m.size();
    if (size > 0) {
        /**
         * First check whether this map's own table is still null
         */
        if (table == null) { // pre-size
            float ft = ((float)size / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                    (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        /**
         * If the incoming size exceeds the current resize threshold, resize
         */
        else if (size > threshold)
        {
            resize();
        }
        /**
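         * Iterate over the source map's entrySet() and put each entry.
         * entrySet is worth tracing: looking closely, nothing sets the entrySet field explicitly,
         * and the put path contains no code that assigns it, so where its value comes from is worth chasing down.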
         */
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
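
The pre-sizing branch (table == null) can be worked through by hand. The sketch below repeats its arithmetic in a hypothetical class, PresizeMath. As a deliberate substitution, nextPowerOfTwo uses Integer.highestOneBit instead of copying tableSizeFor(); it is meant to give the same result for the positive inputs used here.

```java
final class PresizeMath {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Stand-in for tableSizeFor(): smallest power of two >= t (assumption: t is positive). */
    static int nextPowerOfTwo(int t) {
        if (t <= 1) {
            return 1;
        }
        if (t >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        return Integer.highestOneBit(t - 1) << 1;
    }

    /** Capacity chosen for an empty map that is about to receive size entries. */
    static int presizedCapacity(int size, float loadFactor) {
        float ft = ((float) size / loadFactor) + 1.0F; // same formula as putMapEntries
        int t = (ft < (float) MAXIMUM_CAPACITY) ? (int) ft : MAXIMUM_CAPACITY;
        return nextPowerOfTwo(t);
    }

    public static void main(String[] args) {
        int[] sizes = {6, 12, 100};
        for (int size : sizes) {
            System.out.println(size + " entries -> capacity " + presizedCapacity(size, 0.75f));
        }
    }
}
```

Dividing by the load factor picks a capacity whose threshold already covers all incoming entries, so the copy loop does not trigger a resize partway through.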

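Back to the question of where the value of entrySet comes from. The field is assigned lazily: HashMap.entrySet() creates an EntrySet object on its first call and caches it in the field. The EntrySet stores no entries of its own; it is a view over table. What makes it look "filled" is iteration, for example the toString() method inherited from the abstract class AbstractCollection<E>, which runs whenever the set is printed or inspected: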

public String toString() {
    Iterator<E> it = iterator();
    if (! it.hasNext())
        return "[]";

    StringBuilder sb = new StringBuilder();
    sb.append('[');
    for (;;) {
        E e = it.next();
        sb.append(e == this ? "(this Collection)" : e);
        if (! it.hasNext())
            return sb.append(']').toString();
        sb.append(',').append(' ');
    }
}

That method in turn calls iterator(), which EntrySet defines:

final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<Map.Entry<K,V>> iterator() {
        return new EntryIterator();
    }
    public final boolean contains(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry<?,?> e = (Map.Entry<?,?>) o;
        Object key = e.getKey();
        Node<K,V> candidate = getNode(hash(key), key);
        return candidate != null && candidate.equals(e);
    }
    public final boolean remove(Object o) {
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            Object key = e.getKey();
            Object value = e.getValue();
            return removeNode(hash(key), key, value, true, true) != null;
        }
        return false;
    }
    public final Spliterator<Map.Entry<K,V>> spliterator() {
        return new EntrySpliterator<>(HashMap.this, 0, -1, 0, 0);
    }
    public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
        Node<K,V>[] tab;
        if (action == null)
            throw new NullPointerException();
        if (size > 0 && (tab = table) != null) {
            int mc = modCount;
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next)
                    action.accept(e);
            }
            if (modCount != mc)
                throw new ConcurrentModificationException();
        }
    }
}

iterator() in turn creates a new EntryIterator, so the entries are actually produced lazily, one node per call to next():

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}
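
That entrySet() is a cached, live view rather than a copy can be confirmed from the outside. The sketch below uses a hypothetical class name, EntrySetView, and only public java.util APIs.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class EntrySetView {
    /** Removes entries with even values through the entry-set iterator; returns the map's new size. */
    static int removeEvenValues(Map<String, Integer> map) {
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 0) {
                it.remove(); // removes the node from the backing map itself
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // The view object is created on the first call and reused afterwards.
        System.out.println(map.entrySet() == map.entrySet());

        // Writing through an entry changes the map.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            if (e.getKey().equals("a")) {
                e.setValue(11);
            }
        }
        System.out.println(map.get("a"));

        System.out.println(removeEvenValues(map));
    }
}
```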

That one was hidden fairly deep!

That's all for today!

To be continued...
