Java Source Code Analysis: CopyOnWriteArrayList Explained

This article is based on JDK 1.8.

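ArrayList and HashMap are collections we use all the time, and neither of them is thread-safe. Most of us know that the thread-safe counterpart of HashMap is ConcurrentHashMap. Is there a similar thread-safe version of ArrayList? There is: CopyOnWriteArrayList.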

The phrase copy-on-write has its own abbreviation, COW. COW is not a mechanism reserved for Java's collections framework; it is widely used throughout computing.

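First, let's look at what CopyOnWriteArrayList is. The Javadoc in front of the class is long, so only its first part is quoted below. It says that CopyOnWriteArrayList is a thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array. This is ordinarily too costly, but when traversal operations vastly outnumber mutations, it may still be more efficient than the alternatives.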


```java
/**
 * A thread-safe variant of {@link java.util.ArrayList} in which all mutative
 * operations ({@code add}, {@code set}, and so on) are implemented by
 * making a fresh copy of the underlying array.
 * <p>This is ordinarily too costly, but may be <em>more</em> efficient
 * than alternatives when traversal operations vastly outnumber
 * mutations, and is useful when you cannot or don't want to
 * synchronize traversals, yet need to preclude interference among
 * concurrent threads. The "snapshot" style iterator method uses a
 * reference to the state of the array at the point that the iterator
 * was created. This array never changes during the lifetime of the
 * iterator, so interference is impossible and the iterator is
 * guaranteed not to throw {@code ConcurrentModificationException}.
 * The iterator will not reflect additions, removals, or changes to
 * the list since the iterator was created. Element-changing
 * operations on iterators themselves ({@code remove}, {@code set}, and
 * {@code add}) are not supported. These methods throw
 * {@code UnsupportedOperationException}.
 */
```
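The snapshot behavior promised by this Javadoc can be observed with a small single-threaded program. This is a sketch: the class SnapshotIterationDemo and its two helper methods are hypothetical; only CopyOnWriteArrayList, ArrayList and their standard methods come from the JDK. It appends to each list from inside a for-each loop over that same list, and then checks whether the list's iterator supports remove().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of the JDK.
class SnapshotIterationDemo {
    // Appends one element for every element visited by a for-each loop
    // and returns how many elements the loop visited.
    static int appendWhileIterating(List<String> list) {
        int visited = 0;
        for (String s : list) {
            list.add(s + "'");
            visited++;
        }
        return visited;
    }

    // Returns true if the list's iterator supports remove().
    static boolean iteratorRemoveSupported(List<String> list) {
        Iterator<String> it = list.iterator();
        it.next();
        try {
            it.remove();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> cow = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        int visited = appendWhileIterating(cow);
        // The loop sees only the 3-element snapshot, but all 3 appends reach the list.
        System.out.println("visited=" + visited + ", size=" + cow.size());

        try {
            appendWhileIterating(new ArrayList<>(Arrays.asList("a", "b", "c")));
        } catch (ConcurrentModificationException e) {
            System.out.println("ArrayList: ConcurrentModificationException");
        }

        System.out.println("COW iterator remove supported: " + iteratorRemoveSupported(cow));
    }
}
```

Running main prints visited=3, size=6, then ArrayList: ConcurrentModificationException, then COW iterator remove supported: false. The for-each loop over the CopyOnWriteArrayList only visits the three elements that existed when its iterator was created, while the ArrayList's fail-fast iterator aborts on the first modification.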

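Next, the fields. There are only two instance fields: array, the underlying data structure that stores the elements, and a reentrant lock (ReentrantLock) used to synchronize write operations. Note that array is declared volatile, so once a writer publishes a new array through setArray, reader threads see it without taking the lock.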


```java
/** The lock protecting all mutators */
final transient ReentrantLock lock = new ReentrantLock();

/** The array, accessed only via getArray/setArray. */
private transient volatile Object[] array;
```
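To make the division of labor between the two fields concrete, here is a minimal sketch using a hypothetical class CowFieldsSketch that copies only the field declarations and the getArray/setArray accessors (it is not the JDK class). It shows the property the whole design relies on: a writer never changes an array in place, so a reader that already holds the old array keeps seeing a consistent, unchanged view.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stripped-down class mirroring only the two fields of
// CopyOnWriteArrayList and their accessors; it is not the JDK class.
class CowFieldsSketch {
    /** Serializes writers; readers never acquire it. */
    final transient ReentrantLock lock = new ReentrantLock();

    /** Replaced as a whole, never modified in place; volatile so readers see the latest reference. */
    private transient volatile Object[] array = new Object[0];

    final Object[] getArray() { return array; }

    final void setArray(Object[] a) { array = a; }

    public static void main(String[] args) {
        CowFieldsSketch s = new CowFieldsSketch();
        s.setArray(new Object[] {"a", "b"});

        Object[] readerView = s.getArray(); // a reader grabs the current array

        s.lock.lock(); // a writer copies, changes only the copy, then publishes it
        try {
            Object[] copy = Arrays.copyOf(s.getArray(), 3);
            copy[2] = "c";
            s.setArray(copy);
        } finally {
            s.lock.unlock();
        }

        System.out.println(Arrays.toString(readerView));   // [a, b]
        System.out.println(Arrays.toString(s.getArray())); // [a, b, c]
    }
}
```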

Now the main methods, starting with get. There is nothing special about get: it takes no lock and simply reads the current array.


```java
/**
 * {@inheritDoc}
 *
 * @throws IndexOutOfBoundsException {@inheritDoc}
 */
public E get(int index) {
    return get(getArray(), index);
}

/**
 * Gets the array. Non-private so as to also be accessible
 * from CopyOnWriteArraySet class.
 */
final Object[] getArray() {
    return array;
}

@SuppressWarnings("unchecked")
private E get(Object[] a, int index) {
    return (E) a[index];
}
```
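The claim that get takes no lock can be checked with a sketch. CopyOnWriteArrayList's lock field is package-private, so this uses a hypothetical class LockFreeReadSketch whose read path mirrors the get/getArray code above; the helper readWhileLocked is also hypothetical. The calling thread holds the writer lock while a second thread calls get, and the read still completes.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical minimal class whose read path mirrors the JDK 1.8 get/getArray
// code: get() reads the volatile array and never touches the lock.
class LockFreeReadSketch<E> {
    final ReentrantLock lock = new ReentrantLock(); // used only by writers
    private volatile Object[] array;

    LockFreeReadSketch(Object... elements) {
        array = elements.clone();
    }

    final Object[] getArray() { return array; }

    public E get(int index) { return get(getArray(), index); }

    @SuppressWarnings("unchecked")
    private E get(Object[] a, int index) { return (E) a[index]; }

    // While the calling thread holds the writer lock, a second thread calls
    // get(index). Returns what the reader got, or null if it did not finish within 1 s.
    static <T> T readWhileLocked(LockFreeReadSketch<T> list, int index) {
        AtomicReference<T> result = new AtomicReference<>();
        list.lock.lock(); // simulate a writer in the middle of an update
        try {
            Thread reader = new Thread(() -> result.set(list.get(index)));
            reader.start();
            reader.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            list.lock.unlock();
        }
        return result.get();
    }

    public static void main(String[] args) {
        LockFreeReadSketch<String> list = new LockFreeReadSketch<>("a", "b", "c");
        System.out.println(readWhileLocked(list, 1)); // b
    }
}
```

readWhileLocked would return null only if the reader thread were still blocked after one second; because this read path never touches the lock, it returns the element right away.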

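Next, add. add first acquires the lock, then copies the original array into a new array one slot longer, stores the new element in the last slot, and finally points the array field at the new array via setArray. In other words, add never modifies the original array in place: it copies all of the data, operates on the copy, and then publishes the copy.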


```java
/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return {@code true} (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}

/**
 * Sets the array.
 */
final void setArray(Object[] a) {
    array = a;
}
```
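The threaded examples later in this article call add(int index, E element) rather than add(E). Here is a minimal sketch, assuming a hypothetical class MiniCowList, of how both kinds of insert can follow the same copy-then-publish pattern under a lock. It is an illustration, not the JDK source, although the JDK 1.8 add(int, E) likewise copies the two halves of the old array around the insertion point. Note that inserting at an index shifts every later element one slot to the right in the new array; this shift is what produces the duplicate read discussed later in this article.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical copy-on-write list illustrating the write pattern; not the JDK class.
class MiniCowList<E> {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile Object[] array = new Object[0];

    // Append: copy into an array one slot longer, fill the last slot, publish.
    public boolean add(E e) {
        lock.lock();
        try {
            Object[] elements = array;
            Object[] newElements = Arrays.copyOf(elements, elements.length + 1);
            newElements[elements.length] = e;
            array = newElements; // publish the new array
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Insert: elements [0, index) keep their slots, elements [index, len) move right by one.
    public void add(int index, E element) {
        lock.lock();
        try {
            Object[] elements = array;
            int len = elements.length;
            if (index < 0 || index > len) {
                throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + len);
            }
            Object[] newElements = new Object[len + 1];
            System.arraycopy(elements, 0, newElements, 0, index);
            System.arraycopy(elements, index, newElements, index + 1, len - index);
            newElements[index] = element;
            array = newElements; // the old array is never modified
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    public E get(int index) { return (E) array[index]; }

    public int size() { return array.length; }

    @Override
    public String toString() { return Arrays.toString(array); }

    public static void main(String[] args) {
        MiniCowList<String> list = new MiniCowList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        list.add(1, "n");
        System.out.println(list); // [a, n, b, c]
    }
}
```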

Here is a question to think about: thread 1 is traversing the list while thread 2 writes to it. Can thread 1 see the data that thread 2 wrote?

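First, to be clear: this scenario throws no exception, and the program quietly runs to completion. Whether thread 1 sees thread 2's data depends on how the list is traversed, and on when and where thread 2 writes.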

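Start with the traversal method. There are two ways to traverse the list: for-each and get(i). For-each is implemented with an iterator under the hood, so the iterator is not treated as a separate third method. First consider a for loop calling get(i). Each iteration calls size() and get(i) afresh, and both read whatever array is current at that moment, so whether thread 1 sees thread 2's write depends on when and where thread 2 writes. If thread 1 has already reached the 5th element and thread 2 inserts somewhere after the 5th element, thread 1 will see the write.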


```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class MyClass {
    static List<String> list = new CopyOnWriteArrayList<>();

    public static void main(String[] args) {
        list.add("a");
        list.add("b");
        list.add("c");
        list.add("d");
        list.add("e");
        list.add("f");
        list.add("g");
        list.add("h");
        // Start thread 1, which traverses the list by index, one element per second
        new Thread(() -> {
            try {
                for (int i = 0; i < list.size(); i++) {
                    System.out.println(list.get(i));
                    Thread.sleep(1000);
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        try {
            // The main thread acts as thread 2: wait 2 s
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // Thread 2 inserts at index 4, i.e. after thread 1's current traversal position
        list.add(4, "n");
    }
}
```

The program above prints the following; n is visited:

a
b
c
d
n
e
f
g
h

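If thread 2 instead inserts at a position before the one thread 1 has reached, thread 1 will not see the write. There is also a side effect: one element is read twice. The insertion shifts every later element one slot to the right, so the element thread 1 has just read moves into the slot it reads next. The code is as follows: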


```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class MyClass {
    static List<String> list = new CopyOnWriteArrayList<>();

    public static void main(String[] args) {
        list.add("a");
        list.add("b");
        list.add("c");
        list.add("d");
        list.add("e");
        list.add("f");
        list.add("g");
        list.add("h");
        // Start thread 1, which traverses the list by index, one element per second
        new Thread(() -> {
            try {
                for (int i = 0; i < list.size(); i++) {
                    System.out.println(list.get(i));
                    Thread.sleep(1000);
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        try {
            // The main thread acts as thread 2: wait 2 s
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // Thread 2 inserts at index 1, i.e. before thread 1's current traversal position
        list.add(1, "n");
    }
}
```

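The code above prints the following; b is visited twice. Which element gets repeated depends on timing: if the insertion lands just after thread 1 has printed c, then c is printed twice instead.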

a
b
b
c
d
e
f
g
h
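Because the threaded examples depend on sleep timing, a deterministic replay can make the two outcomes easier to reason about. The following sketch (the class IndexTraversalSimulation and its methods are hypothetical) runs in a single thread: it traverses a CopyOnWriteArrayList with get(i) and performs thread 2's list.add(insertIndex, value) right before reading index visitedBeforeWrite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical single-threaded replay of the two-thread examples.
class IndexTraversalSimulation {
    // Reads the list with get(i); right before reading index `visitedBeforeWrite`,
    // performs thread 2's list.add(insertIndex, value). Returns what was read.
    static List<String> traverse(List<String> list, int visitedBeforeWrite,
                                 int insertIndex, String value) {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) { // size() and get(i) read the current array each time
            if (i == visitedBeforeWrite) {
                list.add(insertIndex, value);
            }
            seen.add(list.get(i));
        }
        return seen;
    }

    static List<String> letters() {
        return new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h"));
    }

    public static void main(String[] args) {
        System.out.println(traverse(letters(), 2, 4, "n")); // insert after the cursor
        System.out.println(traverse(letters(), 2, 1, "n")); // insert before the cursor
        System.out.println(traverse(letters(), 3, 1, "n")); // same insert, one step later
    }
}
```

traverse(letters(), 2, 4, "n") reproduces the first output (a b c d n e f g h), and traverse(letters(), 2, 1, "n") reproduces the second (a b b c d e f g h). Moving the write one step later, traverse(letters(), 3, 1, "n") yields a b c c d e f g h, which is why the duplicated element in the real two-thread program depends on timing.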

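What about for-each traversal? The answer is that no matter when thread 2 writes, thread 1 cannot see the write. The reason is that CopyOnWriteArrayList takes a snapshot of the current array when it creates an iterator. add never modifies that array: it builds a new array and swaps the array reference, so the snapshot held by the iterator is unaffected.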


```java
public Iterator<E> iterator() {
    return new COWIterator<E>(getArray(), 0);
}

// Constructor of the static nested class COWIterator<E>
private COWIterator(Object[] elements, int initialCursor) {
    cursor = initialCursor;
    snapshot = elements;
}
```
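The snapshot effect can be checked deterministically as well. In this sketch (the class IteratorSnapshotDemo and its methods are hypothetical), an iterator is created first and list.add(insertIndex, value) runs partway through the iteration, just as in the two index-based experiments. Since a for-each loop over a list compiles to these same iterator(), hasNext() and next() calls, the result carries over to for-each.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of the JDK.
class IteratorSnapshotDemo {
    // Iterates `list` with an explicit iterator (what for-each uses) and, right
    // before the element at position `visitedBeforeWrite` is read, inserts
    // `value` at `insertIndex`. Returns the elements the iterator yielded.
    static List<String> iterateWithInsert(List<String> list, int visitedBeforeWrite,
                                          int insertIndex, String value) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = list.iterator(); // holds the array current at this moment
        int visited = 0;
        while (it.hasNext()) {
            if (visited == visitedBeforeWrite) {
                list.add(insertIndex, value); // creates a new array; the snapshot is untouched
            }
            seen.add(it.next());
            visited++;
        }
        return seen;
    }

    static List<String> letters() {
        return new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h"));
    }

    public static void main(String[] args) {
        List<String> list = letters();
        System.out.println(iterateWithInsert(list, 2, 4, "n")); // [a, b, c, d, e, f, g, h]
        System.out.println(list);                              // [a, b, c, d, n, e, f, g, h]
    }
}
```

For both insert positions used earlier (index 4 and index 1), the iterator yields exactly a through h, with no n and no duplicate, while the list itself ends up with nine elements.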