下载超大的excel

结论

通过 POI的SXSSFWorkbook,使用操作系统的临时文件来作为缓存,可以生成超大的excel 文件(我自己测试到500W,就没往下测了)。

记得使用压缩。关键代码

1
2
3
4
5
6
7
8
9
10
SXSSFWorkbook wb = null;
try {
wb = new SXSSFWorkbook();
wb.setCompressTempFiles(true); //压缩临时文件,很重要,否则磁盘很快就会被写满
...
} finally {
if (wb != null) {
wb.dispose();// 删除临时文件,很重要,否则磁盘可能会被写满
}
}

背景

由于业务需要,最近要做一个导出超大数据的功能。之间已经有人做过一版,由于受到POI 导出超大数据量时会出错的影响,它把一个大文件拆成很多个小文件,然后再压缩下载,结果经常出现少一两个文件的问题。

目标

支持单个 excel 的 sheet 导出100w 的数据

方案

导出 csv 文件

首先想到的是导出 csv 文件,最方便。但是调研后,也是最快放弃的,因为它存在两个很严重的问题:

  • 不同系统上的编码不一样,需要人工选择,对于普通用户不做好
  • 没有优化和数据压缩,数据量越大,csv 文件的大小比 excel 更大,当数据导出超过10w 时,csv 文件大小是 excel 的1.5倍
导出格式 1w 10w 30w 50w 70w 90w 100w
csv 4.0K/120ms 50M/1261ms 160M/3828ms 271M/7415ms 381M/8929ms 491M/11356ms 546M/13688ms

每行30个字段,每个字段里的内容由 Math.random()产生

导出 excel 文件

大数据量的情况下,csv 的表现较差。只能考虑 excel. 对 excel 作了一个简单的测试

指标 1w 2w 3w 4w 5w 6w 7w 8w 10w
耗时 3326ms 6483ms 7894 ms 9899 ms 12873 ms 15198 ms 17362 ms 20106 ms 25494 ms
导出文件大小 3.7M 7.4MM 12M 15M 19M 23M 26M 30M 37M
cpu 使用率 100% 100% 100% 100% 100% 200% 200% 800% 900%

cpu 使用率均指稳定时的 cpu 使用率

发现几个很严重的问题:

  • 随着数据量的增大,cpu使用率直线上升,这会给系统带来很大的风险
  • 当数据量超过10w 时,会出现 OOM 异常

excel 在内存里存储地越来越大,研究到了瓶颈。要解决这个问题,有两种方案:

  • 先生成多个小 execel 文件,最后合并成一个大文件。查了文档,发现Java 里的工具都是先读出来,再写到 Workbook 对象里, 这样还是会碰到同样的问题。如果用 excel 的工具,则运维成本过大,因此这个方案行不通
  • 参考操作系统里的虚拟内存,用这个来突破 机器的内存限制。但是磁盘的性能很差,这样做的效率很低。

这时,在 POI 的文档里发现了SXSSFWorkbook,其支持使用临时文件,可以用来生成超大 Excel 文件。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Since 3.8-beta3, POI provides a low-memory footprint SXSSF API built on top of XSSF.
SXSSF is an API-compatible streaming extension of XSSF to be used when very large
spreadsheets have to be produced, and heap space is limited. SXSSF achieves its
low memory footprint by limiting access to the rows that are within a sliding window,
while XSSF gives access to all rows in the document. Older rows that are no longer
in the window become inaccessible, as they are written to the disk.
In auto-flush mode the size of the access window can be specified, to hold a certain
number of rows in memory. When that value is reached, the creation of an additional
row causes the row with the lowest index to to be removed from the access window and
written to disk. Or, the window size can be set to grow dynamically; it can be trimmed
periodically by an explicit call to flushRows(int keepRows) as needed.
Due to the streaming nature of the implementation, there are the following
limitations when compared to XSSF:
* Only a limited number of rows are accessible at a point in time.
* Sheet.clone() is not supported.
* Formula evaluation is not supported

以下是 SXSSFWorkbook的测试结果:

使用缓存文件导出 excel

指标 10w 20w 30w 50w 80w 100w 150w 200w 300w
导出文件大小 37M 74M 111M 184M 295M 368M 552M 736M 1.1G
耗时(ms) 16259 29516 45846 75503 120434 156484 233730 303510 463399
cpu 使用率 100 100 100 100 100 100 100 100 100
内存使用(k) 149460 176576 141940 143700 168460 180168 169632 198320 187484
缓存文件大小 37M 74M 111M 185M 295M 369M 553M 737M 1.1G

可以看到,其在性能与资源耗用上都比较平均,至此,问题完美解决。

SXSSFWorkbook在使用上有一些注意项

  • Note that SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.
1
2
3
4
5
6
7
SXSSF flushes sheet data in temporary files (a temp file per sheet) and the size
of these temporary files can grow to a very large value. For example, for a 20 MB
csv data the size of the temp xml becomes more than a gigabyte. If the size of the
temp files is an issue, you can tell SXSSF to use gzip compression:
SXSSFWorkbook wb = new SXSSFWorkbook();
wb.setCompressTempFiles(true); // temp files will be gzipped

测试代码

生成 csv

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
private static void prcoessCSV(int rowsNum) throws Exception {
try {
long startTime = System.currentTimeMillis();
final int NUM_OF_ROWS = rowsNum;
final int NUM_OF_COLUMNS = 30;
File file = new File("ooxml-scatter-chart_" + rowsNum + ".csv");
BufferedWriter bf = new BufferedWriter(new FileWriter(file));
StringBuffer sb = new StringBuffer();
try {
for (int rownum = 0; rownum < NUM_OF_ROWS; rownum++) {
for (int cellnum = 0; cellnum < NUM_OF_COLUMNS; cellnum++) {
sb.append(Math.random());
if ((cellnum + 1) != NUM_OF_COLUMNS) {
sb.append(",");
}
}
sb.append("\n");
if (rownum % 10000 == 0) {
bf.write(sb.toString());
sb = new StringBuffer();
}
}
bf.close();
} catch (Exception ex) {
ex.printStackTrace();
}
long endTime = System.currentTimeMillis();
System.out.println("process " + rowsNum + " spent time:" + (endTime - startTime));
} catch (Exception e) {
e.printStackTrace();
throw e;
}
}

excel,不使用缓存

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
try {
long startTime = System.currentTimeMillis();
final int NUM_OF_ROWS = rowsNum;
final int NUM_OF_COLUMNS = 30;
Workbook wb = new XSSFWorkbook();
Sheet sheet = wb.createSheet("Sheet 1");
// Create a row and put some cells in it. Rows are 0 based.
Row row;
Cell cell;
for (int rowIndex = 0; rowIndex < NUM_OF_ROWS; rowIndex++) {
row = sheet.createRow(rowIndex);
for (int colIndex = 0; colIndex < NUM_OF_COLUMNS; colIndex++) {
cell = row.createCell(colIndex);
cell.setCellValue(Math.random());
}
}
// Write the output to a file
FileOutputStream out = new FileOutputStream("ooxml-scatter-chart_XSSF_" + rowsNum + ".xlsx");
wb.write(out);
out.close();
wb.close();
long endTime = System.currentTimeMillis();
System.out.println("process " + rowsNum + " spent time:" + (endTime - startTime));
} catch (Exception e) {
e.printStackTrace();
throw e;
}

excel,使用缓存

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
try {
long startTime = System.currentTimeMillis();
final int NUM_OF_ROWS = rowsNum;
final int NUM_OF_COLUMNS = 30;
SXSSFWorkbook wb = null;
try {
wb = new SXSSFWorkbook();
wb.setCompressTempFiles(true); //压缩临时文件,很重要,否则磁盘很快就会被写满
Sheet sh = wb.createSheet();
int rowNum = 0;
for (int num = 0; num < NUM_OF_ROWS; num++) {
if (num % 100_0000 == 0) {
sh = wb.createSheet("sheet " + num);
rowNum = 0;
}
rowNum++;
Row row = sh.createRow(rowNum);
for (int cellnum = 0; cellnum < NUM_OF_COLUMNS; cellnum++) {
Cell cell = row.createCell(cellnum);
cell.setCellValue(Math.random());
}
}
FileOutputStream out = new FileOutputStream("ooxml-scatter-chart_SXSSFW_" + rowsNum + ".xlsx");
wb.write(out);
out.close();
} catch (Exception ex) {
ex.printStackTrace();
} finally {
if (wb != null) {
wb.dispose();// 删除临时文件,很重要,否则磁盘可能会被写满
}
}
long endTime = System.currentTimeMillis();
System.out.println("process " + rowsNum + " spent time:" + (endTime - startTime));
} catch (Exception e) {
e.printStackTrace();
throw e;
}

克制

每天都会接触到很多的事物,如新游戏、好电影、很火的小说,抑或是新的流行语言、有趣的发现等,这些东西是如此有趣,以至于每件事都想尝试一下。然而一个残酷的现实就是我们根本没有这么多的时间和精力去处理这么多的信息,也做不好那么多感兴趣的事情。我们的精力是如此有限,如果不注意保护,等精力被耗尽之后,工作的就不是大脑,而是肌肉了.

慢慢发现,碎片化的时间配碎片化的事情会更合适一些,比如地铁上刷RSS和微博,等到电脑前就专心做需要一整块时间的事。RSS和知乎这类流媒体有一个很大的特点,就是信息流基本上是无限的,只要你愿意,可以一直往下刷,而时间也在不知不觉中耗尽。如果是在碎片化的时间做这类事情,就不会发生这种事情,因为会被自然打断:要下车了。

人接受新事物是需要耗费能量的,如果在碎片化的事情上花费太多的精力,会反过来抵制我们做核心工作的能力。但是碎片化的消息本身是无法避免的,所以需要一套机制,能让我们在接受碎片化信息的同时,不耗费过多的精力和时间。实践下来,做一个筛选会提高很多的效率:
快速过滤,选出重点->细读资料后作整理、分类

在整理时,会发现有趣的、需要知道的、想知道的东西太多,然而精力非常有限,所以我们需要保持克制,合适的时候,要能丢掉一部分资料,也丢掉一部分兴趣。对于自己的欲望,始终保持克制

克制是一种美德

Vector浅析

Vector与ArrayList很像,主要区别在于Vector是线程安全的。

Vector的类声明:

*

As of the Java 2 platform v1.2, this class was retrofitted to
* implement the {@link List} interface, making it a member of the
*
* Java Collections Framework
. Unlike the new collection
* implementations, {@code Vector} is synchronized. If a thread-safe
* implementation is not needed, it is recommended to use {@link
* ArrayList} in place of {@code Vector}.

阅读源码后发现,Vector是通过给所有的外部类加上synchronized来实现线程安全的。这种实现方式很简单,但由于synchronized是对this加锁,所以当需要同时一个对象的多个方法时,效率就会很低。

另外,查看iterator的源码

/**
 * Returns an iterator over the elements in this list in proper sequence.
 *
 * <p>The returned iterator is <a href="#fail-fast"><i>fail-fast</i></a>.
 *
 * @return an iterator over the elements in this list in proper sequence
 */
public synchronized Iterator<E> iterator() {
    return new Itr();
}

可以发现,在返回遍历对象后,就已经退出了锁,如果修改了对象之后,再去遍历,则会抛出ConcurrentModificationException异常.

Vector<String> vector = new Vector<String>();
vector.add("abc");
Iterator<String> ite = vector.iterator();
vector.add("bcd");
while (ite.hasNext()) {
    System.out.println(ite.next());
}

运行结果:

Exception in thread "main" java.util.ConcurrentModificationException
    at java.util.Vector$Itr.checkForComodification(Vector.java:1184)
    at java.util.Vector$Itr.next(Vector.java:1137)
    at com.showstone.containerresearch.list.VectorResearch.main(VectorResearch.java:14)

记一次redis报错JedisDataException: ERR Proxy fail to forward command

前几天老大要求优化redis,减少redis的读取次数,用mget来替代循环多次读取redis.线下测试没有问题,等到线上却发现大量报:redis.clients.jedis.exceptions.JedisDataException: ERR Proxy fail to forward command.

google后没有发现什么有用的资料.挂上vpn连上生产环境的配置,发现程序也是正常work的.证明不是环境的问题.

以Jedis的庞大用户量,应该不会是Jedis的问题,继续进行测试,最后发现当key的个数为0时,能重现问题.

redis.clients.jedis.exceptions.JedisDataException: ERR Proxy fail to forward command   
  at redis.clients.jedis.Protocol.processError(Protocol.java:113)   
  at redis.clients.jedis.Protocol.process(Protocol.java:138)
  at redis.clients.jedis.Protocol.read(Protocol.java:192)   
  at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:282) 
  at redis.clients.jedis.Connection.getBinaryMultiBulkReply(Connection.java:218)
  at redis.clients.jedis.Connection.getMultiBulkReply(Connection.java:211)  

通过命令行直接操作:

127.0.0.1:6379> mget
(error) ERR Proxy fail to forward command

坑爹的是redis的官方文档没有关于这些内容的任何介绍:

MGET key [key ...]

Available since 1.0.0.
Time complexity: O(N) where N is the number of keys to retrieve.
Returns the values of all specified keys. For every key that does not hold a string value or does not exist, the special value nil is returned. Because of this, the operation never fails.

Return value
Array reply: list of values at the specified keys.
Examples
redis> SET key1 "Hello"
OK
redis> SET key2 "World"
OK
redis> MGET key1 key2 nonexisting
1) "Hello"
2) "World"
3) (nil)
redis> 

java容器研究思路

#容器的研究思路

  • 为什么研究容器
    • 容器是JDK里的基础功能,平时使用得较多。
    • 容器相对简单,比较容易研究
    • 已经有很多人对容器进行了研究,相关资料比较齐全
  • 容器是什么,要研究那些特性
    • 容器是一段程序,是一系列的对象。
    • 作为程序,它由 数据结构+算法构成
    • 作为对象,它由 属性+方法构成
    • 作为类,它有自己的继承树
    • 容器有很多的工具类,如collections、Arrays、Iterator等等
  • 怎么去研究容器
    • 评价软件的4个因素:可维护性,可靠性,移植性,效率
    • 如何去测试评估程序的特性
    • 程序的并发性能
    • 程序的极限性能

##容器的分类
容器分为list、set、Map等三大类

  • list

    有序的数组
    有ArrayList,LinkedList,Vector,CopyOnwriteArrayList
  • set

    HashSet, TreeSet
  • Map

    HashMap,TreeMap,HashTable,LinkedHashMap,WeakHashMap

LinkedList浅析

LinkedList浅析

  • LinkedList的作用及特点
  • LinkedList的数据结构
  • LinkedList的常用方法

LinkedList的作用及特点

  • LinkedList可作为链表或者双向队列来使用
  • LinkedList的数据结构是基于链表来实现的,所以在它的开头、末尾添加结点的速度会非常快

LinkedList 是一个继承于AbstractSequentialList的双向链表。它也可以被当作堆栈、队列或双端队列进行操作。
LinkedList 实现 List 接口,能对它进行队列操作。
LinkedList 实现 Deque 接口,即能将LinkedList当作双端队列使用。
LinkedList 实现了Cloneable接口,即覆盖了函数clone(),能克隆。
LinkedList 实现java.io.Serializable接口,这意味着LinkedList支持序列化,能通过序列化去传输。
LinkedList 是非同步的。
以上内容来自于 skywang12345的博客

LinkedList的数据结构

LinkedList的定义

public class LinkedList<E>
    extends AbstractSequentialList<E>
    implements List<E>, Deque<E>, Cloneable, java.io.Serializable
{
    transient int size = 0;

    /**
     * Pointer to first node.
     * Invariant: (first == null && last == null) ||
     *            (first.prev == null && first.item != null)
     */
    transient Node<E> first;

    /**
     * Pointer to last node.
     * Invariant: (first == null && last == null) ||
     *            (last.next == null && last.item != null)
     */
    transient Node<E> last;

 ...

}

Node的定义

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}

LinkedList的常用方法

list接口定义的方法

int size();
boolean isEmpty();
boolean contains(Object o);
Iterator<E> iterator();
Object[] toArray();
<T> T[] toArray(T[] a);
boolean add(E e);
boolean remove(Object o);
boolean containsAll(Collection<?> c);
boolean addAll(Collection<? extends E> c);
boolean addAll(int index, Collection<? extends E> c);
boolean removeAll(Collection<?> c);
boolean retainAll(Collection<?> c);
default void replaceAll(UnaryOperator<E> operator) ;
default void sort(Comparator<? super E> c);
void clear();
boolean equals(Object o);
int hashCode();
E get(int index);
E set(int index, E element);
void add(int index, E element);
E remove(int index);
int indexOf(Object o);
int lastIndexOf(Object o);
ListIterator<E> listIterator();
ListIterator<E> listIterator(int index);
List<E> subList(int fromIndex, int toIndex);

LinkedList自己的方法

void addFirst(E e);
void addLast(E e);
boolean offerFirst(E e);
boolean offerLast(E e);
E removeFirst();
E removeLast();
E pollFirst();
E pollLast();
E getFirst();
E getLast();
E peekFirst();
E peekLast();
boolean removeFirstOccurrence(Object o);
boolean removeLastOccurrence(Object o);
boolean add(E e);
boolean offer(E e);
E remove();
E poll();
E element();
E peek();
void push(E e);
E pop();
boolean remove(Object o);
boolean contains(Object o);
public int size();
Iterator<E> iterator();
Iterator<E> descendingIterator();

总的来说,LinkedList里基本上都是这两个接口对应的方法
各个接口的实现,基本上都是比较正常的,这是不再重复描述

ArrayList浅析

#ArrayList浅析

  • ArrayList的作用、特点
  • ArrayList的数据结构
  • ArrayList的常用方法及实现
  • ArrayList里需要注意的地方

###ArrayList的作用、特点
ArrayList实现了List接口,常用操作有add,remove,get,size。其内部的数据采用数组进行存储。因此随机读写的速度很快,但删除、添加等操作相对会消耗比较多的时间,因为会有相关的一系列节点移动。

###ArrayList的数据结构
ArrayList内部以数组的形式存储数据。默认的是一个空数组,当添加数据后,会扩充为一个长度为10的数组。

默认的空数组
/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    super();
    this.elementData = EMPTY_ELEMENTDATA;
}

private static final Object[] EMPTY_ELEMENTDATA = {};

###ArrayList的常用方法及其实现
1.add()

加入元素之前,先确保有足够的空间

public boolean add(E e) {<br/>
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

指定位置加入元素时,之后所有的元素都需要后移

public void add(int index, E element) {
    rangeCheckForAdd(index);

    ensureCapacityInternal(size + 1);  // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1, size - index);
    elementData[index] = element;
    size++;
}

addAll

public boolean addAll(Collection<? extends E> c) {
    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew);  // Increments modCount
    System.arraycopy(a, 0, elementData, size, numNew);
    size += numNew;
    return numNew != 0;
}

2.remove

删除之前,先检查是否越界,同时也有大量的元素移动操作

public E remove(int index) {
    rangeCheck(index);//简单地检查下标是否越界

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

3.contain

public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

4.clone

public Object clone() {
    try {
        @SuppressWarnings("unchecked")
            ArrayList<E> v = (ArrayList<E>) super.clone();
        v.elementData = Arrays.copyOf(elementData, size);
        v.modCount = 0;
        return v;
    } catch (CloneNotSupportedException e) {
        // this shouldn't happen, since we are Cloneable
        throw new InternalError();
    }
}

5.iterator

通过使用cursor和lastRet来进行标记

private class Itr implements Iterator<E> {
    int cursor;       // index of next element to return
    int lastRet = -1; // index of last element returned; -1 if no such
    int expectedModCount = modCount;
    ...
}

###ArrayList里需要注意的地方

  • sublist并没有创建一个新的ArrayList,只是加了一个下标的起点

代码:

public List<E> subList(int fromIndex, int toIndex) {
    subListRangeCheck(fromIndex, toIndex, size);
    return new SubList(this, 0, fromIndex, toIndex);
}

 private class SubList extends AbstractList<E> implements RandomAccess {
    private final AbstractList<E> parent;
    private final int parentOffset;
    private final int offset;
    int size;
    ...
}

所以在原来的List或者Sublist里的操作,都会影响到另一个list

  • 因为ArrayList是基于数组的,所以当添加的元素超过原来的数组大小时,它需要先创建一个新的数组,并把原来的元素复制过去。这在一定程序上会影响程序的性能。类似的操作有删除,根据上面的实现代码可以知道,每删除一个元素就会对它后面的元素进行移动。可以把那些需要删除的参数用一个list保存下来,然后用removeAll来一次全部删除。

代码

public boolean removeAll(Collection<?> c) {
    return batchRemove(c, false);
}

private boolean batchRemove(Collection<?> c, boolean complement) {
    final Object[] elementData = this.elementData;
    int r = 0, w = 0;
    boolean modified = false;
    try {
        for (; r < size; r++)
            if (c.contains(elementData[r]) == complement)//这是一个比较巧妙的实现,值得学习
                elementData[w++] = elementData[r];
    } finally {
        // Preserve behavioral compatibility with AbstractCollection,
        // even if c.contains() throws.
        if (r != size) {
            System.arraycopy(elementData, r,
                             elementData, w,
                             size - r);
            w += size - r;
        }
        if (w != size) {
            // clear to let GC do its work
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w;
            size = w;
            modified = true;
        }
    }
    return modified;
}
  • ArrayList里只定义了size和elementData两个元素,其他的都来自于父类AbstractList

protected transient int modCount = 0;

private int size;

以下是测试指定容器大小和让容器自动扩张比较的代码

public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    for(int i=0;i<100000;i++){
        ArrayList list = new ArrayList();
        for(int num=0;num<1000;num++){
            list.add("object");
        }
    }
    System.out.println("用时:"+ (System.currentTimeMillis()-startTime)+"毫秒");
    startTime = System.currentTimeMillis();
    for(int i=0;i<100000;i++){
        ArrayList list = new ArrayList(2000);
        for(int num=0;num<1000;num++){
            list.add("object");
        }
    }
    System.out.println("用时:"+ (System.currentTimeMillis()-startTime)+"毫秒");
}

运行结果:

用时:607毫秒
用时:415毫秒

CopyOnWriteArray浅析

  • CopyOnWriteArrayList的特点与用法
  • CopyOnWriteArrayList的数据结构
  • CopyOnWriteArrayList的常用方法及实现

CopyOnWriteArrayList的特点与用法

CopyOnWriteArrayList的数据结构和用法与ArrayList基本都相同,区别主要在于CopyOnWriteArrayList在添加和删除元素等修改集合状态的操作时,都会重新复制一个新的数组,以保证多线程下的安全性。因为每次修改数据结点都要复制整个对象数组,会比较耗时间,所以CopyOnWriteArrayList适合读多写少情况下的多线程场景,如缓存之类的。

CopyOnWriteArrayList的数据结构

CopyOnWriteArrayList的数据结构与ArrayList的基本一样,只是它不再有扩容之类的操作,因为每次修改都是一个新的数组。。。

CopyOnWriteArrayList的常用方法及实现

public class CopyOnWriteArrayList<E>
    implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
    ...
}

从CopyOnWriteArrayList的类结构就可以看出,它的方法签名与ArrayList基本一致,这里就不再重复了。下面讲讲它方法的特殊实现。

  1. 构造方法
    CopyOnWriteArrayList的默认构造方法会创建一个长度为0的数组来储存数据,而不是像ArrayList那样创建一个长度为10的数组

    /**Sets the array. */    
    final void setArray(Object[] a) {
      array = a;
    }
    
    /**
    * Creates an empty list.
    */
    public CopyOnWriteArrayList() {
        setArray(new Object[0]);
    }
    
  2. 添加方法

    /**
     * Appends the specified element to the end of this list.
     *
     * @param e element to be appended to this list
     * @return {@code true} (as specified by {@link Collection#add})
     */
    public boolean add(E e) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] elements = getArray();
            int len = elements.length;
            Object[] newElements = Arrays.copyOf(elements, len + 1);
            newElements[len] = e;
            setArray(newElements);
            return true;
        } finally {
            lock.unlock();
        }
    }
    
    /**
     * Inserts the specified element at the specified position in this
     * list. Shifts the element currently at that position (if any) and
     * any subsequent elements to the right (adds one to their indices).
     *
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public void add(int index, E element) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] elements = getArray();
            int len = elements.length;
            if (index > len || index < 0)
                throw new IndexOutOfBoundsException("Index: "+index+
                                                    ", Size: "+len);
            Object[] newElements;
            int numMoved = len - index;
            if (numMoved == 0)
                newElements = Arrays.copyOf(elements, len + 1);
            else {
                newElements = new Object[len + 1];
                System.arraycopy(elements, 0, newElements, 0, index);
                System.arraycopy(elements, index, newElements, index + 1,
                                 numMoved);
            }
            newElements[index] = element;
            setArray(newElements);
        } finally {
            lock.unlock();
        }
    }
    
  3. 修改方法

    方法在进入和退出时有加锁、释放锁

    /**
     * Replaces the element at the specified position in this list with the
     * specified element.
     *
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public E set(int index, E element) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] elements = getArray();
            E oldValue = get(elements, index);
    
            if (oldValue != element) {
                int len = elements.length;
                Object[] newElements = Arrays.copyOf(elements, len);
                newElements[index] = element;
                setArray(newElements);
            } else {
                // Not quite a no-op; ensures volatile write semantics
                setArray(elements);
            }
            return oldValue;
        } finally {
            lock.unlock();
        }
    }
    
  4. 删除方法

    /**
     * Removes the element at the specified position in this list.
     * Shifts any subsequent elements to the left (subtracts one from their
     * indices).  Returns the element that was removed from the list.
     *
     * @throws IndexOutOfBoundsException {@inheritDoc}
     */
    public E remove(int index) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] elements = getArray();
            int len = elements.length;
            E oldValue = get(elements, index);
            int numMoved = len - index - 1;
            if (numMoved == 0)
                setArray(Arrays.copyOf(elements, len - 1));
            else {
                Object[] newElements = new Object[len - 1];
                System.arraycopy(elements, 0, newElements, 0, index);
                System.arraycopy(elements, index + 1, newElements, index,
                                 numMoved);
                setArray(newElements);
            }
            return oldValue;
        } finally {
            lock.unlock();
        }
    }
    
    /**
     * Removes the first occurrence of the specified element from this list,
     * if it is present.  If this list does not contain the element, it is
     * unchanged.  More formally, removes the element with the lowest index
     * {@code i} such that
     * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>
     * (if such an element exists).  Returns {@code true} if this list
     * contained the specified element (or equivalently, if this list
     * changed as a result of the call).
     *
     * @param o element to be removed from this list, if present
     * @return {@code true} if this list contained the specified element
     */
    public boolean remove(Object o) {
        Object[] snapshot = getArray();
        int index = indexOf(o, snapshot, 0, snapshot.length);
        return (index < 0) ? false : remove(o, snapshot, index);
    }
    
    /**
     * A version of remove(Object) using the strong hint that given
     * recent snapshot contains o at the given index.
     */
    private boolean remove(Object o, Object[] snapshot, int index) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] current = getArray();
            int len = current.length;
            if (snapshot != current) findIndex: {
                int prefix = Math.min(index, len);
                for (int i = 0; i < prefix; i++) {
                    if (current[i] != snapshot[i] && eq(o, current[i])) {
                        index = i;
                        break findIndex;
                    }
                }
                if (index >= len)
                    return false;
                if (current[index] == o)
                    break findIndex;
                index = indexOf(o, current, index, len);
                if (index < 0)
                    return false;
            }
            Object[] newElements = new Object[len - 1];
            System.arraycopy(current, 0, newElements, 0, index);
            System.arraycopy(current, index + 1,
                             newElements, index,
                             len - index - 1);
            setArray(newElements);
            return true;
        } finally {
            lock.unlock();
        }
    }
    
    /**
     * Removes from this list all of the elements whose index is between
     * {@code fromIndex}, inclusive, and {@code toIndex}, exclusive.
     * Shifts any succeeding elements to the left (reduces their index).
     * This call shortens the list by {@code (toIndex - fromIndex)} elements.
     * (If {@code toIndex==fromIndex}, this operation has no effect.)
     *
     * @param fromIndex index of first element to be removed
     * @param toIndex index after last element to be removed
     * @throws IndexOutOfBoundsException if fromIndex or toIndex out of range
     *         ({@code fromIndex < 0 || toIndex > size() || toIndex < fromIndex})
     */
    void removeRange(int fromIndex, int toIndex) {
        final ReentrantLock lock = this.lock;
        lock.lock();
        try {
            Object[] elements = getArray();
            int len = elements.length;
    
            if (fromIndex < 0 || toIndex > len || toIndex < fromIndex)
                throw new IndexOutOfBoundsException();
            int newlen = len - (toIndex - fromIndex);
            int numMoved = len - toIndex;
            if (numMoved == 0)
                setArray(Arrays.copyOf(elements, newlen));
            else {
                Object[] newElements = new Object[newlen];
                System.arraycopy(elements, 0, newElements, 0, fromIndex);
                System.arraycopy(elements, toIndex, newElements,
                                 fromIndex, numMoved);
                setArray(newElements);
            }
        } finally {
            lock.unlock();
        }
    }
    
  5. 遍历

    遍历时,直接使用内部的数组,这样可以避开多线程时的问题,类似的还有indexOf

    public void forEach(Consumer<? super E> action) {
        if (action == null) throw new NullPointerException();
        Object[] elements = getArray();
        int len = elements.length;
        for (int i = 0; i < len; ++i) {
            @SuppressWarnings("unchecked") E e = (E) elements[i];
            action.accept(e);
        }
    }
    
    public int indexOf(E e, int index) {
        Object[] elements = getArray();
        return indexOf(e, elements, index, elements.length);
    }
    

HashSet浅析

  • HashSet的特点与用法
  • HashSet的数据结构
  • HashSet的常用方法及实现
  • TreeSet的简要说明

HashSet的特点与用法

HashSet是一个没有重复元素的集合,其内部元素也没有顺序。
它可以放入空元素。
它不是线程安全的。
它内部是基于HashMap实现的。

以下摘取了一部分HashSet文档:

...   

* This class implements the <tt>Set</tt> interface, backed by a hash table
* (actually a <tt>HashMap</tt> instance).  It makes no guarantees as to the
* iteration order of the set; in particular, it does not guarantee that the
* order will remain constant over time.  This class permits the <tt>null</tt>
* element.
*
* <p>This class offers constant time performance for the basic operations
* (<tt>add</tt>, <tt>remove</tt>, <tt>contains</tt> and <tt>size</tt>),
* assuming the hash function disperses the elements properly among the
* buckets.  Iterating over this set requires time proportional to the sum of
* the <tt>HashSet</tt> instance's size (the number of elements) plus the
* "capacity" of the backing <tt>HashMap</tt> instance (the number of
* buckets).  Thus, it's very important not to set the initial capacity too
* high (or the load factor too low) if iteration performance is important.

...    

如果要遍历元素,则需要注意容器的大小,因为遍历的时间复杂度是容器的实际大小X容器的初始化大小。(测试一有代码和测试结果。)

HashSet的数据结构

HashSet内部使用HashMap来存储数据,新加入的元素会作为key储存到set当中。value部分由一个默认的object元素来进行填充。

private transient HashMap<E,Object> map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

/**
 * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
 * default initial capacity (16) and load factor (0.75).
 */
public HashSet() {
    map = new HashMap<>();
}

HashSet的常用方法

  1. add

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }
    
  2. remove

    public boolean remove(Object o) {
        return map.remove(o)==PRESENT;
    }
    
  3. contain

    public boolean contains(Object o) {
        return map.containsKey(o);
    }
    
  4. 构造函数

    没什么好说的,直接上代码

    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }
    

附一 元素遍历的测试

测试1.1

首先,测试当一个集合包含1000个元素时,集合初始容量为10000和100000时的性能。

long startTime = System.currentTimeMillis();
for(int i=0;i<10000;i++){
    HashSet<String> set = new HashSet<String>(100000);
    for(int j=0;j<1000;j++){
        set.add(j+"");
    }
    Iterator<String> ite = set.iterator();
    while(ite.hasNext()){
        ite.next();
    }
}
System.out.println("开始时间:"+(System.currentTimeMillis()-startTime));
startTime = System.currentTimeMillis();
for(int i=0;i<10000;i++){
    HashSet<String> set = new HashSet<String>(1000);
    for(int j=0;j<1000;j++){
        set.add(j+"");
    }
    Iterator<String> ite = set.iterator();
    while(ite.hasNext()){
        ite.next();
    }
}
System.out.println("开始时间:"+(System.currentTimeMillis()-startTime));

运行结果:

开始时间:2647
开始时间:781

测试1.2

为了消除元素初始化时,不同容量大小引起的误差,去掉遍历,测试运行的时间

long startTime = System.currentTimeMillis();
for(int i=0;i<10000;i++){
    HashSet<String> set = new HashSet<String>(100000);
    for(int j=0;j<1000;j++){
        set.add(j+"");
    }
    /*Iterator<String> ite = set.iterator();
    while(ite.hasNext()){
        ite.next();
    }*/
}
System.out.println("开始时间:"+(System.currentTimeMillis()-startTime));
startTime = System.currentTimeMillis();
for(int i=0;i<1000;i++){
    HashSet<String> set = new HashSet<String>(1000);
    for(int j=0;j<1000;j++){
        set.add(j+"");
    }
    /*Iterator<String> ite = set.iterator();
    while(ite.hasNext()){
        ite.next();
    }*/
}
System.out.println("开始时间:"+(System.currentTimeMillis()-startTime));

运行结果:

开始时间:1142
开始时间:795

TreeSet的简要说明

TreeSet的实现除了委托的数据结构是TreeMap外,其他的操作与HashSet基本类似