Performance Comparison of Java Compression Algorithms

Original post · 2016/12/13 21:35

Preface
In game development, a player typically receives an initialization packet when entering the game. This packet is relatively large, usually around 30-40 KB, so it is worth compressing before it is sent. I looked into this a while ago and collected some common compression formats, shown in the following figure:

Whether a format is splittable indicates whether you can seek to an arbitrary position in the data stream and continue reading from there. This property is particularly useful in Hadoop's MapReduce.
Below is a brief introduction to each of these compression formats, followed by a benchmark comparing their performance.

DEFLATE
DEFLATE is a lossless data compression algorithm that combines the LZ77 algorithm with Huffman coding. A reference implementation of DEFLATE compression and decompression is available in the free, general-purpose compression library zlib (official site: http://www.zlib.net/).
The JDK supports zlib compression out of the box: the compression class Deflater and the decompression class Inflater delegate to native methods

 private native int deflateBytes(long addr, byte[] b, int off, int len, int flush);
 private native int inflateBytes(long addr, byte[] b, int off, int len) throws DataFormatException;

You can directly use the compression class Deflater and decompression class Inflater provided by jdk. The code is as follows:

 public static byte[] compress(byte[] input) {
     ByteArrayOutputStream bos = new ByteArrayOutputStream();
     Deflater compressor = new Deflater(1);
     try {
         compressor.setInput(input);
         compressor.finish();
         final byte[] buf = new byte[2048];
         while (!compressor.finished()) {
             int count = compressor.deflate(buf);
             bos.write(buf, 0, count);
         }
     } finally {
         compressor.end();
     }
     return bos.toByteArray();
 }

 public static byte[] uncompress(byte[] input) throws DataFormatException {
     ByteArrayOutputStream bos = new ByteArrayOutputStream();
     Inflater decompressor = new Inflater();
     try {
         decompressor.setInput(input);
         final byte[] buf = new byte[2048];
         while (!decompressor.finished()) {
             int count = decompressor.inflate(buf);
             bos.write(buf, 0, count);
         }
     } finally {
         decompressor.end();
     }
     return bos.toByteArray();
 }

You can specify the algorithm's compression level, which lets you trade compression time against output size. Valid levels are 0 (no compression) and 1 (fastest) through 9 (best compression). Here speed takes precedence, so level 1 is used.
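To illustrate the trade-off, the sketch below (a hypothetical `DeflateLevelDemo` class, JDK-only) compresses the same repetitive buffer at level 1 (`Deflater.BEST_SPEED`) and level 9 (`Deflater.BEST_COMPRESSION`); on compressible data, level 9 typically yields a smaller output at a higher CPU cost.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class DeflateLevelDemo {

    // Same compression pattern as the utility above, but with a configurable level.
    static byte[] compress(byte[] input, int level) {
        Deflater compressor = new Deflater(level);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            compressor.setInput(input);
            compressor.finish();
            byte[] buf = new byte[2048];
            while (!compressor.finished()) {
                int count = compressor.deflate(buf);
                bos.write(buf, 0, count);
            }
        } finally {
            compressor.end();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive sample data stands in for a player packet.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("player:").append(i % 7).append(";");
        }
        byte[] data = sb.toString().getBytes();

        byte[] fast = compress(data, Deflater.BEST_SPEED);       // level 1
        byte[] best = compress(data, Deflater.BEST_COMPRESSION); // level 9
        System.out.println("input: " + data.length + " bytes, level 1: "
                + fast.length + " bytes, level 9: " + best.length + " bytes");
    }
}
```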

gzip
gzip also uses the deflate algorithm internally, but adds a header and trailer around the deflate output. The JDK likewise supports gzip through the GZIPOutputStream and GZIPInputStream classes. Note that GZIPOutputStream extends DeflaterOutputStream and GZIPInputStream extends InflaterInputStream; the writeHeader and writeTrailer methods can be found in the source code:

 private void writeHeader() throws IOException {
     // ...
 }

 private void writeTrailer(byte[] buf, int offset) throws IOException {
     // ...
 }

The specific code is as follows:

 public static byte[] compress(byte[] srcBytes) {
     ByteArrayOutputStream out = new ByteArrayOutputStream();
     try {
         GZIPOutputStream gzip = new GZIPOutputStream(out);
         gzip.write(srcBytes);
         gzip.close();
     } catch (IOException e) {
         e.printStackTrace();
     }
     return out.toByteArray();
 }

 public static byte[] uncompress(byte[] bytes) {
     ByteArrayOutputStream out = new ByteArrayOutputStream();
     ByteArrayInputStream in = new ByteArrayInputStream(bytes);
     try {
         GZIPInputStream ungzip = new GZIPInputStream(in);
         byte[] buffer = new byte[2048];
         int n;
         while ((n = ungzip.read(buffer)) >= 0) {
             out.write(buffer, 0, n);
         }
     } catch (IOException e) {
         e.printStackTrace();
     }
     return out.toByteArray();
 }
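Since gzip only wraps deflate output, the added header is easy to observe: per RFC 1952, every gzip stream begins with the magic bytes 0x1f 0x8b, followed by the compression method byte 8 (deflate). A quick JDK-only check (the class name `GzipHeaderCheck` is made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderCheck {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write("hello gzip".getBytes());
        gzip.close();
        byte[] bytes = out.toByteArray();

        // RFC 1952: ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate).
        // Mask with 0xff so negative bytes print correctly.
        System.out.printf("%02x %02x %02x%n",
                bytes[0] & 0xff, bytes[1] & 0xff, bytes[2] & 0xff); // prints 1f 8b 08
    }
}
```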

bzip2
bzip2 is a data compression algorithm and program developed by Julian Seward and released under a free/open-source software license. Seward first released bzip2 0.15 in July 1996; over the following years the tool's stability improved and it grew increasingly popular, and version 1.0 was released in late 2000. See the Wikipedia article on bzip2 for more.
bzip2 achieves a higher compression ratio than gzip, but compresses more slowly.
bzip2 is not implemented in the JDK, but an implementation is available in Apache Commons Compress. Maven dependency:

 <dependency>
     <groupId>org.apache.commons</groupId>
     <artifactId>commons-compress</artifactId>
     <version>1.12</version>
 </dependency>

The specific code is as follows:

 public static byte[] compress(byte[] srcBytes) throws IOException {
     ByteArrayOutputStream out = new ByteArrayOutputStream();
     BZip2CompressorOutputStream bcos = new BZip2CompressorOutputStream(out);
     bcos.write(srcBytes);
     bcos.close();
     return out.toByteArray();
 }

 public static byte[] uncompress(byte[] bytes) {
     ByteArrayOutputStream out = new ByteArrayOutputStream();
     ByteArrayInputStream in = new ByteArrayInputStream(bytes);
     try {
         BZip2CompressorInputStream bzin = new BZip2CompressorInputStream(in);
         byte[] buffer = new byte[2048];
         int n;
         while ((n = bzin.read(buffer)) >= 0) {
             out.write(buffer, 0, n);
         }
     } catch (IOException e) {
         e.printStackTrace();
     }
     return out.toByteArray();
 }

The lzo, lz4, and snappy algorithms described below prioritize compression speed at the cost of a lower compression ratio.

lzo
LZO (Lempel-Ziv-Oberhumer) is a lossless data compression algorithm focused on decompression speed. See the Wikipedia article on LZO for more.
A third-party library is needed. Maven dependency:

 <dependency>
     <groupId>org.anarres.lzo</groupId>
     <artifactId>lzo-core</artifactId>
     <version>1.0.5</version>
 </dependency>

Specific implementation code:

 public static byte[] compress(byte[] srcBytes) throws IOException {
     LzoCompressor compressor = LzoLibrary.getInstance().newCompressor(
             LzoAlgorithm.LZO1X, null);
     ByteArrayOutputStream os = new ByteArrayOutputStream();
     LzoOutputStream cs = new LzoOutputStream(os, compressor);
     cs.write(srcBytes);
     cs.close();
     return os.toByteArray();
 }

 public static byte[] uncompress(byte[] bytes) throws IOException {
     LzoDecompressor decompressor = LzoLibrary.getInstance()
             .newDecompressor(LzoAlgorithm.LZO1X, null);
     ByteArrayOutputStream baos = new ByteArrayOutputStream();
     ByteArrayInputStream is = new ByteArrayInputStream(bytes);
     LzoInputStream us = new LzoInputStream(is, decompressor);
     int count;
     byte[] buffer = new byte[2048];
     while ((count = us.read(buffer)) != -1) {
         baos.write(buffer, 0, count);
     }
     us.close();
     return baos.toByteArray();
 }

lz4
LZ4 is a lossless data compression algorithm focused on compression and decompression speed. See the Wikipedia article on LZ4 for more.
Maven dependency for the third-party library:

 <dependency>
     <groupId>net.jpountz.lz4</groupId>
     <artifactId>lz4</artifactId>
     <version>1.2.0</version>
 </dependency>

Specific code implementation:

 public static byte[] compress(byte[] srcBytes) throws IOException {
     LZ4Factory factory = LZ4Factory.fastestInstance();
     ByteArrayOutputStream byteOutput = new ByteArrayOutputStream();
     LZ4Compressor compressor = factory.fastCompressor();
     LZ4BlockOutputStream compressedOutput = new LZ4BlockOutputStream(
             byteOutput, 2048, compressor);
     compressedOutput.write(srcBytes);
     compressedOutput.close();
     return byteOutput.toByteArray();
 }

 public static byte[] uncompress(byte[] bytes) throws IOException {
     LZ4Factory factory = LZ4Factory.fastestInstance();
     ByteArrayOutputStream baos = new ByteArrayOutputStream();
     LZ4FastDecompressor decompressor = factory.fastDecompressor();
     LZ4BlockInputStream lzis = new LZ4BlockInputStream(
             new ByteArrayInputStream(bytes), decompressor);
     int count;
     byte[] buffer = new byte[2048];
     while ((count = lzis.read(buffer)) != -1) {
         baos.write(buffer, 0, count);
     }
     lzis.close();
     return baos.toByteArray();
 }

snappy
Snappy (formerly known as Zippy) is a fast compression/decompression library written in C++ at Google, based on ideas from LZ77, and open-sourced in 2011. Its goal is not maximum compression ratio or compatibility with other compression libraries, but very high speed with a reasonable compression ratio. See the Wikipedia article on Snappy for more.
Maven dependency for the third-party library:

 <dependency>
     <groupId>org.xerial.snappy</groupId>
     <artifactId>snappy-java</artifactId>
     <version>1.1.2.6</version>
 </dependency>

Specific code implementation:

 public static byte[] compress(byte[] srcBytes) throws IOException {
     return Snappy.compress(srcBytes);
 }

 public static byte[] uncompress(byte[] bytes) throws IOException {
     return Snappy.uncompress(bytes);
 }

Benchmark
The following benchmark compresses and decompresses 35 KB of player data. Since 35 KB is fairly small, all the results below apply only to data in this size range and do not indicate which compression algorithm is better or worse in general.
Test environment:
jdk: 1.7.0_79
cpu: i5-4570 @ 3.20GHz, 4 cores
memory: 4 GB

Each algorithm performs 2000 compression and decompression runs on the 35 KB of data. The test code is as follows:

 public static void main(String[] args) throws Exception {
     FileInputStream fis = new FileInputStream(new File("player.dat"));
     FileChannel channel = fis.getChannel();
     ByteBuffer bb = ByteBuffer.allocate((int) channel.size());
     channel.read(bb);
     byte[] beforeBytes = bb.array();
     fis.close();

     int times = 2000;
     System.out.println("Size before compression: " + beforeBytes.length + " bytes");

     long startTime1 = System.currentTimeMillis();
     byte[] afterBytes = null;
     for (int i = 0; i < times; i++) {
         afterBytes = GZIPUtil.compress(beforeBytes);
     }
     long endTime1 = System.currentTimeMillis();
     System.out.println("Compressed size: " + afterBytes.length + " bytes");
     System.out.println("Compression runs: " + times + ", time: " + (endTime1 - startTime1) + " ms");

     byte[] resultBytes = null;
     long startTime2 = System.currentTimeMillis();
     for (int i = 0; i < times; i++) {
         resultBytes = GZIPUtil.uncompress(afterBytes);
     }
     long endTime2 = System.currentTimeMillis();
     System.out.println("Uncompressed size: " + resultBytes.length + " bytes");
     System.out.println("Decompression runs: " + times + ", time: " + (endTime2 - startTime2) + " ms");
 }

GZIPUtil in the code is swapped out for each algorithm's utility class; the test results are shown in the following figure:

The figure records, for each algorithm, the size before compression, the size after compression, the compression time, the decompression time, and the peak CPU usage.

Summary
From the results, deflate, gzip, and bzip2 favor compression ratio at the cost of longer compression and decompression times, while lzo, lz4, and snappy favor speed with a slightly lower compression ratio and lower CPU peaks. Since our compression-ratio requirements are tolerant and we care more about compression/decompression time and CPU usage, we ultimately chose Snappy: it has the lowest compression and decompression times as well as the lowest CPU peak, and its compression ratio is not far behind the others.

Personal blog: codingo.xyz
