
How fast is it to create a Tar archive without Gzip?


About Gzip and Tar

Everyone on Linux and BSD seems to use a program called gzip, often in combination with another program called tar. Tar, short for Tape ARchive, copies files and directories ("folders") into a format originally designed for archiving to tape. But tar archives can be written to many media besides tape: ordinary hard disks, solid-state disks, NVMe drives, and so on.

When creating an archive, people often want to minimize its size. This is where gzip comes into play: gzip reduces the size of files so they take up less storage space. Later, you can "unzip" the gzip-compressed tar file, which restores the archive to its original size. After decompressing, you can use tar again to "extract" the archive, recovering the original files just as they were when the archive was created.

In addition to archiving for long-term storage, many people use tar and gzip for short-term backups. For example, on my server, Darkstar, I compile and install many programs. Before compiling, I use tar to make a short-term backup of the state of things before the compile and install.

Three good reasons for compiling

First, compiling lets us get the latest source code of a program. Second, once we have a few compiles under our belt, it is often easier to compile a program from the latest source than to figure out how to use our distribution's package manager to install a usually older version. Third, compiling it ourselves means we have the program's source code on hand.

The programs I compile on Darkstar usually live in /usr/local. Before I put a new program into /usr/local, I like to create an archive of /usr/local (in addition to my regular backups of Darkstar) as it exists before the new software is added. With a handy /usr/local archive, it is easy to recover if anything goes crazy during the new installation.
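A sketch of such a pre-compile backup might look like the following. A scratch directory stands in for /usr/local here so the example is self-contained, and the timestamped archive name is my own convention, not something from the article:

```shell
# A scratch tree stands in for /usr/local (illustrative names only).
mkdir -p scratch/usr/local/bin
echo "tool" > scratch/usr/local/bin/tool

# Timestamped archive name so successive backups don't overwrite each other.
STAMP=$(date +%Y%m%d-%H%M%S)
tar -C scratch/usr -cf "local-revert-$STAMP.tar" local

# List the archive contents to confirm what was saved.
tar -tf "local-revert-$STAMP.tar"
```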

Creating a pre-compile backup can take a long time

Recently, more and more software has been added to /usr/local. Making a pre-compile archive now takes too long: about half an hour.

Recently, I used the top(1) command to watch an archive being created. I noticed that gzip was reported to be using 100% of one CPU for the entire duration of the archive's creation.

What are the speed and size of a plain Tar archive without Gzip?

I wondered what would happen to the total time required to make a pre-compile archive if I skipped gzip. I also wanted to know how big the resulting file would be. Below are the data and analysis of the surprising difference in creation time that I found. The archive sizes also differ substantially, though far less than the creation times do.

Create time data

I ran the pre-compile archive twice, once with gzip and once without, and made line-numbered transcripts of both runs.

    000023 root@darkstar:/usr# time tar cvzf local-revert.tgz local
    000024 local/
           [ . . . ]
    401625 local/include/gforth/0.7.3/config.h
    401626
    401627 real    28m11.063s
    401628 user    27m1.436s
    401629 sys     1m21.425s
    401630 root@darkstar:/usr# time tar cvf local-revert.tar local
    401631 local/
           [ . . . ]
    803232 local/include/gforth/0.7.3/config.h
    803233
    803234 real    1m14.494s
    803235 user    0m4.409s
    803236 sys     0m46.376s
    803237 root@darkstar:/usr#

This Stack Overflow post explains the difference between the real time, user time, and system time reported by the time(1) command. The "real" time is wall-clock time, so the "real" time shows how long our command took to complete.
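The distinction is easy to see directly. This small sketch times a short sleep, where nearly all of the real (wall-clock) time is spent waiting rather than on CPU, so user and sys stay near zero:

```shell
# time reports real (wall clock), user, and sys times on stderr;
# redirect stderr to a file so the report can be inspected.
{ time sleep 1; } 2> time.out
cat time.out
```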

Gzip took 22 times as long!

Here we can see that creating the archive with gzip took about 28 minutes, while creating it without gzip took only about 1.25 minutes. The gzip-compressed archive took 22 times longer to create than the uncompressed one!
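The 22× figure follows directly from the two "real" times in the transcript (28m11.063s versus 1m14.494s):

```shell
# Convert both wall-clock times to seconds and take the ratio.
awk 'BEGIN { gz = 28*60 + 11.063; plain = 60 + 14.494;
             printf "%.1fx\n", gz / plain }'
# → 22.7x
```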

Archive size data

Now let's check the archive size.

    root@darkstar:/usr# ls -lh local-revert.t*
    -rw-r--r-- 1 root root 22G Oct  4 05:22 local-revert.tar
    -rw-r--r-- 1 root root 10G Oct  4 05:20 local-revert.tgz
    root@darkstar:/usr#

The gzipped archive is 10 GB, while the ordinary uncompressed tar archive is 22 GB.

Gzip reduced the size by about 55%

The compressed archive is about 55% smaller than the uncompressed one. That is a lot of compression!
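The 55% figure comes from comparing the two sizes reported by ls: the reduction is one minus the ratio of compressed to uncompressed size.

```shell
# Size reduction = 1 - (compressed / uncompressed), using 10 GB vs 22 GB.
awk 'BEGIN { printf "%.0f%%\n", 100 * (1 - 10/22) }'
# → 55%
```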

Conclusion

On Darkstar there is plenty of spare disk space, so an archive that is twice as large but 22 times faster to create is probably the better choice. Going forward, my pre-compile backups of /usr/local for recovery purposes will skip compression entirely. No more waiting half an hour!

Additional thoughts

Creation time and archive size results will vary depending on the kinds of files involved. For example, unlike the files in Darkstar's /usr/local, many image file formats are already compressed, so additional compression will not reduce their size much.

While preparing this article, I found pigz. Pigz (pronounced "pig zee") is an implementation of gzip that can use multiple processor cores. Maybe pigz will soon become a new neighbor of /usr/local on Darkstar.
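Since pigz is a drop-in replacement for gzip, it can be plugged into tar via the -I option. A hedged sketch, falling back to plain gzip when pigz is not installed (the directory names are illustrative):

```shell
# Build a small tree to archive (illustrative names only).
mkdir -p tree/bin && echo tool > tree/bin/tool

# Use pigz if available; otherwise fall back to ordinary gzip.
if command -v pigz >/dev/null 2>&1; then COMP=pigz; else COMP=gzip; fi

# -I tells GNU tar which compression program to run.
tar -I "$COMP" -cf tree.tgz tree
tar -tzf tree.tgz
```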

Another way to speed up compression is to use a different compression program than gzip. Popular choices include bzip2 and xz. You can use tar's -I option to call these other compressors.
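The -I (long form --use-compress-program) option takes the compressor command as a string; recent GNU tar also accepts arguments in that string, which lets you tune the compressor. A small sketch using gzip's fastest setting (the file names are made up):

```shell
# Sample data to archive (illustrative names only).
mkdir -p sample && echo data > sample/file.txt

# -I accepts a command string; gzip -1 trades ratio for speed.
tar -I 'gzip -1' -cf sample.tar.gz sample
tar -tzf sample.tar.gz
```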

Of course, swapping the compression program with tar's -I option is one thing; making tar itself work in parallel is another. There is a Stack Exchange post about parallelizing tarring. I will have to try it.

Finally, it seems clear that when we compile a program ourselves, the source code we compile is the source of the program we actually run, unlike a binary we obtain that someone else compiled. However, as early as 1984, Ken Thompson showed that programs we compile ourselves may sometimes be quite different from what we expect.

Copyright notice: This article is licensed under the Creative Commons Attribution 4.0 International License [BY-NC-SA]
Article title: "How fast is it to create a Tar archive without Gzip?"
Article link: https://oldtang.com/11857.html