In another article, compression tools on linux: gzip vs bzip2 vs lzma vs compress, I compared those tools: some are good for speed, while others are good for space saving or for particular types of files. In other words, they trade CPU time for space, or vice versa. Nowadays, CPU resources are generally provisioned to handle peak-hour usage, so plenty of cycles sit idle the rest of the time.
How do we get zipping tools to utilize those spare CPU resources?
Multiple zipping processes, using xargs
Here is an example; details are in Control and run multiple processes in bash.
nice /usr/bin/find /home/backups/archivelogs -not -name "*.bz2" | xargs -n 1 -P 5 bzip2
In the case above, there are usually thousands of small log files that need to be zipped before being archived. By using 5 parallel processes, I easily got the zipping done 5 times faster.
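The same pattern works with any single-file compressor. Here is a minimal, self-contained sketch using gzip instead of bzip2; the /tmp paths and file names are made up for illustration:

```shell
# Create a scratch directory with a few dummy log files (hypothetical paths)
mkdir -p /tmp/xargs_demo
for i in 1 2 3 4 5; do
    echo "log data $i" > "/tmp/xargs_demo/file$i.log"
done

# -print0 / -0 keeps filenames with spaces safe; -n 1 hands one file
# to each process, and -P 4 runs up to 4 gzip processes in parallel.
find /tmp/xargs_demo -name "*.log" -print0 | xargs -0 -n 1 -P 4 gzip
```

Swap gzip for bzip2 (as above) or any other compressor that takes a filename argument.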
Multiple zipping threads
What about a single big file?
There is a multi-threaded zipping tool called pbzip2. It's available in many linux distributions.
Here is an example; the file was loaded into memory (warming the page cache) before the following tests.
Bzip2 and time cost, single thread
$time bzip2 M
bzip2 -d and time cost
$time bzip2 -d M.bz2
pbzip2 and time cost, multiple threads
$time pbzip2 M
pbzip2 -d and time cost
$time pbzip2 -d M.bz2
More examples for pbzip2
Example: pbzip2 -b15vk myfile.tar
Example: pbzip2 -p4 -r -5 myfile.tar second*.txt
Example: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
Example: pbzip2 -d -m500 myfile.tar.bz2
Example: pbzip2 -dc myfile.tar.bz2 | tar x
Example: pbzip2 -c < myfile.txt > myfile.txt.bz2
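The --use-compress-prog and pipe examples above work with any compressor that filters stdin to stdout. Here is a sketch of the same archive-and-restore round trip using plain gzip, so it runs even without pbzip2 installed; the directory and file names are made up:

```shell
# Build a small directory to archive (hypothetical paths)
mkdir -p /tmp/tar_demo/dir_to_compress
echo "sample data" > /tmp/tar_demo/dir_to_compress/a.txt

# Compress through an external program, as with --use-compress-prog=pbzip2
tar -C /tmp/tar_demo -cf /tmp/tar_demo/myfile.tar.gz \
    --use-compress-program=gzip dir_to_compress/

# Decompress and extract in one pipeline, as with: pbzip2 -dc ... | tar x
mkdir -p /tmp/tar_demo/restore
gzip -dc /tmp/tar_demo/myfile.tar.gz | tar -x -C /tmp/tar_demo/restore
```

Note that --use-compress-program is a GNU tar option; on other tar implementations, use the pipe form instead.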
Multiple zipping threads, with pigz
pigz, a parallel implementation of gzip, does even better on speed, but with a little less space saving.
$time pigz M
$time pigz -d M
File size comparison between pigz and pbzip2
-rw-r--r-- 1 test test 363407360 Dec 19 11:24 M
-rw-r--r-- 1 test test 142629350 Dec 19 11:24 M.bz2
-rw-r--r-- 1 test test 152936211 Dec 19 11:24 M.gz
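The space saving can be computed directly from the byte counts in the listing above; a quick awk sketch:

```shell
# Byte counts taken from the ls output above
awk 'BEGIN {
    orig = 363407360; bz2 = 142629350; gz = 152936211;
    printf "pbzip2 (.bz2): %.1f%% of original\n", 100 * bz2 / orig;
    printf "pigz   (.gz):  %.1f%% of original\n", 100 * gz  / orig;
}'
```

So in this test, pigz gives up roughly 3 percentage points of compression compared to pbzip2.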