Here are a few quick tips for compressing text strings in bash on Linux. The obvious first attempt is to capture gzip's output in a variable:


$ f=$(echo "hello Fibrevillage" | gzip -cf)

However, when you try to decompress it the same way, you get an error:

$ echo $f | gunzip -cf
gzip: stdin is a multi-part gzip file -- not supported

This is because the compressed data in $f is corrupted: gzip output is binary, and it does not survive being stored in a shell variable. It therefore cannot be decompressed.

Writing the compressed data to a file works:

$ echo "hello Fibrevillage" | gzip -cf > /tmp/myfile
$ cat /tmp/myfile | gunzip -cf
hello Fibrevillage

Note: The variable f does not hold the same bytes as /tmp/myfile. The VAR=$(...) construct is designed for working with text: it strips trailing newlines, for example, and bash cannot keep NUL bytes in a variable.
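
To see the mismatch for yourself, you can count the bytes gzip really produces and the bytes that survive in the variable. This is only a small sketch: the variable names text, file_bytes and var_bytes are made up for the illustration, and the exact numbers depend on your gzip version.

```shell
# Compare gzip's real output size with what survives inside a variable.
text="hello Fibrevillage"

# Bytes gzip actually produces (binary-safe, the data only goes through pipes).
file_bytes=$(echo "$text" | gzip -cf | wc -c)

# The same output captured with $(...). bash may warn about ignored
# null bytes here, so stderr is silenced for this one step.
{ f=$(echo "$text" | gzip -cf); } 2>/dev/null
var_bytes=$(printf '%s' "$f" | wc -c)

echo "gzip output: $file_bytes bytes, variable: $var_bytes bytes"
```

The second number comes out smaller, because the gzip trailer alone contains NUL bytes that the variable cannot hold.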


Base64-encoding the string works, because the result is plain text:

$ f=$(echo "hello Fibrevillage" | base64 -w0)
$ echo $f | base64 -d
hello Fibrevillage

You can even combine gzip and base64:

$ F=$(echo "Hello world" | gzip | base64 -w0) # compressed, base64 encoded data
$ echo $F | base64 -d | gunzip # use base64 decoded, uncompressed data
Hello world
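
If you use this often, the two pipelines can be wrapped in a pair of small functions. The names pack and unpack are invented here for illustration, and base64 -w0 assumes GNU coreutils, as in the commands above.

```shell
# pack: compress stdin with gzip and emit it as a single line of base64 text.
pack() {
    gzip -c | base64 -w0
}

# unpack: the reverse of pack; decode base64 from stdin, then decompress.
unpack() {
    base64 -d | gunzip -c
}

# Round trip: the packed value is plain text, so it is safe in a variable.
packed=$(echo "Hello world" | pack)
restored=$(printf '%s' "$packed" | unpack)
echo "$restored"
```

Because the packed value contains only base64 characters, it can also be passed as an argument or stored in a config file without being damaged.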

This works, but base64 stores every 3 compressed bytes as 4 bytes of text.
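
A quick way to check the 3-to-4 ratio is to measure both sizes and compare them with the predicted value. The variable names below are made up for this sketch, and the byte counts you see may differ slightly between gzip versions.

```shell
# Size of the gzip output, and of the same output after base64 encoding.
gz_bytes=$(echo "Hello world" | gzip -c | wc -c)
b64_bytes=$(echo "Hello world" | gzip -c | base64 -w0 | wc -c)

# base64 emits 4 characters per 3 input bytes, padding the last group,
# so the predicted size is 4 * ceil(gz_bytes / 3).
expected=$(( (gz_bytes + 2) / 3 * 4 ))

echo "gzip: $gz_bytes bytes, base64: $b64_bytes bytes, predicted: $expected"
```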


As you have probably noticed, neither approach is very efficient. Below are two tools designed specifically for compressing strings and short text.


General-purpose compression libraries build the state needed for compression dynamically from the input, so that they can compress any kind of data. This is a very good idea in general, but it fails for one specific problem: small strings give the compressor too little data to work with, so they barely compress at all.
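
You can see the problem with gzip itself. The gzip format adds a 10-byte header and an 8-byte trailer to every stream, so a short string usually comes out larger than it went in. The sample strings below are arbitrary, and the loop is just a rough sketch.

```shell
# Compare raw size and gzipped size for a few short strings.
for s in "hi" "hello" "hello Fibrevillage"; do
    raw=$(printf '%s' "$s" | wc -c)
    gz=$(printf '%s' "$s" | gzip -c | wc -c)
    echo "'$s': $raw bytes raw, $gz bytes gzipped"
done
```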

Smaz is a simple compression library suited to very short strings. It is not good at compressing general-purpose data, but it can compress text by 40-50% in the average case (it works best with English) and can achieve some compression on HTML and URLs as well. The important point is that Smaz can compress even strings of two or three bytes!

You can download its source code and compile it from


shoco is a C library to compress and decompress short strings. It is very fast and easy to use. The default compression model is optimized for English words, but you can generate your own compression model based on your specific input data.

Get its code from

