How to compress files using bzip2

Bzip2 typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression. It is an essential compression tool on Linux and can save a lot of disk space. In the example below, a 404MB directory is compressed down to 229MB in about a minute.

Here is a real example; the command output is shown below:

backup:/home# du -csh JiraFullBackup
404M    JiraFullBackup
404M    total

backup:/home# tar -cjf JiraFullBackup.tar.bzip2 JiraFullBackup/

backup:/home# du -csh JiraFullBackup.tar.bzip2
229M    JiraFullBackup.tar.bzip2
229M    total

As the commands above show, bzip2 compressed our directory from 404MB down to 229MB, saving roughly 43% of the space.
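In the tar command above, -c creates an archive, -j filters it through bzip2 and -f names the output file. The script below is a minimal sketch of the full round trip (create, list, extract, verify), assuming GNU tar and bzip2 are installed. The directory demo_dir and its contents are hypothetical stand-ins for JiraFullBackup. It uses the conventional .tar.bz2 suffix, but tar does not depend on the name, because -j selects bzip2 explicitly.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory that is removed when the script exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical sample data standing in for JiraFullBackup/.
mkdir demo_dir
for i in 1 2 3; do
    seq 1 50000 > "demo_dir/file$i.txt"
done

# -c create, -j compress through bzip2, -f write to the named file.
tar -cjf demo_dir.tar.bz2 demo_dir/

# Compare sizes the same way as in the example above.
du -csh demo_dir demo_dir.tar.bz2

# -t lists the archive contents without extracting anything.
tar -tjf demo_dir.tar.bz2

# -x extracts; -C chooses the destination directory.
mkdir restore
tar -xjf demo_dir.tar.bz2 -C restore

# Confirm the restored tree matches the original.
diff -r demo_dir restore/demo_dir && echo "round trip OK"
```

To restore a real backup you would only need the extraction step, for example tar -xjf JiraFullBackup.tar.bzip2.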

A detailed list of the bzip2 command-line options is given below.


-c --stdout

    Compress or decompress to standard output.
-d --decompress

    Force decompression. bzip2, bunzip2 and bzcat are really the same program, and the decision about what actions to take is done on the basis of which name is used. This flag overrides that mechanism, and forces bzip2 to decompress.
-z --compress

    The complement to -d: forces compression, regardless of the invocation name.
-t --test

    Check integrity of the specified file(s), but don't decompress them. This really performs a trial decompression and throws away the result.
-f --force

    Force overwrite of output files. Normally, bzip2 will not overwrite existing output files. Also forces bzip2 to break hard links to files, which it otherwise wouldn't do.

    bzip2 normally declines to decompress files which don't have the correct magic header bytes. If forced (-f), however, it will pass such files through unmodified. This is how GNU gzip behaves.
-k --keep

    Keep (don't delete) input files during compression or decompression.
-s --small

    Reduce memory usage, for compression, decompression and testing. Files are decompressed and tested using a modified algorithm which only requires 2.5 bytes per block byte. This means any file can be decompressed in 2300k of memory, albeit at about half the normal speed.

    During compression, -s selects a block size of 200k, which limits memory use to around the same figure, at the expense of your compression ratio. In short, if your machine is low on memory (8 megabytes or less), use -s for everything. See the MEMORY MANAGEMENT section of the bzip2 man page.
-q --quiet

    Suppress non-essential warning messages. Messages pertaining to I/O errors and other critical events will not be suppressed.
-v --verbose

    Verbose mode -- show the compression ratio for each file processed. Further -v's increase the verbosity level, spewing out lots of information which is primarily of interest for diagnostic purposes.
-L --license -V --version

    Display the software version, license terms and conditions.
-1 (or --fast) to -9 (or --best)

    Set the block size to 100 k, 200 k ... 900 k when compressing. Has no effect when decompressing. See the MEMORY MANAGEMENT section of the bzip2 man page. The --fast and --best aliases are primarily for GNU gzip compatibility. In particular, --fast doesn't make things significantly faster, and --best merely selects the default behaviour.
--

    Treats all subsequent arguments as file names, even if they start with a dash. This is so you can handle files with names beginning with a dash, for example: bzip2 -- -myfilename.
--repetitive-fast, --repetitive-best

    These flags are redundant in versions 0.9.5 and above. They provided some coarse control over the behaviour of the sorting algorithm in earlier versions, which was sometimes useful. 0.9.5 and above have an improved algorithm which renders these flags irrelevant.
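The following script is a minimal sketch that exercises several of the options above (-k, -9, -v, -t, -d, -c, -s, -f and --) on a throwaway file, assuming bzip2 1.0.x and standard POSIX tools such as seq and cmp are available. The file names numbers.txt and -dashfile are hypothetical examples.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory that is removed when the script exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical sample file; any file would do.
seq 1 100000 > numbers.txt

# -k keeps numbers.txt, -9 selects 900k blocks (the default),
# -v reports the compression ratio on standard error.
bzip2 -k -9 -v numbers.txt

# -t performs a trial decompression and writes no output file.
bzip2 -t numbers.txt.bz2

# -d -c decompresses to standard output, leaving the .bz2 file in place.
bzip2 -d -c numbers.txt.bz2 > numbers.copy.txt
cmp numbers.txt numbers.copy.txt

# -s uses the slower, low-memory decompression algorithm.
bzip2 -d -s -c numbers.txt.bz2 | cmp - numbers.txt

# numbers.txt already exists, so decompressing in place needs -f;
# -k keeps numbers.txt.bz2 afterwards.
bzip2 -d -f -k numbers.txt.bz2

# -- ends option parsing, so a name starting with a dash is a file name.
cp numbers.txt ./-dashfile
bzip2 -- -dashfile
ls -l -- -dashfile.bz2
```

Without -f, the in-place decompression step would stop with an error because numbers.txt already exists, which is why that flag is included there.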

Posted on: 08/02/2011
