Compression Utilities

tar and zip

File compression utilities have been around since long before the advent of WinZip. The Unix compress utility, for instance, predates PKZIP, the DOS program that introduced the .zip format.

Unix, as usual, breaks things down to more elemental processes: zipping (compressing a file) is separate from archiving (enclosing a group of files in an “envelope”).

In the Linux world, there are several compression programs: gzip (GNU zip), bzip2, zip and compress.

 

tar: tape archive

tar is used to pack the entire contents of a directory or directories into a single file called a tarball, which can then be backed up to tape or saved as a file.

tar preserves the entire directory organization including file ownership, permissions, links, and the directory structure.

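Note that older tar formats limit the length of file and directory names (roughly 255 characters for a full path in the POSIX ustar format). GNU tar's default format does not have this limit.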

 

tar functions

The most commonly used tar functions are:

c – create an archive

x – extract files from an archive

t – show the “table of contents” of an archive
(with older versions of tar you must use z as well if the archive is gzip-compressed; current GNU tar detects the compression automatically when reading)

A – append one archive to another

r – append files to an existing archive

P – store files with absolute pathnames

u – appends files to an archive if they are newer than the same files already in the archive

W – verify contents of the archive after creation
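
As a rough illustration of the functions listed above, the following sketch creates, appends to, lists, and extracts a small archive in a scratch directory. The names used here (demo, notes.txt, extra.txt, demo.tar, unpacked) are made up for the example, and it assumes mktemp is available.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir demo
echo "first file" > demo/notes.txt
echo "second file" > extra.txt

tar cf demo.tar demo          # c: create an archive from the demo directory
tar rf demo.tar extra.txt     # r: append another file to the existing archive
tar tf demo.tar               # t: list the table of contents

mkdir unpacked
cd unpacked || exit 1
tar xf ../demo.tar            # x: extract everything into the current directory
ls -R
```

Note that r works only on uncompressed archives, which is why no compression option appears here.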

 

tar options

Additionally, there are commonly used options:

v – verbose

f  <filename> – use the specified file

z – gzip/gunzip

--gzip – long form of z

--gunzip – also a long form of z (tar compresses or decompresses depending on the function)

j – bzip2/bunzip2

Z – use the compress utility

 

gzip: Lempel-Ziv compression (LZ77)

The gzip utility compresses files like WinZip or pkzip. It can achieve up to 60-70% compression, though often this is much less. The command itself is used very simply:

gzip file_to_zip.txt #creates file_to_zip.txt.gz and removes the original

gzip -d file_to_zip.txt.gz #d for “decompress”

gzip -r /directory/to/zip #compresses each file under the directory individually

gunzip file_to_zip.txt.gz

zcat file_to_zip.txt.gz

zmore file_to_zip.txt.gz

The tricky part comes with the huge number of options. Particularly, you can control the level of compression, on a scale of 1 (fast) to 9 (small):

gzip -v -1 file_to_zip.txt
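
To see the effect of the compression level, a sketch like the following compresses the same file at levels 1 and 9 and compares the sizes. The file names (sample.txt, fast.gz, small.gz) are arbitrary, and the sample data comes from seq, which is assumed to be installed. The -c option, not mentioned above, writes to standard output so the original file is left in place.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Build a compressible sample file.
seq 1 20000 > sample.txt

# -c writes to standard output, so sample.txt is not replaced.
gzip -1 -c sample.txt > fast.gz     # fastest, least compression
gzip -9 -c sample.txt > small.gz    # slowest, most compression

# Compare sizes in bytes.
wc -c sample.txt fast.gz small.gz

# Round trip: decompress and confirm the data is unchanged.
gzip -d -c small.gz | cmp - sample.txt && echo "round trip OK"
```

How much the two levels differ depends heavily on the data, so treat the sizes printed here as a single data point rather than a rule.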

 

bzip2: Burrows-Wheeler Block Sorting Huffman Coding Algorithm

bzip2 has no recursive option, so it can’t be pointed at a directory full of files the way gzip -r can. You also can’t zcat or zmore a .bz2 file; use bzcat and bzmore instead. But you often get higher compression: 50-75%.

bzip2 -v file1 file2

bzcat file1.bz2

bzmore file1.bz2

bzless file1.bz2

bunzip2 -v file1.bz2 file2.bz2

bzip2 -d file1.bz2

bzip2 -f file1.txt #force: overwrite an existing file1.txt.bz2

bzip2 -k file1.txt #keep the original file; otherwise deleted

bzip2 -q file1.txt #quiet

 

Unpacking files

When you download a file, you often get an archive that contains several, perhaps many, files within it. Archive files can also contain directories, which can contain files and subdirectories. Generally, these archive files have names ending in .tar, .tgz, .tar.gz, .tar.bz2 or .zip.

To get at the files contained in the archives, you must unpack them.

 

Unpacking .tar files

If the file is a .tar file, open a terminal window, move to the directory containing the .tar file, and issue the command:

tar xvf filename.tar

The command will unpack the contained files to your current directory. You can use the ls command to view the results of your work.

(Quick: what do the three options mean?)

 

Unpacking .tgz and .tar.gz files

If the packed file is a .tgz or .tar.gz file, it has been archived and then compressed. The previous command, which merely unarchives the file, won’t work with older versions of tar (current GNU tar detects the compression on its own). To unpack a .tgz or .tar.gz file with any version, open a terminal window, move to the directory containing the .tgz or .tar.gz file, and type:

tar zxvf <filename>

The command will expand the compressed file and then unpack the archived files to your current directory.
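
It is often worth listing an archive before unpacking it, and unpacking into a directory of your choosing rather than the current one. The sketch below does both. The -C option (change to a directory before extracting) is not covered in this section, and the names project, readme.txt, project.tgz and target are invented for the example.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Build a small compressed archive to practise on.
mkdir project
echo "hello" > project/readme.txt
tar zcf project.tgz project

# Look before unpacking: t lists the contents without extracting anything.
tar ztvf project.tgz

# Extract into a separate directory instead of the current one.
mkdir target
tar zxvf project.tgz -C target
cat target/project/readme.txt
```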

 

Unpacking .bz2 files

To unpack a .tar.bz2 (bzip2) file, open a terminal window, move to the directory containing the .tar.bz2 file, and type:

tar jxvf file.tar.bz2

Notice that the only difference is the “j” option replacing the “z” option.

 

Unzipping .zip files

If the packed file is a .zip file, you shouldn’t issue a tar command at all. Instead, open a terminal window, move to the directory containing the compressed file, and type:

unzip filename.zip

Like the previous command, this command also unpacks the contained files to your current directory.

 

Archiving files: Making a tarball

Why do we do this? For one, it’s easier to send someone a single file than a whole mess of them. For another, it’s often important to send files with permissions and file attributes intact, something most forms of copying don’t guarantee.

To create a .tar file, open a terminal window, move to the directory containing the files or directories you want to pack, and type:

tar cvf mynewfile.tar file1 file2 …

You can list as many files and directories as you want; just separate each entry from the previous entry with a space.
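
The claim that a tarball carries permissions along can be checked with a small experiment. This is only a sketch: the file name run.sh, the mode 750, and the directory restored are arbitrary choices. The umask is set explicitly because an ordinary user's umask is applied when tar extracts files.

```shell
#!/bin/sh
umask 022
work=$(mktemp -d)
cd "$work" || exit 1

echo "#!/bin/sh" > run.sh
chmod 750 run.sh                 # give the file a distinctive mode
before=$(ls -l run.sh | cut -c1-10)

tar cf bundle.tar run.sh         # pack it
mkdir restored
tar xf bundle.tar -C restored    # unpack it somewhere else
after=$(ls -l restored/run.sh | cut -c1-10)

echo "before: $before"
echo "after:  $after"
```

Both lines should show the same permission string. Ownership is a different matter: restoring the original owner generally requires extracting as root.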

 

Compressing your tarball

If you want to compress the packed file, simply include the z flag on the command line:

tar zcvf mynewfile.tgz file1 file2 …

Be very aware that tar syntax gets quite tricky. Resort to man tar often, and/or follow instructions diligently.

 

Creating a .zip file

A potential advantage of a .zip file is that familiar Windows programs use the .zip format. To create a .zip file, at a command line and in the directory containing the files you want to zip, type:

zip myfile.zip file1 file2 …

where myfile.zip is the name of the .zip file you want to create and file1 and file2 are the files you want to pack. To include a directory together with its contents, add the -r option. As in the tar command, the ellipsis (…) indicates that your list can go on as long as you like.

 

The compress utility: LZW compression

compress -v file1 file2

compress -f smallfile1 symlink1

uncompress file1.Z file2.Z

zcat file1.Z

zmore file1.Z

zless file1.Z

 

Building blocks

The area of archives is a great place to look at how Unix commands can be assembled.

Imagine almost any scenario in which you wanted to zip up a group of files. On the command line, the most heinous chore is listing those files.

How about using the find command to do the listing for you?

find . -name "*.[ch]" -print | zip newfile.zip -@

Which means:

“Find, in this very directory ( . ) and everything below it, the files with names like anything dot c or h, print that list, and pipe it to zip, which will create a new zip file, and put the files listed by find into it.”

Check out man zip to find out about using -@.

 

Getting regular

find . -name "*.[ch]" -print | zip new.zip -@

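Notice that text string in double quotes? The quotes stop the shell from expanding the pattern, so find receives it intact. The square brackets are part of a whole other phenomenon in Unix: pattern matching, which reaches its fullest form in Regular Expressions. (Strictly speaking, find -name takes a shell filename pattern rather than a regular expression, but the bracket notation is the same in both.)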

Suffice it to say for now that in this case, anything.c or anything.h would match the expression. Just be aware that these things exist and that the letters inside the brackets describe a list or range of matches.
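
A quick way to convince yourself of what the pattern matches is to try it on a few empty files. The file names below are invented for the purpose.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Create a few empty files; only the .c and .h files should match.
touch main.c util.h notes.txt archive.cpp

# Quote the pattern so the shell passes it to find unexpanded.
find . -name "*.[ch]" -print | sort
```

This prints ./main.c and ./util.h. The file archive.cpp is skipped because the brackets stand for exactly one character, which must be the last one in the name.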

Tarring what you find

Want to find files, tar them up, maybe even remove the originals once you’ve created your tarball? Consider this:

find * -type f -mtime +90 -print > list.txt

tar --create --gzip --file my.tar.gz --files-from list.txt --remove-files
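
A list file with one name per line can trip over unusual file names. One possible variation, sketched here assuming GNU tar, skips the intermediate file and pipes NUL-separated names straight from find into tar. The -print0, --null and -T - options are not covered above, the sample file names are invented, and the -mtime test and --remove-files are left out so the example is harmless to run.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/src"
cd "$work/src" || exit 1

# Sample files, including one with a space in its name.
echo "a" > "old report.txt"
echo "b" > data.log

# -print0 and --null use NUL separators, so odd file names survive the pipe.
# -T - tells tar to read the list of names from standard input.
find . -type f -print0 |
    tar --create --gzip --file "$work/found.tar.gz" --null -T -

# List what ended up in the archive.
tar ztf "$work/found.tar.gz"
```

The archive is written outside the directory being searched so that find does not pick it up while it is being created.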