tar and zip
Unix compression utilities have been around since long before the advent of WinZip. In fact, compress predates pkzip, the first popular DOS compression/archiving utility.
Unix, as usual, breaks things down to more elemental processes: zipping (compressing a file) is separate from archiving (enclosing a group of files in an “envelope”).
In the Linux world, there are several zip programs: gzip (GNU zip), bzip2, zip and compress.
tar is used to pack the entire contents of a directory or directories into a single file called a tarball, which can then be backed up to tape or saved as a file.
tar preserves the entire directory organization including file ownership, permissions, links, and the directory structure.
Note that older tar formats limit the length of path names (the POSIX ustar format, for example, to 256 characters); GNU tar's default format has no such limit.
The most commonly used tar functions are:
c – create an archive
x – extract files from an archive
t – show the “table of contents” of an archive
(older versions of tar also need -z if the archive is gzip-compressed; current GNU tar detects the compression on its own when reading)
A – append one archive to another
r – append files to an existing archive
P – store files with absolute pathnames
u – appends files to an archive if they are newer than the same files already in the archive
W – verify contents of the archive after creation
Additionally, there are commonly used options:
v – verbose
f <filename> – use the specified file
z – gzip/gunzip
--gzip – gzip (long form of z)
--gunzip – gunzip (long form of z)
j – bzip2/bunzip2
Z – use the compress utility
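As a minimal sketch of how the function letters and options combine, the following creates, lists, and extracts a small archive in a scratch directory. The directory and file names (demo, notes.txt, and so on) are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data.
mkdir -p demo/sub
echo "hello" > demo/notes.txt
echo "world" > demo/sub/more.txt

# c = create, v = verbose, f = use the named archive file.
tar cvf demo.tar demo

# t = table of contents; the archive itself is not changed.
listing=$(tar tf demo.tar)
echo "$listing"

# x = extract; unpack inside a separate directory to compare.
mkdir unpacked
cd unpacked || exit 1
tar xvf ../demo.tar
cd ..
```

Note that exactly one function letter (c, t, or x here) appears in each command, while v and f are options that can accompany any of them.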
The gzip utility compresses files like WinZip or pkzip. It can achieve up to 60-70% compression, though often this is much less. The command itself is used very simply:
gzip file_to_zip.txt #replaces the file with file_to_zip.txt.gz
gzip -d file_to_zip.txt.gz #d for “decompress”
gzip -r /directory/to/zip #compresses each file in the directory separately
The tricky part comes with the huge number of options. Particularly, you can control the level of compression, on a scale of 1 (fast) to 9 (small):
gzip -v -1 file_to_zip.txt
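To make the speed-versus-size trade-off concrete, here is a small sketch that compresses the same file at levels 1 and 9 and compares the results. The sample text and the file names are invented for the demonstration, and the exact sizes will vary with the input.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical, repetitive sample text so the effect is visible.
i=0
while [ "$i" -lt 5000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

# -c writes to standard output, leaving sample.txt in place.
gzip -c -1 sample.txt > fast.gz
gzip -c -9 sample.txt > small.gz

orig_size=$(wc -c < sample.txt | tr -d ' ')
fast_size=$(wc -c < fast.gz | tr -d ' ')
small_size=$(wc -c < small.gz | tr -d ' ')
echo "original: $orig_size  -1: $fast_size  -9: $small_size"

# Without -c, gzip replaces the file with a compressed .gz copy ...
cp sample.txt copy.txt
gzip copy.txt          # leaves copy.txt.gz, removes copy.txt
# ... and -d reverses that.
gzip -d copy.txt.gz    # restores copy.txt
```

Using -c with a redirect is one way to keep the original file around while experimenting with levels.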
bzip2 has no recursive option, so it can’t compress a directory full of files the way gzip -r can. You also can’t zcat or zmore a .bz2 file (use bzcat or bzmore instead). But you often get higher compression: 50-75%.
bzip2 -v file1 file2
bunzip2 -v file1.bz2 file2.bz2
bzip2 -d file1.bz2
bzip2 -f file1.txt #force: overwrite an existing file1.txt.bz2
bzip2 -k file1.txt #keep the original file; otherwise deleted
bzip2 -q file1.txt #quiet
When you download a file, you often get an archive that contains several – perhaps many – files within it. Archive files can also contain directories, which can contain files and subdirectories. Generally, these archive files have names ending in .tar, .tgz, .tar.gz, .tar.bz2 or .zip.
To get at the files contained in the archives, you must unpack them.
If the file is a .tar file, open a terminal window, move to the directory containing the .tar file, and issue the command:
tar xvf filename.tar
The command will unpack the contained files to your current directory. You can use the ls command to view the results of your work.
(Quick: what do the three options mean?)
If the packed file is a .tgz or .tar.gz file, it has been archived and then compressed. So, the previous command, which merely unarchives the file, may not work. To unpack a .tgz or .tar.gz file, open a terminal window, move to the directory containing the .tgz or .tar.gz file, and type:
tar zxvf <filename>
The command will expand the compressed file and then unpack the archived files to your current directory.
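It is often worth looking inside a downloaded archive before unpacking it. The sketch below builds a small stand-in for a download, lists it, and then unpacks it somewhere other than the current directory. The -C option (change to a directory before extracting) is an addition not covered above, and all file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small .tar.gz to stand in for a downloaded file.
mkdir project
echo "int main(void) { return 0; }" > project/main.c
tar zcf download.tar.gz project

# Peek at the contents first: t lists, z handles the gzip layer.
tar ztf download.tar.gz

# Unpack into a directory of your choosing with -C.
mkdir target
tar zxf download.tar.gz -C target
ls target/project
```

Listing first shows whether the archive unpacks into its own directory or would scatter files across the current one.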
To unpack a .tar.bz2 (bzip2) file, open a terminal window, move to the directory containing the .tar.bz2 file, and type:
tar jxvf file.tar.bz2
Notice that the only difference is the “j” option replacing the “z” option.
If the packed file is a .zip file, you shouldn’t issue a tar command at all. Instead, open a terminal window, move to the directory containing the compressed file, and type:
unzip <filename>
Like the previous command, this command also unpacks the contained files to your current directory.
Why pack files up in the first place? For one, it’s easier to send someone a single file than a whole mess of them. For another, it’s often important to send files with permissions and file attributes intact, something many ordinary ways of copying don’t guarantee.
To create a .tar file, open a terminal window, move to the directory containing the files or directories you want to pack, and type:
tar cvf mynewfile.tar file1 file2 …
You can list as many files and directories as you want; just separate each entry from the previous entry with a space.
If you want to compress the packed file, simply include the z flag on the command line:
tar zcvf mynewfile.tgz file1 file2 …
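To see permissions survive the round trip, here is a minimal sketch that packs a directory into a .tgz, unpacks it elsewhere, and compares the permission strings. The file names are hypothetical, and -C (extract into the named directory) is used only to keep the restored copy apart from the original.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files: a script only its owner may run, plus a data file.
mkdir src
printf '#!/bin/sh\necho hi\n' > src/run.sh
echo "data" > src/data.txt
chmod 700 src/run.sh

# Archive and compress in one step.
tar zcf src.tgz src

# Unpack into a fresh directory and compare the permission strings.
mkdir restore
tar zxf src.tgz -C restore
before=$(ls -l src/run.sh | cut -c1-10)
after=$(ls -l restore/src/run.sh | cut -c1-10)
echo "before: $before  after: $after"
```

Ownership is a different matter: restoring files under their original owner generally requires unpacking as root.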
Be very aware that tar syntax gets quite tricky. Resort to man tar often, and/or follow instructions diligently.
A potential advantage of a .zip file is that familiar Windows programs use the .zip format. To create a .zip file, at a command line and in the directory containing the files you want to zip, type:
zip myfile.zip file1 file2 …
where myfile.zip is the name of the .zip file you want to create and file1 and file2 are the files or directories you want to pack (add the -r option to include the contents of directories). As in the tar command, the ellipsis (…) indicates that your list can go on as long as you like.
The older compress utility works along the same lines and produces files ending in .Z:
compress -v file1 file2
compress -f smallfile1 symlink1
uncompress file1.Z file2.Z
The area of archives is a great place to look at how Unix commands can be assembled.
Imagine almost any scenario in which you wanted to zip up a group of files. On the command line, the most heinous chore is listing those files.
How about using the find command to do the listing for you?
find . -name "*.[ch]" -print | zip newfile.zip -@
“Find, in this very directory ( . ), the files with names like anything dot c or h, print that list, and pipe it to zip, which will create a new zip file, and put the files listed by find into it.”
Check out man zip to find out about using -@.
find . -name "*.[ch]" -print | zip new.zip -@
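Notice that text string in weak (double) quotes? The quotes keep the shell from expanding the pattern itself, so find receives it intact. The square brackets are part of a whole other phenomenon in Unix: pattern matching, a close cousin of Regular Expressions.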
Suffice it to say for now that in this case, anything.c or anything.h would match the expression. Just be aware that these things exist and that the letters inside the brackets describe a list or range of matches.
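A quick way to get a feel for bracket patterns is to try them on a few empty files. This sketch uses invented file names; the second pattern, with a range, is an extra illustration rather than something from the commands above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file names.
touch main.c util.h notes.txt prog.cpp

# Quoting the pattern hands it to find unexpanded; find then matches
# any name ending in .c or .h.
matches=$(find . -name "*.[ch]" -print | sort)
echo "$matches"

# A range works the same way: [a-n] matches one letter from a through n,
# so only names starting with such a letter are listed.
range=$(find . -name "[a-n]*" -print | sort)
echo "$range"
```

Here prog.cpp is skipped by the first pattern because the brackets stand for exactly one character, so ".cpp" is not ".c".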
Want to find files, tar them up, maybe even remove the originals once you’ve created your tarball? Consider this:
find * -type f -mtime +90 -print > list.txt
tar --create --gzip --file my.tar.gz --files-from list.txt --remove-files
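Because --remove-files deletes the originals, it is worth rehearsing this pair of commands on throwaway data first. The sketch below assumes GNU tar (the long options, --remove-files in particular, are GNU features) and uses invented names such as logs and old-logs.tar.gz; one file is backdated with touch so that find has something older than 90 days to report.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files: one backdated to the year 2000, one created just now.
mkdir logs
echo "old" > logs/old.log
echo "new" > logs/new.log
touch -t 200001010000 logs/old.log

# List regular files last modified more than 90 days ago.
find logs -type f -mtime +90 -print > list.txt

# Archive exactly the files named in the list, deleting each original
# once it has been added (assumes GNU tar).
tar --create --gzip --file old-logs.tar.gz --files-from list.txt --remove-files

# Confirm what went into the tarball.
tar --list --gzip --file old-logs.tar.gz
```

A list with one name per line breaks down if a file name contains a newline, so treat this as a sketch for ordinary file names only.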