Though tar is widely used for archiving, it is rarely used for daily backups because it has no incremental capability. Or at least, most people don't think it has. In fact, the GNU version of tar has a perfectly good mechanism for creating and restoring incremental archives; it just isn't very well documented in the man page, and you have to hunt around to find a proper description.
It works by storing additional metadata in a separate file called a snapshot file. Let me illustrate with a miniature example: suppose that on 'day 1' I start with a directory called mypics that contains three files:
$ ls ~/mypics
caption.jpg storm1.jpg sunset1.jpg
If I create a tar archive of this directory, I'll get all three files in it. This is effectively our 'level 0' backup:
$ cd ~/mypics
$ tar cvf /backups/mypics.0.tar -g mypics.snar .
The first argument to tar (cvf) is actually a set of three options. c means create the archive, v means verbose (that is, list the names of the files as they are written to the archive) and f means 'the next argument is the name of the file to write the archive to'. I am assuming here that /backups is a mount point for a filesystem from an NFS server, or perhaps for an external disk drive.
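The -g flag (long form --listed-incremental) is the interesting one. It tells tar to keep a record of what has been archived (and when) in the snapshot file mypics.snar. If that file doesn't exist yet, tar creates it and does a full dump. Finally, the insignificant-looking '.' at the end of the command is the name of the directory I want to archive, in this case the current directory. One catch: because mypics.snar lives inside the directory being archived, the snapshot file itself gets swept into the archives. In practice it is better kept somewhere else, for example -g ~/mypics.snar.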
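If you want to try this without risking your real pictures, here is a minimal sketch of the day-1 step as a self-contained script. It assumes GNU tar. The mktemp scratch directories, the dummy file contents and the variable names WORK, SRC, BK and SNAR are all made up for illustration. The script uses tar's -C option instead of cd, and it keeps the snapshot file outside the directory being archived.

```shell
#!/bin/sh
# Sketch of the day-1 level 0 backup (assumes GNU tar).
# WORK, SRC, BK and SNAR are hypothetical scratch paths, not real ones.
set -eu

WORK=$(mktemp -d)             # scratch area so no real data is touched
trap 'rm -rf "$WORK"' EXIT    # clean up when the script exits
SRC="$WORK/mypics"            # stands in for ~/mypics
BK="$WORK/backups"            # stands in for /backups
SNAR="$WORK/mypics.snar"      # snapshot kept outside the archived directory
mkdir -p "$SRC" "$BK"

# Day 1: three pictures (dummy contents)
for f in caption.jpg storm1.jpg sunset1.jpg; do
    echo "picture $f" > "$SRC/$f"
done

# No snapshot file exists yet, so tar does a full (level 0) dump
# and records what it archived in $SNAR.
tar -cf "$BK/mypics.0.tar" -g "$SNAR" -C "$SRC" .

# Show what went into the archive
tar -tf "$BK/mypics.0.tar"
```

The listing should show the directory entry ./ and the three .jpg files.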
By day 2 I've added a new file called baby.jpg to the directory, so I create another archive. This one contains only the new file (plus anything else that changed since day 1) and is our 'level 1':
$ tar cvf /backups/mypics.1.tar -g mypics.snar .
I can continue on day 3, creating a level 2 backup like this:
$ tar cvf /backups/mypics.2.tar -g mypics.snar .
To be clear, the digits I've put in the output filenames are only for my benefit. They in no way control what level an archive will be. That's all handled by the snapshot file mypics.snar. As long as I keep updating the same snapshot, each archive will be incremental relative to the previous one.
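Here's a sketch of the three days run back to back. It makes the same assumptions as before: GNU tar and made-up scratch paths. The day-3 file storm2.jpg is hypothetical. The sleep 1 calls are only there because the simulated days are seconds apart. They make sure the new files' timestamps fall clearly after the previous dump started.

```shell
#!/bin/sh
# Sketch of days 1-3 sharing one snapshot file (assumes GNU tar).
# Paths and storm2.jpg are hypothetical.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
SRC="$WORK/mypics"; BK="$WORK/backups"; SNAR="$WORK/mypics.snar"
mkdir -p "$SRC" "$BK"

backup() {  # $1 is only the digit in the file name; it does not set the level
    tar -cf "$BK/mypics.$1.tar" -g "$SNAR" -C "$SRC" .
}

# Day 1: three pictures, then the level 0 dump
for f in caption.jpg storm1.jpg sunset1.jpg; do echo "$f" > "$SRC/$f"; done
backup 0

sleep 1                          # simulated days are only seconds apart
echo baby > "$SRC/baby.jpg"      # Day 2: one new picture
backup 1                         # same snapshot, so only changes since day 1

sleep 1
echo storm > "$SRC/storm2.jpg"   # Day 3: another new picture
backup 2                         # only changes since day 2

for n in 0 1 2; do
    echo "level $n:"
    tar -tf "$BK/mypics.$n.tar" | grep '\.jpg$'
done
```

Level 0 should show all three original pictures, level 1 only baby.jpg and level 2 only storm2.jpg. The tar command is identical each time; only the state of the snapshot file differs.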
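OK, now let's assume that for some reason we lost the entire contents of the mypics directory and need to restore from the backup. The archive members are stored relative to '.', so I would recreate the directory, change into it, and then restore each of the levels in order:
$ mkdir -p ~/mypics
$ cd ~/mypics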
$ tar xvf /backups/mypics.0.tar -g /dev/null
$ tar xvf /backups/mypics.1.tar -g /dev/null
$ tar xvf /backups/mypics.2.tar -g /dev/null
Even when restoring, you still need the -g flag to get the incremental behaviour. In this case, though, tar doesn't actually read the snapshot file, because everything it needs is stored in the archive itself. It is conventional to give /dev/null as a placeholder argument here, but anything will do. GNU tar also accepts -G, which takes no argument, for the same purpose. When extracting from an incremental backup, tar attempts to restore the exact state the filesystem had when the archive was created. In particular, it will delete any files in the filesystem that did not exist in their directories when the archive was created. As a result, files you deleted between backups stay deleted after the restore.
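The deletion behaviour is easy to see in a small experiment. This sketch assumes GNU tar and scratch directories. It uses a hypothetical day 3 on which storm1.jpg is deleted instead of a file being added. After the level 2 backup, the script wipes the directory and restores it level by level:

```shell
#!/bin/sh
# Sketch: an incremental restore removes files that were deleted between
# backups (assumes GNU tar; paths are hypothetical scratch paths).
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
SRC="$WORK/mypics"; BK="$WORK/backups"; SNAR="$WORK/mypics.snar"
mkdir -p "$SRC" "$BK"

for f in caption.jpg storm1.jpg sunset1.jpg; do echo "$f" > "$SRC/$f"; done
tar -cf "$BK/mypics.0.tar" -g "$SNAR" -C "$SRC" .    # day 1, level 0

sleep 1
echo baby > "$SRC/baby.jpg"
tar -cf "$BK/mypics.1.tar" -g "$SNAR" -C "$SRC" .    # day 2, level 1

sleep 1
rm "$SRC/storm1.jpg"                                 # day 3: a file is deleted
tar -cf "$BK/mypics.2.tar" -g "$SNAR" -C "$SRC" .    # day 3, level 2

# Disaster: lose the whole directory, then restore each level in order.
rm -rf "$SRC"
mkdir -p "$SRC"
for n in 0 1 2; do
    # -g /dev/null switches on incremental extraction; the file is not read
    tar -xf "$BK/mypics.$n.tar" -g /dev/null -C "$SRC"
done

ls "$SRC"
```

After the level 0 and level 1 extractions, storm1.jpg is back. Extracting the level 2 archive then removes it again, because it was no longer in the directory when that archive was made.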
The above scheme creates a new level of backup each day. An alternative scheme is to do a level 0 archive to begin with, then just a level 1 on each following day. The level 1s will gradually get larger, because each one contains everything that has changed since the level 0. However, restoring is a little easier, since you only need to keep the level 0 and the most recent level 1. This scheme does require some manual management of the snapshot file.
In particular, you need to make a working copy of the original snapshot file to use for the level 1 backup on day 2. On day 3 you make a fresh working copy of the original for your next level 1. That way the original snapshot is never updated, and every level 1 is taken relative to the level 0. On day 2 you'd do something like:
$ cp mypics.snar mypics.snar-2
$ tar cvf /backups/mypics.day2.1.tar -g mypics.snar-2 .
and on day 3 you'd do it again:
$ cp mypics.snar mypics.snar-3
$ tar cvf /backups/mypics.day3.1.tar -g mypics.snar-3 .
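Here is a sketch of the cumulative scheme under the same assumptions as before: GNU tar, scratch directories, and a hypothetical storm2.jpg added on day 3. The extra pristine copy of the level 0 snapshot exists only so the script can check that the original snapshot is never modified:

```shell
#!/bin/sh
# Sketch of the "level 0 + cumulative level 1" scheme (assumes GNU tar).
# Paths and storm2.jpg are hypothetical.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
SRC="$WORK/mypics"; BK="$WORK/backups"; SNAR="$WORK/mypics.snar"
mkdir -p "$SRC" "$BK"

# Day 1: level 0, which writes the original snapshot
for f in caption.jpg storm1.jpg sunset1.jpg; do echo "$f" > "$SRC/$f"; done
tar -cf "$BK/mypics.0.tar" -g "$SNAR" -C "$SRC" .
cp "$SNAR" "$WORK/pristine.snar"    # only used for the check at the end

# Day 2: level 1 against a working copy of the original snapshot
sleep 1
echo baby > "$SRC/baby.jpg"
cp "$SNAR" "${SNAR}-2"
tar -cf "$BK/mypics.day2.1.tar" -g "${SNAR}-2" -C "$SRC" .

# Day 3: a fresh copy of the original, so this is level 1 again
sleep 1
echo storm > "$SRC/storm2.jpg"
cp "$SNAR" "${SNAR}-3"
tar -cf "$BK/mypics.day3.1.tar" -g "${SNAR}-3" -C "$SRC" .

tar -tf "$BK/mypics.day3.1.tar" | grep '\.jpg$'
cmp "$SNAR" "$WORK/pristine.snar" && echo "original snapshot untouched"
```

The day 3 archive should contain both baby.jpg and storm2.jpg. This is why the level 0 plus the latest level 1 are enough for a restore.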
Six obvious things about backups
- The most important thing about backups is not that you choose the latest, fastest, super-compresso technology, but that you actually do them, in some reasonable way, on a consistent, regular basis. Doing backups is a bit like paying insurance premiums: you kinda hope you're never going to need to make a claim, so the temptation is not to bother at all.
- Making backups of a filesystem onto the same hard drive that the filesystem is on is a bit like asking Carla Sarkozy for a date, i.e. a complete waste of time. If the drive dies, it takes the backup with it. Don't do it this way.
- If you back up onto another machine on your network, keep in mind that if your machine gets hacked, the backup server might be too. (There is nothing more reassuring than a ten-foot physical gap between your local network and an external USB drive sitting on a shelf.)
- If you back up onto removable media, label them!
- Consider storing external backup media (such as CDs or hard drives) off-premises. I find my next-door neighbours quite co-operative in this. Of course, you are giving them access to all your private data, so you need to trust them (or assume they won't figure out how to access it).
- Whatever backup method you use, make sure you can actually restore files. Do a 'fire drill': pretend you've lost some files, then go through the process of recovering them.
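For the tar scheme above, a low-risk fire drill is to restore into a scratch directory and compare the result with the live one, so nothing gets overwritten. Here is a minimal sketch, assuming GNU tar and diff. The paths and files are made up; in real life SRC would be ~/mypics and BK would be /backups.

```shell
#!/bin/sh
# Sketch of a restore "fire drill" (assumes GNU tar and diff).
# Paths are hypothetical; the live directory is never written to.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
SRC="$WORK/mypics"; BK="$WORK/backups"; SNAR="$WORK/mypics.snar"
DRILL="$WORK/drill"             # scratch restore target
mkdir -p "$SRC" "$BK" "$DRILL"

# Set up a level 0 and a level 1 backup to practise on
for f in caption.jpg storm1.jpg sunset1.jpg; do echo "$f" > "$SRC/$f"; done
tar -cf "$BK/mypics.0.tar" -g "$SNAR" -C "$SRC" .
sleep 1
echo baby > "$SRC/baby.jpg"
tar -cf "$BK/mypics.1.tar" -g "$SNAR" -C "$SRC" .

# The drill: restore every level, in order, into the scratch directory...
for n in 0 1; do
    tar -xf "$BK/mypics.$n.tar" -g /dev/null -C "$DRILL"
done

# ...and compare it with the live copy; diff exits non-zero on any difference
diff -r "$SRC" "$DRILL" && echo "fire drill passed"
```

On a real system, expect differences for anything that changed after the last backup was taken.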