A multiple-RAID backup server (May 2007-)
Summary
We keep a description of the departmental backup system on our Computing Wiki.
The department currently has three RAID-based systems (we call them "diaries")
that serve as a backup
system for most of our data disks (basically the disks available in /n).
The idea is that as we grow, we either upgrade the capacity of the RAID disks
and/or buy another RAID unit. The current backup system must scale in either case.
The backup system uses the standard rsync program to synchronize your
data to the backup disk.
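In essence, each disk is mirrored with an rsync call along these lines (a Python sketch; the actual backup program and its rsync flags are not shown here, so the flags and function name below are illustrative assumptions):

```python
import subprocess

def rsync_disk(host, disk, diary_root, dry_run=True):
    """Build (and optionally run) an rsync command that mirrors host:/disk
    into the diary backup tree. Flag choices (-a --delete) are illustrative."""
    src = f"{host}:/{disk}/"              # trailing slash: copy contents, not the dir itself
    dst = f"{diary_root}/{host}/{disk}"
    cmd = ["rsync", "-a", "--delete", src, dst]
    if dry_run:
        return cmd                        # just show what would be run
    return subprocess.run(cmd, check=True)

# Example: mirror chara:/chara4 into the /backup1/backup tree
print(rsync_disk("chara", "chara4", "/backup1/backup"))
```

The trailing slash on the source matters to rsync: it copies the directory's contents rather than creating an extra level of nesting on the diary side.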
Much like the automounted /n directories,
/n/backup provides the user with a transparent
interface to the backed-up data on this multiple-RAID system. Disks are
mounted read-only, so you cannot delete files from the backup server, but permissions
are preserved, so nobody can see files that you don't want them to see.
Current April 2020 layout: (a total of 121TB)
  Filesystem  Type  Size  Used  Avail  Use%  Mounted on
  /dev/md126  xfs    28T   16T   12T    59%  /backup1
  /dev/sdb    xfs    37T   29T  7.6T    80%  /backup2
  /dev/sda    xfs    26T   16T  9.7T    63%  /backup3
  /dev/sdb    xfs    20T   11T  8.6T    56%  /backup4
Additional User Tips
Our disk/tape backup system is not 100% foolproof (but we try), and here are some tips
on improving the safety of your personal data even more.
Probably the most important missing feature is that we do not preserve
every version of every file you create (though this is for a good reason).
However, for a small number of files (typically the ones you edit manually) you
may have good reason to want this capability.
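For such hand-edited files, a tiny helper that stashes a timestamped copy gives you crude per-file versioning on top of the nightly backups (a sketch; the `.versions` directory name and the helper itself are illustrative, not part of our system):

```python
import shutil
import time
from pathlib import Path

def keep_version(path, attic=".versions"):
    """Copy `path` into a sibling `.versions/` directory with a timestamp
    suffix, so every manual edit can keep its own history."""
    p = Path(path)
    vdir = p.parent / attic
    vdir.mkdir(exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = vdir / f"{p.name}.{stamp}"
    shutil.copy2(p, dst)                  # copy2 preserves mtime/permissions
    return dst
```

Run it before (or after) each editing session; old versions then ride along into the nightly rsync like any other files.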
Some technical background
The RAID servers are named diary1, diary2, and diary3. On each of these a RAID disk
is mounted as /backup1, /backup2, and /backup3 respectively, within which a backup directory
houses all disks in the following hierarchy:
diaryN:/backupN/backup/host/disk
where N=1,2,....
Some other administrative directories are present as well (e.g. /backupN/attick).
Currently diary3 also has /backup4. This obviously
complicates the schema when a diary4 is
added to the system.
Currently the backup program is written in Python, and is
controlled by a simple ASCII table with two active columns:
host:/disk backup-id
e.g.
chara:/chara4 1
hammer:/archive 0
sol:/sol -4
meaning that chara:/chara4 is physically backed up on diary1.astro.umd.edu
(i.e. in diary1:/backup1/backup/chara/chara4). A 0 in the 2nd column will
skip that specific disk. A negative number also makes the backup system skip
it, but serves as a reminder that Kevin is handling it some other way.
In addition, any disk name that ends in "nb"
(e.g. chara:/chara6nb) will not be backed up to the diary system. Treat
such disks like scratch disks.
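The skip rules above can be sketched in Python (a sketch only: it assumes the simple id-N-maps-to-diaryN layout, so diary3's extra /backup4 would need a special case, and the function name is hypothetical):

```python
def parse_backup_table(text):
    """Parse the two-column control table into (host, disk, target) entries,
    applying the skip rules: id 0 is skipped, negative ids are skipped
    (handled elsewhere), and any disk name ending in "nb" is never backed up."""
    plan = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        target, id_str = line.split()
        host, disk = target.split(":/")
        n = int(id_str)
        if n <= 0 or disk.endswith("nb"):
            continue                      # not backed up to the diary system
        # physical location: diaryN:/backupN/backup/host/disk
        plan.append((host, disk, f"diary{n}:/backup{n}/backup/{host}/{disk}"))
    return plan

table = """\
chara:/chara4   1
hammer:/archive 0
sol:/sol       -4
chara:/chara6nb 1
"""
print(parse_backup_table(table))
```

Of the four example lines, only chara:/chara4 survives the filters: hammer:/archive has id 0, sol:/sol is negative, and chara6nb ends in "nb".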
Logfiles of the nightly backups are currently stored in textual format
here.
Caveats and future improvements
- since backups take a finite time, there is no true snapshot status.
They generally start at 1am and, depending on the daily activity, can take a few hours.
- diary2 is currently running an older version of Linux
- directories such as earth:/var/spool/mail wind up on /backup2/backup/earth/mail
- incremental backups are done nightly at 1am; on Sunday at 1am the backups are re-mirrored,
and you lose all earlier increments.
- this sawtooth pattern can be flattened with a program like
dirvish, which uses rsync and Unix hard links to write
true snapshots, at
the potential cost of a large number of inodes and disk space.
We are currently looking into whether this will work for us.
- Other dirvish-like options: rsnapshot,
flyback.
- implement a Time Machine like option (like the one in MacOS 10.5), e.g.
FlyBack
- use one tape for daily incremental tape backups, the other for regular full backups?
- use a lockfile (and useful history file) in the root directory (parallel to
the lost+found) of each disk.
- a global "locate" command that, after nightly collection of all the updatedb databases, can loop
over them. I have "blocate" at home.
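The hard-link trick that dirvish-like tools rely on can be illustrated in a few lines of Python (a sketch only; the real tools drive rsync with --link-dest rather than walking the tree themselves):

```python
import os

def snapshot(src_dir, snap_dir):
    """Create a hard-link "snapshot" of src_dir in snap_dir: directories are
    recreated, files are hard-linked. Unchanged files therefore cost no extra
    data blocks between snapshots, only directory entries and link counts."""
    for root, dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        dst_root = snap_dir if rel == "." else os.path.join(snap_dir, rel)
        os.makedirs(dst_root, exist_ok=True)
        for name in files:
            # hard links only work within one filesystem, which is the
            # case on a single diary RAID volume
            os.link(os.path.join(root, name), os.path.join(dst_root, name))
```

Because each nightly snapshot links to the previous one's unchanged files, the Sunday re-mirror no longer has to throw the week's history away; the cost is the inode and disk-space growth mentioned above.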
Links
- first (mostly LMA) raid specs from 2005, about 2 TB. (8x300G)
- second raid, about 10TB (24x500G), with specs from various
companies we approached. We chose Western Scientific.
- sample dirvish doc
- rsnapshot, another alternative?
- trueblade, hosting dirvish??
History
- 2004: our current 10Mbit network can't handle the single tape backup system
- 2005: implemented diary1 (8x300GB) to play with. Only LMA machines backed up
- 2006: ordered diary2 (24 x 500GB)
- 2007: upgraded diary1's 8 300GB drives to 1TB drives
- june 2009: upgraded diary2's 24 500GB drives to 1.5TB
- jan 2013: new diary3, 8 3TB drives in raid5 gives 21 TB. (8 slots still empty)
- .. 2014:
- 2019: 2nd raid (/backup4) added to diary3, disks upgraded
- 2020: moved to (private) repo on github, complete overhaul and using backup.astronet now
Last mod: 28-Apr-2020 by PJT