A multiple-RAID backup server (May 2007-)

Summary

We keep a description of the departmental backup system on our Computing Wiki. The department currently has three RAID-based systems (we call them diaries) that serve as a backup system for most of our data disks (basically the disks available in /n). The idea is that as we grow, we either increase the capacity of the RAID disks, buy another RAID unit, or both. The backup system must scale in either case. It uses the standard rsync program to synchronize your data to the backup disk.
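
As a rough illustration of what such a synchronization looks like, here is a minimal Python sketch that assembles an rsync command line. The helper name build_rsync_command and the choice of flags (-a for archive mode, -n for a dry run) are assumptions made for this example; the options used by the actual backup program are not documented here.

```python
import shlex

def build_rsync_command(source, destination, dry_run=True):
    """Return an rsync argument list that copies `source` into `destination`.

    Hypothetical helper: the flags are illustrative, not the ones the
    departmental backup program necessarily uses.
    """
    cmd = ["rsync", "-a"]      # -a: archive mode (recursive, keeps permissions and times)
    if dry_run:
        cmd.append("-n")       # -n: dry run, only report what would be transferred
    # A trailing slash on the source copies the directory contents,
    # not the directory itself.
    cmd += [source.rstrip("/") + "/", destination.rstrip("/") + "/"]
    return cmd

if __name__ == "__main__":
    cmd = build_rsync_command("chara:/chara4", "/backup1/backup/chara/chara4")
    print(shlex.join(cmd))
```

The dry run is on by default so that the sketch never copies anything unless you ask for it explicitly.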

Much like the automounted /n directories, /n/backup provides the user with a transparent interface to the backed-up data on this multiple-RAID system. Disks are mounted read-only, so you cannot delete files from the backup server. Permissions are the same as on the original disks, so nobody can see files that you don't want them to see.

Current April 2020 layout: (a total of 121TB)

Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/md126     xfs    28T   16T   12T  59% /backup1
/dev/sdb       xfs    37T   29T  7.6T  80% /backup2
/dev/sda       xfs    26T   16T  9.7T  63% /backup3
/dev/sdb       xfs    20T   11T  8.6T  56% /backup4

Additional User Tips

Our disk/tape backup system is not 100% foolproof (though we try), so here are some tips for protecting your personal data even further. Probably the most important missing feature is that it does not preserve every version of every file you create (though this is for a good reason). However, for a small number of files (typically the ones you edit manually) you may have good reason to want this capability.
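
For those few hand-edited files, one simple do-it-yourself approach is to keep timestamped copies before you change them. The sketch below is only an illustration and not part of the backup system; the function name save_version and the .versions directory are made up for this example.

```python
import shutil
import time
from pathlib import Path

def save_version(path, archive_dir=".versions"):
    """Copy `path` into `archive_dir` next to it, adding a timestamp to the name.

    Hypothetical helper for personal use; names are illustrative only.
    """
    src = Path(path)
    dest_dir = src.parent / archive_dir
    dest_dir.mkdir(exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")    # e.g. 20200428-153000
    dest = dest_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)                   # copy2 also keeps the modification time
    return dest
```

Calling save_version("paper.tex") before an editing session leaves a dated copy in .versions/ that the nightly backup then picks up like any other file. Two calls within the same second produce the same name, so the second copy overwrites the first.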

Some technical background

The RAID servers are named diary1, diary2, and diary3. On each of these a RAID disk is mounted as /backup1, /backup2, and /backup3 respectively, within which a backup directory houses all disks in the following hierarchy:
	diaryN:/backupN/backup/host/disk
where N=1,2,.... Some other administrative directories are present as well (e.g. /backupN/attick). Currently diary3 also has /backup4, which obviously complicates the scheme once diary4 is added to the system. The backup program is currently written in Python and is controlled by a simple ASCII table with two active columns:
	host:/disk	backup-id
e.g.
	chara:/chara4	1
	hammer:/archive	0
	sol:/sol	-4

meaning that chara:/chara4 is physically backed up on diary1.astro.umd.edu (i.e. in diary1:/backup1/backup/chara/chara4). A 0 in the second column skips that specific disk. A negative number also makes the backup system skip the disk, but signals that it matters for something else Kevin is doing. In addition, any disk whose name ends in "nb" (e.g. /chara//chara6nb) will not be backed up to the diary system. Treat such disks like scratch disks.
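
The rules above can be sketched in a few lines of Python. This is not the actual backup program, only an illustration of how the table maps to a destination. The function names are invented for this example, and the sketch assumes the simple diaryN:/backupN pairing, so it ignores the special case of /backup4 living on diary3.

```python
def parse_backup_table(text):
    """Yield (host, disk, backup_id) for each 'host:/disk  backup-id' line."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or ":" not in fields[0]:
            continue                      # skip blank or malformed lines
        host, disk = fields[0].split(":", 1)
        yield host, disk, int(fields[1])

def backup_target(host, disk, backup_id):
    """Return 'diaryN:/backupN/backup/host/disk', or None if the disk is skipped."""
    if backup_id <= 0:                    # 0 and negative ids are not backed up
        return None
    if disk.endswith("nb"):               # "nb" disks are treated like scratch disks
        return None
    # disk already starts with "/", so host and disk join into host/disk
    return f"diary{backup_id}:/backup{backup_id}/backup/{host}{disk}"

TABLE = """
chara:/chara4   1
hammer:/archive 0
sol:/sol        -4
"""

if __name__ == "__main__":
    for host, disk, backup_id in parse_backup_table(TABLE):
        print(f"{host}:{disk} ->", backup_target(host, disk, backup_id))
```

With the three example rows, only chara:/chara4 gets a destination; the other two come back as None because their backup-id is not positive.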

Logfiles of the nightly backups are currently stored in textual format here.

Caveats and future improvements


Links

  1. first (mostly LMA) raid specs from 2005, about 2 TB. (8x300G)
  2. second raid about 10TB (24x500G), with specs from various companies we approached. We chose Western Scientific.
  3. sample dirvish doc
  4. rsnapshot, another alternative?
  5. trueblade, hosting dirvish??

History

  1. 2004: our current 10Mbit network can't handle the single tape backup system
  2. 2005: implemented diary1 (8x300GB) to play with. Only LMA machines backed up
  3. 2006: ordered diary2 (24 x 500GB)
  4. 2007: upgraded diary1's 8 300GB drives to 1TB drives
  5. June 2009: upgraded diary2's 24 500GB drives to 1.5TB
  6. Jan 2013: new diary3, 8 3TB drives in RAID5 gives 21 TB. (8 slots still empty)
  7. .. 2014:
  8. 2019: 2nd raid (/backup4) added to diary3, disks upgraded
  9. 2020: moved to a (private) repo on GitHub, complete overhaul, and now using backup.astronet

Last mod: 28-Apr-2020 by PJT