A multiple-RAID backup server (May 2007-)
Summary
We keep a description of the departmental backup system on our Computing Wiki.
The department currently has three RAID-based systems (we call them "diaries")
that serve as a backup
system for most of our data disks (basically the disks available in /n).
The idea is that as we grow, we either upgrade the capacity of the RAID disks
and/or buy another RAID unit. The current backup system must scale in either case.
The backup system uses the standard rsync program to synchronize your
data to the backup disk.
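In essence, each disk is mirrored with an rsync call along these lines (a Python sketch; the actual backup program and its rsync flags are not shown here, so the flags and function name below are illustrative assumptions):

```python
import subprocess

def rsync_disk(host, disk, diary_root, dry_run=True):
    """Build (and optionally run) an rsync command that mirrors host:/disk
    into the diary backup tree. Flag choices (-a --delete) are illustrative."""
    src = f"{host}:/{disk}/"              # trailing slash: copy contents, not the dir itself
    dst = f"{diary_root}/{host}/{disk}"
    cmd = ["rsync", "-a", "--delete", src, dst]
    if dry_run:
        return cmd                        # just show what would be run
    return subprocess.run(cmd, check=True)

# Example: mirror chara:/chara4 into the /backup1/backup tree
print(rsync_disk("chara", "chara4", "/backup1/backup"))
```

The trailing slash on the source matters to rsync: it copies the directory's contents rather than creating an extra level of nesting on the diary side.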
Much like the automounted /n directories,
/n/backup provides the user with a transparent
interface to the backed-up data on this multiple-RAID system. Disks are
mounted read-only, so you cannot delete files from the backup server, but permissions
are preserved, so nobody can see files that you don't want them to see.
Current April 2020 layout: (a total of 121TB)
  Filesystem  Type  Size  Used  Avail  Use%  Mounted on
  /dev/md126  xfs    28T   16T   12T    59%  /backup1
  /dev/sdb    xfs    37T   29T  7.6T    80%  /backup2
  /dev/sda    xfs    26T   16T  9.7T    63%  /backup3
  /dev/sdb    xfs    20T   11T  8.6T    56%  /backup4
Additional User Tips
Our disk/tape backup system is not 100% foolproof (but we try), and here are some tips
on improving the safety of your personal data even more.
Probably the most important missing feature is that we do not preserve
every version of every file you create (though this is for a good reason).
However, for a small number of files (typically the ones you edit manually) you
may have good reason to want this capability.
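For such hand-edited files, a tiny helper that stashes a timestamped copy gives you crude per-file versioning on top of the nightly backups (a sketch; the `.versions` directory name and the helper itself are illustrative, not part of our system):

```python
import shutil
import time
from pathlib import Path

def keep_version(path, attic=".versions"):
    """Copy `path` into a sibling `.versions/` directory with a timestamp
    suffix, so every manual edit can keep its own history."""
    p = Path(path)
    vdir = p.parent / attic
    vdir.mkdir(exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = vdir / f"{p.name}.{stamp}"
    shutil.copy2(p, dst)                  # copy2 preserves mtime/permissions
    return dst
```

Run it before (or after) each editing session; old versions then ride along into the nightly rsync like any other files.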
Some technical background
The RAID servers are named diary1, diary2, and diary3. On each of these a RAID disk
is mounted as /backup1, /backup2, and /backup3 respectively, within which a backup directory
houses all disks in the following hierarchy:
diaryN:/backupN/backup/host/disk
where N=1,2,....
Some other administrative directories are present as well (e.g. /backupN/attick).
Currently diary3 also has /backup4. This obviously
complicates the schema when a diary4 is
added to the system.
Currently the backup program is written in Python, and is
controlled by a simple ASCII table with two active columns:
host:/disk backup-id
e.g.
chara:/chara4 1
hammer:/archive 0
sol:/sol -4
meaning that chara:/chara4 is physically backed up on diary1.astro.umd.edu
(i.e. in diary1:/backup1/backup/chara/chara4). A 0 in the 2nd column will
skip that specific disk. A negative number also makes the backup system skip
it, but serves as a reminder that Kevin is handling it some other way.
In addition, any disk name that ends in "nb"
(e.g. chara:/chara6nb) will not be backed up to the diary system. Treat
such disks like scratch disks.
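The skip rules above can be sketched in Python (a sketch only: it assumes the simple id-N-maps-to-diaryN layout, so diary3's extra /backup4 would need a special case, and the function name is hypothetical):

```python
def parse_backup_table(text):
    """Parse the two-column control table into (host, disk, target) entries,
    applying the skip rules: id 0 is skipped, negative ids are skipped
    (handled elsewhere), and any disk name ending in "nb" is never backed up."""
    plan = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        target, id_str = line.split()
        host, disk = target.split(":/")
        n = int(id_str)
        if n <= 0 or disk.endswith("nb"):
            continue                      # not backed up to the diary system
        # physical location: diaryN:/backupN/backup/host/disk
        plan.append((host, disk, f"diary{n}:/backup{n}/backup/{host}/{disk}"))
    return plan

table = """\
chara:/chara4   1
hammer:/archive 0
sol:/sol       -4
chara:/chara6nb 1
"""
print(parse_backup_table(table))
```

Of the four example lines, only chara:/chara4 survives the filters: hammer:/archive has id 0, sol:/sol is negative, and chara6nb ends in "nb".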
Logfiles of the nightly backups are currently stored in textual format
here.
Caveats and future improvements
- since backups take a finite time, there is no true snapshot status.
They generally start at 1am and, depending on the daily activity, can take a few hours.
- diary2 is currently running an older version of Linux
- directories such as earth:/var/spool/mail wind up on /backup2/backup/earth/mail
- incremental backups are done nightly at 1am; on Sunday at 1am the backups are re-mirrored,
and you lose all earlier increments.
- this sawtooth pattern can be flattened with a program like
dirvish, which uses rsync and Unix hard links to write
true snapshots, at
the potential cost of a large number of inodes and disk space.
We are currently looking into whether this will work for us.
- Other dirvish-like options: rsnapshot,
flyback.
- implement a Time Machine like option (like the one in MacOS 10.5), e.g.
FlyBack
- use one tape for daily incremental tape backups, the other for regular full backups?
- use a lockfile (and useful history file) in the root directory (parallel to
the lost+found) of each disk.
- a global "locate" command that, after nightly collection of all the updatedb databases, can loop
over them. I have "blocate" at home.
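The hard-link trick that dirvish-like tools rely on can be illustrated in a few lines of Python (a sketch only; the real tools drive rsync with --link-dest rather than walking the tree themselves):

```python
import os

def snapshot(src_dir, snap_dir):
    """Create a hard-link "snapshot" of src_dir in snap_dir: directories are
    recreated, files are hard-linked. Unchanged files therefore cost no extra
    data blocks between snapshots, only directory entries and link counts."""
    for root, dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        dst_root = snap_dir if rel == "." else os.path.join(snap_dir, rel)
        os.makedirs(dst_root, exist_ok=True)
        for name in files:
            # hard links only work within one filesystem, which is the
            # case on a single diary RAID volume
            os.link(os.path.join(root, name), os.path.join(dst_root, name))
```

Because each nightly snapshot links to the previous one's unchanged files, the Sunday re-mirror no longer has to throw the week's history away; the cost is the inode and disk-space growth mentioned above.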
Links
- first (mostly LMA) raid specs from 2005, about 2 TB. (8x300G)
- second raid, about 10TB (24x500G), with specs from various
companies we approached. We chose Western Scientific.
- sample dirvish doc
- rsnapshot, another alternative?
- trueblade, hosting dirvish??
History
- 2004: our current 10Mbit network can't handle the single tape backup system
- 2005: implemented diary1 (8x300GB) to play with. Only LMA machines backed up
- 2006: ordered diary2 (24 x 500GB)
- 2007: upgraded diary1's 8 300GB drives to 1TB drives
- june 2009: upgraded diary2's 24 500GB drives to 1.5TB
- jan 2013: new diary3, 8 3TB drives in raid5 gives 21 TB. (8 slots still empty)
- .. 2014:
- 2019: 2nd raid (/backup4) added to diary3, disks upgraded
- 2020: moved to (private) repo on github, complete overhaul and using backup.astronet now
Last mod: 28-Apr-2020 by PJT