BORG
Administrator: Derek C. Richardson
Co-Administrator: Randall Perrine
- 04/11/08: A new version of OpenSSH has been built (which includes ssh, ssh-keygen, etc.). To use it, put "/local/bin" ahead of "/usr/bin" in your search path.
- 03/02/07: The CTC's latest purchase: 40 nodes on UMd's deepthought!
- 03/02/07: Info on using ssh without passwords on borg added.
- 03/02/07: Sample .tcshrc file updated.
- 08/08/06: The borg room environment monitor is up and running!
- 04/11/06: Fancy new usage statistics are now available!
- 09/13/05: Added some new notes and functionality. Note borg53 is currently unavailable.
- 11/11/05: A new publication added.
- 09/12/05: Cool schematic added.
- 09/01/05: New cluster usage stats added.
- 09/01/05: The latest upgrade is almost finished.
- 06/11/05: Publications list updated.
- 11/01/04: borg31 now has a working gigabit card.
- 09/22/04: Gigabit networking enabled.
- 07/29/04: A new publication added.
- 07/14/04: A new section on publications has been added.
- 06/28/04: We are in the process of installing PBS Pro as an alternative job management system. Stay tuned.
- 04/15/04: This web page has been substantially redesigned. Comments welcome!
The "borg" is a "Beowulf" cluster of PCs for use in computational research in the Department of Astronomy at the University of Maryland at College Park.
Funding for the cluster so far has come from my startup package (as a new professor), Chris Reynolds's startup package, and the new Center for Theory and Computation.

- Server (queen, borg0): Dell PowerEdge 2400 (2 x 850MHz PIII, 1GB RAM, 36GB internal 10k RPM SCSI (RAID0), 2 Gb NICs, Mylex ExtremeRAID controller)
- Data analysis node (locutus): dual 1U AMD Opteron 250 (2.4GHz, 2GB RAM, 1.5TB RAID, 1Gb NIC)
- Nodes (first generation, borg1-borg24): AMD Athlon (1GHz Thunderbird, 512MB RAM, 100Mb netboot card)
- Nodes (second generation, borg25-borg40): dual 1U AMD Athlon MP 2600+ (2.1GHz, 1GB RAM, 1Gb NIC, 100 Mb NIC)
- Nodes (third generation, borg41-borg53): dual 1U AMD Opteron 250 (2.4GHz, 1GB RAM, 2 x 1Gb NIC)
- 100-Mbit switch: 40-port HP ProCurve with Gb backplane
- Gbit switch: HP ProCurve 4140gl (40 RJ-45 10/100/1000 ports (2xJ4908A); 4 mini-GBIC slots; 2 open module slots)
- Storage: 9x72GB external disk array in RAID configuration on queen, 1.5TB on data analysis node
- Backup: AIT-2 autoloader (400GB capacity, managed by amanda)
- OS: RedHat Linux 9
- Compilers: GNU (gcc, g77 3.2.2); Portland Group (pgcc, pgf77 5.1-3); Intel (icc, ifc/ifort 7.0/8.0)
- Job control: condor (version 6.4.7)
- Parallel libraries: LAM/MPI (version 7.0), MPICH
Server purchased from Dell.
Nodes 1-24 purchased from Mid-Atlantic Data Systems.
Nodes 25-53 purchased from Appro.
Installation by Formix and locally.
Networking is 10/100/1000 Mbit.
Text-based Borg CPU usage stats, updated hourly.
Borg disk usage stats, updated daily.
Getting Started
- Generally only department members are entitled to an account. See me in person to get set up.
- The default shell is bash. Use chsh to change it.
- "Home" files go in /home/$user/ (or ~/). Data files go in ~/scr/ (a soft link set up at the time your account is created and pointing to a directory common to your work group).
- Your umask is set to 002 by default and normally should not be changed (so new files you create have read and write privileges for you and your group, with read-only privileges for everyone else).
- The work group directories are mode 2775 for file sharing. It is recommended your home directory be mode 2700 for privacy.
- Typical users will want to add /net/condor/bin and /net/lam/bin to their search path.
- For the Intel compilers, source either /net/intel/compiler70/ia32/bin/iccvars.csh or ifcvars.csh (version 7.0), or source /astromake/astromake_start then type astroload intel (version 8.0; note the FORTRAN compiler is ifort in this version, not ifc).
- Use scp for copying data to/from borg (insecure transfer modes are unavailable).
- Cluster status is recorded in the following files:
- ~dcr/tmp/borgdu.out -- disk usage (updated daily; the same data as the disk usage stats listed above).
- ~dcr/tmp/borgtop.out -- CPU usage (updated hourly; the same data as the CPU usage stats listed above).
- You can also check your personal usage (hourly, daily, etc.) by running this script:
- ~dcr/bin/scripts/usage -- parsed from borgtop.out every hour.
- All-time usage stats are recorded in this file:
- ~dcr/tmp/CPU/.all/.summary (same as this).
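The search path and permission settings described in the list above can be tried out with a short shell sketch. It assumes a bash-like shell (the default here) and GNU stat; tcsh users would set the path in .tcshrc instead. The directory and file names under the temporary directory are hypothetical stand-ins, so nothing in your real home directory is touched.

```shell
#!/bin/sh
# Search path: /local/bin goes ahead of /usr/bin (for the newer OpenSSH);
# the condor and LAM directories are appended.
PATH="/local/bin:$PATH:/net/condor/bin:/net/lam/bin"
export PATH

# Work in a throwaway directory (hypothetical; stands in for ~ and ~/scr).
demo=$(mktemp -d)

# umask 002 masks only the "write for others" bit: 666 & ~002 = 664.
umask 002
touch "$demo/data.txt"
file_mode=$(stat -c '%a' "$demo/data.txt")

# Mode 2775: setgid bit (2) plus rwxrwxr-x, as used for work group
# directories, so files created inside inherit the directory's group.
mkdir "$demo/group_dir"
chmod 2775 "$demo/group_dir"
group_mode=$(stat -c '%a' "$demo/group_dir")

# Mode 2700: setgid bit plus rwx for the owner only, as recommended
# for home directories.
mkdir "$demo/private_dir"
chmod 2700 "$demo/private_dir"
private_mode=$(stat -c '%a' "$demo/private_dir")

echo "file: $file_mode  group dir: $group_mode  private dir: $private_mode"
```

The PATH line would normally go in your shell startup file; the rest is only a demonstration of what the default umask and the two recommended directory modes mean.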
Submitting Jobs
- NEVER RUN JOBS ON THE QUEEN! (borg0, or simply borg) -- doing so will slow everyone down.
- Single jobs should be submitted from the Queen using condor.
- Currently parallel jobs can only be submitted manually by logging in to a borg node.
- At present, only borg25-borg53 are available for parallel runs.
- You must sign up for parallel time by editing /etc/motd.
- Large jobs (many nodes or long time) should be cleared with an administrator first.
- Nodes borg1 through borg24 run at 100 Mbit only.
- You can abbreviate these nodes as b1 through b24.
- Nodes borg0 and borg25 through borg40 support both 100 Mbit and 1 Gbit.
- To use 100 Mbit, refer to the nodes by their full names (or b0, b25-b40).
- To use 1 Gbit, refer to the nodes by bg0, bg25-bg40.
- In other words, when specifying a list of second-generation nodes to run in parallel, use the "bg" naming convention to have the communication run on the gigabit network where available, or use the "b" or "borg" naming convention for 100 Mbit.
- Nodes borg41 through borg53 run at 1 Gbit only.
- You can abbreviate these nodes as either b41 through b53 or bg41 through bg53.
- Note NFS between the Queen and the second- and third-generation nodes is also done at 1 Gbit.
- Second-generation nodes boot over the 100 Mbit network and then switch to 1 Gbit.
- Generally you will not see a factor of 10 improvement in performance over the gigabit network when using second-generation nodes.
- The second-generation nodes only have 32-bit buses, limiting throughput. (The third-generation nodes only have a 32-bit kernel at the moment; 64-bit is on the way.)
- Your code may not be bandwidth limited even at 100 Mbit (this applies to third-generation nodes as well).
- Here's how to use ssh without passwords to log into borg nodes (borrowed from here):
- On borg0 (the Queen), type "ssh-keygen -t rsa" (without the quotes). Choose the defaults, and use an empty passphrase.
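- Now type "cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys2". That's it! (Note that cp overwrites any existing authorized_keys2 file; if you already have one, append the key with "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2" instead.)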
- Sample .tcshrc file.
- The condor primer.
- Sample files from John Vernaleo for compiling the parallel version of ZEUS (ZEUS-MP, publicly available):
- PBS Pro 5.4 manuals (password protected PDF files): Quick Start, Admin Guide, User's Guide, Ext. Ref. Specs.
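The node naming convention described under Submitting Jobs (borg or b names for the 100 Mbit network, bg names for gigabit) can be sketched as a small shell helper that writes a host list, one node per line, which is the layout LAM/MPI's lamboot reads as a boot schema. The function name node_names and the file name hosts.gig are hypothetical, chosen only for illustration; the actual launch procedure on borg may differ.

```shell
#!/bin/sh
# node_names PREFIX FIRST LAST: print PREFIX<n> for n = FIRST..LAST,
# one name per line. (Hypothetical helper, not part of the cluster's tools.)
node_names() {
    prefix=$1
    n=$2
    while [ "$n" -le "$3" ]; do
        echo "${prefix}${n}"
        n=$((n + 1))
    done
}

# Four second-generation nodes addressed over the gigabit network ("bg" names).
node_names bg 25 28 > hosts.gig

# The same nodes over 100 Mbit would use the "b" (or "borg") names instead.
slow_hosts=$(node_names b 25 28 | tr '\n' ' ')

cat hosts.gig
echo "100 Mbit names: $slow_hosts"
```

A file like hosts.gig would typically be passed to lamboot before starting a run. Remember that only borg25-borg53 are available for parallel runs and that parallel time must be reserved in /etc/motd first.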
Here is a list of refereed publications that have acknowledged use of borg, in reverse chronological order:
- Orbital resonances in the inner Neptunian system: II. the resonance history of Proteus, Larissa, Galatea, and Despina
  K. Zhang, D. P. Hamilton, Icarus, 2007, in press.
- Energetic impact of jet inflated cocoons in relaxed galaxy clusters
  J. C. Vernaleo, C. S. Reynolds, ApJ, 2007 [preprint]
- A steady-state model of NEA binaries formed by tidal disruption of gravitational aggregates
  K. J. Walsh, D. C. Richardson, Icarus, 2007 [submitted version]
- Orbital resonances in the inner Neptunian system: I. the 2:1 Proteus-Larissa mean-motion resonance
  K. Zhang, D. P. Hamilton, Icarus 188:386, 2007
- AGN feedback and cooling flows: problems with simple hydrodynamical models
  J. C. Vernaleo, C. S. Reynolds, ApJ 645:83, 2006
- The effect of the Coriolis force on Kelvin-Helmholtz-driven mixing in protoplanetary disks
  G. C. Gomez, E. C. Ostriker, ApJ, in press, 2005 [preprint]
- Binary near-Earth asteroid formation: rubble pile model of tidal disruptions
  K. J. Walsh, D. C. Richardson, Icarus 180:201, 2006 [reprint]
- A fast method for finding bound systems in numerical simulations: results from the formation of asteroid binaries
  Z. M. Leinhardt, D. C. Richardson, Icarus 176:432, 2005 [reprint]
- Saturated-state turbulence and structure from thermal and magnetorotational instability in the ISM: three-dimensional numerical simulations
  R. A. Piontek, E. C. Ostriker, ApJ 629:849, 2005
- Planetesimals to protoplanets. I. Effect of fragmentation on terrestrial planet formation
  Z. M. Leinhardt, D. C. Richardson, ApJ 625:427, 2005 [reprint]
- Numerical experiments with rubble piles: equilibrium shapes and spins
  D. C. Richardson, P. Elankumaran, R. E. Sanderson, Icarus 173:349, 2005 [reprint]
- Buoyant radio-lobes in a viscous intracluster medium
  C. S. Reynolds, B. McKernan, A. C. Fabian, J. M. Stone, J. C. Vernaleo, MNRAS 357:242, 2005 [reprint]
- Constraints on compact star parameters from burst oscillation light curves of the accreting millisecond pulsar XTE J1814-338
  S. Bhattacharyya, T. E. Strohmayer, M. C. Miller, C. B. Markwardt, ApJ 619:483, 2005 [reprint]
- Growth of intermediate-mass black holes in globular clusters
  K. Gultekin, M. C. Miller, D. P. Hamilton, ApJ 616:221, 2004 [reprint]
- N-body simulations of planetesimal evolution: effect of varying impactor mass ratio
  Z. M. Leinhardt, D. C. Richardson, Icarus 159:306, 2002 [reprint]
VAMPIRE
Graduate students in our department have linked together some of the newer department-side machines into a cluster called VAMPIRE (Very Awesome Multi-Processor Interconnected Research Environment). Check it out!
What's in a Name?
The name "borg" for the cluster was a suggestion by my graduate student, Zoë Leinhardt, and was inspired by the Borg of Star Trek fame. Borg ships are characterized by their completely utilitarian form and highly redundant construction, so that operation is still possible even if many ship systems (i.e., nodes in our case) are offline. The terms Star Trek and Borg are trademarks of Paramount Pictures and are used here without permission. Hope they don't mind...
Last modified: Apr 11, 2008