VAMPIRE
VAMPIRE (Very Awesome Multi-Processor Interconnected Research Environment) is a serial, distributed computing network created and run by Kayhan Gultekin, Zoe Leinhardt, and Kevin Walsh. It takes Astronomy Department PCs that are otherwise idle at night and turns them into the third most powerful (albeit serial) cluster in the department.
Proto-VAMPIRE is now running, with roughly 43 processors currently managed by Condor on the network. This web page will eventually show the status of VAMPIRE (!ON or !OFF) and the jobs running on it. Maybe Dave Rupke will teach me enough Perl to build a web page that reflects what is going on in real time!
WHAT IS VAMPIRE?
Very Awesome Multi-Processor Interconnected Research Environment
VAMPIRE is a pool of computers running the Condor software package. Condor runs jobs on these computers when they would otherwise be idle.
Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling
policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and
where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.
--Condor webpage
Unused Cycles in the Night
Most machines are idle at night and on weekends.
If you can see this bullet point, Kevin never made the plot.
Condor monitors machines for periods of inactivity and puts them to use.
High-speed Processors in the Night
14 processors unused for about 14 hours every day.
Equivalent to an 8-machine dedicated cluster.
Constantly getting bigger.
Will be on every public Linux box.
Private Linux boxes of very generous owners.
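The 8-machine figure is the idle time averaged over a full day, assuming one processor per machine:

```latex
\[
  \frac{14\ \text{processors} \times 14\ \text{h/day}}{24\ \text{h/day}}
  = \frac{196}{24} \approx 8.2\ \text{dedicated processors}
\]
```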
How it Draws Blood
Advertising Daemons.
Condor sees what machines are available and runs jobs on them.
When someone logs back into the machine (or otherwise ends its idle period), Condor removes the job from that particular machine and finds another home for it.
The job is not simply niced (run at lower priority alongside the owner); it is vacated from the machine.
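The eviction behavior above can be illustrated with a toy model. This is a simplified sketch, not Condor's actual code: each machine is reduced to an hour-by-hour idle/busy flag, and a job whose machine stops being idle is vacated and put back at the front of the queue rather than niced. The job names used in testing it (mc1, mc2, ...) are hypothetical.

```python
from collections import deque


def simulate(idle_by_hour, jobs):
    """Toy model of VAMPIRE-style cycle scavenging (not Condor's real code).

    idle_by_hour: one dict per hour mapping machine name -> True if idle.
    jobs: job names, in submission order.
    Returns a list of (hour, event, job, machine) tuples.
    """
    queue = deque(jobs)
    running = {}  # machine name -> job currently running there
    log = []
    for hour, idle in enumerate(idle_by_hour):
        # Owner is back: vacate the job (it is not just niced) and requeue it.
        for machine in list(running):
            if not idle.get(machine, False):
                job = running.pop(machine)
                queue.appendleft(job)
                log.append((hour, "vacate", job, machine))
        # Hand queued jobs to machines that are idle and not already busy.
        for machine, is_idle in idle.items():
            if is_idle and machine not in running and queue:
                job = queue.popleft()
                running[machine] = job
                log.append((hour, "start", job, machine))
    return log
```

With crater and grus both idle in hour 0, busy/idle in hour 1, and idle again in hour 2, the job on crater is vacated in hour 1 and restarted in hour 2, while the job on grus runs undisturbed.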
Why Should You Join the Army of the Night?
Use Condor if you want more computing time for CPU-intensive jobs. Who doesn't want more computing time?
Numerical Simulations
Automated Data Analysis/Reduction
Donate your private machine because:
There is essentially no impact on your end.
We will bump up your priority if you do.
Derek, Cole, Chris, MarkW, and Glen have already allowed their machines for use!
Good Blood and Bad Blood for VAMPIRE
Good Blood
Embarrassingly parallel jobs (e.g., 100 different Monte Carlo runs).
Several very long runs with frequent checkpoints (CPU hogs).
A large array of shortish runs (short runs covering lots of parameter space).
A random-number generated novel. (The greatest novel ever written.)
Bad Blood
User input required.
Infrequent outputs/checkpoints.
Memory-intensive jobs.
True Parallel Jobs. (For Now)
Java. (For Now)
Netscape.
HOW DO I SET UP A JOB?
Config Files
Config files (Condor calls them submit description files) tell Condor what to run, where to run it, how to run it, and on which machines to run it. Let's look at a sample config file:
Executable = /home/kayhan/condor/run_my_job
Universe = vanilla
Log = condor.log
Input = input.dat
Output = output.dat
Error = errormessages.txt
Notification = Error
Requirements = Memory >= 512
Rank = (machine == "crater.astro.umd.edu") || (machine == "grus.astro.umd.edu")
Initialdir = /home/kayhan/condor/A/
Queue
Initialdir = /home/kayhan/condor/B/
Queue

Let's take a look at each of those lines in turn.
Executable = /home/kayhan/condor/run_my_job
This line tells Condor which executable to run. In our Condor setup, others must have execute permission on this file (e.g., set with chmod o+x).
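A quick way to check and fix that permission is sketched below as a hypothetical Python helper, equivalent in effect to chmod o+x. If run_my_job is a script rather than a compiled binary, others will typically need read permission as well, since the interpreter has to read the file.

```python
import os
import stat


def ensure_others_can_execute(path):
    """Give "others" execute permission on path, like `chmod o+x path`.

    Returns True if the mode was changed, False if the bit was already set.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
    if mode & stat.S_IXOTH:
        return False
    os.chmod(path, mode | stat.S_IXOTH)
    return True
```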
Universe = vanilla
This line tells Condor which "universe" to run in. There are two main universes: vanilla and standard. In the vanilla universe, when a job is moved off a machine, it simply stops running; when it resumes elsewhere, the executable is started again from the beginning. It is up to the process to know whether it is being restarted and how to handle that. In the standard universe, when a job is moved off a machine, its memory is written to disk and read back in when it resumes. This may take prohibitively long for some jobs, e.g., ~30 min for an N = 10^5 body simulation. For more information on universes in Condor, see the Condor webpage.
Log = condor.log
This is the file where Condor logs events for your jobs (submission, execution, eviction, termination), useful for tracking their status and for statistics.
Input = input.dat
Output = output.dat
Error = errormessages.txt
These are the files to use for stdin, stdout, and stderr, respectively.
Notification = Error
This line tells Condor to email you only if there is an error.
Requirements = Memory >= 512
Only run on machines that have at least 512 MB of physical memory. Many other machine attributes can be used as requirements.
Rank = (machine == "crater.astro.umd.edu") || (machine == "grus.astro.umd.edu")
Prefer crater or grus when either is available; otherwise, run on any machine that satisfies the Requirements line.
To get an idea of some of the criteria that can be used for Requirements and Rank, look here.
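To make the interplay of Requirements and Rank concrete, here is a simplified sketch: Requirements removes machines from consideration entirely, and Rank only orders the machines that remain. It ignores the machine owner's side of Condor's matchmaking, and the attribute dictionaries (and any host other than crater and grus) are hypothetical stand-ins.

```python
def pick_machine(machines, requirements, rank):
    """Simplified job-side matchmaking: filter by requirements, then maximize rank.

    machines: list of dicts of machine attributes (hypothetical stand-ins for ClassAds).
    requirements: machine -> bool; machines that fail are never used.
    rank: machine -> number; higher is preferred among acceptable machines.
    Returns the chosen machine, or None if no machine meets the requirements.
    """
    acceptable = [m for m in machines if requirements(m)]
    if not acceptable:
        return None
    return max(acceptable, key=rank)


# Mirrors the sample file's Requirements and Rank lines.
PREFERRED = {"crater.astro.umd.edu", "grus.astro.umd.edu"}


def requirements(m):
    return m["Memory"] >= 512  # Memory is in MB


def rank(m):
    return 1 if m["Machine"] in PREFERRED else 0  # a true boolean Rank counts as 1
```

Given a hypothetical orion.astro.umd.edu with 1024 MB, crater with 256 MB, and grus with 512 MB, this picks grus: crater is preferred by Rank but fails Requirements, and among the rest grus outranks orion.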
Initialdir = /home/kayhan/condor/A/
Queue
Initialdir = /home/kayhan/condor/B/
Queue
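This queues the job twice, once in each directory. This is important if the output of run_my_job is always a file with the same name, e.g., myjob.out. If instead the output file is named by its initial conditions, both jobs can share one directory. Since both jobs would then also share the stdout and stderr files named above, give each job its own copies with Condor's $(Process) macro, which expands to 0 for the first queued job and 1 for the second:
Initialdir = /home/kayhan/condor/onlydir/
Output = output.$(Process).dat
Error = errormessages.$(Process).txt
Queue 2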
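For a large array of runs, writing Initialdir/Queue pairs by hand gets tedious, so a small script can generate the submit file instead. This is a sketch; the helper name and the run-directory layout in the usage note are hypothetical.

```python
def make_submit_file(executable, run_dirs):
    """Build submit-description text that queues one job per run directory.

    Follows the sample file: shared settings first, then one
    Initialdir/Queue pair per directory, so each job's input.dat,
    output.dat, etc. are resolved relative to its own directory.
    """
    lines = [
        f"Executable = {executable}",
        "Universe = vanilla",
        "Log = condor.log",
        "Input = input.dat",
        "Output = output.dat",
        "Error = errormessages.txt",
        "Notification = Error",
        "Requirements = Memory >= 512",
    ]
    for run_dir in run_dirs:
        lines.append(f"Initialdir = {run_dir}")
        lines.append("Queue")
    return "\n".join(lines) + "\n"
```

For example, passing ["/home/kayhan/condor/run%03d/" % i for i in range(100)] as run_dirs would describe 100 jobs; writing the text to a file and handing it to condor_submit queues them all.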

Wrappers
Why do we need a wrapper?
Condor may start and stop your job several times.
Your executable file must "know"
if it is being started for the first time.
if it is being restarted after changing machines.
General Algorithm
Am I starting new or restarting?
If new, make a note of it and start job.
If restarting, look at most recent checkpoint.
While running job, make checkpoints.
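The general algorithm above can be sketched as a Python wrapper. This is a minimal sketch under stated assumptions, not the HNBody, zeus, or pkdgrav wrappers: the checkpoint file name checkpoint.json and the summing loop standing in for real work are hypothetical, and it assumes the job's working directory (its Initialdir) is on a filesystem every VAMPIRE machine can see, so a restarted job finds the checkpoint its previous run wrote.

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical checkpoint file name


def save_state(state):
    """Write the checkpoint via a temp file and rename, so an eviction
    in the middle of a write cannot leave a half-written checkpoint."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, CHECKPOINT)


def load_state():
    """Am I starting new or restarting? Decide by looking for a checkpoint."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f), True  # restarting: resume from the checkpoint
    state = {"step": 0, "total": 0}
    save_state(state)  # new: make a note of it before starting
    return state, False


def run(n_steps=1000, checkpoint_every=100):
    """Run (or resume) the job, checkpointing every checkpoint_every steps."""
    state, restarted = load_state()
    for step in range(state["step"], n_steps):
        state["total"] += step  # stand-in for one unit of real work
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_state(state)
    save_state(state)  # final checkpoint marks the job as finished
    return state, restarted
```

Used as the Executable (with a #!/usr/bin/env python3 line and a call to run() at the bottom), a vanilla-universe job that is vacated and restarted resumes from its last checkpoint, losing at most checkpoint_every steps of work.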
Examples
HNBody
zeus
pkdgrav
HOW DO I KEEP TRACK OF MY JOBS?
Useful Commands
Condor View
A Java applet that displays current and recent usage of VAMPIRE machines and VAMPIRE jobs.
View by machine.
View by user.
Live Feed?
IS THIS REALLY WORKING?
Yes.