Spectral Line Workflow
The current style data reduction workflow requires your RAW files to be locally present, in the $DATA_LMT tree. To reduce data you will need to know a set of ObsNums, which can be obtained by querying a database.
To run the pipeline, you process each obsnum individually, inspect the pipeline products in the $WORK_LMT tree, and decide if the data are good enough or need some extra flagging/masking. These are usually defined in a small set of ascii files, after which the pipeline can be re-run. If an observation was split into a series of obsnums, you can combine them with the same script, viz.
SLpipeline.sh obsnum=12345
SLpipeline.sh obsnum=23456
SLpipeline.sh obsnums=12345,23456
SLpipeline.sh obsnums=my.obsnums
The resulting data can then be found in $WORK_LMT/$ProjectId/$obsnum, or in the case of a combination, in $ProjectId/${on0}_${on1}. Inside there will be a README.html summarizing the Timely Analysis Products (TAP). You can also list the obsnums, line by line, in a text file such as my.obsnums.
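For example, a minimal my.obsnums file (the obsnum values here are hypothetical) would simply contain one obsnum per line:
12345
23456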
Each instrument (e.g. RSR, SEQ) has its own pipeline script. For some instruments (e.g. SEQ) a different script exists for each observing mode (OTF Mapping, Beam Switching, Position Switching).
RSR pipeline
For the RSR pipeline there are two scripts: rsr_pipeline.sh and rsr_combine.sh. These in turn use two different scripts, rsr_driver.sh and rsr_sum.sh, which reduce the raw data directly into an ASCII spectrum; there is no SpecFile created as there is for WARES based data. These two scripts have slightly different ways to mask bad data, but should otherwise produce a very similar final spectrum.
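Assuming these follow the same keyword=value style as SLpipeline.sh, a hypothetical invocation (the obsnums are made up) would be:
rsr_pipeline.sh obsnum=12345
rsr_combine.sh obsnums=12345,23456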
rsr_driver uses the RFILE (a simple obsnum,chassis,board tuple to remove from the data) and the --exclude option to designate sections in frequency space to be removed from the baseline fitting. rsr_sum uses the BLANKING (a more detailed format to exclude certain chassis and board sections from inclusion); a separate windows[] list is used to designate sections in frequency space for baselining.
The parameter files are:
lmtoy_${obsnum}.rc - general LMTOY parameter file
rsr.${obsnum}.badlags - triples of chassis,board,chan (with obsnum and a metric) for badlag flagging
rsr.${obsnum}.blanking - obsnum,chassis,board/freq ranges - for rsr_sum.py
rsr.${obsnum}.rfile - triples of obsnum,chassis,board - for rsr_driver.py
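As an illustration, a hypothetical rsr.${obsnum}.rfile, with one obsnum,chassis,board triple per line, might look like:
12345,0,1
12345,3,5
Here chassis 0 board 1 and chassis 3 board 5 of obsnum 12345 would be removed from the data.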
RSR Parameters
The following parameters are currently (or will be) accepted by SLpipeline.sh when using it for RSR; they are ignored when you happen to use them otherwise.
badboard= a comma separated list of bad boards, 0 being the first; note that board=4 is the highest frequency board
badcb= a comma separated list of chassis/board combinations, e.g. badcb=0/1,0/5,3/5 is a common one
vlsr= not implemented yet
blo= baseline order fit; hardcoded at 0 at the moment, so not implemented yet
The badboard and badcb parameters are normally not used; during the badlags scanning a set of badcb combinations is identified if their RMS_diff is outside the 0.01-0.1 window. These are currently heuristically determined, and after inspection the user can still edit them.
Common ones for SLpipeline:
obsnum=0 (or obsnums=0)
debug=0
pdir=""
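Putting these together, a hypothetical RSR run through SLpipeline.sh might look like (the obsnum is made up; the badcb values are the common ones quoted above):
SLpipeline.sh obsnum=12345 badcb=0/1,0/5,3/5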
WARES workflow
For WARES based instruments, such as Sequoia (SEQ), the scripts seq_reduce.sh and seq_combine.sh will reduce single and combined OBSNUMs. They always work through an intermediate SpecFile (equivalent to our future SDFITS file), which then gets gridded into a FITS cube.
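Assuming these also follow the SLpipeline.sh keyword=value style, a hypothetical example (with made-up obsnums) would be:
seq_reduce.sh obsnum=12345
seq_combine.sh obsnums=12345,23456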
The parameter files are:
lmtoy_${obsnum}.rc - general LMTOY parameter file
Many more details of the old workflow are in examples/lmtoy_reduce.md.
Getting at the RAW data
Here are some examples to get raw (netCDF) LMT data and reduce them in your own local workspace. This assumes the LMTOY toolkit has been installed:
Find out which obsnums cover your observation(s). The first example below shows all the data observed on a given date, the second shows all NGC5194 data in 2020, and the third all data for a given ProjectId.
% lmtinfo.py grep 2021-12-17
% lmtinfo.py grep 2020 NGC5194
% lmtinfo.py grep 2021-S1-MX-3
Depending on the observing procedure, you will usually also need to know the calibration ObsNums, referred to as the CalObsNum. There are some existing databases and logfiles where a simple grep command will probably be sufficient to get the obsnums. The lmtinfo.py data command can also be used. If you have the Scans.csv database, this might be faster. If you see a log file in the $DATA_LMT directory, this might be another place where a record of all ObsNums exists. As they say, YMMV.
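For example, assuming you have a local copy of the Scans.csv database, a plain grep may be all you need:
% grep NGC5194 Scans.csv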
Note
We need a unified procedure for finding your obsnum - this is all rather haphazard.
Get the data from a full $DATA_LMT archive (usually "unity") via the rsync_lma script. Obviously only somebody on that archive machine can do this, but it is the easiest way. Here are examples of several methods:
# gather the LMT data on the archive machine
cln% lmtar /tmp/irc.tar 79447 79448
# pick one of these two to copy, and don't forget to remove your large /tmp files!
cln% scp /tmp/irc.tar lma:/tmp
lma% scp cln:/tmp/irc.tar /tmp
# reduce the data on your favorite workstation
lma% tar -C $DATA_LMT -xf /tmp/irc.tar
lma% SLpipeline.sh obsnum=79448
Processing SEQ in 2018S1SEQUOIACommissioning/79448 for IRC+10216
# view the pipeline results
lma% xdg-open 2018S1SEQUOIACommissioning/79448/
This opens a directory using your favorite file browser, where you can inspect figures. There will also be two ADMIT directories, each with an index.html that can be inspected the ADMIT way (or any other way).
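For example (the name of the ADMIT directory here is hypothetical; use whatever *.admit directories you find):
lma% xdg-open 2018S1SEQUOIACommissioning/79448/example.admit/index.html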
An alternative would be a direct rsync connection between e.g. cln and lma:
cln% cd $DATA_LMT
cln% rsync -avR `lmtar.py 79447 79448` lma:/lma1/lmt/data_lmt
for which we have a script, which also works from any directory:
cln% rsync_lma 79448
Note that this script only needs the main (Map) obsnum; the calibration (Cal) is automatically included.
To re-run, edit the settings in 2018S1SEQUOIACommissioning/79448/lmtoy_79448.rc and run again:
lma% SLpipeline.sh obsnum=79448
Re-Processing SEQ in 2018S1SEQUOIACommissioning/79448 for IRC+10216
Parallel Processing: parallel
Although the SLpipeline consists of single processor code, this is sufficient for a single ObsNum. However, to stack a large number of ObsNums it can be useful to run a whole data-reduction session using GNU parallel, since the pipelines are independent. Here is an example, first the serial code for the M31 project, where 3 different correlator settings cover three spectral lines:
# CO
SLpipeline.sh obsnum=85776
SLpipeline.sh obsnum=85778
SLpipeline.sh obsnum=85824
SLpipeline.sh obsnums=85776,85778,85824
# XXX
SLpipeline.sh obsnum=85818
SLpipeline.sh obsnum=85826
SLpipeline.sh obsnum=85882
SLpipeline.sh obsnums=85818,85826,85882
# YYY
SLpipeline.sh obsnum=85820
SLpipeline.sh obsnum=85878
SLpipeline.sh obsnums=85820,85878
This took about 29 minutes to reduce. Now we can split this up by first running all eight single obsnums in parallel, followed by the three combinations in parallel, viz.
# construct the single obsnum pipelines job
echo SLpipeline.sh obsnum=85776 > job1
echo SLpipeline.sh obsnum=85778 >> job1
echo SLpipeline.sh obsnum=85824 >> job1
echo SLpipeline.sh obsnum=85818 >> job1
echo SLpipeline.sh obsnum=85826 >> job1
echo SLpipeline.sh obsnum=85882 >> job1
echo SLpipeline.sh obsnum=85820 >> job1
echo SLpipeline.sh obsnum=85878 >> job1
# construct the combination pipelines job
echo SLpipeline.sh obsnums=85776,85778,85824 > job2
echo SLpipeline.sh obsnums=85818,85826,85882 >> job2
echo SLpipeline.sh obsnums=85820,85878 >> job2
# provided you have enough true cores and memory, these can be run in two steps:
parallel --jobs 8 < job1
parallel --jobs 3 < job2
Using this technique, the same process took 6 minutes on a 512GB machine with 32 true cores, a speedup of almost a factor of 5.
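The job files themselves are of course easily generated with a small shell loop, e.g. for job1:
# construct the single obsnum pipelines job
for o in 85776 85778 85824 85818 85826 85882 85820 85878; do
    echo SLpipeline.sh obsnum=$o
done > job1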
Parallel Processing: slurm
To run on a slurm based cluster, we have written a simple frontend where you can almost copy the commands from the previous section, except prefixed with sbatch_lmtoy.sh, e.g.
# construct the single obsnum jobs
sbatch_lmtoy.sh SLpipeline.sh obsnum=85776
sbatch_lmtoy.sh SLpipeline.sh obsnum=85778
sbatch_lmtoy.sh SLpipeline.sh obsnum=85824
sbatch_lmtoy.sh SLpipeline.sh obsnum=85818
sbatch_lmtoy.sh SLpipeline.sh obsnum=85826
sbatch_lmtoy.sh SLpipeline.sh obsnum=85882
sbatch_lmtoy.sh SLpipeline.sh obsnum=85820
sbatch_lmtoy.sh SLpipeline.sh obsnum=85878
Now you have to wait until all of these are finished before the second batch can do the combinations:
# construct the combination jobs
sbatch_lmtoy.sh SLpipeline.sh obsnums=85776,85778,85824
sbatch_lmtoy.sh SLpipeline.sh obsnums=85818,85826,85882
sbatch_lmtoy.sh SLpipeline.sh obsnums=85820,85878
Another option is to place these commands in a text file, exactly as was done for GNU parallel, and submit those:
sbatch_lmtoy.sh job1
# watch and wait until job1 is done
squeue -u lmtslr_umass_edu
# when done, submit the next one
sbatch_lmtoy.sh job2
Interactive work is discouraged, but sometimes unavoidable. Here is the recommended command:
srun -n 1 -c 4 --mem=16G -p toltec-cpu --x11 --pty bash
Adjust memory and cores as needed.
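For example, a heavier combination job might use (the values here are only illustrative):
srun -n 1 -c 16 --mem=64G -p toltec-cpu --x11 --pty bash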
Web server
The PI will need a password to access their ProjectId. It will be at something like
http://taps.lmtgtm.org/lmtslr/2021-S1-MX-3/
within which various obsnums will be visible, and possibly different sources and/or combinations of obsnums:
85776/ # individual obsnum pipeline reduced
85778/
85824/
85776_85824/ # combining the 3 previous obsnums
85776_TAP.tar # TAP tar files for better (?) offline browsing
85778_TAP.tar
85824_TAP.tar
85776_SRDP.tar # full SRDP tar files for better (?) offline browsing
85778_SRDP.tar
85824_SRDP.tar
85776_85824_SRDP.tar
85776_RAW.tar # full RAW telescope data for your local $DATA_LMT tree
85778_RAW.tar # only useful if you want to re-run the pipeline
85824_RAW.tar # and only made available upon special request
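For example, to browse a TAP offline (the URL follows the pattern above; your ProjectId, user name and password will differ):
wget --user=<user> --ask-password http://taps.lmtgtm.org/lmtslr/2021-S1-MX-3/85776_TAP.tar
tar -xf 85776_TAP.tar
# then open the extracted README.html in a browser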
Future LMT SLR data reduction?
Here we describe the workflow in the future unified SDFITS based system. The first step is always the RAW (lmtslr or dreampy3) based conversion (ingestion) to SDFITS. If you are in an interactive python session, the data will be in memory in a special class, and there should be no formal reason to save the SDFITS file (formerly called the SpecFile in lmtslr), but one is well advised to do so.
Load and Go
The initial workflow is load-and-go based. A number of parameters are set, and a series of plots can be reviewed, including having access to the final Science Ready Data Product (SRDP). The user can set new parameters and try again.
An interface should exist (via dasha?) that summarizes the plots the user wants to see on screen. Vertically are the various plots the pipeline produces, horizontally are the different attempts to run the pipeline. For each pipeline run, the user can download the data.
The pipeline will look a little different depending on whether the observation was a grid (e.g. OTF) or a single pointing (e.g. SEQ-Ps or RSR). The former produces a data cube, the latter a single spectrum.
The user should not need to see that behind the scenes the data are organized as data[ntime,nbeam,npol,nband,nchan], but occasionally this will show up in reminders on how to average down the data where this could result in a higher Signal/Noise.
Gridding
For a typical OTF grid the individual spectra cannot all be inspected: with a 10Hz integration time there could be over half a million spectra! A waterfall image will give a useful overview: for each beam a time-frequency plot will easily reveal patterns, bad spectra, birdies, etc. A masking file will need to be used to mask out areas in the masking cube.
It will also be useful to inspect the RMS (the RMS value of a baseline fit per beam) as a function of time along the OTF track, either plotted as an image (in XPOS,YPOS space) or as a stacked scatter plot with RMS and TIME as variables.
Stacking
For a single pointing it will become important to inspect individual spectra. For example, for RSR with its typical 30 second integration time, there are 24 spectra per integration (4 spectra if you combine the 6 bands covering the full RSR spectral range).
Masking
A unified masking file format is being designed. Details are still being drafted in docs/masking.md, but here is a flavor of what is being considered:
time(12:05:10,12:30:05),chan(100,103)
beam(5,7),pol(XX)
select(TSYS, 250.0)
select(RMS, 3.0)
select(XPOS, 40.0, 50.0), select(YPOS, -30.0, -20.0)
beam(1),pol(0),band(3),chan(71,71.5,GHz)
user(rsr1, 1.0, 0.01)
Workflow
The UMass server has the data, and a web interface will run the new-style pipeline. Data can be inspected, new parameters can be set, and the data re-imaged. This is being worked on (Jan 2023).
The TolTECA data reduction workflow has a high level config file (yaml?) which steers the pipeline via a command line interface.