Spectral Line Workflow
======================

The current style data reduction workflow requires you to have your RAW files locally
present, in the $DATA_LMT tree.  To reduce data you will need to know a set of
**ObsNum**'s, which can be obtained by querying a database.

To run the pipeline, you process each obsnum individually, inspect the pipeline products
in the $WORK_LMT tree, and decide whether the data are good enough or need some extra
flagging/masking.  These flags are usually defined in a small set of ASCII files, after
which the pipeline can be re-run.  In case an observation was split into a series of
obsnums, you can combine them with the same script, viz.

.. code-block::

    SLpipeline.sh obsnum=12345
    SLpipeline.sh obsnum=23456
    SLpipeline.sh obsnums=12345,23456
    SLpipeline.sh obsnums=my.obsnums

In the last example the obsnums are listed, line by line, in a text file ``my.obsnums``.
The resulting data can then be found in $WORK_LMT/$ProjectId/$obsnum, or in the case of
a combination, $ProjectId/${on0}_${on1}.  Inside there will be a README.html summarizing
the Timely Analysis Products (TAP).

Each instrument (e.g. RSR, SEQ) has its own pipeline script.  For some instruments
(e.g. SEQ) a different script exists for each observing mode (OTF Mapping, Beam
Switching, Position Switching).

RSR pipeline
------------

For the RSR pipeline there are two scripts: ``rsr_pipeline.sh`` and ``rsr_combine.sh``.
These in turn use two different scripts, ``rsr_driver.sh`` and ``rsr_sum.sh``, which
reduce the raw data directly into an ASCII spectrum; there is no SpecFile created as
there is for WARES based data.  These two scripts have slightly different ways to mask
bad data, but should otherwise produce a very similar final spectrum.

1. ``rsr_driver`` uses the RFILE (a simple obsnum,chassis,board tuple to remove from the
   data) and the ``--exclude`` option to designate sections in frequency space to be
   removed from the baseline fitting.
2. ``rsr_sum`` uses the BLANKING file (a more detailed format to exclude certain chassis
   and board sections from inclusion).  A separate **windows[]** list is used to
   designate sections in frequency space for baselining.

The parameter files are:

1. ``lmtoy.${obsnum}.rc`` - general LMTOY parameter file
2. ``rsr.${obsnum}.badlags`` - entries of chassis,board,chan,obsnum,metric for badlag flagging
3. ``rsr.${obsnum}.blanking`` - entries of obsnum,chassis,board/freq ranges - for rsr_sum.py
4. ``rsr.${obsnum}.rfile`` - triples of obsnum,chassis,board - for rsr_driver.py

RSR Parameters
~~~~~~~~~~~~~~

Currently (to be) accepted by SLpipeline.sh when it is used for RSR; they are ignored if
you happen to use them for another instrument.

1. badboard= a comma separated list of boards, 0 being the first; board=4 is the highest
   frequency board
2. badcb= a comma separated list of *chassis/board* combinations, e.g. badcb=0/1,0/5,3/5
   is a common one
3. vlsr= not implemented yet
4. blo= baseline order fit; hardcoded at 0 at the moment, so not implemented yet

The **badboard** and **badcb** parameters are normally not needed: during the
**badlags** scanning a set of badcb combinations is identified if their RMS_diff falls
outside the 0.01-0.1 window.  These are currently heuristically determined, and after
inspection the user can still edit them.

Common parameters for SLpipeline:

1. obsnum=0 (or obsnums=0)
2. debug=0
3. pdir=""
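For example, a single RSR reduction with two bad chassis/board combinations flagged on
the command line could look like the following sketch (the obsnum is a placeholder):

.. code-block::

    SLpipeline.sh obsnum=12345 badcb=0/1,3/5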
WARES workflow
--------------

For WARES based instruments, such as Sequoia (SEQ), the scripts ``seq_reduce.sh`` and
``seq_combine.sh`` will reduce single and combined OBSNUM's.  They always work through
an intermediate SpecFile (equivalent to our future SDFITS file), which is then gridded
into a FITS cube.

The parameter files are:

1. ``lmtoy_${obsnum}.rc`` - general LMTOY parameter file

Many more details of the old workflow are in ``examples/lmtoy_reduce.md``

Getting at the RAW data
-----------------------

Here are some examples to get raw (netCDF) LMT data and reduce them in your own local
workspace.  This assumes the LMTOY toolkit has been installed:

1. Find out which **obsnum** covers your observation(s).  The first example below shows
   all the data observed on a given date, the second shows all NGC5194 data in 2020, and
   the third shows all data for the project 2021-S1-MX-3.

   .. code-block::

       % lmtinfo.py grep 2021-12-17
       % lmtinfo.py grep 2020 NGC5194
       % lmtinfo.py grep 2021-S1-MX-3

   Depending on the observing procedure, you may also need to know the calibration
   ObsNum's, referred to as the **CalObsNum**, and usually you do.  There are some
   existing databases and logfiles where a simple **grep** command will probably be
   sufficient to get the obsnums.  The **lmtinfo.py data** command can also be used.
   If you have the **Scans.csv** database, this might be faster.  If you see a log file
   in the $DATA_LMT directory, this might be another place where a record of all
   ObsNum's exists.  As they say, YMMV.

   .. note::
      We need a unified procedure for finding your obsnum - this is all rather haphazard.

2. Get the data from a full $DATA_LMT archive (usually "unity") via the **rsync_lma**
   script.  Obviously only somebody on that archive machine can do this, but this is
   the easiest way.  Here is an example of several methods:

   .. code-block::

       # gather the LMT data on the archive machine
       cln% lmtar /tmp/irc.tar 79447 79448

       # pick one of these two to copy, and don't forget to remove your large /tmp files!
       cln% scp irc.tar lma:/tmp
       lma% scp cln:/tmp/irc.tar /tmp

       # reduce the data on your favorite workstation
       lma% tar -C $DATA_LMT -xf /tmp/irc.tar
       lma% SLpipeline.sh obsnum=79448
       Processing SEQ in 2018S1SEQUOIACommissioning/79448 for IRC+10216

       # view the pipeline results
       lma% xdg-open 2018S1SEQUOIACommissioning/79448/

   This opens the directory in your favorite file browser, where you can inspect
   figures.  There will also be two ADMIT directories, each with an **index.html** that
   can be inspected the ADMIT way (or any other way).

   An alternative would be a direct rsync connection between e.g. cln and lma:

   .. code-block::

       cln% cd $DATA_LMT
       cln% rsync -avR `lmtar.py 79447 79448` lma:/lma1/lmt/data_lmt

   for which we have a script, which also works from any directory:

   .. code-block::

       cln% rsync_lma 79448

   Note that this script only needs the main (Map) obsnum; the calibration (Cal) obsnum
   is automatically included.

3. To re-run: edit the settings in ``2018S1SEQUOIACommissioning/79448/lmtoy_79448.rc``,
   and re-run:

   .. code-block::

       lma% SLpipeline.sh obsnum=79448
       Re-Processing SEQ in 2018S1SEQUOIACommissioning/79448 for IRC+10216
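A minimal sketch of this edit-and-rerun cycle, using whatever editor you prefer (the rc
file path is taken from the example above):

.. code-block::

    # inspect the current pipeline settings for this obsnum
    lma% cat 2018S1SEQUOIACommissioning/79448/lmtoy_79448.rc
    # change settings with your editor of choice, then re-run the pipeline
    lma% $EDITOR 2018S1SEQUOIACommissioning/79448/lmtoy_79448.rc
    lma% SLpipeline.sh obsnum=79448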
Parallel Processing: parallel
-----------------------------

Although the SLpipeline consists of single processor code, this is sufficient for a
single ObsNum.  However, to stack a large number of ObsNum's it can be useful to run a
whole data-reduction session using GNU parallel, since the pipelines are independent.
Here is an example, first as serial code for the M31 project, where three different
correlator settings cover three spectral lines:

.. code-block::

    # CO
    SLpipeline.sh obsnum=85776
    SLpipeline.sh obsnum=85778
    SLpipeline.sh obsnum=85824
    SLpipeline.sh obsnums=85776,85778,85824

    # XXX
    SLpipeline.sh obsnum=85818
    SLpipeline.sh obsnum=85826
    SLpipeline.sh obsnum=85882
    SLpipeline.sh obsnums=85818,85826,85882

    # YYY
    SLpipeline.sh obsnum=85820
    SLpipeline.sh obsnum=85878
    SLpipeline.sh obsnums=85820,85878

This took about 29 minutes to reduce.  Now we can split this up by first running all
eight single obsnum's in parallel, followed by the three combinations in parallel, viz.

.. code-block::

    # construct the single obsnum pipelines job
    echo SLpipeline.sh obsnum=85776  > job1
    echo SLpipeline.sh obsnum=85778 >> job1
    echo SLpipeline.sh obsnum=85824 >> job1
    echo SLpipeline.sh obsnum=85818 >> job1
    echo SLpipeline.sh obsnum=85826 >> job1
    echo SLpipeline.sh obsnum=85882 >> job1
    echo SLpipeline.sh obsnum=85820 >> job1
    echo SLpipeline.sh obsnum=85878 >> job1

    # construct the combination pipelines job
    echo SLpipeline.sh obsnums=85776,85778,85824  > job2
    echo SLpipeline.sh obsnums=85818,85826,85882 >> job2
    echo SLpipeline.sh obsnums=85820,85878       >> job2

    # provided you have enough true cores and memory, these can be run in two steps:
    parallel --jobs 8 < job1
    parallel --jobs 3 < job2

Using this technique, the same process took 6 minutes on a 512GB machine with 32 true
cores, a speedup of almost a factor 5.

Parallel Processing: slurm
--------------------------

To run on a slurm based cluster, we have written a simple frontend where you can almost
copy the commands from the previous section, except that you prefix them with
**sbatch_lmtoy.sh**, e.g.

.. code-block::

    # submit the single obsnum jobs
    sbatch_lmtoy.sh SLpipeline.sh obsnum=85776
    sbatch_lmtoy.sh SLpipeline.sh obsnum=85778
    sbatch_lmtoy.sh SLpipeline.sh obsnum=85824
    sbatch_lmtoy.sh SLpipeline.sh obsnum=85818
    sbatch_lmtoy.sh SLpipeline.sh obsnum=85826
    sbatch_lmtoy.sh SLpipeline.sh obsnum=85882
    sbatch_lmtoy.sh SLpipeline.sh obsnum=85820
    sbatch_lmtoy.sh SLpipeline.sh obsnum=85878

Now you have to wait until all of these are finished before the second batch can do the
combinations:

.. code-block::

    # submit the combination jobs
    sbatch_lmtoy.sh SLpipeline.sh obsnums=85776,85778,85824
    sbatch_lmtoy.sh SLpipeline.sh obsnums=85818,85826,85882
    sbatch_lmtoy.sh SLpipeline.sh obsnums=85820,85878

Another option is to place these commands in a text file, exactly as was done for GNU
parallel, and submit those:

.. code-block::

    sbatch_lmtoy.sh job1
    # watch and wait until job1 is done
    squeue -u lmtslr_umass_edu
    # when done, submit the next one
    sbatch_lmtoy.sh job2

Interactive work is discouraged, but sometimes unavoidable.  Here is the recommended
command:

.. code-block::

    srun -n 1 -c 4 --mem=16G -p toltec-cpu --x11 --pty bash

Adjust memory and cores as needed.

Web server
----------

The PI will need a password to access their ProjectId.  It will be at something like

.. code-block::

    http://taps.lmtgtm.org/lmtslr/2021-S1-MX-3/

within which various **obsnum**'s will be visible, and possibly different sources and/or
combinations of obsnums:

.. code-block::

    85776/                  # individual obsnum pipeline reduced
    85778/
    85824/
    85776_85824/            # combining the 3 previous obsnums

    85776_TAP.tar           # TAP tar files for better (?) offline browsing
    85778_TAP.tar
    85824_TAP.tar

    85776_SRDP.tar          # full SRDP tar files for better (?) offline browsing
    85778_SRDP.tar
    85824_SRDP.tar
    85776_85824_SRDP.tar

    85776_RAW.tar           # full RAW telescope data for your local $DATA_LMT tree
    85778_RAW.tar           # only useful if you want to re-run the pipeline
    85824_RAW.tar           # and only made available upon special request
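If you requested the RAW tar files, they can be unpacked into your local $DATA_LMT tree
in the same way as the lmtar example earlier (a sketch, assuming the tar file was
downloaded to the current directory and is packed relative to $DATA_LMT):

.. code-block::

    # unpack the downloaded RAW telescope data into the local $DATA_LMT tree
    tar -C $DATA_LMT -xf 85776_RAW.tar
    # after which the pipeline can be re-run locally
    SLpipeline.sh obsnum=85776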
Future LMT SLR data reduction?
------------------------------

Here we describe the workflow in the future unified SDFITS based system.  The first step
is always the RAW (lmtslr or dreampy3) based conversion (*ingestion*) to SDFITS.  If you
are in an interactive python session the data will be in memory in a special class, so
there is no formal reason to save the SDFITS file (formerly called the *SpecFile* in
lmtslr), but one is well advised to do this.

Load and Go
~~~~~~~~~~~

The initial workflow is *load-and-go* based.  A number of parameters are set, and a
series of plots can be reviewed, including having access to the final Science Ready Data
Product (SRDP).  The user can set new parameters and try again.

An interface should exist (via dasha?) that summarizes the plots the user wants to see
on screen.  Vertically are the various plots the pipeline produces, horizontally are the
different attempts to run the pipeline.  For each pipeline run the user can download the
data.

The pipeline will look a little different depending on whether the observation was a
grid (e.g. OTF) or a single pointing (e.g. SEQ-Ps or RSR).  The former produces a data
cube, the latter a single spectrum.  The user should not need to see the
``data[ntime,nbeam,npol,nband,nchan]`` structure we use behind the scenes, but
occasionally it will show up in reminders of how to average down the data where this
could result in a higher signal/noise.

Gridding
~~~~~~~~

For a typical OTF grid individual spectra cannot be inspected: with a 10Hz integration
time there could be over half a million spectra!  A waterfall image will give a useful
overview: for each beam a time-frequency plot will easily reveal patterns, bad spectra,
birdies, etc.  A masking file will need to be used to mask out areas in the masking
cube.  It will also be useful to inspect the RMS (RMS value of a baseline fit per beam)
as a function of time along the OTF track, either plotted as an image (in XPOS,YPOS
space) or as a stacked scatter plot with RMS and TIME as variables.

Stacking
~~~~~~~~

For a single pointing it will become important to inspect individual spectra.  For
example, for RSR with its typical 30 second integration time, there are 24 spectra per
integration (4 spectra if you combine the 6 bands into the full RSR spectral range).

Masking
~~~~~~~

A unified masking file format is being designed.  Details are still being drafted in
docs/masking.md, but here is a flavor of what is being considered:

.. code-block::

    time(12:05:10,12:30:05),chan(100,103)
    beam(5,7),pol(XX)
    select(TSYS, 250.0)
    select(RMS, 3.0)
    select(XPOS, 40.0, 50.0), select(YPOS, -30.0, -20.0)
    beam(1),pol(0),band(3),chan(71,71.5,GHz)
    user(rsr1, 1.0, 0.01)

Workflow
~~~~~~~~

The UMass server has the data, and a web interface will run the new-style pipeline.
Data can be inspected, new parameters can be set, and the data re-imaged.  This is being
worked on (Jan 2023).  The TolTECA data reduction workflow has a high level config file
(yaml?) which steers the pipeline via a command line interface.