.. role:: raw-math(raw)
   :format: latex html

ADMIT Design Overview
=====================

Introduction
------------
.. _Intro:

The ALMA Data Mining Toolkit (ADMIT) is a value-added software package which
integrates with the ALMA archive and CASA to provide scientists with quick
access to traditional science data products such as moment maps, and with new
innovative tools for exploring data cubes and their many derived products.
The goals of the package are to (1) make the scientific value of ALMA data
more immediate to all users, (2) create an analysis infrastructure that
allows users to build new tools, (3) provide new types of tools for mining
the science in ALMA data, (4) increase the scientific value of the rich data
archive that ALMA is creating, and (5) re-execute and explore the robustness
of the initial pipeline results.

For each ALMA science project a set of science quality image cubes exists.
ADMIT runs a series of ADMIT Tasks (ATs), which are essentially beefed-up
CASA tools/tasks, and produces a set of Basic Data Products (BDPs). ADMIT
provides a wrapper around these tasks for use within the ALMA pipeline, and
maintains a persistent state so that the end user can re-execute them later
in ADMIT's own pipeline. ADMIT products are contained in a compressed archive
file `admit.zip`, in parallel with the existing ALMA Data Products (ADP).
Examples of ADP in a project are the Raw Visibility Data (in ASDM format) and
the Science Data Cubes (in FITS format) for each target (source) and each
band.

Once users have downloaded the ADMIT files, they can preview the work the
ALMA pipeline has created, without the immediate need to download the
generally much larger ADP. They will also be able to re-run selected portions
of the ADMIT pipeline from either the (casapy) command line or a GUI, and
compare and improve upon the pipeline-produced results. For this, some of the
ADPs may be needed.

ADMIT Tasks
-----------
.. _ATs:

An ADMIT Task (AT) is made up of zero or more CASA tasks. It takes as input
zero or more BDPs and produces one or more output BDPs. These BDPs do not
have to be of the same type. An AT also has input parameters that control its
detailed functionality. These parameters may map directly to CASA task
parameters or may be peculiar to the AT. On disk, ATs are stored as XML. In
memory, each is stored in a specific class representation derived from the
ADMIT Task base class. A user may have multiple instantiations of any given
AT type active in an ADMIT session, avoiding the need to save and recall
parameters for different invocations of a task. The Flow Manager keeps track
of the connection order of any set of ATs (see `Workflow Management`_). AT
parameters are validated only at runtime; there is no validation on export to
or import from XML. The representations of all ATs are stored in a single XML
file (admit.xml).

Basic Data Products
-------------------
.. _BDPs:

Basic Data Products are instantiated from external files or created as output
of an ADMIT Task. As with ATs, the external data format for BDPs is XML; in
memory, BDPs are stored in a specific class representation derived from the
BDP base class. One XML file contains one and only one BDP. When a BDP is
instantiated by parsing its associated XML file, the file first undergoes
validation against a document type definition (DTD). Validation against a DTD
will ensure that the XML nodes required to fully instantiate the BDP are
present.

Since BDP types inherit from a base class (see Figure 4), there is a BDP.dtd
which defines the mandatory parameters for all BDPs. For each BDP type, there
is also a DTD that defines parameters specific to that BDP type. Individual
BDP types are described in :ref:`BDP-Designs`. The DTDs for each BDP type
will have been autogenerated directly from the Python class definition.
Furthermore, the DTD is included in the XML file itself so that the BDP
definition is completely self-contained. This ensures integrity against
future changes to a BDP definition (i.e., versioning). To allow for
flexibility by end users (aka "tinkering"), extraneous nodes not captured in
the DTD will be ignored, but will not cause invalidation. Validation against
the DTD happens for free with the use of standard XML I/O Python libraries.
Data validation occurs on both write and read.

BDPs are instantiated by the ATs responsible for creating them. Users will
rarely have need to do so themselves. In an AT class, a BDP instantiation
from XML may look like:

.. code:: python

   import parser
   b = parser.Parser("myfile.xml")

`Parser` is a class that uses the SAX library to interpret the XML contents.
Inside the XML is a `BDPType` node that indicates the BDP derived type stored
in the XML file, which will be returned by the call. This is essentially the
factory design pattern. The XML parser converts a member datum to its
appropriate type, then assigns the datum to a variable of the same name in
the class. For example:

.. code:: xml

   <noise>3.255</noise>

is converted to:

.. code:: python

   BDP.noise = 3.255

In serializing the BDP, we use Python introspection to determine the variable
names and data types and write them out to XML, essentially reversing the
process. Individual BDP types are described in :ref:`BDP-Designs`.
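As a concrete illustration of this factory pattern, here is a minimal sketch
using Python's standard ElementTree instead of SAX; the registry, class and
node names are illustrative assumptions, not the actual ADMIT API:

.. code:: python

   import xml.etree.ElementTree as ET

   # Hypothetical registry mapping BDPType names to BDP classes.
   BDP_REGISTRY = {}

   def register(cls):
       """Class decorator adding a BDP class to the factory registry."""
       BDP_REGISTRY[cls.__name__] = cls
       return cls

   @register
   class CubeStats_BDP(object):
       noise = 0.0   # typed defaults drive the XML-to-Python conversion

   def parse(filename):
       """Instantiate the BDP type named in the file's BDPType node."""
       root = ET.parse(filename).getroot()
       cls = BDP_REGISTRY[root.findtext("BDPType")]
       bdp = cls()
       for node in root:
           if hasattr(bdp, node.tag):            # ignore extraneous nodes
               default = getattr(bdp, node.tag)
               setattr(bdp, node.tag, type(default)(node.text))
       return bdp

Serialization would walk the instance attributes in the opposite direction,
writing one XML node per member datum.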
Workflow Management
-------------------
.. _Workflow:

We take a task-centric view of workflow, in which an ADMIT Task (AT) is an
arbitrary M-to-N mapping of BDPs, possibly with additional internal
attributes and methods. BDPs are passive data containers, without parents or
children, and are owned by the AT which produces them. Hence, ATs also
function as BDP containers. The Flow Manager (FM) maintains a full list of
ATs and how they are connected, allowing it to keep all BDPs up to date as
ATs are modified.

Conceptually, a connection maps the output(s) of one AT to the input(s) of
another AT. The Flow Manager creates an overall connection map, where a
single connection is specified by a six-element tuple of integer indices:

.. math:: (project_{out}, AT_{out}, BDP_{out}, project_{in}, AT_{in}, BDP_{in})

In most cases, the input project and output project will be the same, but in
the case of multiflows they can differ. Figure 1 illustrates the connection
map concept, while Figures 2 and 3 give a realistic example workflow and the
resulting connection map.

.. figure:: images/connection.png
   :height: 283 px
   :width: 443 px
   :scale: 110 %
   :alt: Example Flow.
   :align: center

   **Figure 1:** The connection in this diagram connects two ATs, `a1` and
   `a2`, inside a single project `p0`. The output BDP of `a1` is an input BDP
   of `a2`, i.e., :math:`b1 \equiv b2`. A given output BDP may be the input
   to an arbitrary number of ATs, but can be the output of one and only one
   AT. For virtual projects, the first and fourth indices in the tuple, `i1`
   and `i2`, would differ.
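A sketch of how such a connection map might be held in memory (plain Python
with hypothetical index values; the actual Flow Manager interface may
differ):

.. code:: python

   # A connection is a 6-tuple of integer indices:
   # (project_out, at_out, bdp_out, project_in, at_in, bdp_in)
   connection_map = [
       (0, 1, 0, 0, 2, 0),   # in p0: output BDP 0 of a1 feeds input BDP 0 of a2
       (0, 1, 0, 0, 3, 0),   # the same output BDP may feed several ATs
   ]

   def downstream(project, at, conns):
       """ATs that consume any output of the given AT, and therefore
       must be re-executed when it changes."""
       return {(p_in, a_in) for (p_out, a_out, _, p_in, a_in, _) in conns
               if (p_out, a_out) == (project, at)}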
.. figure:: images/flow6-AT.png
   :height: 685 px
   :width: 739 px
   :scale: 100 %
   :alt: ADMIT Task connection example
   :align: center

   **Figure 2:** Example realistic workflow with task connection map. A FITS
   cube is ingested (a1) and processed through a number of tasks: summary,
   statistics, spectral line cut out, moment. Integers along the arrows
   indicate the output BDP index available to the next task in the map.

.. figure:: images/flow6-FM.png
   :height: 518 px
   :width: 466 px
   :scale: 120 %
   :alt: ADMIT Task connection example
   :align: center

   **Figure 3:** The Flow Manager connection map and dependencies for the
   workflow in Figure 2. The tuples in the connection map give the project,
   ADMIT task, and BDP output/input indices.

The Flow Manager also computes the dependency list of ATs, so that a change
in one AT will trigger re-execution of the ATs that depend on it, in order to
recompute their BDPs.

Architecture Overview
---------------------
.. _Architecture:

Figure 4 is a schematic of the overall interaction of ADMIT modules. Their
functionality is described below.

.. figure:: images/AdmitArchitecture.png
   :height: 815 px
   :width: 986 px
   :scale: 100 %
   :alt: Schematic of how ADMIT fits together.
   :align: center

   **Figure 4:** Schematic of how ADMIT fits together.

BDP Infrastructure
~~~~~~~~~~~~~~~~~~

The BDP Base Class contains methods and member variables common to all BDPs.
Individual BDPs derive from it and add their own special features. BDP I/O is
built into the base class through Python XML libraries.

Task Infrastructure
~~~~~~~~~~~~~~~~~~~

*ADMIT Task Classes* An ADMIT Task Base Class allows one or more CASA tasks
to be encapsulated within one ADMIT Task. The base class should indicate
required methods and member variables such that a user can write her own AT.
Individual ADMIT Tasks derive from the ADMIT Task Base Class.

*Task Test Suites* Each Task must have at least a unit test and an
integration test.

*CASA Task library* From which ADMIT tasks may be built.

Utilities
~~~~~~~~~

This ADMIT Function Library includes useful generic classes to handle tables,
plotting, and mathematical functions. Utilities will expand as needed.

Toolkit Infrastructure
~~~~~~~~~~~~~~~~~~~~~~

*Data Discovery/Export/Import* This is the module that examines the working
directory recursively for FITS files and ADMIT products. It should be able to
recognize vanilla, ALMA-style, and VLA-style directory trees and branch
accordingly. It will also handle the case of giving a self-contained subset
of ADMIT products to a collaborator. It would write a single compressed file
containing the ADMIT products in the appropriate hierarchy.

*ADMIT Object* From the results of data discovery, an ADMIT object is
instantiated in memory and the initial workflow for an ADMIT run (or re-run)
is set up. This initialization includes instantiation of any discovered ATs
and BDPs. One ALMA project maps to one ADMIT object.

*Memory Model* How the ADMIT object and state are kept up to date. This is a
combination of the ADMIT Object(s) and the Flow Manager state. The memory
model has a one-to-one mapping to what's on disk.

*Flow Manager* This is the infrastructure that strings Tasks together,
manages the connection map, decides if an AT and/or its BDPs are out of date
and the AT needs to be re-run, and manages task branches.

*Project Manager* The PM is a container for one or more ADMIT objects, also
referred to as a multiflow. The PM allows for data mining across different
ALMA projects.
User Interfaces
~~~~~~~~~~~~~~~
.. _UIs:

*casapy* For scripting or direct AT invocation, CASA will be the supported
environment. ADMIT-specific commands are enabled by :code:`import admit`.
There is also the :code:`#!/usr/bin/env casarun` environment.

*Graphical User Interface* The GUI consists of two independent parts: 1) the
BDP viewer, which gives a data summary, and 2) the Workflow viewer, which
shows the connections and state of ATs in the FlowManager, and allows simple
modification of ATs in the flow. These are described in detail in sections
:ref:`BDP-viewer-Design` and :ref:`Workflow-viewer-Design`.

Base Classes
~~~~~~~~~~~~

.. _AT-base-Design:

AT Base Class
~~~~~~~~~~~~~
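A minimal sketch of the interface such a base class might expose; the method
and attribute names below are illustrative assumptions, not the settled API:

.. code:: python

   class AT(object):
       """Base class for ADMIT Tasks (illustrative sketch only)."""

       def __init__(self, **keywords):
           self._keyval = dict(keywords)  # task parameters, validated at runtime
           self._bdp_in = []              # input BDPs, connected by the Flow Manager
           self._bdp_out = []             # output BDPs, owned by this AT
           self._stale = True             # does this AT need to be (re)run?

       def setkey(self, key, value):
           """Change a parameter and mark the AT (and its dependents) stale."""
           self._keyval[key] = value
           self._stale = True

       def run(self):
           """Derived ATs implement the actual work, wrapping CASA tasks."""
           raise NotImplementedError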
.. _BDP-base-Design:

BDP Base Class
~~~~~~~~~~~~~~

.. _Table-base-Design:

Table Base Class
~~~~~~~~~~~~~~~~

*Description* The Table class is a container for holding data in tabular
format. The table can hold data in column-row format and also in
plane-column-row format. Data can be added at instantiation, by columns, by
rows, or by entire planes.

*Use Case* This class is used to store any tabular type of data. It should
not be used directly but through the :ref:`Table-BDP-Design` class.

.. _Image-base-Design:

Image Base Class
~~~~~~~~~~~~~~~~

*Description*

*Use Case*

.. _Line-base-Design:

Line Base Class
~~~~~~~~~~~~~~~

*Description*

*Use Case*

Individual AT Designs
---------------------

.. _File-AT-Design:

File_AT
~~~~~~~

*Description* File_AT creates a File_BDP that contains a reference to an
arbitrary file. It has no real use (yet), except for bootstrapping a Flow AT
series, where it can be used to test the (large scale) system performance.
See also Ingest_AT for a real task that bootstraps a flow.

*Use Case* Currently only used by Flow ATs for testing purposes, and thus
useful to bootstrap a flow, since it can start from an empty directory and
create a zero-length file. This means you can create realistic flows, in the
same sense as full science flows, and measure the overhead of ADMIT and the
FlowManager.

*Input BDPs* There are no input BDPs for this bootstrap AT. A filename is
required to create an output BDP, using an input keyword (see below).

*Input Keywords*

* **file** - The filename of the object. The file does not have to exist
  yet. There is no default.
* **touch** - Update the timestamp (or create a zero-length file if the file
  did not exist yet)? Default: False.

*Output BDPs*

* **File_BDP** - containing the string pointing to the file, following the
  same convention as the other file containers (Image_BDP, Table_BDP), but
  without their overhead. There is no formal requirement that this be a
  relative or absolute address. A symlink is allowed where the OS allows
  this; a zero-length file is also allowed.

*Procedure* Simple Unix tools, such as **touch**, are used. No file
existence test is done. There are no CASA dependencies, and thus no CASA
tasks are used.

*CASA tasks used* none

.. _Flow-AT-Design:

Flow_AT
~~~~~~~

*Description* There are a few simple FlowXY_AT's implemented for
experimenting with creating a flow, without the need for CASA and optionally
without actual files. This *Flow* series, together with File_AT, performs
nothing more than converting (one or more) dummy input BDP(s) into (one or
more) output BDP(s). Optionally the file(s) associated with these BDPs can be
created as zero-length files. Currently three *Flow* ATs are implemented. A
simple Flow11_AT can be used for converting a single File_BDP. Two variadic
versions are available: Flow1N_AT with 1 input and N outputs, and FlowN1_AT
with N inputs and 1 output.

* **Flow11_AT** - one in, one out
* **Flow1N_AT** - one in, maNy out
* **FlowN1_AT** - maNy in, one out
* **FlowMN_AT** - Many in, maNy out

*Use Case* Useful to benchmark the basic ADMIT infrastructure cost of a
(complex) flow, as shown in the sketch after this section.

*Input BDPs*

* **File_BDP** - containing simply the string pointing to a file. This file
  does not have to exist. FlowN1_AT can handle multiple input BDPs.

*Input Keywords* Depending on which of the *FlowXY_AT* you have, the
following keywords are present:

* **file** - The filename of the output object if there is only one output
  BDP (Flow11_AT and FlowN1_AT). For Flow1N_AT the filename is inherited by
  adding an index 1..N to the input filename, so this keyword is absent in
  Flow1N_AT.
* **n** - Normally variadic input or output can be determined otherwise
  (e.g., it generally depends in a complex way on user parameters), but for
  the Flow series it has to be set manually. Only allowed for Flow1N_AT,
  where the **file** keyword is not used. Default: 2.
* **touch** - Update the timestamp (or create a zero-length file if the
  filename did not exist)? Default: False.
* **exist** - Check existence of the input BDP file(s). Default: True.

*Output BDPs*

* **File_BDP** - containing simply the string pointing to the file,
  following the same convention as the other file containers (Image_BDP,
  Table_BDP), but without their overhead. There is no formal requirement
  that this be a relative or absolute address. A symlink is allowed where
  the OS allows this; a zero-length file is also allowed.

*Procedure* Only Unix tools, such as **touch**, are used. There are no CASA
dependencies, and thus no CASA tasks are used.

*CASA tasks used* None
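A sketch of how a dummy flow might be assembled from these pieces; the
`admit.Admit` and `addtask` names are assumptions for illustration, not the
settled API:

.. code:: python

   import admit  # hypothetical top-level ADMIT interface

   a = admit.Admit()                      # empty project in the current directory
   t1 = a.addtask(admit.File_AT(file="test0.dat", touch=True))
   t2 = a.addtask(admit.Flow11_AT(file="test1.dat"),
                  [(t1, 0)])              # consume output BDP 0 of t1
   a.run()                                # FlowManager executes stale tasks in order

Timing such a flow with and without real files would measure the pure ADMIT
and FlowManager overhead.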
.. _Ingest-AT-Design:

Ingest_AT
~~~~~~~~~

*Description* Ingest_AT converts a FITS file into a CASA Image. It is
possible to instead keep this a symlink to the FITS file, as some CASA tasks
are able to deal with FITS files directly, potentially cutting down on the
I/O overhead. During the conversion to a CASA Image a few extra processing
steps are possible, e.g., primary beam correction and taking a subcube
instead of the full cube. Some of these steps may add I/O overhead.

*Use Case* Normally this is the first step that any ADMIT needs to do to set
up a flow for a project, and this is done by ingesting a FITS file into a
CASA Image. The SpwCube_BDP contains merely the filename of this CASA Image.

*Input BDPs* None. This is an exceptional case (for bootstrapping) where a
(FITS) filename is turned into a BDP.

*Input Keywords*

* **file** - The filename of the FITS file. No default; this is the only
  required keyword. Normally with extension **.fits**; we will refer to this
  as **basename.fits**. *We leave the issue of absolute/relative/symlinked
  names up to the caller. We will allow this to be a CASA image as well.*
* **pbcorr** - Apply a primary beam correction to the input FITS file? If
  true, in CASA convention, a **basename.flux.fits** needs to be present.
  Normally, single pointings from ALMA are not primary beam corrected, but
  mosaiced pointings are, and thus would need such a correction to make a
  noise-flat image. *We should probably allow a way to set another
  filename.*
* **region** - Select a subcube for conversion, using the CASA region
  notation. By default the whole cube is converted. See also **edge=**
  below.
* **box** - Instead of the more complex **region** selection described
  above, a much simpler *blc,trc* type description is more effective.
  *Discussion about region= vs. box=, and is edge= still needed?*
* **edge** - Edge channel rejection. Two numbers can be given, removing that
  many channels from the beginning and end of the selected range. Default:
  0,0. *This keyword could interfere with region=, but is likely the most
  common one used.*
* **mask** - If True, a mask also needs to be created. *Need some discussion
  on the background of this curious option; importfits also has a somewhat
  related zeroblanks= keyword.*
* **symlink** - True/False: if True, a symlink is kept to the FITS file,
  instead of conversion to a CASA Image. This may be used in those rare
  cases where your whole flow can work with the FITS file, without the need
  to convert to a CASA Image. Setting it to True also disables all other
  processing (mask/region/pbcorr). This keyword should be used with caution,
  because existing flows will then break if another task is added that
  cannot use a FITS file. [False] *Theoretically File_AT can also be used to
  create a File_BDP with a symlinked FITS file, but will subsequent ATs
  recognize this underloaded BDP?*
* **type** - If you ingest an *alien* object of which the origin is not
  clear, there must be a way to set the type of image (line, continuum,
  moment, alpha, ...).

*Output BDPs*

* **SpwCube_BDP** - the spectral window cube. The default filename for this
  CASA Image is *cim*. There are no associated graphical cues for this BDP.
  Subsequent steps, such as :ref:`CubeSpectrum-AT-Design` and
  :ref:`CubeStats-AT-Design`, operate on this cube. In the case that the
  input file was already a CASA image, and no subselection or PB correction
  was done, this can be a symlink to the input filename.

*Procedure* Ignoring the symlink option (which has limited use in a full
flow), CASA's importfits will first need to create a full copy of the FITS
file as a CASA Image, since there are no options to process a subregion. In
the case where multiple CASA programs process the data (e.g., region
selection plus primary beam correction), it may be useful to check the
performance difference between using tools and tasks.

Although using **edge=** can be a useful operation to cut down the filesize a
little bit, the catch-22 here is that - unless the user knows - the bad
channels are not known until CubeStats_AT has been run, which is the next
step after Ingest_AT. Potentially one can re-run Ingest_AT.

CAVEAT: Using FITS files, instead of CASA images, actually adds system
overhead each time an image has to be read. The time saved by skipping
*importfits* is quickly lost when your flow contains a number of CASA tasks.

*CASA tasks used*

* **importfits** - if just a conversion is done. No region selection can be
  done here, only some masking operation to replace masked values with 0s.
* **imsubimage** - only if region processing is done. Note that box=,
  region= and chans= have overlapping functionality; our Ingest_AT should
  keep that simpler. For example, chans="11~89" for a 100-channel cube would
  be the same as edge=11,10.
* **immath** - if a primary beam correction is needed, to get noise-flat
  images.

Alternatively, ATs can also use the CASA tools. This can be a more efficient
way to chain a number of small image operations, as Ingest_AT would do, for
example, if region selection and primary beam correction are both used.
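A sketch of the underlying calls for a conversion with channel selection and
primary beam correction, as run inside casapy; the filenames and the exact
expression are illustrative:

.. code:: python

   # Inside casapy, where the CASA tasks are available as globals.
   importfits(fitsimage="basename.fits", imagename="basename.im")

   # Optional region/channel selection (here: drop edge channels).
   imsubimage(imagename="basename.im", outfile="basename.sub", chans="11~89")

   # Optional primary beam correction: multiply by the primary beam
   # response (assumed to be in basename.flux, per the CASA convention
   # noted above) to recover a noise-flat image.
   immath(imagename=["basename.sub", "basename.flux"],
          expr="IM0*IM1", outfile="cim")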
.. _CubeSpectrum-AT-Design:

CubeSpectrum_AT
~~~~~~~~~~~~~~~

*Description* CubeSpectrum_AT will compute one (or more) typical spectra from
a spectral line cube. They are stored in a CubeSpectrum_BDP, which also
contains a graph of intensity versus frequency. For certain types of cubes,
these can be effectively used in LineID_AT to identify spectral lines.

The selection of which point will be used for the spectrum is of course
subject to user input, but some automated options can be given. For example,
the brightest point in the cube can be used, or the reference pixel. The size
of the region around this point can also be chosen, but this is not a
recommended procedure, as it can affect the LineID_AT procedure: it will
smooth out the spectrum (but increase the S/N) if large velocity gradients
are present. If more S/N is needed, :ref:`CubeStats-AT-Design` or
:ref:`PVCorr-AT-Design` can be used.

Note that in an earlier version, the SpectralMap_AT would allow multiple
points to be used. We deprecated this, in favor of allowing this AT to create
multiple points instead of just one. Having multiple spectra, much like what
is done in PVCorr_AT, cross-correlations could be made to increase the line
detection success rate. This AT used to be called PeakSpectrum in an earlier
revision.

*Use Case* This task is probably one of the first to run after ingestion, and
will quickly be able to produce a representative spectrum through the cube,
giving the user ideas how to process these data. If spectra are taken through
multiple points, a diagram can be combined with a CubeSum centered around a
few spectra.

*Input BDPs*

* **SpwCube_BDP** - The cube from which the spectra are drawn. Required.
  Special values **xpeak,ypeak** are taken from this cube.
* **CubeSum_BDP** - An optional BDP describing the image that represents the
  sum of all emission. The peak(s) in this map can be selected for the
  spectra drawn. This can also be a moment-0 map from a LineCube.
* **FeatureList_BDP** - An optional BDP describing features in an image or
  cube, from which the RA,DEC positions can be used to draw the spectra.

*Input Keywords*

* **points** - One (or more) points where the spectrum is computed. Special
  names are for special points: **xpeak,ypeak** use the position where the
  peak in the cube occurs; **xref,yref** are the reference position (CRPIX
  in FITS parlance).
* **size** - Size of the region to sample over. Pixels/Arcsec. Square/Round.
  By default a single pixel is used. No CASA regions are allowed here; keep
  it simple for now.
* **smooth** - Some smoothing option applied to the spectrum. Use with
  caution if you want to use this BDP for LineID.

*Output BDPs*

* **CubeSpectrum_BDP** - A table containing one or more spectra. For a
  single spectrum the intensity vs. frequency is graphically saved. For
  multiple spectra (the original intent of the deprecated SpectralMap_AT) it
  should combine the representation of a CubeSum_BDP with those of the
  spectra around it, with lines drawn to the points where the spectra were
  taken.

*Procedure* After selecting the point through which the spectrum is taken,
grabbing the values is straightforward. For example, the imval task in CASA
will do this.

*CASA tasks used*

* **imval** - to extract a spectrum around a given position
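A sketch of the extraction step for a single-pixel spectrum through the cube
*cim*; the dictionary keys follow imval's documented return value, but treat
the details as illustrative:

.. code:: python

   # Inside casapy: a single-pixel spectrum through (x, y) = (129, 129).
   r = imval(imagename="cim", box="129,129,129,129")
   spectrum = r["data"]   # intensity per channel along the spectral axis
   flags    = r["mask"]   # per-channel mask flags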
.. _CubeStats-AT-Design:

CubeStats_AT
~~~~~~~~~~~~

*Description* CubeStats_AT will compute per-channel statistics, and can help
visualize what spectral lines there are in a cube. They are stored in a
CubeStats_BDP. For certain types of cubes, these can be effectively used in
LineID_AT to identify spectral lines. In addition, tasks such as CubeSum_AT
and LineCube_AT can use this if the noise depends on the channel.

*Use Case* Many programs need to know (channel-dependent) statistics of a
cube in order to be able to clip out the noise and keep as much signal as
possible in the processing pipeline. In addition, the analysis to compute
channel-based statistics needs to include some robustness, to remove signal.
This will become progressively harder if there is a lot of signal in the
cube.

*Input BDPs*

* **SpwCube_BDP** - input spectral window cube, for example as created with
  Ingest_AT.

*Input Keywords*

* **robustness** - Several options can be given here, to select the robust
  RMS (negative half-gauss fit, robust, ...).

*Output BDPs*

* **CubeStats_BDP** - A table containing various quantities
  (min, max, median, rms) on a per-channel basis. Global statistics can
  (optionally?) also be recorded. The graphical output contains at least two
  plots: 1) a histogram of the distribution of Peak (P), Noise (N), and P/N,
  typically logarithmically; 2) a line diagram showing P, N and P/N (again
  logarithmically) as a function of frequency. As a good diagnostic, the P/N
  should not depend on frequency if only line-free channels are compared. It
  can also be a good input BDP to LineID_AT.

*Procedure* The imstat task (or the ia.statistics tool) in CASA can compute
plane-based statistics. Although the *medabsdevmed* output key is much better
than just looking at :math:`\sigma`, a signal-clipping robust option may soon
be available to select a more robust way to compute what we call the Noise
column in our BDP. Line detection (if this BDP is used) is all based on
log(P/N).

*CASA tasks used*

* **imstat** - extract the statistics per channel, and per cube. Note that
  the tool and the task have different capabilities; for us, we need the
  tool (ia.statistics). Various new robustness algorithms are now
  implemented using the new *algorithm=* keyword.
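A sketch of the per-channel computation with the image analysis tool,
converting the median absolute deviation into a robust noise estimate; the
1.4826 factor is the standard MAD-to-Gaussian-sigma conversion, and the call
details should be treated as illustrative:

.. code:: python

   # Inside casapy: per-plane statistics, collapsing the two sky axes.
   ia.open("cim")
   stats = ia.statistics(axes=[0, 1], robust=True)
   ia.close()

   peak  = stats["max"]                     # one value per channel
   noise = 1.4826 * stats["medabsdevmed"]   # robust per-channel sigma
   # Line detection would then operate on log10(peak / noise).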
.. _CubeSum-AT-Design:

CubeSum_AT
~~~~~~~~~~

*Description* CubeSum_AT tries to create a representative *moment-0* map of a
whole cube, by adding up all the emission, irrespective of which line it
belongs to.

*Use Case* A simple *moment-0* map early in a flow can be used to select a
good location for a slice through the spectral window cube. See PVSlice_AT.

*Input BDPs*

* **SpwCube** - input cube
* **CubeStats** - Peak/RMS. Optional. Useful to be more liberal and allow
  the cutoff to depend on the RMS, which may be channel dependent.

*Input keywords*

* **cutoff** - The cutoff above which emission is added up (or below which,
  in the case of absorption features).
* **absorption** - Identify absorption lines? True/False

*Output BDPs*

* **CubeSum_BDP** - An image

*Procedure* The method is a special case of Moment_AT, but we decided to keep
it separate. When the noise level across the band varies, the use of a
CubeStats_BDP is essential. Although it is optional here, it is highly
recommended.

*CASA Tasks Used*

.. _FeatureList-AT-Design:

FeatureList_AT
~~~~~~~~~~~~~~

*Description* FeatureList_AT takes a LineCube, and describes the
three-dimensional structure of the emission (and/or absorption).

.. note:: Should we allow 2D maps as well?

*Use Case*

*Input BDPs* Depending on which BDP(s) is/are given, and a choice of
keywords, line identification can take place. For example, one can use both a
CubeSpectrum and a CubeStats and use a conservative AND or a more liberal OR,
where either both or any have detected a line.

* **CubeSpectrum** - A spectrum through a selected point. For some cubes
  this is perfectly ok.
* **CubeStats** - Peak/RMS. Since this table analyzes each plane of a cube,
  it will more likely pick up weaker lines, which a CubeSpectrum will miss.
* **PVCorr** - an autocorrelation table from a PVSlice_AT. This has the
  potential of detecting even weaker lines, but its creation via PVSlice_AT
  is a finely tunable process.

.. note::

   * **SpectralMap** - This BDP may actually be absorbed into CubeSpectrum
     when there is more than one spectrum, where every spectrum is tied to a
     location and size over which the spectrum is computed.

*Input keywords*

* **vlsr** - If the VLSR is not known, it can be specified here. Currently
  it must be known, either via this keyword, or via ADMIT (header).

  .. note:: Is that in admit.xml? And what about VLSR vs. z?

* **pmin** - Minimum likelihood needed for line detection. This is a number
  between 0 and 1. [0.90]
* **minchan** - Minimum number of contiguous channels that need to contain a
  signal, to combine into a line. Note this means that each channel must
  exceed **pmin**. [5]

*Output BDPs*

* **LineList_BDP** - A single LineList is produced, which is typically used
  by LineCube_AT to cut a large spectral window cube into smaller LineCubes.

*Procedure* The line list database will be the splatalogue subset that is
already included with the CASA distribution. This one can be used offline,
e.g., via the slsearch() call in CASA. Currently astropy's
:code:`astroquery.splatalogue` module has a :code:`query_lines()` function to
query the full online web interface. Once LineID_AT is enabled to attempt
actual line identifications, one of those methods will be selected, together
with possible selections based on the astrophysical source. This will likely
need a few more keywords.

A few words about likelihoods. Each procedure that creates a table is either
supposed to deliver a probability, or LineID_AT must be able to compute it.
Let's say for a computed RMS noise :math:`\sigma`, a :math:`3\sigma` peak in
a particular channel would have a likelihood of 0.991 (or whatever value) for
CubeStats.

*CASA Tasks Used* There are currently no tasks in CASA that can handle this.
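As a minimal illustration of how a peak could be converted into such a
likelihood, assuming purely Gaussian noise (the actual conversion ADMIT
adopts may differ):

.. code:: python

   import math

   def peak_likelihood(peak, sigma):
       """One-sided probability that Gaussian noise stays below `peak`;
       e.g. about 0.9987 for a 3-sigma peak."""
       return 0.5 * (1.0 + math.erf(peak / (sigma * math.sqrt(2.0))))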
.. _LineID-AT-Design:

LineID_AT
~~~~~~~~~

*Description* LineID_AT creates a :ref:`LineList-BDP-Design`, a table of
spectral lines discovered or detected in a spectral window cube. It does not
need a spectral window cube for this; instead it uses derived products
(currently all tables) such as a :ref:`CubeSpectrum-BDP-Design`,
:ref:`CubeStats-BDP-Design` or :ref:`PVCorr-BDP-Design`.

It is possible to have the program determine the VLSR if this is not known or
specified (in the current version it needs to be known), but this procedure
is not well determined if only a few lines are present. Note that other
spectral windows may then be needed to add more lines to make this fit
unambiguous. Cubes are normally referenced by frequency, but if already by
velocity then the rest frequency and vlsr/z must be given. For milestone 2 we
will use the slsearch() CASA task; a more complete database tool is under
development in CASA and will be implemented here when complete.

*Use Case* In most cases the user will want to identify any spectral lines
found in their data cube. This task will determine what is/is not a line and
attempt to identify it from a catalog.

*Input BDPs*

* :ref:`CubeSpectrum-BDP-Design` - just a spectrum through a selected point.
* :ref:`CubeStats-BDP-Design` - Peak/RMS. Since this table analyses each
  plane of a cube, it will more likely pick up weaker lines, which a
  :ref:`CubeSpectrum-BDP-Design` will miss.
* :ref:`PVCorr-BDP-Design` - an autocorrelation table from a
  :ref:`PVSlice-AT-Design`. This has the potential of detecting even weaker
  lines, but its creation via :ref:`PVSlice-AT-Design` is a finely tunable
  process.

*Input Keywords*

* **vlsr** - If the VLSR is not known, it can be specified here. Currently
  it must be known, either via this keyword, or via ADMIT (header). This
  could also be z, but this could be problematic for nearby sources, or for
  sources moving toward us.
* **rfreq** - Rest frequency of the vlsr; units should be specified in CASA
  format (95.0GHz) [None]. Only required if the cube is specified in
  velocity rather than frequency.
* **pmin** - Minimum sigma level needed for detection [3], based on the
  calculated rms noise.
* **minchan** - Minimum number of contiguous channels that need to contain a
  signal, to combine into a line. Note this means that each channel must
  exceed **pmin**. [5]

*Output BDP* :ref:`LineList-BDP-Design` - A single LineList is produced,
which is typically used by :ref:`LineCube-AT-Design` to cut a large spectral
window cube into smaller LineCubes.

*Output Graphics* None.

*Procedure* Depending on which BDP(s) is/are given, and a choice of keywords,
the procedure may vary slightly. For example, one can use both a
:ref:`CubeSpectrum-BDP-Design` and a :ref:`CubeStats-BDP-Design` and use a
conservative AND or a more liberal OR, where either both or any have
"detected" a line.

The method employed to detect a line will be based on the line finding
mechanisms found in the pipeline. The method will be robust for spectra with
sparse lines, but may not be for spectra that are a forest of lines. Once
lines are found, the properties of each line will be determined (rest
frequency, width, peak intensity, etc.). Using these parameters (rest
frequency and width), the database will be searched to find any possible line
identifications.

*CASA Tasks Used*

* **slsearch** - to attempt to identify the spectral line(s)
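A sketch of what the catalog query for a single detected feature might look
like; the frequency window is illustrative, and slsearch's full parameter set
offers more filtering than shown here:

.. code:: python

   # Inside casapy: query the offline splatalogue subset around a
   # detected feature at ~115.27 GHz with an (illustrative) 50 MHz window.
   slsearch(outfile="line115.27.tbl",
            freqrange=[115.246, 115.296])   # GHz

The resulting table of candidate transitions would then be matched against
the measured rest frequency and width.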
.. _LineCube-AT-Design:

LineCube_AT
~~~~~~~~~~~

*Description* A LineCube is a small spectral cube cut from the full spectral
window cube, typically centered in frequency/velocity on a given spectral
line. LineCube_AT creates one (or more) of such cubes. It is done after line
identification (using a :ref:`LineList-BDP-Design`), so the appropriate
channel ranges are known. Optionally this AT can be used without the line
identification, in which case just channel ranges are given (and the line
will be designated as something like "U-115.27").

The line cubes are normally re-gridded onto a common velocity (km/s) scale,
for later comparison, and thus :ref:`CubeStats-AT-Design` will need to be
re-run on these cubes. Also, since the continuum will normally have been
subtracted, these should contain only the line emission, possibly with
contamination from other nearby lines. This is a big issue if there is a
forest of lines, and proper separation of such lines is a topic for the
future; a method may be documented here as well. One of the goals of
LineCube_AT is to create identically sized cubes of different molecular
transitions for easy comparison and cross-referenced analysis.

*Use Case* A LineCube would be used to isolate a single molecular component
in the frequency dimension. The produced subimage would be one of the natural
inputs for the :ref:`Moment-AT-Design` task.

*Input BDPs*

* :ref:`SpwCube-BDP-Design` - a (continuum subtracted) spectral window cube
* :ref:`LineList-BDP-Design` - a LineList, as created with
  :ref:`LineID-AT-Design`. This LineList does not need to have identified
  lines; a simple version of LineID_AT will just designate channel ranges as
  "U" lines.
* :ref:`CubeStats-BDP-Design` - (optional) CubeStats associated with the
  input spectral cube. This is useful if the RMS noise depends on the
  channel, which can happen for wider cubes, especially near the spectral
  window edges. Otherwise the keyword **cutoff** will suffice.

*Input Keywords*

* **regrid** (optional) - The velocity channel width to regrid the output
  cube to; units can be specified as for other CASA values (3.0kms), but
  default to km/s. [None] (i.e., no regridding)
* **cutoff** (optional) - The cutoff value, in sigma, to use when generating
  the subcube(s); all values below this will be masked. [5.0]

*Output BDP* :ref:`LineCube-BDP-Design` - The output from this AT will be a
line cube for each input spectral line. Each spectral cube will be centered
on the spectral line and regridded to a common velocity scale, if requested.

*Output Graphics* None

*Procedure* For each line found in the input :ref:`LineList-BDP-Design` this
AT will grab a subcube based on the input parameters.

*CASA Tasks Used*

* **imsubimage** - to extract the final subcube from the main cube,
  including any masking
* **imregrid** - if regridding in velocity is needed

.. _PVSlice-AT-Design:

PVSlice_AT
~~~~~~~~~~

*Description* PVSlice_AT creates a position-velocity slice through a spectral
window cube, which should be a representative version showing all the
spectral lines.

*Use Case* A position-velocity slice is a great way to visualize all the
emission in a spectral window cube, next to CubeSum.

*Input BDPs*

* :ref:`SpwCube-BDP-Design` - Spectral Window Cube to take the slice
  through. We normally mean this to be a Position-Position-Velocity (or
  Frequency) cube.
* **CubeSum** - Optional. One of CubeSum or PeakPointPlot can be used to
  estimate the best slit. It will use a moment of inertia analysis to
  decipher the best line in RA-DEC for the slice.
* **PeakPointPlot** - Optional.

*Input Keywords*

* **center** - Center and position angle of the line in RA/DEC, e.g.,
  129,129,30. Optional length? Else the full slice through the cube is
  taken. *NEED TO DESCRIBE HOW POSITION ANGLE IS DEFINED. EAST OF NORTH?*
* **line** - Begin and end points of the line in RA/DEC, e.g., 30,30,140,140.
  *WHAT ARE THE UNITS? COULD THIS BE DONE WITH A REGION KEYWORD?*
* **major** - Place the slit along the major axis? Optionally the minor axis
  slit can be taken. For rotating flows the major axis makes more sense; for
  outflows the minor axis makes more sense. Default: True.
* **width** - Width of the slit. By default sampling through the cube is
  done, which equals a zero thickness, but a finite number of pixels can be
  chosen as well, in which case the signal within that width is averaged.
  Default: 0.0.
* **minval** - Minimum intensity value below which no data from CubeSum or
  PeakPointPlot are used to determine the best slit.
* **gamma** - The factor by which intensities are weighted
  (intensity**gamma) to compute the moments of inertia from which the slit
  line is computed. Default: 1.

*Output BDP*

* **PVSlice_BDP** - the resulting position-velocity slice image.

*Procedure* Either a specific line is given manually, or it can be derived
from a reference map (or table). From this it computes an (intensity
weighted) moment of inertia, which then defines a major and minor axis.
Spatial sampling on output is the same as on input (this means that if there
ever is a map with unequal sampling in RA and DEC, there is an issue; WSRT
data?).

*CASA Tasks Used*

* **impv** - to extract the PV slice
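A sketch of the moment-of-inertia step on a 2-D reference map such as a
CubeSum image (plain numpy; the sign/orientation convention of the returned
angle is left open, matching the open question above):

.. code:: python

   import numpy as np

   def slit_pa(image, gamma=1.0, minval=0.0):
       """Position angle of the major axis from intensity-weighted
       moments of inertia of a 2-D map (a sketch of the idea)."""
       y, x = np.indices(image.shape)
       w = np.where(image > minval, image**gamma, 0.0)
       xm, ym = np.average(x, weights=w), np.average(y, weights=w)
       mxx = np.average((x - xm)**2, weights=w)
       myy = np.average((y - ym)**2, weights=w)
       mxy = np.average((x - xm) * (y - ym), weights=w)
       # Principal-axis angle of the second-moment tensor.
       return 0.5 * np.degrees(np.arctan2(2.0 * mxy, mxx - myy))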
.. _PVCorr-AT-Design:

PVCorr_AT
~~~~~~~~~

*Description* PVCorr_AT uses a position-velocity slice to correlate distinct
repeating structures in this slice along the velocity (frequency) direction,
to find emission or absorption lines. Caveat: if the object of interest has
very different types of emission regions, e.g., a nuclear and a disk
component which do not overlap, the detection may not work as well.

*Use Case* For weak lines (NGC 253 has nice examples of this), neither
CubeStats nor CubeSpectrum gives a reliable way to detect such lines, because
they are essentially 1-dimensional cuts through the cube in which weak lines
become essentially like noise. When looking at a position-velocity slice,
weak lines show up quite clearly to the eye, because they are coherent 2D
structures which mimic those of more obvious and stronger lines in the same
PVSlice. The idea is to cross-correlate an area around such strong lines
along the frequency axis and compute a cross-correlation coefficient.

There is also an interesting use case for virtual projects: imagine a PV
slice with only a weak line, but in another related ADMIT object (i.e., a
PVSlice from another spectral window) it is clearly detected. Formally a
cross correlation can only be done in velocity space when the VLSR is known,
but within a spectral window the non-linear effects are small. Borrowing a
template from another spw with widely different frequencies should be used
with caution.

*Input BDPs*

* **PVSlice** - The input PVSlice.
* **PVSlice2** - Alternative PVSlice (presumably from another virtual
  project).
* **CubeStats** - Statistics on the parent cube.

*Input Keywords*

* **cutoff** - a conservative cutoff above which an area is defined for the
  N-th strongest line. Can also be given in terms of sigma in the parent
  cube.
* **order** - Pick the N-th strongest line in this PV slice as the template.
  Default: 1.

*Output BDPs*

* **PVCorr_BDP** - Table with cross-correlation coefficients.

*Procedure* After a template line is identified (usually the strongest line
in the PV slice), a conservative polygon (not too low a cutoff) is defined
around this, and cross-correlated along the frequency axis. Currently this
needs to be a single polygon (really?).

Given the odd shape of the emission in a PVSlice, the correlation coefficient
is not exactly at the correct *velocity*. A correction factor needs to be
determined based on identifying the template line with a known line frequency
and VLSR. Since line identification is the next step after this, this
catch-22 situation needs to be resolved, otherwise a small systemic offset
can be present.

*CASA tasks used* None exist yet that can do this. NEMO has a program written
in C, and the ideas in there will need to be ported to Python.
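A minimal numpy sketch of the idea to be ported: slide the template region
along the velocity axis and record a normalized correlation coefficient at
each offset. A rectangular template stands in for the polygon described
above, and the PV slice is assumed to be ordered (velocity, position):

.. code:: python

   import numpy as np

   def pvcorr(pv, v0, v1):
       """Correlate the template rows pv[v0:v1] of a PV slice against
       every velocity offset (illustrative sketch only)."""
       nt = v1 - v0
       t = pv[v0:v1, :] - pv[v0:v1, :].mean()
       cc = np.empty(pv.shape[0] - nt)
       for k in range(cc.size):              # slide along the velocity axis
           s = pv[k:k + nt, :] - pv[k:k + nt, :].mean()
           cc[k] = (t * s).sum() / np.sqrt((t**2).sum() * (s**2).sum())
       return cc                             # correlation coefficient per offset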
.. _PeakPointPlot-AT-Design:

PeakPointPlot_AT
~~~~~~~~~~~~~~~~

.. _Moment-AT-Design:

Moment_AT
~~~~~~~~~

.. _BDP-Designs:

Individual BDP Designs
----------------------

.. _Table-BDP-Design:

Table_BDP
~~~~~~~~~

.. _Image-BDP-Design:

Image_BDP
~~~~~~~~~

.. _Line-BDP-Design:

Line_BDP
~~~~~~~~

.. _LineList-BDP-Design:

LineList_BDP
~~~~~~~~~~~~

*Description* A table of spectral lines detected in a spectrum, map, or cube.
Output from the :ref:`LineID-AT-Design`.

*Inherits From* :ref:`Table-BDP-Design`

*Constituents* The items of this BDP will be a table with the following
columns:

* **frequency** - in GHz, precise to 5 significant figures
* **uid** - ANAME-FFF.FFF; e.g., CO-115.271, N2HP-93.173, U-98.76
* **formula** - CO, CO_v1, C2H, H13CO+
* **fullname** - Carbon monoxide, formaldehyde
* **transition** - Quantum numbers
* **velocity** - vlsr or offset velocity (based on rest frequency)
* **energies** - lower and upper state energies in K
* **linestrength** - spectral line strength in :math:`D^2`
* **peakintensity** - the peak intensity of the line in Jy/bm
* **peakoffset** - the offset of the peak from the rest frequency in MHz
* **fwhm** - the FWHM of the line in km/s
* **channels** - the channels the line spans (typically FWHM)

Additionally, an item denotes whether the velocity is vlsr or offset
velocity. The offset velocity will only be used when the vlsr is not known
and cannot be determined. The coordinates where the line list applies, along
with units, will also be stored.

.. _LineCube-BDP-Design:

LineCube_BDP
~~~~~~~~~~~~

*Description* A LineCube is a subcube of the initial image. This subcube
represents the emission of a single (or degenerate set) spectral line, both
in spatial and velocity/frequency dimensions. It is the output from the
:ref:`LineCube-AT-Design`.

*Inherits From* :ref:`SpwCube-BDP-Design`

*Constituents* An image of the output cube, a SPW cube, a listing of the
molecule, transition, energies, rest frequency, and line width.

.. _SpwCube-BDP-Design:

SpwCube_BDP
~~~~~~~~~~~

.. _CubeSpectrum-BDP-Design:

CubeSpectrum_BDP
~~~~~~~~~~~~~~~~

.. _CubeStats-BDP-Design:

CubeStats_BDP
~~~~~~~~~~~~~

.. _PVCorr-BDP-Design:

PVCorr_BDP
~~~~~~~~~~

.. _GUI-Design:

Graphical User Interface Design
-------------------------------

.. _BDP-viewer-Design:

BDP Viewer Design
~~~~~~~~~~~~~~~~~

The purpose of the BDP viewer is to give the user an always up-to-date
summary of the data that have been produced by ADMIT for the ingested target
file(s). In this model, each view window is associated with one ADMIT object.
As the user produces data by creating and executing ATs through a flow, the
view representing the data will be refreshed. Tasks which create images or
plots will also create thumbnails which can be shown in the view and clicked
on to be expanded or to get more information.

We adopt a web browser as the basic view platform because i) much of the
interaction, format (e.g., tabbed views), and bookkeeping required is already
provided by a web browser platform, greatly decreasing the amount of code we
have to write; ii) users are quite familiar with browsers, making the BDP
viewer easier to learn, and they can use their browser of choice; iii) by
doing so, we can adopt the look and feel of the ALMA pipeline web log,
keeping to a style with which ALMA users will be familiar; iv) Python
provides an easy-to-use built-in HTTP server (BaseHTTPServer), so no external
Python package is needed; v) writing the data summary in HTML allows us
flexibility to test various view styles and quickly modify the style as our
needs change.

When an ADMIT object is instantiated, it will start a BDP view server on a
unique localhost port in a separate thread and report to the user the
localhost web address on which to view the BDPs for that ADMIT object. The
server will be given the data output directory as its document root. Upon
execution, each AT will invoke a method on the ADMIT object that updates HTML
file(s) in the document root with the newly available items. Users can
monitor different ADMIT objects in different browser tabs.
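A minimal sketch of such a per-object view server, using the Python 2
standard library modules named above (port 0 lets the OS pick a unique free
port; the function name is illustrative):

.. code:: python

   import os
   import threading
   import BaseHTTPServer
   import SimpleHTTPServer

   def start_view_server(docroot):
       """Serve docroot on a unique localhost port in a daemon thread
       and return the address to report to the user."""
       os.chdir(docroot)  # SimpleHTTPServer serves the current directory
       httpd = BaseHTTPServer.HTTPServer(
           ("localhost", 0), SimpleHTTPServer.SimpleHTTPRequestHandler)
       t = threading.Thread(target=httpd.serve_forever)
       t.daemon = True
       t.start()
       return "http://localhost:%d" % httpd.server_address[1]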
The HTML files will include the well-known Javascript code Live.js, which
auto-refreshes pages when something has changed. This script polls once per
second for changes in local HTML, CSS, or Javascript. If we find resource
usage is an issue, we could increase the poll interval. We can also modify
the script to monitor other file types, but it is simpler to use a timestamp
in a comment tag in index.html that gets modified at each update.

Following the ALMA Archive Pipeline web pages, we will use the Bootstrap
Framework for the layout style. The general layout will be grid-like. Each AT
writes its summary information in its own division ("div" in CSS language) in
the web page, and divisions are added at the bottom as they are created. So
the order of divisions on the web page is essentially workflow order (for
simple flows). In each division, the BDP info will be laid out horizontally
as thumbnails or links to secondary pages. For multiflows, we may want a
customization in which all ADMIT objects in the multiflow are represented on
the page with clear graphical separation.

.. _Workflow-viewer-Design:

Workflow Viewer Design
~~~~~~~~~~~~~~~~~~~~~~

**Lisa to put FlowManager conceptual design here.**

Example Code
------------

This is a working example that ingests a FITS file, makes moments, saves the
ADMIT state, restores it, modifies the moment parameters, and then remakes
the moments.

.. literalinclude:: ../../admit/test/milestone1.py
   :language: python
   :linenos: