Multiflow Project Guide

Standard ADMIT projects involve the analysis of individual FITS cubes by way of user-created data flows: pipelines (or trees) composed of ADMIT tasks (ATs), each manipulating data products (BDPs) in a well-defined manner. In some cases, however, one wishes to combine results from several sources or observations (each with its own flow) into a single data flow; e.g., to calculate the OverlapIntegral of several Moment maps, or to apply a specific analysis sequence to similar BDPs spanning several projects without having to modify each project individually. Projects whose flows involve data products from multiple projects are called multiflows in ADMIT (not to be confused with Multiflow Computer, R.I.P.).

Multiflows are constructed by linking one or more tasks from other projects into the multiflow project, so that any of their outputs can serve as data inputs to the flow. Linked tasks always reside at the root of a multiflow, and their own inputs cannot be modified (they are already determined by the parent flow). In normal flows, by contrast, only ATs taking no BDP inputs (e.g., Ingest) may appear as roots. In all other respects, multiflows are constructed and operate like other flows.
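The root-placement rules above can be summarized as a small validity check. The sketch below is a toy model, not ADMIT's API: `ToyTask`, its fields and `root_violations()` are hypothetical names chosen for illustration, and the code only illustrates the rules, not how ADMIT itself enforces them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical stand-in for an AT; not ADMIT's own class."""
    name: str
    takes_bdp_inputs: bool                       # False for Ingest-like tasks
    linked: bool = False                         # True if linked from another project
    sources: list = field(default_factory=list)  # names of upstream tasks in this flow


def root_violations(flow):
    """Return descriptions of the root-placement rules a toy flow breaks."""
    problems = []
    for t in flow:
        if t.linked and t.sources:
            # A linked task's inputs are fixed by its parent flow.
            problems.append(f"{t.name}: linked task cannot take source connections")
        elif not t.linked and t.takes_bdp_inputs and not t.sources:
            # Ordinary roots must be tasks that need no BDP inputs.
            problems.append(f"{t.name}: task needs BDP inputs but has no sources")
    return problems


flow = [
    ToyTask("Moment_A", takes_bdp_inputs=True, linked=True),
    ToyTask("Moment_B", takes_bdp_inputs=True, linked=True),
    ToyTask("OverlapIntegral", takes_bdp_inputs=True, sources=["Moment_A", "Moment_B"]),
]
print(root_violations(flow))  # []
```

Here the two linked Moment tasks are legal roots even though they consume BDPs, because their inputs live in their parent flows.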

At the Python scripting level, a linked task is simply a task reference obtained from an existing AT via a call to its link() method. One can then add() the returned object to another (now multiflow) project as usual, except that no source connections may be specified. Note that linking an AT also locks its parent flow: no operation that would remove the original AT (or any task it depends upon) is permitted, as this would render the dependent multiflow project non-functional. Deleting a linked task from the multiflow project will unlink() the AT and unlock its parent project once none of the parent's tasks remain linked.
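To make the link/lock/unlink bookkeeping concrete, here is a minimal sketch assuming a simple reference-count model. Apart from the method names link(), add() and unlink() taken from the text, everything here (`ToyTask`, `ToyProject`, `ToyMultiflow`, `remove()`, `locked()`) is hypothetical, and the sketch does not reproduce ADMIT's actual classes or signatures.

```python
class ToyTask:
    """Hypothetical AT stand-in; link()/unlink() mimic the behaviour described above."""

    def __init__(self, name):
        self.name = name
        self.sources = []      # upstream ToyTask objects in the parent flow
        self.link_count = 0    # number of multiflows currently linking this task

    def link(self):
        self.link_count += 1
        return self            # toy choice: the "linked task" is a plain reference

    def unlink(self):
        self.link_count -= 1

    def ancestors(self):
        """Names of every task this one depends upon."""
        seen, stack = set(), list(self.sources)
        while stack:
            t = stack.pop()
            if t.name not in seen:
                seen.add(t.name)
                stack.extend(t.sources)
        return seen


class ToyProject:
    def __init__(self, name):
        self.name = name
        self.tasks = {}        # task name -> ToyTask

    def add(self, task, sources=()):
        task.sources = [self.tasks[s] for s in sources]
        self.tasks[task.name] = task
        return task

    def locked(self):
        return any(t.link_count > 0 for t in self.tasks.values())

    def remove(self, name):
        # Refuse to remove a linked task or anything a linked task depends upon.
        protected = set()
        for t in self.tasks.values():
            if t.link_count > 0:
                protected |= {t.name} | t.ancestors()
        if name in protected:
            raise PermissionError(f"{name} is required by a linked task")
        del self.tasks[name]


class ToyMultiflow:
    def __init__(self):
        self.roots = []        # linked tasks sit at the root, with no sources

    def add(self, task):
        self.roots.append(task)

    def delete(self, task):
        self.roots.remove(task)
        task.unlink()          # may unlock the parent project


p = ToyProject("cube1")
p.add(ToyTask("Ingest"))
moment = p.add(ToyTask("Moment"), sources=["Ingest"])

mf = ToyMultiflow()
mf.add(moment.link())
try:
    p.remove("Ingest")         # Moment depends on Ingest, so this is refused
    refused = False
except PermissionError:
    refused = True

mf.delete(moment)
p.remove("Ingest")             # allowed once nothing in cube1 is linked
print(refused, p.locked(), sorted(p.tasks))  # True False ['Moment']
```

Counting links per task, rather than keeping a single boolean flag, is a design choice of this sketch: it lets one AT be linked into several multiflows while its parent stays locked until the last link is removed.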

Data localization in multiflows requires special attention. In normal (uniflow) projects, all data products are written to a single ADMIT project directory, typically named after the FITS cube at the root of the flow. If linked ATs are modified in a multiflow (by changing one or more of their keyword values), they will be re-executed when the flow is run. Data outputs from linked tasks, however, are always written to their original parent project directories. Other tasks in the multiflow write their data to the working project directory, which may be either the multiflow project directory or a parent project directory, as described below. Note that executing a linked task from a multiflow marks its dependent tasks (if any) in the parent project as out of date but does not re-execute them; this minimizes multiflow run time.
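The localization rules can be expressed as a toy run loop. This is a sketch under stated assumptions, not ADMIT code: `Task`, `output_dir()`, `run_multiflow()` and the `*.admit` directory names are hypothetical, and the flow is assumed to be listed in dependency order.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; not an ADMIT class."""
    name: str
    project_dir: str                     # directory of the project that owns the task
    linked: bool = False
    stale: bool = False                  # True after a keyword change
    sources: list = field(default_factory=list)            # upstream names in the multiflow
    parent_dependents: list = field(default_factory=list)  # downstream Tasks in the parent


def output_dir(task, working_dir):
    # Linked tasks always write to their original parent project directory.
    return task.project_dir if task.linked else working_dir


def run_multiflow(flow, working_dir):
    """Run out-of-date tasks (flow in dependency order); map name -> output dir."""
    ran, written = set(), {}
    for t in flow:
        # A task runs if it is stale or if any of its upstream tasks just ran.
        if not (t.stale or ran.intersection(t.sources)):
            continue
        written[t.name] = output_dir(t, working_dir)
        t.stale = False
        ran.add(t.name)
        if t.linked:
            # Dependents back in the parent project are flagged, not re-executed.
            for d in t.parent_dependents:
                d.stale = True
    return written


sfind = Task("SFind2D", "cube1.admit")  # downstream of Moment in the parent project
moment = Task("Moment", "cube1.admit", linked=True, stale=True,
              parent_dependents=[sfind])
overlap = Task("OverlapIntegral", "multi.admit", sources=["Moment"])

written = run_multiflow([moment, overlap], working_dir="multi.admit")
print(written)      # {'Moment': 'cube1.admit', 'OverlapIntegral': 'multi.admit'}
print(sfind.stale)  # True
```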

ADMIT supports two basic types of multiflows:

  1. N-to-1 flows, where data products from N projects are combined in a single multiflow.
  2. 1-to-N flows, where a single multiflow is automatically applied to compatible data products residing in N individual flows.

These are discussed in more detail in the following sections.

N-to-1 Multiflows

In this class of multiflows, the project contains N linked ATs at the root of the flow. It does not matter whether the linked tasks all come from a single parent project, from distinct parents, or from some mix of the two, except that linked ATs from the same parent project should not depend on each other (unless such ATs are treated as read-only in the multiflow). Once the linked ATs have been added, the multiflow can be extended, executed and manipulated like any other flow. The results of this type of multiflow are written to the multiflow project directory; i.e., the working project is the multiflow project.
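The restriction on linked ATs from the same parent can be checked mechanically by walking each parent's dependency graph. The helper names and the `{task: [upstream tasks]}` map used here are hypothetical; the sketch simply flags such pairs and does not model the read-only exception.

```python
from itertools import combinations


def ancestors(upstream, task):
    """All tasks that `task` depends on, given a {task: [upstream tasks]} map."""
    seen, stack = set(), [task]
    while stack:
        for up in upstream.get(stack.pop(), ()):
            if up not in seen:
                seen.add(up)
                stack.append(up)
    return seen


def dependent_links(linked, upstream_by_project):
    """Flag pairs of linked tasks from one parent where one depends on the other."""
    clashes = []
    for (pa, a), (pb, b) in combinations(linked, 2):
        if pa != pb:
            continue  # tasks from distinct parents cannot depend on each other
        up = upstream_by_project[pa]
        if a in ancestors(up, b) or b in ancestors(up, a):
            clashes.append((pa, a, b))
    return clashes


upstream_by_project = {
    "cube1": {"Ingest": [], "Moment": ["Ingest"], "SFind2D": ["Moment"]},
    "cube2": {"Ingest": [], "Moment": ["Ingest"]},
}
linked = [("cube1", "Moment"), ("cube2", "Moment"), ("cube1", "SFind2D")]
clashes = dependent_links(linked, upstream_by_project)
print(clashes)  # [('cube1', 'Moment', 'SFind2D')]
```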

1-to-N Multiflows

This type of multiflow is essentially a template (or generator) flow that is automatically applied to each AT in a list of tasks, as if the user had manually duplicated the sub-flow and attached it to each AT in its parent project. To construct a template multiflow, one first constructs an N-to-1 multiflow with N = 1, using any one of the ATs in the task list (or any AT of compatible type) as the linked AT at the root of the flow. After this standard multiflow has been created and (optionally) tested, it can be applied sequentially to a list of tasks (with arbitrary parent projects) in an automated batch-processing mode. All tasks in the list must have output signatures (numbers and types of BDPs) compatible with the prototype (linked) task used to initiate the multiflow. In batch mode, the working project is the parent project of the current AT, and all associated flow output is written to that project's directory. Outside of batch mode, executing the flow directs output to the multiflow project directory (as for any other N-to-1 multiflow).
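Batch application of a template can be sketched as a planning step that checks each target's output signature against the prototype and directs output to the target's own project. This covers only batch mode. `LinkedAT`, `batch_plan()`, the signature labels and the directory names are hypothetical, and treating "compatible" as "identical" is a simplifying assumption.

```python
from typing import NamedTuple


class LinkedAT(NamedTuple):
    """Hypothetical record of a candidate root task; not an ADMIT class."""
    name: str
    project_dir: str
    signature: tuple  # BDP types of the task's outputs, in order


def batch_plan(prototype, template_steps, targets):
    """Plan the template flow for each target; output goes to the target's own project."""
    plan = []
    for at in targets:
        # Simplification: "compatible" is taken to mean an identical signature.
        if at.signature != prototype.signature:
            raise ValueError(f"{at.name} in {at.project_dir}: outputs {at.signature} "
                             f"do not match prototype {prototype.signature}")
        for step in template_steps:
            plan.append((step, at.project_dir))  # working project = parent project
    return plan


proto = LinkedAT("Moment", "cube1.admit", ("image",))
targets = [proto, LinkedAT("Moment", "cube2.admit", ("image",))]
plan = batch_plan(proto, ["SFind2D"], targets)
print(plan)  # [('SFind2D', 'cube1.admit'), ('SFind2D', 'cube2.admit')]
```

Checking every signature before scheduling anything would be a stricter alternative; this sketch fails on the first incompatible target it meets.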
