Deployment#

Arcana provides tools for deploying pipelines in Docker containers that can be run in XNAT’s container service. Pipelines can be built on an individual basis or as part of a wider suite (e.g. Australian Imaging Service Pipelines). As well as building Docker images, the deployment workflow includes procedures to test the built pipelines and generate their documentation.

Command definitions#

The XNAT container service uses command configuration files saved in the org.nrg.commands image label to resolve metadata for the pipelines that are available on a given Docker image. The XnatViaCS.generate_xnat_command() method is used to generate the JSON metadata to be saved in this field.
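
For example, once an image has been built (see Building), the contents of this label can be viewed with the standard Docker CLI (shown here only to illustrate where the metadata ends up; substitute the tag of your built image):

$ docker inspect --format '{{ index .Config.Labels "org.nrg.commands" }}' <image-tag>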

There are four key fields that will determine the functionality of the command (the rest are metadata fields that are exposed to the XNAT UI):

  • pydra_task

  • inputs

  • outputs

  • parameters

The pydra_task keyword argument should be the path to an installed Python module containing a Pydra task, followed by a colon and the name of the task, e.g. pydra.tasks.fsl.preprocess.fast:FAST. Note that Arcana will attempt to resolve the package that contains the Pydra task and install the same version (including local development versions) within the Anaconda environment in the image. inputs and parameters expose text boxes in the XNAT dialog when the pipelines are run, while outputs determine where the outputs will be stored in the XNAT data tree.
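
The module:task convention can be illustrated with plain Python. The snippet below is only a sketch of how such a string resolves to a task class (assuming pydra-fsl is installed); it is not how Arcana performs the lookup internally:

import importlib

task_spec = 'pydra.tasks.fsl.preprocess.fast:FAST'  # <module path>:<task name>
module_name, task_name = task_spec.split(':')
module = importlib.import_module(module_name)  # requires the pydra-fsl package
task_class = getattr(module, task_name)  # the Pydra task that will be wrapped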

Inputs prompt the user to enter selection criteria for input data and are used by the entrypoint of the Docker containers to add source columns to the dataset (see Frames: Rows and Columns). They are specified by a 4-tuple consisting of

  • name of field in the pydra task input interface

  • format required by pydra task

  • description of input that will be exposed to the XNAT UI

  • the row_frequency of the column (see Spaces and Frames: Rows and Columns)

Parameters are passed directly through to the pipeline add method (see Pydra workflows) that is run in the container, and consist of a 2-tuple with

  • name of field in the pydra task input interface

  • description of parameters that will be exposed to the XNAT UI

Outputs do not show up in the XNAT dialog and are specified by a 3-tuple:

  • name of field in the pydra task output interface

  • format produced by pydra task

  • destination path (slashes are permitted and are interpreted as a relative path from the derivatives root)
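
For example, the following generates a command definition for FSL's FAST segmentation tool: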

from arcana.data.stores.medimage import XnatViaCS
from arcana.data.spaces.medimage import Clinical
from arcana.data.formats.medimage import NiftiGz


xnat_command = XnatViaCS.generate_xnat_command(
    pipeline_name='example_pipeline',
    pydra_task='pydra.tasks.fsl.preprocess.fast:FAST',
    image_tag='example/0.1',
    description=(
        "FAST (FMRIB's Automated Segmentation Tool) segments a 3D image of "
        "the brain into different tissue types (Grey Matter, White Matter, "
        "CSF, etc.), whilst also correcting for spatial intensity variations "
        "(also known as bias field or RF inhomogeneities)."),
    version='6.0-1',
    info_url='https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST',
    inputs=[
        ('in_files', NiftiGz, 'File to segment', 'session'),
        ('number_of_classes', int, 'Number of classes', 'session')],
    outputs=[
        ('tissue_class_files', NiftiGz, 'fast/tissue-classes'),
        ('partial_volume_map', NiftiGz, 'fast/partial-volumes'),
        ('partial_volume_files', NiftiGz, 'fast/partial-volume-files'),
        ('bias_field', NiftiGz, 'fast/bias-field'),
        ('probability_maps', NiftiGz, 'fast/probability-map')],
    parameters=[
        ('use_priors', 'Use priors'),
        ('bias_lowpass', 'Low-pass filter bias field')],
    configuration=[  # If different from the Pydra task
        ('output_biasfield', True),
        ('output_biascorrected', True),
        ('bias_lowpass', 5.0)],
    row_frequency='session')

When working with the CLI, command configurations are stored in YAML format, with keys matching the arguments of XnatViaCS.generate_xnat_command().

Note

image_tag and registry are omitted from the YAML representation of the commands as they are provided by the image configuration (see Building)

Building#

Dockerfiles for pipeline images are created using Neurodocker and can therefore work with any Debian/Ubuntu or Red Hat based image (pass "apt" to the package_manager keyword argument for Debian-based images, or "yum" for Red Hat-based ones). Arcana installs itself into the Docker image within an Anaconda environment named “arcana”. Therefore, it won’t typically conflict with packages on existing Docker images for third-party pipelines unless they are also installed using Anaconda.

Extending the YAML format used to define the command configurations, the full configuration required to build an XNAT Docker image looks like:

pkg_name: FSL
pkg_version: &pkg_version '6.0.1'
wrapper_version: '1'
authors:
    - name: Thomas G. Close
      email: thomas.close@sydney.edu.au
base_image: !join [ 'brainlife/fsl:', *pkg_version ]
info_url: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki
package_manager: apt
system_packages:
package_templates:
    - name: dcm2niix
      version: v1.0.20201102
python_packages:
    - name: pydra-dcm2niix
commands:
    pipeline_name: fast
    pydra_task: pydra.tasks.fsl.preprocess.fast:FAST
    description:
        FAST (FMRIB's Automated Segmentation Tool) segments a 3D image of
        the brain into different tissue types (Grey Matter, White Matter,
        CSF, etc.), whilst also correcting for spatial intensity variations
        (also known as bias field or RF inhomogeneities).
    version: 1
    info_url: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST
    inputs:
        - name: in_files
          format: medimage:NiftiGzX
          stored_format: medimage:Dicom
          description: Anatomical image to segment into different tissues
    outputs:
        - name: tissue_classes
          format: medimage:NiftiGz
          path: fast/tissue-classes
        - name: probability_maps
          format: medimage:NiftiGz
          path: fast/probability-map
    parameters:
        - name: use_priors
          description: Use priors in tissue estimation
        - name: bias_lowpass
          description: Low-pass filter bias field
    configuration:
        - output_biasfield: true
        - bias_lowpass: 5.0
    row_frequency: session

where fields in the top-level YAML are provided as arguments to XnatViaCS.generate_dockerfile(), i.e.

from arcana.data.stores.medimage import XnatViaCS

dockerfile = XnatViaCS.generate_dockerfile(
    xnat_commands=[xnat_command],  # List of commands available on the image, generated by XnatViaCS.generate_xnat_command()
    python_packages=[
        ('pydra-fsl', '0.1.0')],  # Required Python packages (aside from arcana and its dependencies)
    maintainer='your-email@your-institute.org',  # maintainer of the wrapper (i.e. not the pipeline, unless they are the same)
    base_image='brainlife/fsl',  # base Docker image
    package_manager='apt',  # package manager of the base image
    packages=[],  # system packages to install (i.e. with 'apt')
    extra_labels={},  # extra labels you might want to put into the image
    arcana_extras=[])  # install extras for the Arcana package (e.g. 'test')
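
Assuming (this is not shown above) that generate_dockerfile() returns the Dockerfile contents as text, the image could then be built manually with the standard Docker CLI; in practice, the arcana deploy build command described below takes care of this step:

from pathlib import Path
import subprocess

build_dir = Path('./docker-build')
build_dir.mkdir(exist_ok=True)
(build_dir / 'Dockerfile').write_text(dockerfile)  # 'dockerfile' from the call above (assumed to be text)
subprocess.run(['docker', 'build', '-t', 'example/0.1', str(build_dir)],  # tag taken from the earlier image_tag
               check=True)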

The CLI command to build the image from the YAML configuration is

$ arcana deploy build 'your-pipeline-config.yml'
Successfully built "FSL" image with ["fast"] commands

To build a suite of pipelines from a series of YAML files stored in a directory tree, simply provide the root directory instead and Arcana will walk the sub-directories and attempt to build any YAML files it finds, e.g.

$ arcana deploy build 'config-root-dir'
./config-root-dir/mri/neuro/fsl.yml: FSL [fast]
./config-root-dir/mri/neuro/mrtrix3.yml: MRtrix3 [dwi2fod, dwi2tensor, tckgen]
./config-root-dir/mri/neuro/freesurfer.yml: Freesurfer [recon-all]
...

Testing#

After an image has been built successfully, it can be tested against previously generated results to check for consistency with previous versions. This can be particularly useful when updating dependency versions. Tests that don’t match previous results within a given tolerance will be flagged for manual review.

To avoid expensive runs when they are not necessary (particularly within CI/CD pipelines), the provenance data saved alongside the generated reference data is checked before the pipelines are run. If the provenance data would be unchanged (including software dependency versions), then the pipeline test will be skipped.

Test data, both inputs to the pipeline and reference data to check pipeline outputs against, need to be stored in a separate directory for each command. Under the pipeline data directory, there should be one or more subdirectories for different tests of the pipeline. Each of these subdirectories should contain an inputs and an outputs directory, and optionally a YAML file named parameters.yml. Inside the inputs directory there should be file-groups named after each input of the pipeline, and likewise in the outputs directory there should be file-groups named after each output of the pipeline. Any field inputs or outputs should be placed alongside the file-groups in a JSON file called __fields__.json.

Specifying two tests (‘test1’ and ‘test2’) for the FSL FAST example given above (see Building), the directory structure would look like:

FAST
├── test1
│   ├── inputs
│   │   └── in_files.nii.gz
│   ├── outputs
│   │   └── fast
│   │       ├── tissue-classes.nii.gz
│   │       ├── partial-volumes.nii.gz
│   │       ├── partial-volume-files.nii.gz
│   │       ├── bias-field.nii.gz
│   │       └── probability-map.nii.gz
│   └── parameters.yml
└── test2
    ├── inputs
    │   └── in_files.nii.gz
    ├── outputs
    │   └── fast
    │       ├── tissue-classes.nii.gz
    │       ├── partial-volumes.nii.gz
    │       ├── partial-volume-files.nii.gz
    │       ├── bias-field.nii.gz
    │       └── probability-map.nii.gz
    └── parameters.yml
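
The format of parameters.yml is not specified above; assuming it is a simple mapping of parameter names (as defined in the command configuration) to the values to use for that test, it might look like:

# hypothetical parameters.yml for 'test1' (assumed format: parameter name -> value)
use_priors: true
bias_lowpass: 10.0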

To run a test via the CLI, point the test command at the YAML configuration file and the data directory containing the test data, e.g.

$ arcana deploy test ./fast.yml ./fast-data
Pipeline test 'test1' ran successfully and outputs matched saved
Pipeline test 'test2' ran successfully and outputs matched saved

To run tests over a suite of image configurations stored in a directory of YAML configuration files (i.e. the same layout as for building), simply provide the directory to arcana deploy test instead of the path to a single YAML config file, and supply a directory tree containing the test data whose sub-directory structure matches that of the configuration directory. For example, given the following directory structure for the configuration files

mri
└── neuro
    ├── fsl.yml
    ├── mrtrix3.yml
    ...

The test data should be laid out like

mri-data
└── neuro
    ├── fsl
    │   └── fast
    │       ├── test1
    │       │   ├── inputs
    │       │   │   └── in_files.nii.gz
    │       │   ├── outputs
    │       │   │   └── fast
    │       │   │       ├── tissue-classes.nii.gz
    │       │   │       ├── partial-volumes.nii.gz
    │       │   │       ├── partial-volume-files.nii.gz
    │       │   │       ├── bias-field.nii.gz
    │       │   │       └── probability-map.nii.gz
    │       │   └── parameters.yml
    │       └── test2
    │           ├── inputs
    │           │   └── in_files.nii.gz
    │           ├── outputs
    │           │   └── fast
    │           │       ├── tissue-classes.nii.gz
    │           │       ├── partial-volumes.nii.gz
    │           │       ├── partial-volume-files.nii.gz
    │           │       ├── bias-field.nii.gz
    │           │       └── probability-map.nii.gz
    │           └── parameters.yml
    └── mrtrix3
        ├── dwi2fod
        │   ├── test1
        │   │   ├── inputs
    ...

As in the case of a single YAML configuration file, the CLI command to test a suite of image/command configurations is

$ arcana deploy test ./mri ./mri-data --output test-results.json
...E..F..

While not strictly necessary, it is strongly advised to store test data alongside the image/command configurations under some kind of version control. However, storing large files inside vanilla Git repositories is not recommended, so you will probably want to use one of the extensions designed for dealing with large files:

  • git-lfs - integrates with GitHub, but GitHub requires you to pay for storage/egress

  • git-annex - complicated to set up and use, even for experienced Git users, but much more flexible in its storage options.

Autodocs#

Documentation can be automatically generated from the pipeline configuration YAML files (see Building) using

$ arcana deploy docs <path-to-yaml-or-directory> <docs-output-dir>

Generated HTML documents will be placed in the output directory, with pipelines organised hierarchically to match the structure of the source directory.