Getting started#

Pydra#

Pipelines in Arcana are implemented in, and executed with, the Pydra dataflow engine, so before getting started with Arcana it is a good idea to familiarise yourself with Pydra’s syntax and concepts. There is a short Jupyter notebook tutorial available at https://github.com/nipype/pydra-tutorial, which is a nice place to start after reading the official docs.

Software requirements#

Arcana requires a recent version of Python (>=3.8), so you may need to upgrade your Python installation before installing it. The best way to install Python depends on your OS:

  • Windows - it is strongly recommended to use Anaconda to install Python, as it will also manage C dependencies

  • Mac - either Homebrew or Anaconda are good options

  • Linux - the native package manager should work ok unless you are using an old Linux distribution that doesn’t support Python 3.8, in which case Linuxbrew is a good option
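To confirm your interpreter meets the minimum version before installing, a quick standard-library check can be run (the 3.8 floor comes from the requirement above; the helper name is just illustrative):

```python
import sys

# Arcana requires Python 3.8 or later
MIN_VERSION = (3, 8)

def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets Arcana's minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION

if not python_is_supported():
    raise RuntimeError(
        f"Python {sys.version_info.major}.{sys.version_info.minor} found; "
        f"Arcana needs >= {MIN_VERSION[0]}.{MIN_VERSION[1]}"
    )
```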

To deploy Arcana pipelines in Docker images for XNAT’s container service, Docker needs to be installed. Please see the Docker docs, https://docs.docker.com/engine/install/, for how to do this on your system.

One of the main strengths of Pydra is the ability to link third-party tools together into coherent workflows. Third-party tools are best run within software containers (e.g. Docker or Singularity), but in cases where that isn’t possible (e.g. when nested within other containers without access to the Docker socket, or on some high-performance computing clusters) you will need to install these dependencies on the system and ensure they are on the system path.
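One way to check that a required tool is available on the system path is with the standard library’s shutil.which (the tool names below are just examples):

```python
import shutil

def check_on_path(*tools):
    """Return the subset of the given tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

# Example: warn about missing third-party dependencies before running a workflow
missing = check_on_path("dcm2niix", "mrconvert")
if missing:
    print(f"Warning: not found on PATH: {', '.join(missing)}")
```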

Two command-line tools that the arcana-medimage sub-package uses for implicit file-format conversions are:

  • Dcm2Niix

  • MRtrix3

Both of these packages can be installed using Homebrew/Linuxbrew (you will need to tap MRtrix3/mrtrix3) or Anaconda (use the conda-forge and mrtrix3 channels for Dcm2Niix and MRtrix3 respectively).
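For example, the installation commands would look something like the following (the tap and channel names are taken from the note above; exact package names may vary between platforms, so treat this as a sketch rather than a definitive recipe):

```shell
# Homebrew / Linuxbrew
brew install dcm2niix
brew tap MRtrix3/mrtrix3
brew install mrtrix3

# Anaconda
conda install -c conda-forge dcm2niix
conda install -c mrtrix3 mrtrix3
```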

Installation#

Arcana, along with its Python dependencies, can be installed from the Python Package Index (PyPI) using pip:

$ pip3 install arcana

Basic usage#

Arcana is implemented in Python, and can be accessed either via its API or via the command-line interface (CLI).

The basic usage pattern is

  1. Define a dataset to work with (see Datasets)

  2. Specify columns in the dataset to access data from and store data to (see Frames: Rows and Columns)

  3. Connect a Pydra task or workflow, or an analysis class, between the columns (see Analysis classes)

  4. Select derivatives to generate (see Generating derivatives)

For example, given a dataset stored within the /data/my-dataset directory, which contains two layers of sub-directories, for subjects and sessions respectively, FSL’s Brain Extraction Tool (BET) can be executed over all sessions using the command-line interface

# Define dataset
$ arcana dataset define 'file///data/my-dataset' subject session

# Add source column to select a single T1-weighted image in each session subdirectory
$ arcana dataset add-source 'file///data/my-dataset' T1w '.*mprage.*' medimage:Dicom --regex

# Add sink column to store brain mask
$ arcana dataset add-sink 'file///data/my-dataset' brain_mask medimage:NiftiGz

# Apply BET Pydra task, connecting it between the source and sink
$ arcana apply pipeline 'file///data/my-dataset' pydra.tasks.fsl.preprocess.bet:BET \
  --arg name brain_extraction \
  --input T1w in_file medimage:NiftiGz \
  --output brain_mask out_file .

# Derive brain masks for all imaging sessions in dataset
$ arcana derive column 'file///data/my-dataset' brain_mask

This code will iterate over all imaging sessions in the directory tree, find and convert T1-weighted images (which contain ‘mprage’ in their names) from DICOM into the required gzipped NIfTI format, and then execute BET on the converted files before they are saved back into the directory structure at <subject-id>/<session-id>/derivs/brain_mask.nii.gz.

Alternatively, the same steps can be performed using the Python API:

# Import arcana module
from pydra.tasks.fsl.preprocess.bet import BET
from arcana.core.data import Dataset
from arcana.data.spaces.medimage import Clinical
from arcana.data.formats.medimage import Dicom, NiftiGz

# Define dataset
my_dataset = Dataset.load('file///data/my-dataset', space=Clinical,
                          hierarchy=['subject', 'session'])

# Add source column to select a single T1-weighted image in each session subdirectory
my_dataset.add_source('T1w', '.*mprage.*', format=Dicom, is_regex=True)

# Add sink column to store brain mask
my_dataset.add_sink('brain_mask', 'derivs/brain_mask', format=NiftiGz)

# Apply BET Pydra task, connecting it between the source and sink
my_dataset.apply_pipeline(
    BET(name='brain_extraction'),
    inputs=[('T1w', 'in_file', NiftiGz)],  # Specify required input format
    outputs=[('brain_mask', 'out_file')])  # Output format matches stored so can be omitted

# Derive brain masks for all imaging sessions in dataset
my_dataset['brain_mask'].derive()

Applying an Analysis class instead of a Pydra task/workflow follows the same steps up to ‘add-source’ (sinks are automatically added by the analysis class). The following example applies methods for analysing T1-weighted MRI images to the dataset, then derives the average cortical thickness for each session of each subject.

$ arcana apply analysis 'file///data/my-dataset' bids.mri:T1wAnalysis
$ arcana derive column 'file///data/my-dataset' avg_cortical_thickness

Performing the same steps via the Python API provides convenient access to the generated data, from which a histogram of the distribution over all subjects at timepoint ‘T3’ can be plotted.

import matplotlib.pyplot as plt
from arcana.analyses.bids.mri import T1wAnalysis

# Apply the T1wAnalysis class to the dataset
my_dataset.apply(T1wAnalysis())

# Generate the average cortical thickness derivative that was added by
# the T1wAnalysis class
my_dataset['avg_cortical_thickness'].derive()

# Get all members at the 'T3' timepoint. Indexing of a column can either
# be a single arg in order to use the IDs for the row_frequency of the column
# ('session') in this case, or the rank of the data space
plt.hist(my_dataset['avg_cortical_thickness']['T3', None, :])

Note

When referencing objects within the arcana package from the CLI, such as file-format classes or data spaces (see Spaces), the standard arcana.*. prefix can be dropped, e.g. medimage:Dicom instead of the full path arcana.data.formats.medimage:Dicom. Classes installed outside of the Arcana package should be referred to by their full import path.

Licence#

Arcana >=v2.0 is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (see LICENCE). Non-commercial usage is permitted freely, on the condition that Arcana is appropriately acknowledged in related publications. Commercial usage is encouraged, but permission for specific uses must first be granted by the authors (see AUTHORS).