Alternative storage systems

Alternative storage systems can be implemented by writing a new subclass of DataStore. The developers are interested in adding support for new systems, so if you would like help using Arcana with a different storage system, please create an issue for it in the GitHub Issue Tracker.

Required methods

When subclassing DataStore, the following abstract methods must be overridden to implement the appropriate functionality of the data store. For a reference implementation please see arcana.dirtree.data.SimpleStore.

class arcana.core.data.store.DataStore
abstract find_items(row)

Find all data items within a data row and populate the DataRow object with them using the add_file_group and add_field methods.

Parameters

row (DataRow) – The data row to populate with items

abstract find_rows(dataset)

Find all data rows for a dataset in the store and populate the Dataset object using its add_row method.

Parameters

dataset (Dataset) – The dataset to populate with rows

abstract get_field_value(field)

Extract and return the value of the field from the store

Parameters

field (Field) – The field to retrieve the value for

Returns

value – The value of the Field

Return type

int | float | str | ty.List[int] | ty.List[float] | ty.List[str]

abstract get_file_group_paths(file_group, cache_only=False)

Cache the file_group locally (if required) and return the locations of the cached primary file and side cars

Parameters
  • file_group (FileGroup) – The file_group to cache locally

  • cache_only (bool) – Whether to attempt to extract the file groups from the local cache (if applicable) and raise an error otherwise

Returns

fs_paths – The file-system paths to the cached files

Return type

list[str]

Raises

ArcanaCacheError – If cache_only is set and there is a mismatch between the cached and remote versions

abstract load_dataset_definition(dataset_id: str, name: str) -> Dict[str, Any]

Load the definition of a dataset saved within the store

Parameters
  • dataset_id (str) – The ID (e.g. file-system path, XNAT project ID) of the dataset within the store

  • name (str) – Name for the dataset definition to distinguish it from other definitions for the same directory/project

Returns

definition – The dictionary representation of the Dataset object that was saved in the data store

Return type

dict[str, Any]

abstract put_field_value(field, value)

Inserts or updates the field's value in the store

Parameters

field (Field) – The field to insert into the store
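
As a rough illustration of how get_field_value and put_field_value pair up, the sketch below keeps field values in one JSON file per data row. It is deliberately standalone: JsonFieldStore, the fields.json file name and the (row_dir, name) arguments are hypothetical stand-ins and not part of Arcana. A real implementation would subclass DataStore and receive a Field object instead.

```python
import json
import tempfile
from pathlib import Path


class JsonFieldStore:
    """Hypothetical stand-in (NOT Arcana's DataStore): one JSON file of
    field values per data row."""

    FIELDS_FNAME = "fields.json"  # assumed file name, chosen for this sketch

    def put_field_value(self, row_dir, name, value):
        # Insert or update a single field without disturbing the others
        fpath = Path(row_dir) / self.FIELDS_FNAME
        values = json.loads(fpath.read_text()) if fpath.exists() else {}
        values[name] = value
        fpath.write_text(json.dumps(values, indent=2))

    def get_field_value(self, row_dir, name):
        # JSON covers the documented value types: int, float, str and lists
        fpath = Path(row_dir) / self.FIELDS_FNAME
        return json.loads(fpath.read_text())[name]


def demo_field_round_trip():
    # Store two fields in a temporary "row" directory and read them back
    with tempfile.TemporaryDirectory() as tmp:
        store = JsonFieldStore()
        store.put_field_value(tmp, "age", 42)
        store.put_field_value(tmp, "scores", [1.5, 2.0])
        return (
            store.get_field_value(tmp, "age"),
            store.get_field_value(tmp, "scores"),
        )
```

JSON is used here only because it round-trips the scalar and list types listed in the return type above; any format the store can read back losslessly would do.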

abstract put_file_group_paths(file_group, fs_paths)

Inserts or updates the file_group in the store

Parameters
  • file_group (FileGroup) – The file_group to insert into the store

  • fs_paths (list[Path]) – The file-system paths to the files/directories to sync

Returns

cached_paths – The paths of the files where they are cached in the file system

Return type

list[str]

abstract save_dataset_definition(dataset_id: str, definition: Dict[str, Any], name: str)

Save the definition of a dataset within the store

Parameters
  • dataset_id (str) – The ID/path of the dataset within the store

  • definition (dict[str, Any]) – The dictionary representation of the Dataset to be saved. The dictionary is in a format ready to be dumped to file as JSON or YAML.

  • name (str) – Name for the dataset definition to distinguish it from other definitions for the same directory/project

Optional methods

Overriding the following methods is optional, but it can significantly improve performance: DataStore.get_checksums() can avoid unnecessary downloads, while DataStore.connect() and DataStore.disconnect() can avoid unnecessary remote connections by caching the connection between multiple calls.

class arcana.core.data.store.DataStore
connect()

If the store requires a connection session, open it here

disconnect()

If the store requires a connection session, close it here
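
One way to picture the connection caching that connect() and disconnect() make possible is sketched below. FakeSession, SessionCachingStore and fetch are hypothetical names invented for this example, and the sketch makes no claim about when Arcana itself calls the two methods; it only shows a single session being reused across several calls.

```python
class FakeSession:
    """Stand-in for a remote session (e.g. an HTTP or database connection)."""

    opened = 0  # class-level count of sessions created, for demonstration

    def __init__(self):
        FakeSession.opened += 1
        self.closed = False

    def get(self, key):
        if self.closed:
            raise RuntimeError("session is closed")
        return f"value-of-{key}"

    def close(self):
        self.closed = True


class SessionCachingStore:
    """Hypothetical stand-in (NOT Arcana's DataStore) that holds one session
    open between connect() and disconnect()."""

    def __init__(self):
        self._session = None

    def connect(self):
        # Open a session only if one is not already cached
        if self._session is None:
            self._session = FakeSession()

    def disconnect(self):
        # Close and forget the cached session, if any
        if self._session is not None:
            self._session.close()
            self._session = None

    def fetch(self, key):
        if self._session is None:
            raise RuntimeError("call connect() first")
        return self._session.get(key)


def demo_connection_reuse():
    # Three fetches share the single session opened by connect()
    before = FakeSession.opened
    store = SessionCachingStore()
    store.connect()
    try:
        values = [store.fetch(k) for k in ("a", "b", "c")]
    finally:
        store.disconnect()
    return values, FakeSession.opened - before
```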

get_checksums(file_group)

Override this method to return checksums for files that are stored with remote files (e.g. in XNAT). If no checksums are stored in the store, leave this method as it is and the default implementation will access the files and recalculate them.

Parameters

file_group (FileGroup) – The file_group to return the checksums for

Returns

checksums – A dictionary with keys corresponding to the relative paths of all files in the file_group from the base path and values equal to the MD5 hex digest. The primary file in the file-set (i.e. the one that the path points to) should be specified by ‘.’.

Return type

dict[str, str]
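
To make the shape of the returned dictionary concrete, here is a small sketch that builds one from local files using hashlib. The helper names md5_hex and checksums_for are hypothetical, and the sketch assumes that the "base path" is the directory containing the primary file; it recalculates digests from disk, which is exactly the work a store with remotely stored checksums would avoid by overriding get_checksums.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_hex(path, chunk_size=8192):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_for(primary_path, side_cars=()):
    """Hypothetical helper: map relative paths to MD5 hex digests.

    Assumes the base path is the primary file's parent directory.
    """
    primary = Path(primary_path)
    checksums = {".": md5_hex(primary)}  # the primary file is keyed by '.'
    for side_car in side_cars:
        rel_path = Path(side_car).relative_to(primary.parent).as_posix()
        checksums[rel_path] = md5_hex(side_car)
    return checksums


def demo_checksums():
    # Create a primary file and one side car, then checksum both
    with tempfile.TemporaryDirectory() as tmp:
        primary = Path(tmp) / "scan.img"
        side_car = Path(tmp) / "scan.hdr"
        primary.write_bytes(b"abc")
        side_car.write_bytes(b"")
        return checksums_for(primary, [side_car])
```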