Alternative storage systems
Alternative storage systems can be implemented by writing a new subclass of DataStore. The developers are interested in adding support for new systems, so if you would like help using Arcana with a different storage system, please create an issue for it in the GitHub Issue Tracker.
Required methods
When subclassing DataStore, the following abstract methods must be overridden to implement the appropriate functionality of the data store. For a reference implementation, please see arcana.data.stores.common.FileSystem.
- class arcana.core.data.store.DataStore[source]
- abstract find_items(row)[source]
Find all data items within a data row and populate the DataRow object with them using the add_file_group and add_field methods.
- Parameters
row (DataRow) – The data row to populate with items
- abstract find_rows(dataset)[source]
Find all data rows for a dataset in the store and populate the Dataset object using its add_row method.
- Parameters
dataset (Dataset) – The dataset to populate with rows
- abstract get_field_value(field)[source]
Extract and return the value of the field from the store
- abstract get_file_group_paths(file_group, cache_only=False)[source]
Cache the file_group locally (if required) and return the locations of the cached primary file and side cars
- Parameters
- Returns
fs_paths – The file-system path to the cached files
- Return type
- Raises
ArcanaCacheError – If cache_only is set and there is a mismatch between the cached and remote versions
- abstract load_dataset_definition(dataset_id: str, name: str) Dict[str, Any] [source]
Load definition of a dataset saved within the store
- Parameters
- Returns
definition – A dictionary representation of the Dataset object that was saved in the data store
- Return type
- abstract put_field_value(field, value)[source]
Inserts or updates the field in the store
- Parameters
field (Field) – The field to insert into the store
- abstract put_file_group_paths(file_group, fs_paths)[source]
Inserts or updates the file_group in the store
- abstract save_dataset_definition(dataset_id: str, definition: Dict[str, Any], name: str)[source]
Save definition of dataset within the store
- Parameters
dataset_id (str) – The ID/path of the dataset within the store
definition (dict[str, Any]) – A dictionary representation of the Dataset to be saved. The dictionary is in a format ready to be dumped to file as JSON or YAML.
name (str) – Name for the dataset definition to distinguish it from other definitions for the same directory/project
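To illustrate the overall shape of such a subclass, here is a minimal sketch rather than a working Arcana store. It does not import Arcana: `DataStoreSketch` is a hypothetical stand-in that mirrors only the abstract method names and signatures listed above, and `InMemoryStore` is a toy that keeps everything in dictionaries. The attributes `dataset.id`, `row.id`, `field.row_id`, `field.name` and the single-argument calls to `add_row`, `add_field` and `add_file_group` are assumptions made for the example; check the real classes before relying on them.

```python
import json
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Any, Dict


class DataStoreSketch(ABC):
    """Hypothetical stand-in for DataStore; mirrors only the methods listed above."""

    @abstractmethod
    def find_rows(self, dataset): ...

    @abstractmethod
    def find_items(self, row): ...

    @abstractmethod
    def get_field_value(self, field): ...

    @abstractmethod
    def put_field_value(self, field, value): ...

    @abstractmethod
    def get_file_group_paths(self, file_group, cache_only=False): ...

    @abstractmethod
    def put_file_group_paths(self, file_group, fs_paths): ...

    @abstractmethod
    def save_dataset_definition(self, dataset_id: str, definition: Dict[str, Any], name: str): ...

    @abstractmethod
    def load_dataset_definition(self, dataset_id: str, name: str) -> Dict[str, Any]: ...


class InMemoryStore(DataStoreSketch):
    """Toy store backed by dictionaries (illustration only)."""

    def __init__(self):
        self.row_ids = {}      # dataset id -> list of row ids
        self.fields = {}       # row id -> {field name: value}
        self.file_groups = {}  # row id -> {file-group name: list of paths}
        self.definitions = {}  # (dataset id, name) -> JSON string

    def find_rows(self, dataset):
        # Assumed attribute/signature: dataset.id, dataset.add_row(row_id)
        for row_id in self.row_ids.get(dataset.id, []):
            dataset.add_row(row_id)

    def find_items(self, row):
        # Assumed attribute/signatures: row.id, add_field(name), add_file_group(name)
        for name in self.fields.get(row.id, {}):
            row.add_field(name)
        for name in self.file_groups.get(row.id, {}):
            row.add_file_group(name)

    def get_field_value(self, field):
        return self.fields[field.row_id][field.name]

    def put_field_value(self, field, value):
        self.fields.setdefault(field.row_id, {})[field.name] = value

    def get_file_group_paths(self, file_group, cache_only=False):
        # Nothing is remote here, so cache_only has no effect in this toy
        return list(self.file_groups[file_group.row_id][file_group.name])

    def put_file_group_paths(self, file_group, fs_paths):
        self.file_groups.setdefault(file_group.row_id, {})[file_group.name] = list(fs_paths)

    def save_dataset_definition(self, dataset_id, definition, name):
        # The definition is JSON-ready, so it can be serialised directly
        self.definitions[(dataset_id, name)] = json.dumps(definition)

    def load_dataset_definition(self, dataset_id, name):
        return json.loads(self.definitions[(dataset_id, name)])


# Usage with stand-in objects (SimpleNamespace replaces the real Field class)
store = InMemoryStore()
age = SimpleNamespace(row_id="subject01", name="age")
store.put_field_value(age, 42)
age_value = store.get_field_value(age)
store.save_dataset_definition("my-project", {"space": "clinical"}, name="default")
loaded = store.load_dataset_definition("my-project", name="default")
```

Because every listed method is declared abstract on the base class, a subclass that omits any of them cannot be instantiated, which is the behaviour the "must be overridden" requirement implies.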
Optional methods
The following methods are not strictly necessary to override, but doing so can offer significant performance boosts: DataStore.get_checksums() can avoid unnecessary downloads, while DataStore.connect() and DataStore.disconnect() can avoid unnecessary remote connections by caching the connection between multiple calls.
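As a rough illustration of the connection caching described above, the sketch below opens one session in `connect()`, reuses it across several reads, and closes it in `disconnect()`. `FakeSession`, `RemoteStoreSketch` and `get_value` are hypothetical names invented for this example and are not part of Arcana; a real store would wrap whatever client its remote system provides.

```python
class FakeSession:
    """Hypothetical remote session; counts how many times one is opened."""

    opened = 0

    def __init__(self):
        FakeSession.opened += 1
        self.closed = False

    def fetch(self, key):
        return f"value-for-{key}"

    def close(self):
        self.closed = True


class RemoteStoreSketch:
    """Hypothetical store that caches a single session between calls."""

    def __init__(self):
        self._session = None

    def connect(self):
        # Open a session only if one is not already cached
        if self._session is None:
            self._session = FakeSession()

    def disconnect(self):
        # Close and forget the cached session, if any
        if self._session is not None:
            self._session.close()
            self._session = None

    def get_value(self, key):
        if self._session is None:
            raise RuntimeError("connect() must be called before accessing the store")
        return self._session.fetch(key)


store = RemoteStoreSketch()
store.connect()
try:
    # All three reads share the one session opened above
    values = [store.get_value(key) for key in ("age", "weight", "height")]
finally:
    store.disconnect()
```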
- class arcana.core.data.store.DataStore[source]
- connect()[source]
If a connection session to the store is required, manage it here
- disconnect()[source]
If a connection session to the store is required, manage it here
- get_checksums(file_group)[source]
Override this method to return checksums that are stored alongside the remote files (e.g. in XNAT). If no checksums are stored in the store, leave this method as it is, so that the files are accessed and the checksums recalculated.
- Parameters
file_group (FileGroup) – The file_group to return the checksums for
- Returns
checksums – A dictionary with keys corresponding to the relative paths of all files in the file_group from the base path and values equal to the MD5 hex digest. The primary file in the file-set (i.e. the one that the path points to) should be specified by ‘.’.
- Return type
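The dictionary format described for get_checksums() can be sketched with the standard library alone. This example assumes a file group made up of one primary file plus side cars, and assumes that the "base path" is the directory containing the primary file; `md5_hex` and `checksums_for` are hypothetical helper names, not Arcana functions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def md5_hex(path: Path, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_for(primary: Path, side_cars: Iterable[Path] = ()) -> Dict[str, str]:
    """Build a checksum dictionary keyed by relative path, with '.' for the primary file."""
    base = primary.parent  # assumption: the base path is the primary file's directory
    checksums = {".": md5_hex(primary)}
    for side_car in side_cars:
        checksums[side_car.relative_to(base).as_posix()] = md5_hex(side_car)
    return checksums


# Usage with throwaway files standing in for a cached file group
with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp)
    primary_file = base_dir / "scan.nii"
    side_car_file = base_dir / "scan.json"
    primary_file.write_bytes(b"image data")
    side_car_file.write_bytes(b"{}")
    checksums = checksums_for(primary_file, [side_car_file])
```

A store that already holds digests remotely would return them in this same shape without reading the files, which is where the saving in downloads comes from.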