Storage

Overview

Players in strax’s storage system take on one of three roles:
  • StorageFrontend: Find data locations, and communicate this to one or more StorageBackend instances;

  • StorageBackend: load pieces of data, and create instances of Saver;

  • Saver: save pieces of data to a specific location.

As an example, a StorageFrontend could talk to a database that tracks which data is stored where. A StorageBackend then retrieves data from local disks, while another might retrieve it remotely using SSH or other transfer systems. The front-end decides which backend is appropriate for a given request. Finally, a Savers guides the process of writing a particular piece of data to disk or databases (potentially from multiple cores), compressing and rechunking as needed.

To implement a new way of storing and/or tracking data, you must implement (subclass) all or some of these classes. This means subclassing them and overriding a few specific methods (called ‘abstract methods’ because they raise NotImplementedError if they are not overridden).

Keys

In strax, a piece of data is identified by a DataKey, consisting of three components:
  • The run id

  • The data type

  • The complete lineage of the data. This includes, for the data type itself, and all types it depends on (and their dependencies, and so forth): * The plugin class name that produced the data; * The version string of the plugin; * The values of all configuration options the plugin took (whether they were explicitly specified or left as default).

When you ask for data using Context.get_xxx, the context will produce a key like this, and pass it to the StorageFrontend. It then looks for a filename or database collection name that matches this key – something a StorageBackend understands. which is therefore generically called a backend key. The matching between DataKey and backend key can be done very strictly, or more loosely, depending on how the context is configured. This way you can choose to be completely sure about what data you get, or be more flexible and load whatever is available. TODO: ref context documentation.

Run-level metadata

Metadata can be associated with a run, but no particular data type. The StorageFrontend must take care of saving and loading these.

Such run-level metadata can be crucial in providing run-dependent default setting for configuration options, for example, calibrated quantities necessary for data processing (e.g. electron lifetime and PMT gains).