strax.storage package

Submodules

strax.storage.common module

Base classes for storage backends, frontends, and savers in strax.

Please see the developer documentation for more details on strax's storage hierarchy.

exception strax.storage.common.DataCorrupted[source]

Bases: Exception

exception strax.storage.common.DataExistsError(at, message='')[source]

Bases: Exception

Raised when attempting to write a piece of data that is already written.

class strax.storage.common.DataKey(run_id, data_type, lineage)[source]

Bases: object

Request for data to a storage registry.

Instances of this class uniquely identify a single piece of strax data abstractly – that is, it describes the full history of algorithms that have to be run to reproduce it.

It is used for communication between the main Context class and storage frontends.

data_type: str
lineage: dict
property lineage_hash

Deterministic hash of the lineage.

run_id: str
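The lineage_hash property reduces the lineage dictionary to a deterministic digest, so two DataKeys with the same processing history always map to the same stored data. A minimal sketch of the idea, assuming a canonical-JSON digest (strax uses its own deterministic hashing helper; `lineage_hash` below is a hypothetical stand-in, not the real implementation):

```python
import hashlib
import json

def lineage_hash(lineage: dict) -> str:
    """Hypothetical stand-in for strax's deterministic lineage hash:
    serialize the lineage with sorted keys, then digest the result."""
    canonical = json.dumps(lineage, sort_keys=True)
    return hashlib.sha1(canonical.encode()).hexdigest()[:10]

# Identical lineages must hash identically, regardless of
# dict insertion order.
a = lineage_hash({"peaks": ("Peaks", "0.1.0", {"gain": 2}),
                  "records": ("Records", "1.0.0", {})})
b = lineage_hash({"records": ("Records", "1.0.0", {}),
                  "peaks": ("Peaks", "0.1.0", {"gain": 2})})
assert a == b
```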
exception strax.storage.common.DataNotAvailable[source]

Bases: Exception

Raised when requested data is not available.

exception strax.storage.common.EmptyDataWarning[source]

Bases: UserWarning

exception strax.storage.common.RunMetadataNotAvailable[source]

Bases: Exception

class strax.storage.common.Saver(metadata, saver_timeout=300)[source]

Bases: object

Interface for saving a data type.

Must work even if forked. Do NOT add unpickleable things as attributes (such as loggers)!

allow_fork = True
allow_rechunk = True
close(wait_for: list | tuple = ())[source]
closed = False
got_exception = None
is_forked = False
save(chunk: Chunk, chunk_i: int, executor=None)[source]

Save a chunk, returning future to wait on or None.

save_from(source: Generator, rechunk=True, executor=None)[source]

Iterate over source and save the results under key along with metadata.
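save_from is the driver: it consumes the source generator, hands each chunk to save, and closes the saver when the source is exhausted or fails. A minimal sketch of that control flow using a hypothetical list-backed saver (illustration only, not the real Saver):

```python
class ListSaver:
    """Hypothetical saver that collects chunks in memory,
    mimicking the Saver.save / save_from control flow."""

    def __init__(self):
        self.saved = []
        self.closed = False

    def save(self, chunk, chunk_i):
        # Real savers write to a backend; here we just record the chunk.
        self.saved.append((chunk_i, chunk))

    def save_from(self, source):
        try:
            for chunk_i, chunk in enumerate(source):
                self.save(chunk, chunk_i)
        finally:
            # Close even if the source raised, so metadata is finalized.
            self.close()

    def close(self):
        self.closed = True

saver = ListSaver()
saver.save_from(iter([[1, 2], [3]]))
assert saver.saved == [(0, [1, 2]), (1, [3])]
assert saver.closed
```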

class strax.storage.common.StorageBackend[source]

Bases: object

Storage backend for strax data.

This is a ‘dumb’ interface to data. Each bit of data stored is described by backend-specific keys (e.g. directory names). Finding and assigning backend keys is the responsibility of the StorageFrontend.

The backend class name + backend_key must together uniquely identify a piece of data. So don’t make __init__ take options like ‘path’ or ‘host’; these have to be hardcoded (or made part of the key).

get_metadata(backend_key: DataKey | str, **kwargs) dict[source]

Get the metadata using the backend_key and the backend-specific _get_metadata method. When an unforeseen error occurs, raises a strax.DataCorrupted error. Any kwargs are passed on to _get_metadata.

Parameters:

backend_key – The key the backend should look for (can be string or strax.DataKey)

Returns:

metadata for the data associated to the requested backend-key

Raises:
  • strax.DataCorrupted – The metadata should exist, but this backend is unable to read it

  • strax.DataNotAvailable – When there is no data associated with this backend-key

loader(backend_key, time_range=None, chunk_number=None, executor=None)[source]

Iterates over strax data in backend_key.

Parameters:
  • time_range – 2-length arraylike of (start, exclusive end) of desired data. Will return all data that partially overlaps with the range. Default is None, which means get the entire run.

  • chunk_number – Chunk number to load exclusively

  • executor – Executor to push load/decompress operations to

saver(key, metadata, **kwargs)[source]

Return saver for data described by key.
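The division of labor is: the frontend maps a DataKey to a backend-specific key, and the backend maps that key to actual chunks. A hypothetical dict-backed backend illustrating this contract (not part of strax; note the store is hardcoded as a class-level dict, in line with the rule above that __init__ takes no location options):

```python
class InMemoryBackend:
    """Hypothetical backend mapping backend_key -> list of chunks.
    The store itself is hardcoded; only the key varies per piece
    of data, so (class name, backend_key) identifies it uniquely."""

    _store: dict = {}

    def loader(self, backend_key):
        # Iterate over the stored chunks for this key, like
        # StorageBackend.loader iterates over strax data.
        yield from self._store.get(backend_key, [])

    def save(self, backend_key, chunks):
        self._store[backend_key] = list(chunks)

backend = InMemoryBackend()
backend.save("run0-peaks-abc123", [[1, 2], [3]])
assert list(backend.loader("run0-peaks-abc123")) == [[1, 2], [3]]
```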

class strax.storage.common.StorageFrontend(readonly=False, provide_run_metadata=None, overwrite='if_broken', take_only=(), exclude=())[source]

Bases: object

Interface to something that knows data-locations and run-level metadata.

For example, a runs database, or a data directory on the file system.

backends: list
can_define_runs = False
define_run(name, sub_run_spec, **metadata)[source]
find(key: DataKey, write=False, check_broken=True, allow_incomplete=False, fuzzy_for=(), fuzzy_for_options=())[source]

Return a (backend class name, backend-specific key) tuple for reading or writing the data, or raise an exception.

Parameters:
  • key – DataKey of data to load {data_type: (plugin_name, version, {config_option: value, …}), …}

  • write – Set to True if writing new data. The data is immediately registered, so you must follow up on the write!

  • check_broken – If True, raise DataNotAvailable if the data has not been completely written, or if writing was terminated with an exception.
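The fuzzy_for option relaxes the lineage comparison: plugins named there are skipped when the stored lineage is compared with the requested one. A minimal sketch of that comparison (a hypothetical helper, not strax's actual matching code):

```python
def lineages_match(requested: dict, stored: dict, fuzzy_for=()) -> bool:
    """Compare two lineages, ignoring plugins named in fuzzy_for."""
    keys = set(requested) | set(stored)
    return all(
        requested.get(k) == stored.get(k)
        for k in keys
        if k not in fuzzy_for
    )

req = {"records": ("Records", "1.0.0", {}), "peaks": ("Peaks", "0.2.0", {})}
sto = {"records": ("Records", "1.0.0", {}), "peaks": ("Peaks", "0.1.0", {})}

# Strict comparison fails: the peaks version differs.
assert not lineages_match(req, sto)
# Fuzzy comparison succeeds: peaks is excluded from the check.
assert lineages_match(req, sto, fuzzy_for=("peaks",))
```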

find_several(keys, **kwargs)[source]

Return a list containing, for each of several data keys, either its backend key or False.

Options are as for find()

get_metadata(key, allow_incomplete=False, fuzzy_for=(), fuzzy_for_options=())[source]

Retrieve data-level metadata for the specified key.

Other parameters are the same as for .find

loader(key: DataKey, time_range=None, allow_incomplete=False, fuzzy_for=(), fuzzy_for_options=(), chunk_number=None, executor=None)[source]

Return loader for data described by DataKey.

Parameters:
  • key – DataKey describing data

  • time_range – 2-length arraylike of (start, exclusive end) of row numbers to get. Default is None, which means get the entire run.

  • allow_incomplete – Allow loading of data which has not been completely written to disk yet.

  • fuzzy_for – list/tuple of plugin names for which no plugin name, version, or option check is performed.

  • fuzzy_for_options – list/tuple of configuration options for which no check is performed.

  • chunk_number – Chunk number to load exclusively.

  • executor – Executor for pushing load computation to

provide_run_metadata = False
provide_superruns = False
remove(key)[source]

Removes a registration.

Does not delete any actual data

run_metadata(run_id, projection=None)[source]

Return run metadata dictionary, or raise RunMetadataNotAvailable.

saver(key, metadata, **kwargs)[source]

Return saver for data described by DataKey.

storage_type = 1
write_run_metadata(run_id, metadata)[source]

Stores metadata for run_id.

Silently overwrites any previously stored run-level metadata.

class strax.storage.common.StorageType(value)[source]

Bases: IntEnum

Class attribute indicating how far away data is when fetched from a given storage frontend.

This is used to prioritize which frontend will be asked first for data (prevents loading data from slow frontends when fast frontends might also have the data)

COMPRESSED = 3
LOCAL = 1
MEMORY = 0
ONLINE = 2
REMOTE = 4
TAPE = 10
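Because StorageType is an IntEnum, frontends can simply be sorted by it when deciding whom to ask first. A sketch of that prioritization (the frontend names are made up; the enum values are as listed above):

```python
from enum import IntEnum

class StorageType(IntEnum):
    # Values as listed above: lower means closer/faster.
    MEMORY = 0
    LOCAL = 1
    ONLINE = 2
    COMPRESSED = 3
    REMOTE = 4
    TAPE = 10

# Hypothetical (name, storage_type) pairs standing in for frontends.
frontends = [
    ("tape_sf", StorageType.TAPE),
    ("local_sf", StorageType.LOCAL),
    ("mem_sf", StorageType.MEMORY),
]

# Ask fast sources before slow ones by sorting on storage_type.
ordered = sorted(frontends, key=lambda f: f[1])
assert [name for name, _ in ordered] == ["mem_sf", "local_sf", "tape_sf"]
```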

strax.storage.files module

class strax.storage.files.DataDirectory(path='.', *args, deep_scan=False, **kwargs)[source]

Bases: StorageFrontend

Simplest registry: single directory with FileStore data sitting in subdirectories.

Run-level metadata is stored in loose json files in the directory.

backend_key(dirname)[source]
can_define_runs = True
provide_run_metadata = False
provide_superruns = True
static raise_if_non_compatible_run_id(run_id)[source]
remove(key)[source]

Removes a registration.

Does not delete any actual data

run_metadata(run_id, projection=None)[source]

Return run metadata dictionary, or raise RunMetadataNotAvailable.

write_run_metadata(run_id, metadata)[source]

Stores metadata for run_id.

Silently overwrites any previously stored run-level metadata.

class strax.storage.files.FileSaver(dirname, metadata, **kwargs)[source]

Bases: Saver

Saves data to compressed binary files.

json_options = {'indent': 4, 'sort_keys': True}
class strax.storage.files.FileSytemBackend(*args, set_target_chunk_mb: int | None = None, **kwargs)[source]

Bases: StorageBackend

Store data locally in a directory of binary files.

Files are named after the chunk number (without extension). Metadata is stored in a file called metadata.json.

exception strax.storage.files.InvalidFolderNameFormat[source]

Bases: Exception

strax.storage.files.dirname_to_prefix(dirname)[source]

Return filename prefix from dirname.

strax.storage.mongo module

I/O format for MongoDB.

This plugin is designed with data monitoring in mind, to put smaller amounts of extracted data into a database for quick access. However, it should work with any plugin.

Note that there is no check to make sure the 16MB document size limit is respected!

class strax.storage.mongo.MongoBackend(uri: str, database: str, col_name: str)[source]

Bases: StorageBackend

Mongo storage backend.

class strax.storage.mongo.MongoFrontend(uri: str, database: str, col_name: str, *args, **kwargs)[source]

Bases: StorageFrontend

MongoDB storage frontend.

property collection
storage_type = 2
class strax.storage.mongo.MongoSaver(key: str, metadata: dict, col: Collection, **kwargs)[source]

Bases: Saver

allow_rechunk = False

strax.storage.zipfiles module

class strax.storage.zipfiles.ZipDirectory(path='.', *args, readonly=True, **kwargs)[source]

Bases: StorageFrontend

ZipFile-based storage frontend for strax.

All data for one run is assumed to be in a single zip file <run_id>.zip, with the same file/directory structure as created by FileStore.

We cannot write zip files directly (this would result in concurrency hell); instead, these zip files are made by zipping data from FileSytemBackend.

remove(key)[source]

Removes a registration.

Does not delete any actual data

run_metadata(run_id)[source]

Return run metadata dictionary, or raise RunMetadataNotAvailable.

storage_type = 3
write_run_metadata(run_id, metadata)[source]

Stores metadata for run_id.

Silently overwrites any previously stored run-level metadata.

static zip_dir(input_dir, output_zipfile, delete=False)[source]

Zips subdirectories of input_dir to output_zipfile (without compression).

Travels into subdirectories, but not sub-subdirectories. Skips any other files in the directory.

Parameters:
  • delete – If True, delete the original directories
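The behaviour described above can be sketched with the standard zipfile module, using ZIP_STORED for an uncompressed archive. This is an illustration under the stated assumptions (one level deep, loose files skipped), not the actual implementation:

```python
import os
import shutil
import zipfile

def zip_dir_sketch(input_dir, output_zipfile, delete=False):
    """Zip the immediate subdirectories of input_dir, uncompressed.

    Goes one level deep only; loose files directly in input_dir
    and sub-subdirectories are skipped.
    """
    with zipfile.ZipFile(output_zipfile, "w", zipfile.ZIP_STORED) as zf:
        for entry in os.scandir(input_dir):
            if not entry.is_dir():
                continue  # skip loose files (e.g. run metadata json)
            for fn in sorted(os.listdir(entry.path)):
                full = os.path.join(entry.path, fn)
                if os.path.isfile(full):  # do not descend further
                    zf.write(full, arcname=entry.name + "/" + fn)
    if delete:
        for entry in os.scandir(input_dir):
            if entry.is_dir():
                shutil.rmtree(entry.path)
```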

class strax.storage.zipfiles.ZipFileBackend[source]

Bases: StorageBackend

saver(*args, **kwargs)[source]

Return saver for data described by key.

Module contents