strax.storage package
Submodules
strax.storage.common module
Base classes for storage backends, frontends, and savers in strax.
Please see the developer documentation for more details on strax's storage hierarchy.
- exception strax.storage.common.DataExistsError(at, message='')[source]
Bases:
Exception
Raised when attempting to write a piece of data that is already written.
- class strax.storage.common.DataKey(run_id, data_type, lineage)[source]
Bases:
object
Request for data to a storage registry.
Instances of this class uniquely identify a single piece of strax data abstractly: each instance describes the full history of algorithms that have to be run to reproduce it.
It is used for communication between the main Context class and storage frontends.
- property lineage_hash
Deterministic hash of the lineage.
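As an illustration of what a deterministic lineage hash involves, here is a minimal sketch. It is not strax's implementation: the helper name `lineage_hash_sketch`, the choice of SHA-1 with base32 encoding, the 10-character length, and the example plugin name `DAQReader` are all assumptions made for this example.

```python
import base64
import hashlib
import json


def lineage_hash_sketch(lineage: dict) -> str:
    """Hypothetical helper: hash a lineage dict deterministically."""
    # sort_keys makes the serialization independent of dict insertion order
    serialized = json.dumps(lineage, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(serialized.encode("utf-8")).digest()
    # Shorten to a compact lowercase string; the length is an arbitrary choice
    return base64.b32encode(digest)[:10].decode("ascii").lower()


# Same content, different option order: the hashes should match
lineage_a = {"records": ["DAQReader", "0.1", {"gain": 2, "threshold": 5}]}
lineage_b = {"records": ["DAQReader", "0.1", {"threshold": 5, "gain": 2}]}
# A different plugin version: the hash should differ
lineage_c = {"records": ["DAQReader", "0.2", {"gain": 2, "threshold": 5}]}

hash_a = lineage_hash_sketch(lineage_a)
hash_b = lineage_hash_sketch(lineage_b)
hash_c = lineage_hash_sketch(lineage_c)
```

The point of the sketch is that any change to a plugin name, version, or option value changes the hash, while the ordering of options does not.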
- exception strax.storage.common.DataNotAvailable[source]
Bases:
Exception
Raised when requested data is not available.
- exception strax.storage.common.EmptyDataWarning[source]
Bases:
UserWarning
- class strax.storage.common.Saver(metadata, saver_timeout=300)[source]
Bases:
object
Interface for saving a data type.
Must work even if forked. Do NOT add unpickleable things as attributes (such as loggers)!
- allow_fork = True
- allow_rechunk = True
- closed = False
- got_exception = None
- is_forked = False
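The warning about unpicklable attributes can be checked mechanically. The sketch below uses hypothetical stand-in classes (`SaverSketch`, `BadSaverSketch`) rather than strax's Saver, and a `threading.Lock` as the example of an object that cannot be pickled.

```python
import pickle
import threading


class SaverSketch:
    """Hypothetical stand-in for a saver; not strax's Saver class."""

    def __init__(self, metadata):
        self.metadata = metadata  # plain dicts pickle fine


class BadSaverSketch(SaverSketch):
    def __init__(self, metadata):
        super().__init__(metadata)
        self.lock = threading.Lock()  # locks cannot be pickled


def unpicklable_attributes(obj) -> list:
    """Return the names of instance attributes that fail to pickle."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(name)
    return bad


good_report = unpicklable_attributes(SaverSketch({"run_id": "0"}))
bad_report = unpicklable_attributes(BadSaverSketch({"run_id": "0"}))
```

A check like this could be run in a unit test for a custom saver before it is used with multiprocessing.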
- class strax.storage.common.StorageBackend[source]
Bases:
object
Storage backend for strax data.
This is a ‘dumb’ interface to data. Each bit of data stored is described by backend-specific keys (e.g. directory names). Finding and assigning backend keys is the responsibility of the StorageFrontend.
The backend class name + backend_key must together uniquely identify a piece of data. So don’t make __init__ take options like ‘path’ or ‘host’; these have to be hardcoded (or made part of the key).
- get_metadata(backend_key: DataKey | str, **kwargs) dict [source]
Get the metadata using the backend_key and the backend-specific _get_metadata method. When an unforeseen error occurs, raises a strax.DataCorrupted error. Any kwargs are passed on to _get_metadata.
- Parameters:
backend_key – The key the backend should look for (can be string or strax.DataKey)
- Returns:
metadata for the data associated to the requested backend-key
- Raises:
strax.DataCorrupted – This backend is not able to read the metadata but it should exist
strax.DataNotAvailable – When there is no data associated with this backend-key
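The error handling described for get_metadata can be sketched as follows. This is a self-contained illustration, not strax's code: the exception classes are local stand-ins for strax.DataNotAvailable and strax.DataCorrupted, and `BackendSketch` with its in-memory `store` is hypothetical.

```python
class DataNotAvailable(Exception):
    """Local stand-in for strax.DataNotAvailable."""


class DataCorrupted(Exception):
    """Local stand-in for strax.DataCorrupted."""


class BackendSketch:
    """Hypothetical backend that keeps metadata in a dict."""

    def __init__(self, store):
        self.store = store

    def _get_metadata(self, backend_key, **kwargs):
        # Backend-specific lookup; may fail in arbitrary ways
        if backend_key not in self.store:
            raise DataNotAvailable(backend_key)
        value = self.store[backend_key]
        if not isinstance(value, dict):
            raise ValueError("metadata is unreadable")
        return value

    def get_metadata(self, backend_key, **kwargs):
        try:
            return self._get_metadata(backend_key, **kwargs)
        except DataNotAvailable:
            # Missing data is an expected condition: pass it through
            raise
        except Exception as e:
            # Anything else means the metadata should exist but cannot be read
            raise DataCorrupted(f"Cannot read metadata for {backend_key}") from e


backend = BackendSketch({"good": {"run_id": "0"}, "broken": "not-a-dict"})


def outcome(key):
    """Return the metadata, or the name of the exception that was raised."""
    try:
        return backend.get_metadata(key)
    except (DataNotAvailable, DataCorrupted) as e:
        return type(e).__name__


results = {key: outcome(key) for key in ("good", "missing", "broken")}
```

Callers can then treat the two exceptions differently: a missing key means the data must be produced, while corrupted metadata usually calls for cleanup.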
- loader(backend_key, time_range=None, chunk_number=None, executor=None)[source]
Iterates over strax data in backend_key.
- Parameters:
time_range – 2-length arraylike of (start, exclusive end) of desired data. Will return all data that partially overlaps with the range. Default is None, which means get the entire run.
chunk_number – Chunk number to load exclusively
executor – Executor to push load/decompress operations to
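The time_range semantics (exclusive end, partial overlap counts) can be illustrated with a small selection function. This is a sketch under assumptions: chunks are represented as plain dicts with hypothetical `start` and `end` fields, and the chunk end is assumed to be exclusive as well.

```python
def chunks_in_range(chunks, time_range=None):
    """Select chunks that at least partially overlap (start, exclusive end)."""
    if time_range is None:
        # None means the entire run
        return list(chunks)
    start, end = time_range
    # Overlap test for half-open intervals [start, end)
    return [c for c in chunks if c["start"] < end and c["end"] > start]


chunks = [
    {"chunk_i": 0, "start": 0, "end": 10},
    {"chunk_i": 1, "start": 10, "end": 20},
    {"chunk_i": 2, "start": 20, "end": 30},
]

# Chunk 0 overlaps only partially but is still returned;
# chunk 2 starts exactly at the exclusive end and is skipped.
selected = [c["chunk_i"] for c in chunks_in_range(chunks, (5, 20))]
everything = [c["chunk_i"] for c in chunks_in_range(chunks)]
```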
- class strax.storage.common.StorageFrontend(readonly=False, provide_run_metadata=None, overwrite='if_broken', take_only=(), exclude=())[source]
Bases:
object
Interface to something that knows data-locations and run-level metadata.
For example, a runs database, or a data directory on the file system.
- can_define_runs = False
- find(key: DataKey, write=False, check_broken=True, allow_incomplete=False, fuzzy_for=(), fuzzy_for_options=())[source]
Return a (str: backend class name, backend-specific key) pair used to read or write the data, or raise an exception.
- Parameters:
key – DataKey of data to load {data_type: (plugin_name, version, {config_option: value, …}), …}
write – Set to True if writing new data. The data is immediately registered, so you must follow up on the write!
check_broken – If True, raise DataNotAvailable if data has not been completely written, or writing terminated with an exception.
- find_several(keys, **kwargs)[source]
Return list with backend keys or False for several data keys.
Options are as for find()
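The relationship between find and find_several can be sketched like this. The class `FrontendSketch`, its `known` mapping, and the backend name `"BackendSketch"` are hypothetical, and DataNotAvailable is a local stand-in for the strax exception.

```python
class DataNotAvailable(Exception):
    """Local stand-in for strax.DataNotAvailable."""


class FrontendSketch:
    """Hypothetical frontend that knows a fixed set of data keys."""

    def __init__(self, known):
        self.known = known  # maps data key -> backend-specific key

    def find(self, key, **kwargs):
        # Return (backend class name, backend key) or raise
        try:
            return ("BackendSketch", self.known[key])
        except KeyError:
            raise DataNotAvailable(key) from None

    def find_several(self, keys, **kwargs):
        # One entry per requested key: the backend key pair, or False
        results = []
        for key in keys:
            try:
                results.append(self.find(key, **kwargs))
            except DataNotAvailable:
                results.append(False)
        return results


frontend = FrontendSketch({"run0-records": "/data/run0-records"})
found = frontend.find_several(["run0-records", "run0-peaks"])
```

Returning False instead of raising lets a caller ask for many keys at once and see which ones are missing.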
- get_metadata(key, allow_incomplete=False, fuzzy_for=(), fuzzy_for_options=())[source]
Retrieve data-level metadata for the specified key.
Other parameters are the same as for .find
- loader(key: DataKey, time_range=None, allow_incomplete=False, fuzzy_for=(), fuzzy_for_options=(), chunk_number=None, executor=None)[source]
Return loader for data described by DataKey.
- Parameters:
key – DataKey describing data
time_range – 2-length arraylike of (start, exclusive end) of row numbers to get. Default is None, which means get the entire run.
allow_incomplete – Allow loading of data which has not been completely written to disk yet.
fuzzy_for – list/tuple of plugin names for which no plugin name, version, or option check is performed.
fuzzy_for_options – list/tuple of configuration options for which no check is performed.
chunk_number – Chunk number to load exclusively.
executor – Executor for pushing load computation to
- provide_run_metadata = False
- provide_superruns = False
- run_metadata(run_id, projection=None)[source]
Return run metadata dictionary, or raise RunMetadataNotAvailable.
- storage_type = 1
- class strax.storage.common.StorageType(value)[source]
Bases:
IntEnum
Class attribute describing how far away (that is, how slow to access) data is when fetched from a given storage frontend.
This is used to prioritize which frontend will be asked first for data. It prevents loading data from slow frontends when fast frontends might also have the data.
- COMPRESSED = 3
- LOCAL = 1
- MEMORY = 0
- ONLINE = 2
- REMOTE = 4
- TAPE = 10
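Because StorageType is an IntEnum, its members compare as integers, which makes prioritizing frontends a plain sort. The sketch below redefines the enum locally with the values listed above; the frontend names are hypothetical.

```python
from enum import IntEnum


class StorageType(IntEnum):
    """Local copy of the values listed above, for illustration only."""

    MEMORY = 0
    LOCAL = 1
    ONLINE = 2
    COMPRESSED = 3
    REMOTE = 4
    TAPE = 10


# Hypothetical (name, storage_type) pairs for three frontends
frontends = [
    ("tape_archive", StorageType.TAPE),
    ("zip_dir", StorageType.COMPRESSED),
    ("data_dir", StorageType.LOCAL),
]

# Lower value means closer/faster, so ascending order is query order
ordered = sorted(frontends, key=lambda f: f[1])
names = [name for name, _ in ordered]
```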
strax.storage.files module
- class strax.storage.files.DataDirectory(path='.', *args, deep_scan=False, **kwargs)[source]
Bases:
StorageFrontend
Simplest registry: single directory with FileStore data sitting in subdirectories.
Run-level metadata is stored in loose json files in the directory.
- can_define_runs = True
- provide_run_metadata = False
- provide_superruns = True
- class strax.storage.files.FileSaver(dirname, metadata, **kwargs)[source]
Bases:
Saver
Saves data to compressed binary files.
- json_options = {'indent': 4, 'sort_keys': True}
- class strax.storage.files.FileSytemBackend(*args, set_target_chunk_mb: int | None = None, **kwargs)[source]
Bases:
StorageBackend
Store data locally in a directory of binary files.
Files are named after the chunk number (without extension). Metadata is stored in a file called metadata.json.
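The on-disk layout described here (one file per chunk, plus metadata.json) can be sketched with the standard library. This is an illustration, not strax's writer: the six-digit zero-padded chunk names, the uncompressed payloads, and the directory name `run0-records` are assumptions. The JSON options are the ones listed for FileSaver.json_options.

```python
import json
import tempfile
from pathlib import Path

# Options as listed for FileSaver.json_options
JSON_OPTIONS = {"indent": 4, "sort_keys": True}


def write_sketch(dirname, metadata, chunks):
    """Write each chunk to its own file, then the metadata as JSON."""
    dirname = Path(dirname)
    dirname.mkdir(parents=True, exist_ok=True)
    for chunk_i, payload in enumerate(chunks):
        # Assumed naming: zero-padded chunk number, no extension
        (dirname / f"{chunk_i:06d}").write_bytes(payload)
    with open(dirname / "metadata.json", "w") as f:
        json.dump(metadata, f, **JSON_OPTIONS)


def read_metadata(dirname):
    with open(Path(dirname) / "metadata.json") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "run0-records"
    write_sketch(target, {"run_id": "0", "data_type": "records"}, [b"abc", b"def"])
    loaded = read_metadata(target)
    filenames = sorted(p.name for p in target.iterdir())
```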
strax.storage.mongo module
I/O format for MongoDB.
This plugin is designed with data monitoring in mind, to put smaller amounts of extracted data into a database for quick access. However, it should work with any plugin.
Note that there is no check to make sure the 16MB document size limit is respected!
- class strax.storage.mongo.MongoBackend(uri: str, database: str, col_name: str)[source]
Bases:
StorageBackend
Mongo storage backend.
strax.storage.zipfiles module
- class strax.storage.zipfiles.ZipDirectory(path='.', *args, readonly=True, **kwargs)[source]
Bases:
StorageFrontend
ZipFile-based storage frontend for strax.
All data for one run is assumed to be in a single zip file <run_id>.zip, with the same file/directory structure as created by FileStore.
We cannot write zip files directly (this would result in concurrency hell); instead, these zip files are made by zipping stuff from FileSytemBackend.
- storage_typ = 3
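The workflow described for ZipDirectory (write a directory first, zip it afterwards, then read from the zip) might look like the sketch below. It uses only the standard zipfile module; the helper `zip_run`, the run name `run0`, and the file names inside it are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_run(source_dir, zip_path):
    """Zip an already-written run directory, keeping its relative structure."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(source_dir).as_posix())


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for data previously written by a file-based backend
    run_dir = Path(tmp) / "run0"
    data_dir = run_dir / "records"
    data_dir.mkdir(parents=True)
    (data_dir / "metadata.json").write_text("{}")
    (data_dir / "000000").write_bytes(b"abc")

    # One zip per run: <run_id>.zip
    zip_path = Path(tmp) / "run0.zip"
    zip_run(run_dir, zip_path)

    # Reading back is safe for concurrent readers; nothing writes to the zip
    with zipfile.ZipFile(zip_path) as zf:
        members = sorted(zf.namelist())
        first_chunk = zf.read("records/000000")
```

Creating the archive as a separate step after all writing has finished is what avoids the concurrency problems mentioned above.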
- class strax.storage.zipfiles.ZipFileBackend[source]
Bases:
StorageBackend