strax.processing package
Submodules
strax.processing.data_reduction module
Functions to perform in-place pulse-level data reduction.
- class strax.processing.data_reduction.ReductionLevel(value)[source]
Bases: IntEnum
Identifies what type of data reduction has been used on a record.
- BASELINE_CUT = 1
- HITS_ONLY = 2
- METADATA_ONLY = 4
- NO_REDUCTION = 0
- WAVEFORM_REPLACED = 3
- strax.processing.data_reduction.cut_baseline(records, n_before=48, n_after=30)[source]
Replace first n_before and last n_after samples of pulses by 0.
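As a rough illustration of the zeroing step, the sketch below applies it to a single waveform held in a plain NumPy array. The helper name `zero_edges` is hypothetical. The real function works in place on strax records, where one pulse may span several records, and that bookkeeping is not reproduced here.

```python
import numpy as np

def zero_edges(waveform, n_before=48, n_after=30):
    """Return a copy with the first n_before and last n_after samples set to 0."""
    out = np.array(waveform, copy=True)
    out[:n_before] = 0
    if n_after > 0:
        # max(...) guards against n_after exceeding the waveform length
        out[max(0, len(out) - n_after):] = 0
    return out

pulse = np.arange(1, 11)
trimmed = zero_edges(pulse, n_before=2, n_after=3)
print(trimmed)
```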
- strax.processing.data_reduction.cut_outside_hits(records, hits, left_extension=2, right_extension=15)[source]
Return records with waveforms zeroed if not within left_extension or right_extension of hits. These extensions properly account for breaking of pulses into records.
If you pass an incomplete (e.g. cut) set of records, we will not save data around hits found in the removed records, even if this stretches into records that you did pass.
strax.processing.general module
- strax.processing.general.abs_time_to_prev_next_interval(things, intervals)[source]
Function which determines the time difference of things to previous and next interval, e.g., events to veto intervals. Assumes that things do not overlap.
- Parameters:
things – Numpy structured array containing strax time fields
intervals – Numpy structured array containing time fields
- Returns:
Two integer arrays with the time difference to the previous and next intervals.
- strax.processing.general.from_break(x, safe_break, not_before=0, left=True, tolerant=False)[source]
Return records on one side of a break at least safe_break long. If there is no such break, return the best break found.
- strax.processing.general.fully_contained_in(things, containers)[source]
Return an array of len(things) giving, for each thing, the index of the container in containers that fully contains it, or -1 if no such container exists.
We assume all things and containers are sorted by time. If containers are overlapping, the first container of the thing is chosen.
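The containment rule can be sketched with a brute-force loop over plain start and end arrays. This is only an illustration: the helper name `contained_in` is hypothetical, end times are assumed to be exclusive, and the loop does not exploit the time ordering that the strax function relies on.

```python
import numpy as np

def contained_in(thing_start, thing_end, cont_start, cont_end):
    """For each thing, index of the first container that fully contains it, else -1."""
    result = np.full(len(thing_start), -1, dtype=np.int64)
    for i in range(len(thing_start)):
        for j in range(len(cont_start)):
            if cont_start[j] <= thing_start[i] and thing_end[i] <= cont_end[j]:
                result[i] = j
                break  # the first matching container wins
    return result

indices = contained_in(
    thing_start=[1, 4, 12], thing_end=[3, 9, 13],
    cont_start=[0, 10], cont_end=[5, 20],
)
print(indices)
```

The second thing starts inside the first container but ends after it, so it is not fully contained and is mapped to -1.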
- strax.processing.general.overlap_indices(a1, n_a, b1, n_b)[source]
Given interval [a1, a1 + n_a), and [b1, b1 + n_b) of integers, return indices [a_start, a_end), [b_start, b_end) of overlapping region.
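A minimal sketch of this calculation follows, assuming the returned indices are counted from the start of each interval and that a missing overlap is reported as two empty ranges. Both choices, and the name `overlap_indices_sketch`, are assumptions made for illustration.

```python
def overlap_indices_sketch(a1, n_a, b1, n_b):
    """Index ranges, relative to each interval's start, of the shared region."""
    start = max(a1, b1)
    end = min(a1 + n_a, b1 + n_b)
    if end <= start:
        # No overlap: return empty ranges (an assumption of this sketch)
        return (0, 0), (0, 0)
    return (start - a1, end - a1), (start - b1, end - b1)

# a covers [2, 7), b covers [5, 15); they share [5, 7)
a_range, b_range = overlap_indices_sketch(2, 5, 5, 10)
print(a_range, b_range)
```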
- strax.processing.general.sort_by_time(x)[source]
Sort things by time, or by time and then channel if both fields are present in the given array.
- strax.processing.general.split_by_containment(things, containers)[source]
Return a list of thing-arrays contained in each container. The result is a numba.typed.List, or a plain list if containers is empty.
Assumes everything is sorted, and containers are non-overlapping.
- strax.processing.general.split_touching_windows(things, containers, window=0)[source]
Split things by their containers and return a list of length containers.
- Parameters:
things – Sorted array of interval-like data
containers – Sorted array of interval-like data
window – Threshold distance for the touching check. For example:
- window = 0: things must overlap one sample
- window = -1: things can start right after container ends (i.e. container endtime equals the thing starttime, since strax endtimes are exclusive)
- Returns:
List with one array of things per container.
- strax.processing.general.touching_windows(things, containers, window=0)[source]
Return array of (start, exclusive end) indices into things which extend to within window of the container, for each container in containers.
- Parameters:
things – Sorted array of interval-like data. We assume all things and containers are sorted by time. When endtimes are not sorted, the indices of the first and last things touching the container are returned.
containers – Sorted array of interval-like data. Containers are allowed to overlap.
window – Threshold distance for the touching check. For example:
- window = 0: things must overlap one sample
- window = -1: things can start right after container ends (i.e. container endtime equals the thing starttime, since strax endtimes are exclusive)
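The index ranges can be illustrated for the window = 0 case only, where touching means overlapping by at least one sample. The sketch below is a brute-force version on plain start and end arrays. The helper name and the use of (0, 0) for a container that nothing touches are assumptions, not the strax behavior.

```python
import numpy as np

def touching_windows_sketch(thing_start, thing_end, cont_start, cont_end):
    """(start, exclusive end) indices into things per container, window = 0 only."""
    thing_start = np.asarray(thing_start)
    thing_end = np.asarray(thing_end)
    result = np.zeros((len(cont_start), 2), dtype=np.int64)
    for i in range(len(cont_start)):
        # Exclusive end times: overlap requires end > start on both sides
        touching = np.flatnonzero(
            (thing_end > cont_start[i]) & (thing_start < cont_end[i])
        )
        if len(touching):
            result[i] = touching[0], touching[-1] + 1
    return result

windows = touching_windows_sketch(
    thing_start=[0, 3, 8], thing_end=[2, 5, 10],
    cont_start=[1, 5], cont_end=[4, 8],
)
print(windows)
```

The second container spans [5, 8): one thing ends exactly at 5 and another starts exactly at 8, so with exclusive end times neither overlaps it.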
strax.processing.hitlets module
- strax.processing.hitlets.concat_overlapping_hits(hits, extensions, pmt_channels, start, end)[source]
Function which concatenates hits which may overlap after left and right hit extension. Assumes that hits are sorted correctly.
- Note:
This function only updates time, and length of the hit.
- Parameters:
hits – Hits in records.
extensions – Tuple of the left and right hit extension.
pmt_channels – Tuple of the detector's first and last PMT
start – Start time of the chunk
end – End time of the chunk
- Returns:
Array with concatenated hits.
- strax.processing.hitlets.conditional_entropy(hitlets, template='flat', square_data=False)[source]
Function which estimates the conditional entropy based on the specified template.
In order to compute the conditional entropy each hitlet will be aligned such that its maximum falls into the same sample as for the template. If the maximum is ambiguous the first maximum is taken.
- Parameters:
hitlets – Hitlets for which the entropy shall be computed. Can be any data_kind which offers the fields data and length.
template – Template to compare the data with. Can be either specified as “flat” to use a flat distribution or as a numpy array containing any normalized template.
square_data – If true data will be squared and normalized before estimating the entropy. Otherwise the data will only be normalized.
- Returns:
Array containing the entropy values for each hitlet.
- Note:
The template has to be normalized such that its total area is 1. Independently of the specified options, only samples for which the content is greater than zero are used to compute the entropy.
In the non-squared case, negative samples are omitted from the calculation.
- strax.processing.hitlets.create_hitlets_from_hits(hits, save_outside_hits, channel_range, chunk_start=None, chunk_end=None)[source]
Function which creates hitlets from a bunch of hits.
- Parameters:
hits – Hits found in records.
save_outside_hits – Tuple with left and right hit extension.
channel_range – Detector's channel range from the channel map.
chunk_start – (optional) start time of a chunk. Ensures that no hitlet is earlier than this timestamp.
chunk_end – (optional) end time of a chunk. Ensures that no hitlet ends later than this timestamp.
- Returns:
Hitlets with temporary fields (data, max_goodness_of_split…)
- strax.processing.hitlets.get_fwxm(hitlet, fraction=0.5)[source]
Estimates the left and right edge of a specific height percentage.
- Parameters:
hitlet – Single hitlet
fraction – Level for which the width shall be computed.
- Returns:
Two floats, left edge and right edge in ns
- Notes:
On each side of the maximum, the function searches for the last sample below and the last sample above the specified height level, and estimates the edge by linear interpolation between the two. If these samples cannot be found on one of the sides, the corresponding outermost bin edge is used: 0 on the left, last sample + 1 on the right.
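The interpolation described in the notes can be sketched on a plain array. This is an illustration under stated assumptions, not the strax implementation: sample values are taken to sit at bin centers (index + 0.5), the crossing nearest to the maximum is used on each side, the maximum is positive, and 0 < fraction <= 1. The name `fwxm_edges` and the `dt` argument are hypothetical.

```python
import numpy as np

def fwxm_edges(data, fraction=0.5, dt=1.0):
    """Left and right edge (in units of dt) where data crosses fraction * max."""
    data = np.asarray(data, dtype=float)
    i_max = int(np.argmax(data))
    level = fraction * data[i_max]

    left = 0.0  # fallback: outermost left bin edge
    for i in range(i_max - 1, -1, -1):
        if data[i] < level:
            # Interpolate between sample i (below) and i + 1 (at or above)
            left = (i + 0.5) + (level - data[i]) / (data[i + 1] - data[i])
            break

    right = float(len(data))  # fallback: last sample + 1
    for i in range(i_max + 1, len(data)):
        if data[i] < level:
            # Interpolate between sample i - 1 (at or above) and i (below)
            right = (i + 0.5) - (level - data[i]) / (data[i - 1] - data[i])
            break

    return left * dt, right * dt

left_edge, right_edge = fwxm_edges([0, 1, 2, 1, 0], fraction=0.5)
print(left_edge, right_edge)
```

For a flat input such as `[2, 2, 2]` no sample falls below the level, so both fallbacks apply and the edges are 0 and 3.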
- strax.processing.hitlets.get_hitlets_data(hitlets, records, to_pe, min_hitlet_sample=200)[source]
Function which searches for every hitlet in a given chunk the corresponding records data. Additionally compute the total area of the signal.
- Parameters:
hitlets – Hitlets found in a chunk of records.
records – Records of the chunk.
to_pe – Array with area conversion factors from adc/sample to pe/sample. Please make sure that to_pe has the correct shape. The array index should match the channel number.
min_hitlet_sample – Minimal length of the hitlet data field. Prevents numba compiling from running into race conditions.
- Returns:
Hitlets including data stored in the “data” field (if it did not exist before, it will be added).
- strax.processing.hitlets.highest_density_region_width(data, fractions_desired, dt=1, fractional_edges=False, _buffer_size=100)[source]
Function which computes the left and right edge based on the outer most sample for the highest density region of a signal.
- Args:
data: Data of a signal, e.g. hitlet or peak including zero length encoding
fractions_desired: Area fractions for which HDR should be computed
dt: Sample length in ns
fractional_edges: If true computes width as fractional time
_buffer_size: Maximal number of allowed intervals
- Returns:
np.ndarray: Array of shape (len(fractions_desired), 2) containing left and right edges
strax.processing.peak_building module
- strax.processing.peak_building.find_hit_integration_bounds(hits, excluded_intervals, records, save_outside_hits, n_channels, allow_bounds_beyond_records=False)[source]
Update (lone) hits to include integration bounds. Please note that time and length of the original hit are not changed!
- Parameters:
hits – Hits or lone hits which should be extended by integration bounds.
excluded_intervals – Regions into which hits should not extend, e.g. peaks for lone hits. If not needed, pass a zero-length strax.time_fields array.
records – Records in which hits were found.
save_outside_hits – Hit extension to the left and right in ns, not samples.
n_channels – Number of channels for given detector.
allow_bounds_beyond_records – If true, extend left/right_integration beyond record boundaries, e.g. to negative samples for the left side.
- strax.processing.peak_building.find_peak_groups(peaks, gap_threshold, left_extension=0, right_extension=0, max_duration=1000000000)[source]
Return boundaries of groups of peaks separated by gap_threshold, extended left and right.
- Parameters:
peaks – Peaks to group
gap_threshold – Minimum gap between peaks
left_extension – Extend groups by this many ns left
right_extension – Extend groups by this many ns right
max_duration – max duration time of merged peak in ns
- Returns:
time, endtime arrays of group boundaries
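The grouping idea can be sketched on sorted start and end arrays. The helper below is hypothetical and simplified: it ignores max_duration, and it starts a new group only when the gap is strictly larger than gap_threshold, which may differ from the boundary behavior in strax.

```python
import numpy as np

def group_by_gap(time, endtime, gap_threshold, left_extension=0, right_extension=0):
    """Return (time, endtime) arrays of groups separated by more than gap_threshold."""
    if len(time) == 0:
        return np.array([], dtype=np.int64), np.array([], dtype=np.int64)
    group_start = [time[0]]
    group_end = []
    current_end = endtime[0]
    for t0, t1 in zip(time[1:], endtime[1:]):
        if t0 - current_end > gap_threshold:
            # Gap is large enough: close the current group and open a new one
            group_end.append(current_end)
            group_start.append(t0)
            current_end = t1
        else:
            current_end = max(current_end, t1)
    group_end.append(current_end)
    return (np.array(group_start) - left_extension,
            np.array(group_end) + right_extension)

starts, ends = group_by_gap(
    time=[0, 10, 100], endtime=[5, 20, 110],
    gap_threshold=50, left_extension=2, right_extension=3,
)
print(starts, ends)
```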
- strax.processing.peak_building.find_peaks(hits, adc_to_pe, gap_threshold=300, left_extension=20, right_extension=150, min_area=0, min_channels=2, max_duration=10000000, _result_buffer=None, result_dtype=None)[source]
Return peaks made from grouping hits together. Assumes all hits have the same dt.
- Parameters:
hits – Hit (or any interval) to group
left_extension – Extend peaks by this many ns left
right_extension – Extend peaks by this many ns right
gap_threshold – No hits for this many ns means new peak
min_area – Peaks with less than min_area are not returned
min_channels – Peaks with fewer contributing channels are not returned
max_duration – max duration time of merged peak in ns
- strax.processing.peak_building.integrate_lone_hits(lone_hits, records, peaks, save_outside_hits, n_channels)[source]
Update the area of lone_hits to the integral in ADC counts x samples.
- Parameters:
lone_hits – Hits outside of peaks
records – Records in which hits and peaks were found
peaks – Peaks
save_outside_hits – (left, right) TIME by which we should extend the integration window of hits
n_channels – Number of channels
TODO: this doesn’t extend the integration range beyond record boundaries
- strax.processing.peak_building.simple_summed_waveform(records, containers, to_pe)[source]
Computes simple (downsampled) summed waveform based on raw data touching a certain container.
- Parameters:
containers – Things for which summed waveform should be computed. Must contain data field of desired length.
records – Record information which should be used to compute summed waveform.
- Note:
To keep this function simple, the floating part of the baseline is only added if the data field in records is not zero. This leads to a biased representation of the summed waveform. The bias is small for shape estimates, but the total charge of the signal should be estimated in an unbiased way.
- strax.processing.peak_building.store_downsampled_waveform(p, waveform_buffer, store_data_top=False, store_data_start=False, waveform_buffer_top=array([1.], dtype=float32))[source]
Downsample the waveform in buffer and store it in p['data'] and, if indicated, in p['data_top']. When downsampling results in a fractional number of samples, the peak is shortened rather than extended. This causes data loss, but it is necessary to prevent overlaps between peaks.
- Parameters:
p – Row of a strax peak array, or compatible type. Note that p['dt'] is adjusted to match the downsampling.
waveform_buffer – numpy array containing sum waveform during the peak at the input peak's sampling resolution p['dt'].
store_data_top – Boolean which indicates whether to also store into p['data_top'].
store_data_start – Boolean which indicates whether to store the first samples of the waveform in the peak.
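The shortening behavior can be illustrated with a small stand-alone sketch. It assumes that downsampling sums groups of consecutive samples and multiplies dt by the group size. The function name and the `max_samples` argument (standing in for the length of the peak's data buffer) are hypothetical.

```python
import numpy as np

def downsample_sum(waveform, dt, max_samples):
    """Sum groups of samples so that the result fits into max_samples bins."""
    waveform = np.asarray(waveform)
    factor = max(1, int(np.ceil(len(waveform) / max_samples)))
    # Floor division drops trailing samples: the result is shortened, never extended
    n_out = len(waveform) // factor
    summed = waveform[: n_out * factor].reshape(n_out, factor).sum(axis=1)
    return summed, dt * factor

data, new_dt = downsample_sum(np.arange(10), dt=10, max_samples=4)
print(data, new_dt)
```

Ten samples do not fit into four bins, so groups of three are summed and the tenth sample is dropped.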
- strax.processing.peak_building.sum_waveform(peaks, hits, records, record_links, adc_to_pe, n_top_channels=0, store_data_top=False, store_data_start=False, select_peaks_indices=None)[source]
Compute sum waveforms for all peaks in peaks. Only builds the summed waveform over regions in which hits were found. This is required to avoid any bias due to zero-padding and baselining. Will downsample sum waveforms if they do not fit in the per-peak buffer.
- Parameters:
peaks – Peaks for which the summed waveform should be built.
hits – Hits which are inside peaks. Must be sorted according to record_i.
records – Records to be used to build peaks.
record_links – Tuple of previous and next records.
n_top_channels – Number of top array channels.
store_data_top – Boolean which indicates whether to store the top array waveform in the peak.
store_data_start – Boolean which indicates whether to store the first samples of the waveform in the peak.
select_peaks_indices – Indices of the peaks for partial processing. In the form of np.array([np.int, np.int, ..]). If None (default), all the peaks are used for the summation.
Assumes all peaks AND pulses have the same dt!
strax.processing.peak_merging module
- strax.processing.peak_merging.add_lone_hits(peaks, lone_hits, to_pe, n_top_channels=0, store_data_top=False, store_data_start=False)[source]
Function which adds information from lone hits to peaks if the lone hit is (fully) inside a peak (e.g. after merging). Modifies peak area and data in place.
- Parameters:
peaks – Numpy array of peaks
lone_hits – Numpy array of lone_hits
to_pe – Gain values to convert lone hit area into PE.
n_top_channels – Number of top array channels.
store_data_top – Boolean which indicates whether to store the top array waveform in the peak.
store_data_start – Boolean which indicates whether to store the first samples of the waveform in the peak.
- strax.processing.peak_merging.gcd_of_array(values)[source]
Return the GCD of all elements in the array.
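With NumPy alone, the same quantity can be obtained by reducing the `np.gcd` ufunc over an integer array. This is one possible way to compute it, not necessarily how strax does.

```python
import numpy as np

values = np.array([12, 18, 30])
common = np.gcd.reduce(values)
print(common)
```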
- strax.processing.peak_merging.merge_peaks(peaks, start_merge_at, end_merge_at, merged=None, max_buffer=100000)[source]
Merge specified peaks with their neighbors, return merged peaks.
- Parameters:
peaks – Record array of strax peak dtype.
start_merge_at – Indices to start merge at
end_merge_at – EXCLUSIVE indices to end merge at
max_buffer – Maximum number of samples in the sum_waveforms and other waveforms of the resulting peaks (after merging).
Peaks must be constructed based on the properties of constituent peaks, since reverting to records/hits would be too time-consuming.
strax.processing.peak_properties module
- strax.processing.peak_properties.compute_area_fraction_top(peaks, n_top_channels)[source]
Compute the area fraction top for peaks.
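As an illustration, the sketch below computes the fraction from a plain 2D array of per-channel areas. It assumes that the top array occupies the first n_top_channels columns and that the fraction is the top area divided by the total area. The array layout and the use of NaN for zero-area peaks are assumptions of this sketch.

```python
import numpy as np

def area_fraction_top(area_per_channel, n_top_channels):
    """Fraction of each peak's area seen by the first n_top_channels channels."""
    area_per_channel = np.asarray(area_per_channel, dtype=float)
    total = area_per_channel.sum(axis=1)
    top = area_per_channel[:, :n_top_channels].sum(axis=1)
    fraction = np.full(len(total), np.nan)
    # Leave NaN where the total area is zero instead of dividing by zero
    np.divide(top, total, out=fraction, where=total != 0)
    return fraction

aft = area_fraction_top([[1, 1, 2], [0, 0, 0]], n_top_channels=2)
print(aft)
```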
- strax.processing.peak_properties.compute_index_of_fraction(peak, fractions_desired, result)[source]
Store the (fractional) indices at which peak reaches fractions_desired of their area in result.
- Parameters:
peak – single strax peak(let) or other data-bearing dtype
fractions_desired – array of floats between 0 and 1
- Returns:
len(fractions_desired) array of floats
- strax.processing.peak_properties.compute_properties(peaks, n_top_channels=0, select_peaks_indices=None)[source]
Compute properties: median_time, width, area_decile_from_midpoint, center_time, and area_fraction_top for peaks.
- Parameters:
peaks – single strax peak(let) or other data-bearing dtype
select_peaks_indices – Array of integers selecting which peaks to compute. Defaults to None, in which case properties are computed for all peaks.
- strax.processing.peak_properties.compute_widths(peaks)[source]
Compute widths in ns at desired area fractions for peaks.
- Parameters:
peaks – single strax peak(let) or other data-bearing dtype
- strax.processing.peak_properties.index_of_fraction(peaks, fractions_desired)[source]
Return the (fractional) indices at which the peaks reach fractions_desired of their area.
- Parameters:
peaks – strax peak(let)s or other data-bearing dtype
fractions_desired – array of floats between 0 and 1
- Returns:
(len(peaks), len(fractions_desired)) array of floats
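For a single waveform, fractional indices of this kind can be sketched by interpolating the cumulative area over the sample edges. The helper name is hypothetical, and the sketch assumes strictly positive samples so that the cumulative area is increasing. Handling of zero or negative samples in strax is not reproduced.

```python
import numpy as np

def index_of_fraction_sketch(data, fractions_desired):
    """Fractional sample indices at which the cumulative area reaches each fraction."""
    data = np.asarray(data, dtype=float)
    edges = np.arange(len(data) + 1)                       # sample edges 0..n
    cumulative = np.concatenate(([0.0], np.cumsum(data)))  # area left of each edge
    targets = np.asarray(fractions_desired) * cumulative[-1]
    return np.interp(targets, cumulative, edges)

positions = index_of_fraction_sketch([1, 1, 2], [0.25, 0.5, 0.75])
print(positions)
```

The total area is 4, so 75% of it (3 units) is reached halfway through the last sample, at index 2.5.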
strax.processing.peak_splitting module
- strax.processing.peak_splitting.natural_breaks_gof(w, dt, normalize=False, split_low=False, filter_wing_width=0)[source]
Return natural breaks goodness of split/fit for the waveform w. A sharp peak gives ~0, two widely separated peaks ~1.
- strax.processing.peak_splitting.split_peaks(peaks, hits, records, rlinks, to_pe, algorithm='local_minimum', data_type='peaks', n_top_channels=0, store_data_top=False, store_data_start=False, **kwargs)[source]
Return peaks split according to algorithm, with waveforms summed and widths computed.
- Note:
Can also be used for hitlets splitting with local_minimum splitter. Just put hitlets instead of peaks.
- Parameters:
peaks – Original peaks. Sum waveform must have been built and properties must have been computed (if you use them).
hits – Hits found in records (or None in case of hitlets splitting).
records – Records from which peaks were built
rlinks – strax.record_links for given records (or None in case of hitlets splitting.)
to_pe – ADC to PE conversion factor array (of n_channels)
algorithm – ‘local_minimum’ or ‘natural_breaks’.
data_type – ‘peaks’ or ‘hitlets’. Specifies whether to use sum_waveform or get_hitlets_data to compute the waveform of the new split peaks/hitlets.
n_top_channels – Number of top array channels.
result_dtype – dtype of the result.
store_data_top – Boolean which indicates whether to store the top array waveform in the peak.
store_data_start – Boolean which indicates whether to store the first samples of the waveform in the peak.
Any other options are passed to the algorithm.
strax.processing.pulse_processing module
Functions that perform processing on pulses (other than data reduction functions, which are in data_reduction.py)
- strax.processing.pulse_processing.baseline(records, baseline_samples=40, flip=True, allow_sloppy_chunking=False, fallback_baseline=16000)[source]
Determine baseline as the average of the first baseline_samples of each pulse. Subtract the pulse data from int(baseline), and store the baseline mean and rms.
- Parameters:
baseline_samples – number of samples at start of pulse to average to determine the baseline.
flip – If true, flip sign of data
allow_sloppy_chunking – Allow use of the fallback_baseline in case the 0th fragment of a pulse is missing
fallback_baseline – Fallback baseline (ADC counts)
Assumes records are sorted in time (or at least by channel, then time). Assumes record_i information is accurate, so don’t cut pulses before baselining them!
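The arithmetic for a single pulse can be sketched as follows. The helper is hypothetical and works on a plain array. It ignores pulse fragments, record_i and the fallback baseline, and the behavior of the flip=False branch is an assumption.

```python
import numpy as np

def subtract_baseline(pulse, baseline_samples=40, flip=True):
    """Return baselined data plus the baseline mean and rms of the leading samples."""
    pulse = np.asarray(pulse)
    head = pulse[:baseline_samples]
    mean = head.mean()
    rms = head.std()
    if flip:
        data = int(mean) - pulse   # pulse data subtracted from int(baseline)
    else:
        data = pulse - int(mean)   # assumed behavior when the sign is not flipped
    return data, mean, rms

raw = np.array([100, 100, 100, 100, 90, 80, 100])
data, mean, rms = subtract_baseline(raw, baseline_samples=4)
print(data, mean, rms)
```

With flip=True a negative-going pulse (samples below the baseline) becomes positive after subtraction.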
- strax.processing.pulse_processing.filter_records(r, ir)[source]
Apply filter with impulse response ir over the records r. Assumes the filter origin is at the impulse response maximum.
- Parameters:
r – Records whose waveforms are filtered
ir – Impulse response, must have odd length. Will be normalized.
- strax.processing.pulse_processing.filter_waveforms(ws, ir, prev_r, next_r)[source]
Convolve filter with impulse response ir over each row of ws. Assumes the filter origin is at the impulse response maximum.
- Parameters:
ws – Waveform matrix, must be float
ir – Impulse response, must have odd length.
prev_r – Previous record map from strax.record_links
next_r – Next record map from strax.record_links
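Placing the filter origin at the impulse response maximum means a narrow input feature stays at the same sample index after filtering. The sketch below shows this for a single row with `np.convolve`. It treats samples outside the row as zero and ignores the prev_r and next_r links, so it is a simplification, and the helper name is hypothetical.

```python
import numpy as np

def filter_row(w, ir):
    """Convolve w with ir, aligning the output to the maximum of ir."""
    w = np.asarray(w, dtype=float)
    ir = np.asarray(ir, dtype=float)
    origin = int(np.argmax(ir))
    full = np.convolve(w, ir)  # length len(w) + len(ir) - 1
    # Shift by the origin so that output[i] is centered on input sample i
    return full[origin: origin + len(w)]

filtered = filter_row([0, 0, 1, 0, 0], [0.25, 0.5, 0.25])
print(filtered)
```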
- strax.processing.pulse_processing.find_hits(records, min_amplitude: int | ndarray = 15, min_height_over_noise: int | ndarray = 0)[source]
Return hits (intervals >= threshold) found in records. Hits that straddle record boundaries are split (perhaps we should fix this?)
NB: returned hits are NOT sorted yet!
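The notion of a hit as an interval at or above threshold can be illustrated on a single waveform. The helper below is a hypothetical NumPy sketch. It uses one fixed threshold and does not handle record boundaries, channels, or the noise-based threshold.

```python
import numpy as np

def intervals_above(waveform, threshold):
    """Return an (n, 2) array of [start, exclusive end) runs with samples >= threshold."""
    above = np.asarray(waveform) >= threshold
    # Pad with False so runs touching either end of the waveform are closed
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return np.stack([starts, ends], axis=1)

hit_bounds = intervals_above([0, 20, 30, 0, 0, 16, 0], threshold=15)
print(hit_bounds)
```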
strax.processing.statistics module
- strax.processing.statistics.highest_density_region(data, fractions_desired, only_upper_part=False, _buffer_size=10)[source]
Compute highest density region for a given sampled distribution.
This function splits only the stable sort operation into Python, keeping all other computations numba-accelerated for maximum performance.
- Args:
data: Sampled distribution
fractions_desired: Area/probability for which HDR should be computed
only_upper_part: If True, only compute area between max and current height
_buffer_size: Size of result buffer (max number of allowed intervals)
- Returns:
tuple: (res, res_amp), where res contains interval indices and res_amp contains amplitudes for the desired fractions
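The idea of a highest density region can be sketched with whole samples: take samples in order of decreasing amplitude until the desired area fraction is covered, then report the contiguous index runs. This is a coarse, hypothetical illustration for non-negative data with a positive total. It returns only intervals for a single fraction, and it does not reproduce the amplitude output, the only_upper_part option, or any finer treatment of the last sample that strax may apply.

```python
import numpy as np

def hdr_intervals(data, fraction):
    """Return [start, exclusive end) index runs covering `fraction` of the area."""
    data = np.asarray(data, dtype=float)
    order = np.argsort(-data, kind="stable")   # highest samples first
    cumulative = np.cumsum(data[order]) / data.sum()
    n_needed = int(np.searchsorted(cumulative, fraction)) + 1
    selected = np.sort(order[:n_needed])
    # Split the selected indices wherever they stop being consecutive
    breaks = np.flatnonzero(np.diff(selected) > 1)
    starts = np.concatenate(([selected[0]], selected[breaks + 1]))
    ends = np.concatenate((selected[breaks], [selected[-1]])) + 1
    return list(zip(starts.tolist(), ends.tolist()))

single = hdr_intervals([1, 4, 3, 0, 2], fraction=0.6)
double = hdr_intervals([1, 4, 3, 0, 2], fraction=0.8)
print(single, double)
```

Covering 60% of the area needs only the two highest neighboring samples, while 80% also requires the separate sample at index 4, which yields a second interval.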