strax.processing package

Submodules

strax.processing.data_reduction module

Functions to perform in-place pulse-level data reduction.

class strax.processing.data_reduction.ReductionLevel(value)[source]

Bases: IntEnum

Identifies what type of data reduction has been used on a record.

BASELINE_CUT = 1
HITS_ONLY = 2
METADATA_ONLY = 4
NO_REDUCTION = 0
WAVEFORM_REPLACED = 3

strax.processing.data_reduction.cut_baseline(records, n_before=48, n_after=30)[source]

Replace first n_before and last n_after samples of pulses by 0.
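
As a rough illustration of the zeroing described above, here is a minimal plain-Python sketch for a single pulse stored as a list. The name cut_baseline_sketch is hypothetical; the real function works in place on strax record arrays and also has to handle pulses that span several records, which this sketch ignores.

```python
def cut_baseline_sketch(waveform, n_before=48, n_after=30):
    """Zero the first n_before and last n_after samples of one pulse, in place.

    Hypothetical helper for illustration only, not the strax implementation.
    """
    n = len(waveform)
    # Zero the leading samples (never more than the pulse holds).
    for i in range(min(n_before, n)):
        waveform[i] = 0
    # Zero the trailing samples.
    for i in range(max(n - n_after, 0), n):
        waveform[i] = 0
    return waveform


pulse = list(range(1, 11))
cut_baseline_sketch(pulse, n_before=3, n_after=2)
print(pulse)
```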

strax.processing.data_reduction.cut_outside_hits(records, hits, left_extension=2, right_extension=15)[source]

Return records with waveforms zeroed if not within left_extension or right_extension of hits. These extensions properly account for breaking of pulses into records.

If you pass an incomplete (e.g. cut) set of records, we will not save data around hits found in the removed records, even if this stretches into records that you did pass.
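
The idea of keeping only samples near hits can be sketched for a single waveform as follows. This is a simplified illustration under assumptions: cut_outside_hits_sketch is a hypothetical name, hits are given as (start, exclusive end) sample indices, and the handling of pulses broken into several records is left out.

```python
def cut_outside_hits_sketch(waveform, hits, left_extension=2, right_extension=15):
    """Return a copy of waveform with samples far from every hit set to 0.

    hits is a list of (start, exclusive_end) sample indices (assumed layout).
    """
    keep = [False] * len(waveform)
    for start, end in hits:
        lo = max(start - left_extension, 0)
        hi = min(end + right_extension, len(waveform))
        for i in range(lo, hi):
            keep[i] = True
    return [sample if k else 0 for sample, k in zip(waveform, keep)]


reduced = cut_outside_hits_sketch([1] * 12, hits=[(5, 6)],
                                  left_extension=2, right_extension=3)
print(reduced)
```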

strax.processing.general module

exception strax.processing.general.NoBreakFound[source]

Bases: Exception

strax.processing.general.abs_time_to_prev_next_interval(things, intervals)[source]

Function which determines the time difference of things to previous and next interval, e.g., events to veto intervals. Assumes that things do not overlap.

Parameters:
  • things – Numpy structured array containing strax time fields

  • intervals – Numpy structured array containing time fields

Returns:

Two integer arrays with the time difference to the previous and next intervals.

strax.processing.general.diff(data)[source]

Return time differences between items in data.

strax.processing.general.endtime(x)[source]

Return endtime of intervals x.
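
For interval-like data with time, length and dt fields, the (exclusive) endtime is time + length * dt; data that already carries an endtime field can use it directly. A minimal sketch with a plain dict standing in for one row of a structured array (endtime_sketch is a hypothetical name):

```python
def endtime_sketch(interval):
    """Exclusive endtime of one interval given as a dict of fields."""
    if "endtime" in interval:
        return interval["endtime"]
    return interval["time"] + interval["length"] * interval["dt"]


hit = {"time": 1000, "length": 5, "dt": 10}
print(endtime_sketch(hit))
```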

strax.processing.general.from_break(x, safe_break, not_before=0, left=True, tolerant=False)[source]

Return records on side of a break at least safe_break long. If there is no such break, return the best break found.

strax.processing.general.fully_contained_in(things, containers)[source]

Return array of len(things) with index of interval in containers for which things are fully contained in a container, or -1 if no such exists.

We assume all things and containers are sorted by time. If containers are overlapping, the first container of the thing is chosen.
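
The containment rule can be illustrated with a brute-force sketch on (start, exclusive end) tuples. The name fully_contained_in_sketch is hypothetical, and the quadratic loop is only for clarity; it does not exploit the sorting that the real function relies on.

```python
def fully_contained_in_sketch(things, containers):
    """For each thing, index of the first container fully containing it, or -1."""
    result = []
    for t_start, t_end in things:
        index = -1
        for i, (c_start, c_end) in enumerate(containers):
            if c_start <= t_start and t_end <= c_end:
                index = i  # first matching container wins
                break
        result.append(index)
    return result


things = [(0, 5), (8, 10), (10, 20), (18, 30), (40, 45)]
containers = [(0, 10), (8, 25), (38, 50)]
print(fully_contained_in_sketch(things, containers))
```

Here (8, 10) fits in both of the first two containers and is assigned to the first one, while (18, 30) is not fully inside any container.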

strax.processing.general.overlap_indices(a1, n_a, b1, n_b)[source]

Given interval [a1, a1 + n_a), and [b1, b1 + n_b) of integers, return indices [a_start, a_end), [b_start, b_end) of overlapping region.
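
A small sketch of the index arithmetic may help. It assumes that non-overlapping intervals yield (0, 0), (0, 0); that convention and the name overlap_indices_sketch are assumptions made for this example.

```python
def overlap_indices_sketch(a1, n_a, b1, n_b):
    """Indices of the overlap of [a1, a1 + n_a) and [b1, b1 + n_b).

    Returns ((a_start, a_end), (b_start, b_end)), each relative to the
    start of its own interval.
    """
    start = max(a1, b1)
    end = min(a1 + n_a, b1 + n_b)
    if end <= start:
        # No overlap: assumed convention for this sketch.
        return (0, 0), (0, 0)
    return (start - a1, end - a1), (start - b1, end - b1)


print(overlap_indices_sketch(10, 5, 12, 8))
```

For a = [10, 15) and b = [12, 20) the shared region is [12, 15), which is samples 2 to 4 of a and samples 0 to 2 of b.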

strax.processing.general.sort_by_time(x)[source]

Sorts things by time, or by time and then channel if both fields are in the given array.

strax.processing.general.split_by_containment(things, containers)[source]

Return list of thing-arrays contained in each container. Result is returned as a numba.typed.List or list if containers are empty.

Assumes everything is sorted, and containers are non-overlapping.

strax.processing.general.split_touching_windows(things, containers, window=0)[source]

Split things by their containers and return a list of length containers.

Parameters:
  • things – Sorted array of interval-like data

  • containers – Sorted array of interval-like data

  • window – threshold distance for touching check.

For example:
  • window = 0: things must overlap one sample

  • window = -1: things can start right after container ends

    (i.e. container endtime equals the thing starttime, since strax endtimes are exclusive)

Returns:

strax.processing.general.touching_windows(things, containers, window=0)[source]

Return array of (start, exclusive end) indices into things which extend to within window of the container, for each container in containers.

Parameters:
  • things – Sorted array of interval-like data. We assume all things and containers are sorted by time. When endtimes are not sorted, it will return indices of the first and last things which are touching the container.

  • containers – Sorted array of interval-like data. Containers are allowed to overlap.

  • window – threshold distance for touching check.

For example:
  • window = 0: things must overlap one sample

  • window = -1: things can start right after container ends

    (i.e. container endtime equals the thing starttime, since strax endtimes are exclusive)
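
The following sketch illustrates the returned (start, exclusive end) index pairs for the window = 0 case only. It assumes things are (start, exclusive end) tuples whose start times and end times are both sorted; touching_windows_sketch is a hypothetical name and the binary search is just one way to express the idea.

```python
from bisect import bisect_left, bisect_right


def touching_windows_sketch(things, containers):
    """For each container, the index range of things overlapping it (window = 0)."""
    starts = [t[0] for t in things]
    ends = [t[1] for t in things]
    result = []
    for c_start, c_end in containers:
        # First thing whose exclusive end lies after the container start.
        left = bisect_right(ends, c_start)
        # First thing that starts at or after the container's exclusive end.
        right = bisect_left(starts, c_end)
        result.append((left, right))
    return result


things = [(0, 5), (5, 10), (12, 15), (20, 25)]
containers = [(4, 13), (15, 20), (18, 30)]
print(touching_windows_sketch(things, containers))
```

The second container gets an empty range because the thing ending at 15 and the thing starting at 20 only meet it at exclusive boundaries.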

strax.processing.hitlets module

strax.processing.hitlets.concat_overlapping_hits(hits, extensions, pmt_channels, start, end)[source]

Function which concatenates hits which may overlap after left and right hit extension. Assumes that hits are sorted correctly.

Note:

This function only updates time, and length of the hit.

Parameters:
  • hits – Hits in records.

  • extensions – Tuple of the left and right hit extension.

  • pmt_channels – Tuple of the detector’s first and last PMT

  • start – Start time of the chunk

  • end – Endtime of the chunk

Returns:

Array with concatenated hits.

strax.processing.hitlets.conditional_entropy(hitlets, template='flat', square_data=False)[source]

Function which estimates the conditional entropy based on the specified template.

In order to compute the conditional entropy each hitlet will be aligned such that its maximum falls into the same sample as for the template. If the maximum is ambiguous the first maximum is taken.

Parameters:
  • hitlets – Hitlets for which the entropy shall be computed. Can be any data_kind which offers the fields data and length.

  • template – Template to compare the data with. Can be either specified as “flat” to use a flat distribution or as a numpy array containing any normalized template.

  • square_data – If true data will be squared and normalized before estimating the entropy. Otherwise the data will only be normalized.

Returns:

Array containing the entropy values for each hitlet.

Note:

The template has to be normalized such that its total area is 1. Independently of the specified options, only samples for which the content is greater than zero are used to compute the entropy.

In the non-squared case, negative samples are omitted in the calculation.

strax.processing.hitlets.create_hitlets_from_hits(hits, save_outside_hits, channel_range, chunk_start=None, chunk_end=None)[source]

Function which creates hitlets from a bunch of hits.

Parameters:
  • hits – Hits found in records.

  • save_outside_hits – Tuple with left and right hit extension.

  • channel_range – Detector’s channel range from channel map.

  • chunk_start – (optional) start time of a chunk. Ensures that no hitlet is earlier than this timestamp.

  • chunk_end – (optional) end time of a chunk. Ensures that no hitlet ends later than this timestamp.

Returns:

Hitlets with temporary fields (data, max_goodness_of_split…)

strax.processing.hitlets.get_fwxm(hitlet, fraction=0.5)[source]

Estimates the left and right edge of a specific height percentage.

Parameters:
  • hitlet – Single hitlet

  • fraction – Level for which the width shall be computed.

Returns:

Two floats, left edge and right edge in ns

Notes:

The function searches for the last sample below and above the specified height level on the left and right hand side of the maximum. When the samples are found, the width is estimated by linear interpolation between the respective samples. If the samples cannot be found on one of the sides, the corresponding outermost bin edge is used: left 0; right last sample + 1.

strax.processing.hitlets.get_hitlets_data(hitlets, records, to_pe, min_hitlet_sample=200)[source]

Function which searches for every hitlet in a given chunk the corresponding records data. Additionally compute the total area of the signal.

Parameters:
  • hitlets – Hitlets found in a chunk of records.

  • records – Records of the chunk.

  • to_pe – Array with area conversion factors from adc/sample to pe/sample. Please make sure that to_pe has the correct shape. The array index should match the channel number.

  • min_hitlet_sample – Minimal length of the hitlet data field. Prevents numba compiling from running into race conditions.

Returns:

Hitlets including data stored in the “data” field (if it did not exist before, it will be added).

strax.processing.hitlets.highest_density_region_width(data, fractions_desired, dt=1, fractional_edges=False, _buffer_size=100)[source]

Function which computes the left and right edge based on the outer most sample for the highest density region of a signal.

Parameters:
  • data – Data of a signal, e.g. hitlet or peak including zero length encoding

  • fractions_desired – Area fractions for which HDR should be computed

  • dt – Sample length in ns

  • fractional_edges – If true computes width as fractional time

  • _buffer_size – Maximal number of allowed intervals

Returns:

np.ndarray: Array of shape (len(fractions_desired), 2) containing left and right edges

strax.processing.hitlets.hitlet_properties(hitlets)[source]

Computes additional hitlet properties such as amplitude, FWHM, etc.

strax.processing.peak_building module

strax.processing.peak_building.find_hit_integration_bounds(hits, excluded_intervals, records, save_outside_hits, n_channels, allow_bounds_beyond_records=False)[source]

Update (lone) hits to include integration bounds. Please note that time and length of the original hit are not changed!

Parameters:
  • hits – Hits or lone hits which should be extended by integration bounds.

  • excluded_intervals – Regions in which hits should not extend to. E.g. Peaks for lone hits. If not needed just put a zero length strax.time_fields array.

  • records – Records in which hits were found.

  • save_outside_hits – Hit extension to the left and right in ns, not samples!

  • n_channels – Number of channels for given detector.

  • allow_bounds_beyond_records – If true extend left/right_integration beyond record boundaries. E.g. to negative samples for left side.

strax.processing.peak_building.find_peak_groups(peaks, gap_threshold, left_extension=0, right_extension=0, max_duration=1000000000)[source]

Return boundaries of groups of peaks separated by gap_threshold, extended left and right.

Parameters:
  • peaks – Peaks to group

  • gap_threshold – Minimum gap between peaks

  • left_extension – Extend groups by this many ns left

  • right_extension – Extend groups by this many ns right

  • max_duration – max duration time of merged peak in ns

Returns:

time, endtime arrays of group boundaries

strax.processing.peak_building.find_peaks(hits, adc_to_pe, gap_threshold=300, left_extension=20, right_extension=150, min_area=0, min_channels=2, max_duration=10000000, _result_buffer=None, result_dtype=None)[source]

Return peaks made from grouping hits together. Assumes all hits have the same dt.

Parameters:
  • hits – Hit (or any interval) to group

  • left_extension – Extend peaks by this many ns left

  • right_extension – Extend peaks by this many ns right

  • gap_threshold – No hits for this much ns means new peak

  • min_area – Peaks with less than min_area are not returned

  • min_channels – Peaks with less contributing channels are not returned

  • max_duration – max duration time of merged peak in ns
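
The grouping rule behind gap_threshold and the two extensions can be sketched on plain (start, exclusive end) tuples in ns. This is an illustration under assumptions: group_hits_sketch is a hypothetical name, the exact boundary comparison (>=) is a guess, and min_area, min_channels and max_duration are ignored.

```python
def group_hits_sketch(hits, gap_threshold=300, left_extension=20, right_extension=150):
    """Group time-sorted hits into peaks; return (time, endtime) per peak."""
    peaks = []
    group_start = group_end = None
    for start, end in hits:
        if group_start is None:
            group_start, group_end = start, end
        elif start - group_end >= gap_threshold:
            # Gap is large enough: close the current peak and start a new one.
            peaks.append((group_start - left_extension, group_end + right_extension))
            group_start, group_end = start, end
        else:
            group_end = max(group_end, end)
    if group_start is not None:
        peaks.append((group_start - left_extension, group_end + right_extension))
    return peaks


print(group_hits_sketch([(0, 10), (50, 60), (500, 510)]))
```

The first two hits are 40 ns apart and end up in one peak; the third follows a 440 ns gap and starts a new peak.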

strax.processing.peak_building.integrate_lone_hits(lone_hits, records, peaks, save_outside_hits, n_channels)[source]

Update the area of lone_hits to the integral in ADC counts x samples.

Parameters:
  • lone_hits – Hits outside of peaks

  • records – Records in which hits and peaks were found

  • peaks – Peaks

  • save_outside_hits – (left, right) TIME with which we should extend the integration window (integration region) of hits

  • n_channels – Number of channels

TODO: this doesn’t extend the integration range beyond record boundaries

strax.processing.peak_building.simple_summed_waveform(records, containers, to_pe)[source]

Computes simple (downsampled) summed waveform based on raw data touching a certain container.

Parameters:
  • containers – Things for which summed waveform should be computed. Must contain data field of desired length.

  • records – Record information which should be used to compute summed waveform.

Note: To keep this function simple, the floating part of the baseline is only added if the data field in records is not zero. This leads to a biased representation of the summed waveform. The bias is small for shape estimates, but the total charge of the signal should be estimated in an unbiased way.

strax.processing.peak_building.store_downsampled_waveform(p, waveform_buffer, store_data_top=False, store_data_start=False, waveform_buffer_top=array([1.], dtype=float32))[source]

Downsample the waveform in buffer and store it in p[‘data’] and in p[‘data_top’] if indicated to do so.

Parameters:
  • p – Row of a strax peak array, or compatible type. Note that p[‘dt’] is adjusted to match the downsampling.

  • waveform_buffer – numpy array containing sum waveform during the peak at the input peak’s sampling resolution p[‘dt’].

  • store_data_top – Boolean which indicates whether to also store into p[‘data_top’].

  • store_data_start – Boolean which indicates whether to store the first samples of the waveform in the peak.

When downsampling results in a fractional number of samples, the peak is shortened rather than extended. This causes data loss, but it is necessary to prevent overlaps between peaks.
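
The shortening behaviour can be illustrated with a small plain-Python sketch. It assumes that downsampling combines groups of consecutive samples by summing them and that dt is multiplied by the downsampling factor; both are assumptions of this example, as is the name downsample_sketch.

```python
import math


def downsample_sketch(waveform, dt, buffer_length):
    """Downsample waveform so it fits into buffer_length samples.

    Returns (data, new_dt). Trailing samples that do not fill a complete
    group are dropped, so the result is shortened rather than extended.
    """
    factor = max(1, math.ceil(len(waveform) / buffer_length))
    n_out = len(waveform) // factor
    data = [sum(waveform[i * factor:(i + 1) * factor]) for i in range(n_out)]
    return data, dt * factor


data, new_dt = downsample_sketch([1, 2, 3, 4, 5, 6, 7], dt=10, buffer_length=3)
print(data, new_dt)
```

Seven samples do not fit into a buffer of three, so the factor is 3; the seventh sample has no complete group and is lost.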

strax.processing.peak_building.sum_waveform(peaks, hits, records, record_links, adc_to_pe, n_top_channels=0, store_data_top=False, store_data_start=False, select_peaks_indices=None)[source]

Compute sum waveforms for all peaks in peaks. Only builds summed waveform over regions in which hits were found. This is required to avoid any bias due to zero-padding and baselining. Will downsample sum waveforms if they do not fit in per-peak buffer.

Parameters:
  • peaks – Peaks for which the summed waveform should be built.

  • hits – Hits which are inside peaks. Must be sorted according to record_i.

  • records – Records to be used to build peaks.

  • record_links – Tuple of previous and next records.

  • n_top_channels – Number of top array channels.

  • store_data_top – Boolean which indicates whether to store the top array waveform in the peak.

  • store_data_start – Boolean which indicates whether to store the first samples of the waveform in the peak.

  • select_peaks_indices – Indices of the peaks for partial processing. In the form of np.array([np.int, np.int, ..]). If None (default), all the peaks are used for the summation.

Assumes all peaks AND pulses have the same dt!

strax.processing.peak_merging module

strax.processing.peak_merging.add_lone_hits(peaks, lone_hits, to_pe, n_top_channels=0, store_data_top=False, store_data_start=False)[source]

Function which adds information from lone hits to peaks if lone hit is (fully) inside a peak (e.g. after merging). Modifies peak area and data in place.

Parameters:
  • peaks – Numpy array of peaks

  • lone_hits – Numpy array of lone_hits

  • to_pe – Gain values to convert lone hit area into PE.

  • n_top_channels – Number of top array channels.

  • store_data_top – Boolean which indicates whether to store the top array waveform in the peak.

  • store_data_start – Boolean which indicates whether to store the first samples of the waveform in the peak.

strax.processing.peak_merging.gcd_of_array(values)[source]

Return the GCD of all elements in the array.
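
A greatest common divisor over several values can be computed by folding the two-argument GCD over the sequence. A minimal sketch with the standard library (gcd_of_values is a hypothetical name; the real function operates on arrays):

```python
from functools import reduce
from math import gcd


def gcd_of_values(values):
    """GCD of all integers in a non-empty sequence."""
    return reduce(gcd, values)


print(gcd_of_values([10, 20, 50]))
```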

strax.processing.peak_merging.merge_peaks(peaks, start_merge_at, end_merge_at, merged=None, max_buffer=100000)[source]

Merge specified peaks with their neighbors, return merged peaks.

Parameters:
  • peaks – Record array of strax peak dtype.

  • start_merge_at – Indices to start merge at

  • end_merge_at – EXCLUSIVE indices to end merge at

  • max_buffer – Maximum number of samples in the sum_waveforms and other waveforms of the resulting peaks (after merging). Peaks must be constructed based on the properties of constituent peaks, it being too time-consuming to revert to records/hits.

strax.processing.peak_merging.replace_merged(orig, merge)[source]

Return sorted array of ‘merge’ and members of ‘orig’ that do not touch any of merge.

Parameters:
  • orig – Array of interval-like objects (e.g. peaks)

  • merge – Array of interval-like objects (e.g. peaks)
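
The replacement rule can be sketched with (start, exclusive end) tuples. Here "touch" is taken to mean that two half-open intervals overlap, which is an assumption of this example, and replace_merged_sketch is a hypothetical name.

```python
def replace_merged_sketch(orig, merge):
    """Sorted list of merge plus the members of orig that touch no merged interval."""
    def touches(a, b):
        return a[0] < b[1] and b[0] < a[1]

    kept = [o for o in orig if not any(touches(o, m) for m in merge)]
    return sorted(kept + list(merge))


orig = [(0, 10), (12, 20), (22, 30), (50, 60)]
merge = [(12, 30)]
print(replace_merged_sketch(orig, merge))
```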

strax.processing.peak_properties module

strax.processing.peak_properties.compute_area_fraction_top(peaks, n_top_channels)[source]

Compute the area fraction top for peaks.
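
For a single peak, the area fraction top is the area seen by the top channels divided by the total area. The sketch below assumes the top channels are the first n_top_channels entries of a per-channel area list and returns NaN for zero total area; both choices, and the name area_fraction_top_sketch, are assumptions of this example.

```python
def area_fraction_top_sketch(area_per_channel, n_top_channels):
    """Fraction of a peak's area observed in the first n_top_channels channels."""
    total = sum(area_per_channel)
    if total == 0:
        return float("nan")
    return sum(area_per_channel[:n_top_channels]) / total


print(area_fraction_top_sketch([3.0, 1.0, 2.0, 2.0], n_top_channels=2))
```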

strax.processing.peak_properties.compute_index_of_fraction(peak, fractions_desired, result)[source]

Store the (fractional) indices at which peak reaches fractions_desired of their area in result.

Parameters:
  • peak – single strax peak(let) or other data-bearing dtype

  • fractions_desired – array of floats between 0 and 1

Returns:

len(fractions_desired) array of floats

strax.processing.peak_properties.compute_properties(peaks, n_top_channels=0, select_peaks_indices=None)[source]

Compute properties: median_time, width, area_decile_from_midpoint, center_time, and area_fraction_top for peaks.

Parameters:
  • peaks – single strax peak(let) or other data-bearing dtype

  • select_peaks_indices – Array of integers indicating which peaks to compute. Defaults to None, in which case properties are computed for all peaks.

strax.processing.peak_properties.compute_widths(peaks)[source]

Compute widths in ns at desired area fractions for peaks.

Parameters:

peaks – single strax peak(let) or other data-bearing dtype

strax.processing.peak_properties.index_of_fraction(peaks, fractions_desired)[source]

Return the (fractional) indices at which the peaks reach fractions_desired of their area.

Parameters:
  • peaks – strax peak(let)s or other data-bearing dtype

  • fractions_desired – array of floats between 0 and 1

Returns:

(len(peaks), len(fractions_desired)) array of floats
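
To make "fractional index" concrete, here is a sketch for one waveform with non-negative samples. It assumes the area is spread uniformly inside each sample, so the index is interpolated linearly within the sample where the target area is reached; index_of_fraction_sketch is a hypothetical name and not the strax implementation.

```python
def index_of_fraction_sketch(data, fractions_desired):
    """Fractional sample indices at which the cumulative area reaches each fraction."""
    total = sum(data)
    results = []
    for fraction in fractions_desired:
        target = fraction * total
        cumulative = 0.0
        index = float(len(data))  # fallback if the target is never reached
        for i, sample in enumerate(data):
            if sample > 0 and cumulative + sample >= target:
                # Interpolate linearly inside sample i.
                index = i + (target - cumulative) / sample
                break
            cumulative += sample
        results.append(index)
    return results


print(index_of_fraction_sketch([1, 2, 1], [0.25, 0.5, 1.0]))
```

With a total area of 4, half of the area is reached halfway through the second sample, giving index 1.5. Multiplying a difference of two such indices by dt gives a width in ns.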

strax.processing.peak_splitting module

strax.processing.peak_splitting.natural_breaks_gof(w, dt, normalize=False, split_low=False, filter_wing_width=0)[source]

Return natural breaks goodness of split/fit for the waveform w. A sharp peak gives ~0, two widely separate peaks ~1.

strax.processing.peak_splitting.split_peaks(peaks, hits, records, rlinks, to_pe, algorithm='local_minimum', data_type='peaks', n_top_channels=0, store_data_top=False, store_data_start=False, **kwargs)[source]

Return peaks split according to algorithm, with waveforms summed and widths computed.

Note:

Can also be used for hitlets splitting with local_minimum splitter. Just put hitlets instead of peaks.

Parameters:
  • peaks – Original peaks. Sum waveform must have been built and properties must have been computed (if you use them).

  • hits – Hits found in records (or None in case of hitlets splitting).

  • records – Records from which peaks were built

  • rlinks – strax.record_links for given records (or None in case of hitlets splitting.)

  • to_pe – ADC to PE conversion factor array (of n_channels)

  • algorithm – ‘local_minimum’ or ‘natural_breaks’.

  • data_type – ‘peaks’ or ‘hitlets’. Specifies whether to use sum_waveform or get_hitlets_data to compute the waveform of the new split peaks/hitlets.

  • n_top_channels – Number of top array channels.

  • result_dtype – dtype of the result.

  • store_data_top – Boolean which indicates whether to store the top array waveform in the peak.

  • store_data_start – Boolean which indicates whether to store the first samples of the waveform in the peak.

Any other options are passed to the algorithm.

strax.processing.peak_splitting.symmetric_moving_average(a, wing_width)[source]

Return the moving average of a, over windows of length [2 * wing_width + 1] centered on each sample.

(i.e. the window covers each sample itself, plus a ‘wing’ of width wing_width on either side)
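
A plain-Python sketch of such a centered moving average is shown below. It assumes that windows are truncated at the edges of the array, so the first and last samples are averaged over fewer points; that edge handling and the name moving_average_sketch are assumptions of this example.

```python
def moving_average_sketch(a, wing_width):
    """Centered moving average over windows of 2 * wing_width + 1 samples."""
    n = len(a)
    out = []
    for i in range(n):
        lo = max(i - wing_width, 0)
        hi = min(i + wing_width + 1, n)
        window = a[lo:hi]  # truncated near the edges
        out.append(sum(window) / len(window))
    return out


print(moving_average_sketch([0, 3, 6, 3, 0], wing_width=1))
```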

strax.processing.pulse_processing module

Functions that perform processing on pulses (other than data reduction functions, which are in data_reduction.py)

strax.processing.pulse_processing.baseline(records, baseline_samples=40, flip=True, allow_sloppy_chunking=False, fallback_baseline=16000)[source]

Determine baseline as the average of the first baseline_samples of each pulse. Subtract the pulse data from int(baseline), and store the baseline mean and rms.

Parameters:
  • baseline_samples – number of samples at start of pulse to average to determine the baseline.

  • flip – If true, flip sign of data

  • allow_sloppy_chunking – Allow use of the fallback_baseline in case the 0th fragment of a pulse is missing

  • fallback_baseline – Fallback baseline (ADC counts)

Assumes records are sorted in time (or at least by channel, then time). Assumes record_i information is accurate – so don’t cut pulses before baselining them!
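
The baselining step can be sketched for a single pulse held in a list of raw ADC samples. The sketch assumes flip=True and defines the rms as the standard deviation of the baseline samples around their mean; that rms definition and the name baseline_sketch are assumptions, and fragmented pulses and the fallback baseline are not covered.

```python
def baseline_sketch(samples, baseline_samples=40):
    """Return (flipped_data, baseline_mean, baseline_rms) for one pulse."""
    head = samples[:baseline_samples]
    mean = sum(head) / len(head)
    rms = (sum((s - mean) ** 2 for s in head) / len(head)) ** 0.5
    # Subtract the pulse data from int(baseline), which flips the sign.
    flipped = [int(mean) - s for s in samples]
    return flipped, mean, rms


raw = [16000, 16002, 15998, 16000, 15900, 15950, 16000]
flipped, mean, rms = baseline_sketch(raw, baseline_samples=4)
print(flipped, mean, rms)
```

After flipping, the negative-going pulse in the raw data appears as positive samples (100 and 50) on a baseline near zero.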

strax.processing.pulse_processing.filter_records(r, ir)[source]

Apply filter with impulse response ir over the records r. Assumes the filter origin is at the impulse response maximum.

Parameters:
  • ws – Waveform matrix, must be float

  • ir – Impulse response, must have odd length. Will normalize.

  • prev_r – Previous record map from strax.record_links

  • next_r – Next record map from strax.record_links

strax.processing.pulse_processing.filter_waveforms(ws, ir, prev_r, next_r)[source]

Convolve filter with impulse response ir over each row of ws. Assumes the filter origin is at the impulse response maximum.

Parameters:
  • ws – Waveform matrix, must be float

  • ir – Impulse response, must have odd length.

  • prev_r – Previous record map from strax.record_links

  • next_r – Next record map from strax.record_links

strax.processing.pulse_processing.find_hits(records, min_amplitude: int | ndarray = 15, min_height_over_noise: int | ndarray = 0)[source]

Return hits (intervals >= threshold) found in records. Hits that straddle record boundaries are split (perhaps we should fix this?)

NB: returned hits are NOT sorted yet!
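
The notion of a hit as a contiguous run of samples at or above threshold can be sketched for one baselined waveform. The sketch follows the ">= threshold" wording above, uses a single scalar threshold, and returns (start, exclusive end) sample indices; find_hits_sketch is a hypothetical name, and min_height_over_noise and record boundaries are ignored.

```python
def find_hits_sketch(waveform, min_amplitude=15):
    """Return (start, exclusive_end) index pairs of runs with samples >= min_amplitude."""
    hits = []
    start = None
    for i, sample in enumerate(waveform):
        if sample >= min_amplitude:
            if start is None:
                start = i  # a new hit begins
        elif start is not None:
            hits.append((start, i))  # the hit ended at the previous sample
            start = None
    if start is not None:
        # A hit still open at the end of the waveform is closed there.
        hits.append((start, len(waveform)))
    return hits


print(find_hits_sketch([0, 2, 20, 30, 5, 0, 16, 0, 15]))
```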

strax.processing.pulse_processing.integrate(records)[source]

Integrate records in-place.

strax.processing.pulse_processing.raw_to_records(raw_records)[source]

strax.processing.pulse_processing.record_length_from_dtype(dtype)[source]

strax.processing.pulse_processing.record_links(records)[source]

Return (prev_r, next_r), each arrays of indices of previous/next record in the same pulse, or -1 if this is not applicable.

strax.processing.pulse_processing.zero_out_of_bounds(records)[source]

Set waveforms to zero out of pulse bounds.

strax.processing.statistics module

strax.processing.statistics.highest_density_region(data, fractions_desired, only_upper_part=False, _buffer_size=10)[source]

Compute highest density region for a given sampled distribution.

This function splits only the stable sort operation into Python, keeping all other computations numba-accelerated for maximum performance.

Parameters:
  • data – Sampled distribution

  • fractions_desired – Area/probability for which HDR should be computed

  • only_upper_part – If True, only compute area between max and current height

  • _buffer_size – Size of result buffer (max number of allowed intervals)

Returns:

tuple: (res, res_amp) where res contains interval indices and res_amp contains amplitudes for desired fractions

Module contents