strax.processing package

Submodules

strax.processing.data_reduction module

Functions to perform in-place pulse-level data reduction.

class strax.processing.data_reduction.ReductionLevel(value)[source]

Bases: IntEnum

Identifies what type of data reduction has been used on a record.

BASELINE_CUT = 1
HITS_ONLY = 2
METADATA_ONLY = 4
NO_REDUCTION = 0
WAVEFORM_REPLACED = 3
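
For illustration, a minimal usage sketch (as an IntEnum, members compare equal to plain integers, e.g. to a record’s reduction_level field):

    from strax.processing.data_reduction import ReductionLevel

    level = ReductionLevel(1)
    print(level.name)                            # BASELINE_CUT
    print(level == ReductionLevel.BASELINE_CUT)  # True
    print(ReductionLevel.NO_REDUCTION == 0)      # True
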
strax.processing.data_reduction.cut_baseline(records, n_before=48, n_after=30)[source]

Replace the first n_before and last n_after samples of each pulse with 0.
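
A minimal sketch, assuming a single-record pulse built with strax.record_dtype (all values below are made up for illustration):

    import numpy as np
    import strax
    from strax.processing.data_reduction import cut_baseline

    records = np.zeros(1, dtype=strax.record_dtype(100))
    records['length'] = 100
    records['pulse_length'] = 100
    records['data'] = 1

    cut_baseline(records, n_before=48, n_after=30)
    print(np.flatnonzero(records[0]['data']))  # e.g. only samples 48..69 remain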

strax.processing.data_reduction.cut_outside_hits(records, hits, left_extension=2, right_extension=15)[source]

Return records with waveforms zeroed where they are not within left_extension or right_extension of a hit. These extensions properly account for the breaking of pulses into records.

If you pass an incomplete (e.g. cut) set of records, we will not save data around hits found in the removed records, even if this stretches into records that you did pass.

strax.processing.general module

exception strax.processing.general.NoBreakFound[source]

Bases: Exception

strax.processing.general.abs_time_to_prev_next_interval(things, intervals)[source]

Determines the time difference from each thing to the previous and next interval, e.g., from events to veto intervals. Assumes that things do not overlap.

Parameters:
  • things – Numpy structured array containing strax time fields

  • intervals – Numpy structured array containing time fields

Returns:

Two integer arrays with the time difference to the previous and next intervals.
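
A minimal sketch with made-up times, using the strax.time_fields dtype:

    import numpy as np
    import strax
    from strax.processing.general import abs_time_to_prev_next_interval

    things = np.zeros(1, dtype=strax.time_fields)
    things['time'], things['endtime'] = [40], [50]

    intervals = np.zeros(2, dtype=strax.time_fields)
    intervals['time'] = [0, 100]
    intervals['endtime'] = [10, 110]

    dt_prev, dt_next = abs_time_to_prev_next_interval(things, intervals)
    print(dt_prev, dt_next)  # e.g. [30] [50]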

strax.processing.general.endtime(x)[source]

Return endtime of intervals x.

strax.processing.general.from_break(x, safe_break, not_before=0, left=True, tolerant=False)[source]

Return records on one side of a break at least safe_break long. If there is no such break, return the best break found.

strax.processing.general.fully_contained_in(things, containers)[source]

Return an array of length len(things) giving, for each thing, the index of the container in which it is fully contained, or -1 if there is none.

We assume all things and containers are sorted by time. If containers overlap, the first container containing the thing is chosen.
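
A minimal sketch with made-up intervals (strax.time_fields arrays, sorted by time):

    import numpy as np
    import strax

    things = np.zeros(3, dtype=strax.time_fields)
    things['time'] = [0, 40, 90]
    things['endtime'] = [10, 50, 100]

    containers = np.zeros(2, dtype=strax.time_fields)
    containers['time'] = [0, 85]
    containers['endtime'] = [60, 95]

    # The last thing sticks out of its container, so it gets -1:
    print(strax.fully_contained_in(things, containers))  # e.g. [ 0  0 -1]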

strax.processing.general.overlap_indices(a1, n_a, b1, n_b)[source]

Given integer intervals [a1, a1 + n_a) and [b1, b1 + n_b), return the indices [a_start, a_end) and [b_start, b_end) of the overlapping region.
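
A worked example: the intervals [5, 12) and [10, 15) overlap in [10, 12), i.e. in samples 5..6 of the first and samples 0..1 of the second:

    import strax

    print(strax.overlap_indices(5, 7, 10, 5))  # ((5, 7), (0, 2))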

strax.processing.general.sort_by_time(x)[source]

Sorts things.

Sorts by time, or by time then channel if both fields are present in the given array.

strax.processing.general.split_by_containment(things, containers)[source]

Return a list of thing-arrays, one per container. The result is a numba.typed.List (or a plain list if containers is empty).

Assumes everything is sorted, and containers are non-overlapping.
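
A minimal sketch with made-up intervals:

    import numpy as np
    import strax

    things = np.zeros(3, dtype=strax.time_fields)
    things['time'] = [0, 20, 60]
    things['endtime'] = [10, 30, 70]

    containers = np.zeros(2, dtype=strax.time_fields)
    containers['time'] = [0, 55]
    containers['endtime'] = [40, 80]

    for i, contained in enumerate(strax.split_by_containment(things, containers)):
        print(i, contained['time'])  # e.g. "0 [ 0 20]", then "1 [60]"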

strax.processing.general.split_touching_windows(things, containers, window=0)[source]

Split things by their containers and return a list of length containers.

Parameters:
  • things – Sorted array of interval-like data

  • containers – Sorted array of interval-like data

  • window – threshold distance for touching check.

For example:
  • window = 0: things must overlap one sample

  • window = -1: things can start right after container ends

    (i.e. container endtime equals the thing starttime, since strax endtimes are exclusive)

Returns:

List of length len(containers), holding the things touching each container.

strax.processing.general.touching_windows(things, containers, window=0)[source]

Return array of (start, exclusive end) indices into things which extend to within window of the container, for each container in containers.

Parameters:
  • things – Sorted array of interval-like data. We assume all things and containers are sorted by time. When endtimes are not sorted, the indices of the first and last things touching the container are returned.

  • containers – Sorted array of interval-like data. Containers are allowed to overlap.

  • window – threshold distance for touching check.

For example:
  • window = 0: things must overlap one sample

  • window = -1: things can start right after container ends

    (i.e. container endtime equals the thing starttime, since strax endtimes are exclusive)
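
A minimal sketch with made-up intervals:

    import numpy as np
    import strax

    things = np.zeros(4, dtype=strax.time_fields)
    things['time'] = [0, 20, 40, 80]
    things['endtime'] = [10, 30, 50, 90]

    containers = np.zeros(2, dtype=strax.time_fields)
    containers['time'] = [15, 75]
    containers['endtime'] = [45, 95]

    # One (start, exclusive end) index pair into things per container:
    print(strax.touching_windows(things, containers, window=0))
    # e.g. [[1 3]
    #       [3 4]]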

strax.processing.hitlets module

strax.processing.hitlets.concat_overlapping_hits(hits, extensions, pmt_channels, start, end)[source]

Concatenates hits which may overlap after the left and right hit extension. Assumes that hits are sorted correctly.

Note:

This function only updates the time and length fields of the hits.

Parameters:
  • hits – Hits in records.

  • extensions – Tuple of the left and right hit extension.

  • pmt_channels – Tuple of the detector's first and last PMT channel.

  • start – Start time of the chunk.

  • end – Endtime of the chunk

Returns:

Array with concatenated hits.

strax.processing.hitlets.conditional_entropy(hitlets, template='flat', square_data=False)[source]

Function which estimates the conditional entropy based on the specified template.

In order to compute the conditional entropy, each hitlet will be aligned such that its maximum falls into the same sample as the template's. If the maximum is ambiguous, the first maximum is taken.

Parameters:
  • hitlets – Hitlets for which the entropy shall be computed. Can be any data_kind which offers the fields data and length.

  • template – Template to compare the data with. Can be either specified as “flat” to use a flat distribution or as a numpy array containing any normalized template.

  • square_data – If true, the data will be squared and normalized before estimating the entropy. Otherwise the data will only be normalized.

Returns:

Array containing the entropy values for each hitlet.

Note:

The template has to be normalized such that its total area is 1. Independently of the specified options, only samples whose content is greater than zero are used to compute the entropy.

In the non-squared case, negative samples are omitted from the calculation.
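
A minimal sketch against the default flat template, with two made-up hitlets (only the data and length fields matter here):

    import numpy as np
    import strax
    from strax.processing.hitlets import conditional_entropy

    hitlets = np.zeros(2, dtype=strax.hitlet_with_data_dtype(n_samples=10))
    hitlets['length'] = 10
    hitlets['dt'] = 1
    hitlets['data'][0, :10] = 1   # flat waveform
    hitlets['data'][1, 4] = 1     # single sharp spike

    print(conditional_entropy(hitlets, template='flat'))  # one value per hitlet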

strax.processing.hitlets.create_hitlets_from_hits(hits, save_outside_hits, channel_range, chunk_start=None, chunk_end=None)[source]

Function which creates hitlets from a bunch of hits.

Parameters:
  • hits – Hits found in records.

  • save_outside_hits – Tuple with left and right hit extension.

  • channel_range – Detector's channel range from the channel map.

  • chunk_start – (optional) start time of a chunk. Ensures that no hitlet is earlier than this timestamp.

  • chunk_end – (optional) end time of a chunk. Ensures that no hitlet ends later than this timestamp.

Returns:

Hitlets with temporary fields (data, max_goodness_of_split…)

strax.processing.hitlets.get_fwxm(hitlet, fraction=0.5)[source]

Estimates the left and right edge of a hitlet at a specified fraction of its maximum height.

Parameters:
  • hitlet – Single hitlet

  • fraction – Level for which the width shall be computed.

Returns:

Two floats, left edge and right edge in ns

Notes:

The function searches for the last sample below and above the specified height level on the left and right hand side of the maximum. Once these samples are found, the edges are estimated by linear interpolation between the respective samples. If a sample cannot be found for one of the sides, the corresponding outermost bin edge is used instead: 0 on the left; last sample + 1 on the right.
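
A minimal sketch: a single triangular made-up hitlet; hitlet_properties (documented below) fills amplitude and related fields, after which get_fwxm returns the interpolated edges in ns:

    import numpy as np
    import strax
    from strax.processing.hitlets import get_fwxm, hitlet_properties

    hitlets = np.zeros(1, dtype=strax.hitlet_with_data_dtype(n_samples=5))
    hitlets['dt'] = 1
    hitlets['length'] = 5
    hitlets['data'][0, :5] = [1, 3, 5, 3, 1]

    hitlet_properties(hitlets)  # fills amplitude, fwhm, ...
    left, right = get_fwxm(hitlets[0], fraction=0.5)
    print(left, right)          # e.g. 0.75 3.25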

strax.processing.hitlets.get_hitlets_data(hitlets, records, to_pe, min_hitlet_sample=200)[source]

Searches, for every hitlet in a given chunk, for the corresponding records data. Additionally computes the total area of the signal.

Parameters:
  • hitlets – Hitlets found in a chunk of records.

  • records – Records of the chunk.

  • to_pe – Array with area conversion factors from adc/sample to pe/sample. Please make sure that to_pe has the correct shape. The array index should match the channel number.

  • min_hitlet_sample – Minimal length of the hitlet data field. Prevents numba compilation from running into race conditions.

Returns:

Hitlets including the waveform stored in the “data” field (if it did not exist before, it will be added).

strax.processing.hitlets.hitlet_properties(hitlets)[source]

Computes additional hitlet properties such as amplitude, FWHM, etc.

strax.processing.peak_building module

strax.processing.peak_building.find_hit_integration_bounds(hits, excluded_intervals, records, save_outside_hits, n_channels, allow_bounds_beyond_records=False)[source]

Update (lone) hits to include integration bounds. Please note that time and length of the original hit are not changed!

Parameters:
  • hits – Hits or lone hits which should be extended by integration bounds.

  • excluded_intervals – Regions into which hits should not extend, e.g. peaks for lone hits. If not needed, just pass a zero-length strax.time_fields array.

  • records – Records in which hits were found.

  • save_outside_hits – Hit extension to the left and right, in ns, not samples!

  • n_channels – Number of channels for given detector.

  • allow_bounds_beyond_records – If true, extend left/right integration beyond record boundaries, e.g. to negative samples on the left side.

strax.processing.peak_building.find_peak_groups(peaks, gap_threshold, left_extension=0, right_extension=0, max_duration=1000000000)[source]

Return boundaries of groups of peaks separated by gap_threshold, extended left and right.

Parameters:
  • peaks – Peaks to group

  • gap_threshold – Minimum gap between peaks

  • left_extension – Extend groups by this many ns left

  • right_extension – Extend groups by this many ns right

  • max_duration – max duration time of merged peak in ns

Returns:

time, endtime arrays of group boundaries

strax.processing.peak_building.find_peaks(hits, adc_to_pe, gap_threshold=300, left_extension=20, right_extension=150, min_area=0, min_channels=2, max_duration=10000000, _result_buffer=None, result_dtype=None)[source]

Return peaks made from grouping hits together. Assumes all hits have the same dt.

Parameters:
  • hits – Hit (or any interval) to group

  • left_extension – Extend peaks by this many ns left

  • right_extension – Extend peaks by this many ns right

  • gap_threshold – A gap of at least this many ns without hits starts a new peak

  • min_area – Peaks with area below min_area are not returned

  • min_channels – Peaks with fewer contributing channels are not returned

  • max_duration – max duration time of merged peak in ns
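
A minimal sketch grouping two made-up hits from different channels into one peak (adc_to_pe is set to ones; the output dtype is supplied via result_dtype):

    import numpy as np
    import strax

    hits = np.zeros(2, dtype=strax.hit_dtype)
    hits['time'] = [1000, 1100]
    hits['length'] = 10
    hits['dt'] = 10
    hits['channel'] = [0, 1]
    hits['area'] = [100, 150]

    peaks = strax.find_peaks(
        hits, adc_to_pe=np.ones(2),
        gap_threshold=300, left_extension=20, right_extension=150,
        min_channels=2, result_dtype=strax.peak_dtype(n_channels=2))
    print(peaks['time'], peaks['area'])  # e.g. [980] [250.]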

strax.processing.peak_building.integrate_lone_hits(lone_hits, records, peaks, save_outside_hits, n_channels)[source]

Update the area of lone_hits to the integral in ADC counts × samples.

Parameters:
  • lone_hits – Hits outside of peaks

  • records – Records in which hits and peaks were found

  • peaks – Peaks

  • save_outside_hits – (left, right) time (in ns) by which we should extend the integration window of the hits

  • n_channels – Number of channels

TODO: this doesn’t extend the integration range beyond record boundaries

strax.processing.peak_building.store_downsampled_waveform(p, wv_buffer, store_in_data_top=False, wv_buffer_top=array([1.], dtype=float32))[source]

Downsample the waveform in buffer and store it in p[‘data’] and in p[‘data_top’] if indicated to do so.

Parameters:
  • p – Row of a strax peak array, or compatible type. Note that p[‘dt’] is adjusted to match the downsampling.

  • wv_buffer – numpy array containing sum waveform during the peak at the input peak’s sampling resolution p[‘dt’].

  • store_in_data_top – Boolean which indicates whether to also store into p[‘data_top’]

When downsampling results in a fractional number of samples, the peak is shortened rather than extended. This causes data loss, but it is necessary to prevent overlaps between peaks.

strax.processing.peak_building.sum_waveform(peaks, hits, records, record_links, adc_to_pe, n_top_channels=0, select_peaks_indices=None)[source]

Compute sum waveforms for all peaks in peaks. Only builds the summed waveform over regions in which hits were found. This is required to avoid any bias due to zero-padding and baselining. Will downsample sum waveforms if they do not fit in the per-peak buffer.

Parameters:
  • peaks – Peaks for which the summed waveform should be built.

  • hits – Hits which are inside peaks. Must be sorted according to record_i.

  • records – Records to be used to build peaks.

  • record_links – Tuple of previous and next records.

  • n_top_channels – Number of top array channels.

  • select_peaks_indices – Indices of the peaks for partial processing, in the form of np.array([np.int, np.int, ..]). If None (default), all the peaks are used for the summation.

Assumes all peaks AND pulses have the same dt!

strax.processing.peak_merging module

strax.processing.peak_merging.add_lone_hits(peaks, lone_hits, to_pe, n_top_channels=0)[source]

Adds information from lone hits to peaks if the lone hit is inside a peak (e.g. after merging). Modifies peak area and data in place.

Parameters:
  • peaks – Numpy array of peaks

  • lone_hits – Numpy array of lone_hits

  • to_pe – Gain values to convert lone hit area into PE.

  • n_top_channels – Number of top array channels.

strax.processing.peak_merging.merge_peaks(peaks, start_merge_at, end_merge_at, max_buffer=100000)[source]

Merge specified peaks with their neighbors, return merged peaks.

Parameters:
  • peaks – Record array of strax peak dtype.

  • start_merge_at – Indices to start merge at

  • end_merge_at – EXCLUSIVE indices to end merge at

  • max_buffer – Maximum number of samples in the sum_waveforms and other waveforms of the resulting peaks (after merging). Peaks must be constructed based on the properties of constituent peaks, it being too time-consuming to revert to records/hits.

strax.processing.peak_merging.replace_merged(orig, merge)[source]

Return a sorted array of ‘merge’ and the members of ‘orig’ that do not touch any of the merged intervals.

Parameters:
  • orig – Array of interval-like objects (e.g. peaks)

  • merge – Array of interval-like objects (e.g. peaks)

strax.processing.peak_properties module

strax.processing.peak_properties.compute_index_of_fraction(peak, fractions_desired, result)[source]

Store in result the (fractional) indices at which the peak reaches fractions_desired of its area.

Parameters:
  • peak – single strax peak(let) or other data-bearing dtype

  • fractions_desired – array of floats between 0 and 1

Returns:

len(fractions_desired) array of floats

strax.processing.peak_properties.compute_widths(peaks, select_peaks_indices=None)[source]

Compute widths in ns at desired area fractions for peaks.

Parameters:
  • peaks – single strax peak(let) or other data-bearing dtype

  • select_peaks_indices – Array of integers selecting which peaks to compute; defaults to None, in which case widths are computed for all peaks.

strax.processing.peak_properties.index_of_fraction(peaks, fractions_desired)[source]

Return the (fractional) indices at which the peaks reach fractions_desired of their area.

Parameters:
  • peaks – strax peak(let)s or other data-bearing dtype

  • fractions_desired – array of floats between 0 and 1

Returns:

(len(peaks), len(fractions_desired)) array of floats

strax.processing.peak_splitting module

strax.processing.peak_splitting.natural_breaks_gof(w, dt, normalize=False, split_low=False, filter_wing_width=0)[source]

Return the natural breaks goodness of split/fit for the waveform w: a sharp single peak gives ~0, two widely separated peaks give ~1.

strax.processing.peak_splitting.split_peaks(peaks, hits, records, rlinks, to_pe, algorithm='local_minimum', data_type='peaks', n_top_channels=0, **kwargs)[source]

Return peaks split according to algorithm, with waveforms summed and widths computed.

Note:

Can also be used for hitlets splitting with local_minimum splitter. Just put hitlets instead of peaks.

Parameters:
  • peaks – Original peaks. Sum waveform must have been built and properties must have been computed (if you use them).

  • hits – Hits found in records (or None in case of hitlets splitting).

  • records – Records from which peaks were built

  • rlinks – strax.record_links for given records (or None in case of hitlets splitting.)

  • to_pe – ADC to PE conversion factor array (of n_channels)

  • algorithm – ‘local_minimum’ or ‘natural_breaks’.

  • data_type – ‘peaks’ or ‘hitlets’. Specifies whether to use sum_waveform or get_hitlets_data to compute the waveform of the new split peaks/hitlets.

  • n_top_channels – Number of top array channels.

  • result_dtype – dtype of the result.

Any other options are passed to the algorithm.

strax.processing.peak_splitting.symmetric_moving_average(a, wing_width)[source]

Return the moving average of a, over windows of length [2 * wing_width + 1] centered on each sample.

(i.e. the window covers each sample itself, plus a ‘wing’ of width wing_width on either side)

strax.processing.pulse_processing module

Functions that perform processing on pulses (other than data reduction functions, which are in data_reduction.py)

strax.processing.pulse_processing.baseline(records, baseline_samples=40, flip=True, allow_sloppy_chunking=False, fallback_baseline=16000)[source]

Determine baseline as the average of the first baseline_samples of each pulse. Subtract the pulse data from int(baseline), and store the baseline mean and rms.

Parameters:
  • baseline_samples – number of samples at start of pulse to average to determine the baseline.

  • flip – If true, flip sign of data

  • allow_sloppy_chunking – Allow use of the fallback_baseline in case the 0th fragment of a pulse is missing

  • fallback_baseline – Fallback baseline (ADC counts)

Assumes records are sorted in time (or at least by channel, then time). Assumes record_i information is accurate – so don’t cut pulses before baselining them!
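
A minimal sketch: one made-up record whose raw waveform sits at 16000 ADC counts with a small downward-going pulse; after baselining, the data is flipped to positive counts above baseline:

    import numpy as np
    import strax

    records = np.zeros(1, dtype=strax.record_dtype(40))
    records['length'] = 40
    records['pulse_length'] = 40
    records['data'] = 16000
    records['data'][0, 10:15] = 15900   # the pulse dips below baseline

    strax.baseline(records, baseline_samples=10)
    print(records[0]['baseline'])       # e.g. 16000.0
    print(records[0]['data'][8:17])     # e.g. [0 0 100 100 100 100 100 0 0]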

strax.processing.pulse_processing.filter_records(r, ir)[source]

Apply filter with impulse response ir over the records r. Assumes the filter origin is at the impulse response maximum.

Parameters:
  • r – Records to filter.

  • ir – Impulse response, must have odd length. Will normalize.

strax.processing.pulse_processing.filter_waveforms(ws, ir, prev_r, next_r)[source]

Convolve filter with impulse response ir over each row of ws. Assumes the filter origin is at the impulse response maximum.

Parameters:
  • ws – Waveform matrix, must be float

  • ir – Impulse response, must have odd length.

  • prev_r – Previous record map from strax.record_links

  • next_r – Next record map from strax.record_links

strax.processing.pulse_processing.find_hits(records, min_amplitude: int | ndarray = 15, min_height_over_noise: int | ndarray = 0)[source]

Return hits (intervals >= threshold) found in records. Hits that straddle record boundaries are split (perhaps we should fix this?)

NB: returned hits are NOT sorted yet!
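
A minimal sketch on one made-up record containing a square pulse of four samples:

    import numpy as np
    import strax

    records = np.zeros(1, dtype=strax.record_dtype(20))
    records['dt'] = 10
    records['length'] = 20
    records['pulse_length'] = 20
    records['data'][0, 5:9] = 50          # well above the threshold

    hits = strax.find_hits(records, min_amplitude=10)
    print(hits['left'], hits['length'])   # e.g. [5] [4]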

strax.processing.pulse_processing.integrate(records)[source]

Integrate records in-place.

strax.processing.pulse_processing.raw_to_records(raw_records)[source]
strax.processing.pulse_processing.record_length_from_dtype(dtype)[source]
strax.processing.pulse_processing.record_links(records)[source]

Return (prev_r, next_r), each an array of indices of the previous/next record in the same pulse, or -1 if this is not applicable.

strax.processing.pulse_processing.zero_out_of_bounds(records)[source]

Set waveforms to zero out of pulse bounds.

strax.processing.statistics module

strax.processing.statistics.highest_density_region(data, fractions_desired, only_upper_part=False, _buffer_size=10)[source]

Computes, for a given sampled distribution, the highest density region for the desired fractions. Does not assume anything about the normalisation of the data.

Parameters:
  • data – Sampled distribution

  • fractions_desired – numpy.array Area/probability for which the hdr should be computed.

  • _buffer_size – Size of the result buffer. The size is equivalent to the maximal number of allowed intervals.

  • only_upper_part – Boolean, if true only computes area/probability between maximum and current height.

Returns:

Two arrays: the first stores the start and inclusive end index of the highest density region. The second holds the amplitude at which the desired fraction was reached.

Note:

Also goes by the name highest posterior density. Please note that the right edge corresponds to the right side of the sample; hence the corresponding index is reduced by 1.
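
A minimal sketch on a made-up, unnormalized sampled distribution:

    import numpy as np
    from strax.processing.statistics import highest_density_region

    data = np.array([0., 1., 4., 8., 4., 1., 0.])
    intervals, amps = highest_density_region(
        data, fractions_desired=np.array([0.5]))
    print(intervals[0])  # start/end indices of the 50% region(s)
    print(amps[0])       # height at which the 50% fraction was reached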

Module contents