`athena.transform.feats.spectrum`

This module extracts spectrum features per frame.

Module Contents

Classes

Spectrum

Compute spectrum features of every frame in speech.

class athena.transform.feats.spectrum.Spectrum(config: dict)

Bases: athena.transform.feats.base_frontend.BaseFrontend

Compute spectrum features of every frame in speech.

Parameters: config – contains ten optional parameters.

Shape:

output: \((T, F)\).

Examples::

>>> config = {'window_length': 0.25, 'window_type': 'hann'}
>>> spectrum_op = Spectrum.params(config).instantiate()
>>> spectrum_out = spectrum_op('test.wav', 16000)

classmethod params(config=None)

Set params.

Parameters

config – contains the following ten optional parameters:
'window_length' – Window length in seconds. (float, default = 0.025),
'frame_length' – Hop length in seconds. (float, default = 0.010),
'snip_edges' – If 1, the last frame (shorter than window_length) is cut off. If 2, half a frame_length of data is padded onto the data. (int, default = 1),
'preEph_coeff' – Coefficient for use in frame-signal preemphasis. (float, default = 0.97),
'window_type' – Type of window ("hamm"|"hann"|"povey"|"rect"|"blac"|"tria"). (string, default = "povey"),
'remove_dc_offset' – Subtract mean from waveform on each frame. (bool, default = true),
'is_fbank' – If true, compute the power spectrum without frame energy. If false, use the frame energy in place of the square of the constant component of the signal. (bool, default = false),
'output_type' – If 1, return power spectrum. If 2, return log-power spectrum. If 3, return magnitude spectrum. (int, default = 2),
'upper_frequency_limit' – High cutoff frequency for mel bins (if <= 0, offset from Nyquist). (float, default = 0),
'dither' – Dithering constant (0.0 means no dither). (float, default = 1) [adds robustness to training]
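To make the roles of 'remove_dc_offset', 'preEph_coeff', 'window_type' and 'output_type' more concrete, the following is a minimal NumPy sketch of what such a per-frame computation might look like. It is not Athena's implementation: the function name `frame_spectrum`, the use of a Hann window, the zero-padding to the next power of two, and the natural logarithm are all assumptions made for illustration, and 'dither' and 'is_fbank' are left out.

```python
import numpy as np


def frame_spectrum(frame, preemph_coeff=0.97, remove_dc_offset=True, output_type=2):
    """Illustrative spectrum of ONE frame (hypothetical helper, not Athena's kernel)."""
    frame = np.asarray(frame, dtype=np.float64)

    # 'remove_dc_offset': subtract the frame mean.
    if remove_dc_offset:
        frame = frame - frame.mean()

    # 'preEph_coeff': y[n] = x[n] - coeff * x[n-1].
    # Assumption: the first sample uses itself as its predecessor.
    emphasized = np.empty_like(frame)
    emphasized[0] = frame[0] - preemph_coeff * frame[0]
    emphasized[1:] = frame[1:] - preemph_coeff * frame[:-1]

    # 'window_type': a Hann window stands in for the configurable window.
    windowed = emphasized * np.hanning(len(frame))

    # Assumption: the FFT length is the next power of two >= frame length.
    n_fft = 1 << (len(frame) - 1).bit_length()
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2

    # 'output_type': 1 = power, 2 = log-power, 3 = magnitude.
    if output_type == 1:
        return power
    if output_type == 2:
        # Floor the power to avoid log(0); natural log is an assumption.
        return np.log(np.maximum(power, np.finfo(np.float64).eps))
    if output_type == 3:
        return np.sqrt(power)
    raise ValueError("output_type must be 1, 2 or 3")
```

In this sketch the three output types are different views of the same quantity: squaring the magnitude output gives the power output, and exponentiating the log-power output recovers the (floored) power.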

Note

Return an object of class HParams, which is a set of hyperparameters as name-value pairs.

call(audio_data, sample_rate=None)

Calculate power spectrum or log power spectrum of audio data.
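As a rough guide to the \((T, F)\) output shape, the sketch below estimates the number of frames T and frequency bins F from the input length and the framing parameters. The helper `spectrum_output_shape` is hypothetical, and the formulas follow Kaldi-style framing conventions (power-of-two FFT length, F = n_fft / 2 + 1), which this page does not confirm for Athena; treat the numbers as an estimate, not a guarantee.

```python
def spectrum_output_shape(num_samples, sample_rate,
                          window_length=0.025, frame_length=0.010,
                          snip_edges=1):
    """Estimate (T, F) under assumed Kaldi-style framing (hypothetical helper)."""
    win = int(round(window_length * sample_rate))  # window size in samples
    hop = int(round(frame_length * sample_rate))   # hop size in samples

    if snip_edges == 1:
        # Only frames that fit entirely inside the signal are kept.
        num_frames = 0 if num_samples < win else 1 + (num_samples - win) // hop
    else:
        # Assumed behaviour for snip_edges=2: the signal is padded so that
        # roughly one frame is produced per hop.
        num_frames = (num_samples + hop // 2) // hop

    # Assumption: FFT length is the next power of two >= window size.
    n_fft = 1 << (win - 1).bit_length()
    return num_frames, n_fft // 2 + 1


# One second of 16 kHz audio with the default 25 ms window and 10 ms hop.
print(spectrum_output_shape(16000, 16000))                # (98, 257)
print(spectrum_output_shape(16000, 16000, snip_edges=2))  # (100, 257)
```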