athena.transform.feats.fbank

This module extracts Fbank (filter bank) features per frame.

Module Contents

Classes

Fbank

Computes filter banks by applying triangular filters on a Mel scale to the power spectrum.

class athena.transform.feats.fbank.Fbank(config: dict)

Bases: athena.transform.feats.base_frontend.BaseFrontend

Filter banks are computed by applying triangular filters on a Mel scale to the power spectrum, which extracts the energy in each frequency band.
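The idea can be illustrated with a minimal pure-Python sketch. It is not the implementation used by this class: the helper names `hz_to_mel`, `mel_filterbank` and `apply_filterbank` are hypothetical, and the HTK-style mel formula is an assumption about which mel variant is used.

```python
import math


def hz_to_mel(hz):
    # HTK-style mel formula (assumed here; the library may use another variant).
    return 1127.0 * math.log(1.0 + hz / 700.0)


def mel_filterbank(num_bins, num_fft_bins, sample_rate, low_hz=20.0, high_hz=None):
    """Return num_bins triangular filters, each a list of num_fft_bins weights."""
    nyquist = sample_rate / 2.0
    high_hz = nyquist if high_hz is None else high_hz
    mel_low, mel_high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    # num_bins + 2 equally spaced mel points define the triangle corners.
    step = (mel_high - mel_low) / (num_bins + 1)
    filters = []
    for m in range(num_bins):
        left = mel_low + m * step
        center = left + step
        right = center + step
        weights = []
        for k in range(num_fft_bins):
            # Mel value of the k-th FFT bin (bins span 0 Hz .. Nyquist).
            mel = hz_to_mel(k * nyquist / (num_fft_bins - 1))
            if left < mel <= center:
                weights.append((mel - left) / step)   # rising edge
            elif center < mel < right:
                weights.append((right - mel) / step)  # falling edge
            else:
                weights.append(0.0)
        filters.append(weights)
    return filters


def apply_filterbank(power_spectrum, filters):
    # Each output value is the filter-weighted sum of the power spectrum.
    return [sum(w * p for w, p in zip(f, power_spectrum)) for f in filters]


filters = mel_filterbank(num_bins=23, num_fft_bins=257, sample_rate=16000)
energies = apply_filterbank([1.0] * 257, filters)
print(len(energies))  # 23
```

The defaults `num_bins=23` and `low_hz=20.0` mirror the documented defaults of 'filterbank_channel_count' and 'lower_frequency_limit'; the 257-bin spectrum is only an illustrative size.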

Parameters

config – a dict containing thirteen optional parameters.

Shape:
  • output: \((T, F, C)\).

Examples::
>>> from athena.transform.feats.fbank import Fbank
>>> config = {'filterbank_channel_count': 40, 'remove_dc_offset': True}
>>> fbank_op = Fbank.params(config).instantiate()
>>> fbank_out = fbank_op('test.wav', 16000)

classmethod params(config=None)

Set params.

Parameters
  • config – contains the following thirteen optional parameters:

  • 'window_length' – Window length in seconds. (float, default = 0.025)

  • 'frame_length' – Hop length in seconds. (float, default = 0.010)

  • 'snip_edges' – If 1, the last frame (shorter than window_length) is cut off. If 2, 1/2 frame_length of data is padded to the data. (int, default = 1)

  • 'preEph_coeff' – Coefficient for use in frame-signal preemphasis. (float, default = 0.97)

  • 'window_type' – Type of window ("hamm"|"hann"|"povey"|"rect"|"blac"|"tria"). (string, default = "povey")

  • 'remove_dc_offset' – Subtract mean from waveform on each frame. (bool, default = true)

  • 'is_fbank' – If true, compute the power spectrum without frame energy. If false, use the frame energy in place of the square of the constant (DC) component of the signal. (bool, default = true)

  • 'is_log10' – If true, take log10 of the fbank values. If false, use the natural logarithm (log_e). (bool, default = false)

  • 'output_type' – If 1, return power spectrum. If 2, return log-power spectrum. (int, default = 1)

  • 'upper_frequency_limit' – High cutoff frequency for mel bins; a value <= 0 is interpreted as an offset from the Nyquist frequency. (float, default = 0)

  • 'lower_frequency_limit' – Low cutoff frequency for mel bins (float, default = 20)

  • 'filterbank_channel_count' – Number of triangular mel-frequency bins. (float, default = 23)

  • 'dither' – Dithering constant (0.0 means no dither); dithering adds robustness to training. (float, default = 1)
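As a rough illustration of how these options combine, the sketch below collects the documented defaults in a plain dict and merges user overrides into it. `FBANK_DEFAULTS`, `merge_config` and `resolve_upper_limit` are hypothetical names for this example and are not part of athena; the Nyquist-offset rule follows the description of 'upper_frequency_limit' above, read as "Nyquist plus a non-positive offset".

```python
# Defaults copied from the parameter list above (illustrative only).
FBANK_DEFAULTS = {
    "window_length": 0.025,
    "frame_length": 0.010,
    "snip_edges": 1,
    "preEph_coeff": 0.97,
    "window_type": "povey",
    "remove_dc_offset": True,
    "is_fbank": True,
    "is_log10": False,
    "output_type": 1,
    "upper_frequency_limit": 0.0,
    "lower_frequency_limit": 20.0,
    "filterbank_channel_count": 23.0,
    "dither": 1.0,
}


def merge_config(overrides=None):
    """Hypothetical helper: overlay user overrides on the documented defaults."""
    overrides = overrides or {}
    unknown = set(overrides) - set(FBANK_DEFAULTS)
    if unknown:
        raise KeyError("unknown Fbank parameters: %s" % sorted(unknown))
    merged = dict(FBANK_DEFAULTS)
    merged.update(overrides)
    return merged


def resolve_upper_limit(upper_frequency_limit, sample_rate):
    """Assumed rule: values <= 0 are offsets from the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    if upper_frequency_limit <= 0:
        return nyquist + upper_frequency_limit
    return upper_frequency_limit


config = merge_config({"filterbank_channel_count": 40, "remove_dc_offset": True})
print(resolve_upper_limit(config["upper_frequency_limit"], 16000))  # 8000.0
```

Rejecting unknown keys is a design choice of this sketch; it catches misspelled option names such as 'preEph_coeff' early rather than silently ignoring them.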

Note

Returns an object of class HParams, which is a set of hyperparameters as name-value pairs.

call(audio_data, sample_rate)

Calculate fbank features of audio data.

Parameters
  • audio_data – the audio signal from which to compute spectrum.

  • sample_rate – the sample rate of the signal being processed.

Shape:
  • audio_data: \((1, N)\)

  • sample_rate: float
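To relate the input length N to the number of output frames, the following sketch estimates the frame count when the trailing partial frame is dropped (the 'snip_edges' = 1 behaviour). `num_frames` is a hypothetical helper, and the rounding of window and hop sizes to whole samples is an assumption rather than the documented behaviour of this class.

```python
def num_frames(num_samples, sample_rate, window_length=0.025, frame_length=0.010):
    """Frame count when the trailing partial frame is dropped (snip_edges = 1)."""
    window = int(round(window_length * sample_rate))  # samples per window
    hop = int(round(frame_length * sample_rate))      # samples between frame starts
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


# One second of silence at 16 kHz, laid out as (1, N) like audio_data.
audio_data = [[0.0] * 16000]
print(num_frames(len(audio_data[0]), 16000))  # 98
```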

dim()
num_channels()