athena.transform

Package Contents

Classes

AudioFeaturizer

Interface for audio feature extraction. The feature kernels are based on Kaldi and Librosa.

Functions

compute_cmvn(audio_feature[, mean, variance, local_cmvn])

read_wav(wavfile[, audio_channels])

Read a WAV file. Can be called directly, without the ReadWav class.

class athena.transform.AudioFeaturizer(config={'type': 'Fbank'})

Interface for audio feature extraction. The feature kernels are based on Kaldi (Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011) and Librosa.

Transform currently supports the following seven features: Spectrum, MelSpectrum, Framepow, Pitch, Mfcc, Fbank, FbankPitch.

Parameters

config – a dictionary containing the feature-extraction parameters.
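As a minimal sketch of how a config dictionary could be checked before it is handed to AudioFeaturizer, the snippet below validates the 'type' key against the seven feature names listed above. The helper `check_config` and the constant `SUPPORTED_TYPES` are hypothetical and not part of athena; whether athena performs a similar check internally is not stated here. The default `{'type': 'Fbank'}` is taken from the class signature.

```python
# Feature names copied from the list of supported features above.
SUPPORTED_TYPES = (
    "Spectrum", "MelSpectrum", "Framepow", "Pitch", "Mfcc", "Fbank", "FbankPitch",
)


def check_config(config=None):
    """Hypothetical helper: return a copy of config after validating its 'type' key."""
    # Fall back to the default shown in the AudioFeaturizer signature.
    config = dict(config) if config is not None else {"type": "Fbank"}
    feature_type = config.get("type")
    if feature_type not in SUPPORTED_TYPES:
        raise ValueError(
            f"Unsupported feature type {feature_type!r}; expected one of {SUPPORTED_TYPES}"
        )
    return config


fbank_config = check_config({"type": "Fbank", "filterbank_channel_count": 40})
default_config = check_config()
```

A check like this surfaces a misspelled feature name at configuration time instead of during feature extraction.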

Examples::
>>> fbank_op = AudioFeaturizer(config={'type':'Fbank', 'filterbank_channel_count':40,
...     'lower_frequency_limit': 60, 'upper_frequency_limit':7600})
>>> fbank_out = fbank_op('test.wav')

property dim

Return the dimension of the feature. If only ReadWav is used, return 1; otherwise, see the documentation of the feature class.

property num_channels

Return the number of channels of the feature.

__call__(audio=None, sr=None, speed=1.0)

Extract feature from audio data.

Parameters
  • audio – path to a WAV file, or audio data.

  • sr – the sample rate of the signal being processed. (default=None)

  • speed – adjust audio speed. (default=1.0)

Shape:
  • audio: string, or array of shape \((1, L)\).

  • sr: int

  • speed: float

  • output: see the docs in the feature class.
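To make the \((1, L)\) shape convention and the effect of a speed factor concrete, here is a small NumPy sketch. It builds a one-second mono signal as a (1, L) array and resamples it by linear interpolation. The function `change_speed` is hypothetical and only illustrates what a speed factor does to the signal length; how athena adjusts speed internally, and which dtype or scaling it expects for array input, are not specified here.

```python
import numpy as np


def change_speed(signal: np.ndarray, speed: float) -> np.ndarray:
    """Hypothetical sketch: resample a (1, L) signal so it plays `speed` times faster."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    length = signal.shape[1]
    # A faster signal has fewer samples at the same sample rate.
    new_length = max(1, int(round(length / speed)))
    old_positions = np.arange(length)
    new_positions = np.linspace(0, length - 1, num=new_length)
    resampled = np.interp(new_positions, old_positions, signal[0])
    return resampled[np.newaxis, :]  # restore the leading axis of size 1


sr = 16000                                             # sample rate in Hz
t = np.arange(sr) / sr                                 # one second of time stamps
audio = np.sin(2 * np.pi * 440.0 * t)[np.newaxis, :]   # shape (1, L) with L = 16000
faster = change_speed(audio, speed=1.1)                # shorter signal, same sample rate

# With athena installed, the call documented above would look like:
# feature = fbank_op(audio, sr=sr, speed=1.1)
```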

__impl(audio=None, sr=None, speed=1.0)

Call the feature op to extract features.

Parameters
  • audio – a tensor of audio data or audio file

  • sr – a tensor of sample rate

athena.transform.compute_cmvn(audio_feature, mean=None, variance=None, local_cmvn=False)

Apply cepstral mean and variance normalization (CMVN) to audio_feature.

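Since the compute_cmvn entry carries no further description, the following NumPy sketch shows what cepstral mean and variance normalization conventionally does. It is a textbook version, not athena's implementation: the function name `cmvn_sketch`, the `eps` term, and the choice to divide by the square root of the variance are assumptions. Global statistics (`mean`, `variance`) are applied if given, and `local=True` additionally normalizes each feature dimension over the frames of the utterance.

```python
import numpy as np


def cmvn_sketch(features, mean=None, variance=None, local=False, eps=1e-8):
    """Hypothetical textbook CMVN on a (frames, dims) feature matrix."""
    features = np.asarray(features, dtype=np.float64)
    if mean is not None and variance is not None:
        # Global CMVN: statistics estimated beforehand, e.g. over a training set.
        features = (features - mean) / np.sqrt(variance + eps)
    if local:
        # Local CMVN: statistics computed over the frames of this utterance only.
        features = (features - features.mean(axis=0)) / (features.std(axis=0) + eps)
    return features


rng = np.random.default_rng(0)
feats = rng.normal(loc=5.0, scale=3.0, size=(100, 40))   # 100 frames, 40 dimensions
local_out = cmvn_sketch(feats, local=True)
global_out = cmvn_sketch(feats, mean=feats.mean(axis=0), variance=feats.var(axis=0))
```

After either call, each feature dimension has approximately zero mean and unit variance across frames.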
athena.transform.read_wav(wavfile, audio_channels=1)

Read a WAV file. Can be called directly, without the ReadWav class.

Examples::
>>> audio_data, sample_rate = read_wav('test.wav', audio_channels=1)
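The examples above assume that a file named test.wav exists. One way to produce such a file for experimentation, using only the Python standard library, is sketched below. The helper `write_test_wav`, the 440 Hz tone, and the 16 kHz, 16-bit mono format are illustrative assumptions; the formats that read_wav accepts are not listed here.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_wav(path, sample_rate=16000, seconds=1.0, freq=440.0):
    """Hypothetical helper: write a mono 16-bit PCM sine tone and return the sample count."""
    num_samples = int(sample_rate * seconds)
    # Half of full scale, packed as little-endian signed 16-bit integers.
    frames = b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(num_samples)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono, matching audio_channels=1
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return num_samples


wav_path = os.path.join(tempfile.mkdtemp(), "test.wav")
num_samples = write_test_wav(wav_path)

# Read the header back with the standard library to confirm the format.
with wave.open(wav_path, "rb") as wav:
    info = (wav.getnchannels(), wav.getframerate(), wav.getnframes())

# With athena installed, the call from the example above would be:
# audio_data, sample_rate = read_wav(wav_path, audio_channels=1)
```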