athena.transform
Subpackages
athena.transform.feats
athena.transform.feats.ops
athena.transform.feats.add_rir_noise_aecres
athena.transform.feats.add_rir_noise_aecres_test
athena.transform.feats.base_frontend
athena.transform.feats.cmvn
athena.transform.feats.cmvn_test
athena.transform.feats.fbank
athena.transform.feats.fbank_pitch
athena.transform.feats.fbank_pitch_test
athena.transform.feats.fbank_test
athena.transform.feats.framepow
athena.transform.feats.framepow_test
athena.transform.feats.mel_spectrum
athena.transform.feats.mel_spectrum_test
athena.transform.feats.mfcc
athena.transform.feats.mfcc_test
athena.transform.feats.pitch
athena.transform.feats.pitch_test
athena.transform.feats.read_wav
athena.transform.feats.read_wav_test
athena.transform.feats.spectrum
athena.transform.feats.spectrum_test
athena.transform.feats.write_wav
athena.transform.feats.write_wav_test
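Submodules
Package Contents
Classes
AudioFeaturizer | Interface for audio feature extraction. The feature kernels are based on Kaldi and Librosa.
Functions
compute_cmvn(audio_feature, mean=None, variance=None, local_cmvn=False) |
read_wav(wavfile, audio_channels=1) | Read wav from file. Can be called directly without the ReadWav class.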
- class athena.transform.AudioFeaturizer(config={'type': 'Fbank'})
Interface for audio feature extraction. The feature kernels are based on Kaldi (Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011) and Librosa.
Transform currently supports the following seven features: Spectrum, MelSpectrum, Framepow, Pitch, Mfcc, Fbank, FbankPitch.
- Parameters
config – a dictionary containing the feature extraction parameters.
- Examples::
>>> from athena.transform import AudioFeaturizer
>>> fbank_op = AudioFeaturizer(config={'type': 'Fbank', 'filterbank_channel_count': 40,
...                                    'lower_frequency_limit': 60, 'upper_frequency_limit': 7600})
>>> fbank_out = fbank_op('test.wav')
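The configuration dictionary can be checked before it is handed to the featurizer. The following is a minimal sketch and not part of Athena: `SUPPORTED_TYPES` and `make_config` are hypothetical names, and it assumes that the `type` value must match one of the seven feature names listed above exactly.

```python
# Hypothetical helper, not part of Athena: builds a config dictionary and
# rejects feature types that are not in the documented list of seven.
SUPPORTED_TYPES = frozenset(
    ["Spectrum", "MelSpectrum", "Framepow", "Pitch", "Mfcc", "Fbank", "FbankPitch"]
)


def make_config(feature_type="Fbank", **params):
    """Return a config dict such as {'type': 'Fbank', ...}."""
    if feature_type not in SUPPORTED_TYPES:
        raise ValueError(
            "unsupported feature type %r; expected one of %s"
            % (feature_type, sorted(SUPPORTED_TYPES))
        )
    config = {"type": feature_type}
    config.update(params)  # extra keys are passed through unchanged
    return config
```

The returned dictionary has the same form as the `config` argument in the example above, so it could be passed to `AudioFeaturizer(config=...)` directly.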
- property dim
Return the dimension of the feature. If the featurizer is only ReadWav, return 1; otherwise, see the docs of the feature class.
- property num_channels
Return the number of channels of the feature.
- __call__(audio=None, sr=None, speed=1.0)
Extract feature from audio data.
- Parameters
audio – filename of a wav file, or audio data.
sr – the sample rate of the signal being processed. (default=None)
speed – factor by which to adjust the audio speed. (default=1.0)
- Shape:
audio: string, or array of shape \((1, L)\).
sr: int
speed: float
output: see the docs in the feature class.
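The `speed` argument changes the playback speed of the audio before features are extracted. How Athena resamples internally is not described in this section. As an illustration only, the sketch below changes speed by linear interpolation over a plain Python list of samples; `change_speed` is a hypothetical name, and linear interpolation is an assumption rather than the library's method.

```python
def change_speed(samples, speed=1.0):
    """Resample `samples` so that the signal plays `speed` times faster.

    Illustrative only: linear interpolation is an assumption and is not
    necessarily what Athena does.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if len(samples) < 2 or speed == 1.0:
        return list(samples)
    out_len = max(1, int(len(samples) / speed))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * speed                # fractional position in the input
        left = min(int(pos), last)     # nearest input sample at or before pos
        right = min(left + 1, last)    # next input sample, clamped at the end
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

With `speed=2.0` the output has half as many samples as the input; with `speed=0.5` it has twice as many.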
- __impl(audio=None, sr=None, speed=1.0)
Call the feature OP to extract features.
- Parameters
audio – a tensor of audio data or audio file
sr – a tensor of sample rate
- athena.transform.compute_cmvn(audio_feature, mean=None, variance=None, local_cmvn=False)
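As general background, cepstral mean and variance normalization (CMVN) subtracts a mean from each feature dimension and divides by the corresponding standard deviation. The sketch below is a generic pure-Python illustration, not Athena's implementation: `cmvn_sketch` and `eps` are hypothetical names, and treating `mean` and `variance` as precomputed global statistics, with a fallback to per-utterance statistics, is an assumption about what those parameters mean.

```python
import math


def cmvn_sketch(frames, mean=None, variance=None, eps=1e-10):
    """Normalize a list of feature frames (each frame is a list of floats).

    If `mean` and `variance` are both given, they are used as global
    statistics; otherwise both are computed over the frames passed in.
    """
    if not frames:
        return []
    dim = len(frames[0])
    n = len(frames)
    if mean is None or variance is None:
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        variance = [
            sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)
        ]
    # eps guards against division by zero for constant dimensions.
    scale = [1.0 / math.sqrt(v + eps) for v in variance]
    return [[(f[d] - mean[d]) * scale[d] for d in range(dim)] for f in frames]
```

After per-utterance normalization, each feature dimension has approximately zero mean and unit variance.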
- athena.transform.read_wav(wavfile, audio_channels=1)
Read wav from file. Can be called directly without ReadWav class.
- Examples::
>>> audio_data, sample_rate = read_wav('test.wav', audio_channels=1)
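For comparison, a wav file can also be read with Python's standard `wave` module. The sketch below is not Athena's `read_wav`: `read_wav_sketch` is a hypothetical name, it handles only 16-bit PCM files, and returning raw integer samples grouped by channel is a choice made for this illustration rather than the library's output format.

```python
import struct
import wave


def read_wav_sketch(wavfile, audio_channels=1):
    """Read a 16-bit PCM wav file with the standard library.

    Returns (channels, sample_rate), where `channels` holds one list of
    integer samples per channel.
    """
    with wave.open(wavfile, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        if wf.getnchannels() != audio_channels:
            raise ValueError(
                "expected %d channel(s), found %d"
                % (audio_channels, wf.getnchannels())
            )
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Samples are little-endian signed 16-bit, interleaved across channels.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    channels = [list(values[c::audio_channels]) for c in range(audio_channels)]
    return channels, sample_rate
```

Like the documented function, the sketch returns the audio data together with the sample rate, so it can be unpacked as `audio_data, sample_rate = read_wav_sketch('test.wav')`.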