athena.data.datasets.preprocess¶
Preprocessing for speech features.
This code is modified from https://github.com/wenet-e2e/wenet.git.
Module Contents¶
Classes¶
SpecAugment — Implementation of SpecAugment from the paper “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”
- class athena.data.datasets.preprocess.SpecAugment(preprocess_config)¶
Implementation of SpecAugment from the paper “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”
- Parameters
preprocess_config –
it contains configs below:
time_warping: warp parameter W, should be in (0, time_steps / 2); a random horizontal center point in (W, time_steps - W) is warped either left or right by a distance chosen randomly from [0, W).
time_masking: masked time range T, should be in (0, time_steps); the masked time steps are [t_0, t_0 + t), where t is drawn randomly from [0, T) and t_0 from [0, time_steps - t).
frequency_masking: masked frequency range F, should be in (0, dimension); the masked frequencies are [f_0, f_0 + f), where f is drawn randomly from [0, F) and f_0 from [0, dimension - f).
mask_cols: the masking operation is executed mask_cols times on each axis.
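The parameters above can be gathered into a plain config dictionary. The values below are illustrative assumptions for an 80-dimensional fbank input, not defaults taken from athena:

```python
# Hypothetical preprocess_config for SpecAugment; the key names follow the
# parameters documented above, but the concrete values are only examples.
preprocess_config = {
    "time_warping": 5,        # W: warp distance drawn from [0, 5)
    "time_masking": 50,       # T: each time-mask width drawn from [0, 50)
    "frequency_masking": 10,  # F: each frequency-mask width drawn from [0, 10)
    "mask_cols": 2,           # apply each mask type twice per axis
}
```

Values must respect the ranges stated above, e.g. frequency_masking must stay below the feature dimension.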
- default_config¶
- __call__(feat)¶
Applies SpecAugment preprocessing to audio features.
- Parameters
feat – audio features, shape should be [time_steps, dimension, channels]
- Returns
processed features
- spec_augmentation(x)¶
Deep-copies x, applies spec augmentation to the copy, and returns it.
- Parameters
x – input feature, a 2-D array of shape [time_steps, dimension] (T * F)
num_t_mask – number of time mask to apply
num_f_mask – number of freq mask to apply
max_t – max width of time mask
max_f – max width of freq mask
max_w – max width of time warp
- Returns
augmented feature
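The masking part of spec_augmentation can be sketched in NumPy as follows. This is a simplified illustration of the documented behavior (deep copy, then zero out random time and frequency stripes), not athena's actual implementation; time warping is omitted for brevity:

```python
import numpy as np

def spec_augmentation_sketch(x, num_t_mask=2, num_f_mask=2, max_t=50, max_f=10):
    """Deep-copy x (a [T, F] 2-D array) and apply time/frequency masking."""
    y = np.copy(x)  # deep copy so the input feature is left untouched
    t_len, f_len = y.shape
    rng = np.random.default_rng()
    # time masking: zero out [t0, t0 + t) along the time axis
    for _ in range(num_t_mask):
        t = int(rng.integers(1, max_t + 1))
        t0 = int(rng.integers(0, max(1, t_len - t)))
        y[t0:t0 + t, :] = 0.0
    # frequency masking: zero out [f0, f0 + f) along the feature axis
    for _ in range(num_f_mask):
        f = int(rng.integers(1, max_f + 1))
        f0 = int(rng.integers(0, max(1, f_len - f)))
        y[:, f0:f0 + f] = 0.0
    return y
```

Because the function operates on a copy, the original feature can still be reused, e.g. for on-the-fly augmentation where each epoch draws fresh masks.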