athena.data.datasets.preprocess

Preprocessing for speech features.

This code is modified from https://github.com/wenet-e2e/wenet.git.

Module Contents

Classes

SpecAugment

Implementation of SpecAugment from the paper "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

class athena.data.datasets.preprocess.SpecAugment(preprocess_config)

Implementation of SpecAugment from the paper “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”

Parameters

preprocess_config

a config object containing the entries below:

time_warping: time-warp parameter W, should be in (0, time_steps / 2);
    a random horizontal center point in (W, time_steps - W) is warped
    either left or right by a distance chosen randomly from [0, W).
time_masking: time-mask parameter T, should be in (0, time_steps);
    the masked time steps are [t_0, t_0 + t), where t is drawn randomly
    from [0, T) and t_0 is drawn randomly from [0, time_steps - t).
frequency_masking: frequency-mask parameter F, should be in (0, dimension);
    the masked frequencies are [f_0, f_0 + f), where f is drawn randomly
    from [0, F) and f_0 is drawn randomly from [0, dimension - f).
mask_cols: the masking operation is applied mask_cols times on each axis.
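A config following the entries above might look like the sketch below; the key names mirror the parameter descriptions here, and the numeric values are illustrative assumptions, not Athena defaults.

```python
# Hypothetical preprocess_config for SpecAugment; values are examples only.
preprocess_config = {
    "time_warping": 5,        # W: warp distance drawn from [0, W)
    "time_masking": 30,       # T: time-mask width t drawn from [0, T)
    "frequency_masking": 15,  # F: freq-mask width f drawn from [0, F)
    "mask_cols": 2,           # apply each mask twice per axis
}
```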

default_config
__call__(feat)

Apply SpecAugment preprocessing to audio features.

Parameters

feat – audio features, shape should be [time_steps, dimension, channels]

Returns

processed features
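To illustrate the time-warping step described in the config (a random center point shifted left or right by up to W frames), here is a minimal NumPy sketch; it is not Athena's or WeNet's implementation, and the helper name time_warp is assumed for illustration.

```python
import numpy as np

def time_warp(x, max_w=5, rng=None):
    """Warp a [time_steps, dimension] feature along the time axis (sketch).

    A center point is chosen in (W, time_steps - W) and moved left or
    right by a distance in [0, W); the two segments are resampled by
    nearest-neighbor index remapping so the output shape is unchanged.
    """
    rng = rng if rng is not None else np.random.default_rng()
    t = x.shape[0]
    if max_w < 1 or t - 2 * max_w <= 0:
        return x  # feature too short to warp
    center = int(rng.integers(max_w, t - max_w))
    delta = int(rng.integers(0, max_w))       # warp distance in [0, W)
    if rng.random() < 0.5:
        delta = -delta                        # warp left or right
    new_center = center + delta
    # Monotone index map: [0, center] -> [0, new_center],
    # [center, t-1] -> [new_center, t-1]; then round to frame indices.
    src = np.concatenate([
        np.linspace(0, center, new_center + 1),
        np.linspace(center, t - 1, t - new_center)[1:],
    ])
    idx = np.clip(np.round(src).astype(int), 0, t - 1)
    return x[idx]
```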

spec_augmentation(x)

Deep-copy x, apply spec augmentation to the copy, and return the result.

Parameters
  • x – input feature, a 2-D array of shape T * F

  • num_t_mask – number of time mask to apply

  • num_f_mask – number of freq mask to apply

  • max_t – max width of time mask

  • max_f – max width of freq mask

  • max_w – max width of time warp

Returns

augmented feature
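The masking behavior described above (deep copy, then num_t_mask time masks of width up to max_t and num_f_mask frequency masks of width up to max_f) can be sketched with NumPy as follows; this is a hedged illustration of the technique, not Athena's actual code, and the function name is assumed.

```python
import numpy as np

def spec_augmentation_sketch(x, num_t_mask=2, num_f_mask=2,
                             max_t=50, max_f=10, rng=None):
    """Apply time and frequency masking to a [T, F] feature (sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    y = x.copy()  # deep copy so the input feature is left untouched
    time_steps, dim = y.shape
    # Time masks: zero out [t_0, t_0 + t), t drawn from [0, max_t)
    for _ in range(num_t_mask):
        t = int(rng.integers(0, max_t))
        t0 = int(rng.integers(0, max(1, time_steps - t)))
        y[t0:t0 + t, :] = 0.0
    # Frequency masks: zero out [f_0, f_0 + f), f drawn from [0, max_f)
    for _ in range(num_f_mask):
        f = int(rng.integers(0, max_f))
        f0 = int(rng.integers(0, max(1, dim - f)))
        y[:, f0:f0 + f] = 0.0
    return y
```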