athena.models.asr.mtl_seq2seq

An implementation of a multi-task model with attention and CTC loss

Module Contents

Classes

MtlTransformerCtc

In speech recognition, adding a CTC loss to an attention-based seq-to-seq model is known to help convergence.

class athena.models.asr.mtl_seq2seq.MtlTransformerCtc(data_descriptions, config=None)

Bases: athena.models.base.BaseModel

In speech recognition, adding a CTC loss to an attention-based sequence-to-sequence model is known to help convergence. It usually gives better results than using attention alone.
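For intuition, a minimal sketch of the multi-task objective behind this class: the attention (seq-to-seq) loss and the CTC loss are interpolated with a weight. The function name below and the convention that the weight scales the CTC term are illustrative assumptions; the actual weighting is implemented in get_loss().

    def joint_mtl_loss(attention_loss, ctc_loss, mtl_weight=0.5):
        # Illustrative assumption: mtl_weight scales the CTC term and
        # (1 - mtl_weight) scales the attention term.
        return mtl_weight * ctc_loss + (1.0 - mtl_weight) * attention_loss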

SUPPORTED_MODEL
default_config
call(samples, training=None)

Call function of the Keras layer; runs the forward pass on a batch of samples.

get_loss(outputs, samples, training=None)

Compute the loss used for training.
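A hedged sketch of how call() and get_loss() are typically driven in a training step; the assumption that get_loss() returns a (loss, metrics) pair, and the optimizer plumbing, are illustrative rather than taken from this page.

    import tensorflow as tf

    def train_step(model, optimizer, samples):
        with tf.GradientTape() as tape:
            outputs = model(samples, training=True)  # forward pass through call()
            loss, metrics = model.get_loss(outputs, samples, training=True)  # assumed return shape
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss, metrics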

compute_logit_length(input_length)

Compute the encoder output (logit) length from the input feature length.
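A minimal sketch of this length mapping, assuming a total frontend subsampling factor of 4 (two stride-2 convolutions, a common setup); the real factor comes from the model's own frontend (see get_subsample_rate() below).

    import tensorflow as tf

    def compute_logit_length_sketch(input_length, subsample_rate=4):
        # Illustrative: ceil-divide the frame count by the frontend subsampling rate.
        logit_length = tf.math.ceil(tf.cast(input_length, tf.float32) / subsample_rate)
        return tf.cast(logit_length, tf.int32)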

reset_metrics()

reset the metrics

restore_from_pretrained_model(pretrained_model, model_type='')

A more general-purpose interface for pretrained model restoration.

Parameters
  • pretrained_model – checkpoint path of the mpc model

  • model_type – the type of pretrained model to restore
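A hedged usage sketch of restoring weights from an MPC pretraining run before ASR training; the helper that loads the pretrained model and the "mpc" model_type string are illustrative assumptions.

    # Illustrative only: obtain the pretrained MPC model (helper name is hypothetical),
    # then hand it to the restore interface together with its type string.
    pretrained_mpc = build_pretrained_mpc_model(ckpt_dir)  # hypothetical helper
    model.restore_from_pretrained_model(pretrained_mpc, model_type="mpc")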

_forward_encoder_log_ctc(samples, training: bool = None)
decode(samples, hparams, lm_model=None)

Initialization of the model for decoding; the decoder is called here to create predictions.

Parameters
  • samples – the data source to be decoded

  • hparams – decoding configs are included here

  • lm_model – the language model used during decoding

Returns:

predictions: the corresponding decoding results
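A hedged decoding sketch; the dataset iteration and the idea that hparams carries the decoding settings while lm_model is an optional language model follow the parameter list above, but the surrounding variable names are illustrative.

    # Illustrative decoding loop; test_dataset, hparams and lm_model are assumed
    # to be built elsewhere by the experiment setup.
    for samples in test_dataset:
        predictions = model.decode(samples, hparams, lm_model=lm_model)
        # predictions holds the decoded results for this batch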
enable_tf_funtion()
ctc_forward_chunk_freeze(encoder_out)
encoder_ctc_forward_chunk_freeze(chunk_xs, offset, required_cache_size, subsampling_cache, elayers_output_cache, conformer_cnn_cache)
encoder_forward_chunk_freeze(chunk_xs, offset, required_cache_size, subsampling_cache, elayers_output_cache, conformer_cnn_cache)
get_subsample_rate()
get_init()
encoder_forward_chunk_by_chunk_freeze(speech: tensorflow.Tensor, decoding_chunk_size: int, num_decoding_left_chunks: int = -1) → Tuple[tensorflow.Tensor, tensorflow.Tensor]
Forward input chunk by chunk with chunk_size in a streaming fashion.

Here we should pay special attention to the computation caches used in this streaming, chunk-by-chunk forward pass. Three things should be taken into account in the current network:

  1. transformer/conformer encoder layers output cache

  2. convolution in conformer

  3. convolution in subsampling

However, we don't implement a subsampling cache, for three reasons:
  1. We can make the subsampling module output the right result by overlapping the input instead of caching the left context. This wastes some computation, but subsampling takes only a very small fraction of the computation in the whole model.

  2. Typically, the subsampling module contains several convolution layers with subsampling, and it is tricky and complicated to cache across convolution layers with different subsampling rates.

  3. Currently, nn.Sequential is used to stack all the convolution layers in subsampling, so we would need to rewrite it to make it work with a cache, which is not preferred.
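A minimal sketch of the streaming loop this method implements, feeding fixed-size chunks and carrying the caches listed above; the cache handling mirrors the encoder_forward_chunk_freeze() signature, but the slicing arithmetic, the value returned by get_init(), and the returned tuple layout are illustrative assumptions, not the model's exact code.

    import tensorflow as tf

    def streaming_forward_sketch(model, speech, decoding_chunk_size,
                                 num_decoding_left_chunks=-1):
        # Illustrative assumptions: get_init() yields the initial caches and
        # encoder_forward_chunk_freeze() returns (output, updated caches).
        subsample_rate = model.get_subsample_rate()
        stride = decoding_chunk_size * subsample_rate  # raw frames consumed per chunk
        required_cache_size = decoding_chunk_size * num_decoding_left_chunks
        subsampling_cache, elayers_output_cache, conformer_cnn_cache = model.get_init()
        offset, outputs = 0, []
        num_frames = int(tf.shape(speech)[1])
        for start in range(0, num_frames, stride):
            # Real code overlaps chunks slightly so the subsampling convolutions
            # see enough context; this sketch ignores that detail.
            chunk_xs = speech[:, start:start + stride, :]
            (out, subsampling_cache, elayers_output_cache,
             conformer_cnn_cache) = model.encoder_forward_chunk_freeze(
                chunk_xs, offset, required_cache_size,
                subsampling_cache, elayers_output_cache, conformer_cnn_cache)
            offset += int(tf.shape(out)[1])
            outputs.append(out)
        return tf.concat(outputs, axis=1)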

Parameters
  • speech (tf.Tensor) – input features of shape (1, max_len, dim)

  • decoding_chunk_size (int) – decoding chunk size