athena.models.asr.mtl_seq2seq

An implementation of a multi-task model with attention and CTC loss

Module Contents

Classes

MtlTransformerCtc

In speech recognition, adding a CTC loss to an attention-based seq-to-seq model is known to help convergence.

class athena.models.asr.mtl_seq2seq.MtlTransformerCtc(data_descriptions, config=None)

Bases: athena.models.base.BaseModel

In speech recognition, adding a CTC loss to an attention-based sequence-to-sequence model is known to help convergence. It usually gives better results than using attention alone.
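For intuition, a minimal sketch of the multi-task objective behind this class: the attention (seq-to-seq) loss and the CTC loss are interpolated with a weight. The function name below and the convention that the weight scales the CTC term are illustrative assumptions; the actual weighting is implemented in get_loss().

    def joint_mtl_loss(attention_loss, ctc_loss, mtl_weight=0.5):
        # Illustrative assumption: mtl_weight scales the CTC term and
        # (1 - mtl_weight) scales the attention term.
        return mtl_weight * ctc_loss + (1.0 - mtl_weight) * attention_loss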

SUPPORTED_MODEL
default_config
call(samples, training=None)

Call function of the Keras layer; runs the forward pass on a batch of samples.

get_loss(outputs, samples, training=None)

Compute the loss used for training.
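A hedged sketch of how call() and get_loss() are typically driven in a training step; the assumption that get_loss() returns a (loss, metrics) pair, and the optimizer plumbing, are illustrative rather than taken from this page.

    import tensorflow as tf

    def train_step(model, optimizer, samples):
        with tf.GradientTape() as tape:
            outputs = model(samples, training=True)  # forward pass through call()
            loss, metrics = model.get_loss(outputs, samples, training=True)  # assumed return shape
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss, metrics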

compute_logit_length(input_length)

Compute the encoder output (logit) length from the input feature length.
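A minimal sketch of this length mapping, assuming a total frontend subsampling factor of 4 (two stride-2 convolutions, a common setup); the real factor comes from the model's own frontend (see get_subsample_rate() below).

    import tensorflow as tf

    def compute_logit_length_sketch(input_length, subsample_rate=4):
        # Illustrative: ceil-divide the frame count by the frontend subsampling rate.
        logit_length = tf.math.ceil(tf.cast(input_length, tf.float32) / subsample_rate)
        return tf.cast(logit_length, tf.int32)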

reset_metrics()

reset the metrics

restore_from_pretrained_model(pretrained_model, model_type='')

A more general-purpose interface for pretrained model restoration.

Parameters
  • pretrained_model – checkpoint path of the mpc model

  • model_type – the type of pretrained model to restore
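A hedged usage sketch of restoring weights from an MPC pretraining run before ASR training; the helper that loads the pretrained model and the "mpc" model_type string are illustrative assumptions.

    # Illustrative only: obtain the pretrained MPC model (helper name is hypothetical),
    # then hand it to the restore interface together with its type string.
    pretrained_mpc = build_pretrained_mpc_model(ckpt_dir)  # hypothetical helper
    model.restore_from_pretrained_model(pretrained_mpc, model_type="mpc")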

_forward_encoder_log_ctc(samples, training: bool = None)
decode(samples, hparams, lm_model=None)

Initialization of the model for decoding; the decoder is called here to create predictions.

Parameters
  • samples – the data source to be decoded

  • hparams – decoding configs are included here

  • lm_model – the language model used during decoding

Returns:

predictions: the corresponding decoding results
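A hedged decoding sketch; the dataset iteration and the idea that hparams carries the decoding settings while lm_model is an optional language model follow the parameter list above, but the surrounding variable names are illustrative.

    # Illustrative decoding loop; test_dataset, hparams and lm_model are assumed
    # to be built elsewhere by the experiment setup.
    for samples in test_dataset:
        predictions = model.decode(samples, hparams, lm_model=lm_model)
        # predictions holds the decoded results for this batch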
enable_tf_funtion()
ctc_forward_chunk_freeze(encoder_out)
encoder_ctc_forward_chunk_freeze(chunk_xs, offset, required_cache_size, subsampling_cache, elayers_output_cache, conformer_cnn_cache)
encoder_forward_chunk_freeze(chunk_xs, offset, required_cache_size, subsampling_cache, elayers_output_cache, conformer_cnn_cache)
get_subsample_rate()
get_init()
encoder_forward_chunk_by_chunk_freeze(speech: tensorflow.Tensor, decoding_chunk_size: int, num_decoding_left_chunks: int = -1) → Tuple[tensorflow.Tensor, tensorflow.Tensor]
Forward input chunk by chunk with chunk_size in a streaming fashion.

Here we should pay special attention to the computation caches used in this streaming, chunk-by-chunk forward pass. Three things should be taken into account in the current network:

  1. transformer/conformer encoder layers output cache

  2. convolution in conformer

  3. convolution in subsampling

However, we don't implement a subsampling cache, for three reasons:
  1. We can make the subsampling module output the right result by overlapping the input instead of caching the left context. This wastes some computation, but subsampling takes only a very small fraction of the computation in the whole model.

  2. Typically, the subsampling module contains several convolution layers with subsampling, and it is tricky and complicated to cache across convolution layers with different subsampling rates.

  3. Currently, nn.Sequential is used to stack all the convolution layers in subsampling, so we would need to rewrite it to make it work with a cache, which is not preferred.
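A minimal sketch of the streaming loop this method implements, feeding fixed-size chunks and carrying the caches listed above; the cache handling mirrors the encoder_forward_chunk_freeze() signature, but the slicing arithmetic, the value returned by get_init(), and the returned tuple layout are illustrative assumptions, not the model's exact code.

    import tensorflow as tf

    def streaming_forward_sketch(model, speech, decoding_chunk_size,
                                 num_decoding_left_chunks=-1):
        # Illustrative assumptions: get_init() yields the initial caches and
        # encoder_forward_chunk_freeze() returns (output, updated caches).
        subsample_rate = model.get_subsample_rate()
        stride = decoding_chunk_size * subsample_rate  # raw frames consumed per chunk
        required_cache_size = decoding_chunk_size * num_decoding_left_chunks
        subsampling_cache, elayers_output_cache, conformer_cnn_cache = model.get_init()
        offset, outputs = 0, []
        num_frames = int(tf.shape(speech)[1])
        for start in range(0, num_frames, stride):
            # Real code overlaps chunks slightly so the subsampling convolutions
            # see enough context; this sketch ignores that detail.
            chunk_xs = speech[:, start:start + stride, :]
            (out, subsampling_cache, elayers_output_cache,
             conformer_cnn_cache) = model.encoder_forward_chunk_freeze(
                chunk_xs, offset, required_cache_size,
                subsampling_cache, elayers_output_cache, conformer_cnn_cache)
            offset += int(tf.shape(out)[1])
            outputs.append(out)
        return tf.concat(outputs, axis=1)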

Parameters
  • speech (tf.Tensor) – input features of shape (1, max_len, dim)

  • decoding_chunk_size (int) – decoding chunk size