athena.models.asr.mtl_seq2seq¶
An implementation of a multi-task model with attention and CTC loss.
Module Contents¶
Classes¶
MtlTransformerCtc – In speech recognition, adding CTC loss to an attention-based seq-to-seq model is known to help convergence.
- class athena.models.asr.mtl_seq2seq.MtlTransformerCtc(data_descriptions, config=None)¶
Bases: athena.models.base.BaseModel
In speech recognition, adding CTC loss to an attention-based seq-to-seq model is known to help convergence. It usually gives better results than using attention alone.
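At training time the two losses are interpolated. A minimal sketch of the idea, assuming a mixing weight named mtl_weight (a hypothetical name for illustration; the actual key lives in default_config):

    import tensorflow as tf

    def mtl_loss(ctc_loss: tf.Tensor, attention_loss: tf.Tensor,
                 mtl_weight: float = 0.5) -> tf.Tensor:
        # Weighted interpolation of the two objectives; mtl_weight=1.0
        # recovers pure CTC training, 0.0 pure attention training.
        return mtl_weight * ctc_loss + (1.0 - mtl_weight) * attention_loss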
- SUPPORTED_MODEL¶
- default_config¶
- call(samples, training=None)¶
The call function, as in Keras layers; runs the forward computation on samples.
- get_loss(outputs, samples, training=None)¶
Get the loss used for training.
- compute_logit_length(input_length)¶
Compute the logit length from the input length.
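For intuition: with a convolutional front end that subsamples by a factor of 4 (two stride-2 convolutions), the logit length is roughly the input length divided by 4. A minimal sketch under that assumption; the actual rate depends on the configured inner model:

    import tensorflow as tf

    def compute_logit_length(input_length: tf.Tensor) -> tf.Tensor:
        # Each stride-2 convolution roughly halves the frame count;
        # ceil keeps at least one frame for very short inputs.
        length = tf.math.ceil(tf.cast(input_length, tf.float32) / 2)
        length = tf.math.ceil(length / 2)
        return tf.cast(length, tf.int32)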
- reset_metrics()¶
Reset the metrics.
- restore_from_pretrained_model(pretrained_model, model_type='')¶
A more general-purpose interface for restoring a pretrained model.
- Parameters
pretrained_model – checkpoint path of the MPC model
model_type – the type of pretrained model to restore
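A brief usage sketch; the "mpc" type string is an assumption for illustration:

    # Warm-start the model from a pretrained MPC checkpoint
    # (the exact model_type strings depend on the implementation).
    model.restore_from_pretrained_model(pretrained_model, model_type="mpc")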
- _forward_encoder_log_ctc(samples, training: bool = None)¶
- decode(samples, hparams, lm_model=None)¶
Initializes the model for decoding; the decoder is called here to create predictions.
- Parameters
samples – the data source to be decoded
hparams – decoding configuration options are included here
lm_model – an optional language model used during decoding
- Returns
predictions – the corresponding decoding results
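A minimal decoding sketch, assuming hparams carries the decoding configuration and the language model is optional:

    # Decode one batch; samples, hparams and lm_model follow the
    # signature above. Returns the predicted label sequence(s).
    predictions = model.decode(samples, hparams, lm_model=None)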
- enable_tf_funtion()¶
- ctc_forward_chunk_freeze(encoder_out)¶
- encoder_ctc_forward_chunk_freeze(chunk_xs, offset, required_cache_size, subsampling_cache, elayers_output_cache, conformer_cnn_cache)¶
- encoder_forward_chunk_freeze(chunk_xs, offset, required_cache_size, subsampling_cache, elayers_output_cache, conformer_cnn_cache)¶
- get_subsample_rate()¶
- get_init()¶
- encoder_forward_chunk_by_chunk_freeze(speech: tensorflow.Tensor, decoding_chunk_size: int, num_decoding_left_chunks: int = -1) → Tuple[tensorflow.Tensor, tensorflow.Tensor]¶
- Forward the input chunk by chunk, with a fixed decoding_chunk_size, in a streaming fashion.
Here we should pay special attention to the computation cache in the streaming-style, chunk-by-chunk forward pass. Three things must be taken into account for computation in the current network:
transformer/conformer encoder layers output cache
convolution in conformer
convolution in subsampling
- However, we don't implement a subsampling cache, for three reasons:
We can make the subsampling module output the right result by overlapping the input instead of caching the left context; this wastes some computation, but subsampling takes only a very small fraction of the whole model's computation.
Typically the subsampling module stacks several convolution layers, each subsampling its input; caching across convolution layers with different subsampling rates is tricky and complicated.
Currently, nn.Sequential is used to stack all the convolution layers in subsampling; we would have to rewrite it to make it work with a cache, which is not preferred.
A simplified streaming loop is sketched after the parameter list below.
- Parameters
speech (tf.Tensor) – (1, max_len, dim)
decoding_chunk_size (int) – decoding chunk size
num_decoding_left_chunks (int) – number of left chunks the attention may see; -1 uses the full left context
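A simplified sketch of the streaming loop, assuming a subsampling rate of 4 with 6 frames of right context, that get_init() returns the three empty caches, and that encoder_forward_chunk_freeze returns the chunk output plus the three updated caches (all assumptions for illustration):

    import tensorflow as tf

    def stream_encoder(model, speech: tf.Tensor,
                       decoding_chunk_size: int,
                       num_decoding_left_chunks: int = -1) -> tf.Tensor:
        subsampling, context = 4, 7          # assumed front-end geometry
        stride = subsampling * decoding_chunk_size
        window = (decoding_chunk_size - 1) * subsampling + context
        required_cache_size = decoding_chunk_size * num_decoding_left_chunks

        subsampling_cache, elayers_cache, cnn_cache = model.get_init()
        offset, outputs = 0, []
        num_frames = int(tf.shape(speech)[1])
        # Slide a window over the input; each step feeds one chunk plus
        # the caches produced by the previous step back into the encoder.
        for cur in range(0, num_frames - context + 1, stride):
            end = min(cur + window, num_frames)
            (y, subsampling_cache, elayers_cache,
             cnn_cache) = model.encoder_forward_chunk_freeze(
                speech[:, cur:end, :], offset, required_cache_size,
                subsampling_cache, elayers_cache, cnn_cache)
            outputs.append(y)
            offset += int(tf.shape(y)[1])
        return tf.concat(outputs, axis=1)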
- ctc_prefix_beam_search(samples, hparams, decoding_chunk_size, num_decoding_left_chunks) → List[int]¶
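For intuition, a self-contained sketch of standard CTC prefix beam search over per-frame log posteriors. This is illustrative only; the method above consumes samples and hparams and streams the encoder chunk by chunk:

    import math
    from collections import defaultdict

    def logsumexp(*xs):
        m = max(xs)
        if m == -math.inf:
            return -math.inf
        return m + math.log(sum(math.exp(x - m) for x in xs))

    def prefix_beam_search(log_probs, beam_size=10, blank=0):
        # log_probs: [T, V] per-frame log posteriors from the CTC head.
        # Each prefix tracks (log P ending in blank, log P ending in non-blank).
        beams = {(): (0.0, -math.inf)}
        for frame in log_probs:
            nxt = defaultdict(lambda: (-math.inf, -math.inf))
            for prefix, (pb, pnb) in beams.items():
                for v, lp in enumerate(frame):
                    if v == blank:
                        nb, nnb = nxt[prefix]
                        nxt[prefix] = (logsumexp(nb, pb + lp, pnb + lp), nnb)
                    elif prefix and v == prefix[-1]:
                        # A repeated label collapses onto the same prefix ...
                        nb, nnb = nxt[prefix]
                        nxt[prefix] = (nb, logsumexp(nnb, pnb + lp))
                        # ... unless a blank separated the repeats.
                        ext = prefix + (v,)
                        nb, nnb = nxt[ext]
                        nxt[ext] = (nb, logsumexp(nnb, pb + lp))
                    else:
                        ext = prefix + (v,)
                        nb, nnb = nxt[ext]
                        nxt[ext] = (nb, logsumexp(nnb, pb + lp, pnb + lp))
            # Prune to the best beam_size prefixes by total probability.
            beams = dict(sorted(nxt.items(),
                                key=lambda kv: logsumexp(*kv[1]),
                                reverse=True)[:beam_size])
        best = max(beams, key=lambda p: logsumexp(*beams[p]))
        return list(best)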