athena.models.asr.speech_u2
¶
This code is modified from https://github.com/wenet-e2e/wenet.git.
Module Contents¶
Classes¶
SpeechU2: Base model for U2
SpeechTransformerU2: U2 implementation of a SpeechTransformer. Model mainly consists of three parts.
SpeechConformerU2: Conformer-U2
- class athena.models.asr.speech_u2.SpeechU2(data_descriptions, config=None)¶
Bases:
athena.models.base.BaseModel
Base model for U2
- default_config¶
- enable_tf_funtion()¶
- call(samples, training: bool = None)¶
call model
- compute_logit_length(input_length)¶
used to get the logit length
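The logit length is the encoder output length obtained after the subsampling front end. A minimal sketch of the computation, assuming two stride-2 convolutions (4x subsampling); the actual factor depends on the configured subsampling module:

import tensorflow as tf

def compute_logit_length_sketch(input_length):
    # Hypothetical re-implementation for illustration only: map input frame
    # counts to encoder output lengths under an assumed 4x subsampling.
    length = tf.cast(input_length, tf.float32)
    length = tf.math.ceil(length / 2)  # first stride-2 convolution
    length = tf.math.ceil(length / 2)  # second stride-2 convolution
    return tf.cast(length, tf.int32)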
- _forward_transformer_encoder(x, x_mask, training=None)¶
- _forward_encoder(speech: tensorflow.Tensor, speech_length: tensorflow.Tensor, decoding_chunk_size: int = -1, num_decoding_left_chunks: int = -1, simulate_streaming: bool = False, use_dynamic_left_chunk: bool = False, static_chunk_size: int = -1, training: bool = None) Tuple[tensorflow.Tensor, tensorflow.Tensor] ¶
- _encoder_forward_chunk(xs: tensorflow.Tensor, offset: int, required_cache_size: int, subsampling_cache: tensorflow.Tensor, elayers_output_cache: tensorflow.Tensor, conformer_cnn_cache: tensorflow.Tensor) Tuple[tensorflow.Tensor, tensorflow.Tensor, List[tensorflow.Tensor], List[tensorflow.Tensor]] ¶
Forward just one chunk
- Parameters
xs (tf.Tensor) – chunk input, shape=[B(1), T, feat_dim]
offset (int) – current offset in encoder output time stamp
required_cache_size (int) – cache size required for next chunk computation; >=0: actual cache size, <0: all history cache is required
subsampling_cache (tf.Tensor) – subsampling cache, shape=[B(1), T, d_model]
elayers_output_cache (tf.Tensor) – transformer/conformer encoder layers output cache, shape=[num_layers, 1, T, d_model]
conformer_cnn_cache (tf.Tensor) – conformer cnn cache, shape=[num_layers, 1, T, d_model]
- Returns
tf.Tensor: output of current input xs
tf.Tensor: subsampling cache required for next chunk computation
tf.Tensor: encoder layers output cache required for next chunk computation
tf.Tensor: conformer cnn cache
- Return type
Tuple[tf.Tensor, tf.Tensor, List[tf.Tensor], List[tf.Tensor]]
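A rough sketch of how the per-chunk interface above is typically driven during streaming recognition. The chunk slicing, the empty-cache initialization, and the feature tensor layout are illustrative assumptions; in practice the initial cache tensors would come from get_encoder_init_input():

import tensorflow as tf

def stream_encode(model, feats, chunk_frames, required_cache_size):
    # Illustrative streaming loop over a [1, T, feat_dim] feature tensor:
    # the three caches are threaded through successive chunk calls.
    offset = 0
    subsampling_cache = None        # assumed empty-cache convention
    elayers_output_cache = None
    conformer_cnn_cache = None
    outputs = []
    num_frames = int(tf.shape(feats)[1])
    for start in range(0, num_frames, chunk_frames):
        chunk = feats[:, start:start + chunk_frames, :]
        (y, subsampling_cache, elayers_output_cache,
         conformer_cnn_cache) = model._encoder_forward_chunk(
            chunk, offset, required_cache_size,
            subsampling_cache, elayers_output_cache, conformer_cnn_cache)
        outputs.append(y)
        offset += int(tf.shape(y)[1])
    return tf.concat(outputs, axis=1)  # full encoder output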
- _encoder_forward_chunk_by_chunk(speech: tensorflow.Tensor, decoding_chunk_size: int, num_decoding_left_chunks: int = -1) Tuple[tensorflow.Tensor, tensorflow.Tensor] ¶
- Forward input chunk by chunk with chunk_size, in a streaming fashion
Here we should pay special attention to the computation cache in the streaming-style chunk-by-chunk forward. Three things should be taken into account for computation in the current network:
transformer/conformer encoder layers output cache
convolution in conformer
convolution in subsampling
- However, we don't implement a subsampling cache because:
We can make the subsampling module output the right result by overlapping the input instead of caching the left context (see the window sketch below). Even though this wastes some computation, subsampling only takes a very small fraction of the computation of the whole model.
Typically there are several convolution layers with subsampling in the subsampling module; it is tricky and complicated to cache across different convolution layers with different subsampling rates.
Currently nn.Sequential is used to stack all the convolution layers in subsampling; we would need to rewrite it to make it work with a cache, which is not preferred.
- Parameters
speech (tf.Tensor) – (1, max_len, feat_dim)
decoding_chunk_size (int) – decoding chunk size
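To see why overlapping the input removes the need for a subsampling cache, it helps to look at the window/stride bookkeeping of the convolutional front end. A minimal sketch, assuming 4x subsampling with a 7-frame context; the real numbers come from the configured subsampling layer:

def chunk_windows(num_frames, decoding_chunk_size,
                  subsampling_rate=4, context=7):
    # Hypothetical helper: yield (start, end) input windows such that every
    # window already contains the frames the subsampling convolutions need,
    # so no convolution state has to be cached across chunks.
    stride = subsampling_rate * decoding_chunk_size
    window = (decoding_chunk_size - 1) * subsampling_rate + context
    for start in range(0, num_frames - context + 1, stride):
        yield start, min(start + window, num_frames)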
- get_encoder_init_input()¶
- _forward_decoder(encoder_out, encoder_mask, hyps_pad, output_mask)¶
- ctc_prefix_beam_search(samples, hparams, ctc_final_layer) List[int] ¶
- attention_rescoring(samples, hparams, ctc_final_layer: tensorflow.keras.layers.Dense, lm_model: athena.models.base.BaseModel = None) List[int] ¶
- Apply attention rescoring decoding: CTC prefix beam search is applied first to get the n-best hypotheses, then we rescore the n-best with the attention decoder using the corresponding encoder output
- Parameters
samples –
hparams – inference_config
ctc_final_layer – encoder final dense layer used to output CTC probabilities
lm_model –
- Returns
Attention rescoring result
- Return type
List[int]
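A rough usage sketch of this two-pass decoding path. The samples layout, hparams fields, and the ctc_final_layer attribute name are assumptions for illustration; in Athena they come from the dataset builder, the inference config, and the model definition:

# Hypothetical two-pass decoding call: CTC prefix beam search produces the
# n-best list, which is then rescored by the attention decoder.
best_hyp = model.attention_rescoring(
    samples,                                # a single utterance (batch size 1)
    hparams,                                # inference config, e.g. beam size / CTC weight
    ctc_final_layer=model.ctc_final_layer,  # assumed attribute holding the CTC dense layer
    lm_model=None)                          # optional external language model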
- freeze_beam_search(samples, beam_size=1, hparams=None, lm_model=None)¶
beam search for model freezing; only supports batch=1
- Parameters
samples – the data source to be decoded
beam_size – beam size
- beam_search(samples, hparams, lm_model=None)¶
batch beam search for transformer model
- Parameters
samples – the data source to be decoded
beam_size – beam size
lm_model – rnnlm used for beam search
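A rough call sketch; the samples and hparams contents are illustrative assumptions, with the optional RNN LM passed as documented above:

# Hypothetical calls: plain batch beam search, and the same search with an RNN LM.
predictions = model.beam_search(samples, hparams)
predictions_with_lm = model.beam_search(samples, hparams, lm_model=rnn_lm)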
- restore_from_pretrained_model(pretrained_model, model_type='')¶
restore from pretrained model