athena.models.asr.av_conformer¶
speech transformer implementation
Module Contents¶
Classes¶
AudioVideoConformer: Audio and video multimodal Conformer; the model mainly consists of the a_net, the v_net, the y_net and the transformer itself.
- class athena.models.asr.av_conformer.AudioVideoConformer(data_descriptions, config=None)¶
Bases: athena.models.base.BaseModel
Audio and video multimodal Conformer. The model mainly consists of the a_net for input audio fbank feature preparation, the v_net for video input preparation, the y_net for output preparation, and the transformer itself.
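For orientation, a minimal construction sketch is given below; `dataset_builder`, `model_config` and `samples` are hypothetical placeholders for whatever supplies the data_descriptions, the configuration and a batch in a given setup, not names taken from this module.

```python
from athena.models.asr.av_conformer import AudioVideoConformer

# Hypothetical setup: `dataset_builder` is assumed to expose the audio/video
# data_descriptions, and `model_config` to override entries of
# AudioVideoConformer.default_config. Illustrative only.
model = AudioVideoConformer(data_descriptions=dataset_builder, config=model_config)
outputs = model(samples, training=True)  # forward pass through call()
```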
- default_config¶
- call(samples, training: bool = None)¶
Call the model.
- compute_logit_length(input_length)¶
Used to get the logit length.
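For intuition, Conformer front-ends commonly subsample the input by a factor of 4 through two stride-2 convolutions, so the logit length can be computed roughly as in the sketch below; the exact subsampling factor is an assumption about a typical setup, not read from this module.

```python
import tensorflow as tf

def compute_logit_length_sketch(input_length):
    # Assumes two stride-2 convolutions in the subsampling front-end
    # (overall factor of 4); illustrative only.
    length = tf.cast(input_length, tf.float32)
    length = tf.math.ceil(length / 2)  # first stride-2 convolution
    length = tf.math.ceil(length / 2)  # second stride-2 convolution
    return tf.cast(length, tf.int32)
```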
- _forward_encoder(samples, training: bool = None)¶
- ctc_prefix_beam_search(samples, hparams, ctc_final_layer) → List[int]¶
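No docstring is given for this method; as background, below is a rough standalone sketch of the CTC prefix beam search algorithm over a (time, vocab) matrix of log posteriors. It illustrates the algorithm with blank assumed at index 0, and is not this method's implementation.

```python
import numpy as np
from collections import defaultdict

def logsumexp(*xs):
    xs = [x for x in xs if x > -np.inf]
    if not xs:
        return -np.inf
    m = max(xs)
    return m + np.log(sum(np.exp(x - m) for x in xs))

def ctc_prefix_beam_search_sketch(log_probs, beam_size=10, blank=0):
    # Each prefix keeps two log scores: probability of ending in blank (pb)
    # and probability of ending in its last non-blank symbol (pnb).
    beams = {(): (0.0, -np.inf)}  # empty prefix: pb = log 1, pnb = log 0
    for t in range(log_probs.shape[0]):
        next_beams = defaultdict(lambda: (-np.inf, -np.inf))
        for prefix, (pb, pnb) in beams.items():
            for s in range(log_probs.shape[1]):
                p = log_probs[t, s]
                if s == blank:
                    npb, npnb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(npb, pb + p, pnb + p), npnb)
                    continue
                new_prefix = prefix + (s,)
                npb, npnb = next_beams[new_prefix]
                if prefix and s == prefix[-1]:
                    # Repeating the last symbol only extends the prefix if the
                    # previous path ended in blank; otherwise it collapses.
                    next_beams[new_prefix] = (npb, logsumexp(npnb, pb + p))
                    opb, opnb = next_beams[prefix]
                    next_beams[prefix] = (opb, logsumexp(opnb, pnb + p))
                else:
                    next_beams[new_prefix] = (npb, logsumexp(npnb, pb + p, pnb + p))
        # Keep only the best `beam_size` prefixes by total probability.
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: logsumexp(*kv[1]),
                            reverse=True)[:beam_size])
    best_prefix, _ = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix)
```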
- attention_rescoring(samples, hparams, ctc_final_layer: tensorflow.keras.layers.Dense, lm_model: athena.models.base.BaseModel = None) → List[int]¶
Apply attention rescoring decoding: CTC prefix beam search is applied first to get the n-best hypotheses, then the n-best are rescored by the attention decoder using the corresponding encoder output.
- Parameters
samples –
hparams – inference_config
ctc_final_layer – the encoder's final dense layer that outputs CTC probabilities
lm_model –
- Returns
Attention rescoring result
- Return type
List[int]
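Illustratively, the rescoring step described above can be sketched as the small routine below; the nbest input, the att_score callable and the ctc_weight combination are assumptions about a typical setup, not this method's actual signature.

```python
def attention_rescoring_sketch(encoder_out, nbest, att_score, ctc_weight=0.3):
    # Hypothetical inputs: `nbest` is an iterable of (hypothesis, ctc_score)
    # pairs from CTC prefix beam search, and `att_score(hyp, encoder_out)` is
    # the attention decoder's log-probability of a hypothesis. Illustrative only.
    best_hyp, best_score = None, float("-inf")
    for hyp, ctc_score in nbest:
        score = att_score(hyp, encoder_out) + ctc_weight * ctc_score
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp  # List[int] of token ids
```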
- beam_search(samples, hparams, lm_model=None)¶
Batch beam search for the transformer model.
- Parameters
samples – the data source to be decoded
beam_size – beam size
lm_model – RNN LM used for beam search
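A hedged usage sketch for batch decoding follows; how samples, hparams and the optional language model are obtained is assumed for illustration and not taken from this module.

```python
# Illustrative decoding loop; `model`, `dataset`, `hparams` and `lm_model` are
# assumed to come from the surrounding inference setup, not from this module.
for samples in dataset:
    predictions = model.beam_search(samples, hparams, lm_model=lm_model)
```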
- restore_from_pretrained_model(pretrained_model, model_type='')¶
Restore from a pretrained model.
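Finally, a hedged sketch of restoring from a pretrained model; the `pretrained` object and the model_type value are assumptions about a typical pretraining setup, not values prescribed by this module.

```python
# Illustrative only: `pretrained` is assumed to be another athena BaseModel,
# e.g. one restored from a pretraining checkpoint, whose weights should
# initialize this model; the model_type string is an assumption.
model.restore_from_pretrained_model(pretrained, model_type="mpc")
```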