athena.models.asr.av_conformer

Audio-visual Conformer implementation

Module Contents

Classes

AudioVideoConformer

Audio and video multimodal Conformer.

class athena.models.asr.av_conformer.AudioVideoConformer(data_descriptions, config=None)

Bases: athena.models.base.BaseModel

Audio and video multimodal Conformer. The model mainly consists of four parts: the a_net for input audio fbank feature preparation, the v_net for input video feature preparation, the y_net for output preparation, and the transformer itself.
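As a rough sketch of the data flow described above (the stub projections and the fusion-by-concatenation step here are illustrative assumptions, not Athena's actual implementation):

```python
import numpy as np

def a_net(fbank, d_model=4):
    # Stub audio front-end: project fbank features (T, F) to (T, d_model).
    rng = np.random.default_rng(0)
    w = rng.standard_normal((fbank.shape[1], d_model))
    return fbank @ w

def v_net(video, d_model=4):
    # Stub video front-end: project flattened visual features (T, V) to (T, d_model).
    rng = np.random.default_rng(1)
    w = rng.standard_normal((video.shape[1], d_model))
    return video @ w

def fuse(audio_feat, video_feat):
    # One plausible fusion: align the two streams in time and concatenate
    # along the feature axis, giving the encoder a (T, 2 * d_model) input.
    t = min(audio_feat.shape[0], video_feat.shape[0])
    return np.concatenate([audio_feat[:t], video_feat[:t]], axis=-1)

fbank = np.ones((10, 80))   # 10 frames of 80-dim fbank
video = np.ones((10, 512))  # 10 frames of flattened visual features
encoder_input = fuse(a_net(fbank), v_net(video))
print(encoder_input.shape)  # (10, 8)
```

The fused sequence is what the Conformer encoder would consume; the y_net would analogously embed the shifted target tokens for the decoder.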

default_config
call(samples, training: bool = None)

Call the model on a batch of samples.

compute_logit_length(input_length)

Compute the logit length (encoder output length) from the input feature length.
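A minimal sketch of how such a logit length is typically derived, assuming two stride-2 convolutional subsampling layers in the front end (the actual kernel/stride configuration belongs to the model and is not shown here):

```python
import math

def compute_logit_length(input_length, num_subsample_layers=2):
    """Halve the frame count once per assumed stride-2 conv layer (ceil division)."""
    length = input_length
    for _ in range(num_subsample_layers):
        length = math.ceil(length / 2)
    return length

print(compute_logit_length(100))  # 25
print(compute_logit_length(7))    # 2
```

This length is what CTC-style losses and decoders need, since the encoder emits one logit per subsampled frame rather than per input frame.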

_forward_encoder(samples, training: bool = None)
attention_rescoring(samples, hparams, ctc_final_layer: tensorflow.keras.layers.Dense, lm_model: athena.models.base.BaseModel = None) → List[int]

Apply attention rescoring decoding: CTC prefix beam search is applied first to get the n-best hypotheses, then the n-best are rescored by the attention decoder using the corresponding encoder output.

Parameters
  • samples

  • hparams – inference_config

  • ctc_final_layer – final dense layer applied to the encoder output to produce CTC probabilities.

  • lm_model

Returns

Attention rescoring result

Return type

List[int]
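The two-pass scheme can be sketched as follows; the score combination weight and the stub attention-decoder scorer are illustrative assumptions, not Athena's actual scoring code:

```python
def attention_rescoring(nbest, attn_score_fn, ctc_weight=0.5):
    """Pick the hypothesis maximizing a weighted sum of CTC and attention scores.

    nbest: list of (token_ids, ctc_score) pairs from CTC prefix beam search.
    attn_score_fn: scores token_ids with the attention decoder (stubbed below).
    """
    best_hyp, best_score = None, float("-inf")
    for token_ids, ctc_score in nbest:
        score = ctc_weight * ctc_score + (1.0 - ctc_weight) * attn_score_fn(token_ids)
        if score > best_score:
            best_hyp, best_score = token_ids, score
    return best_hyp

# Stub attention decoder: penalizes shorter hypotheses.
nbest = [([1, 2], -3.0), ([1, 2, 3], -3.5)]
result = attention_rescoring(nbest, attn_score_fn=lambda ids: -1.0 * (4 - len(ids)))
print(result)  # [1, 2, 3]
```

Here the second hypothesis wins despite its worse CTC score because the attention decoder prefers it, which is exactly the kind of correction the second pass is for.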

Batch beam search for the transformer model.

Parameters
  • samples – the data source to be decoded

  • beam_size – beam size

  • lm_model – RNN language model used for beam search
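Beam search itself is the usual expand-score-prune loop; the toy `next_scores` function below is a stand-in assumption for the transformer decoder (plus optional LM), not the real scoring path:

```python
import math

def beam_search(next_scores, beam_size, max_len, eos=0):
    """Toy beam search: next_scores(prefix) -> {token: log_prob}."""
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # finished beam carries over
                continue
            for tok, logp in next_scores(prefix).items():
                candidates.append((prefix + [tok], score + logp))
        # Keep only the beam_size highest-scoring prefixes.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0][0]

# Stand-in decoder: favours token 1 for two steps, then end-of-sequence.
def next_scores(prefix):
    if len(prefix) < 2:
        return {1: math.log(0.9), 2: math.log(0.1)}
    return {0: math.log(0.95), 2: math.log(0.05)}

print(beam_search(next_scores, beam_size=2, max_len=3))  # [1, 1, 0]
```

In the real model the per-step scores would come from the decoder conditioned on the encoder output, with the LM log-probabilities added in with a shallow-fusion weight.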

restore_from_pretrained_model(pretrained_model, model_type='')

Restore weights from a pretrained model.
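A common way to implement such restoration is to copy only the variables whose names and shapes match between the two models; this name-matching loop over plain dicts is a generic sketch, not Athena's actual checkpoint logic:

```python
def restore_from_pretrained_model(model_weights, pretrained_weights):
    """Copy pretrained values into model_weights wherever names and shapes agree."""
    restored = []
    for name, value in pretrained_weights.items():
        if name in model_weights and len(model_weights[name]) == len(value):
            model_weights[name] = list(value)
            restored.append(name)
    return restored

model = {"encoder/w": [0.0, 0.0], "decoder/w": [0.0]}
pretrained = {"encoder/w": [1.5, -0.5], "lm/w": [9.9]}
restored = restore_from_pretrained_model(model, pretrained)
print(restored)            # ['encoder/w']
print(model["encoder/w"])  # [1.5, -0.5]
```

Variables that exist only in the pretrained checkpoint (here the hypothetical `lm/w`) are skipped, so a model can reuse an encoder trained under a different head without errors.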