athena.models.asr.speech_conformer¶

Speech conformer implementation.
Module Contents¶
Classes¶
Standard implementation of a SpeechConformer. The model mainly consists of three parts.
- class athena.models.asr.speech_conformer.SpeechConformer(data_descriptions, config=None)¶
Bases:
athena.models.base.BaseModel
Standard implementation of a SpeechConformer. The model mainly consists of three parts: the x_net for input preparation, the y_net for output preparation, and the transformer itself.
- default_config¶
- call(samples, training: bool = None)¶
Call the model on a batch of samples (forward pass).
- compute_logit_length(input_length)¶
Computes the logit (encoder output) length from the input length.
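The exact length reduction depends on the subsampling front end. The sketch below is a hypothetical stand-in, not the athena implementation, assuming a common conformer setup with two stride-2 convolutional subsampling layers:

```python
import math

def compute_logit_length(input_length, num_subsample_layers=2, stride=2):
    """Sketch: each stride-2 conv subsampling layer roughly halves the
    time dimension, so the logit length shrinks by ~4x overall."""
    length = input_length
    for _ in range(num_subsample_layers):
        length = math.ceil(length / stride)
    return length
```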
- _forward_encoder(speech, speech_length, training: bool = None)¶
- _forward_encoder_log_ctc(samples, final_layer, training: bool = None)¶
- ctc_prefix_beam_search(samples, hparams, ctc_final_layer) List[int] ¶
- freeze_ctc_prefix_beam_search(samples, ctc_final_layer, hparams=None, beam_size=None) List[int] ¶
- freeze_ctc_probs(samples, ctc_final_layer, hparams=None, beam_size=None) List[int] ¶
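CTC prefix beam search keeps, for each prefix, separate probabilities for paths ending in a blank and in a non-blank symbol, which is what lets repeated labels be merged correctly. A minimal pure-Python sketch of the algorithm follows; the actual methods above operate on encoder outputs and TensorFlow tensors:

```python
from collections import defaultdict

def ctc_prefix_beam_search(probs, beam_size=4, blank=0):
    """probs: [T, V] per-frame symbol probabilities (already softmax-ed).
    Returns the most probable collapsed label sequence as a list of ints."""
    # Each prefix tracks (p_blank, p_non_blank): probability mass of paths
    # ending in a blank vs. ending in the prefix's last symbol.
    beam = {(): (1.0, 0.0)}
    for frame in probs:
        next_beam = defaultdict(lambda: (0.0, 0.0))
        for prefix, (p_b, p_nb) in beam.items():
            for s, p in enumerate(frame):
                if s == blank:
                    # Blank extends every path without changing the prefix.
                    b, nb = next_beam[prefix]
                    next_beam[prefix] = (b + p * (p_b + p_nb), nb)
                elif prefix and s == prefix[-1]:
                    # Repeated symbol: a new occurrence needs a blank before
                    # it; otherwise it merges into the existing last symbol.
                    b, nb = next_beam[prefix + (s,)]
                    next_beam[prefix + (s,)] = (b, nb + p * p_b)
                    b, nb = next_beam[prefix]
                    next_beam[prefix] = (b, nb + p * p_nb)
                else:
                    b, nb = next_beam[prefix + (s,)]
                    next_beam[prefix + (s,)] = (b, nb + p * (p_b + p_nb))
        # Prune to the beam_size most probable prefixes.
        beam = dict(sorted(next_beam.items(),
                           key=lambda kv: kv[1][0] + kv[1][1],
                           reverse=True)[:beam_size])
    best = max(beam.items(), key=lambda kv: kv[1][0] + kv[1][1])
    return list(best[0])
```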
- attention_rescoring(samples, hparams, ctc_final_layer: tensorflow.keras.layers.Dense, lm_model: athena.models.base.BaseModel = None) List[int] ¶
- Apply attention rescoring decoding: CTC prefix beam search is applied first to get the n-best hypotheses, then the n-best are rescored by the attention decoder using the corresponding encoder output
- Parameters
samples – the data source to be decoded
hparams – inference_config
ctc_final_layer – the encoder's final dense layer, used to output CTC probabilities.
lm_model – language model used for rescoring (optional)
- Returns
Attention rescoring result
- Return type
List[int]
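The final ranking interpolates the CTC prefix score with the attention-decoder score. A hedged sketch of that score combination; the weight name ctc_weight and the tuple layout are illustrative assumptions, not the athena API:

```python
def attention_rescore(nbest, ctc_weight=0.3):
    """nbest: list of (hypothesis, ctc_score, attention_score) tuples,
    with scores in the log domain. Returns the hypothesis whose
    interpolated score is best."""
    return max(nbest,
               key=lambda h: ctc_weight * h[1] + (1.0 - ctc_weight) * h[2])[0]
```

In practice the CTC weight is kept fairly small, since the attention decoder usually provides the sharper ranking among the n-best.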
- freeze_beam_search(samples, beam_size)¶
Beam search for frozen (exported) models; only supports batch size 1.
- Parameters
samples – the data source to be decoded
beam_size – beam size
- beam_search(samples, hparams, lm_model=None)¶
Batch beam search for the transformer model.
- Parameters
samples – the data source to be decoded
hparams – inference configuration (contains the beam size)
lm_model – RNN language model used during beam search
- restore_from_pretrained_model(pretrained_model, model_type='')¶
Restore weights from a pretrained model.