athena.models.asr.speech_conformer_ctc¶
speech transformer implementation
Module Contents¶
Classes¶
- class athena.models.asr.speech_conformer_ctc.SpeechConformerCTC(data_descriptions, config=None)¶
Bases: athena.models.base.BaseModel
Standard implementation of a SpeechTransformer. The model mainly consists of the x_net for input preparation and the transformer itself.
- default_config¶
- call(samples, training: bool = None)¶
Call the model on a batch of samples.
- compute_logit_length(input_length)¶
Computes the logit length from the input length.
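The relation between input length and logit length can be sketched as follows. This is a minimal illustration, not the library's code: it assumes the front end downsamples the time axis with two stride-2 layers using "same" padding, so each layer maps a length n to ceil(n / 2). The function name `logit_length_sketch` and its parameters are hypothetical.

```python
import math


def logit_length_sketch(input_length, num_subsampling_layers=2, stride=2):
    """Estimate the number of output frames after time subsampling.

    Assumption (not confirmed by the docs above): each subsampling layer
    uses "same" padding, so a length n becomes ceil(n / stride).
    """
    length = input_length
    for _ in range(num_subsampling_layers):
        length = math.ceil(length / stride)
    return length


# 100 input frames shrink to 50 and then to 25 output frames.
print(logit_length_sketch(100))
```

The resulting length is what a CTC loss or decoder would need in order to ignore padded frames in each utterance.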
- _forward_encoder(speech, speech_length, training=None)¶
- _forward_encoder_log_ctc(samples, training: bool = None)¶
- decode(samples, hparams, lm_model=None)¶
Initializes the model for decoding; the decoder is called here to create predictions.
- Parameters
samples – the data source to be decoded
hparams – decoding configs are included here
lm_model – language model (optional)
- Returns
predictions – the corresponding decoding results
- argmax(samples, hparams)¶
Argmax (greedy) decoding for the Conformer CTC model.
- Parameters
samples – the data source to be decoded
hparams – decoding configs are included here
- Returns
predictions – the corresponding decoding results
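Argmax decoding of CTC outputs is commonly done in three steps: pick the best label per frame, collapse consecutive repeats, and drop blanks. The sketch below shows that general technique in plain Python. It is not the method's actual implementation; the function name `greedy_ctc_decode` and the choice of blank index are assumptions made for illustration.

```python
from itertools import groupby


def greedy_ctc_decode(frame_scores, blank):
    """Greedy CTC decoding for a single utterance.

    frame_scores: list of per-frame score lists (one score per label).
    blank: index of the CTC blank label (hypothetical here; the real
    index depends on the vocabulary in use).
    """
    # Step 1: best label for every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # Step 2: collapse runs of the same label into one.
    collapsed = [label for label, _ in groupby(best_path)]
    # Step 3: remove blanks.
    return [label for label in collapsed if label != blank]


scores = [
    [0.1, 0.7, 0.2],  # label 1
    [0.1, 0.6, 0.3],  # label 1 (repeat, collapsed)
    [0.2, 0.1, 0.7],  # blank (index 2)
    [0.2, 0.7, 0.1],  # label 1 again, kept because a blank separates it
    [0.8, 0.1, 0.1],  # label 0
]
print(greedy_ctc_decode(scores, blank=2))  # [1, 1, 0]
```

Collapsing before removing blanks matters: the blank between the two runs of label 1 is what lets the output contain the same label twice.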
- ctc_prefix_beam_search(samples, hparams, ctc_final_layer) → List[int]¶
- freeze_ctc_prefix_beam_search(samples, ctc_final_layer, hparams=None, beam_size=1) → List[int]¶
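For readers unfamiliar with CTC prefix beam search, the following is a minimal sketch of the textbook algorithm for one utterance, without a language model. It is not the implementation behind the two methods above, whose inputs (`samples`, `hparams`, `ctc_final_layer`) differ; the name `prefix_beam_search_sketch`, the `log_probs` input layout, and the blank index are all assumptions.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    finite = [x for x in xs if x != NEG_INF]
    if not finite:
        return NEG_INF
    m = max(finite)
    return m + math.log(sum(math.exp(x - m) for x in finite))


def prefix_beam_search_sketch(log_probs, beam_size=4, blank=0):
    """log_probs: list of frames, each a list of per-label log-probabilities."""
    # prefix -> (log prob of paths ending in blank, ... ending in non-blank)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for s, lp in enumerate(frame):
                if s == blank:
                    # Blank keeps the prefix unchanged.
                    n_b, n_nb = nxt[prefix]
                    nxt[prefix] = (log_add(n_b, p_b + lp, p_nb + lp), n_nb)
                elif prefix and s == prefix[-1]:
                    # Repeated label without a blank: prefix unchanged.
                    n_b, n_nb = nxt[prefix]
                    nxt[prefix] = (n_b, log_add(n_nb, p_nb + lp))
                    # Repeated label after a blank: prefix grows.
                    new = prefix + (s,)
                    n_b, n_nb = nxt[new]
                    nxt[new] = (n_b, log_add(n_nb, p_b + lp))
                else:
                    # New label always extends the prefix.
                    new = prefix + (s,)
                    n_b, n_nb = nxt[new]
                    nxt[new] = (n_b, log_add(n_nb, p_b + lp, p_nb + lp))
        # Keep only the beam_size most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, _ = max(beams.items(), key=lambda kv: log_add(*kv[1]))
    return list(best_prefix)


# Two frames, labels: 0 = blank, 1 = "a". Greedy decoding would pick blank
# twice (0.6 * 0.6 = 0.36), but the prefix [1] sums to 0.64 over its paths.
frames = [[math.log(0.6), math.log(0.4)]] * 2
print(prefix_beam_search_sketch(frames))  # [1]
```

The example shows why prefix search can beat greedy decoding: it sums the probability of every alignment that collapses to the same label sequence instead of following only the single best path.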
- merge_ctc_sequence(seqs, blank=-1)¶
- freeze_beam_search(samples, beam_size)¶
Beam search for freeze mode; only a batch size of 1 is supported.
- Parameters
samples – the data source to be decoded
beam_size – beam size
- restore_from_pretrained_model(pretrained_model, model_type='')¶
Restores the model from a pretrained model.