athena.models.asr.speech_conformer

Speech Conformer implementation

Module Contents

Classes

SpeechConformer

Standard implementation of a SpeechConformer. The model mainly consists of three parts:

class athena.models.asr.speech_conformer.SpeechConformer(data_descriptions, config=None)

Bases: athena.models.base.BaseModel

Standard implementation of a SpeechConformer. The model mainly consists of three parts: the x_net for input preparation, the y_net for output preparation, and the conformer itself.
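The three parts compose in sequence: the x_net prepares the acoustic features, the y_net prepares the previously decoded tokens, and the conformer turns both into output logits. A minimal structural sketch in plain Python (all component names here are placeholders standing in for the real layers, not athena's API):

```python
class SpeechConformerSketch:
    """Structural sketch of the three-part model (placeholder components)."""

    def __init__(self, x_net, y_net, conformer):
        self.x_net = x_net          # input preparation (e.g. conv subsampling)
        self.y_net = y_net          # output preparation (e.g. token embedding)
        self.conformer = conformer  # encoder-decoder producing logits

    def call(self, features, previous_tokens):
        # prepare inputs and outputs, then run the encoder-decoder
        x = self.x_net(features)
        y = self.y_net(previous_tokens)
        return self.conformer(x, y)
```

In the real model each component is a Keras layer and `call` also receives a `training` flag; the sketch only shows the data flow.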

default_config
call(samples, training: bool = None)

Call the model on a batch of samples.

compute_logit_length(input_length)

Compute the logit length from the input length.
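Conformer front ends commonly downsample the input with two strided convolutions, so the logit length comes out to roughly a quarter of the input length. A sketch of that arithmetic (the subsampling factor and layer count here are assumptions about a typical front end, not read from this module):

```python
import math

def compute_logit_length(input_length: int,
                         num_conv_layers: int = 2,
                         stride: int = 2) -> int:
    """Hypothetical logit-length computation for a conv subsampling front end.

    Assumes `num_conv_layers` strided conv layers with "same" padding,
    mirroring the common 4x subsampling used in conformer encoders.
    """
    length = input_length
    for _ in range(num_conv_layers):
        # each strided conv with "same" padding gives a ceil division
        length = math.ceil(length / stride)
    return length
```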

_forward_encoder(speech, speech_length, training: bool = None)
_forward_encoder_log_ctc(samples, final_layer, training: bool = None)
freeze_ctc_probs(samples, ctc_final_layer, hparams=None, beam_size=None) List[int]
attention_rescoring(samples, hparams, ctc_final_layer: tensorflow.keras.layers.Dense, lm_model: athena.models.base.BaseModel = None) List[int]
Apply attention rescoring decoding: CTC prefix beam search is applied first to obtain the n-best hypotheses, then the n-best are rescored on the attention decoder with the corresponding encoder output.

Parameters
  • samples

  • hparams – inference_config

  • ctc_final_layer – final dense layer of the encoder, used to output CTC probabilities.

  • lm_model

Returns

Attention rescoring result

Return type

List[int]
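The two stages of attention rescoring can be sketched in pure Python: a minimal CTC prefix beam search produces the n-best prefixes, and a combined score then picks the winner. This is an illustrative sketch, not athena's implementation; the `attention_score` callback and the `ctc_weight` interpolation are assumptions standing in for the real attention decoder:

```python
import math
from collections import defaultdict

def ctc_prefix_beam_search(probs, beam_size=3, blank=0):
    """Minimal CTC prefix beam search (probability domain).

    probs: [T][V] per-frame posteriors. Returns n-best (prefix, prob) pairs,
    best first. Each prefix carries (p_blank, p_non_blank) mass so that
    repeated labels are only extended across an intervening blank.
    """
    beams = {(): (1.0, 0.0)}  # prefix -> (prob ending in blank, in non-blank)
    for frame in probs:
        next_beams = defaultdict(lambda: (0.0, 0.0))
        for prefix, (p_b, p_nb) in beams.items():
            for v, p in enumerate(frame):
                if v == blank:
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (nb_b + (p_b + p_nb) * p, nb_nb)
                    continue
                new_prefix = prefix + (v,)
                nb_b, nb_nb = next_beams[new_prefix]
                if prefix and v == prefix[-1]:
                    # repeated label extends only after a blank ...
                    next_beams[new_prefix] = (nb_b, nb_nb + p_b * p)
                    # ... otherwise it collapses onto the same prefix
                    sb, snb = next_beams[prefix]
                    next_beams[prefix] = (sb, snb + p_nb * p)
                else:
                    next_beams[new_prefix] = (nb_b, nb_nb + (p_b + p_nb) * p)
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: -(kv[1][0] + kv[1][1]))[:beam_size])
    return [(list(p), pb + pnb) for p, (pb, pnb) in
            sorted(beams.items(), key=lambda kv: -(kv[1][0] + kv[1][1]))]

def attention_rescore(nbest, attention_score, ctc_weight=0.5):
    """Rescore n-best: interpolate CTC log-prob with an attention score."""
    scored = [(hyp, ctc_weight * math.log(p)
               + (1 - ctc_weight) * attention_score(hyp))
              for hyp, p in nbest]
    return max(scored, key=lambda kv: kv[1])[0]
```

In the real method the attention score comes from running the decoder over the cached encoder output for each hypothesis; here it is a plain callback so the sketch stays self-contained.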

Beam search for the frozen (freeze) model; only supports batch size 1.

Parameters
  • samples – the data source to be decoded

  • beam_size – beam size

Batch beam search for the transformer model.

Parameters
  • samples – the data source to be decoded

  • beam_size – beam size

  • lm_model – RNN LM used for beam search

restore_from_pretrained_model(pretrained_model, model_type='')

Restore weights from a pretrained model.