athena.models.tts.tts_transformer
Speech transformer implementation.
Module Contents¶
Classes¶
TTSTransformer – TTS version of SpeechTransformer.
- class athena.models.tts.tts_transformer.TTSTransformer(data_descriptions, config=None)¶
Bases: athena.models.tts.tacotron2.Tacotron2
TTS version of SpeechTransformer. The model mainly consists of three parts: the x_net for input preparation, the y_net for output preparation, and the transformer itself. Reference: Neural Speech Synthesis with Transformer Network.
- default_config¶
- call(samples, training: bool = None)¶
- time_propagate(encoder_output, memory_mask, outs, step)¶
Synthesize one step of frames
- Parameters
encoder_output – the encoder output, shape: [batch, x_steps, eunits]
memory_mask – the encoder output mask, shape: [batch, 1, 1, x_steps]
outs (TensorArray) – previous outputs
step – the current step number
Returns:
out – new frame outputs, shape: [batch, feat_dim * reduction_factor]
logit – new stop-token prediction logit, shape: [batch, reduction_factor]
attention_weights (list) – the corresponding attention weights; each element in the list holds the attention weights of one decoder layer, shape: [batch, num_heads, seq_len_q, seq_len_k]
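To illustrate how time_propagate fits into autoregressive decoding, here is a minimal, self-contained sketch of the surrounding loop. The model internals are stubbed out with random projections (the real method runs the transformer decoder), and the stop threshold of 0.5 on the sigmoid of the stop logit is an assumption for illustration; only the tensor shapes follow the docstring above.

```python
import numpy as np

# Shapes as named in the docstring (values chosen for the sketch).
BATCH, X_STEPS, EUNITS = 2, 5, 8
FEAT_DIM, REDUCTION_FACTOR = 4, 2
MAX_STEPS = 10

rng = np.random.default_rng(0)
encoder_output = rng.standard_normal((BATCH, X_STEPS, EUNITS))
memory_mask = np.ones((BATCH, 1, 1, X_STEPS))  # [batch, 1, 1, x_steps]

def time_propagate(encoder_output, memory_mask, outs, step):
    """Stub standing in for the decoder: one step of frames + stop logit."""
    # Mask and pool the encoder output as a fake attention context.
    context = (encoder_output * memory_mask[:, 0, 0, :, None]).mean(axis=1)
    w = rng.standard_normal((EUNITS, FEAT_DIM * REDUCTION_FACTOR))
    out = context @ w                               # [batch, feat_dim * r]
    logit = rng.standard_normal((BATCH, REDUCTION_FACTOR))
    attn = np.full((BATCH, 1, step + 1, X_STEPS), 1.0 / X_STEPS)
    return out, logit, [attn]                       # one layer's weights

outs = []  # stands in for the TensorArray of previous outputs
for step in range(MAX_STEPS):
    out, logit, attn_weights = time_propagate(
        encoder_output, memory_mask, outs, step)
    outs.append(out)
    # Stop once every stop-token logit crosses the (assumed) threshold.
    if (1.0 / (1.0 + np.exp(-logit)) > 0.5).all():
        break

frames = np.stack(outs, axis=1)  # [batch, steps, feat_dim * r]
```

The real implementation accumulates outputs in a tf.TensorArray inside a graph-compatible loop; the Python list above is only a stand-in for that mechanism.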
- synthesize(samples)¶
Synthesize acoustic features from the input texts
- Parameters
samples – the data source to be synthesized
Returns:
after_outs – the corresponding synthesized acoustic features
attn_weights_stack – the corresponding attention weights