athena.models.tts.fastspeech¶

Module Contents¶

Classes¶

FastSpeech | Reference: FastSpeech: Fast, Robust and Controllable Text to Speech
LengthRegulator | Length regulator for FastSpeech.
DurationCalculator | Calculate durations and outputs based on the teacher model
- class athena.models.tts.fastspeech.FastSpeech(data_descriptions, config=None)¶
Bases:
athena.models.base.BaseModel
- Reference: FastSpeech: Fast, Robust and Controllable Text to Speech
(http://papers.nips.cc/paper/8580-fastspeech-fast-robust-and-controllable-text-to-speech.pdf)
- default_config¶
- set_teacher_model(teacher_model, teacher_type)¶
Set the teacher model and initialize the duration_calculator before training.
- Parameters
teacher_model – the loaded teacher model
teacher_type – the model type, e.g., tacotron2, tts_transformer
- restore_from_pretrained_model(pretrained_model, model_type='')¶
Restore variables from a pretrained model.
- Parameters
pretrained_model – the loaded pretrained model
model_type – the model type, e.g., tts_transformer
- get_loss(outputs, samples, training=None)¶
Compute the loss used for training.
- _feedforward_decoder(encoder_output, duration_indexes, duration_sequences, output_length, training)¶
feed-forward decoder
- Parameters
encoder_output – encoder outputs, shape: [batch, x_steps, d_model]
duration_indexes – argmax weights calculated from duration_calculator. It is used for training only, shape: [batch, y_steps]
duration_sequences – It contains duration information for each phoneme, shape: [batch, x_steps]
output_length – the real output length
training – if it is in the training stage
Returns:
before_outs: the outputs before the postnet calculation
after_outs: the outputs after the postnet calculation
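As in Tacotron 2, FastSpeech's postnet refines the decoder output residually, which is why both outputs exist: before_outs is the raw decoder prediction and after_outs adds the postnet's correction on top. A minimal sketch of that relationship, using a stand-in linear postnet rather than the real convolutional stack:

```python
import numpy as np

def apply_postnet(before_outs, postnet):
    """Residual postnet: refine the raw decoder output additively."""
    return before_outs + postnet(before_outs)

# Stand-in postnet: a fixed small linear map (the real one is a 1-D conv stack)
rng = np.random.default_rng(0)
W = rng.standard_normal((80, 80)) * 0.01
before = rng.standard_normal((1, 100, 80))   # [batch, y_steps, n_mels]
after = apply_postnet(before, lambda x: x @ W)
print(after.shape)  # (1, 100, 80)
```

Training typically supervises both before_outs and after_outs against the target mels, so the decoder learns a usable prediction even without the postnet.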
- call(samples, training: bool = None)¶
Call the model for training or inference.
- synthesize(samples)¶
- class athena.models.tts.fastspeech.LengthRegulator¶
Bases:
tensorflow.keras.layers.Layer
Length regulator for Fastspeech.
- inference(phoneme_sequences, duration_sequences, alpha=1.0)¶
Calculate replicated sequences based on duration sequences
- Parameters
phoneme_sequences – sequences of phoneme features, shape: [batch, x_steps, d_model]
duration_sequences – durations (in output frames) of each phoneme, shape: [batch, x_steps]
alpha – scaling factor applied to the durations to control speech speed
Returns:
expanded_array: replicated sequences based on durations, shape: [batch, y_steps, d_model]
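The expansion above can be sketched in plain NumPy for a single utterance (a hedged illustration of the idea, not Athena's implementation): each duration is scaled by alpha and rounded, then each phoneme vector is repeated that many times along the time axis.

```python
import numpy as np

def length_regulate(phonemes, durations, alpha=1.0):
    """Repeat each phoneme vector according to its (alpha-scaled) duration.

    phonemes:  [x_steps, d_model] phoneme features for one utterance
    durations: [x_steps] duration (in output frames) per phoneme
    """
    scaled = np.maximum(np.round(np.asarray(durations) * alpha).astype(int), 0)
    # np.repeat expands row i of `phonemes` scaled[i] times along the time axis
    return np.repeat(phonemes, scaled, axis=0)

phonemes = np.arange(6, dtype=np.float32).reshape(3, 2)  # 3 phonemes, d_model=2
expanded = length_regulate(phonemes, [2, 1, 3])
print(expanded.shape)  # (6, 2): y_steps = 2 + 1 + 3
```

With alpha=2.0 every duration doubles, so the same call yields 12 output frames: slower speech comes from longer durations.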
- call(phoneme_sequences, duration_indexes, output_length)¶
Calculate replicated sequences based on duration sequences
- Parameters
phoneme_sequences – sequences of phoneme features, shape: [batch, x_steps, d_model]
duration_indexes – phoneme index for each output frame, computed by duration_calculator; shape: [batch, y_steps]
output_length – shape: [batch]
Returns:
expanded_array: replicated sequences based on durations, shape: [batch, y_steps, d_model]
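In training mode the frame-to-phoneme mapping is precomputed, so the expansion reduces to a gather followed by masking of padded frames. A sketch under that assumed semantics (not the actual Athena code, which operates on TensorFlow tensors):

```python
import numpy as np

def regulate_with_indexes(phoneme_sequences, duration_indexes, output_length):
    """Gather phoneme features by precomputed frame-to-phoneme indexes.

    phoneme_sequences: [batch, x_steps, d_model]
    duration_indexes:  [batch, y_steps] phoneme index for each output frame
    output_length:     [batch] valid frame counts; later frames are zeroed
    """
    batch, y_steps = duration_indexes.shape
    # Per-utterance gather along the phoneme axis
    expanded = np.stack([phoneme_sequences[b][duration_indexes[b]]
                         for b in range(batch)])
    # Mask padded frames past each utterance's real length
    mask = np.arange(y_steps)[None, :] < np.asarray(output_length)[:, None]
    return expanded * mask[:, :, None]

x = np.arange(12, dtype=np.float32).reshape(1, 3, 4)        # 1 utt, 3 phonemes
out = regulate_with_indexes(x, np.array([[0, 0, 1, 2, 2]]), [4])
print(out.shape)  # (1, 5, 4); the fifth frame is masked to zeros
```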
- class athena.models.tts.fastspeech.DurationCalculator(teacher_model=None, teacher_type=None)¶
Bases:
tensorflow.keras.layers.Layer
Calculate durations and outputs based on the teacher model.
- call(samples)¶
- Parameters
samples – samples from dataset
Returns:
durations: batch of durations, shape: [batch, max_input_length]
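The underlying idea is to read durations off the teacher's attention matrix: assign each output frame to the input position it attends to most (argmax), then count frames per input position. A hedged single-utterance sketch of that idea:

```python
import numpy as np

def durations_from_attention(attn_weights, max_input_length):
    """Count, for each input position, how many output frames attend to it most.

    attn_weights: [y_steps, x_steps] attention matrix from a teacher model
    """
    frame_to_phone = attn_weights.argmax(axis=1)          # [y_steps]
    return np.bincount(frame_to_phone, minlength=max_input_length)

# Toy monotonic attention: 5 output frames over 3 input positions
attn = np.array([[0.9, 0.1, 0.0],
                 [0.8, 0.2, 0.0],
                 [0.1, 0.7, 0.2],
                 [0.0, 0.2, 0.8],
                 [0.0, 0.1, 0.9]])
print(durations_from_attention(attn, 3))  # [2 1 2]
```

By construction the durations sum to y_steps, which is what lets the LengthRegulator reproduce the teacher's output length exactly.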
- _calculate_t2_attentions(samples)¶
Calculate outputs and attention weights based on tacotron2
- Parameters
samples – samples from dataset
- Returns
after_outs: the outputs from the teacher model
attn_weights: the attention weights from the teacher model
- _calculate_transformer_attentions(samples)¶
Calculate outputs and attention weights based on tts transformer
- Parameters
samples – samples from dataset
- Returns
after_outs: the outputs from the teacher model
attn_weights: the attention weights from the teacher model