athena.models.tts.fastspeech

Module Contents

Classes

FastSpeech

Reference: FastSpeech: Fast, Robust and Controllable Text to Speech

LengthRegulator

Length regulator for FastSpeech.

DurationCalculator

Calculate durations and outputs based on a teacher model

class athena.models.tts.fastspeech.FastSpeech(data_descriptions, config=None)

Bases: athena.models.base.BaseModel

Reference: FastSpeech: Fast, Robust and Controllable Text to Speech

(http://papers.nips.cc/paper/8580-fastspeech-fast-robust-and-controllable-text-to-speech.pdf)

default_config
set_teacher_model(teacher_model, teacher_type)

Set the teacher model and initialize the duration_calculator before training

Parameters
  • teacher_model – the loaded teacher model

  • teacher_type – the model type, e.g., tacotron2, tts_transformer

restore_from_pretrained_model(pretrained_model, model_type='')

Restore weights from a pretrained model

Parameters
  • pretrained_model – the loaded pretrained model

  • model_type – the model type, e.g., tts_transformer

get_loss(outputs, samples, training=None)

Compute the loss used for training

_feedforward_decoder(encoder_output, duration_indexes, duration_sequences, output_length, training)

Feed-forward decoder

Parameters
  • encoder_output – encoder outputs, shape: [batch, x_steps, d_model]

  • duration_indexes – argmax indexes computed by duration_calculator; used during training only, shape: [batch, y_steps]

  • duration_sequences – duration (in frames) of each phoneme, shape: [batch, x_steps]

  • output_length – the real output length

  • training – whether the model is in the training stage

Returns:

before_outs: the outputs before postnet calculation

after_outs: the outputs after postnet calculation
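The relationship between before_outs and after_outs can be sketched with a toy stand-in. In FastSpeech the decoder and postnet are TensorFlow layers; the linear projection and tanh "postnet" below are purely illustrative, showing only that after_outs is before_outs refined by a residual postnet:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_postnet(x):
    # stand-in for the real convolutional postnet
    return 0.1 * np.tanh(x)

def feedforward_decode(expanded, n_mels=4):
    # expanded: [batch, y_steps, d_model] length-regulated encoder output
    w = rng.standard_normal((expanded.shape[-1], n_mels))  # toy mel projection
    before_outs = expanded @ w                             # outputs before postnet
    after_outs = before_outs + toy_postnet(before_outs)    # residual postnet refinement
    return before_outs, after_outs
```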

call(samples, training: bool = None)

Call the model

synthesize(samples)

class athena.models.tts.fastspeech.LengthRegulator

Bases: tensorflow.keras.layers.Layer

Length regulator for FastSpeech.

inference(phoneme_sequences, duration_sequences, alpha=1.0)

Calculate replicated sequences based on duration sequences

Parameters
  • phoneme_sequences – sequences of phoneme features, shape: [batch, x_steps, d_model]

  • duration_sequences – duration (in frames) of each phoneme, shape: [batch, x_steps]

  • alpha – Alpha value to control speed of speech.

Returns:

expanded_array: replicated sequences based on durations, shape: [batch, y_steps, d_model]
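The replication performed by inference can be sketched in plain NumPy. This is an illustrative version, not the actual TensorFlow implementation; it assumes integer durations scaled by alpha and zero-padding to the longest expanded sequence:

```python
import numpy as np

def length_regulate(phoneme_sequences, duration_sequences, alpha=1.0):
    # phoneme_sequences: [batch, x_steps, d_model]
    # duration_sequences: [batch, x_steps], frames per phoneme
    phoneme_sequences = np.asarray(phoneme_sequences)
    durations = np.round(np.asarray(duration_sequences) * alpha).astype(int)
    expanded = [np.repeat(p, d, axis=0)                  # replicate each phoneme d[i] times
                for p, d in zip(phoneme_sequences, durations)]
    y_steps = max(e.shape[0] for e in expanded)          # longest expanded sequence
    out = np.zeros((len(expanded), y_steps, phoneme_sequences.shape[-1]))
    for i, e in enumerate(expanded):
        out[i, :e.shape[0]] = e                          # zero-pad shorter sequences
    return out
```

Since alpha multiplies the durations, alpha > 1 stretches every phoneme and slows speech, while alpha < 1 shortens it and speeds speech up.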

call(phoneme_sequences, duration_indexes, output_length)

Calculate replicated sequences based on duration sequences

Parameters
  • phoneme_sequences – sequences of phoneme features, shape: [batch, x_steps, d_model]

  • duration_indexes – phoneme index for each output frame, shape: [batch, y_steps]

  • output_length – shape: [batch]

Returns:

expanded_array: replicated sequences based on durations, shape: [batch, y_steps, d_model]
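During training, call can use the teacher-derived indexes directly: each output frame simply copies the phoneme vector it is assigned to. A NumPy sketch of this gather-and-mask step (illustrative only, not the actual TensorFlow implementation):

```python
import numpy as np

def length_regulate_train(phoneme_sequences, duration_indexes, output_length):
    # phoneme_sequences: [batch, x_steps, d_model]
    # duration_indexes:  [batch, y_steps], phoneme index for each output frame
    # output_length:     [batch], real length of each output sequence
    phoneme_sequences = np.asarray(phoneme_sequences)
    idx = np.asarray(duration_indexes)[..., None]              # [batch, y_steps, 1]
    expanded = np.take_along_axis(phoneme_sequences, idx, axis=1)
    y_steps = expanded.shape[1]
    mask = np.arange(y_steps)[None, :] < np.asarray(output_length)[:, None]
    return expanded * mask[..., None]                          # zero frames past real length
```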

class athena.models.tts.fastspeech.DurationCalculator(teacher_model=None, teacher_type=None)

Bases: tensorflow.keras.layers.Layer

Calculate durations and outputs based on a teacher model

call(samples)

Parameters

samples – samples from dataset

Returns:

durations: batch of durations, shape: [batch, max_input_length].
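The duration extraction can be sketched as an argmax over the teacher's attention weights: each output frame is assigned to the phoneme it attends to most, and the duration of a phoneme is the number of frames assigned to it. An illustrative NumPy version (the actual layer runs the teacher model and uses TensorFlow ops):

```python
import numpy as np

def durations_from_attention(attn_weights):
    # attn_weights: [batch, y_steps, x_steps] teacher attention weights
    attn = np.asarray(attn_weights)
    assignments = attn.argmax(axis=-1)                     # phoneme each frame attends to
    batch, x_steps = attn.shape[0], attn.shape[-1]
    durations = np.zeros((batch, x_steps), dtype=int)
    for b in range(batch):
        idx, counts = np.unique(assignments[b], return_counts=True)
        durations[b, idx] = counts                         # frames assigned per phoneme
    return durations
```

Note that durations obtained this way sum to y_steps for each utterance, which is what lets the LengthRegulator reproduce the teacher's output length during training.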

_calculate_t2_attentions(samples)

Calculate outputs and attention weights based on tacotron2

Parameters

samples – samples from dataset

Returns:

after_outs: the outputs from the teacher model

attn_weights: the attention weights from the teacher model

_calculate_transformer_attentions(samples)

Calculate outputs and attention weights based on tts_transformer

Parameters

samples – samples from dataset

Returns:

after_outs: the outputs from the teacher model

attn_weights: the attention weights from the teacher model