athena.models.tts.fastspeech

Module Contents

Classes

FastSpeech

Reference: FastSpeech: Fast, Robust and Controllable Text to Speech

LengthRegulator

Length regulator for FastSpeech.

DurationCalculator

Calculate durations and outputs based on a teacher model

class athena.models.tts.fastspeech.FastSpeech(data_descriptions, config=None)

Bases: athena.models.base.BaseModel

Reference: FastSpeech: Fast, Robust and Controllable Text to Speech

(http://papers.nips.cc/paper/8580-fastspeech-fast-robust-and-controllable-text-to-speech.pdf)

default_config
set_teacher_model(teacher_model, teacher_type)

Set the teacher model and initialize the duration_calculator before training

Parameters
  • teacher_model – the loaded teacher model

  • teacher_type – the model type, e.g., tacotron2, tts_transformer

restore_from_pretrained_model(pretrained_model, model_type='')

Restore weights from a pretrained model

Parameters
  • pretrained_model – the loaded pretrained model

  • model_type – the model type, e.g., tts_transformer

get_loss(outputs, samples, training=None)

Compute the loss used for training

_feedforward_decoder(encoder_output, duration_indexes, duration_sequences, output_length, training)

Feed-forward decoder

Parameters
  • encoder_output – encoder outputs, shape: [batch, x_steps, d_model]

  • duration_indexes – argmax indexes computed by duration_calculator; used during training only, shape: [batch, y_steps]

  • duration_sequences – duration (in frames) of each phoneme, shape: [batch, x_steps]

  • output_length – the real output length

  • training – whether the model is in the training stage

Returns:

before_outs: the outputs before postnet calculation

after_outs: the outputs after postnet calculation
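The relationship between before_outs and after_outs can be sketched with a toy stand-in. In FastSpeech the decoder and postnet are TensorFlow layers; the linear projection and tanh "postnet" below are purely illustrative, showing only that after_outs is before_outs refined by a residual postnet:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_postnet(x):
    # stand-in for the real convolutional postnet
    return 0.1 * np.tanh(x)

def feedforward_decode(expanded, n_mels=4):
    # expanded: [batch, y_steps, d_model] length-regulated encoder output
    w = rng.standard_normal((expanded.shape[-1], n_mels))  # toy mel projection
    before_outs = expanded @ w                             # outputs before postnet
    after_outs = before_outs + toy_postnet(before_outs)    # residual postnet refinement
    return before_outs, after_outs
```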

call(samples, training: bool = None)

Call the model

synthesize(samples)

class athena.models.tts.fastspeech.LengthRegulator

Bases: tensorflow.keras.layers.Layer

Length regulator for FastSpeech.

inference(phoneme_sequences, duration_sequences, alpha=1.0)

Calculate replicated sequences based on duration sequences

Parameters
  • phoneme_sequences – sequences of phoneme features, shape: [batch, x_steps, d_model]

  • duration_sequences – duration (in frames) of each phoneme, shape: [batch, x_steps]

  • alpha – Alpha value to control speed of speech.

Returns:

expanded_array: replicated sequences based on durations, shape: [batch, y_steps, d_model]
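The replication performed by inference can be sketched in plain NumPy. This is an illustrative version, not the actual TensorFlow implementation; it assumes integer durations scaled by alpha and zero-padding to the longest expanded sequence:

```python
import numpy as np

def length_regulate(phoneme_sequences, duration_sequences, alpha=1.0):
    # phoneme_sequences: [batch, x_steps, d_model]
    # duration_sequences: [batch, x_steps], frames per phoneme
    phoneme_sequences = np.asarray(phoneme_sequences)
    durations = np.round(np.asarray(duration_sequences) * alpha).astype(int)
    expanded = [np.repeat(p, d, axis=0)                  # replicate each phoneme d[i] times
                for p, d in zip(phoneme_sequences, durations)]
    y_steps = max(e.shape[0] for e in expanded)          # longest expanded sequence
    out = np.zeros((len(expanded), y_steps, phoneme_sequences.shape[-1]))
    for i, e in enumerate(expanded):
        out[i, :e.shape[0]] = e                          # zero-pad shorter sequences
    return out
```

Since alpha multiplies the durations, alpha > 1 stretches every phoneme and slows speech, while alpha < 1 shortens it and speeds speech up.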

call(phoneme_sequences, duration_indexes, output_length)

Calculate replicated sequences based on duration sequences

Parameters
  • phoneme_sequences – sequences of phoneme features, shape: [batch, x_steps, d_model]

  • duration_indexes – phoneme index for each output frame, shape: [batch, y_steps]

  • output_length – shape: [batch]

Returns:

expanded_array: replicated sequences based on durations, shape: [batch, y_steps, d_model]
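During training, call can use the teacher-derived indexes directly: each output frame simply copies the phoneme vector it is assigned to. A NumPy sketch of this gather-and-mask step (illustrative only, not the actual TensorFlow implementation):

```python
import numpy as np

def length_regulate_train(phoneme_sequences, duration_indexes, output_length):
    # phoneme_sequences: [batch, x_steps, d_model]
    # duration_indexes:  [batch, y_steps], phoneme index for each output frame
    # output_length:     [batch], real length of each output sequence
    phoneme_sequences = np.asarray(phoneme_sequences)
    idx = np.asarray(duration_indexes)[..., None]              # [batch, y_steps, 1]
    expanded = np.take_along_axis(phoneme_sequences, idx, axis=1)
    y_steps = expanded.shape[1]
    mask = np.arange(y_steps)[None, :] < np.asarray(output_length)[:, None]
    return expanded * mask[..., None]                          # zero frames past real length
```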

class athena.models.tts.fastspeech.DurationCalculator(teacher_model=None, teacher_type=None)

Bases: tensorflow.keras.layers.Layer

Calculate durations and outputs based on a teacher model

call(samples)

Parameters

samples – samples from dataset

Returns:

durations: batch of durations, shape: [batch, max_input_length].
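The duration extraction can be sketched as an argmax over the teacher's attention weights: each output frame is assigned to the phoneme it attends to most, and the duration of a phoneme is the number of frames assigned to it. An illustrative NumPy version (the actual layer runs the teacher model and uses TensorFlow ops):

```python
import numpy as np

def durations_from_attention(attn_weights):
    # attn_weights: [batch, y_steps, x_steps] teacher attention weights
    attn = np.asarray(attn_weights)
    assignments = attn.argmax(axis=-1)                     # phoneme each frame attends to
    batch, x_steps = attn.shape[0], attn.shape[-1]
    durations = np.zeros((batch, x_steps), dtype=int)
    for b in range(batch):
        idx, counts = np.unique(assignments[b], return_counts=True)
        durations[b, idx] = counts                         # frames assigned per phoneme
    return durations
```

Note that durations obtained this way sum to y_steps for each utterance, which is what lets the LengthRegulator reproduce the teacher's output length during training.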

_calculate_t2_attentions(samples)

Calculate outputs and attention weights based on tacotron2

Parameters

samples – samples from dataset

Returns:

after_outs: the outputs from the teacher model

attn_weights: the attention weights from the teacher model

_calculate_transformer_attentions(samples)

Calculate outputs and attention weights based on tts_transformer

Parameters

samples – samples from dataset

Returns:

after_outs: the outputs from the teacher model

attn_weights: the attention weights from the teacher model