`athena.data.datasets.tts.speech_fastspeech2`¶

Audio dataset for FastSpeech 2 speech synthesis.

Module Contents¶

Classes¶

`SpeechFastspeech2DatasetBuilder`: SpeechSynthesisDatasetBuilder

Functions¶

`average_by_duration`(x, durs)
`tf_average_by_duration`(x, durs)

athena.data.datasets.tts.speech_fastspeech2.average_by_duration(x, durs)¶

athena.data.datasets.tts.speech_fastspeech2.tf_average_by_duration(x, durs)¶
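Neither function is documented here. Judging only by the name and the `(x, durs)` signature, `average_by_duration` presumably reduces a frame-level sequence `x` (such as f0 or energy) to one value per input token by averaging over the frames each token spans, and `tf_average_by_duration` is presumably its TensorFlow-callable counterpart. The following is a minimal NumPy sketch under that assumption, not Athena's implementation; the name `average_by_duration_sketch` and the choice to skip zero-valued frames are assumptions made for illustration.

```python
import numpy as np


def average_by_duration_sketch(x, durs):
    """Average frame-level values over consecutive spans given by durs.

    x:    1-D array with one value per frame (e.g. f0 or energy).
    durs: 1-D int array; durs[i] is the number of frames for token i.
    """
    x = np.asarray(x, dtype=np.float32)
    durs = np.asarray(durs, dtype=np.int64)
    # Token i covers frames bounds[i]:bounds[i + 1].
    bounds = np.concatenate([[0], np.cumsum(durs)])
    out = np.zeros(len(durs), dtype=np.float32)
    for i, (start, end) in enumerate(zip(bounds[:-1], bounds[1:])):
        span = x[start:end]
        # Assumption: zeros mark unvoiced or silent frames and are skipped.
        nonzero = span[span != 0.0]
        out[i] = nonzero.mean() if nonzero.size > 0 else 0.0
    return out


frame_f0 = [100.0, 110.0, 0.0, 200.0, 0.0, 0.0]
token_durs = [2, 2, 2]
token_f0 = average_by_duration_sketch(frame_f0, token_durs)
print(token_f0)  # one averaged value per token
```

The output has the same length as `durs`, which matches the per-token `[None]` shapes listed for `"f0"`, `"energy"`, and `"duration"` in `sample_shape`.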

class athena.data.datasets.tts.speech_fastspeech2.SpeechFastspeech2DatasetBuilder(config=None)¶

Bases: athena.data.datasets.base.BaseDatasetBuilder

SpeechSynthesisDatasetBuilder

property num_class¶

Returns: the max_index of the vocabulary
Return type: int

property feat_dim¶: Returns the number of feature dimensions.

property sample_type¶

Returns

sample_type of the dataset:

{
    "utt_id": tf.string,
    "input": tf.int32,
    "input_length": tf.int32,
    "output_length": tf.int32,
    "output": tf.float32,
    "speaker": tf.int32,
    "duration": tf.int32
}

Return type

dict

property sample_shape¶

Returns

sample_shape of the dataset:

{
    "utt_id": tf.TensorShape([]),
    "input": tf.TensorShape([None]),
    "input_length": tf.TensorShape([]),
    "output_length": tf.TensorShape([]),
    "output": tf.TensorShape([None, feature_dim]),
    "f0": tf.TensorShape([None]),
    "energy": tf.TensorShape([None]),
    "speaker": tf.TensorShape([]),
    "duration": tf.TensorShape([None])
}

Return type

dict

property sample_signature¶

Returns

sample_signature of the dataset:

{
    "utt_id": tf.TensorSpec(shape=(None), dtype=tf.string),
    "input": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None, feature_dim),
                            dtype=tf.float32),
    "f0": tf.TensorSpec(shape=(None, None), dtype=tf.float32),
    "energy": tf.TensorSpec(shape=(None, None), dtype=tf.float32),
    "speaker": tf.TensorSpec(shape=(None), dtype=tf.int32)
}

Return type

dict
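One way to read the two dictionaries: `sample_shape` describes a single example, with `None` marking a variable-length axis, while `sample_signature` describes the same fields with a leading batch axis. The NumPy sketch below illustrates that relationship by padding variable-length samples into one batch. It is not Athena's batching code; the helper `pad_batch`, the value `feature_dim = 80`, and the toy samples are hypothetical, and only a few of the fields are shown.

```python
import numpy as np


def pad_batch(samples, feature_dim):
    """Zero-pad variable-length samples into batch-shaped arrays."""
    batch_size = len(samples)
    max_in = max(len(s["input"]) for s in samples)
    max_out = max(s["output"].shape[0] for s in samples)

    inputs = np.zeros((batch_size, max_in), dtype=np.int32)
    outputs = np.zeros((batch_size, max_out, feature_dim), dtype=np.float32)
    for i, s in enumerate(samples):
        inputs[i, : len(s["input"])] = s["input"]
        outputs[i, : s["output"].shape[0]] = s["output"]

    return {
        "input": inputs,                       # [batch, max_in]
        "input_length": np.array(
            [len(s["input"]) for s in samples], dtype=np.int32),
        "output": outputs,                     # [batch, max_out, feature_dim]
        "output_length": np.array(
            [s["output"].shape[0] for s in samples], dtype=np.int32),
    }


feature_dim = 80  # hypothetical value, for illustration only
samples = [
    {"input": np.array([3, 7, 2], dtype=np.int32),
     "output": np.ones((5, feature_dim), dtype=np.float32)},
    {"input": np.array([4, 1], dtype=np.int32),
     "output": np.ones((8, feature_dim), dtype=np.float32)},
]
batch = pad_batch(samples, feature_dim)
print({k: v.shape for k, v in batch.items()})
```

The length fields keep the unpadded sizes, so downstream code can mask out the zero padding.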

default_config¶

load_duration(duration)¶

preprocess_data(file_path)¶: Generates a list of tuples (audio_feature, wav_length_ms, transcript, duration, speaker).

load_audio_feature(audio_feature_file)¶

__getitem__(index)¶

compute_cmvn_if_necessary(is_necessary=True)¶: Computes the CMVN (cepstral mean and variance normalization) file.
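For readers unfamiliar with CMVN, the sketch below shows the general technique: estimate a per-dimension mean and variance over all feature frames, then normalize each utterance with them. It is a generic NumPy illustration, not the code behind `compute_cmvn_if_necessary`; the helper names, the use of global rather than per-speaker statistics, and the random toy features are assumptions.

```python
import numpy as np


def compute_global_cmvn(features):
    """Return per-dimension mean and variance over all frames.

    features: list of arrays, each shaped [frames, feat_dim].
    """
    stacked = np.concatenate(features, axis=0)
    return stacked.mean(axis=0), stacked.var(axis=0)


def apply_cmvn(feature, mean, var, eps=1e-8):
    """Normalize one utterance to zero mean and unit variance per dimension."""
    return (feature - mean) / np.sqrt(var + eps)


# Toy features standing in for real acoustic features.
rng = np.random.default_rng(0)
utterances = [rng.normal(5.0, 2.0, size=(n, 4)) for n in (50, 70)]

mean, var = compute_global_cmvn(utterances)
normalized = np.concatenate(
    [apply_cmvn(u, mean, var) for u in utterances], axis=0)
```

In practice the statistics are computed once and stored, so that training and inference normalize features identically.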
