athena.data
data
Subpackages¶
athena.data.datasets
athena.data.datasets.asr
athena.data.datasets.asr.audio_video_recognition
athena.data.datasets.asr.audio_video_recognition_batch_bins
athena.data.datasets.asr.speech_recognition
athena.data.datasets.asr.speech_recognition_batch_bins
athena.data.datasets.asr.speech_recognition_batch_bins_kaldiio
athena.data.datasets.asr.speech_recognition_kaldiio
athena.data.datasets.kws
athena.data.datasets.mpc
athena.data.datasets.tts
athena.data.datasets.vad
athena.data.datasets.base
athena.data.datasets.language_set
athena.data.datasets.preprocess
athena.data.datasets.speech_set
Submodules¶
Package Contents¶
Classes¶
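SpeechDatasetBuilder | SpeechDatasetBuilder
SpeechRecognitionDatasetBuilder | SpeechRecognitionDatasetBuilder
SpeechRecognitionDatasetBatchBinsBuilder | SpeechRecognitionDatasetBatchBinsBuilder
SpeechRecognitionDatasetKaldiIOBuilder | SpeechRecognitionDatasetKaldiIOBuilder
SpeechRecognitionDatasetBatchBinsKaldiIOBuilder | SpeechRecognitionDatasetBatchBinsKaldiIOBuilder
AudioVedioRecognitionDatasetBuilder | SpeechRecognitionDatasetBuilder
AudioVedioRecognitionDatasetBatchBinsBuilder | SpeechRecognitionDatasetBatchBinsBuilder
SpeechSynthesisDatasetBuilder | SpeechSynthesisDatasetBuilder
SpeechFastspeech2DatasetBuilder | SpeechSynthesisDatasetBuilder
SpeechSynthesisTestDatasetBuilder | SpeechSynthesisDatasetBuilder
SpeechWakeupFramewiseDatasetKaldiIOBuilder | Dataset builder for CNN model. The builder treats every spliced frame as one image.
SpeechWakeupDatasetKaldiIOBuilder | Dataset builder for RNN model. The builder mixes the spliced frames into one dimension.
SpeechWakeupDatasetKaldiIOBuilderAVCE | Dataset builder for RNN model. The builder mixes the spliced frames into one dimension.
LanguageDatasetBuilder | LanguageDatasetBuilder
MpcSpeechDatasetBuilder | SpeechDatasetBuilder
MpcSpeechDatasetKaldiIOBuilder | MpcSpeechDatasetKaldiIOBuilder
VoiceActivityDetectionDatasetKaldiIOBuilder | VoiceActivityDetectionDatasetKaldiIOBuilder
FeatureNormalizer | Feature Normalizer
FS2FeatureNormalizer | Fastspeech2 Feature Normalizer
TextFeaturizer | The main text featurizer interface
SentencePieceFeaturizer | SentencePieceFeaturizer using tensorflow-text api
TextTokenizer | TextTokenizer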
- class athena.data.SpeechDatasetBuilder(config=None)¶
Bases:
athena.data.datasets.base.SpeechBaseDatasetBuilder
SpeechDatasetBuilder
- property num_class¶
@property
- Returns
the target dim
- Return type
int
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "input": tf.float32, "input_length": tf.int32, "output": tf.float32, "output_length": tf.int32, }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape( [None, self.audio_featurizer.dim, self.audio_featurizer.num_channels] ), "input_length": tf.TensorShape([]), "output": tf.TensorShape([None, None]), "output_length": tf.TensorShape([]), }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec( shape=(None, None, None, None), dtype=tf.float32 ), "input_length": tf.TensorSpec(shape=([None]), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None, None), dtype=tf.float32), "output_length": tf.TensorSpec(shape=([None]), dtype=tf.int32), }
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
generate a list of tuples (wav_filename, wav_length_ms, speaker).
- __getitem__(index)¶
get a sample
- Parameters
index (int) – index of the entries
- Returns
sample:
{ "input": input_data, "input_length": input_data.shape[0], "output": output_data, "output_length": output_data.shape[0], }
- Return type
dict
- class athena.data.SpeechRecognitionDatasetBuilder(config=None)¶
Bases:
athena.data.datasets.base.SpeechBaseDatasetBuilder
SpeechRecognitionDatasetBuilder
- property num_class¶
return the max_index of the vocabulary + 1
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "input": tf.float32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.int32, "utt_id": tf.string, }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, dim, nc]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None]), "utt_id": tf.TensorShape([]), }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "utt_id": tf.TensorSpec(shape=(None), dtype=tf.string), }
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker).
- storage_features_offline()¶
- __getitem__(index)¶
get a sample
- Parameters
index (int) – index of the entries
- Returns
sample:
{ "input": feat, "input_length": feat_length, "output_length": label_length, "output": label, "utt_id": utt_id }
- Return type
dict
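The tuple layout produced by preprocess_data can be illustrated with a small standalone sketch. It assumes a tab-separated manifest whose header names match the tuple fields; the manifest text, the file names, and the helper read_entries are all hypothetical and are not part of athena.

```python
import csv
import io

# Hypothetical tab-separated manifest; the column names are an assumption.
MANIFEST = (
    "wav_filename\twav_length_ms\ttranscript\tspeaker\n"
    "a.wav\t2300\thello world\tspk1\n"
    "b.wav\t1200\tgood morning\tspk2\n"
)


def read_entries(text):
    """Parse manifest text into (wav_filename, wav_length_ms, transcript, speaker) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    entries = []
    for row in reader:
        entries.append(
            (
                row["wav_filename"],
                int(row["wav_length_ms"]),  # keep durations numeric so they can be sorted
                row["transcript"],
                row["speaker"],
            )
        )
    return entries


entries = read_entries(MANIFEST)
# Sorting by duration is a common follow-up step before batching.
entries.sort(key=lambda entry: entry[1])
```

Whether the real builder sorts or filters the entries depends on its configuration; the sort here only shows why the duration is kept as an integer.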
- class athena.data.SpeechRecognitionDatasetBatchBinsBuilder(config=None)¶
Bases:
athena.data.datasets.asr.speech_recognition.SpeechRecognitionDatasetBuilder
SpeechRecognitionDatasetBatchBinsBuilder
- property sample_shape_batch_bins¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, None, dim, nc]), "input_length": tf.TensorShape([None]), "output_length": tf.TensorShape([None]), "output": tf.TensorShape([None, None]), }
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
- __getitem__(index)¶
- __len__()¶
- as_dataset(batch_size=16, num_threads=1)¶
return tf.data.Dataset object
- shard(num_shards, index)¶
creates a Dataset that includes only 1/num_shards of this dataset
- batch_wise_shuffle(batch_size=1, epoch=-1, seed=917)¶
Batch-wise shuffling of the data entries.
- Parameters
batch_size (int, optional) – an integer for the batch size. Defaults to 1 (in batch_bins mode).
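As a rough illustration of what batch-wise shuffling means, the sketch below permutes whole batches while leaving the order inside each batch untouched. It is a plain-Python approximation under stated assumptions, not athena's implementation: the function works on a bare list, the epoch argument is left out, and how the library derives its permutation from seed is not shown in this page.

```python
import random


def batch_wise_shuffle(entries, batch_size=1, seed=917):
    """Shuffle whole batches while keeping the order inside each batch."""
    # Cut the entry list into consecutive chunks of batch_size.
    batches = [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]
    # A seeded generator makes the permutation reproducible.
    random.Random(seed).shuffle(batches)
    # Flatten the shuffled batches back into a single list of entries.
    return [entry for batch in batches for entry in batch]


entries = list(range(10))
shuffled = batch_wise_shuffle(entries, batch_size=2)
```

Entries that were neighbours inside a batch stay neighbours, which preserves any length-based grouping done before the shuffle.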
- class athena.data.SpeechRecognitionDatasetKaldiIOBuilder(config=None)¶
Bases:
athena.data.datasets.base.SpeechBaseDatasetBuilder
SpeechRecognitionDatasetKaldiIOBuilder
- property num_class¶
return the max_index of the vocabulary + 1
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "input": tf.float32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.int32, }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, dim, nc]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None]), }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32), }
- Return type
dict
- default_config¶
- preprocess_kaldi_data(file_dir, apply_sort_filter=True)¶
Generate a list of tuples (feat_key, speaker).
- __getitem__(index)¶
- compute_cmvn_if_necessary(is_necessary=True)¶
compute cmvn file
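Comparing sample_shape with sample_signature above, each signature shape looks like the per-sample shape with one extra leading dimension for the batch. The sketch below reproduces that relationship with plain tuples instead of tf.TensorShape; the helper batched_shape and the values 40 and 1 for dim and nc are made up for illustration.

```python
def batched_shape(sample_shape):
    """Prepend an unknown batch dimension (None) to a per-sample shape."""
    return (None, *sample_shape)


# Per-sample shapes, written as tuples; 40 and 1 stand in for dim and nc.
sample_shape = {
    "input": (None, 40, 1),
    "input_length": (),
    "output_length": (),
    "output": (None,),
}

# Batched shapes, analogous to the shapes listed under sample_signature.
signature_shape = {key: batched_shape(shape) for key, shape in sample_shape.items()}
```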
- class athena.data.SpeechRecognitionDatasetBatchBinsKaldiIOBuilder(config=None)¶
Bases:
athena.data.datasets.asr.speech_recognition_kaldiio.SpeechRecognitionDatasetKaldiIOBuilder
SpeechRecognitionDatasetBatchBinsKaldiIOBuilder
- property sample_shape_batch_bins¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, None, dim, nc]), "input_length": tf.TensorShape([None]), "output_length": tf.TensorShape([None]), "output": tf.TensorShape([None, None]), }
- Return type
dict
- default_config¶
- preprocess_kaldi_data(file_dir, apply_sort_filter=True)¶
- read_shape_file(file_dir=None)¶
- __getitem__(index)¶
- __len__()¶
- as_dataset(batch_size=16, num_threads=1)¶
return tf.data.Dataset object
- shard(num_shards, index)¶
creates a Dataset that includes only 1/num_shards of this dataset
- batch_wise_shuffle(batch_size=1, epoch=-1, seed=917)¶
Batch-wise shuffling of the data entries.
- Parameters
batch_size (int, optional) – an integer for the batch size. Defaults to 1 (in batch_bins mode).
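This page does not define "batch bins". A common meaning is that a batch is capped by its total size rather than by a fixed number of utterances. Under that assumption, a greedy grouping could look like the sketch below; group_by_bins, max_bins, and the frame counts are hypothetical, and padding is ignored.

```python
def group_by_bins(lengths, max_bins):
    """Greedily pack utterance indices so each batch holds at most max_bins frames."""
    batches, current, used = [], [], 0
    for idx, n_frames in enumerate(lengths):
        # Start a new batch when adding this utterance would exceed the budget.
        if current and used + n_frames > max_bins:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n_frames
    if current:
        batches.append(current)
    return batches


# Frame counts per utterance (made-up numbers).
lengths = [120, 80, 300, 50, 50, 400]
batches = group_by_bins(lengths, max_bins=300)
```

In this sketch an utterance longer than the budget still gets a batch of its own, so no data is dropped.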
- class athena.data.AudioVedioRecognitionDatasetBuilder(config=None)¶
Bases:
athena.data.datasets.base.SpeechBaseDatasetBuilder
SpeechRecognitionDatasetBuilder
- property num_class¶
return the max_index of the vocabulary + 1
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "input": tf.float32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.int32, }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, dim, nc]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None]), }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32), }
- Return type
dict
- default_config¶
- video_scp_loader(scp_dir)¶
Load the video list from an scp file and return a dict.
- image_normalizer(image)¶
- preprocess_data(file_path)¶
generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker).
- storage_features_offline()¶
- __getitem__(index)¶
get a sample
- Parameters
index (int) – index of the entries
- Returns
sample:
{ "input": feat, "input_length": feat_length, "output_length": label_length, "output": label, }
- Return type
dict
- class athena.data.AudioVedioRecognitionDatasetBatchBinsBuilder(config=None)¶
Bases:
athena.data.datasets.asr.speech_recognition.SpeechRecognitionDatasetBuilder
SpeechRecognitionDatasetBatchBinsBuilder
- property sample_shape_batch_bins¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, None, dim, nc]), "video":tf.TensorShape([None, None, high, wide]), "input_length": tf.TensorShape([None]), "output_length": tf.TensorShape([None]), "output": tf.TensorShape([None, None]), "utt_id": tf.TensorShape([None]), }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, dim, nc]), "video": tf.TensorShape([None, None, None]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None]), "utt_id": tf.TensorShape([]), }
- Return type
dict
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "input": tf.float32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.int32, "utt_id": tf.string, }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32), "video": tf.TensorSpec(shape=(None, None, None, None), dtype=tf.float32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "utt_id": tf.TensorSpec(shape=(None), dtype=tf.string), }
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker).
- __getitem__(index)¶
get a sample
- Parameters
index (int) – index of the entries
- Returns
sample:
{ "input": feat, "input_length": feat_length, "output_length": label_length, "output": label, "utt_id": utt_id }
- Return type
dict
- __len__()¶
- as_dataset(batch_size=16, num_threads=1)¶
return tf.data.Dataset object
- shard(num_shards, index)¶
creates a Dataset that includes only 1/num_shards of this dataset
- batch_wise_shuffle(batch_size=1, epoch=-1, seed=917)¶
Batch-wise shuffling of the data entries.
- Parameters
batch_size (int, optional) – an integer for the batch size. Defaults to 1 (in batch_bins mode).
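The shard method is described as keeping 1/num_shards of the dataset. One simple way to get that behaviour, matching the element selection rule of tf.data.Dataset.shard (keep elements whose position modulo num_shards equals index), is strided slicing. The sketch below works on a plain list and is not taken from athena's source.

```python
def shard(entries, num_shards, index):
    """Keep every num_shards-th entry, starting at position index."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    return entries[index::num_shards]


entries = ["utt%d" % i for i in range(7)]
# One shard per worker; together the shards cover every entry exactly once.
shards = [shard(entries, 3, worker) for worker in range(3)]
```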
- class athena.data.SpeechSynthesisDatasetBuilder(config=None)¶
Bases:
athena.data.datasets.base.SpeechBaseDatasetBuilder
SpeechSynthesisDatasetBuilder
- property num_class¶
@property
- Returns
the max_index of the vocabulary
- Return type
int
- property feat_dim¶
return the number of feature dims
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "utt_id": tf.string, "input": tf.int32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.float32, "speaker": tf.int32 }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "utt_id": tf.TensorShape([]), "input": tf.TensorShape([None]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None, feature_dim]), "speaker": tf.TensorShape([]) }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "utt_id": tf.TensorSpec(shape=(None), dtype=tf.string), "input": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None, feature_dim), dtype=tf.float32), "speaker": tf.TensorSpec(shape=(None), dtype=tf.int32) }
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker).
- __getitem__(index)¶
- class athena.data.SpeechFastspeech2DatasetBuilder(config=None)¶
Bases:
athena.data.datasets.base.BaseDatasetBuilder
SpeechSynthesisDatasetBuilder
- property num_class¶
@property
- Returns
the max_index of the vocabulary
- Return type
int
- property feat_dim¶
return the number of feature dims
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "utt_id": tf.string, "input": tf.int32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.float32, "speaker": tf.int32, "duration": tf.int32 }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "utt_id": tf.TensorShape([]), "input": tf.TensorShape([None]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None, feature_dim]), "f0": tf.TensorShape([None]), "energy": tf.TensorShape([None]), "speaker": tf.TensorShape([]), "duration": tf.TensorShape([None]) }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "utt_id": tf.TensorSpec(shape=(None), dtype=tf.string), "input": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None, feature_dim), dtype=tf.float32), "f0": tf.TensorSpec(shape=(None, None), dtype=tf.float32), "energy": tf.TensorSpec(shape=(None, None), dtype=tf.float32), "speaker": tf.TensorSpec(shape=(None), dtype=tf.int32) }
- Return type
dict
- default_config¶
- load_duration(duration)¶
- preprocess_data(file_path)¶
generate a list of tuples (audio_feature, wav_length_ms, transcript, duration, speaker).
- load_audio_feature(audio_feature_file)¶
- __getitem__(index)¶
- compute_cmvn_if_necessary(is_necessary=True)¶
compute cmvn file
- class athena.data.SpeechSynthesisTestDatasetBuilder(data_csv, config=None)¶
Bases:
athena.data.datasets.base.BaseDatasetBuilder
SpeechSynthesisDatasetBuilder
- property num_class¶
@property
- Returns
the max_index of the vocabulary
- Return type
int
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "utt_id": tf.string, "input": tf.int32, "input_length": tf.int32, "speaker": tf.int32, }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "utt_id": tf.TensorShape([]), "input": tf.TensorShape([None]), "input_length": tf.TensorShape([]), "speaker": tf.TensorShape([]), }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "utt_id": tf.TensorSpec(shape=(None), dtype=tf.string), "input": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "speaker": tf.TensorSpec(shape=(None), dtype=tf.int32) }
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
generate a list of tuples (utt_id, transcript).
- __getitem__(index)¶
- compute_cmvn_if_necessary(is_necessary=True)¶
compute cmvn file
- class athena.data.SpeechWakeupFramewiseDatasetKaldiIOBuilder(config=None)¶
Bases:
athena.data.datasets.base.BaseDatasetBuilder
Dataset builder for CNN model. The builder treats every spliced frame as one image, for example (21, 63).
The input data format is (batch, timestep, height, width, channel), for example (b, t, 21, 63, 1).
unbatch is used to split the input. The output data format is, for example, (b, 21, 63, 1).
- property sample_type¶
example types
- property sent_sample_shape¶
- property sample_shape¶
examples shapes
- property sample_signature¶
examples signature
- default_config¶
- preprocess_data(data_dir='')¶
loading data
- __getitem__(index)¶
- splice_feature(feature, input_left_context, input_right_context)¶
Splice features according to input_left_context and input_right_context.
- Parameters
feature – the input features; the shape may be [timestep, dim, 1]
input_left_context – the left context to be spliced; the first frame is repeated when the context falls out of range
input_right_context – the right context to be spliced; the last frame is repeated when the context falls out of range
- Returns
splice_feat – the spliced features
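The splicing rule in the docstring (stack each frame with its left and right neighbours, repeating the first or last frame at the edges) can be sketched with nested lists. This is an illustrative re-implementation with made-up data, not the library's code, and it omits the trailing channel axis.

```python
def splice_feature(frames, left_context, right_context):
    """Stack each frame with its neighbours, repeating edge frames when out of range."""
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-left_context, right_context + 1):
            # Clamp the index so out-of-range positions reuse the first or last frame.
            src = min(max(t + offset, 0), last)
            window.append(frames[src])
        spliced.append(window)
    return spliced


# Three frames with a 2-dimensional feature each (made-up numbers).
frames = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]
spliced = splice_feature(frames, left_context=1, right_context=1)
```

Each window has left_context + 1 + right_context rows. If the (21, 63) example in the class description came from 10 frames of context on each side of a 63-dimensional feature, that would be consistent with this rule, but the page does not state the context sizes.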
- class athena.data.SpeechWakeupDatasetKaldiIOBuilder(config=None)¶
Bases:
athena.data.datasets.base.BaseDatasetBuilder
Dataset builder for RNN model. The builder mixes the spliced frames into one dimension, for example (1, 1323).
The input data format is (batch, t, dim, channel), for example (b, t, 1323, 1).
The output data format is (batch, timestep).
- property sample_type¶
example types
- property sample_shape¶
examples shapes
- property sample_signature¶
examples signature
- default_config¶
- preprocess_data(data_dir='')¶
loading data
- __getitem__(index)¶
- splice_feature(feature, input_left_context, input_right_context)¶
Splice features according to input_left_context and input_right_context.
- Parameters
feature – the input features; the shape may be [timestep, dim, 1]
input_left_context – the left context to be spliced; the first frame is repeated when the context falls out of range
input_right_context – the right context to be spliced; the last frame is repeated when the context falls out of range
- Returns
splice_feat – the spliced features
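The 1323 in the class description equals 21 × 63, which suggests that a spliced window of shape (21, 63), as in the framewise builder's example, is flattened into a single vector for the RNN model. A minimal sketch of that flattening, using a dummy window and a hypothetical helper name:

```python
def flatten_window(window):
    """Concatenate the frames of one spliced window into a single vector."""
    return [value for frame in window for value in frame]


# A dummy window of 21 frames with 63 features each, matching the (21, 63) example.
window = [[0.0] * 63 for _ in range(21)]
vector = flatten_window(window)
```

Here len(vector) is 21 * 63 = 1323, the size of the dim axis in the (b, t, 1323, 1) example.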
- class athena.data.SpeechWakeupDatasetKaldiIOBuilderAVCE(config=None)¶
Bases:
athena.data.datasets.base.BaseDatasetBuilder
Dataset builder for RNN model. The builder mixes the spliced frames into one dimension, for example (1, 1323).
The input data format is (batch, t, dim, channel), for example (b, t, 1323, 1).
The output data format is (batch, timestep).
- property sample_type¶
example types
- property sample_shape¶
examples shapes
- property sample_signature¶
examples signature
- default_config¶
- preprocess_data(data_dir='')¶
loading data
- video_scp_loader(scp_dir)¶
Load the video list from an scp file and return a dict.
- __getitem__(index)¶
- splice_feature(feature, input_left_context, input_right_context)¶
Splice features according to input_left_context and input_right_context.
- Parameters
feature – the input features; the shape may be [timestep, dim, 1]
input_left_context – the left context to be spliced; the first frame is repeated when the context falls out of range
input_right_context – the right context to be spliced; the last frame is repeated when the context falls out of range
- Returns
splice_feat – the spliced features
- class athena.data.LanguageDatasetBuilder(config=None)¶
Bases:
athena.data.datasets.base.BaseDatasetBuilder
LanguageDatasetBuilder
- property num_class¶
@property
- Returns
the max_index of the vocabulary
- Return type
int
- property input_vocab_size¶
@property
- Returns
the input vocab size
- Return type
int
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "input": tf.int32, "input_length": tf.int32, "output": tf.int32, "output_length": tf.int32, }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None]), "input_length": tf.TensorShape([]), "output": tf.TensorShape([None]), "output_length": tf.TensorShape([]), }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "input_length": tf.TensorSpec(shape=([None]), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=([None]), dtype=tf.int32), }
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
load csv file
- __getitem__(index)¶
get a sample
- Parameters
index (int) – index of the entries
- Returns
sample:
{ "input": input_labels, "input_length": input_length, "output": output_labels, "output_length": output_length, }
- Return type
dict
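The sample returned by `__getitem__` pairs an input label sequence with an output label sequence. The page does not say how the two differ. A common convention for language-model training, shown here purely as an assumption, is to prepend a start symbol to the input and append an end symbol to the output so that each position predicts the next token. The helper make_lm_sample and the ids used are hypothetical.

```python
def make_lm_sample(token_ids, sos_id, eos_id):
    """Build a next-token-prediction sample from one tokenized sentence."""
    input_labels = [sos_id] + list(token_ids)    # what the model reads
    output_labels = list(token_ids) + [eos_id]   # what it should predict
    return {
        "input": input_labels,
        "input_length": len(input_labels),
        "output": output_labels,
        "output_length": len(output_labels),
    }


sample = make_lm_sample([5, 8, 2], sos_id=0, eos_id=1)
```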
- class athena.data.MpcSpeechDatasetBuilder(config=None)¶
Bases:
athena.data.datasets.base.SpeechBaseDatasetBuilder
SpeechDatasetBuilder. This data builder is an online feature extractor and is used for MPC training.
- property num_class¶
@property
- Returns
the target dim
- Return type
int
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "input": tf.float32, "input_length": tf.int32, "output": tf.float32, "output_length": tf.int32, }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape( [None, self.audio_featurizer.dim, self.audio_featurizer.num_channels] ), "input_length": tf.TensorShape([]), "output": tf.TensorShape([None, None]), "output_length": tf.TensorShape([]), }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec( shape=(None, None, None, None), dtype=tf.float32 ), "input_length": tf.TensorSpec(shape=([None]), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None, None), dtype=tf.float32), "output_length": tf.TensorSpec(shape=([None]), dtype=tf.int32), }
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
generate a list of tuples (wav_filename, wav_length_ms, speaker).
- __getitem__(index)¶
get a sample
- Parameters
index (int) – index of the entries
- Returns
sample:
{ "input": input_data, "input_length": input_data.shape[0], "output": output_data, "output_length": output_data.shape[0], }
- Return type
dict
- class athena.data.MpcSpeechDatasetKaldiIOBuilder(config=None)¶
Bases:
athena.data.datasets.mpc.mpc_speech_set.MpcSpeechDatasetBuilder
MpcSpeechDatasetKaldiIOBuilder. This data builder is an offline feature data builder and is used for MPC training.
- default_config¶
- preprocess_data(file_path, apply_sort_filter=True)¶
generate a list of tuples (feat_key, speaker).
- __getitem__(index)¶
- compute_cmvn_if_necessary(is_necessary=True)¶
compute cmvn file
- class athena.data.VoiceActivityDetectionDatasetKaldiIOBuilder(config=None)¶
Bases:
athena.data.datasets.base.SpeechBaseDatasetBuilder
VoiceActivityDetectionDatasetKaldiIOBuilder
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{ "input": tf.float32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.int32, }
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, dim, nc]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None]), "utt": tf.TensorShape([]), }
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "utt": tf.TensorSpec(shape=(None), dtype=tf.string), }
- Return type
dict
- default_config¶
- preprocess_data(data_scps_dir)¶
generate a list of tuples (wav_filename, wav_offset, wav_length_ms, transcript, label).
- splice_feature(feature)¶
Splice features according to input_left_context and input_right_context.
input_left_context: the left context to be spliced; the first frame is repeated when the context falls out of range.
input_right_context: the right context to be spliced; the last frame is repeated when the context falls out of range.
- Parameters
feature – the input features; the shape may be [timestep, dim, 1]
- Returns
splice_feat – the spliced features
- __getitem__(index)¶
get a sample
- Parameters
index (int) – index of the entries
- Returns
sample:
{ "input": feat, "input_length": feat_length, "output_length": label_length, "output": label, "utt": utt }
- Return type
dict
- class athena.data.FeatureNormalizer(cmvn_file=None)¶
Feature Normalizer
- __call__(feat_data, speaker, reverse=False)¶
- apply_cmvn(feat_data, speaker, reverse=False)¶
transform original feature to normalized feature
- compute_cmvn(entries, speakers, featurizer, feature_dim, num_cmvn_workers=1)¶
compute cmvn for filtered entries
- compute_cmvn_by_chunk_for_all_speaker(feature_dim, speakers, featurizer, entries)¶
Because of memory constraints, the CMVN statistics are calculated with an incremental approximation.
- compute_cmvn_kaldiio(entries, speakers, kaldi_io_feats, feature_dim)¶
compute cmvn for filtered entries using kaldi-format data
- load_cmvn()¶
load mean and var
- save_cmvn(variable_list)¶
save cmvn variables determined by variable_list to file
- Parameters
variable_list (list) – e.g. ["speaker", "mean", "var"]
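To make the CMVN methods more concrete, the sketch below accumulates mean and variance over chunks of frames without holding all features in memory, then applies and reverses the normalization. It assumes the usual definition, normalized = (x - mean) / sqrt(var). The running-sum approach shown here is exact and is only one possible chunked scheme; the library describes its own method as an approximation, which may differ. All names and numbers are illustrative.

```python
import math


def accumulate_stats(chunks, dim):
    """Compute per-dimension mean and variance from chunks of frames."""
    count, total, total_sq = 0, [0.0] * dim, [0.0] * dim
    for chunk in chunks:
        for frame in chunk:
            count += 1
            for d, value in enumerate(frame):
                total[d] += value
                total_sq[d] += value * value
    mean = [s / count for s in total]
    # Population variance: E[x^2] - E[x]^2.
    var = [sq / count - m * m for sq, m in zip(total_sq, mean)]
    return mean, var


def apply_cmvn(frames, mean, var, reverse=False):
    """Normalize frames, or map normalized frames back when reverse is True."""
    std = [math.sqrt(v) for v in var]
    if reverse:
        return [[x * s + m for x, s, m in zip(f, std, mean)] for f in frames]
    return [[(x - m) / s for x, s, m in zip(f, std, mean)] for f in frames]


# Two chunks of 2-dimensional frames (made-up numbers).
chunks = [[[1.0, 10.0], [3.0, 30.0]], [[5.0, 50.0]]]
mean, var = accumulate_stats(chunks, dim=2)
frames = [frame for chunk in chunks for frame in chunk]
normalized = apply_cmvn(frames, mean, var)
restored = apply_cmvn(normalized, mean, var, reverse=True)
```

The reverse path is what lets a synthesis model map predicted, normalized features back to the original scale.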
- class athena.data.FS2FeatureNormalizer(cmvn_file=None)¶
Bases:
FeatureNormalizer
Fastspeech2 Feature Normalizer
- __call__(feat_data, speaker, feature_type='mel', reverse=False)¶
- compute_fs2_cmvn(entries, speakers, num_cmvn_workers=1)¶
compute cmvn of mel-spec, f0 and energy
- apply_cmvn(feat_data, speaker, feature_type='mel', reverse=False)¶
transform original feature to normalized feature
- load_cmvn()¶
load mel_mean, mel_var, f0_mean, f0_var and energy_mean, energy_var
- class athena.data.TextFeaturizer(config=None)¶
The main text featurizer interface
- property model_type¶
@property
- Returns
the model type
- property unk_index¶
@property
- Returns
the unk index
- Return type
int
- supported_model¶
- default_config¶
- load_model(model_file)¶
load model
- delete_punct(tokens)¶
delete punctuation tokens
- __len__()¶
- encode(texts)¶
convert a sentence to a list of ids, with special tokens added.
- decode(sequences)¶
convert a list of ids to a sentence
- decode_to_list(sequences, ignored_id=[])¶
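The encode, decode, and decode_to_list methods above follow a familiar vocabulary-lookup pattern. The toy class below illustrates that pattern at character level. It is a hypothetical stand-in, not athena's TextFeaturizer: it adds no special tokens during encoding, and its vocabulary is invented for the example.

```python
class ToyVocab:
    """Hypothetical character-level vocabulary, not athena's TextFeaturizer."""

    def __init__(self, symbols, unk="<unk>"):
        # Index 0 is reserved for the unknown symbol.
        self.itos = [unk] + list(symbols)
        self.stoi = {symbol: index for index, symbol in enumerate(self.itos)}
        self.unk_index = self.stoi[unk]

    def __len__(self):
        return len(self.itos)

    def encode(self, text):
        """Map each character to its id, falling back to unk_index."""
        return [self.stoi.get(ch, self.unk_index) for ch in text]

    def decode_to_list(self, ids, ignored_id=()):
        """Map ids back to symbols, skipping any id listed in ignored_id."""
        return [self.itos[i] for i in ids if i not in ignored_id]

    def decode(self, ids):
        """Join the decoded symbols into one string."""
        return "".join(self.decode_to_list(ids))


vocab = ToyVocab("abc ")
ids = vocab.encode("a cab!")
symbols = vocab.decode_to_list(ids, ignored_id=[vocab.unk_index])
```

The sketch uses an immutable tuple as the default for ignored_id; the documented signatures use a list default, which is safe as long as the method never mutates it.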
- class athena.data.SentencePieceFeaturizer(spm_file)¶
SentencePieceFeaturizer using tensorflow-text api
- load_model(model_file)¶
load sentence piece model
- __len__()¶
- encode(sentence)¶
convert a sentence to a list of ids by sentence piece model
- decode(ids)¶
convert a list of ids to a sentence
- decode_to_list(ids, ignored_id=[])¶
- class athena.data.TextTokenizer(text=None)¶
TextTokenizer
- load_model(text)¶
load model
- save_vocab(vocab_file)¶
- load_csv(csv_file)¶
- __len__()¶
- encode(texts)¶
convert a sentence to a list of ids, with special tokens added.
- decode(sequences)¶
convert a list of ids to a sentence
- decode_to_list(ids, ignored_id=[])¶