athena.data.datasets.vad.vad_set_kaldiio
audio dataset
Module Contents
Classes
VoiceActivityDetectionDatasetKaldiIOBuilder
- class athena.data.datasets.vad.vad_set_kaldiio.VoiceActivityDetectionDatasetKaldiIOBuilder(config=None)
Bases:
athena.data.datasets.base.SpeechBaseDatasetBuilder
VoiceActivityDetectionDatasetKaldiIOBuilder
- property sample_type
@property
- Returns
sample_type of the dataset:
{ "input": tf.float32, "input_length": tf.int32, "output_length": tf.int32, "output": tf.int32, }
- Return type
dict
- property sample_shape
@property
- Returns
sample_shape of the dataset:
{ "input": tf.TensorShape([None, dim, nc]), "input_length": tf.TensorShape([]), "output_length": tf.TensorShape([]), "output": tf.TensorShape([None]), "utt": tf.TensorShape([]), }
- Return type
dict
- property sample_signature
@property
- Returns
sample_signature of the dataset:
{ "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32), "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32), "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32), "utt": tf.TensorSpec(shape=(None), dtype=tf.string), }
- Return type
dict
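The shapes listed for sample_signature look like the sample_shape entries with an extra leading batch dimension. The following is a minimal plain-Python sketch of that relationship, not the library's code: shapes are modelled as tuples, `add_batch_dim` is a hypothetical helper, and the values chosen for `dim` and `nc` are assumptions made only for illustration.

```python
# Hypothetical values for illustration only; the real `dim` and `nc`
# depend on the dataset's feature configuration.
dim, nc = 40, 1

# Per-sample shapes as documented for sample_shape (None = variable length).
sample_shape = {
    "input": (None, dim, nc),
    "input_length": (),
    "output_length": (),
    "output": (None,),
    "utt": (),
}


def add_batch_dim(shapes):
    """Prepend an unknown (None) batch dimension to every per-sample shape."""
    return {key: (None, *shape) for key, shape in shapes.items()}


batched_shape = add_batch_dim(sample_shape)
for key, shape in batched_shape.items():
    print(f"{key}: {shape}")
```

One caveat when comparing against the documented signature: `shape=(None)` is plain `None` in Python rather than a one-element tuple, and TensorFlow interprets a `None` shape as unknown, so it is looser than the rank-1 shape `(None,)` produced by this sketch.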
- default_config
- preprocess_data(data_scps_dir)
Generate a list of tuples (wav_filename, wav_offset, wav_length_ms, transcript, label).
- splice_feature(feature)
Splice features according to input_left_context and input_right_context.
- input_left_context: the left-context features to be spliced; the first frame is repeated when the context falls out of range.
- input_right_context: the right-context features to be spliced; the last frame is repeated when the context falls out of range.
- Parameters
feature – the input features, shape may be [timestamp, dim, 1]
- Returns
splice_feat, the spliced features
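The edge handling described for splice_feature can be illustrated with a small plain-Python sketch. It does not reproduce the library's implementation: `splice_frames` is a hypothetical helper, frames are modelled as lists of floats rather than tensors, and it assumes that the context frames are concatenated along the feature dimension.

```python
from typing import List

Frame = List[float]


def splice_frames(frames: List[Frame], left: int, right: int) -> List[Frame]:
    """Concatenate each frame with `left` preceding and `right` following frames.

    Indices that fall before the first frame or after the last frame are
    clamped, which repeats the first or last frame respectively.
    """
    if not frames:
        return []
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window: Frame = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), last)  # clamp to the valid range
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


feats = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]
spliced = splice_frames(feats, left=1, right=1)
print(spliced[0])  # first frame repeated on the left
print(spliced[2])  # last frame repeated on the right
```

With one frame of context on each side, every output frame is three times as wide as an input frame, while the number of frames stays the same.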
- __getitem__(index)
Get a sample.
- Parameters
index (int) – index of the entries
- Returns
sample:
{ "input": feat, "input_length": feat_length, "output_length": label_length, "output": label, "utt": utt }
- Return type
dict
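As a rough illustration of the sample layout, the sketch below assembles a dictionary with the documented keys from plain Python lists. `make_sample` is a hypothetical helper, not part of the library, and it assumes that the two length fields are simply the number of feature frames and the number of labels. The 0/1 label values are likewise only an assumed encoding.

```python
def make_sample(feat, label, utt):
    """Assemble a sample dict with the documented keys.

    Assumption: input_length is the number of frames in `feat` and
    output_length is the number of entries in `label`.
    """
    return {
        "input": feat,
        "input_length": len(feat),
        "output_length": len(label),
        "output": label,
        "utt": utt,
    }


feat = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, 2 dimensions each
label = [0, 1, 1]                            # assumed per-frame labels
sample = make_sample(feat, label, "utt_0001")
print(sorted(sample))
```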