athena.data.datasets.asr.audio_video_recognition_batch_bins¶
audio dataset
Module Contents¶
Classes¶
AudioVedioRecognitionDatasetBatchBinsBuilder | SpeechRecognitionDatasetBatchBinsBuilder
Functions¶
data_loader | data loader
video_scp_loader | load video list from scp file
image_normalizer | Image Normalization
- athena.data.datasets.asr.audio_video_recognition_batch_bins.data_loader(dataset_builder, batch_size=1, num_threads=1)¶
data loader
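A minimal, single-threaded sketch of what a function with this signature typically does, assuming `dataset_builder` supports `__len__` and `__getitem__` (the real Athena function builds a tf.data pipeline and uses `num_threads` for parallelism; this sketch accepts but ignores it):

```python
def data_loader(dataset_builder, batch_size=1, num_threads=1):
    """Yield fixed-size batches of samples from a dataset builder.

    Simplified sketch: dataset_builder only needs __len__ and
    __getitem__; num_threads is accepted but unused here.
    """
    batch = []
    for i in range(len(dataset_builder)):
        batch.append(dataset_builder[i])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```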
- athena.data.datasets.asr.audio_video_recognition_batch_bins.video_scp_loader(scp_dir)¶
load a video list from an scp file and return a dict
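A sketch of such a loader, assuming a Kaldi-style scp file with one whitespace-separated `utt_id path` pair per line (the exact scp format used by Athena is an assumption here):

```python
def video_scp_loader(scp_path):
    """Parse a Kaldi-style scp file into {utt_id: video_path}.

    Assumes one "utt_id path" pair per line; the real Athena
    loader may differ in details.
    """
    video_dict = {}
    with open(scp_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            utt_id, path = line.split(maxsplit=1)
            video_dict[utt_id] = path
    return video_dict
```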
- athena.data.datasets.asr.audio_video_recognition_batch_bins.image_normalizer(image)¶
Image Normalization http://dev.ipol.im/~nmonzon/Normalization.pdf
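The referenced note covers mean/variance image normalization. A sketch of that common scheme on a flat sequence of pixel values, assuming zero-mean, unit-variance normalization (the exact transform Athena applies may differ):

```python
def image_normalizer(image):
    """Normalize pixel values to zero mean and unit variance.

    Sketch of one common normalization scheme; `image` is any
    flat sequence of numeric pixel values.
    """
    pixels = [float(p) for p in image]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = var ** 0.5 or 1.0  # guard against constant images
    return [(p - mean) / std for p in pixels]
```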
- class athena.data.datasets.asr.audio_video_recognition_batch_bins.AudioVedioRecognitionDatasetBatchBinsBuilder(config=None)¶
Bases:
athena.data.datasets.asr.speech_recognition.SpeechRecognitionDatasetBuilder
SpeechRecognitionDatasetBatchBinsBuilder
- property sample_shape_batch_bins¶
@property
- Returns
sample_shape of the dataset:
{
    "input": tf.TensorShape([None, None, dim, nc]),
    "video": tf.TensorShape([None, None, high, wide]),
    "input_length": tf.TensorShape([None]),
    "output_length": tf.TensorShape([None]),
    "output": tf.TensorShape([None, None]),
    "utt_id": tf.TensorShape([None]),
}
- Return type
dict
- property sample_shape¶
@property
- Returns
sample_shape of the dataset:
{
    "input": tf.TensorShape([None, dim, nc]),
    "video": tf.TensorShape([None, None, None]),
    "input_length": tf.TensorShape([]),
    "output_length": tf.TensorShape([]),
    "output": tf.TensorShape([None]),
    "utt_id": tf.TensorShape([]),
}
- Return type
dict
- property sample_type¶
@property
- Returns
sample_type of the dataset:
{
    "input": tf.float32,
    "input_length": tf.int32,
    "output_length": tf.int32,
    "output": tf.int32,
    "utt_id": tf.string,
}
- Return type
dict
- property sample_signature¶
@property
- Returns
sample_signature of the dataset:
{
    "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "video": tf.TensorSpec(shape=(None, None, None, None), dtype=tf.float32),
    "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    "utt_id": tf.TensorSpec(shape=(None), dtype=tf.string),
}
- Return type
dict
- default_config¶
- preprocess_data(file_path)¶
generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker)
- __getitem__(index)¶
get a sample
- Parameters
index (int) – index of the entries
- Returns
sample:
{
    "input": feat,
    "input_length": feat_length,
    "output_length": label_length,
    "output": label,
    "utt_id": utt_id,
}
- Return type
dict
- __len__()¶
- as_dataset(batch_size=16, num_threads=1)¶
return a tf.data.Dataset object
- shard(num_shards, index)¶
creates a Dataset that includes only 1/num_shards of this dataset
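The sharding semantics can be sketched in plain Python, assuming the same interleaved assignment as `tf.data.Dataset.shard` (element i goes to shard i % num_shards, so each shard holds roughly 1/num_shards of the data); `samples` standing in for the builder's entries is an assumption:

```python
def shard(samples, num_shards, index):
    """Return the index-th of num_shards interleaved shards.

    Mirrors tf.data.Dataset.shard: element i belongs to
    shard i % num_shards.
    """
    return [s for i, s in enumerate(samples) if i % num_shards == index]
```

With three workers, each passes its own `index` so the shards partition the dataset without overlap.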
- batch_wise_shuffle(batch_size=1, epoch=-1, seed=917)¶
Batch-wise shuffling of the data entries.
- Parameters
batch_size (int, optional) – an integer for the batch size. Defaults to 1 (in batch_bins mode).
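The behavior described above can be sketched as follows, assuming entries are grouped into consecutive batches whose order is then shuffled deterministically from (seed, epoch) so every worker sees the same permutation; the real implementation's details may differ:

```python
import random

def batch_wise_shuffle(entries, batch_size=1, epoch=-1, seed=917):
    """Shuffle data entries batch by batch.

    Groups entries into consecutive batches of batch_size,
    shuffles the batch order with an RNG seeded from
    (seed, epoch), and returns the flattened list.
    """
    batches = [entries[i:i + batch_size]
               for i in range(0, len(entries), batch_size)]
    random.Random(seed + epoch).shuffle(batches)
    return [e for batch in batches for e in batch]
```

Because only whole batches move, entries within a batch stay adjacent, which preserves the length-sorted grouping that batch_bins mode relies on.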