athena.data.datasets.asr.audio_video_recognition_batch_bins

audio-video dataset

Module Contents

Classes

AudioVedioRecognitionDatasetBatchBinsBuilder

SpeechRecognitionDatasetBatchBinsBuilder

Functions

data_loader(dataset_builder[, batch_size, num_threads])

data loader

video_scp_loader(scp_dir)

load video list from scp file

image_normalizer(image)

Image Normalization

athena.data.datasets.asr.audio_video_recognition_batch_bins.data_loader(dataset_builder, batch_size=1, num_threads=1)

data loader

athena.data.datasets.asr.audio_video_recognition_batch_bins.video_scp_loader(scp_dir)

load a video list from an scp file and return a dict
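The entry above does not show the scp layout; a minimal sketch of such a loader, assuming the common Kaldi-style `utt_id path` line format (the actual Athena implementation may differ):

```python
def video_scp_loader(scp_dir):
    """Load a video list from an scp file into a dict.

    Assumes each line is "utt_id path" (Kaldi-style scp); returns a
    dict mapping utt_id -> video path. This is a sketch of the
    documented behaviour, not the actual Athena code.
    """
    video_dict = {}
    with open(scp_dir, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) == 2:          # skip blank / malformed lines
                utt_id, path = parts
                video_dict[utt_id] = path
    return video_dict
```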

athena.data.datasets.asr.audio_video_recognition_batch_bins.image_normalizer(image)

Image Normalization http://dev.ipol.im/~nmonzon/Normalization.pdf
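The linked IPOL note describes affine (min-max) normalization of image intensities; a hedged NumPy sketch of that scheme (the actual Athena `image_normalizer` may use a different variant, e.g. mean/variance normalization):

```python
import numpy as np

def image_normalizer(image):
    """Affine min-max normalization of an image to [0, 1].

    A sketch of the scheme in the IPOL normalization note: rescale
    intensities so the minimum maps to 0 and the maximum to 1.
    """
    image = np.asarray(image, dtype=np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:                 # constant image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)
```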

class athena.data.datasets.asr.audio_video_recognition_batch_bins.AudioVedioRecognitionDatasetBatchBinsBuilder(config=None)

Bases: athena.data.datasets.asr.speech_recognition.SpeechRecognitionDatasetBuilder

Audio-video speech recognition dataset builder with batch-bins batching.

property sample_shape_batch_bins

@property

Returns

sample_shape of the dataset:

{
    "input": tf.TensorShape([None, None, dim, nc]),
    "video": tf.TensorShape([None, None, high, wide]),
    "input_length": tf.TensorShape([None]),
    "output_length": tf.TensorShape([None]),
    "output": tf.TensorShape([None, None]),
    "utt_id": tf.TensorShape([None]),
}

Return type

dict

property sample_shape

@property

Returns

sample_shape of the dataset:

{
    "input": tf.TensorShape([None, dim, nc]),
    "video": tf.TensorShape([None, None, None]),
    "input_length": tf.TensorShape([]),
    "output_length": tf.TensorShape([]),
    "output": tf.TensorShape([None]),
    "utt_id": tf.TensorShape([]),
}

Return type

dict

property sample_type

@property

Returns

sample_type of the dataset:

{
    "input": tf.float32,
    "input_length": tf.int32,
    "output_length": tf.int32,
    "output": tf.int32,
    "utt_id": tf.string,
}

Return type

dict

property sample_signature

@property

Returns

sample_signature of the dataset:

{
    "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "video": tf.TensorSpec(shape=(None, None, None, None), dtype=tf.float32),
    "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
    "utt_id": tf.TensorSpec(shape=(None), dtype=tf.string),
}

Return type

dict

default_config
preprocess_data(file_path)

generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker).
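As a sketch, such a preprocessing step could parse a tab-separated transcript file with a header row into those tuples; the delimiter and column layout here are assumptions, not confirmed from the Athena source:

```python
import csv

def preprocess_data(file_path):
    """Generate a list of tuples
    (wav_filename, wav_length_ms, transcript, speaker).

    Assumes a tab-delimited file whose first row is a header; the
    real Athena implementation may expect a different layout.
    """
    entries = []
    with open(file_path, "r", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)                     # skip the header row
        for row in reader:
            if not row:
                continue
            wav_filename, wav_length_ms, transcript, speaker = row[:4]
            entries.append((wav_filename, int(wav_length_ms),
                            transcript, speaker))
    return entries
```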

__getitem__(index)

get a sample

Parameters

index (int) – index of the entries

Returns

sample:

{
    "input": feat,
    "input_length": feat_length,
    "output_length": label_length,
    "output": label,
    "utt_id": utt_id
}

Return type

dict

__len__()
as_dataset(batch_size=16, num_threads=1)

return tf.data.Dataset object

shard(num_shards, index)

creates a Dataset that includes only 1/num_shards of this dataset
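The documented behaviour matches the round-robin policy of `tf.data.Dataset.shard`; over a plain Python list of entries it can be sketched as:

```python
def shard(entries, num_shards, index):
    """Keep every num_shards-th entry starting at position index,
    mirroring tf.data.Dataset.shard's round-robin selection.

    A sketch of the documented 1/num_shards behaviour, not the
    actual Athena implementation.
    """
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    return entries[index::num_shards]
```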

batch_wise_shuffle(batch_size=1, epoch=-1, seed=917)

Batch-wise shuffling of the data entries.

Parameters

  • batch_size (int, optional) – an integer for the batch size. Defaults to 1 (in batch_bins mode).
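Batch-wise shuffling randomizes the order of whole batches while keeping the (typically length-sorted) entries inside each batch together; a minimal sketch over a Python list, assuming `seed + epoch` seeds the permutation so it is reproducible per epoch (an assumption, not confirmed from the source):

```python
import random

def batch_wise_shuffle(entries, batch_size=1, epoch=-1, seed=917):
    """Shuffle entries in whole batches.

    Split the list into chunks of batch_size, permute the chunks
    with a seeded RNG, and flatten. A sketch of the documented
    behaviour, not the actual Athena implementation.
    """
    batches = [entries[i:i + batch_size]
               for i in range(0, len(entries), batch_size)]
    rng = random.Random(seed + epoch)    # assumed seeding scheme
    rng.shuffle(batches)
    return [item for batch in batches for item in batch]
```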