athena.data.datasets.asr.audio_video_recognition

audio dataset

Module Contents

Classes

AudioVedioRecognitionDatasetBuilder

SpeechRecognitionDatasetBuilder

class athena.data.datasets.asr.audio_video_recognition.AudioVedioRecognitionDatasetBuilder(config=None)

Bases: athena.data.datasets.base.SpeechBaseDatasetBuilder

SpeechRecognitionDatasetBuilder

property num_class

Return the maximum index of the vocabulary plus 1.
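
As a minimal illustration of this rule, assuming a hypothetical token-to-index mapping named `vocab` (a stand-in, not part of the Athena API), the class count could be derived like this:

```python
# Hypothetical vocabulary mapping tokens to integer indices (illustrative only).
vocab = {"<unk>": 0, "a": 1, "b": 2, "c": 3}


def num_class_from_vocab(vocab):
    """Return the largest index in the vocabulary plus one."""
    return max(vocab.values()) + 1


num_classes = num_class_from_vocab(vocab)
```

Adding one accounts for zero-based indexing, so every index in the vocabulary is a valid class id.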

property sample_type

Returns

sample_type of the dataset:

{
    "input": tf.float32,
    "input_length": tf.int32,
    "output_length": tf.int32,
    "output": tf.int32,
}

Return type

dict

property sample_shape

Returns

sample_shape of the dataset:

{
    "input": tf.TensorShape([None, dim, nc]),
    "input_length": tf.TensorShape([]),
    "output_length": tf.TensorShape([]),
    "output": tf.TensorShape([None]),
}

Return type

dict

property sample_signature

Returns

sample_signature of the dataset:

{
    "input": tf.TensorSpec(shape=(None, None, dim, nc), dtype=tf.float32),
    "input_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output_length": tf.TensorSpec(shape=(None), dtype=tf.int32),
    "output": tf.TensorSpec(shape=(None, None), dtype=tf.int32),
}

Return type

dict
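
Compared with sample_shape, the "input" and "output" entries of the signature carry one extra leading dimension, which is consistent with a batch axis over variable-length samples. The sketch below shows, in plain Python, how padding variable-length label sequences yields a rectangular (batch, max_length) layout. The helper `pad_batch` and the padding value 0 are assumptions for illustration, not Athena functions.

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad each sequence to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    # Keep the original lengths so padding can be masked out later.
    lengths = [len(seq) for seq in sequences]
    return padded, lengths


# Three label sequences of different lengths.
outputs = [[5, 2, 7], [4], [9, 1]]
padded, lengths = pad_batch(outputs)
```

The returned `lengths` play the same role as "output_length" above: they record how much of each padded row is real data.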

default_config

video_scp_loader(scp_dir)

Load the video list from an scp file and return it as a dict.
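
The scp layout is not specified here. A rough sketch, assuming the common Kaldi-style convention of one `key value` pair per line, could look like the following. The function `load_scp`, the file name, and the paths are hypothetical and do not reproduce the Athena implementation.

```python
import os
import tempfile


def load_scp(scp_path):
    """Parse 'key value' lines into a dict; blank lines are skipped."""
    entries = {}
    with open(scp_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the first run of whitespace only, so values may contain spaces.
            # A line with a key but no value raises ValueError.
            key, value = line.split(maxsplit=1)
            entries[key] = value
    return entries


# Usage with a throwaway scp file (paths are made up for the example).
with tempfile.TemporaryDirectory() as tmp_dir:
    scp_path = os.path.join(tmp_dir, "video.scp")
    with open(scp_path, "w", encoding="utf-8") as f:
        f.write("utt1 /data/video/utt1.mp4\nutt2 /data/video/utt2.mp4\n")
    video_dict = load_scp(scp_path)
```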

image_normalizer(image)

preprocess_data(file_path)

Generate a list of tuples (wav_filename, wav_length_ms, transcript, speaker).
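
To make the tuple layout concrete, here is a small sketch that builds such a list from a tab-separated manifest. The file format, the header names, and the helper `read_entries` are assumptions chosen to mirror the tuple fields, not a description of what preprocess_data actually reads.

```python
import csv
import io

# Hypothetical tab-separated manifest held in memory for the example.
manifest = io.StringIO(
    "wav_filename\twav_length_ms\ttranscript\tspeaker\n"
    "a.wav\t1200\thello world\tspk1\n"
    "b.wav\t800\tgood morning\tspk2\n"
)


def read_entries(handle):
    """Return a list of (wav_filename, wav_length_ms, transcript, speaker) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["wav_filename"], int(row["wav_length_ms"]), row["transcript"], row["speaker"])
        for row in reader
    ]


entries = read_entries(manifest)
```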

storage_features_offline()

__getitem__(index)

Get a sample.

Parameters

index (int) – index of the entry to fetch

Returns

sample:

{
    "input": feat,
    "input_length": feat_length,
    "output_length": label_length,
    "output": label,
}

Return type

dict
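
The structure of the returned sample can be sketched as follows. How Athena computes `feat` and `label` is not shown in this reference, so the helper `make_sample` and the toy values below are purely illustrative, with the lengths assumed to be measured along the first axis.

```python
def make_sample(feat, label):
    """Assemble a sample dict; lengths are taken along the first axis."""
    return {
        "input": feat,
        "input_length": len(feat),
        "output_length": len(label),
        "output": label,
    }


# Three frames of two-dimensional features and a four-token label sequence.
feat = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
label = [5, 2, 7, 1]
sample = make_sample(feat, label)
```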