athena.loss

Loss functions for Athena's speech tasks: ASR, TTS, and speaker recognition.

Module Contents

Classes

CTCLoss

CTC loss, implemented with TensorFlow

Seq2SeqSparseCategoricalCrossentropy

Sparse categorical cross-entropy loss for sequence-to-sequence models

MPCLoss

MPC loss: L1 loss on masked acoustic features

Tacotron2Loss

Tacotron2 loss

GuidedAttentionLoss

Guided attention loss to make attention alignments more monotonic

GuidedMultiHeadAttentionLoss

Guided attention loss applied to multi-head attention

FastSpeechLoss

Loss used for training FastSpeech

FastSpeech2Loss

Loss used for training FastSpeech2

SoftmaxLoss

Softmax loss

AMSoftmaxLoss

Additive margin softmax loss

AAMSoftmaxLoss

Additive angular margin softmax loss

ProtoLoss

Prototypical loss

AngleProtoLoss

Angular prototypical loss

GE2ELoss

Generalized end-to-end loss

class athena.loss.CTCLoss(logits_time_major=False, blank_index=-1, name='CTCLoss')

Bases: tensorflow.keras.losses.Loss

CTC loss, implemented with TensorFlow.

__call__(logits, samples, logit_length=None)
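
A minimal usage sketch with tf.nn.ctc_loss, assuming samples carries the labels and their lengths (the tensor names below are illustrative, not Athena's internal keys):

    import tensorflow as tf

    def ctc_loss_sketch(logits, labels, logit_length, label_length, blank_index=-1):
        # logits: [batch, frames, num_classes]; labels: [batch, max_label_len]
        loss = tf.nn.ctc_loss(
            labels=labels,
            logits=logits,
            label_length=label_length,
            logit_length=logit_length,
            logits_time_major=False,  # matches logits_time_major=False above
            blank_index=blank_index,
        )
        return tf.reduce_mean(loss)

    # batch=2, 50 frames, 30 classes (the last one is the blank)
    logits = tf.random.normal([2, 50, 30])
    labels = tf.random.uniform([2, 10], maxval=29, dtype=tf.int32)
    loss = ctc_loss_sketch(logits, labels, tf.constant([50, 50]), tf.constant([10, 10]))
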
class athena.loss.Seq2SeqSparseCategoricalCrossentropy(num_classes, eos=-1, by_token=False, by_sequence=True, from_logits=True, label_smoothing=0.0)

Bases: tensorflow.keras.losses.CategoricalCrossentropy

Sparse categorical cross-entropy for sequence-to-sequence models: categorical cross-entropy computed at each character of each sequence in a batch, with optional label smoothing.

__call__(logits, samples, logit_length=None)
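
A sketch of the underlying computation, assuming one-hot targets with label smoothing and a padding id of 0 (both assumptions, not Athena's internals):

    import tensorflow as tf

    def seq2seq_ce_sketch(logits, labels, num_classes, label_smoothing=0.1):
        # one-hot targets; smoothing spreads label_smoothing mass over all classes
        onehot = tf.one_hot(labels, num_classes)
        loss = tf.keras.losses.categorical_crossentropy(
            onehot, logits, from_logits=True, label_smoothing=label_smoothing)
        # average over non-padding positions (pad id 0 is an assumption)
        mask = tf.cast(tf.not_equal(labels, 0), loss.dtype)
        return tf.reduce_sum(loss * mask) / tf.reduce_sum(mask)
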
class athena.loss.MPCLoss(name='MPCLoss')

Bases: tensorflow.keras.losses.Loss

Masked predictive coding (MPC) loss: an L1 loss over the masked acoustic features in a batch.

__call__(logits, samples, logit_length=None)
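
A sketch of an L1 loss restricted to masked frames; the binary mask marking which frames were masked is an assumed input:

    import tensorflow as tf

    def mpc_l1_sketch(predictions, targets, mask):
        # predictions, targets: [batch, steps, feat_dim]
        # mask: [batch, steps], 1.0 where the input frame was masked
        l1 = tf.abs(predictions - targets) * tf.expand_dims(mask, -1)
        return tf.reduce_sum(l1) / (tf.reduce_sum(mask) + 1e-8)
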
class athena.loss.Tacotron2Loss(model, guided_attn_loss_function, regularization_weight=0.0, l1_loss_weight=0.0, mask_decoder=False, pos_weight=1.0, name='Tacotron2Loss')

Bases: tensorflow.keras.losses.Loss

Tacotron2 Loss

__call__(outputs, samples, logit_length=None)
Parameters

outputs – model outputs, containing att_ws_stack (stacked attention weights), shape: [batch, y_steps, x_steps]
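
A rough sketch of a typical Tacotron2 objective combining mel reconstruction with a positively weighted stop-token term; the output names here are assumptions, and the guided attention and regularization terms configured above would be added on top:

    import tensorflow as tf

    def tacotron2_loss_sketch(before_outs, after_outs, stop_logits,
                              mel_target, stop_target,
                              l1_loss_weight=0.0, pos_weight=1.0):
        # mel reconstruction before and after the postnet
        mse = (tf.reduce_mean(tf.square(before_outs - mel_target)) +
               tf.reduce_mean(tf.square(after_outs - mel_target)))
        l1 = (tf.reduce_mean(tf.abs(before_outs - mel_target)) +
              tf.reduce_mean(tf.abs(after_outs - mel_target)))
        # stop-token loss; pos_weight offsets the rarity of the stop frame
        stop = tf.reduce_mean(tf.nn.weighted_cross_entropy_with_logits(
            labels=stop_target, logits=stop_logits, pos_weight=pos_weight))
        return mse + l1_loss_weight * l1 + stop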

class athena.loss.GuidedAttentionLoss(guided_attn_weight, reduction_factor, attn_sigma=0.4, name='GuidedAttentionLoss')

Bases: tensorflow.keras.losses.Loss

Guided attention loss to make attention alignments more monotonic.

__call__(att_ws_stack, samples)
_create_attention_masks(input_length, output_length)

Create masks based on attention locations.

Parameters
  • input_length – shape: [batch_size]

  • output_length – shape: [batch_size]

Returns

shape: [batch_size, 1, y_steps, x_steps]

Return type

masks

_create_length_masks(input_length, output_length)

Create masks based on input and output lengths.

Parameters
  • input_length – shape: [batch_size]

  • output_length – shape: [batch_size]

Returns

shape: [batch_size, 1, output_length, input_length]

Return type

masks

Examples

output_length: [6, 8], input_length: [3, 5]

masks:

[[[1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]],

 [[1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1]]]
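
For reference, guided attention losses of this kind penalize attention mass far from the diagonal via a weight matrix w[t_y, t_x] = 1 - exp(-(t_x/T_x - t_y/T_y)^2 / (2 * sigma^2)). A sketch using the attn_sigma parameter above (a standard formulation; the exact implementation here may differ):

    import tensorflow as tf

    def guided_attention_weights(input_length, output_length, sigma=0.4):
        # w[t_y, t_x] = 1 - exp(-(t_x/T_x - t_y/T_y)^2 / (2 * sigma^2))
        grid_y, grid_x = tf.meshgrid(tf.range(output_length),
                                     tf.range(input_length), indexing="ij")
        grid_y = tf.cast(grid_y, tf.float32) / tf.cast(output_length, tf.float32)
        grid_x = tf.cast(grid_x, tf.float32) / tf.cast(input_length, tf.float32)
        return 1.0 - tf.exp(-tf.square(grid_x - grid_y) / (2.0 * sigma ** 2))

    # the loss is then roughly the mean of att_ws * weights over valid positions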

class athena.loss.GuidedMultiHeadAttentionLoss(guided_attn_weight, reduction_factor, attn_sigma=0.4, num_heads=2, num_layers=2, name='GuidedMultiHeadAttentionLoss')

Bases: GuidedAttentionLoss

Guided attention loss applied to multi-head attention, across attention heads and layers.

__call__(att_ws_stack, samples)
class athena.loss.FastSpeechLoss(duration_predictor_loss_weight, eps=1.0, use_mask=True, teacher_guide=False)

Bases: tensorflow.keras.losses.Loss

Loss used for training FastSpeech.

__call__(outputs, samples)

The duration targets are converted to the log domain, which makes their distribution closer to Gaussian.

Parameters
  • outputs – contains five elements:

    before_outs: outputs before postnet, shape: [batch, y_steps, feat_dim]
    teacher_outs: teacher outputs, shape: [batch, y_steps, feat_dim]
    after_outs: outputs after postnet, shape: [batch, y_steps, feat_dim]
    duration_sequences: duration predictions from the teacher model, shape: [batch, x_steps]
    pred_duration_sequences: duration predictions from the trained predictor, shape: [batch, x_steps]

  • samples – samples from the dataset
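
A sketch of the duration term under these conventions: the integer duration targets are moved to the log domain (with the eps offset above) before a mean squared error; names are illustrative:

    import tensorflow as tf

    def duration_loss_sketch(pred_duration_sequences, duration_sequences, eps=1.0):
        # predictor outputs are treated as log-durations; targets are moved
        # to the same log domain before the MSE
        log_target = tf.math.log(tf.cast(duration_sequences, tf.float32) + eps)
        return tf.reduce_mean(tf.square(pred_duration_sequences - log_target))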

class athena.loss.FastSpeech2Loss(variant_predictor_loss_weight, eps=1.0, use_mask=True)

Bases: tensorflow.keras.losses.Loss

Loss used for training FastSpeech2.

__call__(outputs, samples)
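
FastSpeech2 adds pitch and energy (variance) predictors alongside duration. A hedged sketch of how the weighted variant-predictor terms could combine with a mel loss; all tensor names are hypothetical:

    import tensorflow as tf

    def fastspeech2_loss_sketch(mel_pred, mel_target,
                                dur_pred, dur_target,
                                pitch_pred, pitch_target,
                                energy_pred, energy_target,
                                variant_predictor_loss_weight=1.0, eps=1.0):
        mel = tf.reduce_mean(tf.abs(mel_pred - mel_target))
        dur = tf.reduce_mean(tf.square(
            dur_pred - tf.math.log(tf.cast(dur_target, tf.float32) + eps)))
        pitch = tf.reduce_mean(tf.square(pitch_pred - pitch_target))
        energy = tf.reduce_mean(tf.square(energy_pred - energy_target))
        return mel + variant_predictor_loss_weight * (dur + pitch + energy)
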
class athena.loss.SoftmaxLoss(embedding_size, num_classes, name='SoftmaxLoss')

Bases: tensorflow.keras.losses.Loss

Softmax loss. Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(outputs, samples, logit_length=None)
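
A sketch of a plain softmax classification head over speaker embeddings: a dense projection from embedding_size to num_classes followed by sparse cross-entropy (the layer here is illustrative):

    import tensorflow as tf

    embedding_size, num_classes = 512, 1000
    projection = tf.keras.layers.Dense(num_classes)  # trainable class weights

    def softmax_loss_sketch(embeddings, speaker_labels):
        logits = projection(embeddings)  # [batch, num_classes]
        return tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
            speaker_labels, logits, from_logits=True))
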
class athena.loss.AMSoftmaxLoss(embedding_size, num_classes, m=0.3, s=15, name='AMSoftmaxLoss')

Bases: tensorflow.keras.losses.Loss

Additive margin softmax loss. Reference: “CosFace: Large Margin Cosine Loss for Deep Face Recognition” and “In defence of metric learning for speaker recognition”. Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(outputs, samples, logit_length=None)
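
A sketch of the CosFace-style logits: cosine similarities between L2-normalized embeddings and class weights, with the margin m subtracted from the target class and the result scaled by s (the weight matrix is an assumed trainable variable):

    import tensorflow as tf

    def am_softmax_logits(embeddings, weights, labels, m=0.3, s=15.0):
        # weights: [embedding_size, num_classes], trainable
        emb = tf.nn.l2_normalize(embeddings, axis=1)
        w = tf.nn.l2_normalize(weights, axis=0)
        cos = tf.matmul(emb, w)                       # cosine similarities
        onehot = tf.one_hot(labels, tf.shape(weights)[1])
        return s * (cos - m * onehot)                 # margin on the target class only

    # followed by sparse cross-entropy against the speaker labels
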
class athena.loss.AAMSoftmaxLoss(embedding_size, num_classes, m=0.3, s=15, easy_margin=False, name='AAMSoftmaxLoss')

Bases: tensorflow.keras.losses.Loss

Additive angular margin softmax loss. Reference: “ArcFace: Additive Angular Margin Loss for Deep Face Recognition” and “In defence of metric learning for speaker recognition”. Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(outputs, samples, logit_length=None)
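
The ArcFace variant instead adds the margin to the angle, so the target-class logit becomes cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m). A sketch (easy_margin handling omitted):

    import math
    import tensorflow as tf

    def aam_softmax_logits(embeddings, weights, labels, m=0.3, s=15.0):
        emb = tf.nn.l2_normalize(embeddings, axis=1)
        w = tf.nn.l2_normalize(weights, axis=0)
        cos = tf.matmul(emb, w)
        sin = tf.sqrt(tf.maximum(1.0 - tf.square(cos), 1e-9))
        cos_m = cos * math.cos(m) - sin * math.sin(m)  # cos(theta + m)
        onehot = tf.cast(tf.one_hot(labels, tf.shape(weights)[1]), tf.bool)
        return s * tf.where(onehot, cos_m, cos)
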
class athena.loss.ProtoLoss(name='ProtoLoss')

Bases: tensorflow.keras.losses.Loss

Prototypical loss. Reference: “Prototypical Networks for Few-shot Learning” and “In defence of metric learning for speaker recognition”. Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(outputs, samples=None, logit_length=None)
Parameters

outputs – shape: [batch_size, num_speaker_utts, embedding_size]
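
A sketch of a prototypical loss over such a tensor, assuming one utterance per speaker is held out as the query and the rest average into the prototype (which utterance plays which role is illustrative):

    import tensorflow as tf

    def proto_loss_sketch(outputs):
        # outputs: [num_speakers, num_speaker_utts, embedding_size]
        prototypes = tf.reduce_mean(outputs[:, 1:, :], axis=1)  # support mean
        queries = outputs[:, 0, :]                              # held-out query
        # negative squared Euclidean distance as logits: [query, prototype]
        diff = tf.expand_dims(queries, 1) - tf.expand_dims(prototypes, 0)
        logits = -tf.reduce_sum(tf.square(diff), axis=-1)
        labels = tf.range(tf.shape(outputs)[0])  # matching speaker on the diagonal
        return tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
            labels, logits, from_logits=True))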

class athena.loss.AngleProtoLoss(init_w=10.0, init_b=-5.0, name='AngleProtoLoss')

Bases: tensorflow.keras.losses.Loss

Angular prototypical loss. Reference: “In defence of metric learning for speaker recognition”. Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(outputs, samples=None, logit_length=None)
Parameters

outputs – shape: [batch_size, num_speaker_utts, embedding_size]
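
The angular variant replaces Euclidean distance with cosine similarity, scaled and shifted by parameters initialized from init_w and init_b (learnable in practice); a sketch:

    import tensorflow as tf

    def angle_proto_logits(queries, prototypes, w=10.0, b=-5.0):
        # queries, prototypes: [num_speakers, embedding_size]
        q = tf.nn.l2_normalize(queries, axis=1)
        p = tf.nn.l2_normalize(prototypes, axis=1)
        return w * tf.matmul(q, p, transpose_b=True) + b  # scaled cosine similarity

    # cross-entropy against the diagonal, as in proto_loss_sketch above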

class athena.loss.GE2ELoss(init_w=10.0, init_b=-5.0, name='GE2ELoss')

Bases: tensorflow.keras.losses.Loss

Generalized end-to-end loss. Reference: “Generalized End-to-End Loss for Speaker Verification” and “In defence of metric learning for speaker recognition”. Similar to this implementation: https://github.com/clovaai/voxceleb_trainer

__call__(outputs, samples=None, logit_length=None)
Parameters

outputs – shape: [batch_size, num_speaker_utts, embedding_size]
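
GE2E scores every utterance against every speaker centroid and applies softmax cross-entropy over speakers. A compact sketch; note the full GE2E also excludes each utterance from its own centroid, which is omitted here:

    import tensorflow as tf

    def ge2e_loss_sketch(outputs, w=10.0, b=-5.0):
        # outputs: [num_speakers, num_utts, embedding_size]
        n_spk, n_utt = tf.shape(outputs)[0], tf.shape(outputs)[1]
        emb = tf.nn.l2_normalize(outputs, axis=-1)
        centroids = tf.nn.l2_normalize(tf.reduce_mean(emb, axis=1), axis=-1)
        flat = tf.reshape(emb, [n_spk * n_utt, -1])    # speaker-major order
        logits = w * tf.matmul(flat, centroids, transpose_b=True) + b
        labels = tf.repeat(tf.range(n_spk), n_utt)     # true speaker per utterance
        return tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
            labels, logits, from_logits=True))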