athena.loss

Loss functions used by Athena models.

Module Contents¶

Classes¶
CTCLoss | CTC loss
Seq2SeqSparseCategoricalCrossentropy | Sequence-to-sequence sparse categorical cross-entropy loss
MPCLoss | MPC loss
Tacotron2Loss | Tacotron2 loss
GuidedAttentionLoss | Guided attention loss to make attention alignments more monotonic
GuidedMultiHeadAttentionLoss | Guided attention loss for multi-head attention
FastSpeechLoss | Loss used for training FastSpeech
FastSpeech2Loss | Loss used for training FastSpeech2
SoftmaxLoss | Softmax loss
AMSoftmaxLoss | Additive margin softmax loss
AAMSoftmaxLoss | Additive angular margin softmax loss
ProtoLoss | Prototypical loss
AngleProtoLoss | Angular prototypical loss
GE2ELoss | Generalized end-to-end loss
- class athena.loss.CTCLoss(logits_time_major=False, blank_index=-1, name='CTCLoss')¶
Bases:
tensorflow.keras.losses.Loss
CTC loss implemented with TensorFlow
- __call__(logits, samples, logit_length=None)¶
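The underlying TensorFlow op sums, over all frame-level alignments, the probability of the label sequence once blanks and repeats are collapsed. A minimal NumPy sketch of that forward (alpha) recursion on toy inputs — function name and shapes are illustrative, not the athena implementation:

```python
import numpy as np

def ctc_loss(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under the CTC alignment model.

    log_probs: [T, V] per-frame log-probabilities (toy, batch-free input);
    labels: target label ids, without blanks.
    """
    # Extended label sequence: a blank interleaved around every label.
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), log_probs.shape[0]
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            candidates = [alpha[t - 1, s]]
            if s > 0:
                candidates.append(alpha[t - 1, s - 1])
            # A skip over the blank is allowed between distinct labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(candidates) + log_probs[t, ext[s]]
    # Valid endings: final label or the trailing blank.
    return -np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```

With two frames, a vocabulary of {blank, 1}, and uniform probabilities 0.5, the alignments (1, blank), (blank, 1) and (1, 1) all collapse to "1", so the loss is -log(3 × 0.25) = -log(0.75).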
- class athena.loss.Seq2SeqSparseCategoricalCrossentropy(num_classes, eos=-1, by_token=False, by_sequence=True, from_logits=True, label_smoothing=0.0)¶
Bases:
tensorflow.keras.losses.CategoricalCrossentropy
Sparse categorical cross-entropy computed at each character of each sequence in a batch
- __call__(logits, samples, logit_length=None)¶
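The `label_smoothing` argument follows the tf.keras convention of mixing the one-hot target with a uniform distribution. A NumPy sketch of the per-token computation (toy shapes; the real class additionally handles EOS padding and the `by_token`/`by_sequence` reductions, which are omitted here):

```python
import numpy as np

def smoothed_token_ce(logits, labels, label_smoothing=0.0):
    """Per-token cross-entropy with uniform label smoothing.

    logits: [batch, steps, num_classes]; labels: [batch, steps] integer ids.
    """
    num_classes = logits.shape[-1]
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    one_hot = np.eye(num_classes)[labels]
    # Smoothed targets: spread `label_smoothing` mass uniformly over classes.
    soft = one_hot * (1.0 - label_smoothing) + label_smoothing / num_classes
    return -(soft * log_probs).sum(axis=-1).mean()
```

With all-zero logits the prediction is uniform, so the loss equals log(num_classes) regardless of smoothing.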
- class athena.loss.MPCLoss(name='MPCLoss')¶
Bases:
tensorflow.keras.losses.Loss
L1 loss over the masked acoustic features in a batch
- __call__(logits, samples, logit_length=None)¶
- class athena.loss.Tacotron2Loss(model, guided_attn_loss_function, regularization_weight=0.0, l1_loss_weight=0.0, mask_decoder=False, pos_weight=1.0, name='Tacotron2Loss')¶
Bases:
tensorflow.keras.losses.Loss
Tacotron2 Loss
- __call__(outputs, samples, logit_length=None)¶
- Parameters
outputs – contains the elements below: att_ws_stack, shape: [batch, y_steps, x_steps]
- class athena.loss.GuidedAttentionLoss(guided_attn_weight, reduction_factor, attn_sigma=0.4, name='GuidedAttentionLoss')¶
Bases:
tensorflow.keras.losses.Loss
Guided attention loss to make attention alignments more monotonic
- __call__(att_ws_stack, samples)¶
- _create_attention_masks(input_length, output_length)¶
masks created from attention locations
- Parameters
input_length – shape: [batch_size]
output_length – shape: [batch_size]
- Returns
shape: [batch_size, 1, y_steps, x_steps]
- Return type
masks
- _create_length_masks(input_length, output_length)¶
masks created from input and output lengths
- Parameters
input_length – shape: [batch_size]
output_length – shape: [batch_size]
- Returns
shape: [batch_size, 1, output_length, input_length]
- Return type
masks
Examples
output_length: [6, 8], input_length: [3, 5]

masks:
[[[1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [1, 1, 1, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]],

 [[1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1]]]
- class athena.loss.GuidedMultiHeadAttentionLoss(guided_attn_weight, reduction_factor, attn_sigma=0.4, num_heads=2, num_layers=2, name='GuidedMultiHeadAttentionLoss')¶
Bases:
GuidedAttentionLoss
Guided attention loss function module for multi-head attention.
- __call__(att_ws_stack, samples)¶
- class athena.loss.FastSpeechLoss(duration_predictor_loss_weight, eps=1.0, use_mask=True, teacher_guide=False)¶
Bases:
tensorflow.keras.losses.Loss
Loss used for training FastSpeech
- __call__(outputs, samples)¶
The predicted durations are compared in the log domain, which makes their distribution closer to Gaussian.
- Parameters
outputs – contains five elements:
  before_outs: outputs before the postnet, shape: [batch, y_steps, feat_dim]
  teacher_outs: teacher outputs, shape: [batch, y_steps, feat_dim]
  after_outs: outputs after the postnet, shape: [batch, y_steps, feat_dim]
  duration_sequences: duration predictions from the teacher model, shape: [batch, x_steps]
  pred_duration_sequences: duration predictions from the trained predictor, shape: [batch, x_steps]
samples – samples from the dataset
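The duration term of the loss can be sketched as an MSE in the log domain; here `eps` mirrors the constructor default of 1.0, and the function name is illustrative rather than the athena implementation:

```python
import numpy as np

def duration_loss(pred_log_durations, duration_sequences, eps=1.0):
    """MSE between predicted log-durations and log(duration + eps); the
    log transform compresses the heavy-tailed durations toward Gaussian."""
    target = np.log(np.asarray(duration_sequences, dtype=float) + eps)
    return np.mean((pred_log_durations - target) ** 2)
```

A predictor that outputs exactly log(duration + 1) incurs zero loss; `eps` also keeps zero-length durations finite.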
- class athena.loss.FastSpeech2Loss(variant_predictor_loss_weight, eps=1.0, use_mask=True)¶
Bases:
tensorflow.keras.losses.Loss
Loss used for training FastSpeech2
- __call__(outputs, samples)¶
- class athena.loss.SoftmaxLoss(embedding_size, num_classes, name='SoftmaxLoss')¶
Bases:
tensorflow.keras.losses.Loss
Softmax loss. Similar to the implementation at https://github.com/clovaai/voxceleb_trainer
- __call__(outputs, samples, logit_length=None)¶
- class athena.loss.AMSoftmaxLoss(embedding_size, num_classes, m=0.3, s=15, name='AMSoftmaxLoss')¶
Bases:
tensorflow.keras.losses.Loss
Additive margin softmax loss.
References: "CosFace: Large Margin Cosine Loss for Deep Face Recognition" and "In Defence of Metric Learning for Speaker Recognition".
Similar to the implementation at https://github.com/clovaai/voxceleb_trainer
- __call__(outputs, samples, logit_length=None)¶
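In the CosFace formulation, both the embeddings and the class weights are L2-normalised, so logits are cosines; the margin `m` is subtracted from the target-class cosine before scaling by `s` and applying cross-entropy. A NumPy sketch of the logit construction (toy shapes; not the exact athena code):

```python
import numpy as np

def am_softmax_logits(embeddings, weights, labels, m=0.3, s=15):
    """CosFace-style logits: s*(cos(theta_y) - m) for the target class,
    s*cos(theta) for every other class.

    embeddings: [batch, embedding_size]; weights: [embedding_size, num_classes].
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cosine = e @ w                                     # cos(theta) per class
    logits = s * cosine
    logits[np.arange(len(labels)), labels] -= s * m    # subtract the margin
    return logits
```

The margin forces the target cosine to beat the others by at least `m`, tightening intra-class clusters.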
- class athena.loss.AAMSoftmaxLoss(embedding_size, num_classes, m=0.3, s=15, easy_margin=False, name='AAMSoftmaxLoss')¶
Bases:
tensorflow.keras.losses.Loss
Additive angular margin softmax loss.
References: "ArcFace: Additive Angular Margin Loss for Deep Face Recognition" and "In Defence of Metric Learning for Speaker Recognition".
Similar to the implementation at https://github.com/clovaai/voxceleb_trainer
- __call__(outputs, samples, logit_length=None)¶
- class athena.loss.ProtoLoss(name='ProtoLoss')¶
Bases:
tensorflow.keras.losses.Loss
Prototypical loss.
References: "Prototypical Networks for Few-shot Learning" and "In Defence of Metric Learning for Speaker Recognition".
Similar to the implementation at https://github.com/clovaai/voxceleb_trainer
- __call__(outputs, samples=None, logit_length=None)¶
- Parameters
outputs – [batch_size, num_speaker_utts, embedding_size]
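In the prototypical setup, each speaker's utterances are split into a support set, averaged into a prototype, and a held-out query classified by distance to every prototype. A NumPy sketch — the "last utterance is the query" split is one common convention, assumed here rather than taken from athena:

```python
import numpy as np

def proto_loss(outputs):
    """Prototypical loss over speaker embeddings.

    outputs: [num_speakers, num_speaker_utts, embedding_size]
    """
    support, query = outputs[:, :-1, :], outputs[:, -1, :]
    prototypes = support.mean(axis=1)                              # [S, D]
    # Logits: negative squared Euclidean distance to each prototype.
    logits = -((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # Each query's correct class is its own speaker (the row index).
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    idx = np.arange(len(outputs))
    return -log_probs[idx, idx].mean()
```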
- class athena.loss.AngleProtoLoss(init_w=10.0, init_b=-5.0, name='AngleProtoLoss')¶
Bases:
tensorflow.keras.losses.Loss
Angular prototypical loss.
Reference: "In Defence of Metric Learning for Speaker Recognition".
Similar to the implementation at https://github.com/clovaai/voxceleb_trainer
- __call__(outputs, samples=None, logit_length=None)¶
- Parameters
outputs – [batch_size, num_speaker_utts, embedding_size]
- class athena.loss.GE2ELoss(init_w=10.0, init_b=-5.0, name='GE2ELoss')¶
Bases:
tensorflow.keras.losses.Loss
Generalized end-to-end loss.
References: "Generalized End-to-End Loss for Speaker Verification" and "In Defence of Metric Learning for Speaker Recognition".
Similar to the implementation at https://github.com/clovaai/voxceleb_trainer
- __call__(outputs, samples=None, logit_length=None)¶
- Parameters
outputs – [batch_size, num_speaker_utts, embedding_size]
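The GE2E similarity matrix compares each utterance embedding against every speaker centroid, with the utterance excluded from its own speaker's centroid to avoid a trivial match. A NumPy sketch of that matrix (in training, `init_w`/`init_b` are learned scalars and a softmax over speakers follows; the loop form is for clarity, not speed):

```python
import numpy as np

def ge2e_similarity(outputs, init_w=10.0, init_b=-5.0):
    """Scaled cosine similarity [speaker, utterance, candidate_speaker].

    outputs: [num_speakers, num_speaker_utts, embedding_size]
    """
    S, U, _ = outputs.shape
    centroids = outputs.mean(axis=1)
    sim = np.zeros((S, U, S))
    for j in range(S):          # speaker of the utterance
        for i in range(U):      # utterance index
            e = outputs[j, i]
            for k in range(S):  # candidate speaker
                if k == j:
                    # Leave-one-out centroid for the utterance's own speaker.
                    c = (centroids[j] * U - e) / (U - 1)
                else:
                    c = centroids[k]
                cos = e @ c / (np.linalg.norm(e) * np.linalg.norm(c))
                sim[j, i, k] = init_w * cos + init_b
    return sim
```

With orthogonal per-speaker embeddings, each utterance scores `init_w + init_b` against its own centroid and `init_b` against the others.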