transformer_u2

The U2 transformer model.

Module Contents

Classes

TransformerU2

A transformer model. Users can modify the attributes as needed.

TransformerU2Encoder

TransformerU2Encoder is a stack of N encoder layers.

TransformerU2Decoder

TransformerU2Decoder is a stack of N decoder layers.

class transformer_u2.TransformerU2(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation='gelu', unidirectional=False, look_ahead=0, custom_encoder=None, custom_decoder=None, conv_module_kernel_size=0, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, use_dynamic_left_chunk: bool = False)

Bases: athena.layers.transformer.Transformer

A transformer model. Users can modify the attributes as needed.

Parameters
  • d_model – the number of expected features in the encoder/decoder inputs (default=512).

  • nhead – the number of heads in the multi-head attention models (default=8).

  • num_encoder_layers – the number of sub-encoder-layers in the encoder (default=6).

  • num_decoder_layers – the number of sub-decoder-layers in the decoder (default=6).

  • dim_feedforward – the dimension of the feedforward network model (default=2048).

  • dropout – the dropout value (default=0.1).

  • activation – the activation function of the encoder/decoder intermediate layer, relu or gelu (default=gelu).

  • unidirectional – whether the encoder attends only to current and past positions (default=False).

  • look_ahead – the number of future frames a unidirectional encoder may additionally attend to (default=0).

  • custom_encoder – custom encoder (default=None).

  • custom_decoder – custom decoder (default=None).

  • conv_module_kernel_size – kernel size of the optional convolution module; 0 disables it (default=0).

  • static_chunk_size – chunk size for static chunk attention; 0 disables chunking (default=0).

  • use_dynamic_chunk – whether to use dynamically sized chunks during training (default=False).

  • use_dynamic_left_chunk – whether to use a dynamic number of left chunks during training (default=False).

Examples

>>> transformer_model = TransformerU2(nhead=16, num_encoder_layers=12)
>>> src = tf.random.normal((32, 10, 512))
>>> tgt = tf.random.normal((32, 20, 512))
>>> out = transformer_model(src, tgt)
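
The U2-specific arguments configure streaming-style attention at construction time. A minimal sketch, assuming WeNet-style chunk semantics for these flags:

>>> streaming_model = TransformerU2(unidirectional=True,
...                                 static_chunk_size=16)
>>> out = streaming_model(src, tgt)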
call(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, return_encoder_output=False, return_attention_weights=False, training=None)

Take in and process masked source/target sequences.

Parameters
  • src – the sequence to the encoder (required).

  • tgt – the sequence to the decoder (required).

  • src_mask – the additive mask for the src sequence (optional).

  • tgt_mask – the additive mask for the tgt sequence (optional).

  • memory_mask – the additive mask for the encoder output (optional).

  • return_encoder_output – whether to also return the encoder output (optional).

  • return_attention_weights – whether to also return the decoder attention weights (optional).

  • training – whether the call is in training mode (optional).

Shape:
  • src: \((N, S, E)\).

  • tgt: \((N, T, E)\).

  • src_mask: \((N, S)\).

  • tgt_mask: \((N, T)\).

  • memory_mask: \((N, S)\).

Note: [src/tgt/memory]_mask should be a boolean tensor where True values mark positions to be masked with float('-inf') and False values are left unchanged. The mask ensures that no information is taken from a masked position i, and a separate mask can be supplied for each sequence in the batch.

  • output: \((N, T, E)\).

Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of a transformer is the same as the input sequence (i.e. target) length of the decoder.

where S is the source sequence length, T is the target sequence length, N is the batch size, and E is the feature dimension.

Examples

>>> output = transformer_model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask)
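
For instance, a tgt_mask that hides padded target positions can be built with tf.sequence_mask, following the boolean convention in the note above (the per-example lengths here are hypothetical):

>>> tgt_len = tf.fill((32,), 15)  # hypothetical valid lengths per example
>>> tgt_mask = tf.logical_not(tf.sequence_mask(tgt_len, maxlen=20))  # True marks padding
>>> output = transformer_model(src, tgt, tgt_mask=tgt_mask)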
class transformer_u2.TransformerU2Encoder(encoder_layers, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, use_dynamic_left_chunk: bool = False)

Bases: tensorflow.keras.layers.Layer

TransformerU2Encoder is a stack of N encoder layers.

Parameters
  • encoder_layers – a list of TransformerU2EncoderLayer instances; its length determines the number of sub-encoder-layers (required).

  • static_chunk_size – chunk size for static chunk attention; 0 disables chunking (default=0).

  • use_dynamic_chunk – whether to use dynamically sized chunks during training (default=False).

  • use_dynamic_left_chunk – whether to use a dynamic number of left chunks during training (default=False).

Examples

>>> encoder_layers = [TransformerU2EncoderLayer(d_model=512, nhead=8)
...                   for _ in range(6)]
>>> transformer_encoder = TransformerU2Encoder(encoder_layers)
>>> src = tf.random.normal((32, 10, 512))
>>> out = transformer_encoder(src)
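
The chunk arguments can also be passed at construction; a minimal sketch, assuming WeNet-style static chunking where 0 keeps full attention:

>>> chunked_encoder = TransformerU2Encoder(encoder_layers, static_chunk_size=16)
>>> out = chunked_encoder(src)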
call(src, src_mask=None, training=None)

Pass the input through the encoder layers in turn.

Parameters
  • src – the sequence to the encoder (required).

  • src_mask – the mask for the src sequence (optional).

  • training – whether the call is in training mode (optional).
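
A sketch of a masked call, assuming src_mask follows the same boolean (N, S) convention as TransformerU2.call (the lengths are hypothetical):

>>> src_mask = tf.logical_not(tf.sequence_mask(tf.fill((32,), 7), maxlen=10))
>>> out = transformer_encoder(src, src_mask=src_mask, training=False)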

set_unidirectional(uni=False)

Whether to apply triangular masks to make the transformer unidirectional.
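
For example, to switch an already built encoder to left-to-right attention:

>>> transformer_encoder.set_unidirectional(True)
>>> out = transformer_encoder(src)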

class transformer_u2.TransformerU2Decoder(decoder_layers)

Bases: tensorflow.keras.layers.Layer

TransformerU2Decoder is a stack of N decoder layers.

Parameters
  • decoder_layers – a list of TransformerDecoderLayer instances; its length determines the number of sub-decoder-layers (required).

Examples

>>> decoder_layers = [TransformerDecoderLayer(d_model=512, nhead=8)
...                   for _ in range(6)]
>>> transformer_decoder = TransformerU2Decoder(decoder_layers)
>>> memory = tf.random.normal((32, 10, 512))
>>> tgt = tf.random.normal((32, 20, 512))
>>> out = transformer_decoder(tgt, memory)
call(tgt, memory, tgt_mask=None, memory_mask=None, return_attention_weights=False, training=None)

Pass the inputs (and masks) through the decoder layers in turn.

Parameters
  • tgt – the sequence to the decoder (required).

  • memory – the sequence from the last layer of the encoder (required).

  • tgt_mask – the mask for the tgt sequence (optional).

  • memory_mask – the mask for the memory sequence (optional).

  • return_attention_weights – whether to also return the decoder attention weights (optional).

  • training – whether the call is in training mode (optional).
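
For instance, masking padded encoder outputs, assuming memory_mask follows the boolean (N, S) convention documented for TransformerU2.call (the lengths are hypothetical):

>>> memory_mask = tf.logical_not(tf.sequence_mask(tf.fill((32,), 8), maxlen=10))
>>> out = transformer_decoder(tgt, memory, memory_mask=memory_mask, training=False)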