transformer_u2

The U2 transformer model.

Module Contents

Classes

TransformerU2

A transformer model. Users can modify the attributes as needed.

TransformerU2Encoder

TransformerU2Encoder is a stack of N encoder layers.

TransformerU2Decoder

TransformerU2Decoder is a stack of N decoder layers.

class transformer_u2.TransformerU2(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation='gelu', unidirectional=False, look_ahead=0, custom_encoder=None, custom_decoder=None, conv_module_kernel_size=0, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, use_dynamic_left_chunk: bool = False)

Bases: athena.layers.transformer.Transformer

A transformer model. Users can modify the attributes as needed.

Parameters
  • d_model – the number of expected features in the encoder/decoder inputs (default=512).

  • nhead – the number of heads in the multi-head attention models (default=8).

  • num_encoder_layers – the number of sub-encoder-layers in the encoder (default=6).

  • num_decoder_layers – the number of sub-decoder-layers in the decoder (default=6).

  • dim_feedforward – the dimension of the feedforward network model (default=2048).

  • dropout – the dropout value (default=0.1).

  • activation – the activation function of the encoder/decoder intermediate layer, relu or gelu (default=gelu).

  • unidirectional – whether the encoder attends only to current and past positions (default=False).

  • look_ahead – the number of future frames a unidirectional encoder may additionally attend to (default=0).

  • custom_encoder – custom encoder (default=None).

  • custom_decoder – custom decoder (default=None).

  • conv_module_kernel_size – kernel size of the optional convolution module; 0 disables it (default=0).

  • static_chunk_size – chunk size for static chunk attention; 0 disables chunking (default=0).

  • use_dynamic_chunk – whether to use dynamically sized chunks during training (default=False).

  • use_dynamic_left_chunk – whether to use a dynamic number of left chunks during training (default=False).

Examples

>>> transformer_model = TransformerU2(nhead=16, num_encoder_layers=12)
>>> src = tf.random.normal((32, 10, 512))
>>> tgt = tf.random.normal((32, 20, 512))
>>> out = transformer_model(src, tgt)
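
The U2-specific arguments configure streaming-style attention at construction time. A minimal sketch, assuming WeNet-style chunk semantics for these flags:

>>> streaming_model = TransformerU2(unidirectional=True,
...                                 static_chunk_size=16)
>>> out = streaming_model(src, tgt)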
call(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, return_encoder_output=False, return_attention_weights=False, training=None)

Take in and process masked source/target sequences.

Parameters
  • src – the sequence to the encoder (required).

  • tgt – the sequence to the decoder (required).

  • src_mask – the additive mask for the src sequence (optional).

  • tgt_mask – the additive mask for the tgt sequence (optional).

  • memory_mask – the additive mask for the encoder output (optional).

  • return_encoder_output – whether to also return the encoder output (optional).

  • return_attention_weights – whether to also return the decoder attention weights (optional).

  • training – whether the call is in training mode (optional).

Shape:
  • src: \((N, S, E)\).

  • tgt: \((N, T, E)\).

  • src_mask: \((N, S)\).

  • tgt_mask: \((N, T)\).

  • memory_mask: \((N, S)\).

Note: [src/tgt/memory]_mask should be a boolean tensor where True values mark positions to be masked with float('-inf') and False values are left unchanged. The mask ensures that no information is taken from a masked position i, and a separate mask can be supplied for each sequence in the batch.

  • output: \((N, T, E)\).

Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of a transformer is the same as the input sequence (i.e. target) length of the decoder.

where S is the source sequence length, T is the target sequence length, N is the batch size, and E is the feature dimension.

Examples

>>> output = transformer_model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask)
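
For instance, a tgt_mask that hides padded target positions can be built with tf.sequence_mask, following the boolean convention in the note above (the per-example lengths here are hypothetical):

>>> tgt_len = tf.fill((32,), 15)  # hypothetical valid lengths per example
>>> tgt_mask = tf.logical_not(tf.sequence_mask(tgt_len, maxlen=20))  # True marks padding
>>> output = transformer_model(src, tgt, tgt_mask=tgt_mask)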
class transformer_u2.TransformerU2Encoder(encoder_layers, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, use_dynamic_left_chunk: bool = False)

Bases: tensorflow.keras.layers.Layer

TransformerU2Encoder is a stack of N encoder layers.

Parameters
  • encoder_layers – a list of TransformerU2EncoderLayer instances; its length determines the number of sub-encoder-layers (required).

  • static_chunk_size – chunk size for static chunk attention; 0 disables chunking (default=0).

  • use_dynamic_chunk – whether to use dynamically sized chunks during training (default=False).

  • use_dynamic_left_chunk – whether to use a dynamic number of left chunks during training (default=False).

Examples

>>> encoder_layers = [TransformerU2EncoderLayer(d_model=512, nhead=8)
...                   for _ in range(6)]
>>> transformer_encoder = TransformerU2Encoder(encoder_layers)
>>> src = tf.random.normal((32, 10, 512))
>>> out = transformer_encoder(src)
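
The chunk arguments can also be passed at construction; a minimal sketch, assuming WeNet-style static chunking where 0 keeps full attention:

>>> chunked_encoder = TransformerU2Encoder(encoder_layers, static_chunk_size=16)
>>> out = chunked_encoder(src)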
call(src, src_mask=None, training=None)

Pass the input through the encoder layers in turn.

Parameters
  • src – the sequence to the encoder (required).

  • src_mask – the mask for the src sequence (optional).

  • training – whether the call is in training mode (optional).
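
A sketch of a masked call, assuming src_mask follows the same boolean (N, S) convention as TransformerU2.call (the lengths are hypothetical):

>>> src_mask = tf.logical_not(tf.sequence_mask(tf.fill((32,), 7), maxlen=10))
>>> out = transformer_encoder(src, src_mask=src_mask, training=False)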

set_unidirectional(uni=False)

Whether to apply triangular masks to make the transformer unidirectional.
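
For example, to switch an already built encoder to left-to-right attention:

>>> transformer_encoder.set_unidirectional(True)
>>> out = transformer_encoder(src)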

class transformer_u2.TransformerU2Decoder(decoder_layers)

Bases: tensorflow.keras.layers.Layer

TransformerU2Decoder is a stack of N decoder layers.

Parameters
  • decoder_layers – a list of TransformerDecoderLayer instances; its length determines the number of sub-decoder-layers (required).

Examples

>>> decoder_layers = [TransformerDecoderLayer(d_model=512, nhead=8)
...                   for _ in range(6)]
>>> transformer_decoder = TransformerU2Decoder(decoder_layers)
>>> memory = tf.random.normal((32, 10, 512))
>>> tgt = tf.random.normal((32, 20, 512))
>>> out = transformer_decoder(tgt, memory)
call(tgt, memory, tgt_mask=None, memory_mask=None, return_attention_weights=False, training=None)

Pass the inputs (and masks) through the decoder layers in turn.

Parameters
  • tgt – the sequence to the decoder (required).

  • memory – the sequence from the last layer of the encoder (required).

  • tgt_mask – the mask for the tgt sequence (optional).

  • memory_mask – the mask for the memory sequence (optional).

  • return_attention_weights – whether to also return the decoder attention weights (optional).

  • training – whether the call is in training mode (optional).
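
For instance, masking padded encoder outputs, assuming memory_mask follows the boolean (N, S) convention documented for TransformerU2.call (the lengths are hypothetical):

>>> memory_mask = tf.logical_not(tf.sequence_mask(tf.fill((32,), 8), maxlen=10))
>>> out = transformer_decoder(tgt, memory, memory_mask=memory_mask, training=False)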