transformer_u2
The transformer model.
Module Contents¶
Classes¶
- TransformerU2 – a transformer model; the user is able to modify the attributes as needed.
- TransformerU2Encoder – a stack of N encoder layers.
- TransformerU2Decoder – a stack of N decoder layers.
- class transformer_u2.TransformerU2(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation='gelu', unidirectional=False, look_ahead=0, custom_encoder=None, custom_decoder=None, conv_module_kernel_size=0, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, use_dynamic_left_chunk: bool = False)¶
Bases:
athena.layers.transformer.Transformer
A transformer model. User is able to modify the attributes as needed.
- Parameters
d_model – the number of expected features in the encoder/decoder inputs (default=512).
nhead – the number of heads in the multiheadattention models (default=8).
num_encoder_layers – the number of sub-encoder-layers in the encoder (default=6).
num_decoder_layers – the number of sub-decoder-layers in the decoder (default=6).
dim_feedforward – the dimension of the feedforward network model (default=2048).
dropout – the dropout value (default=0.1).
activation – the activation function of the encoder/decoder intermediate layer, relu or gelu (default=gelu).
custom_encoder – custom encoder (default=None).
custom_decoder – custom decoder (default=None).
unidirectional – whether to apply triangular masks so the encoder attends only to past frames (default=False).
look_ahead – number of future frames a unidirectional encoder may still attend to (default=0).
conv_module_kernel_size – kernel size of the optional convolution module; 0 disables it (default=0).
static_chunk_size – chunk size for static chunk-based attention; 0 disables chunking (default=0).
use_dynamic_chunk – whether to use dynamic chunk sizes during training for streaming (default=False).
use_dynamic_left_chunk – whether to use a dynamic number of left chunks in dynamic-chunk training (default=False).
Examples
>>> transformer_model = TransformerU2(nhead=16, num_encoder_layers=12)
>>> src = tf.random.normal((10, 32, 512))
>>> tgt = tf.random.normal((20, 32, 512))
>>> out = transformer_model(src, tgt)
- call(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, return_encoder_output=False, return_attention_weights=False, training=None)¶
Take in and process masked source/target sequences.
- Parameters
src – the sequence to the encoder (required).
tgt – the sequence to the decoder (required).
src_mask – the additive mask for the src sequence (optional).
tgt_mask – the additive mask for the tgt sequence (optional).
memory_mask – the additive mask for the encoder output (optional).
src_key_padding_mask – the ByteTensor mask for src keys per batch (optional).
tgt_key_padding_mask – the ByteTensor mask for tgt keys per batch (optional).
memory_key_padding_mask – the ByteTensor mask for memory keys per batch (optional).
- Shape:
src: \((N, S, E)\).
tgt: \((N, T, E)\).
src_mask: \((N, S)\).
tgt_mask: \((N, T)\).
memory_mask: \((N, S)\).
Note: [src/tgt/memory]_mask should be a ByteTensor where True values are positions that should be masked with float('-inf') and False values will be left unchanged. This mask ensures that no information is taken from position i if it is masked, and a separate mask is used for each sequence in the batch.
output: \((N, T, E)\).
Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of the transformer is the same as the input sequence (i.e. target) length of the decoder.
where S is the source sequence length, T is the target sequence length, N is the batch size, and E is the feature dimension.
Examples
>>> output = transformer_model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask)
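The additive-mask convention described in the note above can be illustrated with a small sketch (NumPy is used here purely for illustration; the model itself operates on TensorFlow tensors): masked positions are pushed to -inf before the softmax, so they receive exactly zero attention weight.

```python
import numpy as np

def additive_mask(masked):
    """Convert a boolean mask (True = masked) into an additive mask:
    masked positions become -inf, kept positions become 0."""
    return np.where(masked, float("-inf"), 0.0)

# Raw attention scores for one query over S=4 key positions.
scores = np.array([2.0, 1.0, 0.5, 3.0])
# Suppose the last two key positions are padding and must be masked.
masked = np.array([False, False, True, True])

shifted = scores + additive_mask(masked)
weights = np.exp(shifted) / np.sum(np.exp(shifted))  # softmax
# The masked positions contribute zero weight; the rest renormalize to 1.
```

This is why the masks are described as "additive": they are summed onto the attention scores rather than multiplied, which keeps the softmax numerically well-defined.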
- class transformer_u2.TransformerU2Encoder(encoder_layers, static_chunk_size: int = 0, use_dynamic_chunk: bool = False, use_dynamic_left_chunk: bool = False)¶
Bases:
tensorflow.keras.layers.Layer
TransformerEncoder is a stack of N encoder layers
- Parameters
encoder_layers – a list of instances of the TransformerU2EncoderLayer() class (required).
static_chunk_size – chunk size for static chunk-based attention; 0 disables chunking (default=0).
use_dynamic_chunk – whether to use dynamic chunk sizes during training for streaming (default=False).
use_dynamic_left_chunk – whether to use a dynamic number of left chunks in dynamic-chunk training (default=False).
Examples
>>> encoder_layers = [TransformerU2EncoderLayer(d_model=512, nhead=8)
...                   for _ in range(num_layers)]
>>> transformer_encoder = TransformerU2Encoder(encoder_layers)
>>> src = tf.random.normal((10, 32, 512))
>>> out = transformer_encoder(src)
- call(src, src_mask=None, training=None)¶
Pass the input through the encoder layers in turn.
- Parameters
src – the sequence to the encoder (required).
src_mask – the mask for the src sequence (optional).
- set_unidirectional(uni=False)¶
Whether to apply triangular masks to make the transformer unidirectional.
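A minimal sketch of the kind of triangular mask this method toggles, assuming the look_ahead constructor argument counts how many future frames each position may still attend to (NumPy is used only for illustration; the model builds these masks as TensorFlow tensors):

```python
import numpy as np

def unidirectional_mask(seq_len, look_ahead=0):
    """Boolean (seq_len, seq_len) mask where True marks key positions a
    query may NOT attend to: everything more than `look_ahead` steps in
    the future. look_ahead=0 gives the strictly causal triangular mask."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return j > i + look_ahead

# Strictly causal: position 0 cannot see position 1.
m = unidirectional_mask(4, look_ahead=0)
```

With look_ahead=1, each frame may additionally see one future frame, trading a little latency for more right context in streaming recognition.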
- class transformer_u2.TransformerU2Decoder(decoder_layers)¶
Bases:
tensorflow.keras.layers.Layer
TransformerDecoder is a stack of N decoder layers
- Parameters
decoder_layers – a list of instances of the TransformerDecoderLayer() class (required).
Examples
>>> decoder_layers = [TransformerDecoderLayer(d_model=512, nhead=8)
...                   for _ in range(num_layers)]
>>> transformer_decoder = TransformerU2Decoder(decoder_layers)
>>> memory = tf.random.normal((10, 32, 512))
>>> tgt = tf.random.normal((20, 32, 512))
>>> out = transformer_decoder(tgt, memory)
- call(tgt, memory, tgt_mask=None, memory_mask=None, return_attention_weights=False, training=None)¶
Pass the inputs (and masks) through the decoder layers in turn.
- Parameters
tgt – the sequence to the decoder (required).
memory – the sequence from the last layer of the encoder (required).
tgt_mask – the mask for the tgt sequence (optional).
memory_mask – the mask for the memory sequence (optional).
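The static_chunk_size and use_dynamic_chunk constructor arguments suggest U2-style chunk-based attention for streaming. Below is a hedged NumPy sketch of one plausible chunk mask, in which each position attends to its own chunk and all earlier chunks; this illustrates the general technique, not this module's exact mask construction:

```python
import numpy as np

def chunk_mask(seq_len, chunk_size):
    """Boolean (seq_len, seq_len) mask where True marks disallowed
    attention: each position may attend only to positions in its own
    chunk and in all earlier chunks."""
    chunks = np.arange(seq_len) // chunk_size  # chunk index per position
    return chunks[None, :] > chunks[:, None]

# With chunk_size=2, positions 0-1 form chunk 0 and positions 2-3 chunk 1:
# chunk 0 cannot see chunk 1, but within a chunk attention is unrestricted.
m = chunk_mask(6, chunk_size=2)
```

Training with varying (dynamic) chunk sizes is what lets a single model serve both full-context offline decoding and low-latency streaming decoding.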