Seq2seq¶
Layers mapping sequences to sequences
Modules
MultiHeadedAttention¶
Applies (multi-headed) attention, as in the Transformer
Arguments¶
- n_heads (int): Number of attention heads (defaults to 4)
- n_units (int): Number of units per head, defaults to the last dimension of the input
- causal (bool): Use causality (make each time point in the output dependent only on previous time points of the input)
- name (str): Layer name
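For example, a causal variant with a non-default head count could be constructed as follows (a minimal sketch; the hyperparameter values are arbitrary placeholders):
import tavolo as tvl
# Illustrative configuration: 8 heads of 64 units each, with causal masking
causal_attention = tvl.seq2seq.MultiHeadedAttention(n_heads=8,
                                                    n_units=64,
                                                    causal=True,
                                                    name='causal_attention')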
call
Arguments¶
- inputs (List[tf.Tensor]): List of the following tensors
  - query: Query Tensor of shape [batch_size, Tq, dim]
  - value: Value Tensor of shape [batch_size, Tv, dim]
  - key: Optional key Tensor of shape [batch_size, Tv, dim]. If not given, value will be used for both key and value, which is the most common case (see the sketch after this list for the explicit-key form)
- mask (List[tf.Tensor]): List of the following tensors
  - query_mask: A boolean mask Tensor of shape [batch_size, Tq]. If given, the output will be zero at the positions where mask==False
  - value_mask: A boolean mask Tensor of shape [batch_size, Tv]. If given, the mask is applied such that values at positions where mask==False do not contribute to the result
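As a sketch of the explicit-key form described above (the tensors below are random placeholders standing in for real embeddings; in practice they would come from earlier layers):
import tensorflow as tf
import tavolo as tvl
# Placeholder tensors standing in for real embeddings
embedded_query = tf.random.normal((32, 10, 128))  # [batch_size, Tq, dim]
embedded_value = tf.random.normal((32, 20, 128))  # [batch_size, Tv, dim]
embedded_key = tf.random.normal((32, 20, 128))    # [batch_size, Tv, dim]
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded_query, embedded_value, embedded_key])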
Input shape¶
(batch_size, time_steps, channels)
Output shape¶
Same shape as input.
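For instance, with self-attention on randomly generated embeddings (a quick sanity-check sketch; the shape values are arbitrary):
import tensorflow as tf
import tavolo as tvl
embedded = tf.random.normal((32, 50, 128))  # (batch_size, time_steps, channels)
attended = tvl.seq2seq.MultiHeadedAttention()([embedded, embedded])
assert attended.shape == embedded.shape  # output shape matches the input shape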
Examples¶
Apply a 4-headed (default) self-attention
import tensorflow as tf
import tavolo as tvl
# Placeholder hyperparameters (example values)
max_seq_length, max_tokens, dimension = 100, 10000, 128
# Inputs
inputs = tf.keras.Input(shape=(max_seq_length,), dtype='int32')
# Embedding lookup
embedding_layer = tf.keras.layers.Embedding(max_tokens, dimension)
embedded = embedding_layer(inputs)
# Apply multi-headed self-attention
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded, embedded])
Apply a 4-headed attention, using separate query and value inputs with masking
import tensorflow as tf
import tavolo as tvl
# Placeholder hyperparameters (example values)
max_seq_length, max_tokens, dimension = 100, 10000, 128
# Inputs
query_input = tf.keras.Input(shape=(max_seq_length,), dtype='int32')
value_input = tf.keras.Input(shape=(max_seq_length,), dtype='int32')
# Embedding lookup (mask_zero=True so padding tokens are masked)
embedding_layer = tf.keras.layers.Embedding(max_tokens, dimension, mask_zero=True)
embedded_query = embedding_layer(query_input)
embedded_value = embedding_layer(value_input)
# Masks
query_mask = embedding_layer.compute_mask(query_input)
value_mask = embedding_layer.compute_mask(value_input)
# Apply multi-headed attention (query and value differ, so this is not self-attention)
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded_query, embedded_value], mask=[query_mask, value_mask])
Note
Since the query and value should be passed separately, it is recommended to use the Keras functional API or model subclassing with this layer (see the sketch below).
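As an illustrative sketch of the model-subclassing approach (the class name, surrounding layers and hyperparameters below are hypothetical, not part of tavolo):
import tensorflow as tf
import tavolo as tvl

class AttentiveClassifier(tf.keras.Model):
    """Toy model wrapping MultiHeadedAttention via model subclassing."""
    def __init__(self, max_tokens, dimension, n_classes):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(max_tokens, dimension, mask_zero=True)
        self.attention = tvl.seq2seq.MultiHeadedAttention()
        self.pool = tf.keras.layers.GlobalAveragePooling1D()
        self.classifier = tf.keras.layers.Dense(n_classes, activation='softmax')

    def call(self, inputs):
        embedded = self.embedding(inputs)            # (batch_size, time_steps, dimension)
        mask = self.embedding.compute_mask(inputs)   # (batch_size, time_steps) boolean mask
        attended = self.attention([embedded, embedded], mask=[mask, mask])
        return self.classifier(self.pool(attended))

model = AttentiveClassifier(max_tokens=10000, dimension=128, n_classes=2)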