Seq2seq

Layers mapping sequences to sequences


MultiHeadedAttention

Applies (multi-headed) attention, as in the Transformer ("Attention Is All You Need", Vaswani et al., 2017)

Arguments

  • n_heads (int): Number of attention heads

  • n_units (int): Number of units per head, defaults to the last dimension of the input

  • causal (bool): Use causality (make each output time step depend only on previous time steps of the input); see the construction sketch after this list

  • name (str): Layer name
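For example, the following is a minimal construction sketch of the layer with a custom head count and causal masking (the head count, layer name and input shape are illustrative placeholders):

import tensorflow as tf
import tavolo as tvl

# Construct an 8-headed, causal attention layer; n_units is left at its
# default (the last dimension of the input). All concrete values are placeholders.
causal_attention = tvl.seq2seq.MultiHeadedAttention(n_heads=8,
                                                    causal=True,
                                                    name='causal_attention')

# Causal self-attention over a sequence of embedded vectors
embedded = tf.keras.Input(shape=(100, 128))  # (time_steps, channels)
attended = causal_attention([embedded, embedded])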

call Arguments

  • inputs (List[tf.Tensor]): List of the following tensors:

      • query: Query Tensor of shape [batch_size, Tq, dim]

      • value: Value Tensor of shape [batch_size, Tv, dim]

      • key: Optional key Tensor of shape [batch_size, Tv, dim].

        If not given, value will be used for both key and value, which is the most common case

  • mask (List[tf.Tensor]): List of the following tensors:

      • query_mask: A boolean mask Tensor of shape [batch_size, Tq].

        If given, the output will be zero at the positions where mask==False

      • value_mask: A boolean mask Tensor of shape [batch_size, Tv].

        If given, the mask will be applied so that values at positions where mask==False do not contribute to the result (see the call sketch after this list)
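A minimal call sketch, assuming the optional key is passed as the third element of inputs (all shapes, lengths and values below are illustrative placeholders):

import tensorflow as tf
import tavolo as tvl

mh_attention = tvl.seq2seq.MultiHeadedAttention()

# Placeholder tensors: batch_size=2, Tq=5, Tv=7, dim=16
query = tf.random.normal((2, 5, 16))
value = tf.random.normal((2, 7, 16))
key = tf.random.normal((2, 7, 16))

# Boolean masks marking the valid (non-padded) positions of each sequence
query_mask = tf.sequence_mask([5, 3], maxlen=5)  # shape (2, 5)
value_mask = tf.sequence_mask([7, 4], maxlen=7)  # shape (2, 7)

attended = mh_attention([query, value, key],
                        mask=[query_mask, value_mask])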

Input shape

(batch_size, time_steps, channels)

Output shape

Same shape as input.

Examples

Apply 4-headed (the default) self-attention

import tensorflow as tf
import tavolo as tvl

# Example dimensions (placeholder values)
max_seq_length, max_tokens, dimension = 100, 10000, 128

# Inputs
inputs = tf.keras.Input(shape=(max_seq_length,), dtype='int32')

# Embedding lookup
embedding_layer = tf.keras.layers.Embedding(max_tokens, dimension)
embedded = embedding_layer(inputs)

# Apply multi-headed self-attention
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded, embedded])

Apply 4-headed attention, using a separate query input and masking

import tensorflow as tf
import tavolo as tvl

# Example dimensions (placeholder values)
max_seq_length, max_tokens, dimension = 100, 10000, 128

# Inputs
query_input = tf.keras.Input(shape=(max_seq_length,), dtype='int32')
value_input = tf.keras.Input(shape=(max_seq_length,), dtype='int32')

# Embedding lookup
embedding_layer = tf.keras.layers.Embedding(max_tokens, dimension, mask_zero=True)
embedded_query = embedding_layer(query_input)
embedded_value = embedding_layer(value_input)

# Masks
query_mask = embedding_layer.compute_mask(query_input)
value_mask = embedding_layer.compute_mask(value_input)

# Apply multi-headed attention (query and value come from different inputs)
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded_query, embedded_value], mask=[query_mask, value_mask])

Note

Since the query and value must be passed as separate tensors, it is recommended to use the Keras functional API or model subclassing when working with this layer.
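For instance, a minimal model-subclassing sketch (the class, its constructor arguments and the toy input are illustrative and not part of tavolo):

import tensorflow as tf
import tavolo as tvl

class AttentiveEncoder(tf.keras.Model):
    """Toy model: embedding lookup followed by masked multi-headed self-attention."""

    def __init__(self, max_tokens=10000, dimension=128):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(max_tokens, dimension, mask_zero=True)
        self.attention = tvl.seq2seq.MultiHeadedAttention()

    def call(self, inputs):
        embedded = self.embedding(inputs)
        mask = self.embedding.compute_mask(inputs)
        return self.attention([embedded, embedded], mask=[mask, mask])

model = AttentiveEncoder()
outputs = model(tf.constant([[7, 3, 5, 0, 0]]))  # zeros are padding positions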