Seq2seq

Layers mapping sequences to sequences


MultiHeadedAttention

Applies (multi-headed) attention, as in the Transformer ("Attention Is All You Need", Vaswani et al., 2017)

Arguments

  • n_heads (int): Number of attention heads

  • n_units (int): Number of units per head, defaults to the last dimension of the input

  • causal (bool): Use causality (make each output time step depend only on previous time steps of the input); see the construction sketch after this list

  • name (str): Layer name
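For example, the following is a minimal construction sketch of the layer with a custom head count and causal masking (the head count, layer name and input shape are illustrative placeholders):

import tensorflow as tf
import tavolo as tvl

# Construct an 8-headed, causal attention layer; n_units is left at its
# default (the last dimension of the input). All concrete values are placeholders.
causal_attention = tvl.seq2seq.MultiHeadedAttention(n_heads=8,
                                                    causal=True,
                                                    name='causal_attention')

# Causal self-attention over a sequence of embedded vectors
embedded = tf.keras.Input(shape=(100, 128))  # (time_steps, channels)
attended = causal_attention([embedded, embedded])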

call Arguments

  • inputs (List[tf.Tensor]): List of the following tensors:

      • query: Query Tensor of shape [batch_size, Tq, dim]

      • value: Value Tensor of shape [batch_size, Tv, dim]

      • key: Optional key Tensor of shape [batch_size, Tv, dim].

        If not given, value will be used for both key and value, which is the most common case

  • mask (List[tf.Tensor]): List of the following tensors:

      • query_mask: A boolean mask Tensor of shape [batch_size, Tq].

        If given, the output will be zero at the positions where mask==False

      • value_mask: A boolean mask Tensor of shape [batch_size, Tv].

        If given, the mask will be applied so that values at positions where mask==False do not contribute to the result (see the call sketch after this list)
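A minimal call sketch, assuming the optional key is passed as the third element of inputs (all shapes, lengths and values below are illustrative placeholders):

import tensorflow as tf
import tavolo as tvl

mh_attention = tvl.seq2seq.MultiHeadedAttention()

# Placeholder tensors: batch_size=2, Tq=5, Tv=7, dim=16
query = tf.random.normal((2, 5, 16))
value = tf.random.normal((2, 7, 16))
key = tf.random.normal((2, 7, 16))

# Boolean masks marking the valid (non-padded) positions of each sequence
query_mask = tf.sequence_mask([5, 3], maxlen=5)  # shape (2, 5)
value_mask = tf.sequence_mask([7, 4], maxlen=7)  # shape (2, 7)

attended = mh_attention([query, value, key],
                        mask=[query_mask, value_mask])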

Input shape

(batch_size, time_steps, channels)

Output shape

Same shape as input.

Examples

Apply 4-headed (the default) self-attention

import tensorflow as tf
import tavolo as tvl

# Example dimensions (placeholder values)
max_seq_length, max_tokens, dimension = 100, 10000, 128

# Inputs
inputs = tf.keras.Input(shape=(max_seq_length,), dtype='int32')

# Embedding lookup
embedding_layer = tf.keras.layers.Embedding(max_tokens, dimension)
embedded = embedding_layer(inputs)

# Apply multi-headed self-attention
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded, embedded])

Apply 4-headed attention, using a separate query input and masking

import tensorflow as tf
import tavolo as tvl

# Example dimensions (placeholder values)
max_seq_length, max_tokens, dimension = 100, 10000, 128

# Inputs
query_input = tf.keras.Input(shape=(max_seq_length,), dtype='int32')
value_input = tf.keras.Input(shape=(max_seq_length,), dtype='int32')

# Embedding lookup
embedding_layer = tf.keras.layers.Embedding(max_tokens, dimension, mask_zero=True)
embedded_query = embedding_layer(query_input)
embedded_value = embedding_layer(value_input)

# Masks
query_mask = embedding_layer.compute_mask(query_input)
value_mask = embedding_layer.compute_mask(value_input)

# Apply multi-headed attention (query and value come from different inputs)
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded_query, embedded_value], mask=[query_mask, value_mask])

Note

Since the query and value must be passed as separate tensors, it is recommended to use the Keras functional API or model subclassing when working with this layer.
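For instance, a minimal model-subclassing sketch (the class, its constructor arguments and the toy input are illustrative and not part of tavolo):

import tensorflow as tf
import tavolo as tvl

class AttentiveEncoder(tf.keras.Model):
    """Toy model: embedding lookup followed by masked multi-headed self-attention."""

    def __init__(self, max_tokens=10000, dimension=128):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(max_tokens, dimension, mask_zero=True)
        self.attention = tvl.seq2seq.MultiHeadedAttention()

    def call(self, inputs):
        embedded = self.embedding(inputs)
        mask = self.embedding.compute_mask(inputs)
        return self.attention([embedded, embedded], mask=[mask, mask])

model = AttentiveEncoder()
outputs = model(tf.constant([[7, 3, 5, 0, 0]]))  # zeros are padding positions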