Embeddings

Modules related to embeddings


PositionalEncoding

Positional encoding layer, usually added on top of an embedding layer. Embeds information about the position of the elements using the following formula:

\[
\begin{aligned}
PE[pos, 2i] &= \sin\left(\frac{pos}{normalize\_factor^{\frac{2i}{embedding\_dim}}}\right)\\
PE[pos, 2i+1] &= \cos\left(\frac{pos}{normalize\_factor^{\frac{2i}{embedding\_dim}}}\right)
\end{aligned}
\]

The resulting embedding gets added (point-wise) to the input.
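For intuition, here is a minimal NumPy sketch of the formula (illustrative only; the layer computes this internally):

import numpy as np

max_sequence_length, embedding_dim, normalize_factor = 50, 8, 10000.0

pos = np.arange(max_sequence_length)[:, np.newaxis]  # (time_steps, 1)
i = np.arange(0, embedding_dim, 2)[np.newaxis, :]    # (1, embedding_dim / 2)

pe = np.zeros((max_sequence_length, embedding_dim))
pe[:, 0::2] = np.sin(pos / normalize_factor ** (i / embedding_dim))  # PE[pos, 2i]
pe[:, 1::2] = np.cos(pos / normalize_factor ** (i / embedding_dim))  # PE[pos, 2i + 1]

# The encoding is then added (point-wise) to the input,
# broadcasting over the batch dimension:
# outputs = inputs + pe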

Arguments

  • max_sequence_length (int): Maximum sequence length of input

  • embedding_dim (int): Dimensionality of the input’s last dimension

  • normalize_factor (float): Normalization factor, used as the base of the exponent in the formula above

  • name (str): Layer name

Input shape

(batch_size, time_steps, channels) where time_steps equals max_sequence_length and channels equals embedding_dim

Output shape

Same shape as input.

Examples

import tensorflow as tf
import tavolo as tvl

vocab_size, max_sequence_length = 10000, 50  # Example values

model = tf.keras.Sequential([tf.keras.layers.Embedding(vocab_size, 8, input_length=max_sequence_length),
                             tvl.embeddings.PositionalEncoding(max_sequence_length=max_sequence_length,
                                                               embedding_dim=8)])  # Add positional encoding

References

Attention Is All You Need


DynamicMetaEmbedding

Applies learned attention to different sets of embedding matrices per token, to mix separate token representations into a joined one. Self attention is not context-dependent, meaning each word’s representation in the output depends only on the word’s original embeddings in the given matrices and the attention vector.

Examples

Create Dynamic Meta Embeddings using 2 separate embedding matrices. Notice it is the user’s responsibility to make sure all the arguments needed in the embedding lookup are passed to the tf.keras.layers.Embedding constructors (like trainable=False).

import numpy as np
import tensorflow as tf
import tavolo as tvl

w2v_embedding = np.array(...)  # Pre-trained embedding matrix
glove_embedding = np.array(...)  # Pre-trained embedding matrix

model = tf.keras.Sequential([tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32'),
                             tvl.embeddings.DynamicMetaEmbedding([w2v_embedding, glove_embedding],
                                                                 input_length=MAX_SEQUENCE_LENGTH)])

Using the same example as above, it is possible to define the output’s channel size

model = tf.keras.Sequential([tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32'),
                             tvl.embeddings.DynamicMetaEmbedding([w2v_embedding, glove_embedding],
                                                                 input_length=MAX_SEQUENCE_LENGTH,
                                                                 output_dim=200)])
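For intuition, here is a rough per-token sketch of the mixing mechanism described in the referenced article (a hedged reconstruction with illustrative names and shapes, not tavolo’s internal code): each source embedding is projected to a common dimension, scalar attention coefficients are computed and softmax-normalized, and the projections are summed with those weights.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

w2v_vec = np.random.rand(300)    # hypothetical word2vec vector for one token
glove_vec = np.random.rand(200)  # hypothetical GloVe vector for the same token

output_dim = 256
P_w2v = np.random.rand(300, output_dim)    # learned projection matrices
P_glove = np.random.rand(200, output_dim)
a = np.random.rand(output_dim)             # learned attention vector

projected = np.stack([w2v_vec @ P_w2v, glove_vec @ P_glove])  # (2, output_dim)
alpha = softmax(projected @ a)             # one coefficient per source embedding
mixed = alpha @ projected                  # (output_dim,) joined representation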

References

Dynamic Meta-Embeddings for Improved Sentence Representations


ContextualDynamicMetaEmbedding

Applies learned attention to different sets of embedding matrices per token, to mix separate token representations into a joined one. Here the self attention is context-dependent, meaning each word’s representation in the output depends only on the sentence’s original embeddings in the given matrices and the attention vector. The context is generated by a BiLSTM.
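In the article’s notation (a hedged reconstruction; consult the paper for the exact parameterization), each embedding w_{i,j} of token j from matrix i is projected to the output dimension and mixed with attention coefficients that depend on h_j, the BiLSTM hidden state for token j:

\[
\begin{aligned}
w'_{i,j} &= P_i w_{i,j} + b_i\\
\alpha_{i,j} &= \operatorname{softmax}_i\big(\phi(w'_{i,j}, h_j)\big)\\
e_j &= \sum_i \alpha_{i,j}\, w'_{i,j}
\end{aligned}
\]

where the BiLSTM has n_lstm_units (m in the article) units.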

Arguments

  • embedding_matrices (List[np.ndarray]): List of embedding matrices

  • output_dim (int): Dimension of the output embedding

  • mask_zero (bool): Whether the input value 0 is a special “padding” value that should be masked out

  • input_length (Optional[int]): Parameter to be passed to the internal tf.keras.layers.Embedding layers

  • n_lstm_units (int): Number of units in each LSTM (notated as m in the original article)

  • name (str): Layer name

Input shape

(batch_size, time_steps)

Output shape

(batch_size, time_steps, output_dim)

Examples

Create Contextual Dynamic Meta Embeddings using 2 separate embedding matrices. Notice it is the user’s responsibility to make sure all the arguments needed in the embedding lookup are passed to the tf.keras.layers.Embedding constructors (like trainable=False).

import numpy as np
import tensorflow as tf
import tavolo as tvl

w2v_embedding = np.array(...)  # Pre-trained embedding matrix
glove_embedding = np.array(...)  # Pre-trained embedding matrix

model = tf.keras.Sequential([tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32'),
                             tvl.embeddings.ContextualDynamicMetaEmbedding([w2v_embedding, glove_embedding],
                                                                           input_length=MAX_SEQUENCE_LENGTH)])

Using the same example as above, it is possible to define the output’s channel size and the number of units in each LSTM

model = tf.keras.Sequential([tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32'),
                             tvl.embeddings.ContextualDynamicMetaEmbedding([w2v_embedding, glove_embedding],
                                                                           input_length=MAX_SEQUENCE_LENGTH,
                                                                           n_lstm_units=128, output_dim=200)])

References

Dynamic Meta-Embeddings for Improved Sentence Representations