Welcome to tavolo’s documentation!¶

Showcase¶
import tensorflow as tf
import tavolo as tvl
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_size, input_length=max_len),
    tvl.seq2vec.YangAttention(n_units=64),  # <--- Add Yang style attention
    tf.keras.layers.Dense(n_hidden_units, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')])
model.compile(optimizer=tf.keras.optimizers.SGD(), loss=tf.keras.losses.BinaryCrossentropy())
# Run learning rate range test
lr_finder = tvl.learning.LearningRateFinder(model=model)
learning_rates, losses = lr_finder.scan(train_data, train_labels, min_lr=0.0001, max_lr=1.0, batch_size=128)
### Plot the results to choose your learning rate
Installation¶
Note
Tavolo will not install tensorflow by itself; this is to prevent installing the CPU and GPU versions together. It is the user’s responsibility to install the tensorflow library.
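For example, before installing tavolo you might first install a TensorFlow build that fits your setup (the generic package is shown here purely as an illustration):
pip install tensorflow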
Install from source code¶
git clone https://github.com/eliorc/tavolo.git
cd tavolo
python setup.py install
Contributing¶
Contributions are made through pull requests into the dev branch.
Note
Do not create pull requests into the master branch. Pull requests should be made to the dev branch, from which changes will be merged into master on releases.
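For example, a typical flow could look like this (the feature branch name is just a placeholder):
git clone https://github.com/eliorc/tavolo.git
cd tavolo
git checkout dev                # start from the dev branch
git checkout -b my-feature      # hypothetical feature branch
# ... commit your changes, then open a pull request from my-feature into dev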
Code and Documentation¶
Comments - Even if the code is clear, use comments to explain steps (step comment example).
Variable verbosity - Use verbose variable names that imply the meaning of their content, e.g. use mask instead of m.
Clear tensor shapes - When applying operations on tensors, include the shape of the result in a comment (tensor shape example).
Format - reStructuredText is the documentation format used, specifically PEP 287 (PyCharm’s default) for class methods. In class-level docstrings, make sure you always include the following sections:
Arguments - For the __init__ arguments (Arguments section example).
Examples - For examples (Examples section example).
References - For sources (articles etc.) for further reading (References section example).
If you are contributing a tf.keras.layers.Layer subclass, also include:
Input Shape - Input shape accepted by the layer’s call method (Input Shape section example).
Output Shape - Output shape of the layer’s call method (Output Shape section example).
A docstring sketch following these conventions is shown below.
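As an illustration only (the layer name, argument, and link below are hypothetical, not an actual tavolo docstring), a class-level docstring following these conventions could look like this:
import tensorflow as tf


class MyLayer(tf.keras.layers.Layer):
    """
    Short description of what the layer does

    Arguments
    ---------

    - `n_units` (``int``): Number of units
    - `name` (``str``): Layer name

    Input shape
    -----------

    (batch_size, time_steps, channels)

    Output shape
    ------------

    (batch_size, channels)

    Examples
    --------

    .. code-block:: python3

        import tensorflow as tf
        import tavolo as tvl

        layer = MyLayer(n_units=64)

    References
    ----------

    - `An article for further reading <https://example.com>`_
    """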
Testing¶
Tests should be written in tests/<parent_module>/<module_name>_test.py. For example, for tvl.normalization.LayerNormalization the tests should be written in tests/normalization/layer_normalization_test.py.
If it is a tf.keras.layers.Layer implementation, always include:
test_shapes() - Given accepted input shapes, make sure the output shape is as expected (test_shapes() example).
test_masking() - Make sure the layer supports masking (test_masking() example).
test_serialization() - Make sure the layer can be saved and loaded using get_config and from_config (test_serialization() example).
Optionally, include test_logic() for evaluating expected output given known input (test_logic() example).
Make sure all tests pass by running pytest --cov=tavolo tests/ (a minimal test file sketch is shown below).
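As an illustration only (using the YangAttention layer documented below; the shapes and assertions are examples, not the project’s actual tests), such a test file could be sketched as:
import numpy as np
import tensorflow as tf
import tavolo as tvl


def test_shapes():
    # (batch_size, time_steps, channels) in -> (batch_size, channels) out
    inputs = np.random.normal(size=(2, 10, 8)).astype('float32')
    layer = tvl.seq2vec.YangAttention(n_units=64)
    output = layer(inputs)
    assert output.shape == (2, 8)


def test_masking():
    # The layer should accept a mask coming from an Embedding with mask_zero=True
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=100, output_dim=8, mask_zero=True),
        tvl.seq2vec.YangAttention(n_units=64)])
    inputs = np.array([[1, 2, 3, 0, 0], [4, 5, 0, 0, 0]])
    assert model.predict(inputs).shape == (2, 8)


def test_serialization():
    # The layer should be reconstructible from its config
    layer = tvl.seq2vec.YangAttention(n_units=64)
    restored = tvl.seq2vec.YangAttention.from_config(layer.get_config())
    assert restored.get_config() == layer.get_config()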
When all tests pass, contribute your code (to the dev branch) and it will be added to the package in a following release.
Embeddings¶
Modules related to embeddings
PositionalEncoding¶
Create a positional encoding layer, usually added on top of an embedding layer. Embeds information about the position of the elements using the formula

PE(pos, 2i) = sin(pos / normalize_factor^(2i / embedding_dim))
PE(pos, 2i + 1) = cos(pos / normalize_factor^(2i / embedding_dim))

The resulting embedding gets added (point-wise) to the input.
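As a rough sketch of how such an encoding table could be computed (assuming the standard sinusoidal formulation above; this is not the layer’s internal code, and the constants are illustrative):
import numpy as np

max_sequence_length, embedding_dim, normalize_factor = 50, 8, 10000.0

positions = np.arange(max_sequence_length)[:, np.newaxis]  # (max_sequence_length, 1)
dims = np.arange(embedding_dim)[np.newaxis, :]             # (1, embedding_dim)
angles = positions / np.power(normalize_factor, (2 * (dims // 2)) / embedding_dim)  # (max_sequence_length, embedding_dim)

encoding = np.zeros((max_sequence_length, embedding_dim))
encoding[:, 0::2] = np.sin(angles[:, 0::2])  # sine on even channels
encoding[:, 1::2] = np.cos(angles[:, 1::2])  # cosine on odd channels

# The encoding is then added point-wise to an embedded input of shape
# (batch_size, time_steps, embedding_dim)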
Arguments¶
max_sequence_length (int): Maximum sequence length of input
embedding_dim (int): Dimensionality of the input’s last dimension
normalize_factor (float): Normalize factor
name (str): Layer name
Input shape¶
(batch_size, time_steps, channels) where time_steps equals max_sequence_length and channels equals embedding_dim
Output shape¶
Same shape as input.
Examples¶
import tensorflow as tf
import tavolo as tvl
model = tf.keras.Sequential([tf.keras.layers.Embedding(vocab_size, 8, input_length=max_sequence_length),
tvl.embeddings.PositionalEncoding(max_sequence_length=max_sequence_length,
embedding_dim=8)]) # Add positional encoding
DynamicMetaEmbedding¶
Applies learned attention to different sets of embeddings matrices per token, to mix separate token representations into a joined one. Self attention is word-dependent, meaning each word’s representation in the output is only dependent on the word’s original embeddings in the given matrices, and the attention vector.
Arguments¶
embedding_matrices (List[np.ndarray]): List of embedding matrices
output_dim (int): Dimension of the output embedding
mask_zero (bool): Whether or not the input value 0 is a special “padding” value that should be masked out
input_length (Optional[int]): Parameter to be passed into the internal tf.keras.layers.Embedding matrices
name (str): Layer name
Input shape¶
(batch_size, time_steps)
Output shape¶
(batch_size, time_steps, output_dim)
Examples¶
Create Dynamic Meta Embeddings using 2 separate embedding matrices. Notice it is the user’s responsibility to make sure all the arguments needed in the embedding lookup are passed to the tf.keras.layers.Embedding constructors (like trainable=False).
import numpy as np
import tensorflow as tf
import tavolo as tvl

w2v_embedding = np.array(...)    # Pre-trained embedding matrix
glove_embedding = np.array(...)  # Pre-trained embedding matrix

model = tf.keras.Sequential([tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32'),
                             tvl.embeddings.DynamicMetaEmbedding([w2v_embedding, glove_embedding],
                                                                 input_length=MAX_SEQUENCE_LENGTH)])  # Use DME embeddings
Using the same example as above, it is possible to define the output’s channel size
model = tf.keras.Sequential([tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32'),
tvl.embeddings.DynamicMetaEmbedding([w2v_embedding, glove_embedding],
input_length=MAX_SEQUENCE_LENGTH,
output_dim=200)])
ContextualDynamicMetaEmbedding¶
Applies learned attention to different sets of embeddings matrices per token, to mix separate token representations into a joined one. Self attention is context-dependent, meaning each word’s representation in the output is only dependent on the sentence’s original embeddings in the given matrices, and the attention vector. The context is generated by a BiLSTM.
Arguments¶
embedding_matrices (List[np.ndarray]): List of embedding matrices
output_dim (int): Dimension of the output embedding
mask_zero (bool): Whether or not the input value 0 is a special “padding” value that should be masked out
input_length (Optional[int]): Parameter to be passed into the internal tf.keras.layers.Embedding matrices
n_lstm_units (int): Number of units in each LSTM (notated as m in the original article)
name (str): Layer name
Input shape¶
(batch_size, time_steps)
Output shape¶
(batch_size, time_steps, output_dim)
Examples¶
Create Contextual Dynamic Meta Embeddings using 2 separate embedding matrices. Notice it is the user’s responsibility to make sure all the arguments needed in the embedding lookup are passed to the tf.keras.layers.Embedding constructors (like trainable=False).
import numpy as np
import tensorflow as tf
import tavolo as tvl

w2v_embedding = np.array(...)    # Pre-trained embedding matrix
glove_embedding = np.array(...)  # Pre-trained embedding matrix

model = tf.keras.Sequential([tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32'),
                             tvl.embeddings.ContextualDynamicMetaEmbedding([w2v_embedding, glove_embedding],
                                                                           input_length=MAX_SEQUENCE_LENGTH)])  # Use CDME embeddings
Using the same example as above, it is possible to define the output’s channel size and number of units in each LSTM
model = tf.keras.Sequential([tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32'),
                             tvl.embeddings.ContextualDynamicMetaEmbedding([w2v_embedding, glove_embedding],
                                                                           input_length=MAX_SEQUENCE_LENGTH,
                                                                           n_lstm_units=128, output_dim=200)])
Learning¶
Modules for altering the learning process
CyclicLearningRateCallback¶
Apply cyclic learning rate. Supports the following scale schemes:
triangular - Triangular cycle
triangular2 - Triangular cycle that shrinks amplitude by half each cycle
exp_range - Triangular cycle that shrinks amplitude by gamma ** <cycle iterations> each cycle
Arguments¶
base_lr (float): Lower boundary of each cycle
max_lr (float): Upper boundary of each cycle, may not be reached depending on the scaling function
step_size (int): Number of batches per half-cycle (step)
scale_scheme (str): One of {'triangular', 'triangular2', 'exp_range'}. If scale_fn is passed, this argument is ignored
gamma (float): Constant used for the exp_range’s scale_fn, used as (gamma ** <cycle iterations>)
scale_fn (callable): Custom scaling policy, accepts cycle index / iterations depending on the scale_mode and must return a value in the range [0, 1]. If passed, ignores scale_scheme
scale_mode (str): Defines whether scale_fn is evaluated on cycle index or cycle iterations
Examples¶
Apply a triangular cyclic learning rate (default), with a step size of 2000 batches
import tensorflow as tf
import tavolo as tvl
clr = tvl.learning.CyclicLearningRateCallback(base_lr=0.001, max_lr=0.006, step_size=2000)
model.fit(X_train, Y_train, callbacks=[clr])
Apply a cyclic learning rate that shrinks amplitude by half each cycle
import tensorflow as tf
import tavolo as tvl
clr = tvl.learning.CyclicLearningRateCallback(base_lr=0.001, max_lr=0.006, step_size=2000, scale_scheme='triangular2')
model.fit(X_train, Y_train, callbacks=[clr])
Apply a cyclic learning rate with a custom scaling function
import numpy as np
import tensorflow as tf
import tavolo as tvl

scale_fn = lambda x: 0.5 * (1 + np.sin(x * np.pi / 2))
clr = tvl.learning.CyclicLearningRateCallback(base_lr=0.001, max_lr=0.006, step_size=2000, scale_fn=scale_fn)
model.fit(X_train, Y_train, callbacks=[clr])
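The exp_range scheme can be applied in the same way; the gamma value below is illustrative, not a recommended default
import tensorflow as tf
import tavolo as tvl

clr = tvl.learning.CyclicLearningRateCallback(base_lr=0.001, max_lr=0.006, step_size=2000,
                                              scale_scheme='exp_range', gamma=0.99994)

model.fit(X_train, Y_train, callbacks=[clr])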
LearningRateFinder¶
Learning rate finding utility for conducting the “LR range test”; see the referenced article for more information.
Use the scan method to find the loss values for learning rates in the given range.
Arguments¶
model (tf.keras.Model): Model to conduct the test for. Must call model.compile before using this utility
Examples¶
Run a learning rate range test in the domain [0.0001, 1.0]
import tensorflow as tf
import tavolo as tvl
train_data = ...
train_labels = ...
# Build model
model = tf.keras.Sequential([tf.keras.layers.Input(shape=(784,)),
tf.keras.layers.Dense(128, activation=tf.nn.relu),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
# Must call compile with optimizer before test
model.compile(optimizer=tf.keras.optimizers.SGD(), loss=tf.keras.losses.CategoricalCrossentropy())
# Run learning rate range test
lr_finder = tvl.learning.LearningRateFinder(model=model)
learning_rates, losses = lr_finder.scan(train_data, train_labels, min_lr=0.0001, max_lr=1.0, batch_size=128)
### Plot the results to choose your learning rate
learning.LearningRateFinder.scan(x, y, min_lr: float = 0.0001, max_lr: float = 1.0, batch_size: Optional[int] = None, steps: int = 100) → Tuple[List[float], List[float]]¶
Scans the learning rate range [min_lr, max_lr] for loss values
Parameters
x – Input data. It could be:
- A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs)
- A TensorFlow tensor, or a list of tensors (in case the model has multiple inputs)
- A dict mapping input names to the corresponding arrays/tensors, if the model has named inputs
- A tf.data dataset or a dataset iterator. Should return a tuple of either (inputs, targets) or (inputs, targets, sample_weights)
- A generator or keras.utils.Sequence returning (inputs, targets) or (inputs, targets, sample_weights)
y – Target data. Like the input data x, it could be either Numpy array(s) or TensorFlow tensor(s). It should be consistent with x (you cannot have Numpy inputs and tensor targets, or inversely). If x is a dataset, dataset iterator, generator, or tf.keras.utils.Sequence instance, y should not be specified (since targets will be obtained from x).
min_lr – Minimum learning rate
max_lr – Maximum learning rate
batch_size – Number of samples per gradient update. Do not specify the batch_size if your data is in the form of symbolic tensors, datasets, dataset iterators, generators, or tf.keras.utils.Sequence instances (since they generate batches)
steps – Number of steps to scan between min_lr and max_lr
Returns
Learning rates, losses
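For example, the results returned by scan could be plotted to pick a learning rate (matplotlib is used here purely for illustration):
import matplotlib.pyplot as plt

learning_rates, losses = lr_finder.scan(train_data, train_labels, min_lr=0.0001, max_lr=1.0, batch_size=128)

# Loss as a function of the learning rate, on a logarithmic x axis
plt.plot(learning_rates, losses)
plt.xscale('log')
plt.xlabel('Learning rate')
plt.ylabel('Loss')
plt.show()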
Seq2seq¶
Layers mapping sequences to sequences
Modules
MultiHeadedAttention¶
Applies (multi headed) attention, as in the Transformer
Arguments¶
n_heads (int): Number of attention heads
n_units (int): Number of units per head, defaults to the last dimension of the input
causal (bool): Use causality (make each time point in output dependent only on previous time points of input)
name (str): Layer name
call Arguments¶
inputs (List[tf.Tensor]): List of the following tensors
- query: Query Tensor of shape [batch_size, Tq, dim]
- value: Value Tensor of shape [batch_size, Tv, dim]
- key: Optional key Tensor of shape [batch_size, Tv, dim]. If not given, will use value for both key and value, which is the most common case
mask (List[tf.Tensor]): List of the following tensors
- query_mask: A boolean mask Tensor of shape [batch_size, Tq]. If given, the output will be zero at the positions where mask==False
- value_mask: A boolean mask Tensor of shape [batch_size, Tv]. If given, will apply the mask such that values at positions where mask==False do not contribute to the result
Input shape¶
(batch_size, time_steps, channels)
Output shape¶
Same shape as input.
Examples¶
Apply a 4 headed (default) self attention
import tensorflow as tf
import tavolo as tvl
# Inputs
inputs = tf.keras.Input(shape=(max_seq_length,), dtype='int32')
# Embedding lookup
embedding_layer = tf.keras.layers.Embedding(max_tokens, dimension)
embedded = embedding_layer(inputs)
# Apply multi headed self attention
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded, embedded])
Apply a 4 headed attention, using a query vector and masking
import tensorflow as tf
import tavolo as tvl
# Inputs
query_input = tf.keras.Input(shape=(max_seq_length,), dtype='int32')
value_input = tf.keras.Input(shape=(max_seq_length,), dtype='int32')
# Embedding lookup
embedding_layer = tf.keras.layers.Embedding(max_tokens, dimension, mask_zero=True)
embedded_query = embedding_layer(query_input)
embedded_value = embedding_layer(value_input)
# Masks
query_mask = embedding_layer.compute_mask(query_input)
value_mask = embedding_layer.compute_mask(value_input)
# Apply multi headed self attention
mh_attention = tvl.seq2seq.MultiHeadedAttention()
attended = mh_attention([embedded_query, embedded_value], mask=[query_mask, value_mask])
Note
Since the query and value should be passed separately, it is recommended to use the functional API or model subclassing to use this layer.
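For instance, the masked snippet above could be wrapped into a model with the functional API (the choice of output is illustrative):
model = tf.keras.Model(inputs=[query_input, value_input], outputs=attended)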
Seq2vec¶
Layers mapping sequences to vectors
Modules
YangAttention¶
Reduce time dimension by applying attention using learned variables
Arguments¶
n_units (int): Attention’s variables units
name (str): Layer name
Input shape¶
(batch_size, time_steps, channels)
Output shape¶
(batch_size, channels)
Examples¶
import tensorflow as tf
import tavolo as tvl
model = tf.keras.Sequential([tf.keras.layers.Embedding(vocab_size, 8, input_length=max_sequence_length),
tvl.seq2vec.YangAttention()])