About

AMMDI is an open-notebook hypertext writing experiment, authored by Mike Travers aka mtraven. It's a work in progress and some parts are more polished than others. Comments welcome!

Incoming links

from Language Models

transformer blocks

The input vectors are then passed through a series of transformer blocks. Each block consists of a self-attention layer and a feed-forward layer. The self-attention layer allows the model to consider the relationships between different tokens in the input, while the feed-forward layer transforms the input using a learned function.
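The block described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes a single attention head, no bias terms, no output projection, and random weights. The residual connections and layer normalization come from the Attention Is All You Need paper rather than from the description above, and every name here (`transformer_block`, `params`, `d_model`, `d_ff`) is made up for the example.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Turn a row of scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(row, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned gain)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def add(a, b):
    """Element-wise sum of two matrices, used for the residual connections."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention: each token attends to all tokens."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_transposed = list(zip(*k))
    scores = [[s / scale for s in row] for row in matmul(q, k_transposed)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

def feed_forward(x, w1, w2):
    """Position-wise feed-forward layer: the same two-layer ReLU net for every token."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def transformer_block(x, params):
    """Self-attention then feed-forward, each with a residual add and layer norm."""
    attended, weights = self_attention(x, params["wq"], params["wk"], params["wv"])
    x = [layer_norm(row) for row in add(x, attended)]
    x = [layer_norm(row) for row in add(x, feed_forward(x, params["w1"], params["w2"]))]
    return x, weights

def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (assumption: small Gaussian values)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_ff, n_tokens = 4, 8, 3
params = {
    "wq": random_matrix(d_model, d_model, rng),
    "wk": random_matrix(d_model, d_model, rng),
    "wv": random_matrix(d_model, d_model, rng),
    "w1": random_matrix(d_model, d_ff, rng),
    "w2": random_matrix(d_ff, d_model, rng),
}
tokens = random_matrix(n_tokens, d_model, rng)
output, attention_weights = transformer_block(tokens, params)
```

The output has the same shape as the input (one vector per token), which is what lets blocks be stacked in series. Each row of `attention_weights` sums to 1 and says how much that token draws on every other token.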

transformer blocks

30 Dec 2022 03:50 - 28 Nov 2023 05:18

[[1706.03762] Attention Is All You Need](https://arxiv.org/abs/1706.03762)

Recurrent networks (including LSTMs) were the state of the art in 2017. The paper proposes dropping them and using nothing but attention mechanisms, in an architecture it calls the Transformer.

"We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely."

"In these [convolutional] models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet."

It would be interesting to know the topology of those networks (I don't).
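A rough sketch of the linear versus logarithmic growth in the quote, under stated assumptions: ConvS2S is treated as a stack of ordinary (contiguous) convolutions and ByteNet as a stack of dilated ones, and the count is the number of layers needed before one output position can see `n` input positions. The paper gives these as O(n/k) and O(log_k(n)) for kernel width k. The function names and the choice k = 3 are illustrative only.

```python
import math

def layers_contiguous(n, k):
    """Layers of width-k ordinary convolutions until the receptive field spans n."""
    # The receptive field starts at 1 and each layer widens it by k - 1.
    return math.ceil((n - 1) / (k - 1))

def layers_dilated(n, k):
    """Layers of width-k convolutions with dilation k**layer until the field spans n."""
    # With dilations 1, k, k**2, ... the receptive field after L layers is k**L.
    layers, field = 0, 1
    while field < n:
        field *= k
        layers += 1
    return layers

def layers_self_attention(n):
    """Self-attention connects every pair of positions in a single layer."""
    return 1

for n in (16, 256, 4096):
    print(n, layers_contiguous(n, 3), layers_dilated(n, 3), layers_self_attention(n))
```

The contiguous count grows in proportion to `n`, the dilated count grows with its logarithm, and self-attention stays constant. That constant path length is the advantage the paper claims.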

Sequential prediction has an obvious problem: it can't be parallelized, since each step depends on the one before. All of these architectures aim to address that somehow.

self-attention (intra-attention)