attention
17 Jan 2026 - 17 Jan 2026
- Attention (machine learning) - Wikipedia
Attention-like mechanisms were introduced in the 1990s under names like multiplicative modules, sigma-pi units, and hypernetworks. Their flexibility comes from their role as "soft weights" that can change at runtime, in contrast to standard weights that remain fixed at runtime.
- hard weights from training (backward), soft weights from use (forward). Weird terminology but OK.
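The hard/soft distinction above can be sketched in a few lines of numpy. This is a minimal toy, not any library's API: the projection matrices `W_q`, `W_k`, `W_v` stand in for "hard" weights (fixed after training; here just random placeholders), while the softmaxed attention matrix is the "soft" weights, recomputed from each input at forward time.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 3  # toy model dimension and sequence length

# "Hard" weights: fixed once training ends (random placeholders here).
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

def attention(x):
    """x: (n, d) input sequence -> (output, soft attention weights)."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d)
    # "Soft" weights: a new (n, n) matrix for every input, via softmax.
    soft = np.exp(scores - scores.max(axis=-1, keepdims=True))
    soft = soft / soft.sum(axis=-1, keepdims=True)
    return soft @ v, soft

# Two different inputs give two different soft-weight matrices,
# while W_q / W_k / W_v stay exactly the same.
_, a1 = attention(rng.normal(size=(n, d)))
_, a2 = attention(rng.normal(size=(n, d)))
```

Each row of the soft-weight matrix sums to 1, so it reads as "how much each position attends to every other position" for that particular input.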