attention
17 Jan 2026 - 17 Jan 2026
- Attention (machine learning) - Wikipedia
Attention-like mechanisms were introduced in the 1990s under names like multiplicative modules, sigma-pi units, and hypernetworks. Their flexibility comes from their role as "soft weights" that can change at runtime, in contrast to standard weights that remain fixed at runtime.
- hard weights from training (backward), soft weights from use (forward). Weird terminology but OK.
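The hard/soft distinction above can be sketched in a few lines of numpy. This is a minimal toy, not any library's API: the projection matrices `W_q`, `W_k`, `W_v` stand in for "hard" weights (fixed after training; here just random placeholders), while the softmaxed attention matrix is the "soft" weights, recomputed from each input at forward time.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 3  # toy model dimension and sequence length

# "Hard" weights: fixed once training ends (random placeholders here).
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

def attention(x):
    """x: (n, d) input sequence -> (output, soft attention weights)."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d)
    # "Soft" weights: a new (n, n) matrix for every input, via softmax.
    soft = np.exp(scores - scores.max(axis=-1, keepdims=True))
    soft = soft / soft.sum(axis=-1, keepdims=True)
    return soft @ v, soft

# Two different inputs give two different soft-weight matrices,
# while W_q / W_k / W_v stay exactly the same.
_, a1 = attention(rng.normal(size=(n, d)))
_, a2 = attention(rng.normal(size=(n, d)))
```

Each row of the soft-weight matrix sums to 1, so it reads as "how much each position attends to every other position" for that particular input.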