Glossa

Math-heavy fixture

Display equations and inline math interleave with prose. KaTeX or MathJax may render math *after* widget mount; the walker should treat math containers as no-mark zones (they don't contain natural-language prose).

Per plan/01_renderer.md, math blocks render as <div class="math-display" data-block-id="…"> and inline math is wrapped in <span class="math-inline">. Both should be skipped by the term walker.

Gradients and Jacobians

The gradient of a scalar function is the vector of partial derivatives:

The Jacobian generalizes the gradient to vector-valued functions :

The Hessian is the matrix of second-order partials. In the LaTeX source above, the words "gradient", "Jacobian", and "Hessian" do not appear — they're only in the surrounding prose, which is where they should be marked.

Eigenvalues and SVD

An eigenvalue and eigenvector satisfy . The SVD factors any matrix as:

where holds the singular values. PCA is the special case where we use the SVD of a centered data matrix to find directions of maximum variance.

Norms

The L2 norm of is . The L1 norm is . The generic norm in prose refers to whichever is in scope.

Softmax and cross-entropy

The softmax of a vector is:

The inverse is the logit. Cross-entropy loss between a target distribution and a prediction is . KL divergence is .

Attention

In self-attention, queries , keys , and values combine via scaled dot product:

The numerator's normalization is cosine similarity when and are unit-norm. The whole operation acts on tensors of shape (batch, heads, seq, d_k). einsum notation compresses the math: bhqd, bhkd -> bhqk.

Positional encoding adds a sinusoidal vector to each token embedding so the dot-product attention can distinguish positions:

Inline math density

Inline math in dense paragraphs: when and , the chain rule gives . The terms "gradient" and "attention" appear in this very paragraph and should be marked in the prose, but the math expressions , , should NOT be touched by the walker.