Math-heavy fixture

Display equations and inline math interleave with prose. KaTeX or MathJax may render math *after* widget mount; the walker should treat math containers as no-mark zones (they don't contain natural-language prose).

Per plan/01_renderer.md, math blocks render as <div class="math-display" data-block-id="…"> and inline math is wrapped in <span class="math-inline">. Both should be skipped by the term walker.

Gradients and Jacobians

The gradient of a scalar function $f : R^{n} \to R$ is the vector of partial derivatives:

\nabla f (x) = (\frac{\partial f}{\partial x _{1}}, \dots, \frac{\partial f}{\partial x _{n}})

The Jacobian generalizes the gradient to vector-valued functions $f : R^{n} \to R^{m}$ :

J_{f} = \frac{\partial f _{1}}{\partial x _{1}} ⋮ \frac{\partial f _{m}}{\partial x _{1}} \dots ⋱ \dots \frac{\partial f _{1}}{\partial x _{n}} ⋮ \frac{\partial f _{m}}{\partial x _{n}}

The Hessian is the matrix of second-order partials. In the LaTeX source above, the words "gradient", "Jacobian", and "Hessian" do not appear — they're only in the surrounding prose, which is where they should be marked.

Eigenvalues and SVD

An eigenvalue $λ$ and eigenvector $v$ satisfy $A v = λ v$ . The SVD factors any matrix $A$ as:

A = U Σ V^{⊤}

where $Σ$ holds the singular values. PCA is the special case where we use the SVD of a centered data matrix to find directions of maximum variance.

Norms

The L2 norm of $v$ is $∥ v ∥_{2} = \sum_{i} v_{i}^{2}$ . The L1 norm is $∥ v ∥_{1} = \sum_{i} ∣ v_{i} ∣$ . The generic norm in prose refers to whichever is in scope.

Softmax and cross-entropy

The softmax of a vector $z$ is:

softmax (z)_{i} = \frac{e ^{z_{i}}}{\sum _{j} e ^{z_{j}}}

The inverse is the logit. Cross-entropy loss between a target distribution $p$ and a prediction $q$ is $H (p, q) = - \sum_{i} p_{i} lo g q_{i}$ . KL divergence is $D_{K L} (p ∥ q) = \sum_{i} p_{i} lo g \frac{p _{i}}{q _{i}}$ .

Attention

In self-attention, queries $Q$ , keys $K$ , and values $V$ combine via scaled dot product:

Attention (Q, K, V) = softmax (\frac{Q K ^{⊤}}{d _{k}}) V

The numerator's normalization is cosine similarity when $Q$ and $K$ are unit-norm. The whole operation acts on tensors of shape (batch, heads, seq, d_k). einsum notation compresses the math: bhqd, bhkd -> bhqk.

Positional encoding adds a sinusoidal vector to each token embedding so the dot-product attention can distinguish positions:

P E_{(p os, 2 i)} = sin (p os /1000 0^{2 i / d}), P E_{(p os, 2 i + 1)} = cos (p os /1000 0^{2 i / d})

Inline math density

Inline math in dense paragraphs: when $f (x) = x^{2}$ and $g (x) = e^{x}$ , the chain rule gives $\frac{d}{d x} g (f (x)) = g^{'} (f (x)) f^{'} (x) = e^{x^{2}} \cdot 2 x$ . The terms "gradient" and "attention" appear in this very paragraph and should be marked in the prose, but the math expressions $f$ , $g$ , $e^{x^{2}}$ should NOT be touched by the walker.