# pedagogy.dev — full corpus

Every note on the site, concatenated. Source of truth is Markdown in src/content/notes/.

========================================================================
PILLAR: How People Learn
========================================================================

## The Spacing Effect
URL: https://pedagogy.dev/notes/spacing-effect/
Tags: memory, retention, study-design
Updated: 2026-06-15

The **spacing effect** is one of the most robust findings in the science of human
learning: information studied across spaced sessions is remembered far better than
the same amount of study packed into one session.

## The core idea

Two hours of study is not two hours of study. Four 30-minute sessions across a week
beat one 2-hour block — even though total time is identical. The *gaps* are doing
work.

A leading explanation is **retrieval difficulty**. When you return to material
after a delay, you've partially forgotten it, so pulling it back requires effort.
That effortful retrieval is what strengthens the memory trace. Cramming
removes the gaps, so every "retrieval" is trivially easy, and easy retrieval
does little to strengthen the memory.

## What it looks like in practice

- **Expanding intervals** — review after 1 day, then 3, then 7, then 21. Each
  successful recall buys a longer gap before the next one.
- **Interleaving** — mixing related topics within a session adds spacing *between*
  exposures to any single topic.
- **Desirable difficulty** — the session should feel a little hard. If recall is
  effortless, the interval was too short.
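
The expanding-interval idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real scheduler: the function name `review_dates` is hypothetical, and it assumes each gap (1, 3, 7, 21 days) is counted from the *previous review* rather than from the first study session.

```python
from datetime import date, timedelta

def review_dates(first_study, gaps_in_days=(1, 3, 7, 21)):
    """Return one review date per gap, each gap measured from the previous review.

    Assumption: gaps are not cumulative offsets from the first study day.
    """
    dates = []
    current = first_study
    for gap in gaps_in_days:
        current = current + timedelta(days=gap)
        dates.append(current)
    return dates

schedule = review_dates(date(2026, 1, 1))
for day in schedule:
    print(day.isoformat())
```

A real system would also shorten the next gap after a failed recall; this sketch only covers the "every recall succeeds" path.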

## Why it matters here

The spacing effect is the human-learning counterpart to ideas that show up in
machine learning too — see [Spaced Repetition Meets Curriculum
Learning](/notes/spaced-repetition-meets-curriculum-learning/), and the decay math
in [The Forgetting Curve](/notes/forgetting-curve/).

------------------------------------------------------------------------

## Retrieval Practice (the Testing Effect)
URL: https://pedagogy.dev/notes/retrieval-practice/
Tags: memory, active-recall, assessment
Updated: 2026-06-12

**Retrieval practice** — the testing effect — is the finding that *trying to recall*
something strengthens memory more than *re-studying* it. The test is not just a
measurement; it's a learning event.

## The classic result

On a delayed final test, students who read a passage and then take a recall test
outperform students who read the same passage twice. The double-readers often *feel*
more confident (the material feels fluent), but fluency is a poor proxy for durable
memory. This gap between feeling and reality is one reason learners systematically
under-use the technique.

## Mechanism

Each successful retrieval:

1. Reconsolidates the memory along the cues you actually used to find it.
2. Creates additional retrieval routes, making future access easier.
3. Surfaces gaps immediately — you find out what you *don't* know while you can
   still fix it.

## Putting it to work

- Close the book and write down everything you remember (free recall).
- Use flashcards that force production, not recognition.
- Pair it with spacing — retrieval after a delay is the strongest combination. See
  [The Spacing Effect](/notes/spacing-effect/).

> Rule of thumb: if you can re-read it without discomfort, you're measuring fluency,
> not building memory.

------------------------------------------------------------------------

========================================================================
PILLAR: How AI Learns
========================================================================

## Gradient Descent
URL: https://pedagogy.dev/notes/gradient-descent/
Tags: optimization, training, fundamentals
Updated: 2026-06-14

**Gradient descent** is the engine of machine learning. A model "learns" by
repeatedly nudging its parameters in the direction that reduces error.

## The update rule

Given parameters `θ` and a loss function `L(θ)` that measures how wrong the model is:

```
θ ← θ − η · ∇L(θ)
```

- `∇L(θ)` is the **gradient** — the direction of steepest *increase* in loss.
- We step in the *opposite* direction (the minus sign) to *decrease* loss.
- `η` (eta) is the **learning rate** — how big each step is.

Repeat until the loss stops improving. That's it. Everything else — momentum, Adam,
schedulers — is a refinement of this loop.
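
As a rough sketch, the loop looks like this in plain Python. The toy loss `L(θ) = (θ − 3)²` and the names `gradient_descent` and `toy_grad` are assumptions made for illustration, and the loop runs a fixed number of steps instead of checking whether the loss has stopped improving.

```python
def gradient_descent(grad, theta, learning_rate=0.1, steps=100):
    """Apply theta <- theta - learning_rate * grad(theta) a fixed number of times."""
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta

def toy_grad(theta):
    """Gradient of the toy loss L(theta) = (theta - 3)**2."""
    return 2.0 * (theta - 3.0)

# Start far from the minimum at theta = 3 and let the loop pull us in.
theta_final = gradient_descent(toy_grad, theta=0.0)
print(theta_final)
```

Real training swaps the scalar `theta` for millions of parameters and the hand-written gradient for automatic differentiation, but the update line stays the same.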

## The intuition

Picture the loss as a hilly landscape and the model as a ball. The gradient tells
you which way is uphill; you roll downhill. Too large a learning rate and the ball
overshoots and bounces around; too small and it crawls.
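
The effect of the learning rate is easy to see on a toy problem. The sketch below assumes the loss `L(θ) = θ²` (gradient `2θ`, minimum at 0); the function name `final_distance` and the three learning rates are illustrative choices, not recommendations.

```python
def final_distance(learning_rate, steps=20, theta=1.0):
    """Run gradient descent on L(theta) = theta**2 and return the final |theta|."""
    for _ in range(steps):
        theta = theta - learning_rate * 2.0 * theta
    return abs(theta)

too_small = final_distance(0.001)   # crawls: still close to the start at 1.0
reasonable = final_distance(0.1)    # shrinks steadily towards 0
too_large = final_distance(1.1)     # overshoots: |theta| grows every step

print(too_small, reasonable, too_large)
```

On this loss each step multiplies `θ` by `1 − 2η`, so any `η` above 1 makes the magnitude grow instead of shrink.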

## Where it gets interesting

- **Stochastic** gradient descent estimates the gradient from small batches, trading
  noise for speed — and that noise often helps generalization.
- The gradient itself is computed by **backpropagation**, the chain rule applied
  across the network's layers.
- The shape of `L` is set by the choice of loss — see [Cross-Entropy
  Loss](/notes/cross-entropy-loss/).

The human parallel: a learning rate that's too high looks like cramming (big,
unstable jumps); spaced, moderate steps converge more reliably — a theme in
[Spaced Repetition Meets Curriculum Learning](/notes/spaced-repetition-meets-curriculum-learning/).

------------------------------------------------------------------------

## Cross-Entropy Loss
URL: https://pedagogy.dev/notes/cross-entropy-loss/
Tags: loss-functions, classification, information-theory
Updated: 2026-06-10

**Cross-entropy** is the standard loss for classification, including the
next-token prediction that trains language models. It measures how far the model's
predicted probabilities were from what actually happened.

## The formula

For a single example with true class `y` and predicted probability `p` for the
correct class:

```
L = −log(p)
```

For the full distribution over classes:

```
L = − Σ  y_i · log(p_i)
```

Because the true label is usually one-hot (`y_i` is 1 for the right class, 0
elsewhere), the sum collapses to just `−log(p_correct)`.
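
A small Python check of that collapse, assuming natural logarithms and a made-up three-class prediction (the function name `cross_entropy` and the probabilities are illustrative):

```python
import math

def cross_entropy(y_true, p_pred):
    """L = -sum(y_i * log(p_i)); terms with y_i == 0 contribute nothing."""
    return -sum(y * math.log(p) for y, p in zip(y_true, p_pred) if y > 0)

y_true = [0.0, 1.0, 0.0]   # one-hot label: class 1 is correct
p_pred = [0.2, 0.7, 0.1]   # hypothetical model output

full_sum = cross_entropy(y_true, p_pred)
shortcut = -math.log(p_pred[1])   # -log(p_correct)
print(full_sum, shortcut)
```

Both values agree, which is why implementations for one-hot labels usually just look up the probability of the correct class.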

## Why the log matters

The `−log` term (natural log in the numbers below) is the whole personality of this loss:

- Predict the right answer with `p = 0.99` → loss ≈ 0.01 (barely penalized).
- Predict it with `p = 0.5` → loss ≈ 0.69 (meaningful nudge).
- Predict it with `p = 0.01` → loss ≈ 4.6 (enormous penalty).

So a model that is **confidently wrong** is punished far more than one that is
merely uncertain. This pressure rewards honest probabilities rather than bluffing,
although a trained model can still end up miscalibrated in practice.
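
The three penalties can be reproduced directly. This sketch assumes natural logarithms; the dictionary name `losses` is just for illustration.

```python
import math

# Loss for the correct class at three confidence levels.
losses = {p: -math.log(p) for p in (0.99, 0.5, 0.01)}

for p, loss in losses.items():
    print(f"p = {p:<4} -> loss = {loss:.2f}")

# How much worse is "confidently wrong" than "merely uncertain"?
ratio = losses[0.01] / losses[0.5]
```

The ratio comes out above 6: dropping from a coin flip to 1% confidence in the right answer multiplies the penalty several times over.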

## The connection back to people

There's a learning-science echo here: confident errors are also the most valuable
ones for *humans* to correct — the "hypercorrection effect," where high-confidence
mistakes, once corrected, are the least likely to recur. Both systems learn most
from being confidently wrong and finding out.

------------------------------------------------------------------------

========================================================================
PILLAR: Patterns & Formulas
========================================================================

## The Magic Formula
URL: https://pedagogy.dev/notes/the-magic-formula/
Tags: thesis, building, product-strategy, ai
Updated: 2026-06-15

> **The magic is: AI + workflow + UX + customer pain + distribution.**

This is the core thing to remember. Not *one* of these — **all five, together.** Each is necessary; none is sufficient. The leverage is in the combination.

## The five ingredients

- **AI** — the raw capability. The engine that makes something possible (or cheap, or fast) that wasn't before.
- **Workflow** — the AI embedded in how work *actually gets done*, not a clever demo off to the side. It has to fit the real sequence of steps someone already moves through.
- **UX** — the experience that makes it usable, obvious, even delightful. Power nobody can figure out how to use is power that never ships.
- **Customer pain** — a real, *felt* problem someone will pay to make go away. Without it, you've built a solution hunting for a problem.
- **Distribution** — how it actually reaches the people who have that pain. The best product no one sees still loses.

## Why it has to be all five

Pull any single one out and the whole thing collapses:

- AI without **customer pain** → a cool toy nobody needs.
- AI without **distribution** → brilliant, and invisible.
- AI without **UX** → powerful, and unusable.
- AI without **workflow** → a demo, not a habit.
- Customer pain without **AI** → the same slow way everyone already does it.

The edge isn't being best at any *one* of these. It's being the rare person who stacks **all five on the same problem.**

------------------------------------------------------------------------

## The Forgetting Curve
URL: https://pedagogy.dev/notes/forgetting-curve/
Tags: memory, formulas, decay
Updated: 2026-06-13

In the 1880s Hermann Ebbinghaus memorized nonsense syllables and tested himself over
time. He found that retention drops sharply at first, then levels off — a curve that
turns up again and again in both human and machine learning.

## The shape

Memory retention `R` after time `t` is often modeled as exponential decay:

```
R = e^(−t / S)
```

- `R` — proportion retained (1 = perfect, 0 = gone).
- `t` — time since learning.
- `S` — **memory strength** (stability). Larger `S` = slower forgetting.

The key feature is that forgetting is fastest *immediately* after learning. Most of
what you forget, you forget soon.
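
A quick numerical sketch of the curve, assuming time is measured in days and an arbitrary stability of `S = 5` (both are illustrative choices, as is the function name `retention`):

```python
import math

def retention(t, stability):
    """R = e^(-t / S): proportion retained after time t."""
    return math.exp(-t / stability)

S = 5.0  # hypothetical stability, in days

first_day_drop = retention(0, S) - retention(1, S)   # forgotten during day 1
sixth_day_drop = retention(5, S) - retention(6, S)   # forgotten during day 6

print(f"day 1: -{first_day_drop:.3f}, day 6: -{sixth_day_drop:.3f}")
```

Under this model the first day costs more retention than the sixth, even though both are one day long.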

## Why it's a "pattern," not just a fact

The same exponential form appears across learning systems:

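- **Spaced repetition algorithms** build on this curve with varying explicitness.
  FSRS estimates a stability `S` per card and schedules the next review for the
  moment `R` is predicted to dip to a target (~90% by default), though recent
  versions use a power-law variant of the curve. SM-2 never computes `S`; it gets
  a similar effect by multiplying each interval by a per-card ease factor. In both,
  each successful review lengthens the next interval, flattening the curve. This is
  the mechanism behind [The Spacing Effect](/notes/spacing-effect/).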
- **Exponential decay** also governs learning-rate schedules and
  exponential moving averages in model training — the same `e^(−t/S)` skeleton,
  repurposed.
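
Under the exponential model in this note, the "review when `R` hits a target" rule has a closed form: solving `target = e^(−t/S)` gives `t = −S · ln(target)`. The sketch below uses only that formula; it is not the actual FSRS or SM-2 update rule, and `days_until_review` is a hypothetical name.

```python
import math

def days_until_review(stability, target_retention=0.9):
    """Solve target = exp(-t / stability) for t."""
    return -stability * math.log(target_retention)

short = days_until_review(stability=10.0)   # weaker memory: review sooner
long = days_until_review(stability=20.0)    # stronger memory: review later

print(f"S=10 -> {short:.2f} days, S=20 -> {long:.2f} days")
```

Doubling `S` doubles the interval, which is the sense in which a successful review "buys" a longer gap.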

## The practical takeaway

You can't stop forgetting — but you can raise `S`. Every well-timed retrieval bends
the curve flatter. Review *just before* you'd forget, not after.

------------------------------------------------------------------------

## Spaced Repetition Meets Curriculum Learning
URL: https://pedagogy.dev/notes/spaced-repetition-meets-curriculum-learning/
Tags: analogy, curriculum-learning, scheduling
Updated: 2026-06-11

Humans and neural networks are wildly different systems, yet the *scheduling* of
their learning rhymes. This note lines up two ideas that turn out to be cousins.

## On the human side: spaced repetition

People retain more when exposures are spaced over time and ordered from easier to
harder. The schedule adapts to the learner: items you find hard come back sooner;
items you've mastered drift to long intervals. (See [The Spacing
Effect](/notes/spacing-effect/) and [Retrieval
Practice](/notes/retrieval-practice/).)

## On the machine side: curriculum learning

**Curriculum learning** is the finding that models often train better when examples
are presented in a meaningful order — easy concepts first, hard ones later — rather
than uniformly at random. Like a good syllabus, it shapes the path the optimizer
takes.
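
In its simplest form, a curriculum is a sort by difficulty. The sketch below is deliberately naive: it assumes a difficulty score is available for every example, and it uses word length as a stand-in score purely for illustration. Real curricula rely on measures such as model loss or human annotation.

```python
def curriculum_order(examples, difficulty):
    """Return examples sorted from easiest to hardest by the given score."""
    return sorted(examples, key=difficulty)

words = ["photosynthesis", "cat", "gradient", "sun"]

# Hypothetical difficulty score: longer words count as harder.
ordered = curriculum_order(words, difficulty=len)
print(ordered)
```

Python's sort is stable, so examples with equal difficulty keep their original relative order.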

## The shared pattern

| | Humans | Machines |
|---|---|---|
| **Order** | easy → hard | easy → hard (curriculum) |
| **Timing** | space reviews over time | replay / rehearsal buffers |
| **Adaptivity** | review weak items sooner | sample hard examples more |
| **Failure mode** | cramming → fast forgetting | too-high learning rate → instability |
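
The "sample hard examples more" cell can be sketched with weighted sampling. This assumes each example already has a recent loss value to use as its weight; the function name `sample_by_loss` and the numbers are hypothetical, and real training pipelines use more careful schemes.

```python
import random

def sample_by_loss(examples, losses, k, seed=0):
    """Draw k examples with replacement, weighted by their current loss."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.choices(examples, weights=losses, k=k)

examples = ["easy", "hard"]
losses = [0.1, 5.0]   # the hard example currently has a much higher loss

batch = sample_by_loss(examples, losses, k=1000)
hard_count = batch.count("hard")
print(hard_count)
```

The human-side equivalent is a flashcard deck where the cards you keep missing come back sooner.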

The deep commonality: **learning is path-dependent.** *What* you study isn't the
only thing that matters — *when* and *in what order* changes the outcome, for brains
and for [gradient descent](/notes/gradient-descent/) alike.

## Open question for this project

If forgetting follows [the same exponential skeleton](/notes/forgetting-curve/) in
both systems, how far can we push the analogy? Which human study techniques have a
genuine machine-learning twin, and which are coincidences of vocabulary? That's
exactly the kind of thing this site exists to map.

------------------------------------------------------------------------