# pedagogy.dev — full corpus

Every note on the site, concatenated. Source of truth is Markdown in src/content/notes/.

========================================================================
PILLAR: How People Learn
========================================================================

## The Spacing Effect

URL: https://pedagogy.dev/notes/spacing-effect/
Tags: memory, retention, study-design
Updated: 2026-06-15

The **spacing effect** is one of the most robust findings in the science of human learning: information studied across spaced sessions is remembered far better than the same amount of study packed into one session.

## The core idea

Two hours of study is not two hours of study. Four 30-minute sessions across a week beat one 2-hour block — even though total time is identical. The *gaps* are doing work.

The leading explanation is **retrieval difficulty**. When you return to material after a delay, you've partially forgotten it, so pulling it back requires effort. That effortful retrieval is precisely what strengthens the memory trace. Cramming removes the gaps, so every "retrieval" is trivially easy — and easy retrieval teaches the brain almost nothing.

## What it looks like in practice

- **Expanding intervals** — review after 1 day, then 3, then 7, then 21. Each successful recall buys a longer gap before the next one.
- **Interleaving** — mixing related topics within a session adds spacing *between* exposures to any single topic.
- **Desirable difficulty** — the session should feel a little hard. If recall is effortless, the interval was too short.

## Why it matters here

The spacing effect is the human-learning counterpart to ideas that show up in machine learning too — see [Spaced Repetition Meets Curriculum Learning](/notes/spaced-repetition-meets-curriculum-learning/), and the decay math in [The Forgetting Curve](/notes/forgetting-curve/).
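
The expanding intervals listed under "What it looks like in practice" can be turned into calendar dates. This is only a minimal sketch: the function name `review_dates` is hypothetical, and it assumes each gap is counted from the previous review rather than from the first study session, which the note does not specify.

```python
from datetime import date, timedelta

# Gaps in days between successive reviews, taken from the note's example
# (1, then 3, then 7, then 21). Measuring each gap from the previous
# review is an assumption made for this sketch.
EXPANDING_GAPS = (1, 3, 7, 21)


def review_dates(first_study: date, gaps=EXPANDING_GAPS) -> list[date]:
    """Return the review dates produced by applying each gap in turn."""
    dates = []
    current = first_study
    for gap in gaps:
        current = current + timedelta(days=gap)
        dates.append(current)
    return dates


schedule = review_dates(date(2026, 6, 15))
for day in schedule:
    print(day.isoformat())
```

Starting from 2026-06-15, this prints 2026-06-16, 2026-06-19, 2026-06-26 and 2026-07-17. A real scheduler would also shorten the next gap after a failed recall, which this sketch leaves out.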
------------------------------------------------------------------------

## Retrieval Practice (the Testing Effect)

URL: https://pedagogy.dev/notes/retrieval-practice/
Tags: memory, active-recall, assessment
Updated: 2026-06-12

**Retrieval practice** — the testing effect — is the finding that *trying to recall* something strengthens memory more than *re-studying* it. The test is not just a measurement; it's a learning event.

## The classic result

Students who read a passage and then take a recall test outperform students who read the same passage twice — on a delayed final test. The double-readers often *feel* more confident (the material feels fluent), but fluency is a poor proxy for durable memory. This gap between feeling and reality is why learners systematically under-use the technique.

## Mechanism

Each successful retrieval:

1. Reconsolidates the memory along the cues you actually used to find it.
2. Creates additional retrieval routes, making future access easier.
3. Surfaces gaps immediately — you find out what you *don't* know while you can still fix it.

## Putting it to work

- Close the book and write down everything you remember (free recall).
- Use flashcards that force production, not recognition.
- Pair it with spacing — retrieval after a delay is the strongest combination. See [The Spacing Effect](/notes/spacing-effect/).

> Rule of thumb: if you can re-read it without discomfort, you're measuring fluency,
> not building memory.

------------------------------------------------------------------------

========================================================================
PILLAR: How AI Learns
========================================================================

## Gradient Descent

URL: https://pedagogy.dev/notes/gradient-descent/
Tags: optimization, training, fundamentals
Updated: 2026-06-14

**Gradient descent** is the engine of machine learning. A model "learns" by repeatedly nudging its parameters in the direction that reduces error.
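
A minimal sketch of that nudging loop, written out in plain Python. The one-parameter loss `(theta - 3)^2`, the starting point, and the learning rate are all assumptions chosen for illustration; the update it applies is the rule spelled out in the next section.

```python
def loss(theta: float) -> float:
    """Toy loss with its minimum at theta = 3 (an illustrative assumption)."""
    return (theta - 3.0) ** 2


def grad(theta: float) -> float:
    """Derivative of the toy loss: d/dtheta (theta - 3)^2 = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta: float, learning_rate: float, steps: int) -> float:
    """Repeatedly step against the gradient and return the final parameter."""
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta


theta_final = gradient_descent(theta=0.0, learning_rate=0.1, steps=100)
print(theta_final, loss(theta_final))
```

On this particular loss, each step multiplies the distance to the minimum by `1 - 2 * learning_rate`, so a rate of 0.1 shrinks it steadily while any rate above 1.0 makes the parameter overshoot further on every step.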

## The update rule

Given parameters `θ` and a loss function `L(θ)` that measures how wrong the model is:

```
θ ← θ − η · ∇L(θ)
```

- `∇L(θ)` is the **gradient** — the direction of steepest *increase* in loss.
- We step in the *opposite* direction (the minus sign) to *decrease* loss.
- `η` (eta) is the **learning rate** — how big each step is.

Repeat until the loss stops improving. That's it. Everything else — momentum, Adam, schedulers — is a refinement of this loop.

## The intuition

Picture the loss as a hilly landscape and the model as a ball. The gradient tells you which way is uphill; you roll downhill. Too large a learning rate and the ball overshoots and bounces around; too small and it crawls.

## Where it gets interesting

- **Stochastic** gradient descent estimates the gradient from small batches, trading noise for speed — and that noise often helps generalization.
- The gradient itself is computed by **backpropagation**, the chain rule applied across the network's layers.
- The shape of `L` is set by the choice of loss — see [Cross-Entropy Loss](/notes/cross-entropy-loss/).

The human parallel: a learning rate that's too high looks like cramming (big, unstable jumps); spaced, moderate steps converge more reliably — a theme in [Spaced Repetition Meets Curriculum Learning](/notes/spaced-repetition-meets-curriculum-learning/).

------------------------------------------------------------------------

## Cross-Entropy Loss

URL: https://pedagogy.dev/notes/cross-entropy-loss/
Tags: loss-functions, classification, information-theory
Updated: 2026-06-10

**Cross-entropy** is the standard loss for classification — including the next-token prediction that trains language models. It measures the mismatch between the probabilities the model predicted and what actually happened.
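
As a concrete illustration of that measure, here is a small sketch that scores three made-up predictions. It uses the one-hot form of the loss (the negative log of the probability given to the true class), which the next section derives; the function name `cross_entropy` and the example probabilities are assumptions for this sketch.

```python
import math


def cross_entropy(probs: list[float], true_index: int) -> float:
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_index])


# Three hypothetical predictions over two classes; class 0 is the true one.
confident_right = cross_entropy([0.99, 0.01], true_index=0)
uncertain = cross_entropy([0.5, 0.5], true_index=0)
confident_wrong = cross_entropy([0.01, 0.99], true_index=0)

print(round(confident_right, 2), round(uncertain, 2), round(confident_wrong, 2))
# prints: 0.01 0.69 4.61
```

The log here is the natural log, which is what the figures in this note assume. A probability of exactly 0 for the true class would raise an error in this sketch; practical implementations typically clip probabilities or work from logits to avoid that.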

## The formula

For a single example with true class `y` and predicted probability `p` for the correct class:

```
L = −log(p)
```

For the full distribution over classes:

```
L = − Σ y_i · log(p_i)
```

Because the true label is usually one-hot (`y_i` is 1 for the right class, 0 elsewhere), the sum collapses to just `−log(p_correct)`.

## Why the log matters

The `−log` term is the whole personality of this loss:

- Predict the right answer with `p = 0.99` → loss ≈ 0.01 (barely penalized).
- Predict it with `p = 0.5` → loss ≈ 0.69 (meaningful nudge).
- Predict it with `p = 0.01` → loss ≈ 4.6 (enormous penalty).

So a model that is **confidently wrong** is punished far more than one that is merely uncertain. This pressure pushes models toward well-calibrated probabilities.

## The connection back to people

There's a learning-science echo here: confident errors are also the most valuable ones for *humans* to correct — the "hypercorrection effect," where high-confidence mistakes, once corrected, are the least likely to recur. Both systems learn most from being confidently wrong and finding out.

------------------------------------------------------------------------

========================================================================
PILLAR: Patterns & Formulas
========================================================================

## The Magic Formula

URL: https://pedagogy.dev/notes/the-magic-formula/
Tags: thesis, building, product-strategy, ai
Updated: 2026-06-15

> **The magic is: AI + workflow + UX + customer pain + distribution.**

This is the core thing to remember. Not *one* of these — **all five, together.** Each is necessary; none is sufficient. The leverage is in the combination.

## The five ingredients

- **AI** — the raw capability. The engine that makes something possible (or cheap, or fast) that wasn't before.
- **Workflow** — the AI embedded in how work *actually gets done*, not a clever demo off to the side.
  It has to fit the real sequence of steps someone already moves through.
- **UX** — the experience that makes it usable, obvious, even delightful. Power nobody can figure out how to use is power that never ships.
- **Customer pain** — a real, *felt* problem someone will pay to make go away. Without it, you've built a solution hunting for a problem.
- **Distribution** — how it actually reaches the people who have that pain. The best product no one sees still loses.

## Why it has to be all five

Pull any single one out and the whole thing collapses:

- AI without **customer pain** → a cool toy nobody needs.
- AI without **distribution** → brilliant, and invisible.
- AI without **UX** → powerful, and unusable.
- AI without **workflow** → a demo, not a habit.
- Customer pain without **AI** → the same slow way everyone already does it.

The edge isn't being best at any *one* of these. It's being the rare person who stacks **all five on the same problem.**

------------------------------------------------------------------------

## The Forgetting Curve

URL: https://pedagogy.dev/notes/forgetting-curve/
Tags: memory, formulas, decay
Updated: 2026-06-13

In the 1880s Hermann Ebbinghaus memorized nonsense syllables and tested himself over time. He found that retention drops sharply at first, then levels off — a curve that turns up again and again in both human and machine learning.

## The shape

Memory retention `R` after time `t` is often modeled as exponential decay:

```
R = e^(−t / S)
```

- `R` — proportion retained (1 = perfect, 0 = gone).
- `t` — time since learning.
- `S` — **memory strength** (stability). Larger `S` = slower forgetting.

The key feature is that loss is fastest *immediately* after learning. Most of what you forget, you forget soon.
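
To see the shape numerically, the formula can be evaluated at a few time points. This is a small sketch: the function name `retention`, the choice of days as the unit, and the two strength values are assumptions picked for illustration, not measured data.

```python
import math


def retention(t: float, strength: float) -> float:
    """R = e^(-t / S): proportion retained after time t with strength S."""
    return math.exp(-t / strength)


# Units are days; the strength values are made up for illustration.
for strength in (2.0, 10.0):
    row = [round(retention(t, strength), 2) for t in (0, 1, 2, 7)]
    print(f"S={strength}: {row}")
```

With the weaker memory (`S = 2`), more is lost between day 0 and day 1 than between day 1 and day 2, which is the "fastest immediately after learning" property. The stronger memory (`S = 10`) follows the same curve stretched out in time.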

## Why it's a "pattern," not just a fact

The same exponential form appears across learning systems:

- **Spaced repetition algorithms** schedule reviews around this curve. FSRS explicitly estimates `S` and schedules the next review for the moment `R` is predicted to dip to a target retention (~90% by default); SM-2 reaches a similar schedule indirectly, through per-item ease factors that lengthen the interval after each successful recall. Each successful review *increases* `S`, flattening the curve — this is the mechanism behind [The Spacing Effect](/notes/spacing-effect/).
- **Exponential decay** also shows up in learning-rate schedules and exponential moving averages in model training — the same `e^(−t/S)` skeleton, repurposed.

## The practical takeaway

You can't stop forgetting — but you can raise `S`. Every well-timed retrieval bends the curve flatter. Review *just before* you'd forget, not after.

------------------------------------------------------------------------

## Spaced Repetition Meets Curriculum Learning

URL: https://pedagogy.dev/notes/spaced-repetition-meets-curriculum-learning/
Tags: analogy, curriculum-learning, scheduling
Updated: 2026-06-11

Humans and neural networks are wildly different systems, yet the *scheduling* of their learning rhymes. This note lines up two ideas that turn out to be cousins.

## On the human side: spaced repetition

People retain more when exposures are spaced over time and ordered from easier to harder. The schedule adapts to the learner: items you find hard come back sooner; items you've mastered drift to long intervals. (See [The Spacing Effect](/notes/spacing-effect/) and [Retrieval Practice](/notes/retrieval-practice/).)

## On the machine side: curriculum learning

**Curriculum learning** is the finding that models often train better when examples are presented in a meaningful order — easy concepts first, hard ones later — rather than uniformly at random. Like a good syllabus, it shapes the path the optimizer takes.
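
The ordering idea can be sketched in a few lines. Everything here is hypothetical: the helper names, the arithmetic drills, and especially the use of string length as a difficulty score, which is a stand-in for whatever measure a real curriculum would use.

```python
import random


def curriculum_order(examples, difficulty):
    """Sort examples from easiest to hardest using a difficulty score."""
    return sorted(examples, key=difficulty)


def random_order(examples, seed=0):
    """Baseline: the same examples in a shuffled order."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical arithmetic drills; string length as the difficulty score
# is a stand-in assumption, not a recommended measure.
drills = ["12 * 7 + 5", "2 + 2", "(3 + 4) * (9 - 2)", "10 - 3"]
ordered = curriculum_order(drills, difficulty=len)
print(ordered)
# prints: ['2 + 2', '10 - 3', '12 * 7 + 5', '(3 + 4) * (9 - 2)']
```

Both functions return the same set of examples; only the sequence differs, and that sequence is the variable curriculum learning manipulates.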

## The shared pattern

| | Humans | Machines |
|---|---|---|
| **Order** | easy → hard | easy → hard (curriculum) |
| **Timing** | space reviews over time | replay / rehearsal buffers |
| **Adaptivity** | review weak items sooner | sample hard examples more |
| **Failure mode** | cramming → fast forgetting | too-high learning rate → instability |

The deep commonality: **learning is path-dependent.** *What* you study isn't the only thing that matters — *when* and *in what order* changes the outcome, for brains and for [gradient descent](/notes/gradient-descent/) alike.

## Open question for this project

If forgetting follows [the same exponential skeleton](/notes/forgetting-curve/) in both systems, how far can we push the analogy? Which human study techniques have a genuine machine-learning twin, and which are coincidences of vocabulary? That's exactly the kind of thing this site exists to map.

------------------------------------------------------------------------