Component Gallery

A showcase of all available academic components in this project.

Information Design

Educational Callouts

Custom callouts designed for academic contexts with distinct semantic types.

Intuition
Intuitive Understanding
Intuition callouts help build mental models. They use warm colors to invite the reader to think conceptually before diving into formalisms.
Definition
Formal Definition
Definition callouts use professional blue tones for rigorous statements and terminology.
Theorem
Theorem 1.1 (Central Limit)
Theorem callouts use purple tones to highlight proven mathematical or scientific results.
Example
Example callouts (green) provide concrete applications of abstract concepts.
Common Pitfall
Warning callouts (red) alert readers to subtle errors or misconceptions.
Nice to know
"Nice to know" callouts (pink) provide interesting side notes that aren't critical to the main path.
Rigorous Notation

Math & Equations

Integrated LaTeX support via KaTeX with block and inline formatting.

Block Equations

f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi) e^{2\pi i \xi x} \, d\xi

Inline Equations

The fundamental theorem of calculus states that \int_a^b f(x)\,dx = F(b) - F(a), where F' = f.
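As a quick numeric sanity check of the theorem as stated, here is a minimal sketch; the function f(x) = 3x² and antiderivative F(x) = x³ are chosen purely for illustration:

```python
def riemann_sum(f, a, b, n=100_000):
    """Left Riemann sum approximation of the integral of f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + i * h) for i in range(n)) * h

f = lambda x: 3 * x**2   # integrand
F = lambda x: x**3       # antiderivative, so F' = f

approx = riemann_sum(f, 0.0, 2.0)
exact = F(2.0) - F(0.0)  # the theorem says these agree in the limit
```

With n = 100,000 subdivisions the two values agree to well under 10⁻³.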
Logical Structure

Proof Blocks

1. Assume √2 is rational. Then √2 = a/b where a, b are coprime integers.
2. Squaring both sides: 2 = a²/b², so a² = 2b². Thus a² is even, which implies a is even.
3. Let a = 2k. Then (2k)² = 2b², so 4k² = 2b², which means b² = 2k².
4. Thus b² is even, so b is even. But if a and b are both even, they aren't coprime, a contradiction.
Visual Learning

Interactive Graphs

Sigmoid Activation Function: typical non-linear function used in neural networks.
Normal Distribution: the probability density function for a Gaussian distribution.
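The two plotted functions can be reproduced directly; a minimal standard-library sketch (function names are our own, not part of the component API):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a Gaussian with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

For example, sigmoid(0) = 0.5 and the standard normal peaks at 1/√(2π) ≈ 0.399.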
Media

Image Blocks

Use markdown image syntax or the explicit MDX image component for figures with optional captions.

Distributed systems example cover image
MdxImage supports responsive rendering with optional captions.
Data Analysis

Statistical Charts

Algorithm Performance Comparison (ms)

Ours: 42 ms · State of the art: 89 ms · Baseline: 156 ms

Energy Consumption Breakdown

Compute (45%)
Memory (25%)
I/O (20%)
Other (10%)

System Trade-offs

Axes: Consistency, Availability, Latency, Scalability, Partition.
Machine Learning

ML Plot Suite

Training Curves

Epoch-wise train/validation loss with an overfitting marker.

Train loss falls steadily from 1.10 (epoch 1) to 0.22 (epoch 15). Validation loss bottoms out at 0.47 around epoch 10, then climbs back to 0.71 by epoch 15; the chart marks this point as "Overfit starts". Axes: Epoch vs. Loss. Series: Train Loss, Validation Loss.
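The overfitting point can be located programmatically from the validation series shown in the chart; a minimal sketch (the patience value is an assumption, not something the chart specifies):

```python
# Validation losses per epoch, read off the chart above.
val_losses = [1.24, 1.00, 0.84, 0.71, 0.63, 0.57, 0.53,
              0.50, 0.48, 0.47, 0.49, 0.53, 0.58, 0.64, 0.71]

def best_epoch(losses):
    """1-indexed epoch with the lowest validation loss."""
    return min(range(len(losses)), key=lambda i: losses[i]) + 1

def early_stop_epoch(losses, patience=3):
    """Epoch at which training would stop: after `patience` consecutive
    epochs without improvement over the best loss seen so far."""
    best, bad = float("inf"), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                return epoch
    return len(losses)
```

On this series the minimum is at epoch 10, matching the chart's overfitting marker.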

Embedding Projection

Cluster separation in 2D projection.

Class A points: (-1.8, 1.5), (-1.2, 1.1), (-1.4, 0.7). Class B points: (1.1, -1.2), (1.6, -1.5), (1.9, -0.8). Axes: PC1 vs. PC2.
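A nearest-centroid rule cleanly separates the two clusters in this projection; a minimal sketch using the six plotted points (helper names are illustrative):

```python
class_a = [(-1.8, 1.5), (-1.2, 1.1), (-1.4, 0.7)]
class_b = [(1.1, -1.2), (1.6, -1.5), (1.9, -0.8)]

def centroid(points):
    """Mean position of a point cloud."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest_class(p):
    """Assign p to whichever class centroid is closer (squared distance)."""
    def d2(q, c):
        return (q[0] - c[0]) ** 2 + (q[1] - c[1]) ** 2
    ca, cb = centroid(class_a), centroid(class_b)
    return "A" if d2(p, ca) < d2(p, cb) else "B"
```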

Confusion Matrix

Normalized class-level classification accuracy.

       Cat   Dog   Bird
Cat    0.92  0.07  0.01
Dog    0.08  0.85  0.07
Bird   0.03  0.11  0.86
Color scale: Low to High.
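Because the matrix is row-normalized, per-class recall can be read off the diagonal; a minimal sketch computing it and the balanced accuracy (names are our own):

```python
# Row-normalized confusion matrix from the table above.
conf = {
    "Cat":  {"Cat": 0.92, "Dog": 0.07, "Bird": 0.01},
    "Dog":  {"Cat": 0.08, "Dog": 0.85, "Bird": 0.07},
    "Bird": {"Cat": 0.03, "Dog": 0.11, "Bird": 0.86},
}

def recall(cls):
    """With row normalization, the diagonal entry is per-class recall."""
    return conf[cls][cls]

# Balanced accuracy: unweighted mean of per-class recalls.
balanced_accuracy = sum(recall(c) for c in conf) / len(conf)
```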
Computation

Algorithms & Code

Mini-batch Gradient Descent

Input: Dataset D, model parameters θ, learning rate η
Output: Updated parameters θ*
Complexity: O(E * |D| * fwd/bwd)
1. Initialize: initialize parameters θ and optimizer state. (Random init or pretrained)
2. Forward: for each mini-batch, compute predictions and loss. (Cross-entropy or MSE)
3. Backward: backpropagate gradients with respect to θ. (Autodiff graph)
4. Update: apply optimizer update rule using η and gradients. (SGD/Adam step)
5. Repeat: run for E epochs and monitor validation metrics. (Early stopping optional)
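The five steps can be sketched end to end for a toy 1-D linear model. This is an illustrative pure-Python version with hand-derived MSE gradients and a plain SGD update, not the project's implementation:

```python
import random

def minibatch_gd(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Mini-batch gradient descent for y = w*x + b under MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                          # 1. Initialize
    for _ in range(epochs):                  # 5. Repeat for E epochs
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            gw = gb = 0.0
            for x, y in batch:
                err = (w * x + b) - y        # 2. Forward: prediction error
                gw += 2 * err * x / len(batch)   # 3. Backward: d(MSE)/dw
                gb += 2 * err / len(batch)       #              d(MSE)/db
            w -= lr * gw                     # 4. Update: plain SGD step
            b -= lr * gb
    return w, b

# Noiseless data from y = 2x + 1 on x in [-4, 4].
data = [(x * 0.5, 2.0 * (x * 0.5) + 1.0) for x in range(-8, 9)]
w, b = minibatch_gd(data)
```

On this noiseless dataset the learned (w, b) converges to (2, 1).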

Training Step (PyTorch)

python
train_step.py
def train_step(model, batch, optimizer, criterion):
    model.train()
    x, y = batch
    optimizer.zero_grad()
    logits = model(x)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()
    return loss.item()
Learning Loop

Exercise Blocks

Bias-Variance Check

medium
Training loss keeps decreasing, but validation loss starts increasing after epoch 12. What is happening and what should you change first?
Deep Learning Internals

Tensor Shapes


Layer | Operation | Shape | Note
Input Tokens | Embedding Lookup | [B, T, d_model] | Token + position embedding
Self-Attention | QK^T / sqrt(d_k) | [B, H, T, T] | Attention scores
Context | softmax(scores) · V | [B, T, d_model] | -
MLP | Linear -> GELU -> Linear | [B, T, d_model] | -
Logits | Projection to vocab | [B, T, V] | Pre-softmax output
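The shapes in the table can be checked mechanically; a minimal sketch with illustrative sizes (B, T, d_model, H, V here are assumptions, not values from the table):

```python
# Illustrative sizes: batch 2, sequence 8, model dim 64, 4 heads, vocab 1000.
B, T, d_model, H, V = 2, 8, 64, 4, 1000

def attn_score_shape(x_shape, n_heads):
    """Shape of the QK^T attention scores for an input of [B, T, d_model]."""
    b, t, d = x_shape
    assert d % n_heads == 0, "d_model must divide evenly across heads"
    return (b, n_heads, t, t)

embeddings = (B, T, d_model)              # embedding lookup
scores = attn_score_shape(embeddings, H)  # [B, H, T, T]
context = (B, T, d_model)                 # softmax(scores) @ V, heads merged
logits = (B, T, V)                        # projection to vocabulary
```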
Architecture

Model Diagrams

Encoder-Decoder Overview

Simple sequence-to-sequence model with cross-attention bridge.

Flow: Input tokens → Encoder 1 → Encoder 2 → hidden states (K, V) → Cross Attention → context → Decoder → logits → Output tokens, with a skip connection.
Infrastructure

System Architecture

Declarative diagrams for distributed systems and cloud topologies.

3-Tier Distributed Web Architecture

The Load Balancer routes HTTP requests to Server A and Server B; both servers issue SQL queries to the Primary DB.
Communication

Sequence Diagrams

Visualize message passing and distributed protocols over time.

Raft Leader Election (Successful)

Participants: Node A (Candidate), Node B, Node C. Node A sends RequestVote(T=1) to Nodes B and C; both reply VoteGranted; Node A, now leader, sends Heartbeat (Leader) to Nodes B and C.
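The election outcome in the diagram follows from Raft's strict-majority rule; a minimal sketch of the vote count (function name is our own):

```python
def wins_election(votes_granted, cluster_size):
    """A candidate becomes leader with a strict majority of the cluster,
    counting the vote it casts for itself."""
    return votes_granted + 1 > cluster_size // 2

# In the diagram: Node A receives VoteGranted from B and C in a 3-node cluster,
# so 2 granted votes + its own vote form a majority and it starts heartbeating.
```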
Structured Data

Math Tables

Operator | Meaning | LaTeX
∇ | Gradient / Nabla | \nabla
Σ | Summation | \sum
∏ | Product | \prod
∂ | Partial Derivative | \partial