Building an LLM
From Scratch
A quest to understand Transformers by building one. Trained on poetry, powered by PyTorch, and built from the ground up.
The Dad Question
Why torture myself with backpropagation?
One evening, my dad casually asked: "So, what exactly does this LLM thing do?"
I panicked. I threw out buzzwords—"transformers", "attention", "tokenization". I realized if I couldn't explain it simply, I didn't understand it.
So I decided to build one. Not fine-tune GPT-4. Not call an API. But build a Transformer from empty Python files. To peek under the hood, sweat through the math, and finally say: "Dad, I got this."
Why Scratch?
"You don’t just want wheels — you want monster truck wheels."
The Curriculum
Raising a baby poet on a diet of 223k characters.
We used Character-Level Tokenization. Every letter, space, and punctuation mark gets a unique ID. Simple, effective, and perfect for learning the "atoms" of language.
Tokenization Logic
Dataset Size
Tiny but Mighty
# Build the vocabulary: every unique character in the corpus gets an integer ID.
chars = sorted(list(set(text)))
stoi = { ch:i for i,ch in enumerate(chars) }  # character -> ID
itos = { i:ch for i,ch in enumerate(chars) }  # ID -> character
encode = lambda s: [stoi[c] for c in s]           # string -> list of IDs
decode = lambda l: ''.join([itos[i] for i in l])  # list of IDs -> string
The Brain
Evolution from Bigram to Transformer.
V1
Bigram Model
Looks at one character to guess the next. No context, just probability tables. Result: Total gibberish.
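As a sketch of what V1 boils down to in PyTorch (class and variable names here are illustrative, not the project's exact code): the entire "model" is one lookup table of next-character scores.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BigramLanguageModel(nn.Module):
    """Each character ID indexes a row of scores for whatever character comes next."""
    def __init__(self, vocab_size):
        super().__init__()
        # The embedding table *is* the probability table: row i holds the
        # unnormalized scores for the character that follows character i.
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)  # (batch, time, vocab_size)
        if targets is None:
            return logits, None
        B, T, C = logits.shape
        loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss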
V2
Self-Attention
Tokens start "talking" to each other. Queries, Keys, and Values allow the model to focus on relevant past information.
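Roughly what one causal self-attention head looks like in PyTorch; n_embd, head_size, and block_size are the usual hyperparameter names, assumed here rather than copied from the project.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One head of causal self-attention: each token queries the tokens before it."""
    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so a token can only attend to the past, never the future.
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)    # (B, T, head_size)
        q = self.query(x)  # (B, T, head_size)
        # Affinities between every pair of positions, scaled so the softmax stays well-behaved.
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5   # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)
        v = self.value(x)  # (B, T, head_size)
        return wei @ v     # weighted sum of past values, (B, T, head_size)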
V3
Multi-Head Attention
Multiple attention heads run in parallel, capturing different nuances of the text simultaneously.
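Building on the Head sketch above (same imports assumed), multi-head attention is just several heads run side by side, with their outputs concatenated and projected back to the embedding size.
class MultiHeadAttention(nn.Module):
    """Several attention heads in parallel, each free to track a different pattern."""
    def __init__(self, n_embd, num_heads, head_size, block_size):
        super().__init__()
        self.heads = nn.ModuleList(
            [Head(n_embd, head_size, block_size) for _ in range(num_heads)]
        )
        self.proj = nn.Linear(num_heads * head_size, n_embd)

    def forward(self, x):
        # Every head attends to the same sequence but learns its own queries/keys/values.
        out = torch.cat([h(x) for h in self.heads], dim=-1)  # (B, T, num_heads * head_size)
        return self.proj(out)                                # back to (B, T, n_embd)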
The Output
From gibberish to semi-coherent poetry.
Loss Curve
The loss started at ~4.5, the level of pure random guessing, and after 2000 iterations dropped to ~2.0, showing the model had learned real structure and syntax.
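Why ~4.5 is the "random guessing" baseline: with a character vocabulary of size V, guessing uniformly gives a cross-entropy of ln(V). The exact vocabulary size isn't listed here, but assuming roughly 90 distinct characters (plausible for English poetry plus punctuation), the numbers line up:
import math

vocab_size = 90  # assumption: roughly the number of distinct characters in the corpus
print(math.log(vocab_size))  # ~4.50, the expected loss when every guess is uniform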
Structure Emergence
The model learned to form words, use punctuation, and mimic the stanza structure of poems, even if the meaning is abstract.
Early output (V1 Bigram):
Eum[boRFXHR)!Nk'Gwqw;ei(3sVtCLU...
Final output (V3 Transformer):
To the bitnessabs, I witcap in syecrietss to siow,
there were.
A whake I pack in chicad Shateing...
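For reference, samples like these come from autoregressive generation: the model predicts a distribution over the next character, we draw one, append it to the context, and repeat. A minimal sketch, assuming the imports and the (logits, loss) forward signature from the sketches above:
@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    # idx is a (batch, time) tensor of character IDs; grow it one character at a time.
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]              # crop to the context window
        logits, _ = model(idx_cond)                  # (B, T, vocab_size)
        logits = logits[:, -1, :]                    # only the last position predicts the next char
        probs = F.softmax(logits, dim=-1)            # logits -> probabilities
        idx_next = torch.multinomial(probs, num_samples=1)  # sample one character ID
        idx = torch.cat((idx, idx_next), dim=1)      # append and keep going
    return idx

# Example use (names hypothetical): decode(generate(model, torch.zeros((1, 1), dtype=torch.long), 500, block_size)[0].tolist())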