Compression is Comprehension
There’s a moment when recursion clicks. You stop tracing the stack in your head - call, call, call, return, return, return - and start seeing the shape: “the answer for n is built from the answer for n-1.” You’re not holding more information. You’re holding less. But it covers more.
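In code, the shape is tiny: a base case plus one self-referential line. A minimal sketch (the function name is just for illustration):

```python
def total(nums):
    # Base case: the sum of nothing is 0.
    if not nums:
        return 0
    # The answer for n items is built from the answer for n-1 items.
    return nums[0] + total(nums[1:])
```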
That’s compression. And compression, it turns out, is what understanding actually is.
The chess experiment
In 1973, Chase and Simon ran an experiment with chess players. They’d show a board position for about five seconds, then ask players to reconstruct it from memory. Experts crushed novices - no surprise there.
But here’s the interesting part: when pieces were placed randomly instead of from real games, experts and novices performed almost identically.
Experts don’t have better memory hardware. They have better compression. Where a novice sees 25 individual pieces, an expert sees 4-5 meaningful chunks: “Sicilian Defense setup,” “kingside attack formation.” Each chunk is a compressed representation of multiple pieces in specific relationships.
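In data terms, the difference looks roughly like this - a loose sketch, not a claim about the actual mental representation:

```python
# Novice encoding: every piece is a separate fact to hold.
novice = ["white pawn e4", "white knight f3",
          "black pawn c5", "black pawn d6"]  # ...and 21 more entries

# Expert encoding: a named chunk that expands to pieces in
# specific relationships.
chunks = {
    "Sicilian Defense setup": ["white pawn e4", "white knight f3",
                               "black pawn c5", "black pawn d6"],
}
expert = ["Sicilian Defense setup"]  # one label, many pieces
```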
The expertise isn’t in the remembering. It’s in the encoding.
Compression as a theory of understanding
Gregory Chaitin, who helped develop algorithmic information theory, put it directly:
“Comprehension is compression; a useful theory is a compression of the data.”
And more pointedly:
“To me, you understand something only if you can program it.”
This isn’t metaphor. Kolmogorov complexity - the field Chaitin worked in - measures the “intrinsic” complexity of something as the length of the shortest program that produces it. If you can describe something with a short program, it has structure, patterns, regularity. If you can’t compress it - if the shortest description is as long as the thing itself - it’s essentially random.
The shortest program isn’t just a file size optimization. It’s the deepest possible understanding of the structure. Finding that program means finding the underlying pattern that generates the surface phenomena.
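Kolmogorov complexity itself is uncomputable, but any off-the-shelf compressor gives you an upper bound, which is enough to feel the idea. A rough sketch using Python’s zlib:

```python
import os
import zlib

patterned = b"abcabcabc" * 1000  # 9,000 bytes of pure regularity
random_ish = os.urandom(9000)    # 9,000 bytes with no structure to find

# zlib is a crude stand-in for "shortest program" - it can only
# upper-bound the true complexity, never compute it.
print(len(zlib.compress(patterned)))   # a few dozen bytes
print(len(zlib.compress(random_ish)))  # roughly 9,000 - incompressible
```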
Newton’s laws are a compression. Before Newton: thousands of observations about falling apples, orbiting moons, rolling balls, flying cannonballs. After Newton: three laws of motion plus universal gravitation - F = ma and F = Gm₁m₂/r². All those observations are derivable from a tiny kernel of theory. You “understand” gravity when you can compress all gravitational phenomena into those equations.
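You can check that compression in a few lines (a sketch with rounded textbook constants): the same inverse-square law has to cover both the apple at the Earth’s surface and the Moon sixty Earth radii away.

```python
import math

g = 9.8                      # apple's acceleration at the surface, m/s^2
earth_radius = 6.37e6        # m
moon_distance = 3.84e8       # m, about 60 Earth radii
moon_period = 27.32 * 86400  # sidereal month, s

# Inverse-square law: acceleration scales as 1/r^2.
predicted = g * (earth_radius / moon_distance) ** 2

# Independent measurement: the Moon's centripetal acceleration, 4*pi^2*r / T^2.
observed = 4 * math.pi**2 * moon_distance / moon_period**2

print(f"{predicted:.5f}")  # ~0.00270 m/s^2
print(f"{observed:.5f}")   # ~0.00272 m/s^2 - one law, two phenomena
```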
The ladder you can’t see
Paul Graham wrote about the “Blub paradox” - Blub being a hypothetical middle-of-the-road programming language:
“As long as our hypothetical Blub programmer is looking down the power continuum, he knows he’s looking down… But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn’t realize he’s looking up. What he sees are merely weird languages.”
You can always see the compressions below your level. Assembly is obviously more verbose than C. C is obviously more verbose than Python. Looking down, you see clearly what those lower levels are missing.
But looking up, the next level of compression just looks like “weird unnecessary stuff.” Lisp macros? “Why would I need programs that write programs?” Haskell’s type system? “Seems like a lot of ceremony.” The abstraction you haven’t internalized yet looks like overkill - until it clicks.
The Blub programmer can’t see what they’re missing because their mental model can’t accommodate the compression. You can only see the full landscape from the top.
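You can feel the continuum in miniature with any language - here’s a sketch in Python rather than Blub, the same computation at three rungs:

```python
nums = [3, 1, 4, 1, 5]

# Rung 1: manual indexing - every moving part exposed.
total = 0
i = 0
while i < len(nums):
    total += nums[i]
    i += 1

# Rung 2: the index bookkeeping compressed into iteration.
total = 0
for n in nums:
    total += n

# Rung 3: the whole pattern compressed into one word.
total = sum(nums)
```

Looking down from rung 3, the while loop is obviously verbose. Looking up from rung 1, sum() once looked like magic you didn’t need.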
Precision, not vagueness
There’s a common misconception that abstraction means being vague - hand-wavy, imprecise, hiding the details. Dijkstra corrected this:
“The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.”
Abstraction doesn’t make things fuzzier. It creates a new plane where you can be exact about things that would be impossible to pin down at lower levels. You can’t precisely describe “user authentication” in terms of register operations. But at the right level of abstraction, you can be very precise about what it means.
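For instance, at the right semantic level you can state the contract of authentication exactly - a hedged sketch, with names invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class Session:
    user_id: str
    expires_at: float  # Unix timestamp

class Authenticator(Protocol):
    def authenticate(self, username: str, password: str) -> Optional[Session]:
        """Return a Session on success, None on bad credentials."""
        ...
```

Nothing about registers, yet nothing vague: every caller knows exactly what success and failure look like.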
Knuth put the skill this way:
“The essence of computer science is an ability to understand many levels of abstraction simultaneously.”
It’s not about living at one level. It’s about moving between levels fluidly.
The wrong compression
Compression can go wrong. Sandi Metz’s warning:
“Duplication is far cheaper than the wrong abstraction.”
You can compress to the wrong pattern. Someone sees similarity, extracts an abstraction, and now you have a compression that doesn’t actually capture the underlying structure. It’s like a lossy compression that throws away the wrong bits. Over time, edge cases accumulate, conditionals multiply, and the “abstraction” becomes harder to understand than the duplication it replaced.
The wrong compression is worse than no compression. At least with duplication, the code says what it does. A bad abstraction obscures while pretending to clarify.
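The failure mode in miniature (a hypothetical sketch): two similar-looking reports get merged into one “general” function, and every new difference arrives as another flag.

```python
# The "compression": one function for every report. Each new caller
# adds a parameter, and the conditionals multiply.
def render(data, summary=False, include_tax=False, legacy_order=False):
    rows = data[:3] if summary else list(data)
    if include_tax:
        rows = [amount * 1.2 for amount in rows]
    if legacy_order:
        rows = list(reversed(rows))  # one more branch per new caller
    return rows

# The duplication it replaced at least said what it did:
def render_invoice(data):
    return [amount * 1.2 for amount in data]

def render_summary(data):
    return list(data)[:3]
```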
The limit
Here’s something beautiful and a little unsettling: Chaitin proved that some mathematical truths are “irreducible.” They’re true, but there’s no proof simpler than the statement itself. They’re true for no reason - true by accident.
“There is no concise theory: it has to be comprehended or apprehended as a thing in itself.”
Not everything compresses. Some things resist understanding in this sense. There are facts that have no shorter description, no underlying pattern, no deeper explanation.
Understanding has limits. But within those limits, compression is the game. When something clicks - when recursion makes sense, when a design pattern suddenly feels obvious, when a codebase that seemed chaotic reveals its structure - you’ve found a shorter description. You’ve compressed.
That’s what understanding feels like. Because that’s what understanding is.