The Bitter Lesson

Richard Sutton’s 2019 essay “The Bitter Lesson” makes a simple argument: in AI, more compute beats clever engineering. Every time.

The pattern keeps repeating. Researchers spend years encoding human knowledge into systems - handcrafted rules, expert heuristics, domain-specific tricks. Then someone comes along with a dumber approach that just throws more computation at the problem, and it wins.

Chess: decades of chess knowledge encoded into engines, then Deep Blue won with massive search, and AlphaZero later swept past everything with self-play learning plus search - no opening books, no hand-tuned evaluation functions. Computer vision: years of careful feature engineering (edges, SIFT, HOG), then CNNs trained on massive datasets made it all obsolete. Speech recognition, machine translation, game playing - same story.

It’s “bitter” because it feels wrong. We want to believe that human insight - deep understanding of the problem - should matter. But seventy years of evidence say otherwise: general-purpose learning methods that scale with compute keep beating specialized approaches.

Why? Compute gets exponentially cheaper over time, while human expertise doesn’t scale. Sutton points to the two methods that scale arbitrarily with computation - search and learning - and it turns out that learning from enough data finds patterns we would never have thought to encode by hand.
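To see the economics, here’s a toy back-of-the-envelope sketch. Every number in it is a made-up assumption (the scores, the gain per doubling, the two-year doubling time); the only point is the shape of the curves - a flat line eventually loses to anything riding an exponential.

```python
# Toy model of the bet Sutton describes.
# Every number below is a made-up assumption, chosen only for illustration.

HANDCRAFTED = 80.0        # hypothetical benchmark score of a hand-tuned system (flat)
BASELINE = 40.0           # hypothetical day-one score of a general learning method
GAIN_PER_DOUBLING = 6.0   # assumed score gained per doubling of affordable compute
DOUBLING_YEARS = 2.0      # assumed compute-per-dollar doubling time

def scaled_score(years: float) -> float:
    """Score of a method that improves with compute, which grows exponentially."""
    doublings = years / DOUBLING_YEARS
    return BASELINE + GAIN_PER_DOUBLING * doublings

for year in range(0, 21, 4):
    s = scaled_score(year)
    marker = "  <- general method ahead" if s > HANDCRAFTED else ""
    print(f"year {year:2d}: hand-tuned {HANDCRAFTED:.0f}, scaled {s:5.1f}{marker}")
```

With these particular numbers the crossover lands around year 14; change the assumptions and the date moves, but the crossover itself doesn’t go away.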

The implication: bet on approaches that improve with more compute and data, not on clever tricks that don’t scale. This is why AI research has moved toward massive models, self-supervised learning, and the current race to build bigger training clusters.

Whether you find this liberating or depressing probably depends on how much you enjoy hand-tuning things.