The Harness Problem
Seymour Cray liked to ask:
If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?
Picture it. A thousand chickens harnessed to a plow, pulling in different directions, tangling each other’s lines. The infrastructure to coordinate them would weigh more than the plow. Cray built the fastest supercomputers on earth for two decades, and this was his case against parallel computing.
He was wrong, mostly. The chickens won.
GPUs are thousands of tiny cores. MapReduce split data processing across commodity machines. Every top supercomputer today is massively parallel. Cray himself seemed to realize this - in 1996, he founded a company to build a massively parallel machine.
But the chickens didn’t win by being chickens. They won because someone solved the harness problem first.
Removing the harness
A GPU works because matrix multiplication decomposes cleanly. Each core multiplies its slice, the results combine at the end, and the cores barely talk to each other along the way. Almost no coordination, almost no synchronization until the final combine. Each chicken gets its own piece of the field.
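The decomposition is easy to see in miniature. This sketch (not GPU code, just the same shape of parallelism) splits A's rows into independent slices; `matmul_rows` and `parallel_matmul` are illustrative names, not a real library API:

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(a_rows, b):
    """Multiply a horizontal slice of A by all of B."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a_rows]

def parallel_matmul(a, b, workers=4):
    # Split A's rows into independent chunks: each worker owns its slice
    # and never needs to see another worker's slice.
    chunk = max(1, len(a) // workers)
    slices = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(matmul_rows, slices, [b] * len(slices))
    # The only "harness" is this final concatenation.
    return [row for part in parts for row in part]
```

The workers share nothing and wait on nothing; the coordination cost is one concatenation at the end.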
MapReduce took inspiration from map and reduce in functional programming and made them the core abstractions. If your problem fits that shape, you get massive parallelism for free. Forcing your problem into that shape was the hard part.
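The shape is small enough to sketch in full. This is a toy single-machine version, assuming nothing beyond the standard library; the point is that the map phase and the per-key reduce phase each parallelize trivially, and only the shuffle in between touches shared state:

```python
from collections import defaultdict
from itertools import chain

def mapreduce(records, mapper, reducer):
    # Map phase: each record is processed independently - this is the
    # part that parallelizes for free, since mappers never talk.
    mapped = chain.from_iterable(mapper(r) for r in records)
    # Shuffle: group intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: each key's group is independent of every other key's.
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical example: word count.
def map_words(line):
    return [(word, 1) for word in line.split()]

def reduce_counts(word, counts):
    return sum(counts)
```

Running `mapreduce(["a b a", "b c"], map_words, reduce_counts)` yields `{"a": 2, "b": 2, "c": 1}`. Everything hard about your problem has to be squeezed into `mapper` and `reducer`.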
Chess is the more interesting case. Parallelizing traditional chess engines hits diminishing returns. Searching a game tree has sequential dependencies - you need the result of one branch to decide whether to skip another. You can’t just throw more cores at it. DeepMind’s AlphaZero reformulated the problem entirely. Instead of searching deeper, train a neural network through self-play to evaluate positions and guide a shallower search. Thousands of games can run in parallel during training, each one independent. The problem changed shape, and the harness got radically lighter.
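Why the reformulated problem parallelizes is easy to see in a schematic sketch. This is not AlphaZero; `play_game` is a hypothetical stand-in for one self-play game, and the "positions" are placeholders. The structural point survives the simplification: each game depends only on a frozen copy of the network, never on another game.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def play_game(seed):
    """Stand-in for one self-play game, returning (positions, outcome).
    A real engine would use the current network to pick moves; what
    matters here is that the game reads a frozen network and nothing
    else, so games share no state."""
    rng = random.Random(seed)
    positions = [rng.random() for _ in range(10)]  # placeholder positions
    outcome = rng.choice([-1, 0, 1])               # loss / draw / win
    return positions, outcome

def generate_training_data(num_games, workers=8):
    # Games share nothing, so they parallelize with no harness at all.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        games = list(pool.map(play_game, range(num_games)))
    # Label every position in a game with that game's final outcome.
    return [(pos, outcome) for positions, outcome in games
            for pos in positions]
```

The sequential dependency hasn’t vanished - it has moved into the training loop, where one network update follows the last. But the expensive part, generating games, now has the embarrassingly parallel shape.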
In every case, the breakthrough wasn’t more chickens. It was someone figuring out how to remove the harness entirely.
When the harness is the work
This doesn’t always work. Some problems resist decomposition - not because we haven’t found the trick yet, but because the coordination is the problem.
Distributed consensus is the clearest example. Nodes need to agree on the order of transactions. That agreement is the whole point. You can make it more efficient - better algorithms, fewer round trips - but you can’t give each node its own independent piece, because the relationship between them is the work.
The coordination overhead in these cases isn’t an engineering failure. It’s a property of the problem. Add more chickens and you don’t get more plowing done - you get more tangled harnesses. Past a certain point, it actually gets worse, not just slower to improve.
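You can even put a rough number on the tangle. A deliberately simplified model, assuming a single leader proposing one value and waiting for a majority of acknowledgements (real protocols like Paxos or Raft need more rounds than this):

```python
def quorum_messages(n):
    """Messages for one round of single-decree majority agreement:
    the leader sends a proposal to all n nodes and must hear back
    from a majority before deciding. A toy model - real consensus
    protocols pay more, not less."""
    quorum = n // 2 + 1
    # n proposals out, at least a quorum of acks back.
    return n + quorum
```

Three nodes cost 5 messages per decision; 101 nodes cost 152. Adding nodes adds messages without adding any independent work, because there is no independent work to hand out - the agreement itself is the job.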
Most real problems live somewhere between embarrassingly parallel and irreducibly coordinated. The useful question is whether you can reshape yours so the harness gets lighter.
The bitter part
This is what Rich Sutton’s bitter lesson is really about. Scale always beats cleverness - but only after someone finds a formulation where scale could win. That decomposition is oxen work. And once it exists, the chickens take over.
The oxen plow the first row. The chickens plow the rest. And eventually, nobody remembers the oxen were there.