Programming with an AI agent is the next rung on the high-level language ladder. Machine code, then assembly, then FORTRAN, then C, then Python/JavaScript, then this. The new tier is very high level. With a real harness around it, the code that comes out is better than what you would have hand-written, and it ships faster. People who say AI produces slop are running the new compiler with no harness. That is a skill issue, not a tool problem.
The history of programming is the history of climbing. Punch cards and raw binary gave way to assembly in 1949, which gave way to FORTRAN in 1957 when John Backus’s team at IBM proved an optimizing compiler could match hand-tuned code. Grace Hopper’s A-0 and Flow-Matic became COBOL, the first time source code read like English. Dennis Ritchie’s C made portability cheap in 1972. Python and JavaScript made garbage collection and dynamic typing the default. Each rung was attacked as “not real programming” and each was vindicated by the economics. The agent is the next rung. The attacks are right on schedule.
At very high level, your source code is English and Markdown. Your compiler is Claude or Codex. Your target is whatever the platform needs — TypeScript, Rust, SQL, WGSL, protobuf. The translation step that used to be your job is now the compiler’s job, the same way register allocation stopped being your job when C arrived. The interesting work moves up. The mechanical work moves down. The ratio of intent to artifact gets steeper at every rung, and at this one it is about a hundred to one.
Without a harness, the new compiler emits slop. With one, it emits code better
than humans normally write. The harness is an
AGENTS.md at the repo root, technical and product
specs it links to, an issues/ folder where work happens, a test suite the
agent runs after every change, and a second AI — Codex if you’re driving
Claude, Claude if you’re driving Codex — that reviews the output at every
meaningful checkpoint. That is the syntax of very high level programming.
GitHub’s spec-kit shipped exactly this thesis as an open
standard
and crossed 90,000 stars in eight months. The industry is already converging.
Inside each issue, work is a sequence of experiments, not a task list. An experiment is designed to fail; you log Pass, Partial, or Fail, and you log the conclusion. The failures are the point. A logged dead end is a dead end no future agent will ever walk down again — including future you. Plans hide the search. Experiments record it. That is what makes the methodology converge reliably instead of vibing toward whatever the model felt like producing this session. I wrote the mechanics up in Issues and Experiments and the philosophy in Research Driven Development.
Here is the receipt. Since 2025-10 I have not typed a line of code by hand.
Everything that has shipped under my name in that window — TermSurf, a
terminal-native browser,
Shannon, a shell,
KeyPears, a password manager, the
EarthBucks blockchain, Compucha, this blog post —
was written by an agent under my direction, reviewed by a second agent,
tested, and merged. Click into the issues/ folder of any of those repos.
You will find 725+ closed issue READMEs, each one a small lab notebook with
experiments labeled Pass, Partial, or Fail. That is the engineering history,
not a sanitized summary. It is open. Audit it.
The strongest evidence that the harness works is what it makes irrational. If you find a bug in agent-written code, do not hand-edit. Update the spec and run another experiment. The agent is faster than you, follows instructions exactly, and the test harness plus second-agent review catches what slips through. Changing the source is higher leverage than changing the output. This is the same logic that made it irrational to hand-tune compiler output the moment optimizing compilers got good enough — and they got good enough sixty years ago. The output of a very-high-level compiler is exhaust. The spec is the program.
The reason “AI slop” is a real complaint is that most people are running this
compiler with no harness. No AGENTS.md, no specs, no issue folder, no second
agent, no tests. They prompt, they paste, they ship. Of course it rots. The
New Stack documented the open-source maintenance crisis this is
creating and they are right
about the symptom. They are wrong about the cause. The cause is context
engineering — knowing what to put in the harness so the compiler emits code
worth shipping. It is a learnable skill. People who have learned it produce
better and faster code than they ever produced by hand. People who have not
call the output slop and blame the model. The model is fine. Climb.
Program higher.