I don’t solve software issues anymore. I specify them, type /goal solve issue 801, and walk away — the agent designs experiments, writes the code, and runs
the tests while a second agent reviews its work adversarially at every step.
This is not a demo or a prediction. It is how I shipped real features this past
week, and it means we have crossed into the automated software development age.
My job is now scaffolding, not coding. I keep a folder called issues/,
numbered from 1 to infinity, and every issue is a self-contained problem
statement. A clear description of the goal. Explicit verification criteria for
what “solved” actually means. And guardrails: which checks must pass (unit tests,
linter, typechecker), plus the code patterns and best practices the solution
must follow. Once that is written down, in the issue itself or in AGENTS.md, my
work is essentially done. I am not the one who solves the issue. I am the one who
defines it well enough that solving it becomes mechanical.
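One way to picture such an issue is as a small record with three required parts. The sketch below is illustrative only: the `Issue` class, its field names, and the readiness check are all assumptions, not the actual format of the files in issues/ or of AGENTS.md.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical in-memory shape of one issue; every name here is an assumption."""

    number: int
    goal: str
    verification: list[str] = field(default_factory=list)
    guardrails: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of the required sections that are still empty."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.verification:
            missing.append("verification")
        if not self.guardrails:
            missing.append("guardrails")
        return missing

    def is_ready(self) -> bool:
        """An issue is ready to hand off only when nothing is missing."""
        return not self.missing_sections()


# A draft with a goal but no criteria or guardrails is not ready yet.
draft = Issue(number=1, goal="Describe the desired behavior here")
print(draft.is_ready())          # False
print(draft.missing_sections())  # ['verification', 'guardrails']
```

The point of a check like `is_ready` is that the hand-off is refused until "solved" has been defined, which is where my effort goes.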
The agent does not make a plan and march through it. It designs a series of experiments, and each experiment either fully solves the issue, partially solves it, or fails — and failure is fine. A failed experiment is progress, because it means something was poorly understood, and now it isn’t. Every experiment is logged whether it passes or fails, so the record narrows toward the goal with each iteration. I’ve written about this lab-notebook methodology before, as research-driven development and as issues and experiments on the filesystem. What’s new is that I no longer drive the loop by hand.
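The experiment record described above can be sketched as an append-only log with three possible outcomes. This is a minimal illustration, assuming an in-memory list; the `Outcome`, `Experiment`, and `Notebook` names are hypothetical and not taken from any real tool.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SOLVED = "solved"    # fully solves the issue
    PARTIAL = "partial"  # solves part of it
    FAILED = "failed"    # solves nothing, but still teaches something


@dataclass(frozen=True)
class Experiment:
    number: int
    hypothesis: str
    outcome: Outcome
    learned: str


class Notebook:
    """Append-only experiment log; entries are never edited or dropped."""

    def __init__(self) -> None:
        self.entries: list[Experiment] = []

    def record(self, hypothesis: str, outcome: Outcome, learned: str) -> Experiment:
        """Log an experiment regardless of whether it passed."""
        entry = Experiment(len(self.entries) + 1, hypothesis, outcome, learned)
        self.entries.append(entry)
        return entry

    def solved(self) -> bool:
        """True once any experiment has fully solved the issue."""
        return any(e.outcome is Outcome.SOLVED for e in self.entries)

    def lessons(self) -> list[str]:
        """Everything learned so far, including from failed experiments."""
        return [e.learned for e in self.entries]


book = Notebook()
book.record("first idea", Outcome.FAILED, "assumption A was wrong")
book.record("second idea", Outcome.PARTIAL, "half of the cases now pass")
print(book.solved())        # False
print(len(book.lessons()))  # 2
```

Keeping failed entries in `lessons()` is the design choice that matters: the next experiment is designed against everything already ruled out.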
One command runs the loop now. After the description, verification criteria, and
guardrails are all in place, I type /goal solve issue 801 and the agent takes
it from there: design an experiment, get it reviewed, implement it, verify it,
record the result, design the next one, repeat until the verification criteria
pass. This is the difference between a methodology and an automation. Last year I
described using Claude Code to do my entire job by hand, one prompt at a time.
Now the prompts are a single goal, and the iteration is the machine’s. I set the
goal and walk away.
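The loop itself is simple enough to sketch. What follows is not how /goal is implemented, which I have not shown here; it is a minimal illustration of the steps above, with the four stages passed in as plain functions. The function name, the log format, and the `max_experiments` cap are all assumptions.

```python
from typing import Callable


def solve_issue(
    design: Callable[[list[dict]], str],
    review: Callable[[str], bool],
    implement: Callable[[str], str],
    verify: Callable[[str], bool],
    max_experiments: int = 20,  # safety cap; an assumption, not part of the method
) -> list[dict]:
    """Run experiments until verification passes; return the full log."""
    log: list[dict] = []
    for number in range(1, max_experiments + 1):
        plan = design(log)    # the designer sees every earlier result
        if not review(plan):  # no code is written until the design passes review
            log.append({"n": number, "plan": plan, "status": "rejected"})
            continue
        result = implement(plan)
        passed = verify(result)
        log.append({"n": number, "plan": plan, "status": "passed" if passed else "failed"})
        if passed:
            break
    return log


# Stub stages for a dry run: the third attempt is the one that verifies.
log = solve_issue(
    design=lambda history: f"attempt {len(history) + 1}",
    review=lambda plan: True,
    implement=lambda plan: plan.upper(),
    verify=lambda result: result == "ATTEMPT 3",
)
print([entry["status"] for entry in log])  # ['failed', 'failed', 'passed']
```

Rejected designs and failed runs both land in the log, so each new design starts from the complete record.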
The thing that makes this reliable is a second agent trying to tear the work
apart. It is not enough for the AI to write and pass its own tests — a separate
agent performs an adversarial review at the key stages. The experiment design
must get a pass before any code is written, and the result must get a pass
before the next experiment begins. This critical review keeps the agent on track
and free from the errors and AI slop that a single model, marking its own
homework, will happily ship. A different
model helps, but it isn’t strictly required: different context is all you need,
so long as both agents are state-of-the-art. Use a fresh built-in subagent (a
clean context, not a fork) or call out to a non-interactive CLI run such as
claude -p or codex exec. The reviewer just has to come at the work cold.
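A cold reviewer can be as little as a fresh process. The sketch below pipes an artifact to `claude -p` on stdin and reads a verdict back. The prompt wording, the PASS/FAIL first-line convention, and the function names are assumptions for illustration; only the non-interactive `claude -p` invocation comes from the text above.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical instructions; the PASS/FAIL first-line convention is an assumption.
INSTRUCTIONS = (
    "You are an adversarial reviewer. Find every flaw in the {stage} on stdin. "
    "Reply with PASS or FAIL on the first line, then your reasons."
)


def run_cli(argv: Sequence[str], stdin_text: str) -> str:
    """Run a command in a fresh process, so the reviewer shares no context."""
    done = subprocess.run(
        list(argv), input=stdin_text, capture_output=True, text=True, check=True
    )
    return done.stdout


def adversarial_review(
    stage: str,
    artifact: str,
    command: Sequence[str] = ("claude", "-p"),
    runner: Callable[[Sequence[str], str], str] = run_cli,
) -> bool:
    """Return True only if the reviewer's first non-empty line starts with PASS."""
    reply = runner([*command, INSTRUCTIONS.format(stage=stage)], artifact)
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    return bool(lines) and lines[0].upper().startswith("PASS")


# Stub runner for a dry run; a real call would use the default run_cli.
verdict = adversarial_review(
    "experiment design",
    "plan text",
    runner=lambda argv, text: "FAIL\nno test for empty input",
)
print(verdict)  # False
```

The check fails closed: an empty or unparseable reply counts as a rejection, so a broken reviewer cannot wave work through. The same function can gate both stages, once on the experiment design and once on the result.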
Here is the part you can’t dismiss. With this flow I added full PDF support to
TermSurf — a GPU-accelerated Chromium
browser inside the terminal — in about a day. By hand, wiring PDF rendering
through a Chromium Content-API embedder would have cost me months. PDFs now
render inline, scroll, select, and print. And right now, as I write this, /goal
is doing something I would never have attempted alone: it is porting Ghostty from
Zig into Rust, one terminal subsystem at a time. Issue 801 is 127 experiments
deep, nearly all of them passing, with over 300 commits landed in the last few
days — tabstops, page storage, selection, formatters, escape-sequence handlers —
each one designed, reviewed, implemented, tested, and reviewed again, with my
hands off the keyboard. That is a falsifiable claim with a commit log behind it.
This works today because three things converged at once. The models finally crossed the quality bar where their code is better than what most engineers write by hand. The tooling grew goal-level orchestration, so a whole issue can be handed off in one command. And the best practices — tests, explicit verification, adversarial review — matured into a loop that produces output you can actually trust. None of these alone is enough. Together they mean the software development lifecycle, end to end, can now run without me in it. I already stopped writing code. Now I’ve stopped solving issues, too.
Set the goal.