Every feature I ship is built as a sequence of experiments, documented in markdown before, during, and after the code. I call this practice research-driven development. “Issues and experiments” is the shape it takes on the filesystem, and it is the single most important development practice I have adopted in the last year. The markdown is the primary artifact. The code is what happens when an experiment returns a Pass.
An issue is a folder at `issues/{N}-{slug}/` containing a `README.md` with TOML
frontmatter. The frontmatter records `status`, `opened`, and (eventually)
`closed`. The body states the goal, the background, and any constraints. That is
the whole problem statement — no solution, no task list, no estimate. An issue
exists to hold a question that hasn’t been answered yet, and the answering of it
is what the rest of the folder records.
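As a concrete sketch, a minimal issue README might look like the following. The frontmatter fields come from the description above; the `+++` delimiters, the issue number, the slug, the date format, and the body headings are my assumptions, not a prescribed template:

```markdown
+++
status = "open"
opened = 2026-01-10
+++

# Issue 42: Parser drops the trailing chunk

## Goal

Emit the final chunk when a stream ends without a newline.

## Background

The streaming parser buffers input until it sees a newline. Streams that end
mid-line lose their last chunk.

## Constraints

No change to the public parser API.
```

Note what is absent: no solution section, no task list. The experiment sections that follow are where the answering happens.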
An experiment lives inside an issue’s README as a section with the header
`### Experiment N: {title}`. It has four parts: a description of what and why, a
list of files that will change, a verification procedure with concrete pass/fail
criteria, and a result — Pass, Partial, or Fail. The result is
always recorded. The conclusion paragraph says what we learned and what the next
experiment should probably test. That is every experiment in every issue across
every project I run.
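Sketched against the four parts above, one experiment section might read as follows. The bold labels, filenames, and test command are illustrative assumptions; only the header format and the four parts come from the convention itself:

```markdown
### Experiment 1: Flush the buffer on EOF

Treat end-of-stream as an implicit line terminator and flush whatever is
buffered. This should fix the trailing-chunk loss without touching the API.

**Files:** src/parser.rs, tests/stream_test.rs

**Verification:** `cargo test stream_trailing_chunk` passes, and a stream
ending without a newline yields exactly one final chunk.

**Result:** Pass.

Flushing on EOF emitted the trailing chunk with no regressions in the
existing tests. No follow-up experiment is needed for this goal.
```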
This is the part that violates conventional project management. Most methodologies tell you to plan the work: break the story into subtasks, size each one, schedule them. Research-driven development forbids it. You design Experiment 1 only after the issue’s goal is written down clearly. You design Experiment 2 only after Experiment 1 has returned a result. An experiment that fails changes what the next experiment should be; an experiment that passes may eliminate the need for the next experiment entirely. Listing them up front is a category error — you would be planning the search before you’ve started searching.
Every experiment has three possible outcomes, and all three are valuable. Pass advances the issue toward its goal. Partial tells you the approach is closer but incomplete, and usually describes exactly which piece still needs to be solved. Fail eliminates a dead end. A dead end is a result I will never stop needing, because the alternative — not knowing the path is blocked — is strictly worse. The three-outcome model turns the emotional experience of failure into the neutral experience of filing a report.
When an issue is closed, it is never edited again. Wrong turns stay in the record. Typos stay in the record. Experiments that turned out to be based on a misunderstanding of the problem stay in the record, exactly as they were written. This rule is counter-intuitive to anyone trained in “keep documentation up to date,” and it is the most important rule in the whole system. The lab notebook is more valuable when it shows the search, not just the answer. A future reader — human or AI — who can see that I tried three wrong approaches before finding the fourth learns something that the polished version would hide.
Every line of code I have shipped in the last six months went through this
pattern. TermSurf, Shannon, KeyPears, EarthBucks, Compucha — identical folder structure,
identical three outcomes, hundreds of issues apiece open to anyone who wants to
read them. The GitHub repos for TermSurf, Shannon, and KeyPears are the
receipts: click into issues/ in any of them and you are reading the actual
engineering history, not a sanitized summary.
This was not practical when I was writing code by hand. A lab notebook that you hand-maintain is a second full-time job. Research-driven development works because the AI reads and writes the notebook as fluently as it writes the code, and the notebook is how the AI keeps its bearings across a ten-file refactor at 2 a.m. The pattern is a direct consequence of having stopped writing code myself. The two practices are co-dependent. The notebook makes the AI competent, and the AI makes the notebook cheap enough to maintain.
I’ve written about research-driven development twice before — once on
this blog and once on
my day job’s engineering blog at Ironlight.
Those posts covered the philosophy: treat software like a research program,
run sequential experiments, document the failures, let the result of each
experiment determine the next. This post is the mechanics. The issues/
folder, the frontmatter, the three outcomes, the immutability rule — that is
what research-driven development looks like when it hits the filesystem.
Experiment.