Ryan X. Charles

Research Driven Development

March 8, 2026 · Ryan X. Charles

A failed experiment is not a failed plan. A failed experiment is progress—you learned what doesn’t work, which narrows the search for what does. Research Driven Development is a methodology built on this insight: when you’re doing genuinely innovative work, you can’t plan your way forward. You have to research your way forward, one experiment at a time—and an AI agent can help at every stage.

The full flow looks like this: start by writing down your goal. Research the problem space—read docs, search for prior art, understand the constraints. Then, instead of making a plan, design an experiment. Implement it. Record the result—pass, partial, or fail. Write down what you learned. Use that knowledge to design the next experiment. Repeat until the goal is accomplished. The documentation you produce along the way—the goal, the research, the experiments, the failures, the breakthroughs—becomes a permanent record that both humans and AI can reference later. It’s a lab notebook for software.
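As a concrete illustration, an issue file following this flow might look like the sketch below. The title, headings, and wording are all illustrative, not a prescribed format; any consistent structure works.

```markdown
# Issue: <short title>

## Goal
What we want to achieve, not how.

## Research
Links, prior art, and constraints discovered before touching code.

## Experiment 1
Design: what to try, what code changes, how to verify, pass/fail criteria.
Result: pass | partial | fail
Learned: what this result tells us, even (especially) on failure.

## Experiment 2
...
```

The file is append-only: each experiment is added below the last, and earlier entries are never rewritten.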

To understand where RDD fits, it helps to see what came before it.

Write Before You Code

Every serious software methodology agrees on one thing: write before you code. Test-driven development says write the tests first. Spec-driven development says write the spec first. Documentation-driven development says write the user-facing docs first. They’re all correct. Writing forces clarity. Writing is cheaper than coding. Iterating on prose is faster than iterating on implementation.

But all of these methodologies share a hidden assumption: you know what to build. What happens when you don’t?

The Spec Driven Era

Spec-driven development is the current industry darling. GitHub released an open-source spec-kit. AWS built Kiro around it. Martin Fowler wrote about the tooling ecosystem. The pitch is compelling: write a formal specification, feed it to an AI agent, and the agent generates code that conforms to the spec. Code becomes a “transient byproduct” of the specification.

This works. For APIs, CRUD apps, and well-understood features, it works extremely well. Teams report 75% reductions in cycle time because incompatibilities are caught at the spec review stage instead of in production.

But notice the assumption baked into the methodology: you can write a complete and correct spec before you start. The spec is the source of truth. The AI implements from the spec. If the spec is wrong, everything downstream is wrong.

This is fine when you know what you’re building. Most software is like this. You’re adding a REST endpoint. You’re building a settings page. You’re integrating a third-party API. The shape of the solution is known before you start.

But the most interesting software problems are not like this.

The Gap

I’ve been building TermSurf for the past two months. TermSurf is a protocol for embedding full web browsers inside terminal emulators. Type "web google.com" and a browser appears directly in your terminal pane—no window switching, no context loss. The protocol is designed to work with any terminal and any browser engine: on the terminal side, Ghostboard (a Ghostty fork in Zig) and Wezboard (a WezTerm fork in Rust); on the browser side, Chromium first, with WebKit, Gecko, and Ladybird planned.

To make this work, I had to fork Chromium. Chromium has 1.7 million commits and 35 million lines of code. It is one of the largest open-source projects in the world. There is no documentation for how to embed its GPU rendering output into a terminal pane via zero-copy compositing. There is no Stack Overflow answer. There is no prior art to copy from.

One does not simply “fork Chromium.” You cannot write a spec for this. You cannot plan your way through 35 million lines of someone else’s code. Every problem is unprecedented. Every solution is an experiment into unknown territory.

This is the kind of project where traditional methodologies break down completely. A plan has no room for failure. If your plan doesn’t work, you’re behind schedule. You wasted time. Something went wrong.

But when you’re working at this scale of uncertainty, things “go wrong” all the time. And the thing that “went wrong” is often the most valuable thing that happened. You learned something you couldn’t have learned any other way.

TermSurf is always failing. And because it’s always failing, it’s always learning. That’s why I needed a methodology that treats failure as progress rather than a setback. That methodology is Research Driven Development.

Why Documentation, Not Code

Before describing the methodology, there’s a key insight that makes it work: documentation is a better representation of knowledge than code.

Modern AI tools are not just about writing code. They read and write documentation too. This matters more than most people realize, because documentation has a structural advantage over code: it’s linear.

An issue document tells a story from top to bottom. Goal, research, experiment, result, conclusion. One file, one narrative. A human can read it start to finish and understand the entire arc of a problem and its solution in minutes.

The code that implements the same feature is nothing like this. It’s scattered across dozens of files, modules, and layers. A single feature in TermSurf might touch the protobuf protocol definition, the Rust socket handler in Roamium, the Zig rendering pipeline in Ghostboard, and the C++ Chromium Content API patches. Reading the code tells you what the system does right now. It doesn’t tell you why it does it that way, what was tried and rejected, or how we got here.

Documentation captures all of that. And because it’s linear and human-readable, it’s also easy for AI to read and understand. One file with a clear narrative is trivially parseable. Source code scattered across four languages and three repositories is not.

This creates a powerful bidirectional loop. The AI helps you write the documentation as you work—summarizing research, recording experiment results, capturing conclusions. Later, the AI reads that same documentation back in and maps it to the scattered source code files, reconstructing the context that would otherwise be lost. The linear narrative becomes an index into the nonlinear codebase.

This is the foundation that RDD is built on. The lab notebook isn’t overhead. It’s the most valuable artifact you produce.

Research Driven Development

RDD is the scientific method applied to software engineering, with an AI agent as a collaborator at every stage. It emerged from the practical reality of building TermSurf—a project where every problem is unprecedented and every solution must be discovered through experimentation.

Here are the pillars.

Goal, Not Plan. Research Before Action.

Start by writing down what you want to achieve, not how you’ll achieve it. A plan presumes you know the path. A goal acknowledges you don’t.

Then investigate before touching code. Read documentation. Search for prior art. Understand the problem space. Use the AI agent as a research partner—it can survey a landscape faster than you can alone. This phase builds the context that makes experiments meaningful rather than random guessing.

Experiments, Not Tasks

This is the core philosophical shift. The fundamental unit of work in RDD is the experiment, not the task. A task either succeeds or fails. An experiment always succeeds—because even a “failed” result produces knowledge.

Each experiment has a clear structure: what you’re going to try, what code changes are required, how you’ll verify the result, and concrete pass/fail criteria. Then you run it, record what happened, and write what you learned. Only then do you design the next experiment. Never plan multiple experiments ahead—the result of Experiment N determines what Experiment N+1 should be.
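The loop described above can be sketched in code. This is a minimal model for illustration, not a real tool—the type names and function signatures are assumptions, not part of TermSurf or any existing library:

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    design: str          # what to try and how to verify it
    result: str = ""     # "pass", "partial", or "fail"
    learned: str = ""    # knowledge produced, even on failure

@dataclass
class LabNotebook:
    goal: str
    research: list[str] = field(default_factory=list)
    experiments: list[Experiment] = field(default_factory=list)

    def record(self, exp: Experiment) -> None:
        # The log is append-only: past entries are never rewritten.
        self.experiments.append(exp)

def rdd(notebook: LabNotebook, design_next, run) -> LabNotebook:
    """Run experiments one at a time until one passes.
    design_next sees the whole notebook so far, so each experiment
    is informed by every prior result; run executes one experiment
    and fills in its result and what was learned."""
    while True:
        exp = design_next(notebook)   # never planned more than one ahead
        run(exp)
        notebook.record(exp)
        if exp.result == "pass":
            return notebook
```

Note that `design_next` takes the entire notebook, not just the last result: this is the "Experiment N determines Experiment N+1" rule expressed as a dependency.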

This sounds inefficient. It’s the opposite. Sequential experiments with tight feedback loops converge faster than upfront plans because each experiment is informed by everything that came before it.

Document Everything, Edit Nothing. Failure is Progress.

The log is immutable. Record the goal, the research, each experiment’s design, its result, and what you learned. Never go back and rewrite history. The document isn’t a deliverable—it’s a lab notebook. Wrong turns are as valuable as right ones because they narrow the search space for future work.

This is the philosophical core of RDD. In traditional development, failure means you’re behind schedule. In RDD, failure means you’ve eliminated a possibility and learned something concrete. By documenting what doesn’t work, you figure out what does. Failure is not a setback. It’s an accomplishment.

AI as Collaborator at Every Stage

The AI agent isn’t just a code generator. It participates in formulating the goal, conducting research, designing experiments, implementing them, analyzing results, and writing the documentation that ties it all together.

This is what makes RDD practical. The AI compresses the cycle time of each experiment dramatically. What might take a day of manual coding, testing, and analysis can happen in minutes. And because the AI reads and writes documentation as naturally as it reads and writes code, the lab notebook stays current without extra effort.

RDD in Practice

Consider pane borders, a typical problem from TermSurf. Not because pane borders are especially hard, but because at this scale of complexity, every feature requires experimentation.

The goal: add configurable colored borders around split panes in the terminal emulator, with visual distinction between focused and unfocused panes. I had no idea how the rendering engine handled content insets relative to decorative borders. Five experiments later, I did.

Experiment 1: Implement config fields, border rendering, and content inset. Result: Partial. Borders drew correctly, but content overlapped them. Learned: border drawing infrastructure is solid. Content inset approach needs rethinking.

Experiment 2: Fix content overlap by reducing pixel_width. Result: Fail. Compiled cleanly, no visual change. Learned: pixel_width alone does not control content clipping.

Experiment 3: Reduce cell count upstream in the resize logic. Result: Fail. Cell count changed but content still overlapped borders. Learned: this is the wrong abstraction level—it’s a global window property, not a per-pane property.

Experiment 4: Reframe the problem entirely. Apply per-pane pixel insets on all four edges, like a CSS container model. Result: Pass. Content insets work correctly. Borders draw without covering content.

Experiment 5: Hide borders when only one pane is visible. Result: Pass. Borders appear only for multi-pane layouts.
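The geometry behind Experiments 4 and 5 can be sketched abstractly. TermSurf's actual implementation is in Zig inside the rendering pipeline; this Python is only an illustration of the per-pane inset idea, and all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: int       # pixels
    y: int
    width: int
    height: int

def content_rect(pane: Rect, border_px: int, pane_count: int) -> Rect:
    """Inset the pane's content on all four edges, CSS-container style
    (Experiment 4). Borders only apply in multi-pane layouts
    (Experiment 5), so a single pane keeps its full area."""
    inset = border_px if pane_count > 1 else 0
    return Rect(
        x=pane.x + inset,
        y=pane.y + inset,
        width=pane.width - 2 * inset,
        height=pane.height - 2 * inset,
    )
```

With a 2-pixel border and two panes, an 800×600 pane yields a 796×596 content area offset by 2 pixels on each edge; with a single pane, the rect is unchanged. The key shift from Experiments 2 and 3 is that the inset is a property of each pane's rectangle, not of a global window parameter like pixel_width or cell count.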

Three “failures” led directly to the breakthrough in Experiment 4. Each one eliminated an approach that looked plausible on paper. Experiment 2 proved that downstream rendering parameters weren’t the right lever. Experiment 3 proved that global-level adjustments couldn’t solve a per-pane problem. These weren’t wasted effort. They were the path to the solution.

Without the failures, Experiment 4’s insight—reframe as CSS container model, apply insets unconditionally on all sides—would never have emerged. The failures were necessary. This is what every feature in TermSurf looks like. Over 725 issues and counting, each one a sequence of experiments, each one producing knowledge whether the experiments pass or fail.

And because every experiment was documented in a single linear file, I can go back months from now, read the issue, and immediately understand why the code looks the way it does. So can the AI.

The Progression

There’s a clear evolutionary arc in software methodologies, and each generation assumes less upfront knowledge:

Test-Driven Development: You know the tests but not the code. Write the tests first, then implement until they pass.

Spec-Driven Development: You know the spec but not the implementation. Write the spec first, then generate or implement from it.

Research Driven Development: You know the goal but not much else. Write the goal first, then run experiments until you discover the solution.

TDD handles known requirements with unknown implementations. SDD handles known architectures with unknown details. RDD handles known objectives with unknown everything else. As Kinde put it, SDD is “the next step beyond TDD.” RDD is the next step beyond SDD.

These methodologies aren’t mutually exclusive. Use SDD when you know what to build: APIs, CRUD features, well-documented integrations. Use documentation-driven development when you know what the feature looks like to users. Use RDD when you’re doing something genuinely new—and RDD often converges into SDD once your experiments reveal the right approach. The research phase discovers what to build. The spec phase builds it cleanly.

Start Researching

If you’re building anything truly innovative, stop planning and start researching.

Write down your goal. Do the research. Design an experiment. Run it. Record what happened. Learn from it. Design the next experiment. Repeat until the goal is accomplished.

Embrace failure as progress. Document everything. Let the AI help at every stage. The documentation compounds in value over time—for you, for your team, and for every AI agent that touches the codebase after you.

The methodology works whether you know how to solve the problem or not. That’s the point. You don’t need to know the answer before you start. You just need to know the question.

And if the experiment fails? Good. Now you know something you didn’t before.




Copyright © 2026 Ryan X. Charles