GlobalNews.one

Design-First Collaboration

March 3, 2026

AI coding assistants default to generating implementation immediately — embedding design decisions invisibly in the output. I propose a structured conversation pattern that mirrors whiteboarding with a human pair: progressive levels of design alignment before any code, reducing cognitive load and catching misunderstandings at the cheapest possible moment.


Rahul is a Principal Engineer at Thoughtworks, based in Gurgaon, India. He is passionate about the craft of building maintainable software through DDD and Clean Architecture, and explores how AI can help teams achieve engineering excellence.

This article is part of a series: Patterns for Reducing Friction in AI-Assisted Development.

When I pair program with a colleague on something complex, we don't start at the keyboard. We go to the whiteboard. We sketch components, debate data flow, argue about boundaries. We align on what the system needs to do before discussing how to build it. Only after this alignment — sometimes quick, sometimes extended — do we sit down and write code. The whiteboarding is not overhead. It is where the real thinking happens, and it is what makes the subsequent code right. The principle is simple: whiteboard before keyboard.

With AI coding assistants, this principle vanishes entirely. The speed is seductive: describe a feature, receive hundreds of lines of implementation in seconds. The AI may understand the requirement perfectly well — an email notification service with retry logic, say. But understanding what to build and collaborating on how to build it are two different activities, and AI collapses them into one. It does not pause to discuss which components to create, whether to use existing infrastructure or introduce new abstractions, what the interfaces should look like. It jumps from requirement to implementation, making every technical design decision silently along the way.

I have come to think of this as the “Implementation Trap.” The AI produces tangible output so quickly that the natural checkpoint between thinking about design and writing code disappears. The result is not just misaligned code. It is the cognitive burden of untangling design decisions I was never consulted on, bundled inside an implementation I now have to review line by line.

The Implementation Trap is not simply that AI skips design. In a meaningful sense, the AI does make design decisions when it generates code — about scope, component boundaries, data flow, interfaces, error handling. But those decisions arrive silently, embedded in the implementation. There is no moment where I can say “wait, we already have a queue system” or “that interface won't work with our existing services.” The first time I see the AI's design thinking is when I am reading code, which is the most expensive and cognitively demanding place to discover a disagreement.

This, I believe, is why reviewing AI-generated code feels so much more exhausting than reviewing a colleague's work. When a human pair submits code after a whiteboarding session, I am reviewing implementation against a design I already understand and agreed to. When AI generates code from a single prompt, I am simultaneously evaluating scope (did it build what I needed?), architecture (are the component boundaries right?), integration (does it fit our existing infrastructure?), contracts (are the interfaces correct?), and code quality (is the implementation clean?) — all at once, all entangled. That is too many dimensions of judgment for a single pass. The brain is not built for it. Things get missed — not because I am careless, but because I am overloaded.

Software engineers have understood this principle since the 1980s. Barry Boehm's Cost of Change Curve demonstrated that fixing a requirements misunderstanding at the design stage costs a fraction of fixing it in implementation — and orders of magnitude less than fixing it in production. The same economics apply to AI collaboration: catching a scope mismatch in a two-minute design conversation is fundamentally cheaper than discovering it woven through 400 lines of generated code.

There is an additional cost that is easy to overlook. AI tends to add features. A request for a notification service might return code with rate limiting, analytics hooks, and a webhook system — none of which were requested. This is not just scope creep; it is what I would call technical debt injection. Every unrequested addition is code I did not ask for but must review, tests I did not plan but must write, surface area I did not need but must maintain. And because it arrives looking clean and reasonable, it is easy to accept without questioning whether it belongs. The cognitive load of the review process increases, because I am now evaluating code that solves problems I never asked about.

The solution, I believe, is to reconstruct the whiteboarding conversation that human pairs do naturally — making the AI's implicit design thinking explicit and collaborative. Rather than asking for implementation directly, I walk through progressive levels of design. Each level surfaces a category of decisions that would otherwise be buried in generated code.

I use five levels, ordered from abstract to concrete:

1. Capabilities: what the feature must do, and what is explicitly out of scope
2. Components: which parts exist, and where their boundaries lie
3. Interactions: how the components talk to each other, and how data flows
4. Contracts: the function signatures, types, and interfaces
5. Implementation: the code itself, written against the agreed design

The key constraint: no code until Level 5 is approved.

This single rule changes the dynamics of the collaboration. Each level becomes a checkpoint — a place where I can agree, disagree, or redirect before any code exists. Level 1 serves as a shared vocabulary check — confirming that the AI and I are talking about the same feature, with the same scope, before any design begins. The deeper value starts at Level 2, where the real technical design conversation unfolds: component boundaries, then interaction patterns, then contracts. At each level, I am thinking about one category of decision, not all of them at once. This is cognitive load management — the brain is never juggling everything at once. And because both human and AI align at each step, a shared mental model builds incrementally, just as it does at a whiteboard.

This is how effective human pairs work. At a whiteboard, a colleague does not propose an entire implementation in one breath. They sketch a box, explain what it does, wait for feedback, then sketch the next box. The shared mental model builds incrementally. Design-First applies the same discipline to AI collaboration: a sequential conversation where alignment builds step by step, and where each step constrains the decision space for the next.

Without this structure, the AI is not really a pair — it is a colleague who went off to a separate room, made all the decisions alone, and returned with finished code. Design-First makes the collaboration real. Both human and AI examine each layer of the design together, and by the time implementation begins, there is genuine shared understanding of what is being built and why.

Recently, while I was building a notification service, this approach proved its worth almost immediately.

I came to the conversation with clear context: the feature scope from our story — email-only delivery for v1, with retry logic and basic status tracking — and the project's tech stack, including BullMQ for job processing. At the Capabilities level, I shared this scope upfront and asked the AI to confirm its understanding before any design began. A quick check, but an important one: it ensured we were speaking the same language about what was being built, and what was explicitly out of scope.

The real design conversation began at the Components level. The AI proposed five components, including a dedicated RetryQueue abstraction wrapping BullMQ. Architecturally clean on paper — but unnecessary. BullMQ's built-in retry mechanism with exponential backoff handles this natively, and an additional abstraction would add indirection without value. I pushed back: use BullMQ directly, no wrapper. The AI simplified the component structure. This was not a context gap — the AI knew about BullMQ from the project context I had shared. It was a genuine design disagreement about whether to abstract over existing infrastructure, resolved in seconds because no code existed yet.
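“Use BullMQ directly” mostly means expressing the retry policy as plain job options rather than a wrapper class. A minimal sketch, assuming illustrative queue and job names and retry values (none of these specifics come from the article):

```typescript
// Retry policy as plain BullMQ job options — no custom RetryQueue needed.
// Attempt count and delay are illustrative values, not the project's real ones.
const emailJobOptions = {
  attempts: 5, // total tries before the job is marked failed
  backoff: { type: "exponential", delay: 1000 }, // waits ~1s, 2s, 4s, ... between tries
};

// With a live Redis connection, enqueueing would look like:
//   import { Queue } from "bullmq";
//   const queue = new Queue("notifications");
//   await queue.add("send-email", payload, emailJobOptions);

console.log(`retries: ${emailJobOptions.attempts}`);
```

Because the policy is just data passed to `queue.add`, there is no extra abstraction to review, test, or maintain.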

At the Interactions level, this simpler component structure had a cascading benefit. Without the unnecessary abstraction layer, the data flow was more direct: the handler queues a BullMQ job, the worker processes it, retries are handled natively. The interaction design was cleaner and better integrated than anything that included the wrapper would have been.

The Contracts level is where something particularly valuable happened. Once I had agreed-upon function signatures, types, and interfaces — NotificationPayload, NotificationResult, EmailProvider, DeliveryTracker — I had everything I needed to ask the AI to write tests before any implementation code. The design conversation had, almost as a side effect, created the preconditions for test-driven development. Approve contracts, generate tests, then implement against those tests. For teams that practice TDD, Design-First provides a natural on-ramp. For teams that do not, the contracts still serve as a safety net — a concrete, reviewable specification that makes misunderstandings visible before they become bugs.
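The article names the contract types but not their shapes; a minimal TypeScript sketch of what such agreed contracts can look like, with every field name an assumption of mine:

```typescript
// Hypothetical shapes for the agreed contracts. The article names the types
// (NotificationPayload, NotificationResult, EmailProvider, DeliveryTracker)
// but not their fields, so everything below is illustrative.
interface NotificationPayload {
  to: string;
  subject: string;
  body: string;
}

interface NotificationResult {
  status: "sent" | "failed";
  attempts: number;
}

interface EmailProvider {
  send(payload: NotificationPayload): Promise<NotificationResult>;
}

interface DeliveryTracker {
  record(notificationId: string, result: NotificationResult): void;
}

// Contracts alone are enough to write a test before any implementation exists:
async function testProviderSendsEmail(provider: EmailProvider): Promise<boolean> {
  const result = await provider.send({
    to: "user@example.com",
    subject: "Welcome",
    body: "Hello!",
  });
  return result.status === "sent" && result.attempts >= 1;
}

// A stub satisfying the contract shows the test is runnable with no real code yet.
const stubProvider: EmailProvider = {
  send: async () => ({ status: "sent", attempts: 1 }),
};
```

This is the sense in which approved contracts create the preconditions for TDD: the test compiles against the interface, and any implementation that later satisfies it slots in without changing the test.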

By the time implementation began at Level 5, the code was far more aligned than anything a single prompt would have produced. Scope was confirmed. Architecture fit the existing infrastructure — not because the AI was told about it mid-conversation, but because we debated how to use it at the design level. Contracts were defined and tested. The AI was not guessing — it was implementing an agreed design.

This approach requires a form of discipline that cuts against how AI assistants are typically used. AI is trained to be helpful, and “helpful” usually means producing tangible output quickly. Left unchecked, it will jump to implementation. It will combine levels, offering component diagrams with code already written, or proposing contracts with implementations attached. The discipline of saying “stop — we are still at Level 2, show me only the component structure” is really about protecting working memory from premature detail. It keeps the conversation at the right level of abstraction for the decision being made.

Not every task needs all five levels; the framework scales to the complexity of the work. It is a tool for managing complexity, not a ritual to be applied uniformly.

Design-First also compounds with what I have called Knowledge Priming — sharing curated project context (tech stack, conventions, architecture decisions) with the AI before beginning work. When the AI already understands the project's architecture, naming conventions, and version constraints, each design level is already anchored to the codebase. The Capabilities discussion reflects real requirements. The Components level uses existing naming patterns. The Contracts match established type conventions. Priming provides the vocabulary; Design-First provides the structure. Together, they create a conversation where alignment happens naturally rather than through constant correction.

In practice, a single opening prompt is often enough to set the entire pattern in motion:
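An illustrative sketch of such a prompt, reconstructed from the description that follows; the exact wording and the bracketed placeholders are mine, not a fixed template:

```
I want to design [feature] before writing any code. Walk me through
five levels in order (Capabilities, Components, Interactions,
Contracts, Implementation), presenting one level at a time and
waiting for my feedback before moving to the next. Do not write any
code until I explicitly approve the design.

Constraints: [scope, e.g. "email only for v1"]; [infrastructure,
e.g. "use existing BullMQ infrastructure"]; [style, e.g. "functional
patterns, no classes"]; [conventions, e.g. "all interfaces must
align with our established type conventions"].
```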

This prompt sets the sequential discipline — the AI will present each level, wait for feedback, and hold off on code until given explicit approval. But the real value is in the brackets: the project-specific constraints each developer fills in. “Email only for v1.” “Use existing BullMQ infrastructure.” “Functional patterns, no classes.” “All interfaces must align with our established type conventions.” These constraints carry forward through every level, shaping each design decision without needing to be repeated. At each checkpoint, the conversation deepens — I add context, correct assumptions, steer the design based on what I know about the codebase. The prompt starts the motion; the collaboration at each level shapes it.

Over time, teams that find this pattern valuable often encode it as a reusable, shareable instruction — enriching it with architectural guardrails, quality expectations, and team-specific defaults that apply automatically to every design conversation.

This approach is not without costs. Design conversations take longer than immediately requesting code. For trivial tasks, the overhead is not justified. There is also a learning curve — developers accustomed to typing a prompt and receiving code may find the sequential structure unfamiliar at first. The discipline to sustain it requires conscious effort, especially when deadlines press and the temptation to “just ask for the code” is strong.

I believe the investment pays off primarily for non-trivial work — the kind where a misunderstanding costs not minutes but hours, where architectural alignment matters, where the code will be maintained and extended long after the initial conversation ends.

The value of the whiteboard was never the diagram. It was the alignment — the shared understanding that develops when two people think through a problem together, one layer at a time. Design-First applies this principle to AI collaboration.

When both human and AI operate from the same scope, the same component architecture, the same interaction model, the same contracts, the cognitive load shifts from exhausted vigilance to collaborative refinement. The developer's role changes from reverse-engineering invisible design decisions to steering them explicitly. This, I believe, is what it means to actually pair with an AI — not just receive its output, but participate in its thinking.

The simplest version of this entire approach is a single constraint: no code until the design is agreed. Everything else follows from there.

03 March 2026: published
