Architecture Decision Records: Why We Did It That Way

August 04, 2026 · 21 min read

GreenBox delivers weekly produce boxes from local farms to 1,000+ subscribers. The team of fifteen has drawn bounded contexts, built decision tables, and started shipping faster – but new developers keep asking “why does it work this way?” and nobody can give a straight answer.

Ravi starts at GreenBox on a Monday. He’s developer number eleven – the team is growing fast. He’s got eight years of experience, mostly in backend systems at a mid-size consultancy in Melbourne, and he’s keen to get productive quickly. Keen is an understatement. Ravi’s wife Meera is six months pregnant with their first child. He left the consultancy for GreenBox because the salary was better, the equity was real, and the commute from North Perth was twenty minutes instead of an hour on the train. He has a private, unspoken timeline: prove himself indispensable before the baby arrives, so that when he takes paternity leave, nobody questions whether he’s pulling his weight. He reads the onboarding docs over the weekend before his start date.

By Wednesday, he’s reading through the billing code. He notices something odd. The payment system charges customers on delivery day, not on signup day. That’s unusual. Most subscription businesses charge when you sign up, then again on a fixed cycle. GreenBox charges when the box is actually dispatched.

Ravi asks in the team Slack: “Why do we charge on delivery day? Seems like it’d be simpler to charge on a fixed billing cycle.”

Tom replies: “That’s just how it works.”

Priya: “I think there was a reason but I don’t remember.”

Kai: “I assumed it was a Stripe limitation.”

Maya: “Lee suggested it during the Event Storming session. Something about variable box contents?”

Nobody can give a definitive answer. The decision was made months ago, during the early discovery workshops, by a team that was half the current size. The people who were in the room have a vague memory. The people who weren’t – Kai, Ravi, Anika, and the six other new joiners – have no memory at all.

The LLM explains it wrong

Ravi does what any modern developer would do. He asks the LLM.

He pastes the billing code into his LLM and prompts: “Explain why this payment system charges on delivery day instead of signup day.”

The LLM produces a confident, well-structured explanation:

The system charges on delivery day to align with the subscription renewal cycle. This ensures customers are billed for each delivery period as it begins, reducing the likelihood of disputes over charges for undelivered items. This is a common pattern in subscription-based delivery services.

It sounds plausible. It reads well. Ravi nods at his screen. He’s new. The explanation matches his mental model of subscription billing. Questioning it would mean going back to Slack and asking a follow-up question that might make him look like he hasn’t done his homework. He’s three days in. Meera asked him last night how it was going and he said “really well.” He wants that to be true.

The LLM’s explanation is also wrong.

The real reason – which nobody currently at GreenBox can articulate clearly – is that box contents vary week to week based on farm availability. The price might differ slightly between weeks if premium substitutions are required. Charging at signup would mean charging for a box whose contents haven’t been determined yet. The Event Storming session revealed that the time between supply matching (Tuesday) and delivery (Thursday) was the natural billing point, because by Tuesday evening, the actual box contents and their cost are known.

That reasoning emerged from a specific workshop, on a specific day, with specific people in the room. It was a good decision. But the reasoning lives only in the fading memories of the people who were there, and in a photograph of a whiteboard that’s buried somewhere in Maya’s camera roll.

The LLM’s explanation isn’t absurd. It’s just a guess dressed up as an explanation. It reverse-engineers intent from code, which is exactly what code can’t carry. Code tells you what. It never tells you why.

The danger

Ravi reads the LLM’s explanation and accepts it. He’s new. The explanation sounds reasonable. He moves on.

Two weeks later, Ravi is asked to implement a subscription pause feature. When a customer pauses, no boxes are delivered. The billing should also pause. But because Ravi believes the system charges on delivery day to “align with the renewal cycle” – the LLM’s explanation – he implements the pause by skipping the renewal charge. That works fine for paused subscribers.

It also accidentally breaks the variable pricing logic. In the real system, charging on delivery day is tied to knowing the box contents first. Ravi’s pause implementation assumes a fixed charge cycle, which is the wrong mental model. The bug doesn’t surface for three weeks, until a customer who paused and unpaused gets charged for a box at last month’s price instead of this week’s price.

The fix is straightforward. The bug itself is minor. But the root cause is serious: Ravi made a reasonable decision based on a plausible but incorrect understanding of why the system works the way it does. If he’d known the real reason – variable box contents require variable pricing, which requires knowing the contents before charging – he’d have implemented the pause differently from the start.

Charlotte hears about the bug and recognises the pattern immediately.

Institutional memory

“This is the most common scaling problem I see,” Charlotte tells the team at the Friday retro. “Not code quality. Not architecture. Memory.”

She pauses. The room is attentive. Charlotte doesn’t often pause.

“I coached a meal kit company before GreenBox. Thirty people. Growing fast. They had the same problem – the team that made the early decisions had mostly moved on, and the people who replaced them didn’t know why anything worked the way it did. The pricing model was a mystery. The supplier agreements had logic nobody could explain. When new developers joined, they read the code, guessed at the intent, and built on top of their guesses.” She looks at Ravi. He looks at the table. “That company doesn’t exist any more. There were many reasons it failed, but one of them was this: nobody could change the system confidently because nobody understood why it was built that way.”

Charlotte straightens. “When the team was five people who were all in the room when every decision was made, institutional memory was free. It lived in your heads. You could lean across the desk and ask.”

She looks around the room. Fifteen people. Five of them were there for the original discovery sessions. Ten of them weren’t.

“Now you’re fifteen, and two-thirds of you weren’t here when the important decisions were made. Six months from now, you’ll be twenty-five, and four out of five of you won’t have been here. The proportion of the team that carries the original context keeps shrinking.”

She draws it on the whiteboard:

Six months ago: Maya, Tom, Priya, Jas, Sam – 5/5 carry context = 100%

Today: the same five, plus Kai, Anika, Ravi, and seven others – 5/15 carry context = 33%

Six months from now: the same five originals in a team of 25 – 5/25 carry context = 20%

“And it’s worse than the numbers suggest,” Charlotte continues. “Because even the original five don’t remember everything perfectly. Memory degrades. Details blur. Tom has stopped asking why. Priya remembers there was a reason for charge-on-delivery but not what it was. Maya remembers Lee was involved but not the specifics.”

“Meanwhile, every new person who joins does what Ravi did. They ask the LLM. The LLM guesses. Sometimes the guess is right. Sometimes it’s wrong. And you don’t find out which until something breaks.”

Architecture Decision Records

Charlotte introduces Architecture Decision Records – ADRs. The concept comes from Michael Nygard, and the format is deliberately simple. An ADR is a short document that captures one decision. Not a design document. Not a specification. Just: what was decided, why, what was considered, and what the consequences are.

The format:

  • Title: A short name for the decision.
  • Date: When it was made.
  • Status: Accepted, superseded, or deprecated.
  • Context: What was going on when the decision was made. The constraints, the pressures, the information available.
  • Decision: What was decided.
  • Consequences: What follows from the decision – both positive and negative.

That’s it. One decision per record. A few paragraphs each. Five minutes to write. Five minutes to read.

Charlotte is specific about what makes each section useful.

“The Context section is the most important part. It’s what tells a future reader not just what you decided, but what the world looked like when you decided it. Constraints change. Pressures shift. A decision that was right six months ago might be wrong today – but you can only evaluate that if you know what the original constraints were.”

The first ADR

Charlotte and the team write three ADRs for the three most-questioned decisions in the codebase. The first one is the charge-on-delivery decision.


ADR-001: Charge on delivery day, not signup day

Date: March 2026

Status: Accepted

Context:

GreenBox box contents vary week to week based on farm availability. The Event Storming session (March 2026) revealed that supply matching happens on Tuesday, and the actual box contents – including any substitutions – are finalised on Tuesday evening. The per-box cost can vary slightly depending on what’s included (e.g., premium substitutions like asparagus cost more than standard items like carrots).

If we charge at signup, we’re charging for a box whose contents aren’t yet known. This creates two problems: (1) we might undercharge if the box includes premium items, absorbing the cost, or (2) we might overcharge and need to issue refunds, which creates customer friction and Stripe fees.

Lee raised this during the Event Storming session. The team considered three options.

Alternatives considered:

  1. Charge on signup, fixed price per box size. Simplest to implement. But absorbs the cost variance, which at scale could be significant. Also means customers pay for a box they haven’t received, which increases cancellations during the early trust-building period.

  2. Charge on signup, adjust on delivery. Charge a base price, then issue a supplementary charge or credit based on actual contents. Complex to implement. Confusing for customers who see multiple charges.

  3. Charge on delivery day. Charge once, on the day the box is dispatched, for the actual contents. Single charge, accurate amount, aligned with the moment the customer receives value.

Decision:

Charge on delivery day (Thursday), after box contents are finalised (Tuesday evening). Single charge per box, reflecting actual contents.

Consequences:

  • Positive: Accurate billing. No adjustments or refunds. Customer is charged for what they actually receive.
  • Positive: Aligns payment with value delivery, which reduces early cancellations.
  • Negative: Revenue is less predictable week to week. Financial forecasting needs to account for content variability.
  • Negative: If a delivery fails, the charge has already been processed. Need a clear refund process for failed deliveries.
  • Negative: Pause and cancellation logic must account for the billing-delivery coupling. You can’t simply skip a billing cycle – you need to ensure no box is packed and no charge is generated.

Charlotte prints the ADR and pins it next to the billing code.

“Ravi,” she says. “Read that. Does it change how you’d have implemented the pause feature?”

Ravi reads it. The bit about variable box contents. The bit about billing being coupled to delivery. The consequence about pause logic needing to account for the billing-delivery coupling.

“Yes,” he says. “Completely. I would have known the charge is tied to the actual box contents, not to a fixed cycle. The pause implementation would have been different from the start.”
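The coupling described in ADR-001 can be made concrete in code. The sketch below is hypothetical – the class and function names are illustrative, not GreenBox’s actual billing module – but it shows the shape of a pause implementation that respects the billing-delivery coupling: the charge is derived from the packed box, so skipping the box automatically skips the charge, at the box’s actual finalised price.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    id: str
    paused: bool = False


@dataclass
class Box:
    subscription_id: str
    contents_cost_cents: int  # finalised Tuesday evening; varies with contents


def boxes_to_dispatch(subscriptions, finalised_boxes):
    """Only unpaused subscriptions get a box packed. Because charges are
    generated from packed boxes (not from a calendar cycle), pausing a
    subscription suppresses both the box and its charge in one place."""
    active_ids = {s.id for s in subscriptions if not s.paused}
    return [b for b in finalised_boxes if b.subscription_id in active_ids]


def charges_for_dispatch(dispatch_list):
    """One charge per dispatched box, at the box's actual finalised cost.
    There is no separate billing cycle to pause, resume, or drift out of
    sync with delivery."""
    return [(b.subscription_id, b.contents_cost_cents) for b in dispatch_list]
```

A subscriber who pauses for two weeks and then resumes is charged this week’s price, because the charge comes from this week’s finalised box – the exact behaviour Ravi’s fixed-cycle mental model got wrong.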

LLMs help write ADRs

The team has months of decisions behind them and only a handful of ADRs. Writing them all from scratch would take weeks. Charlotte has a shortcut.

“You don’t have to remember everything perfectly. The git history, the PR descriptions, and the Slack conversations hold the breadcrumbs. And LLMs are good at synthesis.”

She shows the team how to draft ADRs with LLM assistance. Tom pulls up the original PR that implemented charge-on-delivery. The PR description mentions the Event Storming session. The comments reference a Slack thread where Lee explained the reasoning. There’s a commit message that says “switch billing to delivery-day charging per Event Storming findings.”

Tom feeds these artefacts to the LLM:

Draft an Architecture Decision Record for the decision to charge customers on delivery day instead of signup day. Here’s the PR description, the Slack thread, and the relevant commit messages. Use this ADR format: Title, Date, Status, Context, Alternatives Considered, Decision, Consequences.

The LLM produces a draft. It’s not perfect – it misses some nuance about the premium substitution cost variance, and it invents an alternative that wasn’t actually considered. But it’s 80% there. Tom and Maya spend ten minutes correcting and completing it.

“The LLM does the first-pass synthesis,” Charlotte says. “The humans do the review and correction. This is the same pattern as everywhere else – LLMs are good at the mechanical work. Humans are good at the judgement.”

The team writes twelve ADRs over the following week. The LLM drafts each one from git history and Slack threads. The original team members review and correct. The process takes about twenty minutes per ADR, including review.
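The drafting step can be semi-automated. This is a minimal sketch, not the team’s actual tooling: it pulls matching commit messages as one breadcrumb source (PR descriptions and Slack threads would be pasted in alongside) and assembles the prompt in the format Tom used.

```python
import subprocess


def decision_artefacts(repo_path, grep_term):
    """Collect commit messages that mention the decision, e.g.
    grep_term='delivery-day charging'. One of several breadcrumb sources."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--grep={grep_term}", "--format=%s%n%b"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout


def build_adr_prompt(title, artefacts):
    """Assemble the drafting prompt; the artefacts string can concatenate
    commits, PR descriptions, and Slack excerpts."""
    sections = ("Title, Date, Status, Context, Alternatives Considered, "
                "Decision, Consequences")
    return (
        f"Draft an Architecture Decision Record for: {title}.\n"
        f"Use this ADR format: {sections}.\n"
        "Base the draft only on the artefacts below, and flag anything "
        "you had to infer.\n\n"
        f"--- Artefacts ---\n{artefacts}"
    )
```

The “flag anything you had to infer” line matters: it is what turns the invented alternative in Tom’s draft from a silent fabrication into something the reviewers can catch.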

Git history (PRs, commits) + Slack threads (discussions) → LLM drafts ADR → team reviews & corrects → published ADR

That same week, Ravi deploys the pause feature update to production. It’s supposed to be behind a config flag – only visible to internal testers. But he forgets to set the flag, and all 1,000 subscribers can suddenly see “Pause Subscription” on their account page. Three subscribers pause immediately.

Tom: “We need a proper feature flag system, not a config file I have to remember to edit.”

They add a simple feature flag – an environment variable that controls visibility. It’s crude, but it prevents the next accidental launch. Charlotte insists they record it: ADR-003, “Features deploy dark by default.” The ADR captures the reasoning: Ravi’s mistake wasn’t carelessness, it was a missing safeguard. The flag is the safeguard.

The payoff

Three weeks after the ADRs go live, a new developer named Jess joins the team. She’s working on the Melbourne billing integration. Her first question: “Why does the payment system charge on delivery day?”

Kai points her to ADR-001. Jess reads it. Five minutes. She understands the decision, the reasoning, the constraints, and the consequences. She doesn’t ask the LLM. She doesn’t ask Tom. She doesn’t make an assumption based on a plausible guess.

When she implements the Melbourne billing variation – which needs to account for different delivery days in different cities – she knows to keep the billing coupled to the delivery day, not to a fixed cycle. The implementation is correct the first time.

“That’s the payoff,” Charlotte says. “Not one person saved from one bug, but every future person saved from every future misunderstanding. ADRs are compound interest. Each one you write pays dividends every time someone reads it.”

Tom’s ADR that wasn’t good enough

Not every ADR is useful. Tom writes one the following week.


ADR-013: Use Stripe for payments

Date: February 2026

Status: Accepted

Context: We needed a payment provider.

Decision: Use Stripe.

Consequences: It works.


Charlotte reads it and pushes back.

“Tom, this tells a future reader nothing they couldn’t figure out by looking at the imports. Why Stripe? What else did you consider? What were the constraints? What trade-offs did you make?”

Tom shrugs. “It was obvious. Stripe is the standard.”

“Was it obvious? Did you consider Square? Did you consider direct bank transfers for the Australian market? Did you consider that Stripe’s Australian fees are higher than some local providers? Did anyone raise concerns about lock-in?”

Tom pauses. “Actually, Maya wanted to use a local provider because the fees were lower. And Priya pointed out that Stripe’s webhook system was more reliable for delivery-day billing. We picked Stripe because the webhook reliability mattered more than the fee difference.”

“That’s the ADR,” Charlotte says. “Not ‘we picked Stripe because it’s popular.’ You picked Stripe because webhook reliability for delivery-day billing outweighed the fee advantage of a local provider. That’s a real trade-off. Write that down.”

Tom rewrites the ADR:


ADR-013: Use Stripe for payment processing

Date: February 2026

Status: Accepted

Context:

GreenBox needs to process recurring payments for weekly subscriptions. The billing happens on delivery day (see ADR-001), which means the payment system needs reliable webhooks to confirm charges before boxes are dispatched. Payment failures need to be detected and handled before the fulfilment process begins.

Australia has several payment providers whose fees compare favourably with Stripe’s for international cards (e.g., Pin Payments at 1.75% + 30c vs Stripe at 1.7% + 30c for domestic cards; Stripe charges more for international cards).

Alternatives considered:

  1. Pin Payments. Lower fees for international cards. Australian company. But webhook reliability was reported as inconsistent by other developers in the Ruby AU community, and the API documentation was less comprehensive.

  2. Direct bank transfers (BECS Direct Debit). Lowest fees. But settlement time is 3-5 business days, which doesn’t work with charge-on-delivery-day billing. The charge wouldn’t settle before the box was dispatched.

  3. Stripe. Marginally higher fees. But webhooks are reliable, the API is well-documented, and the LLM has extensive training data on Stripe integrations, meaning generated code is more likely to be correct.

Decision: Use Stripe.

Consequences:

  • Positive: Webhook reliability supports the charge-on-delivery workflow.
  • Positive: Excellent API documentation and LLM familiarity reduce development time.
  • Positive: Well-established subscription billing APIs reduce custom code.
  • Negative: Higher fees on international cards (rare for GreenBox’s Australian customer base, but worth monitoring).
  • Negative: Vendor lock-in. Switching payment providers later would require rewriting billing integration code.

“Now someone reading this in a year knows why you picked Stripe,” Charlotte says. “And more importantly, they know what constraints to check if they want to switch. If webhook reliability improves at Pin Payments, or if BECS settlement times speed up, the ADR tells them exactly which assumptions to revisit.”

When to write ADRs

Charlotte gives the team a simple rule: write an ADR whenever you make a decision that a new team member would question six months from now.

Not every decision needs one. “Use tabs or spaces” doesn’t need an ADR. “Name the variable customerID” doesn’t need one. But “charge on delivery day instead of signup day” does. “Use Stripe instead of a cheaper local provider” does. “Structure the codebase as bounded contexts” does. “Adjust farm supply estimates by a reliability factor” does.

The test is simple. Charlotte tells them to imagine a developer joining in six months, reading the code, and asking “why?” If the answer is “because it was obvious,” you probably don’t need an ADR. If the answer requires a story – a constraint, a trade-off, a workshop insight – you do.

The team adds ADR writing to their definition of done for significant technical decisions. Not every story. Not every PR. But when a PR introduces a new pattern, changes a workflow, or picks one approach over another for non-obvious reasons – someone writes an ADR.

ADRs and LLM context

There’s a secondary benefit Charlotte didn’t anticipate. The ADRs become part of the context the team feeds to LLMs.

When Kai asks the LLM to implement a new billing feature, he includes the relevant ADRs in the prompt: “Here’s ADR-001 explaining why we charge on delivery day, and ADR-013 explaining why we use Stripe. Implement a feature that allows customers to split payment across two cards.”

The LLM produces code that respects both constraints. It doesn’t accidentally implement a fixed billing cycle because ADR-001 explains the variable pricing model. It doesn’t suggest switching to a different payment provider because ADR-013 explains the webhook reliability requirement.
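Kai’s prompt-assembly step can be a few lines of glue. This sketch assumes a file layout (`docs/adr/ADR-001.md`) that the post never specifies – treat the paths and function names as illustrative.

```python
from pathlib import Path


def adr_context(adr_dir, adr_ids):
    """Concatenate the relevant ADRs so the LLM sees the 'why' alongside
    the task. The docs/adr/ADR-NNN.md layout is an assumption."""
    parts = []
    for adr_id in adr_ids:
        parts.append((Path(adr_dir) / f"{adr_id}.md").read_text())
    return "\n\n".join(parts)


def billing_prompt(task, adr_dir="docs/adr"):
    """Prepend ADR-001 and ADR-013 to any billing task, mirroring Kai's
    prompt: the constraints travel with the request."""
    context = adr_context(adr_dir, ["ADR-001", "ADR-013"])
    return (f"{context}\n\nTask: {task}\n"
            "Respect the constraints explained in the ADRs above.")
```

The design choice is that context assembly is mechanical and repeatable: nobody has to remember which ADRs matter for billing work, because the helper encodes it once.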

“ADRs aren’t just for humans,” Charlotte says. “They’re context that makes LLM output more accurate. The more context you give the LLM about why things are the way they are, the less likely it is to generate code that contradicts your architecture.”

This is the same insight from the bounded contexts post: the quality of LLM output depends on the quality of the context you provide. Bounded contexts give the LLM structural boundaries. ADRs give it reasoning. Both make the generated code more likely to be correct.

What ADRs don’t do

ADRs are not design documents. They don’t describe how something works in detail – the code does that. They describe why it works that way.

ADRs are not permanent. A decision can be superseded. When GreenBox eventually outgrows Stripe or the billing model changes, ADR-013 gets a status change to “Superseded by ADR-047” and a new ADR explains the new decision and the new reasoning. The old ADR stays in the repository. It’s history, not garbage. Future readers can trace the evolution of thinking.

ADRs are not consensus documents. They record the decision and who made it, including dissent. If Priya argued against Stripe and was overruled, that goes in the ADR. Not to blame anyone – but because if Priya’s concerns prove valid later, the ADR tells the team exactly which arguments to revisit.

And ADRs are not a substitute for conversation. They’re the output of conversations, not a replacement for them. If a team stops talking and starts writing ADRs in isolation, they’ve missed the point. The conversation produces the understanding. The ADR preserves it.

What the team learned

The cost of lost institutional memory is invisible until it’s not. The team didn’t notice the problem when they were five people in one room. They started noticing when Ravi introduced a bug because he didn’t know why the billing system worked the way it did. They’ll notice it more as they grow to twenty, twenty-five, thirty people.

ADRs are cheap insurance. Five minutes to write. Five minutes to read. They prevent the class of bugs that comes from misunderstood intent – which, in a codebase increasingly generated by LLMs, is a growing risk. The LLM reads the code and guesses at intent. The ADR tells it the intent directly.

Charlotte’s summary: “Code is how. ADRs are why. You need both. And when LLMs are generating the code, you need the why more than ever – because the LLM will always produce confident code. The question is whether it’s confidently right or confidently wrong. ADRs tip the balance.”

The team now has bounded contexts, decision tables, and ADRs. The architecture is clearer, the domain logic is explicit, and the institutional memory is preserved. But a new question is emerging: should GreenBox build its own delivery tracking system, or buy one? Tom thinks the LLM can build it in a week. Charlotte thinks that’s the wrong question.

That’s Wardley Mapping (coming 25 August).

Questions or thoughts? Get in touch.