The Greenbox Story · Drawing the Lines

Architecture Decision Records: Why We Did It That Way

June 25, 2026

Ravi starts at Greenbox on a Monday. He’s developer number eleven. Eight years of experience, mostly in backend systems at a Melbourne consultancy. His wife Meera is six months pregnant with their first child. He left the consultancy for Greenbox because the salary was better, the equity was real, and the commute from North Perth was twenty minutes instead of an hour. He has a private timeline: prove himself indispensable before the baby arrives, so that when he takes paternity leave, nobody questions whether he’s pulling his weight.

By Wednesday, he’s reading the billing code. He notices something odd. The payment system charges subscribers on delivery day, not signup day. That’s unusual.

Ravi asks in Slack: “Why do we charge on delivery day? Seems like it’d be simpler to charge on a fixed billing cycle.”

Tom: “That’s just how it works.”

Priya: “I think there was a reason but I don’t remember.”

Maya: “Lee suggested it during the Event Storming session. Something about variable box contents?”

Nobody can give a definitive answer. The decision was made over a year ago, by a team that was a third of the current size.

The LLM explains it wrong

Ravi pastes the billing code into his LLM: “Explain why this payment system charges on delivery day instead of signup day.”

The LLM produces a confident explanation: the system charges on delivery day to align with the subscription renewal cycle, reducing disputes over undelivered items.

It sounds plausible. Ravi nods. He’s three days in. Questioning it would mean going back to Slack with a follow-up that might make him look like he hasn’t done his homework. Meera asked last night how it was going and he said “really well.”

The LLM’s explanation is also wrong.

The real reason is that box contents vary week to week based on farm availability. The per-box cost can differ depending on substitutions. Charging at signup means charging for a box whose contents aren’t yet known. The Event Storming session revealed that the billing point should be after supply matching (Tuesday evening), when actual contents and costs are known.

That reasoning lives only in the fading memories of the people who were there.

Two weeks later, Ravi picks up a rework of how pauses interact with billing; the coupling has been creaking since the bounded-context refactor. Because he believes the LLM’s explanation, that billing aligns with a fixed renewal cycle, he reworks the pause to skip the renewal charge. That accidentally breaks the variable pricing logic. The bug doesn’t surface for three weeks.

The fix is straightforward. But the root cause is serious: a reasonable decision based on a plausible but incorrect understanding of why the system works the way it does.

Institutional memory

Charlotte hears about the bug.

“This is the most common scaling problem I see,” she tells the team at the Friday retro. “Not code quality. Not architecture. Memory.”

She draws three boxes on the whiteboard:

  • A year ago: Maya, Tom, Priya, Jas, Sam. 5/5 carry context = 100%
  • Today: same 5 + Kai, Anika, Ravi, +6. 5/14 carry context = 36%
  • A year from now: same 5 in a team of 25. 5/25 carry context = 20%

“And it’s worse than the numbers suggest. Even the original five don’t remember everything perfectly. Meanwhile, every new person asks the LLM. The LLM guesses. Sometimes it’s right. Sometimes it’s wrong. And you don’t find out which until something breaks.”

Architecture Decision Records

Charlotte introduces ADRs. Architecture Decision Records. The concept comes from Michael Nygard. A short document capturing one decision: what was decided, why, what was considered, and what follows.

Title. Date. Status (accepted, superseded, deprecated). Context (what was going on). Decision (what was chosen). Consequences (positive and negative).

One decision per record. A few paragraphs. Five minutes to write. Five minutes to read.

“The Context section is the most important part. It tells a future reader what the world looked like when you decided. Constraints change. A decision that was right six months ago might be wrong today, but you can only evaluate that if you know the original constraints.”
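The fields Charlotte lists map naturally onto a small record type. What follows is a minimal sketch in Python, assuming a team wanted ADRs as structured data rather than free-form files. The names `ADR`, `Status`, and `render` are hypothetical, and the example values come from the delivery-day billing decision described earlier.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACCEPTED = "Accepted"
    SUPERSEDED = "Superseded"
    DEPRECATED = "Deprecated"


@dataclass
class ADR:
    """One decision per record (hypothetical structure, not a standard)."""
    number: int
    title: str
    date: str
    status: Status
    context: str       # what the world looked like when the team decided
    decision: str      # what was chosen
    consequences: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the record as plain text, one section per field."""
        lines = [
            f"ADR-{self.number:03d}: {self.title}",
            f"Date: {self.date}",
            f"Status: {self.status.value}",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            "Consequences:",
        ]
        lines += [f"  - {c}" for c in self.consequences]
        return "\n".join(lines)


adr = ADR(
    number=1,
    title="Charge on delivery day, not signup day",
    date="November 2023",
    status=Status.ACCEPTED,
    context="Box contents and per-box cost are only known after Tuesday supply matching.",
    decision="Charge on delivery day (Thursday).",
    consequences=["Positive: accurate billing.", "Negative: revenue less predictable."],
)
print(adr.render())
```

Keeping `context` as a required field is the point of the sketch: a record cannot be created without saying what the constraints were.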

The first ADR


ADR-001: Charge on delivery day, not signup day

Date: November 2023

Status: Accepted

Context:

Greenbox box contents vary week to week based on farm availability. Supply matching happens on Tuesday, and actual box contents, including substitutions, are finalised Tuesday evening. The per-box cost can vary depending on what’s included. Charging at signup means charging for a box whose contents aren’t yet known.

Lee raised this during the Event Storming session. Three options were considered.

Alternatives considered:

  1. Charge on signup, fixed price. Simplest. But absorbs cost variance, and subscribers pay for a box they haven’t received.
  2. Charge on signup, adjust on delivery. Complex. Confusing multiple charges.
  3. Charge on delivery day. Single charge, accurate amount, aligned with value delivery.

Decision: Charge on delivery day (Thursday), after box contents are finalised (Tuesday evening).

Consequences:

  • Positive: Accurate billing. No adjustments or refunds.
  • Positive: Payment aligns with value delivery, reducing early cancellations.
  • Negative: Revenue less predictable week to week.
  • Negative: Pause and cancellation logic must account for the billing-delivery coupling.

“Ravi,” Charlotte says. “Read that. Does it change how you’d have implemented the pause?”

Ravi reads it. The bit about variable contents. The bit about billing coupled to delivery.

“Yes. Completely.”
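To make the coupling concrete, here is a minimal sketch in Python of what the ADR implies for a pause. It is not Greenbox’s actual billing code: the `FinalisedBox` shape, the `charges_for_week` function, the week labels, and the prices are all assumptions for illustration. It also sidesteps the question of whether a paused subscriber would have a finalised box at all. The charge comes from the finalised contents, and a pause removes the delivery, so there is no fixed renewal charge to skip.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class FinalisedBox:
    """A box after Tuesday supply matching (hypothetical shape)."""
    subscriber_id: str
    delivery_week: str           # illustrative label, e.g. "W26"
    item_prices: list[Decimal]   # actual contents, including substitutions


def charges_for_week(boxes, paused):
    """Return {subscriber_id: amount} for one delivery week.

    `paused` is a set of (subscriber_id, delivery_week) pairs. A paused
    subscriber has no delivery that week, so there is nothing to charge.
    The amount always comes from the finalised contents, never a fixed price.
    """
    charges = {}
    for box in boxes:
        if (box.subscriber_id, box.delivery_week) in paused:
            continue  # no delivery, no charge
        charges[box.subscriber_id] = sum(box.item_prices, Decimal("0"))
    return charges


boxes = [
    FinalisedBox("sub-1", "W26", [Decimal("18.50"), Decimal("12.00")]),
    FinalisedBox("sub-2", "W26", [Decimal("18.50"), Decimal("15.25")]),
]
paused = {("sub-2", "W26")}

charges = charges_for_week(boxes, paused)
print(charges)  # sub-1 is charged 30.50; sub-2 is paused and absent
```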

The second worked example

Charlotte asks the team to do one more before she shows them the shortcut. Kai picks the Terraform decision from three weeks ago. It’s the most recent decision anyone can remember the conversation around, which makes it useful for calibration.


ADR-007: Use Terraform for AWS infrastructure

Date: February 2025

Status: Accepted

Context:

When Kai joined, the production EC2 instance, RDS database, and S3 buckets had been hand-clicked in the AWS console eighteen months earlier, with no record outside Tom’s memory. Two near-misses had already happened: a security group change that broke staging access for half a day, and a configuration drift that doubled the time to debug an SQS issue because the live security group rules didn’t match the diagram anyone remembered. With the team growing past ten people and Melbourne launching, the cost of “Tom knows” was becoming a cost everyone paid.

Alternatives considered:

  1. AWS CloudFormation. Native to AWS, no extra tooling, state lives inside AWS. Modules are clunkier than Terraform’s, and the team had no prior experience with CloudFormation at this scale.
  2. AWS CDK. Higher-level abstractions, real programming language. Kai had reservations about onboarding new joiners into a CDK codebase before the team had any IaC discipline at all: too many ways to be clever before establishing the boring baseline. Agreed to revisit when the volume of infra justifies the abstraction.
  3. Terraform with HCL. Industry standard at this scale, cloud-portable in principle, large body of patterns and examples. State file in S3 with locking via DynamoDB.

Decision: Terraform with HCL, single workspace, state in S3 + DynamoDB lock. Single environment for now (production). The pipeline runs terraform plan on every PR. Applying is still manual (Tom holds the only credentials) until a staging environment lands and the team can resolve who else can apply.

Consequences:

  • Positive: Infrastructure is now written down. New joiners can read the Terraform to understand what production looks like.
  • Positive: PR review on infra changes. Drift becomes visible in the plan output.
  • Negative: Single workspace means there is only one environment. When staging lands this will need restructuring into workspaces or modules.
  • Negative: Manual apply is a bottleneck. A deploy role and a way to run apply from CI are required before the second environment.
  • Open: Kai’s initial file covers about 60% of the live console reality. The remaining IAM roles, Route 53 records, and drifted security groups will get imported as the team touches them.

Ravi looks up. “We have an IAM role drift problem?”

Tom: “We have an everything drift problem.”

LLMs help write ADRs

The team has months of decisions behind them. Charlotte has a shortcut.

Tom pulls up the original PR that implemented charge-on-delivery. The description references the Event Storming session. Comments link to a Slack thread where Lee explained the reasoning. He feeds these to the LLM:

Draft an ADR for the decision to charge on delivery day instead of signup day. Here’s the PR description, the Slack thread, and the commit messages.

The LLM produces a draft. It’s 80% there: it misses some nuance about premium substitution costs and invents an alternative that wasn’t considered. Tom and Maya correct it in ten minutes.

Git history (PRs, commits) + Slack threads (discussions) → LLM drafts ADR → team reviews and corrects → published ADR

The team writes twelve ADRs over the following week. About twenty minutes each, including review.

That same week, Ravi deploys the pause-billing rework and forgets to set the config flag. The half-finished new flow goes live for all 2,800 subscribers at once. Three of them hit it before anyone notices. Tom: “We need a proper feature flag system.” They add a simple environment variable for visibility. Charlotte insists they record it: ADR-013, “Features deploy dark by default.” It’s the first new ADR after the retrospective batch, and the first one that captures a decision as it happens rather than months later.
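“Dark by default” can be as small as one function. This is a minimal sketch in Python, not the team’s actual implementation. The `feature_enabled` helper and the `FEATURE_<NAME>` naming convention are assumptions for illustration. The property that matters is that a missing or misspelled flag means off.

```python
import os


def feature_enabled(name: str, environ=None) -> bool:
    """Return True only if the flag is explicitly switched on.

    Hypothetical convention: flag NAME is read from FEATURE_<NAME>.
    Anything missing, empty, or unrecognised counts as off, so a
    forgotten flag leaves the feature dark instead of live.
    """
    env = os.environ if environ is None else environ
    value = env.get(f"FEATURE_{name.upper()}", "")
    return value.strip().lower() in {"1", "true", "on"}


# Forgetting the flag keeps the new flow hidden.
print(feature_enabled("pause_billing_rework", {}))
# Turning it on is a deliberate act.
print(feature_enabled("pause_billing_rework",
                      {"FEATURE_PAUSE_BILLING_REWORK": "true"}))
```

Passing `environ` explicitly keeps the function testable without touching the real process environment.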

Tom’s ADR that wasn’t good enough

Tom writes one the following week.


ADR-014: Use Stripe for payments

Date: September 2023

Status: Accepted

Context: We needed a payment provider.

Decision: Use Stripe.

Consequences: It works.


Charlotte pushes back. “This tells a future reader nothing they couldn’t figure out from the imports. Why Stripe? What else did you consider?”

“It was obvious. Stripe is the standard.”

“Was it? Did you consider Square? Direct bank transfers? Did anyone raise concerns about lock-in?”

Tom pauses. “Actually, Maya wanted a local provider because the fees were lower. And Priya pointed out Stripe’s webhook system was more reliable for delivery-day billing.”

“That’s the ADR. Not ‘Stripe because it’s popular.’ Stripe because webhook reliability for delivery-day billing outweighed the fee advantage. That’s a real trade-off.”

Tom rewrites it with the actual reasoning: three alternatives considered, webhook reliability as the deciding factor, fee trade-off acknowledged, vendor lock-in noted as a consequence. Now someone reading it in a year knows exactly which assumptions to revisit if they want to switch.

ADRs as LLM context

When Kai asks the LLM to implement a new billing feature, he includes the relevant ADRs in the prompt. The LLM produces code that respects the constraints because the ADRs explain why the system works the way it does, not just how.

ADRs aren’t just for humans. They’re context that makes LLM output more accurate. Bounded contexts give the LLM structural boundaries. ADRs give it reasoning. Both make the generated code more likely to be correct.
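One way to picture “includes the relevant ADRs” is a helper that filters the records and prepends them to the task. This is a minimal sketch in Python under stated assumptions. The `build_prompt` function, the dict keys (`id`, `status`, `tags`, `text`), and the tags themselves are hypothetical, and no particular LLM API is implied.

```python
def build_prompt(task: str, adrs: list[dict], topic: str) -> str:
    """Prepend accepted ADRs tagged with `topic` to a task description.

    Each ADR is a dict with hypothetical keys: id, status, tags, text.
    Superseded or deprecated records are left out so the model is not
    handed reasoning the team no longer follows.
    """
    relevant = [
        a for a in adrs
        if a["status"] == "Accepted" and topic in a["tags"]
    ]
    sections = [f"{a['id']}\n{a['text']}" for a in relevant]
    context = "\n\n".join(sections)
    return (
        "Decisions that constrain this work:\n\n"
        f"{context}\n\n"
        f"Task: {task}"
    )


adrs = [
    {"id": "ADR-001", "status": "Accepted", "tags": ["billing"],
     "text": "Charge on delivery day, after box contents are finalised."},
    {"id": "ADR-007", "status": "Accepted", "tags": ["infrastructure"],
     "text": "Use Terraform for AWS infrastructure."},
]

prompt = build_prompt("Implement the new billing feature.", adrs, topic="billing")
print(prompt)
```

Filtering on status is a design choice worth keeping even in a sketch: a superseded ADR in the prompt is another way to be confidently wrong.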

When to write ADRs

Charlotte’s rule: write an ADR whenever you make a decision that a new team member would question six months from now. If the answer requires a story, a constraint, a trade-off, a workshop insight, you need one. If the answer is “because it was obvious,” you probably don’t.

ADRs aren’t permanent; decisions can be superseded. They’re not consensus documents; dissent goes in too. And they’re not a substitute for conversation. They’re the output of conversations, preserved so the conversation doesn’t have to happen again.

Code is how. ADRs are why. When LLMs are generating the code, you need the why more than ever, because the LLM will always produce confident code. The question is whether it’s confidently right or confidently wrong.

The team now has bounded contexts, decision tables, and ADRs. The reasoning is written down. But a different kind of memory problem is already sitting in the support inbox: a subscriber who paused for a week is certain she was charged anyway, and the billing database can only say what her balance is, not how it got there. ADRs taught the team to keep the why. Next, the ledger learns to keep the what: event sourcing the ledger in Go.

The next chapter, Domain-Driven Design: Event Sourcing the Ledger in Go, publishes around 27 June.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.