The situation
A product team is building an AI support assistant for a mid-sized SaaS company. The assistant handles first-line queries (billing, account access, feature questions, refund requests) and escalates to human agents when it can’t resolve them. Measured over six weeks of closed beta:
- Average conversation length: 15 turns, ranging from five to past thirty.
- Return rate: 40% within 30 days. Median return gap: eleven days. Roughly half of returning users reference something from a previous thread: “did the refund you mentioned go through?”, “I’m still seeing the login error you helped me with last week”.
- Tool use: three or four calls per conversation. Account lookups, subscription checks, ticket creation.
- Platform: Bedrock. Nothing self-hosted.
- Team: two backend engineers, one front-end, no dedicated ML-ops.
- Compliance: GDPR. Conversation content is personal data; deletion-on-request has to be clean, retention has to be bounded.
What actually matters
“Memory” is two different problems in one word. Turn-level coherence is what makes turn fifteen aware of turn two, the kind of short-term working state that any conversational system needs within one visit. Cross-session recall is what makes the user who comes back eleven days later land on a bot that remembers the open refund ticket without asking them to retype it. Treating these as one problem is how teams end up with a single mechanism that does neither well; the retrieval shapes, retention expectations, and privacy treatments are different enough that the design has to split them.
The second question is: who owns each layer? Session memory is a runtime concern: it has to be correct for every turn of every conversation, and it fails visibly when it fails. Cross-session memory is a product concern: it doesn’t have to be perfect, but it has to avoid two specific incidents, namely interrogating the returning user as if they were a stranger and leaking something from the user’s own past that they’d prefer the bot didn’t bring up. Those failure modes want different owners: short-term is backend plumbing, long-term is product policy with engineering behind it.
The third is: what’s the blast radius of the wrong retrieval? A turn-level miss is embarrassing: “I thought we were talking about refunds, not deliveries.” A cross-session miss is worse in two distinct ways. A false positive (retrieving someone else’s conversation as “relevant” to this user’s question) is a wrongful-disclosure incident. A false negative (failing to surface this user’s open refund when they asked about it) is a product incident and a trust deficit. The first wants strong per-user isolation that can’t be bypassed by prompt injection; the second wants retrieval that actually works on short, fragmented conversation text, which is not the strong suit of tooling designed for prose documents.
The fourth is: what is the GDPR delete story? A user who exercises their right to be forgotten has to see every record of their conversations disappear. A memory design where deletion is a three-step cascade across four stores is a compliance trap. The correct shape is one call per store, ideally scoped to a single identifier that the application already carries. That preference shows up strongly in the design: a single opaque per-user key is a clean delete primitive, whereas a per-turn vector inside a shared knowledge base with metadata filtering is doable but harder to reason about under audit.
The fifth is: what can a two-backend-engineer team actually operate? Anything that scales linearly with conversation count is trouble by year two. A bespoke summarisation cron that runs LLM calls on every session close is infrastructure the team now owns, plus its eviction policy, plus its retention TTL, plus its retry-on-failure logic. A managed alternative that does the same job behind a configuration surface frees the team to spend its attention on the product. What it gives up is flexibility, and this product doesn’t use that flexibility.
Finally: what’s the architecture-growth story? Today’s design will be extended: more tools, more integrations, maybe a billing agent and a support agent speaking to the same user. Memory that’s tightly coupled to a specific agent instance is easier to start with and harder to grow. Memory that’s keyed by user identity across agents is harder to start with, easier to grow. The design picks the first for now and leaves a seam where the second can be added when the product hits that complexity.
What we’ll filter on
Five filters the memory design has to pass.
- In-session coherence. Turn fifteen must be aware of turn two. The agent needs to see the relevant history of this conversation when it generates the next response.
- Cross-session recall. A user returning eleven days later should land on a bot that can reasonably answer “what was the last thing we talked about?” without asking them to retype context. Not perfect replay, a usable summary.
- Orchestration included. Fifteen turns with three tool calls per conversation means the assistant is planning, calling tools, observing results, and deciding what to do next. The memory solution has to live next to the orchestration, not compete with it.
- Retrieval quality for conversational context. Pulling the correct fact from a past conversation is a different retrieval problem from pulling the correct paragraph from a product manual. Conversation data is short, interleaved, and context-dependent.
- Operational overhead low enough for two backend engineers. No bespoke orchestration loop, no custom summarisation pipeline, no self-hosted vector database. GDPR delete has to be a button, not a project.
The memory landscape on Bedrock
Four plausible ways to build this.
Bedrock Agents’ built-in memory. A Bedrock Agent is the managed orchestration primitive: model + action groups (tool definitions) + knowledge bases + prompt templates, all wired together so the platform handles the plan-call-observe loop. Memory comes in two layers. Session state is automatic: pass a sessionId, and every InvokeAgent call within that session sees the full conversation history, which the agent assembles itself. Long-term session summaries are opt-in: enable memoryConfiguration with SESSION_SUMMARY, set a memoryId per end user, set a retention window (1 to 365 days). After each session ends, the agent generates a concise summary and stores it keyed to that memoryId. Delete is a single DeleteAgentMemory call.
DynamoDB-backed session store (build-your-own). Roll the orchestration loop yourself. A Lambda receives the user turn, reads conversation-so-far from DynamoDB (partition key sessionId, sort key turn timestamp), builds the prompt, calls InvokeModel, writes the response back, returns it. Cross-session recall is a second table keyed by user ID holding rolled-up state. Summaries come from an LLM call you write and schedule.
Bedrock Knowledge Bases for long-term recall. Dump transcripts or summaries into S3 and query at runtime for “what’s this user’s history?”. Chunking strategies assume a prose document; conversations are short, fragmentary, and relevance is keyed to who spoke and when. A chunk from someone else’s refund thread retrieved as “relevant” to this user’s login question is a correctness problem with a compliance problem stapled to it.
Custom vector store with conversation embeddings. Embed each conversation (or turn, or summary) with Titan Embeddings V2 and store it in OpenSearch Serverless or pgvector with per-user metadata; at session start, query for the current user’s top-k most relevant past interactions. Full control of chunking granularity, metadata filtering, ranking. Also a second stateful system to own alongside DynamoDB.
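To make the build-your-own option concrete, here is a minimal sketch of the loop it implies: read history, build the prompt, call the model, write both turns back. It is an illustration under stated assumptions, not a reference implementation. `InMemoryTurnStore` is a hypothetical stand-in for the DynamoDB table (partition key sessionId, sort key timestamp, an `expiresAt` attribute for TTL), and `call_model` is an injected callable standing in for the InvokeModel call.

```python
import time
from typing import Callable


class InMemoryTurnStore:
    """Hypothetical stand-in for a DynamoDB table.

    A real table would use sessionId as the partition key, ts as the sort
    key, and expiresAt as the TTL attribute.
    """

    def __init__(self) -> None:
        self._items: dict[str, list[dict]] = {}

    def put_turn(self, session_id: str, role: str, text: str,
                 ts: float, ttl_days: int = 30) -> None:
        item = {
            "sessionId": session_id,
            "ts": ts,
            "role": role,
            "text": text,
            # Epoch seconds after which the item may be expired.
            "expiresAt": int(ts) + ttl_days * 86400,
        }
        self._items.setdefault(session_id, []).append(item)

    def history(self, session_id: str) -> list[dict]:
        # Oldest turn first, mirroring a query ordered by sort key.
        return sorted(self._items.get(session_id, []), key=lambda i: i["ts"])


def build_prompt(history: list[dict], user_text: str) -> str:
    """Flatten prior turns plus the new user turn into one prompt string."""
    lines = [f'{turn["role"]}: {turn["text"]}' for turn in history]
    lines.append(f"user: {user_text}")
    lines.append("assistant:")
    return "\n".join(lines)


def handle_turn(store: InMemoryTurnStore, session_id: str, user_text: str,
                call_model: Callable[[str], str],
                now: Callable[[], float] = time.time) -> str:
    """One pass of the hand-rolled loop: read, prompt, call, write back."""
    prompt = build_prompt(store.history(session_id), user_text)
    user_ts = now()
    reply = call_model(prompt)
    store.put_turn(session_id, "user", user_text, user_ts)
    store.put_turn(session_id, "assistant", reply, now())
    return reply
```

Everything in this sketch (plus tool calling, retries, and summarisation, which it omits) is code the team would own, which is the operational cost the comparison weighs.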
Side by side
| Option | In-session coherence | Cross-session recall | Orchestration included | Retrieval for conversation | Low ops |
|---|---|---|---|---|---|
| Bedrock Agents memory | ✓ | ✓ | ✓ | ✓ | ✓ |
| DynamoDB session store (DIY) | ✓ | ✓ | ✗ | ✓ | ✗ |
| Knowledge Bases for past transcripts | ✗ | — | ✗ | ✗ | — |
| Custom vector store of conversation embeddings | — | ✓ | ✗ | ✓ | ✗ |
Matching the layers to the memory
Bedrock Agents memory, in depth
A Bedrock Agent is more than a model invocation; it’s an orchestration surface. Define action groups (tool schemas plus implementing Lambdas), optionally attach knowledge bases, write an instruction prompt, and call InvokeAgent with a user turn and a sessionId. The runtime handles the ReAct-style loop.
Session memory is automatic. Every call with the same sessionId sees every prior turn, including tool calls and tool results. Idle timeout defaults to 30 minutes, configurable up to 24 hours. Turn fifteen sees turns one through fourteen because the agent reads them itself.
Long-term summary memory is configuration. Set memoryConfiguration on the agent with enabledMemoryTypes: [SESSION_SUMMARY] and a storageDays retention window. At runtime, pass a memoryId alongside the sessionId, typically a hash of the authenticated user ID. When the session ends, the agent generates a summary using a managed (customisable) prompt and stores it keyed to that memoryId. Subsequent sessions with the same memoryId have the prior summaries injected into context.
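A minimal sketch of the two pieces just described, written as plain request builders so they run without AWS access. The helper names (`memory_configuration`, `memory_id_for`, `invoke_agent_kwargs`) are hypothetical; the dictionary keys are the parameter names of the agent's memoryConfiguration and of the boto3 `bedrock-agent-runtime` `invoke_agent` call. Deriving the memoryId with SHA-256 is an assumption that follows the "hash of the authenticated user ID" suggestion.

```python
import hashlib


def memory_configuration(storage_days: int = 90) -> dict:
    """Build the memoryConfiguration block set on the agent."""
    if not 1 <= storage_days <= 365:
        raise ValueError("storageDays must be between 1 and 365")
    return {
        "enabledMemoryTypes": ["SESSION_SUMMARY"],
        "storageDays": storage_days,
    }


def memory_id_for(user_id: str) -> str:
    """Stable, opaque per-user key: hex SHA-256 of the authenticated user ID."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def invoke_agent_kwargs(agent_id: str, agent_alias_id: str, session_id: str,
                        user_id: str, text: str) -> dict:
    """Keyword arguments for invoke_agent, carrying both identifiers."""
    return {
        "agentId": agent_id,
        "agentAliasId": agent_alias_id,
        "sessionId": session_id,             # scopes in-conversation history
        "memoryId": memory_id_for(user_id),  # scopes cross-session summaries
        "inputText": text,
    }
```

With a real client, the call would look like `client.invoke_agent(**invoke_agent_kwargs(...))`. Building the arguments in one place keeps the memoryId derivation identical on the invoke path and the delete path.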
Retention and deletion. storageDays sets a TTL; once it lapses, the summary is gone. DeleteAgentMemory with a memoryId wipes everything for that user on demand. GDPR right-to-be-forgotten in one request.
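As a sketch of the on-demand delete path, the function below issues the single DeleteAgentMemory request for one user. It assumes `agent_runtime` is a boto3 `bedrock-agent-runtime` client (or anything exposing the same `delete_agent_memory` method) and that the memoryId is the hex SHA-256 of the user ID. `forget_user` and `RecordingClient` are hypothetical names used for illustration.

```python
import hashlib


def forget_user(agent_runtime, agent_id: str, agent_alias_id: str,
                user_id: str) -> str:
    """Delete all long-term summaries stored under this user's memoryId."""
    memory_id = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    agent_runtime.delete_agent_memory(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        memoryId=memory_id,
    )
    # Return the opaque ID so the caller can write it to an audit trail.
    return memory_id


class RecordingClient:
    """Test double that records calls instead of talking to AWS."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def delete_agent_memory(self, **kwargs) -> dict:
        self.calls.append(kwargs)
        return {}
```

Passing the client in, instead of creating it inside the function, lets the delete path be exercised in tests with `RecordingClient` before it is wired to a real account-closure event.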
Limits worth naming. memoryId lookup is exact-match, not semantic: there is no built-in vector search for “find users with similar past experiences”. Summaries are bounded in length, so very long histories lose detail over time. Session memory is scoped to an agent instance, so moving a user from a support agent to a billing agent needs application-level plumbing to pass state across.
When build-your-own earns a place
Two situations flip the decision toward DynamoDB + a hand-rolled loop.
When orchestration isn’t wanted. Bedrock Agents is opinionated about how tools get called: it runs the loop, chooses which action group to call, and writes the reasoning. Teams that need tighter control over prompts, tool ordering, or failure modes sometimes build their own orchestration loop. Session state then has to live somewhere, and DynamoDB is the natural home: partition key sessionId, sort key turn timestamp, TTL for auto-expiry.
When state is richer than turns. Conversations aren’t the only per-session state: a shopping cart, a configured quote, or a workflow status isn’t naturally a sequence of turns. DynamoDB holds that directly, and the tools read and write it.
Neither flip applies to the two-engineer support bot. Orchestration is standard ReAct-over-tools; state is conversational. Bedrock Agents covers both.
The hybrid worth knowing. Teams using Bedrock Agents memory often add a small DynamoDB or S3 store for structured cross-session facts (ticket numbers, subscription plan, last-known issue code) that the agent needs reliably, regardless of whether they appear in a generated summary. Summary memory is the prose recall; the DynamoDB table is the structured one. A tool the agent calls to fetch it is the clean seam.
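A sketch of what that tool could look like as an action-group Lambda. Several things here are assumptions: the action group is defined with function details (so the event carries `actionGroup`, `function`, and `parameters`, and the reply uses the `functionResponse` shape), the application places the authenticated `userId` in session attributes when it calls InvokeAgent, and `USER_CONTEXT` is a hypothetical in-memory stand-in for the DynamoDB table. The function name `get_user_context` is illustrative.

```python
import json

# Hypothetical stand-in for the DynamoDB table keyed by user ID.
USER_CONTEXT = {
    "user-42": {
        "openTickets": ["T-1001"],
        "subscriptionTier": "pro",
        "lastIssueCode": "LOGIN_LOOP",
    },
}


def fetch_user_context(user_id: str) -> dict:
    """Look up structured facts for one user; empty dict if none exist."""
    return USER_CONTEXT.get(user_id, {})


def lambda_handler(event, context):
    # Take the user ID from session attributes set by the application,
    # never from model-supplied parameters, so text in the conversation
    # cannot redirect the lookup to another user's row.
    session_attrs = event.get("sessionAttributes") or {}
    user_id = session_attrs.get("userId")
    facts = fetch_user_context(user_id) if user_id else {}

    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {
                "responseBody": {
                    "TEXT": {"body": json.dumps(facts, sort_keys=True)},
                },
            },
        },
    }
```

Reading the identity from application-controlled state instead of tool parameters is one way to get per-user isolation that a prompt cannot talk its way around.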
Why Knowledge Bases is the wrong shape for conversations
Four reasons.
Chunking doesn’t match. Knowledge Bases chunk documents (fixed-size with a default of ~300 tokens, hierarchical, or semantic) on the assumption that nearby text is topically coherent. A conversation transcript has rapid speaker alternation, interleaved tool outputs, and short turns; a 300-token chunk can easily span three sub-topics and two speakers.
Retrieval relevance is topic, not speaker. A vector search for “refund” across a knowledge base of all transcripts will cheerfully return high-similarity chunks from other users’ refund conversations. Compliance problem plus correctness problem. Metadata filtering by user ID helps but has to be attached at ingestion and is less flexible than a native vector store’s.
Summaries vs transcripts. Storing raw transcripts means retrieving fragments. The correct thing to retrieve is summaries, and generating those is the job Bedrock Agents’ long-term memory already does.
GDPR is harder. Deleting a user’s data means locating every chunk that contains their content in a service-managed index, then re-ingesting. DeleteAgentMemory is one call.
Knowledge Bases are the right fit for “what does our support policy say about refunds?”: a reference corpus shared across users. They are the wrong fit for “what did this user say yesterday?”: per-user conversational state.
A worked design
- Bedrock Agent wrapping Claude Haiku 4.5: the workload is latency-sensitive and cost-sensitive, and the reasoning bar for first-line support is low enough for a small model. Action groups for account lookup, subscription status, ticket create/query. One Knowledge Base attached for the product documentation corpus (the policy memory, not the user memory).
- Session memory: on by default. sessionId is the chat-widget session, rotated on explicit “new conversation” or 30 minutes idle.
- Long-term summary memory: memoryConfiguration with SESSION_SUMMARY, storageDays: 90. memoryId is sha256(userId), stable per authenticated user, doesn’t leak the raw ID. sessionId and memoryId both passed on every InvokeAgent call.
- Structured cross-session state: a small DynamoDB table keyed by user ID, holding open ticket IDs, subscription tier, last-issue-code. A GetUserContext action group lets the agent fetch this at conversation start when relevant.
- GDPR delete: a Lambda triggered by account closure calls DeleteAgentMemory with the user’s memoryId, deletes the DynamoDB row, records an audit trail.
- Retention: summaries lapse after 90 days via storageDays.
- Monitoring: CloudWatch on InvokeAgent latency and error rate; a weekly anonymised sample of summaries reviewed for quality.
No dedicated memory database, no custom summarisation cron, no per-user vector index. The memory plumbing comes with the agent.
What’s worth remembering
- Short-term and long-term memory are different problems. Turn-level coherence within one conversation is session state; cross-visit recall is summary state. A single solution rarely does both well unless it was designed for both.
- Bedrock Agents memory covers both layers as managed functionality. Session memory is automatic: pass a sessionId. Long-term summary memory is configuration: enable SESSION_SUMMARY, pass a memoryId, set storageDays.
- memoryId scopes long-term memory to a user; sessionId scopes session memory to a conversation. Orthogonal identifiers, both passed on every InvokeAgent call when long-term memory is enabled.
- DeleteAgentMemory is the GDPR delete button. One API call, scoped to a memoryId. Retention also lapses automatically via storageDays (1 to 365).
- Knowledge Bases are for reference corpora, not conversational state. Chunking, retrieval relevance, and per-user isolation all work against using them for past-transcript recall.
- DynamoDB fits as a structured-state companion to Bedrock Agents memory. Ticket IDs, subscription tier, status flags: things the agent fetches via a tool call, not things the agent summarises in prose. A hybrid is common and clean.
- A custom vector store over conversation embeddings is flexibility that costs a team. Justified when cross-user semantic similarity is a product feature; overkill when the product just needs “remember this user”.
- Bedrock Agents includes orchestration. Action groups, knowledge bases, and the ReAct-style tool loop come with the agent. Build-your-own means rebuilding that loop: more code to own, with no better outcome for standard shapes.
The answer: use a Bedrock Agent with session memory for in-conversation coherence and long-term summary memory (SESSION_SUMMARY, memoryId per user, storageDays retention) for cross-session recall. Attach a Knowledge Base for product documentation, the reference corpus every user shares. Add a small DynamoDB table of structured per-user state (open tickets, subscription tier) behind a GetUserContext action group. Wire DeleteAgentMemory into the account-closure path for GDPR. The two engineers ship a memory system without operating a memory system.