Buy, Borrow, Build

April 29, 2026 · 16 min read

AI Practitioner · AIF-C01 · part of The Exam Room

A product manager at a B2B SaaS has been told by their CEO to “add AI to the product.” They have no machine-learning background, a team that has never trained a model, and a fortnight to show something. They’ve heard of Amazon Bedrock, Amazon SageMaker, and a grab-bag of named services – Comprehend, Translate, Textract, Rekognition. Those three categories aren’t alternatives for the same problem; they’re three different shapes of AI offering, and the correct one for any given feature depends on whether a ready-made service already does the job.

The situation

A mid-sized B2B SaaS runs a help-desk product. Customers raise support tickets in a web form; the text flows through queues, is triaged by a rules engine, and lands in an agent’s inbox. The CEO has returned from a conference with a directive: “add AI.” Board presentation in three weeks.

The PM has a backlog of half-formed ideas – six of them:

  1. Sentiment on inbound tickets – flag angry customers so agents prioritise them.
  2. Auto-translate tickets from non-English customers into the agent’s language, and the agent’s reply back.
  3. Extract structured fields from attached PDFs – invoices, purchase orders – so agents don’t retype.
  4. Moderate screenshots for anything NSFW before a human sees them.
  5. Draft suggested replies based on the ticket and the knowledge base.
  6. Rank the backlog by predicted priority using last year’s labelled tickets.

Six features. The board wants “AI.” The PM wants a plan that picks the shortest path for each.

Constraints are harsh but familiar: no ML background on the team (three backend engineers, one front-end, no data scientist); fast time-to-prototype (something running against real tickets within the fortnight); predictable cost (a line item finance can sign off without a model-unit-hour forecast); and no infrastructure the team has to babysit (managed endpoints, not GPU fleets).

What we might want from this

Before reaching for a service, it’s worth being honest about what “add AI” actually has to mean for a team without ML staff.

The first thing is that the cheapest AI feature is one AWS has already built. If the task is “flag angry emails” or “pull fields off an invoice,” those are problems many companies have; there’s a fair chance AWS has shipped a service that does exactly that. Starting at that layer – managed, task-specific, one API call – beats anything bespoke on time-to-prototype, cost, latency, and stability of behaviour. Only if no pre-built answer exists does it make sense to go up a layer.

The second is what happens the week after launch. A prototype is easy; a prototype the team has to keep alive for a year is harder. A managed service that AWS updates is no-maintenance. A foundation-model prompt on Bedrock is low-maintenance but drifts when the vendor retrains. A bespoke SageMaker model is high-maintenance – training data, drift monitoring, endpoint scaling, retraining cadence. The PM’s fortnight should generate something closer to the first shape than the third.

The third is cost predictability. Finance wants a line item. Per-request, per-page, per-character, per-token pricing gives a line item; it scales with use, which is usually fine. Per-instance-hour pricing for inference endpoints is a capacity forecast – uncomfortable when the product is new and the load is unknowable. Training jobs are per-compute-hour spikes with no guarantee the output model is actually good. The shape of the bill has to match the predictability of the product.

The fourth is time-to-quality. “Working prototype” is not “good enough to ship.” A managed service ships with AWS’s quality baseline; a tuned prompt ships with whatever the tuner can squeeze out of a general model; a bespoke model ships with whatever the data supports and the team can evaluate. The team has no evaluation muscle – which means the further up the custom stack they go, the longer they spend not sure if it’s good enough.

The fifth is how many of these features are really the same problem. Sentiment and moderation and field extraction are recognisably distinct; “draft a reply” and “rank by priority” look different from those and different from each other. A programme plan that picks the shortest path per feature – not the same path for all six – gets to shipped faster than one that forces everything through one layer.

Finally, the correct answer for one is not the correct answer for all. Some of the backlog is Layer 1 (managed service exists); some is Layer 2 (general-purpose LLM with a prompt); some is Layer 3 (bespoke model). The programme has three shapes of work, not one. The PM’s job for the fortnight is sorting the six features into those three buckets and picking the bucket that actually ships.

The attributes that matter

  1. Task already solved – does AWS ship a service that does this exact thing?
  2. No ML expertise needed – team writes application code, doesn’t train models.
  3. Predictable usage-based pricing – per-request, per-page, per-token. Scales with use.
  4. Fully managed – no EC2, no containers, no endpoints the team provisions.
  5. Time to prototype – days, not weeks.

The AWS AI landscape

AWS groups its AI offerings into three layers.

Layer 1 – Managed AI services. Pre-built, task-specific: detect sentiment, translate text, extract data from a form, find faces in an image. AWS trained the model, AWS hosts it, AWS updates it. The service is the feature.

Layer 2 – Amazon Bedrock. A serverless API over a catalogue of general-purpose foundation models from Anthropic, Amazon (Nova and Titan), Meta, Mistral, Cohere, AI21, Stability, and others. One API, many models. Pick the model, write the prompt, pay per token.

Layer 3 – Amazon SageMaker AI. The platform for building, training, and hosting your own models. Notebooks, training jobs, inference endpoints, batch transform, feature stores, model registries. Pay per compute-hour at every stage. Where a data-science team lives when no pre-built answer fits.

The attribute table

Layer                  Pre-built task   No ML expertise   Predictable pricing   Fully managed
Managed AI services    yes              yes               yes                   yes
Bedrock                no               yes               yes                   yes
SageMaker AI           no               no                no                    no

The rule of thumb: for each feature, work down the layers. Start at Layer 1; move to Layer 2 only if no managed service fits; reach Layer 3 only if neither will do.

Matching six features to three layers

Layer 1 – Managed task-specific services. Four features fit here – sentiment, translation, field extraction, moderation – one SDK call each. AWS picked the model. Pricing is per-character, per-page, or per-image, with a free tier on each service; latency is low and behaviour stable. Comprehend, Translate, Textract, Rekognition: ship in a fortnight, under $50/month combined, zero ML expertise needed.

Layer 2 – Bedrock foundation models, prompt-driven. One feature fits here – draft suggested replies. No DraftReply API exists, so it's a general LLM plus a prompt plus the knowledge base, and the team picks the model ID. Pricing is per input and per output token: Haiku cheap, Sonnet mid, Opus top. Knowledge Bases and Guardrails come with the SDK. Bedrock Converse, start on Claude Haiku, upgrade if quality demands, Knowledge Base for the company docs – one prompt, one API call.

Layer 3 – SageMaker bespoke models, capacity-priced. One feature wants this – priority ranking over labelled tabular history: classical supervised learning. The team builds the model. Per-instance-hour training, per-instance-hour endpoints, drift monitoring required. Defer, or prototype roughly in Canvas with Autopilot – no data scientist today, so it's a rough prototype or wait for the hire. Not the fortnight plan.
Five features ship in a fortnight across two layers; the sixth waits for the correct team. Three layers, work top-down, and the plan writes itself.

The managed AI services in depth

The Layer 1 catalogue is worth knowing by name. Each has a scope, an SDK call, and a unit of billing you can put on a napkin.

Amazon Comprehend. NLP over text: sentiment, entities, key phrases, language detection, PII, topic modelling. Billed in units of 100 characters, three-unit minimum per request, $0.0001 per unit for the first 10M. Free tier: 50,000 units/month for 12 months. Where ticket sentiment lives.

Amazon Translate. Machine translation across 75+ languages, real-time and batch. $15 per million characters for standard. Free tier: 2M characters/month for 12 months.

Amazon Textract. Extracts text, handwriting, tables, and form data from documents. DetectDocumentText at $0.0015/page (raw OCR), AnalyzeDocument at $0.015/page (tables) or $0.05/page (form key-value pairs). Invoices use AnalyzeExpense at $0.01/page. Free tier: 1,000 pages/month for three months.

Amazon Rekognition. Image and video analysis: label detection (10,000+ categories), face detection and comparison, content moderation, OCR-in-images. $0.001/image for the first million. Free tier: 1,000 images/month, per API group, for 12 months. NSFW moderation is a single DetectModerationLabels call.

Amazon Transcribe. Speech-to-text. Per second (15-second minimum); standard starts at $0.024/minute. Free tier: 60 minutes/month for 12 months.

Amazon Polly. Text-to-speech. Standard voices $4/M characters; neural $16/M characters.

Amazon Lex. Conversational bots. $0.004/speech request, $0.00075/text request.

Amazon Kendra. Enterprise semantic search. Priced per index-hour (GenAI Enterprise from $0.32/hour) – unusually for Layer 1, a capacity price rather than a usage price.

Amazon Personalize. Recommendations. V2 real-time at $0.15/1,000 requests.

Amazon Fraud Detector and Amazon Forecast: pre-built online-fraud scoring and time-series forecasting, respectively.

Against the backlog, four of the six ideas find a managed-service home:

  • Sentiment to Comprehend DetectSentiment.
  • Auto-translate to Translate TranslateText.
  • Extract fields from PDFs to Textract AnalyzeDocument or AnalyzeExpense.
  • Moderate screenshots to Rekognition DetectModerationLabels.

Four features, four SDK calls, four line items.
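
What those four calls look like in practice – a minimal boto3 sketch, assuming default credentials and region; the ticket text, file names, and thresholds are placeholders:

```python
import boto3

comprehend = boto3.client("comprehend")
translate = boto3.client("translate")
textract = boto3.client("textract")
rekognition = boto3.client("rekognition")

ticket_text = "The export has been broken for two days and nobody has replied."

# 1. Sentiment: one call, returns POSITIVE / NEGATIVE / NEUTRAL / MIXED plus scores.
sentiment = comprehend.detect_sentiment(Text=ticket_text, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"]["Negative"])

# 2. Translation: auto-detect the source language, translate into the agent's.
translated = translate.translate_text(
    Text=ticket_text, SourceLanguageCode="auto", TargetLanguageCode="en"
)
print(translated["TranslatedText"], "(was", translated["SourceLanguageCode"] + ")")

# 3. Invoice fields: AnalyzeExpense returns typed key-value pairs and line items.
#    Synchronous calls take image or single-page document bytes; multi-page PDFs
#    go through the asynchronous StartExpenseAnalysis API instead.
with open("invoice.png", "rb") as f:
    expense = textract.analyze_expense(Document={"Bytes": f.read()})
for doc in expense["ExpenseDocuments"]:
    for field in doc["SummaryFields"]:
        print(field["Type"]["Text"], "=", field.get("ValueDetection", {}).get("Text"))

# 4. Moderation: labels like "Explicit Nudity" above the confidence threshold.
with open("screenshot.png", "rb") as f:
    moderation = rekognition.detect_moderation_labels(
        Image={"Bytes": f.read()}, MinConfidence=80
    )
flagged = [label["Name"] for label in moderation["ModerationLabels"]]
```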

When Bedrock is the answer

Two ideas don’t match a managed service.

“Draft suggested replies based on the ticket and the knowledge base.” There’s no DraftReply API – the tone, the structure, the policy constraints, and the knowledge base are all specific to this company. But it is exactly what a general-purpose language model is for.

Bedrock’s shape: one Converse API call, pick a model ID, pass the prompt, get the generation. Per-token pricing, input and output charged separately. No training.
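
A minimal sketch of that shape – the model ID is illustrative (use whatever is enabled in your account), and the ticket and KB excerpt are placeholder strings:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

ticket_text = "Your export feature has been broken for two days."
kb_excerpt = "Exports run nightly; failed jobs can be re-queued from Settings > Data."

# The model is a runtime parameter: switch Haiku to Sonnet by changing this string.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

response = bedrock.converse(
    modelId=MODEL_ID,
    system=[{"text": "You draft concise, polite support replies."}],
    messages=[{
        "role": "user",
        "content": [{"text": f"Ticket:\n{ticket_text}\n\n"
                             f"KB excerpt:\n{kb_excerpt}\n\nDraft a reply."}],
    }],
    inferenceConfig={"maxTokens": 400, "temperature": 0.3},
)

draft = response["output"]["message"]["content"][0]["text"]
usage = response["usage"]  # inputTokens / outputTokens -> the per-token line item
```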

A sample of 2026 on-demand prices, for scale rather than memorisation:

  • Claude Haiku – cheap, fast. Roughly $1/$5 per million input/output tokens.
  • Claude Sonnet – the mid tier most production RAG lands on. Roughly $3/$15.
  • Claude Opus – premium. Roughly $15/$75.
  • Amazon Nova Lite – Amazon’s own cheap tier, roughly $0.06/$0.24.
  • Meta Llama 3.1 70B – open-weight, competitive mid tier.

Two things matter for the PM. First, the model is a runtime parameter, not an architectural commitment. Switch Haiku to Sonnet via the model ID. Start cheap, upgrade only if quality doesn’t clear the bar. Second, Bedrock is serverless and per-token – same “no infrastructure, predictable cost” shape as the managed services, just with the model you chose.

Bedrock’s adjacent features – Knowledge Bases for RAG, Guardrails for content safety, Agents for tool-using workflows – are available without leaving the SDK. A first-cut reply drafter is Bedrock plus a Knowledge Base pointed at the docs store.
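
Wired together, that first cut can be a single call to the Bedrock agent runtime – a sketch assuming a Knowledge Base already synced to the docs store; the KB ID, region, and model ARN are placeholders:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "Customer asks how to rotate an API key. Draft a reply."},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB12345678",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(response["output"]["text"])  # the grounded draft
print(response["citations"])       # which KB chunks it leaned on
```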

When SageMaker is the answer

One idea fits neither Layer 1 nor Layer 2.

“Rank the backlog by predicted priority using last year’s labelled tickets.” No managed service exists for priority prediction; priority is company-specific. Bedrock could be asked to assign priority via a prompt, but the input is tabular: structured ticket features plus a labelled history. Classical supervised learning, not generation.

SageMaker’s parts: Studio/notebooks for exploration (per-instance-hour); training jobs (per-instance-hour); real-time inference endpoints (persistent, per-hour); serverless inference; batch transform; Autopilot and Canvas for teams without deep ML expertise (lower skill bar, not lower infrastructure).

What makes it distinctively Layer 3 – not a fancier Bedrock – is the shape of the work. Training a priority classifier means feature engineering, labelled data, train/test splits, hyperparameter tuning, evaluation metrics, drift monitoring. SageMaker is the toolset for doing that properly. Without the skills, SageMaker is a very expensive Jupyter notebook.

The honest answer: defer the priority-ranking feature. Ship the other five on Layers 1 and 2, come back when a data scientist exists – or prototype in SageMaker Canvas with Autopilot and accept a rougher quality bar. Don’t stand up a training pipeline just to tick the “we have AI” box.
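
If the rough prototype route is taken anyway, the programmatic equivalent of Canvas-plus-Autopilot is an AutoML V2 job – a minimal sketch, assuming a labelled CSV already sits in S3; the bucket, role ARN, and `priority` column name are all placeholders:

```python
import boto3

sm = boto3.client("sagemaker")

# One row per ticket, structured features plus a "priority" label column.
sm.create_auto_ml_job_v2(
    AutoMLJobName="ticket-priority-prototype",
    AutoMLJobInputDataConfig=[{
        "ChannelType": "training",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/tickets/labelled.csv",  # placeholder
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://example-bucket/autopilot-output/"},
    AutoMLProblemTypeConfig={"TabularJobConfig": {"TargetAttributeName": "priority"}},
    RoleArn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # placeholder
)
# Poll describe_auto_ml_job_v2 for the best candidate. Deploying it means a
# per-instance-hour endpoint -- the capacity-priced bill the team was avoiding.
```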

A worked trace through the backlog

Sentiment. Comprehend DetectSentiment. 500-char ticket = five units at $0.0001 = $0.0005/ticket. 10,000 tickets/month = $5 before the free tier. Layer 1.

Auto-translate. Translate, both directions. 500 chars at $15/M = $0.0075 per translation. A thousand tickets plus a thousand replies a month = 2,000 translations = $15. Layer 1.

Extract invoice PDFs. Textract AnalyzeExpense at $0.01/page. 500 attachments × 2 pages = $10. Layer 1.

Moderate screenshots. Rekognition DetectModerationLabels at $0.001/image. 2,000/month = $2. Layer 1.

Draft replies. Bedrock Converse to Claude Haiku. Ticket (~700 tokens) + KB excerpt (~1,500) + draft (~300 out) = 2,200 in + 300 out. At Haiku’s ~$1/$5 per million: ~$0.004/draft. 1,000/month: ~$4. Upgrade to Sonnet proportionally if quality disappoints. Layer 2.

Rank backlog. Deferred; or, if a Canvas prototype is acceptable, tens of dollars in training compute plus a small endpoint. Layer 3.

Total running cost for the five shipped features: well under $50/month.
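
The whole trace fits in a few lines of arithmetic – a sanity-check sketch assuming the volumes quoted above, list prices, and no free tiers:

```python
# Monthly volumes from the worked trace above.
tickets, translations, invoice_pages, screenshots, drafts = 10_000, 2_000, 1_000, 2_000, 1_000

costs = {
    "sentiment":  tickets * 5 * 0.0001,           # 5 Comprehend units per 500-char ticket
    "translate":  translations * 500 * 15 / 1e6,  # 500 chars each way at $15/M
    "invoices":   invoice_pages * 0.01,           # AnalyzeExpense per page (500 x 2 pages)
    "moderation": screenshots * 0.001,            # DetectModerationLabels per image
    "drafts":     drafts * (2_200 * 1 + 300 * 5) / 1e6,  # Haiku $1 in / $5 out per M tokens
}
print(costs, sum(costs.values()))  # ~{5, 15, 10, 2, 3.7} -> about $36/month
```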

Where Bedrock and managed services overlap

Some features could be done at either Layer 1 or Layer 2. Sentiment is the classic case. Comprehend has a dedicated trained classifier; Bedrock can classify via a prompt. Which is correct?

Rule of thumb: prefer the managed service when one exists.

  1. Cost. Comprehend at $0.0001/100-character unit beats Bedrock per-token pricing for short classification tasks by an order of magnitude.
  2. Latency. A purpose-built endpoint beats a general-purpose LLM parsing the instruction every time.
  3. Behaviour stability. Comprehend’s API doesn’t change when AWS retrains; a Bedrock prompt drifts when the vendor ships a new model version.

The flip side: when the task is bespoke or the classes aren’t standard, Bedrock wins. Classify tickets into seven company-specific categories that don’t map onto Comprehend’s entity types? Prompt the model. The taxonomy is in the prompt, not hidden in a service’s training data.
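
A sketch of that open-schema case – the seven-category taxonomy is hypothetical, and the model ID is illustrative:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # illustrative

# The company-specific taxonomy lives in the prompt, not in a service's training data.
CATEGORIES = "billing, export-bug, sso, onboarding, api-limits, data-import, other"

response = bedrock.converse(
    modelId=MODEL_ID,
    system=[{"text": f"Classify the support ticket into exactly one of: {CATEGORIES}. "
                     "Reply with the category name only."}],
    messages=[{"role": "user", "content": [{"text": "Can't log in since the SSO change."}]}],
    inferenceConfig={"maxTokens": 10, "temperature": 0},
)
category = response["output"]["message"]["content"][0]["text"].strip()
```

Changing the taxonomy is a prompt edit, not a retraining job – which is the whole argument for Bedrock on bespoke classification.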

Not “always managed” or “always Bedrock” – fixed-schema tasks where the managed service fits, open-schema tasks where instruction-following is the feature.

What’s worth remembering

  1. AWS AI sorts into three layers. Managed services for pre-built tasks. Bedrock for foundation-model access. SageMaker for custom training and hosting. Work down from the top.
  2. The managed-service catalogue. Comprehend (text NLP), Translate, Textract (documents), Rekognition (images/video), Transcribe (speech to text), Polly (text to speech), Lex (chatbots), Kendra (search), Personalize (recommendations), Fraud Detector, Forecast.
  3. Per-unit pricing shapes by service. Per-100-character (Comprehend), per-million-character (Translate, Polly), per-page (Textract), per-image (Rekognition), per-second (Transcribe), per-request (Lex), per-index-hour (Kendra), per-token (Bedrock), per-instance-hour (SageMaker).
  4. Bedrock is serverless and per-token. One API over many foundation models. The model ID is a runtime parameter.
  5. SageMaker is the build layer, not the buy layer. Reserve it for tasks that don’t exist in Layers 1 or 2. Canvas and Autopilot lower the skill bar, not the infrastructure layer.
  6. Managed services beat Bedrock for fixed-schema, common tasks. Lower cost, lower latency, more stable. Use Bedrock when bespoke.
  7. Most free tiers are per-month, 12 months, new-customer. Enough to prototype every feature without hitting billing.
  8. The “no ML background” constraint filters aggressively. It eliminates SageMaker from the default answer for most problems and pushes teams toward Layers 1 and 2.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.