Why (and how) to integrate an LLM into a SaaS app

Published on April 1, 2026 · 4 min read

LLM
SaaS
TypeScript
Claude
Integration

“We’d like to put some AI into our app.” I get this request every week. Most of the time, what the client actually wants is to solve a concrete business problem, not to stick a chatbot in some corner. Here’s how I sort it out.

Why do it now

Three reasons, no more:

Save user time. Summarize, classify, extract, rewrite. Tasks they do by hand today.
Unlock new use cases. Semantic search, contextual assistant, draft generation. Impossible three years ago.
Stay competitive. Your competitors already shipped it. Six months behind means a year to catch up.

A generic chatbot dropped in the corner of the UI is not a strategy, though. It’s friction without value.

The three use cases that actually work

Across my projects, these are the only ones that produce a measurable ROI.

Targeted text transformation

The user has one piece of text and wants another. Ticket summary, email rewrite, extraction from a PDF. On-demand invocation, clear context, structured output. Simple, profitable.
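"Structured output" only pays off if the app checks what comes back before using it. A minimal sketch, assuming the model was asked to return JSON with two fields; the shape `TicketSummary` and the helper `parseTicketSummary` are illustrative names, not part of any SDK.

```typescript
// Hypothetical output shape for a ticket-summary feature.
interface TicketSummary {
  summary: string;
  priority: "low" | "medium" | "high";
}

const PRIORITIES = ["low", "medium", "high"] as const;

// Parse the model's raw text and reject anything that does not match the shape.
// Returns null instead of throwing so the caller can retry or show a fallback.
function parseTicketSummary(raw: string): TicketSummary | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model returned something that is not JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { summary, priority } = data as Record<string, unknown>;
  if (typeof summary !== "string" || summary.trim() === "") return null;
  if (typeof priority !== "string") return null;
  if (!(PRIORITIES as readonly string[]).includes(priority)) return null;

  return { summary, priority: priority as TicketSummary["priority"] };
}
```

A null result is a signal to retry once or fall back to the manual flow, rather than to store a half-valid object.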

Search and Q&A on your data

Question in natural language, answer based on the account’s data. That’s RAG. Useful for internal docs, knowledge bases, ticket archives. Not to be confused with “a chatbot on the website.”
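The retrieval half of RAG can be sketched without any provider: rank the account's chunks by cosine similarity to the question's embedding, then put the best ones in the prompt. This assumes embeddings are computed elsewhere (not shown); `Chunk`, `topK` and `buildPrompt` are hypothetical names, and a real system would likely use a vector index instead of a linear scan.

```typescript
interface Chunk {
  accountId: string;
  text: string;
  embedding: number[]; // computed ahead of time by an embedding model (not shown)
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only the requesting account's chunks, then return the k most similar.
function topK(query: number[], chunks: Chunk[], accountId: string, k: number): Chunk[] {
  return chunks
    .filter((c) => c.accountId === accountId)
    .map((c) => ({ chunk: c, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.chunk);
}

// Number the sources so the answer can point back to them.
function buildPrompt(question: string, context: Chunk[]): string {
  const sources = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the sources below.\n\nSources:\n${sources}\n\nQuestion: ${question}`;
}
```

Filtering on `accountId` before ranking is the design choice that matters here: the answer must be based on the account's data, never on another tenant's.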

Assisted structured generation

The user wants to create a business object: product, campaign, template. The LLM pre-fills it from a short brief, and the human edits. Measure the acceptance rate; never automate blindly.
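One way to measure that acceptance rate, as a sketch: record what happened to each generated draft and compute the share the user kept. The outcome names and the `DraftEvent` shape are assumptions for illustration.

```typescript
type DraftOutcome = "accepted" | "edited" | "discarded";

interface DraftEvent {
  feature: string; // e.g. "product-prefill"
  outcome: DraftOutcome;
}

// Share of drafts the user kept (as-is or after editing) for one feature.
// Returns null when there is no data yet, to avoid reporting a misleading 0%.
function acceptanceRate(events: DraftEvent[], feature: string): number | null {
  const relevant = events.filter((e) => e.feature === feature);
  if (relevant.length === 0) return null;
  const kept = relevant.filter((e) => e.outcome !== "discarded").length;
  return kept / relevant.length;
}
```

Tracking "edited" separately from "accepted" also shows how much rework the drafts need, which a single kept-or-not number hides.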

Minimal architecture

The structure I deploy first on a Next.js or Hono project:

// app/api/llm/summarize/route.ts
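import Anthropic from "@anthropic-ai/sdk";
// Hypothetical module paths: adjust to wherever these helpers live in your project.
import { checkQuota, logUsage } from "@/lib/llm/usage";
import { SUMMARIZE_SYSTEM_PROMPT } from "@/lib/llm/prompts";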

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function POST(req: Request) {
  // In production, take userId from the authenticated session, not from the request body.
  const { text, userId } = await req.json();

  // 1. Per-user quotas
  const allowed = await checkQuota(userId);
  if (!allowed) {
    return Response.json({ error: "quota_exceeded" }, { status: 429 });
  }

  // 2. LLM call with versioned system prompt
  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    system: SUMMARIZE_SYSTEM_PROMPT,
    messages: [{ role: "user", content: text }],
  });

  // 3. Log usage for internal pricing
  await logUsage(userId, response.usage);

  // Guard against an empty response or a non-text first content block.
  const first = response.content[0];
  const summary = first?.type === "text" ? first.text : "";

  return Response.json({ summary });
}
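The route calls `checkQuota` and `logUsage` without showing them. Here is one possible shape for both, as an in-memory sketch: the limit of 50 calls per day is a made-up number, the `Map` stands in for Redis or a database table, and the `Usage` fields mirror the `input_tokens` and `output_tokens` the Messages API reports.

```typescript
const DAILY_LIMIT = 50; // hypothetical per-user limit
const counters = new Map<string, number>(); // stand-in for Redis or a database table

function dayKey(userId: string, now: Date): string {
  return `${userId}:${now.toISOString().slice(0, 10)}`; // e.g. "u1:2026-04-01" (UTC day)
}

// Counts the call and reports whether the user is still under today's limit.
// Not atomic across processes: a shared store would need an atomic increment.
async function checkQuota(userId: string, now: Date = new Date()): Promise<boolean> {
  const key = dayKey(userId, now);
  const used = counters.get(key) ?? 0;
  if (used >= DAILY_LIMIT) return false;
  counters.set(key, used + 1);
  return true;
}

interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface UsageRecord extends Usage {
  userId: string;
  at: string; // ISO timestamp
}

const usageLog: UsageRecord[] = []; // stand-in for a usage table

async function logUsage(userId: string, usage: Usage, now: Date = new Date()): Promise<void> {
  usageLog.push({
    userId,
    at: now.toISOString(),
    input_tokens: usage.input_tokens,
    output_tokens: usage.output_tokens,
  });
}
```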

Four non-negotiables (the route above shows the first three):

Per-user quotas. Otherwise a single client drains your account overnight. I’ve seen it happen.
Versioned system prompt. You’ll change it ten times in the first month.
Usage logging. Tokens, cost, latency. Without this, you’re flying blind.
Clean error handling. Retry with backoff, fallback provider. Anthropic 500s do happen.
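The fourth point, retry with backoff and a fallback provider, can be sketched as a small wrapper. The function name, the default of two retries and the 500 ms base delay are assumptions; the Anthropic SDK also exposes a `maxRetries` client option, so check what the client already retries before stacking this on top.

```typescript
interface RetryOptions {
  retries: number; // extra attempts after the first call
  baseDelayMs: number; // delay before the first retry; doubles each time
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `primary` with exponential backoff, then try `fallback` once.
// `isRetryable` is supplied by the caller (for example: HTTP 5xx or 429).
async function withRetryAndFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  opts: RetryOptions = { retries: 2, baseDelayMs: 500 },
): Promise<T> {
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await primary();
    } catch (err) {
      if (!isRetryable(err)) throw err; // e.g. a 400: retrying will not help
      if (attempt < opts.retries) {
        await sleep(opts.baseDelayMs * 2 ** attempt);
      }
    }
  }
  return fallback(); // primary exhausted all its attempts
}
```

Non-retryable errors are rethrown immediately on purpose: a malformed request will fail on the fallback provider too, and hiding it only delays the fix.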

Classic pitfalls

Streaming everything without thinking

Streaming is useful on long interactive responses. Useless on a 200 ms structured extraction. Pick based on UX, not on what’s trendy.

Forgetting the cache

Two users, same question, same document. Paying twice makes no sense. A Redis cache keyed on hash(prompt + context) cuts 30 to 70% of costs on repetitive features. Include the system prompt version in the key so that a prompt change does not serve stale answers.
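A minimal sketch of that cache, assuming Node's built-in `crypto` module for the hash and a `Map` standing in for Redis. `cachedComplete` and its `callModel` parameter are hypothetical names; the prompt version is part of the key so a new prompt does not reuse old answers.

```typescript
import { createHash } from "node:crypto";

// Stand-in for Redis: a real deployment would use GET/SET with a TTL.
const cache = new Map<string, string>();

// Hash everything that changes the answer. JSON.stringify on an array keeps
// the parts separate, so ("ab", "c") and ("a", "bc") produce different keys.
function cacheKey(promptVersion: string, prompt: string, context: string): string {
  return createHash("sha256")
    .update(JSON.stringify([promptVersion, prompt, context]))
    .digest("hex");
}

async function cachedComplete(
  promptVersion: string,
  prompt: string,
  context: string,
  callModel: (prompt: string, context: string) => Promise<string>,
): Promise<string> {
  const key = cacheKey(promptVersion, prompt, context);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // identical request: no second LLM call

  const result = await callModel(prompt, context);
  cache.set(key, result);
  return result;
}
```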

Ignoring perceived latency

An LLM call takes 2 to 10 seconds. Without a skeleton screen or streaming, your users assume the app is broken. Test on 4G, not on your fiber connection.

Coupling yourself to a single provider

I wire Claude and OpenAI behind the same interface from day one. When a provider goes down (it happens) or triples its prices (also happens), I switch with one env variable.

interface LLMProvider {
  complete(prompt: string, opts: CompleteOpts): Promise<string>;
}

class ClaudeProvider implements LLMProvider { /* ... */ }
class OpenAIProvider implements LLMProvider { /* ... */ }

const llm: LLMProvider = pickProvider(process.env.LLM_PROVIDER);
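The snippet above leaves `CompleteOpts` and `pickProvider` undefined. One possible way to fill them in, as a sketch: the option fields are guesses, and the two stub classes stand in for the real SDK-backed providers so the example runs on its own.

```typescript
// Hypothetical option shape; the original snippet does not define CompleteOpts.
interface CompleteOpts {
  maxTokens: number;
  system?: string;
}

interface LLMProvider {
  complete(prompt: string, opts: CompleteOpts): Promise<string>;
}

// Stubs standing in for the real SDK-backed classes.
class StubClaudeProvider implements LLMProvider {
  async complete(prompt: string, _opts: CompleteOpts): Promise<string> {
    return `claude:${prompt}`;
  }
}

class StubOpenAIProvider implements LLMProvider {
  async complete(prompt: string, _opts: CompleteOpts): Promise<string> {
    return `openai:${prompt}`;
  }
}

const registry = new Map<string, () => LLMProvider>([
  ["claude", () => new StubClaudeProvider()],
  ["openai", () => new StubOpenAIProvider()],
]);

// Throw on an unknown name so a typo in the env variable fails at startup,
// not at the first user request.
function pickProvider(name: string | undefined, defaultName = "claude"): LLMProvider {
  const key = (name ?? defaultName).toLowerCase();
  const factory = registry.get(key);
  if (!factory) throw new Error(`Unknown LLM provider "${key}"`);
  return factory();
}
```

Keeping the interface this narrow (one `complete` method) is what makes the switch cheap; provider-specific features leaking into it would bring the coupling back.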

The real topic: cost

An LLM feature that works technically can wreck your unit economics. Before going to prod:

Estimate the number of calls per user per day.
Multiply by average cost (input + output tokens).
Compare to your margin per user.

If LLM cost exceeds 20 to 30% of that margin, you have to make a call: cache, use a smaller model for simple cases, rate-limit by plan, or bill by usage. I prefer an “AI as a paid add-on” plan over diluting the standard plan’s margin: it’s more honest for both the customer and the business.
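The three estimation steps fit in a few lines. This is a sketch with placeholder numbers: the per-million-token prices below are illustrative, not any provider's actual price list, and the field names are mine.

```typescript
interface CostInputs {
  callsPerUserPerDay: number;
  avgInputTokens: number;
  avgOutputTokens: number;
  inputPricePerMTok: number; // currency units per million input tokens
  outputPricePerMTok: number; // currency units per million output tokens
  marginPerUserPerMonth: number;
  daysPerMonth?: number; // defaults to 30
}

// Steps 1 and 2: calls per day times average cost per call, over a month.
function monthlyLlmCostPerUser(c: CostInputs): number {
  const perCall =
    (c.avgInputTokens * c.inputPricePerMTok + c.avgOutputTokens * c.outputPricePerMTok) /
    1_000_000;
  return perCall * c.callsPerUserPerDay * (c.daysPerMonth ?? 30);
}

// Step 3: compare to the margin per user.
function shareOfMargin(c: CostInputs): number {
  return monthlyLlmCostPerUser(c) / c.marginPerUserPerMonth;
}

// Placeholder scenario: replace every number with your own measurements.
const scenario: CostInputs = {
  callsPerUserPerDay: 20,
  avgInputTokens: 2000,
  avgOutputTokens: 500,
  inputPricePerMTok: 3,
  outputPricePerMTok: 15,
  marginPerUserPerMonth: 30,
};

console.log(monthlyLlmCostPerUser(scenario).toFixed(2)); // monthly cost per user
console.log((shareOfMargin(scenario) * 100).toFixed(0) + "%"); // share of margin
```

With these placeholder inputs, each call costs 0.0135, which adds up to about 8.10 per user per month, or 27% of a margin of 30: right in the zone where a decision is needed.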

Where to start

My 2- to 4-week protocol:

Week 1. Pick one use case that replaces a clear manual action. Not “a chatbot,” but “auto-summarize the weekly reports.”
Week 2. Local prototype, prompt iteration, 10 to 20 real cases.
Week 3. Integration with quotas, logs, fallback provider.
Week 4. Production behind a feature flag, gradual rollout.
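For the week 4 rollout, a percentage-based flag is enough to start. A sketch, assuming no feature-flag service: hash the flag name and user ID into a bucket from 0 to 99 and enable the feature below a threshold. The hash is an FNV-1a style function over UTF-16 code units, and `isEnabled` is a hypothetical helper.

```typescript
// FNV-1a style 32-bit hash: small, deterministic, no dependencies.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// The same user and flag always land in the same bucket (0 to 99), so raising
// the percentage only ever adds users and never switches anyone off.
function isEnabled(flag: string, userId: string, rolloutPercent: number): boolean {
  const bucket = fnv1a(`${flag}:${userId}`) % 100;
  return bucket < rolloutPercent;
}
```

Including the flag name in the hashed string means different features get different early cohorts, so the same users are not always the first to hit new code.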

That’s the framework I use to ship LLM integrations on European B2B SaaS in under a month, without blowing up the cost structure. If you have a concrete case in mind, let’s talk on a 30-minute call.