I upgraded an LLM SDK and expected a routine version bump.
Instead, I had to touch 15+ files, fix breaking changes across four providers, and spend the rest of the day hoping I hadn’t missed one.
That was the second time. I knew there’d be a third.
So I made a rule: Only 2 files in the entire codebase are allowed to import the LLM SDK.
One adapter, one provider registry. Everything else (18 activity files, 10 agent files, all workflows) operates without knowing which provider is being called, which model is selected, or how the SDK works.
A week later, the results:
- The next SDK upgrade touched 2 files. All 28 other files were unchanged.
- Provider switching became a config change, not a refactor.
- The final migration deleted more code than it added: 192 insertions, 688 deletions.
But something mattered more than the SDK fix. Solving this problem exposed a bigger one:
- I wasn’t just calling LLMs in 21 different places.
- I was reimplementing the same 7 cognitive operations (classify, draft, score, summarize, extract, plan, analyze) with slight variations everywhere.
- Fixing that turned out to be more valuable than the SDK isolation itself.
Here’s the full story.
The Problem: API/SDK Coupling, Concretely!#
My project is a personal assistant that runs 24/7 and handles many tasks: email triage, meeting prep, morning briefings, CRM updates, social media intelligence, job applicant scoring, and more, along with a natural-language workflow engine. Everything is orchestrated by my own durable workflow engine and powered by LLM calls.
Before the refactor, the dependency flow looked like this:
Activity files (18+)
│ generateText(), generateObject(), tool()
│ each file imports from "ai" and "@ai-sdk/*"
▼
model-router.ts
│ selectModel() → budget gate → provider
│ also imports from "ai" and "@ai-sdk/*"
▼
@ai-sdk/anthropic, @ai-sdk/openai, and other providers

The key issue: the SDK appears in both the routing layer and the activity layer.
Every activity file had SDK calls inline. The ModelRouter service handled model selection and budget gating, but it returned raw SDK model objects that activities then used directly. The SDK was everywhere.
Three problems:#
API/SDK upgrades were expensive:
Vercel AI SDK v4 to v6 renamed maxTokens to maxOutputTokens, changed CoreMessage to ModelMessage, deprecated generateObject(), and renamed args to input on tool calls. Mechanical changes, but across 15+ files with different usage patterns. One missed rename meant a runtime crash, not a compile error.
Prompt patterns were duplicated:
Five activities classified content with five different prompt structures. Nine activities drafted outbound messages with nine different tone injections. When I improved a classification prompt in one place, I had to remember to update it in four other places. I often forgot.
The router abstracted selection but not usage:
The model router centralized which model to pick, but activities still held SDK-specific types. Switching a classification task from Claude Sonnet to GptOSS still meant changing import paths and type signatures in the activity code. Partial abstraction is worse than none. It hides complexity without removing it.
Most teams building production LLM systems end up in one of two places: either calling the SDK directly everywhere (tight coupling), or wrapping it in thin helpers that still leak provider details into business logic. LangChain-style abstractions try to solve this, but often replace one dependency with another. The goal here was different: make business logic completely unaware that an LLM SDK exists.
The Solution: Ports and Adapters for Multi-Provider Production LLM Systems#
The fix was not a new library. It was a boundary.
It follows the ports-and-adapters pattern, also known as hexagonal architecture (Alistair Cockburn).
If you’ve done backend development, you’ve probably used this for databases or HTTP clients. The insight is that it works just as well for LLM providers.
The architecture separates three concerns:
- Intent: what the system wants done (classify this email, draft this reply)
- Execution: how a provider is called (SDK-specific API, type translations)
- Policy: which model is used, cost limits, fallback chains (config-driven)
Here’s how the layers map:
Activities / Agents / NL Workflows
│ llmClassify(), llmDraft(), llmScore(), llmSummarize()
│ llmExtract(), llmPlan(), llmAnalyze()
▼
LLM Capabilities (src/ai/capabilities/)
│ Standardized prompts, temperature, token budgets, quality tracking
│ each capability picks the right taskType internally
▼
LLM Port (src/integrations/ports/llm-port.ts)
│ TypeScript interface, zero SDK imports
▼
Vercel AI v6 Adapter (1 file)  +  Provider Registry (1 file)
  │                                 │
  │ implements the port             │ reads .env, creates SDK clients,
  │ translates calls                │ selects models, gates budgets
  ▼                                 ▼
@ai-sdk/anthropic, @ai-sdk/openai, and other providers

The shift: activities no longer call models. They call capabilities.
⚡ Once you stop calling models and start defining operations, the system becomes composable.
Let me walk through each layer.
Layer 1: The Port Interface#
The LLMPort is a TypeScript interface with zero imports from any AI SDK. It only imports z from Zod (for schema typing). That's it.
// src/integrations/ports/llm-port.ts
import type { z } from "zod";

export type TaskType =
  | "triage" | "classification" | "warmth-scoring"
  | "data-extraction" | "summarization" | "research"
  | "tone-draft" | "routine-draft" | "complex-reasoning"
  | "structured-output" | "general";

export type LLMPriority = 0 | 1 | 2 | 3;

export interface LLMPort {
  generateText(options: {
    taskType: TaskType;
    priority?: LLMPriority;
    instructions?: string;
    prompt: string;
    maxOutputTokens?: number;
    temperature?: number;
  }): Promise<GenerateTextResult>;

  generateStructured<T>(options: {
    taskType: TaskType;
    priority?: LLMPriority;
    instructions?: string;
    prompt: string;
    schema: z.ZodType<T>;
    schemaName?: string;
    maxOutputTokens?: number;
    temperature?: number;
  }): Promise<GenerateStructuredResult<T>>;

  runAgent(options: {
    taskType: TaskType;
    priority?: LLMPriority;
    instructions: string;
    messages: LLMMessage[];
    tools: Record<string, ToolDefinition>;
    maxSteps?: number;
  }): Promise<AgentResult>;

  streamText(options: {
    taskType: TaskType;
    priority?: LLMPriority;
    instructions?: string;
    prompt: string;
  }): AsyncIterable<string>;
}

Four methods. Every LLM interaction in the system is one of these: generate text, generate a typed object, run an agent loop, or stream text.
Notice what’s missing: there’s no model name parameter. No provider parameter. The caller says what it wants (taskType: "triage", priority: 1) and the infrastructure decides how. Activities express intent. They never choose models. That decision is always deferred to policy.
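One consequence of a port with no SDK imports is that business logic can be exercised against a hand-written fake. The sketch below is illustrative only: it uses a trimmed-down copy of the port (just generateText, with a shortened TaskType union), and the names TextPort, createFakeLLM, and summarizeThread are hypothetical, not taken from the project.

```typescript
// Trimmed-down copy of the port for illustration: only generateText.
type TaskType = "triage" | "summarization" | "general";
type LLMPriority = 0 | 1 | 2 | 3;

interface GenerateTextOptions {
  taskType: TaskType;
  priority?: LLMPriority;
  instructions?: string;
  prompt: string;
}

interface GenerateTextResult {
  text: string;
  modelId: string; // for logging, not for routing
}

interface TextPort {
  generateText(options: GenerateTextOptions): Promise<GenerateTextResult>;
}

// Hypothetical fake: records every call and returns a canned reply.
function createFakeLLM(reply: string): TextPort & { calls: GenerateTextOptions[] } {
  const calls: GenerateTextOptions[] = [];
  return {
    calls,
    async generateText(options) {
      calls.push(options);
      return { text: reply, modelId: "fake-model" };
    },
  };
}

// Hypothetical activity: it states intent (task type, priority), never a model.
async function summarizeThread(llm: TextPort, thread: string): Promise<string> {
  const result = await llm.generateText({
    taskType: "summarization",
    priority: 2,
    instructions: "Summarize the thread in one sentence.",
    prompt: thread,
  });
  return result.text.trim();
}
```

Because the activity receives the port as an argument, a test can assert on the recorded calls (task type, priority, prompt) without a network call or an API key.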
The result types are also SDK-independent:
export interface GenerateTextResult {
  text: string;
  usage: TokenUsage;
  modelId: string; // for logging, not for routing
}

The modelId comes back for logging and quality tracking. It's read-only information, never an input.
Layer 2: The Adapter (1 File) and Registry (1 File)#
The adapter implements LLMPort using the Vercel AI SDK. One file, vercel-ai-adapter.ts. This is the only file in the project that imports generateText and streamText from "ai".
// src/integrations/providers/llm/vercel-ai-adapter.ts
import { generateText, streamText, Output, jsonSchema, tool } from "ai";

export function createVercelAIAdapter(): LLMPort {
  return {
    async generateText(options) {
      const { model } = await selectModel(options.taskType, options.priority);
      const result = await generateText({
        model,
        system: options.instructions,
        prompt: options.prompt,
        maxOutputTokens: options.maxOutputTokens,
        temperature: options.temperature,
      });
      return {
        text: result.text,
        usage: extractUsage(result),
        modelId: result.response?.modelId ?? "unknown",
      };
    },
    // ... generateStructured, runAgent, streamText
  };
}

The adapter translates between port types and SDK types. options.instructions becomes system. LLMMessage[] becomes ModelMessage[]. ToolDefinition becomes ToolSet. All the SDK-specific ceremony lives here and nowhere else.
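The adapter excerpt calls an extractUsage helper that the article does not show. A minimal sketch of what such a helper might look like follows. Both shapes are assumptions: the TokenUsage fields are guessed, and SdkResultLike is a structural stand-in for the part of the SDK result being read, so the snippet runs without the SDK installed.

```typescript
// Assumed port-side shape; the article does not show TokenUsage.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Structural stand-in for the SDK result fields this helper reads (assumption).
interface SdkResultLike {
  usage?: {
    inputTokens?: number;
    outputTokens?: number;
    totalTokens?: number;
  };
}

// Normalize possibly missing counters to plain numbers so the port type
// never exposes "undefined" to business logic.
function extractUsage(result: SdkResultLike): TokenUsage {
  const inputTokens = result.usage?.inputTokens ?? 0;
  const outputTokens = result.usage?.outputTokens ?? 0;
  const totalTokens = result.usage?.totalTokens ?? (inputTokens + outputTokens);
  return { inputTokens, outputTokens, totalTokens };
}
```

Keeping this translation in the adapter means a future rename of the SDK's usage fields would be absorbed in one place.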
The provider registry (provider-registry.ts) is the other file that touches the SDK. It creates provider instances from environment variables:
# .env
LLM_PROVIDER_GPTOSS_BEDROCK=provider-fast|model-fast|200
LLM_PROVIDER_CLAUDE_SONNET=anthropic|claude-sonnet-4-6-20250514|50
LLM_PROVIDER_GPT5=openai|gpt-5.4-turbo|30
TASK_ROUTE_TRIAGE=provider-fast,claude-haiku
TASK_ROUTE_TONE_DRAFT=claude-opus,claude-sonnet
TASK_ROUTE_COMPLEX_REASONING=claude-opus,claude-sonnet,provider-fast

Each LLM_PROVIDER_* entry declares: which SDK, which model ID, and the hourly request budget. Each TASK_ROUTE_* entry maps a task type to a fallback chain. The registry parses these at startup and exposes a selectModel(taskType, priority) function.
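The startup parsing described above could be sketched roughly as follows. This is not the project's registry: parseRegistryConfig, ProviderConfig, and RegistryConfig are hypothetical names, and the rule for deriving an alias from the variable name (lowercase the suffix, turn underscores into hyphens) is an assumption that the real registry may not share.

```typescript
interface ProviderConfig {
  alias: string;        // derived from the variable name (assumed rule)
  sdk: string;          // first field: which SDK
  modelId: string;      // second field: which model ID
  hourlyBudget: number; // third field: hourly request budget
}

interface RegistryConfig {
  providers: Map<string, ProviderConfig>;
  routes: Map<string, string[]>; // task type -> fallback chain of aliases
}

// Assumption: LLM_PROVIDER_CLAUDE_SONNET -> "claude-sonnet", TASK_ROUTE_TONE_DRAFT -> "tone-draft".
function toKebab(suffix: string): string {
  return suffix.toLowerCase().replace(/_/g, "-");
}

function parseRegistryConfig(env: Record<string, string | undefined>): RegistryConfig {
  const providers = new Map<string, ProviderConfig>();
  const routes = new Map<string, string[]>();
  for (const [key, value] of Object.entries(env)) {
    if (!value) continue;
    if (key.startsWith("LLM_PROVIDER_")) {
      const [sdk, modelId, budget] = value.split("|").map((part) => part.trim());
      const hourlyBudget = Number(budget);
      if (!sdk || !modelId || !budget || !Number.isFinite(hourlyBudget)) {
        throw new Error(`Malformed provider entry: ${key}`);
      }
      const alias = toKebab(key.slice("LLM_PROVIDER_".length));
      providers.set(alias, { alias, sdk, modelId, hourlyBudget });
    } else if (key.startsWith("TASK_ROUTE_")) {
      const chain = value.split(",").map((a) => a.trim()).filter((a) => a.length > 0);
      routes.set(toKebab(key.slice("TASK_ROUTE_".length)), chain);
    }
  }
  return { providers, routes };
}
```

In a Node process this would be called once at startup with process.env. Failing fast on a malformed entry is a design choice in this sketch, so a typo in .env surfaces at boot rather than at the first LLM call.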
Budget gating uses Redis: each provider alias gets an hourly counter. When the counter exceeds the budget, the registry skips to the next provider in the chain. P0 (critical) tasks bypass the budget entirely, so urgent triage never gets blocked because the morning briefing burned through the hour’s quota.
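The gating logic can be sketched without Redis by hiding the counter behind a small interface. Everything here is an assumption layered on the description above: UsageCounter, createMemoryCounter, and pickProvider are hypothetical names, the counter is synchronous and in-memory (a Redis-backed version would be asynchronous and keyed per hour), bypassed P0 calls are still counted, and the sketch throws when every provider is exhausted because the article does not say what happens in that case.

```typescript
type LLMPriority = 0 | 1 | 2 | 3;

// Stand-in for the Redis hourly counter (synchronous and in-memory here).
interface UsageCounter {
  get(alias: string): number;
  increment(alias: string): void;
}

function createMemoryCounter(): UsageCounter {
  const counts = new Map<string, number>();
  return {
    get: (alias) => counts.get(alias) ?? 0,
    increment: (alias) => {
      counts.set(alias, (counts.get(alias) ?? 0) + 1);
    },
  };
}

function pickProvider(
  chain: readonly string[],
  budgets: ReadonlyMap<string, number>,
  counter: UsageCounter,
  priority: LLMPriority,
): string {
  const first = chain[0];
  if (first === undefined) throw new Error("Empty fallback chain");
  // P0 bypasses the budget gate: always take the preferred provider.
  if (priority === 0) {
    counter.increment(first); // assumption: bypassed calls still count
    return first;
  }
  for (const alias of chain) {
    const budget = budgets.get(alias);
    if (budget === undefined) continue; // alias without a provider entry
    if (counter.get(alias) < budget) {
      counter.increment(alias);
      return alias;
    }
  }
  // Assumption: behavior when the whole chain is exhausted is not specified.
  throw new Error("All providers in the chain are over budget");
}
```

With budgets of 2 and 1 for a two-provider chain, the first two normal-priority calls go to the first provider, the third falls through to the second, and a P0 call still lands on the first provider even though its budget is spent.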
Want to add a new provider? One line in .env. Want to reroute classification to a cheaper model? One env var. No code changes.
⚡ Only two files in the entire codebase are allowed to import the LLM SDK.
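A rule like this is easier to keep when something checks it. One possible way is a small scan that fails CI when any file outside an allowlist imports the SDK. The sketch below is a regex heuristic, not a parser, so it can be fooled by import-like text inside comments or strings. The function names are hypothetical, and the caller supplies the allowlist of the two permitted files.

```typescript
interface Violation {
  file: string;
  specifier: string;
}

// Treats "ai", "ai/<subpath>", and "@ai-sdk/<provider>" as SDK imports (assumed scope).
function isSdkSpecifier(specifier: string): boolean {
  return specifier === "ai" || specifier.startsWith("ai/") || specifier.startsWith("@ai-sdk/");
}

// files maps a path to its source text; allowed holds the paths permitted to import the SDK.
function findSdkImportViolations(
  files: Record<string, string>,
  allowed: ReadonlySet<string>,
): Violation[] {
  const violations: Violation[] = [];
  for (const [file, source] of Object.entries(files)) {
    if (allowed.has(file)) continue;
    // Matches: ... from "x", import "x", import("x"), require("x")
    const importRe = /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)["']([^"']+)["']/g;
    let match: RegExpExecArray | null;
    while ((match = importRe.exec(source)) !== null) {
      const specifier = match[1];
      if (specifier !== undefined && isSdkSpecifier(specifier)) {
        violations.push({ file, specifier });
      }
    }
  }
  return violations;
}
```

A wrapper script would read the source tree into the files map, pass the adapter and registry paths as the allowlist, and exit non-zero if the returned array is not empty. A lint rule that restricts imports by path would be an alternative to a custom scan.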
Layer 3: Reusable LLM Capabilities (The Part That Surprised Me)#
I started this refactor to solve the SDK coupling problem. Then something more important became obvious.
I wasn’t calling “LLMs” in 21 different places.
I was performing the same set of cognitive operations over and over.
Five activities classified content with five different prompt structures. Nine drafted messages with nine different tone injections. Same operations, no shared implementation.
So I built seven capability functions that sit between the activities and the port:
// Usage in an activity file:
import { llmClassify } from "../ai/capabilities/index.js";

const result = await llmClassify({
  content: emailBody,
  schema: triageSchema,
  schemaName: "email-triage",
  rubric: TRIAGE_RUBRIC,
  context: senderProfile,
});

Each capability abstracts a full pattern, not just a prompt.
Every capability has the same structure. Each one:
- Owns its prompt template
- Sets the right temperature
- Picks the right task type
- Logs quality events automatically
Here’s the llmClassify implementation:
export async function llmClassify<T>(input: {
  content: string;
  schema: z.ZodType<T>;
  schemaName: string;
  rubric?: string;
  boundaryExamples?: string;
  context?: string;
  priority?: LLMPriority;
}): Promise<T> {
  const llm = getLLMPort();
  const systemParts = [
    `<role>Classification system for executive assistant.</role>`,
    `<domain_context>${BABAK_PROFILE}</domain_context>`,
  ];
  if (input.rubric) {
    systemParts.push(`<rules>\n${input.rubric}\n</rules>`);
  }
  if (input.boundaryExamples) {
    systemParts.push(`<boundary_examples>\n${input.boundaryExamples}\n</boundary_examples>`);
  }
  // Assumption: per-call context (e.g. a sender profile) goes in its own tagged section.
  if (input.context) {
    systemParts.push(`<context>\n${input.context}\n</context>`);
  }
  systemParts.push("Classify decisively. Do not hedge.");

  const result = await llm.generateStructured({
    taskType: "triage",
    priority: input.priority, // forward the caller's priority to the budget gate
    instructions: systemParts.join("\n\n"),
    prompt: `<content>\n${input.content}\n</content>\nClassify the content above.`,
    schema: input.schema,
    schemaName: input.schemaName,
    temperature: 0,
  });

  await logQualityEvent({
    taskType: `classify:${input.schemaName}`,
    model: result.modelId,
    outputText: JSON.stringify(result.data).slice(0, 500),
  });
  return result.data;
}

The llmDraft capability does even more. It loads writing samples from PostgreSQL, selects the tone register (warm, firm, strategic, technical, public), injects an anti-sycophancy blacklist (Never say 'I wanted to reach out' or 'I hope this finds you well'), applies channel-specific constraints (SMS = 160 chars, email = 150-250 words), and uses completion prompting for short-message channels. Before, nine activity files each built their own version of this. Now it's one function.
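To make the llmDraft description concrete, here is a rough sketch of how the instruction-building part might be assembled. It is not the project's implementation: buildDraftInstructions and the XML-style tag names are hypothetical, writing samples are passed in as plain strings instead of being loaded from PostgreSQL, and completion prompting is left out. Only the tone registers, the two banned phrases, and the channel limits come from the description above.

```typescript
type Tone = "warm" | "firm" | "strategic" | "technical" | "public";
type Channel = "sms" | "email";

// Channel limits as stated in the article.
const CHANNEL_CONSTRAINTS: Record<Channel, string> = {
  sms: "Maximum 160 characters.",
  email: "Between 150 and 250 words.",
};

// The two banned phrases quoted in the article; a real list would be longer.
const BANNED_PHRASES: readonly string[] = [
  "I wanted to reach out",
  "I hope this finds you well",
];

function buildDraftInstructions(input: {
  tone: Tone;
  channel: Channel;
  writingSamples: readonly string[];
}): string {
  const parts = [
    `<tone>${input.tone}</tone>`,
    `<constraints>${CHANNEL_CONSTRAINTS[input.channel]}</constraints>`,
    `<never_say>\n${BANNED_PHRASES.map((phrase) => `- ${phrase}`).join("\n")}\n</never_say>`,
  ];
  // Samples are optional; the section is omitted when none are available.
  if (input.writingSamples.length > 0) {
    parts.push(`<writing_samples>\n${input.writingSamples.join("\n---\n")}\n</writing_samples>`);
  }
  return parts.join("\n\n");
}
```

The resulting string would be passed as the instructions field of a port call. Because the constraints live in one table, changing the email length limit is a one-line edit that applies to every drafting call site.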
The seven capabilities: llmClassify, llmScore, llmDraft, llmSummarize, llmExtract, llmPlan, llmAnalyze.
This is the part that surprised me. I set out to isolate the SDK, and the real payoff was that prompt engineering stopped being scattered strings and became reusable system assets. When I improve a classification prompt now, every classification in the system gets better. When I add boundary examples, every classifier benefits. The compound effect is significant.
The Before/After#
⚡“The LLM stopped being a dependency and became infrastructure.”
Here’s what changed, concretely:
Before:
- 15+ files imported the SDK
- 5 classification prompts
- 9 drafting implementations
After:
- 2 files import the SDK
- 1 classification capability
- 1 drafting capability
The Phase 4 commit deleted 688 lines and added 192. The codebase got smaller. That’s my favorite number from this whole project and the best signal that the abstraction is working: it removes code instead of adding it.
The Proof: The SDK Upgrade That Didn’t Hurt#
The real test came during the Vercel AI SDK v4 to v6 upgrade. Here’s the commit message:
Upgrade Vercel AI SDK to v6: ai@6, anthropic@3, openai@3, and others
Breaking changes applied (only 2 files + 1 minor fix):
- vercel-ai-adapter.ts: maxTokens to maxOutputTokens
- agent-runtime.ts: CoreMessage to ModelMessage, maxSteps to stopWhen(stepCountIs)
- activities.ts: non-null assert on optional tool.execute
⚡ All 18 activity files unchanged. All 10 agent files unchanged.
That last line is the whole point. A major SDK version upgrade with breaking API changes, and 28 out of 31 files didn’t change at all. They didn’t change because they don’t know the SDK exists.
If a core dependency upgrade touches your business logic, your boundaries are wrong.
How This Maps to Hexagonal Architecture#
If you’ve read about hexagonal architecture (Alistair Cockburn) or ports and adapters, this is the same idea. The port is the interface your business logic depends on. The adapter is the implementation that connects to external infrastructure. Business logic never imports from infrastructure; it only talks to the port.
We already do this for databases, payment processors, and message queues. Nobody scatters raw PostgreSQL queries across their business logic. LLM providers belong in the same category: they’re infrastructure, not application logic.
When This Pattern Works (and When It Doesn’t)#
I won’t pretend this is always the right approach.
Use this when:
- You use 2+ LLM providers (or might switch in the future)
- You have 5+ LLM call sites across different files
- Your SDK upgrades keep causing multi-file changes
- You want centralized quality tracking and cost monitoring
- You want config-driven model routing without code changes
Skip this when:
- You have a small project with 1–2 LLM calls
- You’re prototyping and not yet committed to an architecture
- You’re using a provider-specific feature that doesn’t generalize (like Anthropic’s prompt caching, though you can add that to the adapter)
- You’re building a library that wraps an LLM SDK (you are the adapter)
What I’d Do Differently#
Two things. First, I’d skip the model router entirely and go straight to the port from day one. The router was a half-measure: it centralized model selection but still leaked SDK types to callers. If you’re going to abstract, do it properly from the start. It would have saved me an intermediate migration step.
Second, I’d define the capability functions before writing any activities. Start with “what are the 5 to 7 things I ask an LLM to do?” and build those as reusable functions first. I discovered the seven capability patterns by noticing duplication across 18 existing files. If I’d started with the capabilities, the activities would have been cleaner from the beginning.
The Takeaway#
The two-file rule is simple: only two files in your project should import from the LLM SDK. One adapter (translates your interface to SDK calls) and one registry (creates provider instances from config). Everything else talks to your interface.
The capability layer is a bonus, but it turned out to be the bigger win: once you have the port, you’ll naturally notice that the same 5 to 7 prompt patterns repeat across your codebase. Extract them. Each capability owns its prompt template, temperature, token budget, and quality tracking. Activities become pure intent: “triage this email,” “draft this reply,” “score this lead.” The LLM details vanish. At scale, LLM architecture decisions matter more than model choice.
Three things follow immediately:
- Upgrades become trivial (2 files, not 15)
- Providers become interchangeable (config, not code)
- Prompts become reusable assets (capabilities, not scattered strings)
The LLM stops being a dependency you manage.
It becomes infrastructure you configure.
And once you make that shift, everything else gets simpler.
This approach follows the same pattern I use for email providers (Gmail and Outlook share a MessagePort), calendar providers (Google Calendar and Outlook Calendar share a CalendarPort), and SMS (Android relay implements a MessagePort). The LLM is just another external service. Treat it like one.
Thank you for reading. If you’re wrestling with a similar problem in your codebase, I’d be happy to hear how you solved it.
Frequently Asked Questions#
What is the best way to architect LLM applications for multiple providers?
A modular architecture using ports and adapters allows you to decouple business logic from LLM providers, making systems scalable and maintainable. The key is a provider-agnostic interface that your business logic depends on, with a single adapter file per SDK.
How do you switch between LLM providers without changing application code?
Use a provider registry that reads configuration from environment variables. Each task type maps to a fallback chain of providers. Switching models or providers is a config change, not a code change.
Why is prompt reuse important in AI systems?
Reusable prompt capabilities (classify, draft, score, summarize, extract, plan, analyze) reduce duplication, improve consistency, and allow system-wide improvements from a single change. When you improve one classification prompt, every classifier in the system benefits.
What is the biggest architectural mistake in LLM system design?
Coupling business logic directly to SDK types and provider-specific APIs. This makes upgrades expensive, provider switching painful, and prompt improvements impossible to roll out consistently.
Is the ports-and-adapters pattern suitable for small AI projects?
Not always. If you have 1 to 2 LLM call sites and a single provider, the abstraction overhead isn’t worth it. This pattern pays off when you have multiple providers, multiple call sites, or expect SDK upgrades over time.

