Integrating AI into Production Applications: Lessons from Real Projects
Practical patterns for adding AI capabilities to production systems, from prompt engineering to cost optimization and error handling.
8 min read·Talha Bilal

Introduction
AI integration has moved from experimental feature to production necessity. Over the past year, I've integrated OpenAI, Claude, and other LLMs into several SaaS applications—from resume parsing to automated ticket triage to content generation.
The gap between a demo and a production AI feature is enormous. In this post, I'll share the patterns, pitfalls, and pragmatic approaches I've learned while shipping AI features to real users.
Architecture Pattern: The AI Service Layer
Don't scatter AI calls throughout your codebase. Create a dedicated service layer:
```typescript
// lib/ai/service.ts
import OpenAI from "openai";

export interface CompletionOptions {
  model?: string;
  temperature?: number;
  maxTokens?: number;
}

export class AIService {
  // Public and readonly so callers can reach the raw client, as later examples do
  readonly client: OpenAI;

  constructor() {
    this.client = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
    });
  }

  async complete(prompt: string, options?: CompletionOptions) {
    // Centralized logging, error handling, retries
    return this.client.chat.completions.create({
      model: options?.model ?? "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
      // `??` rather than `||` so an explicit temperature of 0 is respected
      temperature: options?.temperature ?? 0.7,
      max_tokens: options?.maxTokens ?? 1000,
    });
  }
}

export const ai = new AIService();
```

Benefits:
- Single place to update model versions
- Centralized cost tracking and logging
- Consistent error handling across all AI calls
- Easy to mock for testing
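To make the last benefits concrete, here is a minimal sketch of a service that receives its client through the constructor, so a test can pass in a fake and every call is recorded in one place. `ChatClient`, `ChatResult`, `UsageRecord`, `LoggingAIService`, and `FakeClient` are hypothetical names for illustration and are not part of the OpenAI SDK; a real version would wrap the SDK client instead.

```typescript
// Hypothetical minimal interface: only the parts of a chat client this sketch needs.
interface ChatResult {
  text: string;
  usage: { prompt_tokens: number; completion_tokens: number };
}

interface ChatClient {
  chat(model: string, prompt: string): Promise<ChatResult>;
}

interface UsageRecord {
  operation: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
}

class LoggingAIService {
  // Every call appends a record here; a real app would write to a database or metrics system.
  readonly usageLog: UsageRecord[] = [];

  constructor(
    private readonly client: ChatClient,
    private readonly defaultModel: string = "gpt-4o-mini"
  ) {}

  async complete(operation: string, prompt: string, model?: string): Promise<string> {
    const chosenModel = model ?? this.defaultModel;
    const result = await this.client.chat(chosenModel, prompt);
    this.usageLog.push({
      operation,
      model: chosenModel,
      promptTokens: result.usage.prompt_tokens,
      completionTokens: result.usage.completion_tokens,
    });
    return result.text;
  }
}

// A fake client for tests: returns a canned answer and fixed token counts.
class FakeClient implements ChatClient {
  async chat(_model: string, prompt: string): Promise<ChatResult> {
    return {
      text: `echo: ${prompt}`,
      usage: { prompt_tokens: 3, completion_tokens: 2 },
    };
  }
}
```

Because the service depends only on the small `ChatClient` interface, swapping providers or stubbing responses does not touch the calling code.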
Prompt Engineering Patterns
Good prompts make the difference between a flaky feature and a reliable one.
1. System Message for Context
```typescript
async function analyzeResume(resumeText: string) {
  const response = await ai.client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are an expert resume analyzer. Extract key information and return it as structured JSON. Be precise and only extract information that is explicitly stated.`,
      },
      {
        role: "user",
        content: `Analyze this resume:\n\n${resumeText}`,
      },
    ],
    response_format: { type: "json_object" },
  });

  // `content` can be null, so guard before parsing
  const content = response.choices[0].message.content;
  if (!content) {
    throw new Error("Model returned an empty response");
  }
  return JSON.parse(content);
}
```

Key principle: System messages set behavior, user messages provide data.
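One way to enforce that split is a small helper that always puts fixed instructions in the system message and untrusted input in the user message. This is a minimal sketch: `buildMessages` and `ChatMessage` are hypothetical names, and the `<document>` delimiters are an assumption of this example rather than anything the API requires.

```typescript
type ChatMessage = { role: "system" | "user"; content: string };

// Keeps fixed instructions in the system message and untrusted input in the user message.
function buildMessages(instructions: string, data: string): ChatMessage[] {
  return [
    { role: "system", content: instructions.trim() },
    // Assumed convention: delimiters mark where the data starts and ends.
    { role: "user", content: `<document>\n${data}\n</document>` },
  ];
}

const messages = buildMessages(
  "You are an expert resume analyzer. Return structured JSON.",
  "Jane Doe, five years of TypeScript experience"
);
```

The resulting array can be passed as the `messages` field of a chat completion request, so user-supplied text never ends up concatenated into the instructions.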
2. Few-Shot Prompting
For complex extraction tasks, show examples:
```typescript
const prompt = `Extract skills from job descriptions. Return as a JSON array.

Examples:
Input: "Looking for a React developer with TypeScript experience"
Output: ["React", "TypeScript"]

Input: "Senior backend engineer. Must know Python, Django, PostgreSQL."
Output: ["Python", "Django", "PostgreSQL"]

Now extract from this:
"${jobDescription}"`;
```

3. Structured Output with JSON Schema
Use OpenAI's structured output feature for reliable parsing:
```typescript
const response = await ai.client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "system",
      content: "Extract candidate information from resumes.",
    },
    {
      role: "user",
      content: resumeText,
    },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "candidate_info",
      strict: true, // Required for the schema to be enforced
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          email: { type: ["string", "null"] }, // Nullable stands in for optional
          skills: {
            type: "array",
            items: { type: "string" },
          },
          experience: {
            type: "array",
            items: {
              type: "object",
              properties: {
                company: { type: "string" },
                role: { type: "string" },
                duration: { type: ["string", "null"] },
              },
              required: ["company", "role", "duration"],
              additionalProperties: false,
            },
          },
        },
        required: ["name", "email", "skills", "experience"],
        additionalProperties: false,
      },
    },
  },
});
```

With `strict: true`, the response is constrained to this schema, so it parses as valid JSON in the shape you need. Strict mode expects every property to be listed in `required` and `additionalProperties: false` on each object, which is why optional fields are expressed as nullable types here.
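Even with a schema in the request, it can be worth checking the parsed value at runtime before it reaches the rest of the app, for example when a response is cut off by a token limit. A minimal sketch, assuming a hypothetical `CandidateInfo` type that mirrors only the `name` and `skills` fields of the schema above; `isCandidateInfo` and `parseCandidate` are illustrative helpers, not SDK functions.

```typescript
interface CandidateInfo {
  name: string;
  skills: string[];
}

// Type guard: true only when the value has a string name and an array of string skills.
function isCandidateInfo(value: unknown): value is CandidateInfo {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const name = record["name"];
  const skills = record["skills"];
  return (
    typeof name === "string" &&
    Array.isArray(skills) &&
    skills.every((skill) => typeof skill === "string")
  );
}

// Parses raw model output; returns null instead of throwing on bad JSON or a bad shape.
function parseCandidate(raw: string | null): CandidateInfo | null {
  if (!raw) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    return isCandidateInfo(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

Returning `null` rather than throwing lets the caller decide whether to retry, fall back, or surface an error.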
Cost Optimization
AI costs add up fast. Here's how to keep them under control:
1. Model Selection Strategy
```typescript
type TaskComplexity = "simple" | "moderate" | "complex";

function selectModel(complexity: TaskComplexity): string {
  switch (complexity) {
    case "simple":
      return "gpt-4o-mini"; // $0.15/$0.60 per 1M tokens
    case "moderate":
      return "gpt-4o"; // $2.50/$10 per 1M tokens
    case "complex":
      return "o1-preview"; // More expensive but higher quality
  }
}

// Example usage
async function categorizeTicket(title: string, description: string) {
  const response = await ai.client.chat.completions.create({
    model: selectModel("simple"), // Categorization is simple
    messages: [
      {
        role: "system",
        content: "Categorize this support ticket into: Bug, Feature Request, or Question.",
      },
      {
        role: "user",
        content: `Title: ${title}\nDescription: ${description}`,
      },
    ],
  });

  return response.choices[0].message.content;
}
```

Rule of thumb:
- Use `gpt-4o-mini` for classification, simple extraction, basic summarization
- Use `gpt-4o` for complex reasoning, long documents, creative content
- Use `o1-*` models only when you need advanced reasoning
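To see why the choice matters, a quick back-of-the-envelope calculation helps. The sketch below uses the per-token prices quoted in the `selectModel` comments and a hypothetical workload of 10,000 tickets per day at 500 input and 20 output tokens each; the workload numbers are made up for illustration.

```typescript
// Prices in USD per 1M tokens, taken from the comments in selectModel above.
const PRICES = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gpt-4o": { input: 2.5, output: 10 },
} as const;

type PricedModel = keyof typeof PRICES;

// Hypothetical workload, for illustration only.
const TICKETS_PER_DAY = 10_000;
const INPUT_TOKENS_PER_TICKET = 500;
const OUTPUT_TOKENS_PER_TICKET = 20;

function dailyCost(model: PricedModel): number {
  const { input, output } = PRICES[model];
  const inputTokens = TICKETS_PER_DAY * INPUT_TOKENS_PER_TICKET; // 5,000,000
  const outputTokens = TICKETS_PER_DAY * OUTPUT_TOKENS_PER_TICKET; // 200,000
  return (inputTokens / 1_000_000) * input + (outputTokens / 1_000_000) * output;
}

for (const model of Object.keys(PRICES) as PricedModel[]) {
  console.log(`${model}: $${dailyCost(model).toFixed(2)} per day`);
}
```

Under these assumptions the same categorization workload costs about $0.87 per day on `gpt-4o-mini` and $14.50 per day on `gpt-4o`, roughly a 17x difference.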
2. Caching AI Responses
Cache AI results aggressively:
```typescript
// Assumes `redis` (for example an ioredis client), `logger`, `hashContent`,
// and `estimateCost` are provided elsewhere in the app.
async function summarizeDocument(documentId: string, text: string) {
  const cacheKey = `summary:${documentId}:${hashContent(text)}`;

  // Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    logger.info("AI cache hit", { documentId });
    return cached;
  }

  // Cache miss: call AI
  const response = await ai.complete(
    `Summarize this document in 3 bullet points:\n\n${text}`
  );

  const summary = response.choices[0].message.content;
  if (!summary) {
    // Don't cache empty results
    throw new Error("Model returned an empty summary");
  }

  // Cache for 30 days
  await redis.setex(cacheKey, 30 * 24 * 60 * 60, summary);

  logger.info("AI cache miss", { documentId, cost: estimateCost(response.usage) });

  return summary;
}
```

3. Token Limiting
Prevent runaway costs with max_tokens:
```typescript
async function generateJobDescription(input: string) {
  const response = await ai.client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: "Generate a job description based on the input.",
      },
      {
        role: "user",
        content: input,
      },
    ],
    max_tokens: 500, // Limit output length
  });

  return response.choices[0].message.content;
}
```

Error Handling and Reliability
AI APIs fail. Your code must handle it gracefully.
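Not every failure is worth retrying: rate limits and server errors are usually transient, while a bad request or an invalid API key will fail the same way every time. A minimal sketch of that split, together with the delay schedule that exponential backoff produces, assuming errors expose an HTTP status code. `isRetryableStatus` and `backoffDelayMs` are hypothetical helpers, not part of any SDK, and the status list is a common convention rather than an official one.

```typescript
// HTTP statuses that usually indicate a transient problem worth retrying.
const RETRYABLE_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

function isRetryableStatus(status: number | undefined): boolean {
  // Assumption: no status means a network-level failure (timeout, reset), so retry.
  if (status === undefined) return true;
  return RETRYABLE_STATUSES.has(status);
}

// Delay before retry number `attempt` (1-based): minTimeout * factor^(attempt - 1), capped.
function backoffDelayMs(
  attempt: number,
  minTimeout: number = 1000,
  factor: number = 2,
  maxTimeout: number = 30_000
): number {
  return Math.min(minTimeout * Math.pow(factor, attempt - 1), maxTimeout);
}

// With the defaults, the first three retries wait 1000, 2000, and 4000 ms.
const schedule = [1, 2, 3].map((attempt) => backoffDelayMs(attempt));
```

Capping the delay keeps a long outage from producing multi-minute waits inside a single request.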
1. Retry with Exponential Backoff
```typescript
import { retry } from "@/lib/retry";

async function robustAICall(prompt: string) {
  return retry(
    async () => {
      return await ai.complete(prompt);
    },
    {
      retries: 3,
      minTimeout: 1000,
      factor: 2,
      onRetry: (err, attempt) => {
        logger.warn("AI call retry", { attempt, error: err.message });
      },
    }
  );
}
```

2. Fallback Strategies
Always have a fallback:
```typescript
async function categorizeTicket(ticket: Ticket): Promise<string> {
  try {
    // Try AI categorization (assumes the service exposes a `categorize` helper)
    const category = await ai.categorize(ticket.description);
    return category;
  } catch (error) {
    logger.error("AI categorization failed", { ticketId: ticket.id, error });

    // Fallback to rule-based categorization
    return ruleBasedCategorization(ticket);
  }
}

function ruleBasedCategorization(ticket: Ticket): string {
  const desc = ticket.description.toLowerCase();

  if (desc.includes("bug") || desc.includes("error")) return "Bug";
  if (desc.includes("feature") || desc.includes("add")) return "Feature Request";
  return "Question";
}
```

3. Timeout Protection
Don't let AI calls hang forever:
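```typescript
async function aiWithTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number = 30000
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("AI call timeout")), timeoutMs);
  });

  try {
    // Stops waiting after the timeout; the underlying request is not cancelled
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer so it doesn't linger after the race settles
    clearTimeout(timer);
  }
}

// Usage
const result = await aiWithTimeout(
  ai.complete(prompt),
  15000 // 15 second timeout
);
```

Streaming Responses for Better UX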
For long-form generation, stream tokens to the user:
```typescript
export async function POST(req: Request) {
  const { prompt } = await req.json();

  const stream = await ai.client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  // Stream tokens back to client
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const text = chunk.choices[0]?.delta?.content || "";
          controller.enqueue(encoder.encode(text));
        }
        controller.close();
      } catch (err) {
        // Surface mid-stream failures to the client instead of leaving the stream open
        controller.error(err);
      }
    },
  });

  return new Response(readable, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Client-side handling:
```typescript
async function streamAIResponse(prompt: string) {
  const response = await fetch("/api/ai/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  if (!response.ok || !response.body) {
    throw new Error(`AI request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  let fullText = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // `stream: true` keeps multi-byte characters intact across chunk boundaries
    const chunk = decoder.decode(value, { stream: true });
    fullText += chunk;

    // Update UI with each chunk
    updateUI(fullText);
  }
}
```

Cost Monitoring and Alerts
Track AI spending in real-time:
```typescript
async function logAIUsage(
  operation: string,
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  },
  model: string
) {
  const cost = calculateCost(usage, model);

  await db.aiLog.create({
    data: {
      operation,
      model,
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
      totalTokens: usage.total_tokens,
      cost,
      timestamp: new Date(),
    },
  });

  // Alert if daily spend exceeds threshold
  // (this fires on every call past the threshold, so dedupe alerts in production)
  const todaySpend = await getDailySpend();
  if (todaySpend > 100) {
    await sendAlert(`AI daily spend: $${todaySpend}`);
  }
}

function calculateCost(
  usage: { prompt_tokens: number; completion_tokens: number },
  model: string
): number {
  // Typed as a Record so lookup by an arbitrary model string compiles
  const rates: Record<string, { input: number; output: number }> = {
    "gpt-4o-mini": { input: 0.15, output: 0.60 }, // per 1M tokens
    "gpt-4o": { input: 2.5, output: 10.0 },
  };

  // Unknown models fall back to gpt-4o rates
  const rate = rates[model] ?? rates["gpt-4o"];

  return (
    (usage.prompt_tokens / 1_000_000) * rate.input +
    (usage.completion_tokens / 1_000_000) * rate.output
  );
}
```

Testing AI Features
Testing AI is hard because outputs are non-deterministic. Here's my approach:
1. Test the Integration, Not the AI
```typescript
describe("AI Resume Parser", () => {
  it("should call OpenAI with correct parameters", async () => {
    const mockComplete = jest.spyOn(ai, "complete");

    await parseResume("sample resume text");

    expect(mockComplete).toHaveBeenCalledWith(
      expect.stringContaining("sample resume text"),
      expect.objectContaining({ model: "gpt-4o" })
    );
  });

  it("should handle AI errors gracefully", async () => {
    jest.spyOn(ai, "complete").mockRejectedValue(new Error("API error"));

    const result = await parseResume("text");

    expect(result.error).toBeDefined();
    expect(result.fallback).toBe(true);
  });
});
```

2. Use Fixtures for AI Responses
```typescript
const MOCK_AI_RESPONSE = {
  id: "chatcmpl-123",
  choices: [
    {
      message: {
        role: "assistant",
        content: JSON.stringify({
          name: "John Doe",
          skills: ["JavaScript", "React", "Node.js"],
        }),
      },
      finish_reason: "stop",
    },
  ],
  usage: { prompt_tokens: 50, completion_tokens: 30, total_tokens: 80 },
};

describe("Resume Analysis", () => {
  it("should parse AI response correctly", async () => {
    // The fixture is a partial response, so cast it to satisfy the SDK's full type
    jest.spyOn(ai.client.chat.completions, "create")
      .mockResolvedValue(MOCK_AI_RESPONSE as any);

    const result = await analyzeResume("resume text");

    expect(result.name).toBe("John Doe");
    expect(result.skills).toHaveLength(3);
  });
});
```

Key Takeaways
- Create an AI service layer - centralize API calls, logging, and error handling
- Choose the right model - use cheaper models for simple tasks
- Cache aggressively - identical input can reuse a stored result, so skip the repeat API call
- Handle failures gracefully - always have a fallback strategy
- Stream for better UX - don't make users wait for long generations
- Monitor costs - set up alerts before your bill explodes
- Test the integration - you can't assert on exact AI output, but you can test the code around it
AI is a powerful tool, but it requires careful engineering to work reliably in production. These patterns have helped me ship AI features that users trust and that don't break the bank.
Building AI-powered features? Let's chat through the contact form.