Integrating AI into Production Applications: Lessons from Real Projects

Practical patterns for adding AI capabilities to production systems, from prompt engineering to cost optimization and error handling.

January 22, 2025 · 8 min read · Talha Bilal

Introduction

AI integration has moved from experimental feature to production necessity. Over the past year, I've integrated OpenAI, Claude, and other LLMs into several SaaS applications—from resume parsing to automated ticket triage to content generation.

The gap between a demo and a production AI feature is enormous. In this post, I'll share the patterns, pitfalls, and pragmatic approaches I've learned while shipping AI features to real users.

Architecture Pattern: The AI Service Layer

Don't scatter AI calls throughout your codebase. Create a dedicated service layer:

typescript

// lib/ai/service.ts
import OpenAI from "openai";

// Per-call overrides; anything omitted falls back to the defaults below.
export interface CompletionOptions {
  model?: string;
  temperature?: number;
  maxTokens?: number;
}

export class AIService {
  // Readonly but public, because later examples call ai.client directly.
  readonly client: OpenAI;

  constructor() {
    this.client = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
    });
  }

  async complete(prompt: string, options?: CompletionOptions) {
    // Centralized logging, error handling, retries
    return this.client.chat.completions.create({
      model: options?.model ?? "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
      // `??` instead of `||` so an explicit temperature of 0 is respected
      temperature: options?.temperature ?? 0.7,
      max_tokens: options?.maxTokens ?? 1000,
    });
  }
}

export const ai = new AIService();

Benefits:

Single place to update model versions
Centralized cost tracking and logging
Consistent error handling across all AI calls
Easy to mock for testing
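To make the tracking and mocking benefits concrete, here is a minimal sketch of a service layer that takes the provider call as a constructor argument and records usage for every operation. The names `ChatFn`, `TrackedAIService`, and `UsageRecord` are hypothetical and the response shape is simplified, so treat it as an illustration rather than the service used in the examples here.

```typescript
// Simplified shapes for illustration; the real SDK types are richer.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

interface ChatResult {
  content: string | null;
  usage?: Usage;
}

// Hypothetical abstraction over the provider call, so tests can pass a fake.
type ChatFn = (model: string, prompt: string) => Promise<ChatResult>;

interface UsageRecord {
  operation: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  ok: boolean;
}

class TrackedAIService {
  readonly records: UsageRecord[] = [];
  private readonly chat: ChatFn;
  private readonly defaultModel: string;

  constructor(chat: ChatFn, defaultModel: string = "gpt-4o-mini") {
    this.chat = chat;
    this.defaultModel = defaultModel;
  }

  async complete(operation: string, prompt: string, model?: string): Promise<string> {
    const chosen = model ?? this.defaultModel;
    try {
      const result = await this.chat(chosen, prompt);
      this.record(operation, chosen, result.usage, true);
      return result.content ?? "";
    } catch (err) {
      // Failures are recorded too, so error rates are visible per operation.
      this.record(operation, chosen, undefined, false);
      throw err;
    }
  }

  private record(operation: string, model: string, usage: Usage | undefined, ok: boolean): void {
    this.records.push({
      operation,
      model,
      promptTokens: usage?.prompt_tokens ?? 0,
      completionTokens: usage?.completion_tokens ?? 0,
      ok,
    });
  }
}
```

In a test you can construct it with a fake `ChatFn` that returns canned text. In production the function would wrap the real client call.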

Prompt Engineering Patterns

Good prompts make the difference between a flaky feature and a reliable one.

1. System Message for Context

typescript

async function analyzeResume(resumeText: string) {
  const response = await ai.client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `You are an expert resume analyzer. Extract key information and return it as structured JSON. Be precise and only extract information that is explicitly stated.`,
      },
      {
        role: "user",
        content: `Analyze this resume:\n\n${resumeText}`,
      },
    ],
    // JSON mode: the output parses as JSON, but its shape is not enforced
    response_format: { type: "json_object" },
  });

  // `content` is nullable (for example on a refusal), so check before parsing
  const content = response.choices[0].message.content;
  if (!content) {
    throw new Error("Resume analysis returned no content");
  }
  return JSON.parse(content);
}

Key principle: System messages set behavior, user messages provide data.

2. Few-Shot Prompting

For complex extraction tasks, show examples:

typescript

const prompt = `Extract skills from job descriptions. Return as a JSON array.

Examples:
Input: "Looking for a React developer with TypeScript experience"
Output: ["React", "TypeScript"]

Input: "Senior backend engineer. Must know Python, Django, PostgreSQL."
Output: ["Python", "Django", "PostgreSQL"]

Now extract from this:
"${jobDescription}"`;
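Once you have more than a couple of examples, it can help to build the prompt from data instead of editing one long string. A small sketch, assuming a hypothetical `FewShotExample` shape and `buildFewShotPrompt` helper. `JSON.stringify` is used so quotes inside the input are escaped consistently.

```typescript
interface FewShotExample {
  input: string;
  output: string[];
}

// Builds a few-shot prompt: task line, worked examples, then the new input.
function buildFewShotPrompt(task: string, examples: FewShotExample[], input: string): string {
  const shots = examples
    .map((ex) => `Input: ${JSON.stringify(ex.input)}\nOutput: ${JSON.stringify(ex.output)}`)
    .join("\n\n");

  return `${task}\n\nExamples:\n${shots}\n\nNow extract from this:\n${JSON.stringify(input)}`;
}

const skillsPrompt = buildFewShotPrompt(
  "Extract skills from job descriptions. Return as a JSON array.",
  [
    { input: "Looking for a React developer with TypeScript experience", output: ["React", "TypeScript"] },
    { input: "Senior backend engineer. Must know Python, Django, PostgreSQL.", output: ["Python", "Django", "PostgreSQL"] },
  ],
  "Hiring a Go engineer with Kubernetes experience"
);
```

Keeping examples as data also makes it easy to store them next to test fixtures and swap them per task.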

3. Structured Output with JSON Schema

Use OpenAI's structured output feature for reliable parsing:

typescript

const response = await ai.client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "system",
      content: "Extract candidate information from resumes.",
    },
    {
      role: "user",
      content: resumeText,
    },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "candidate_info",
      // Strict mode is what makes the model adhere to the schema
      strict: true,
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          // Strict mode requires every property to be listed in `required`,
          // so optional fields are expressed as nullable instead
          email: { type: ["string", "null"] },
          skills: {
            type: "array",
            items: { type: "string" },
          },
          experience: {
            type: "array",
            items: {
              type: "object",
              properties: {
                company: { type: "string" },
                role: { type: "string" },
                duration: { type: ["string", "null"] },
              },
              required: ["company", "role", "duration"],
              additionalProperties: false,
            },
          },
        },
        required: ["name", "email", "skills", "experience"],
        additionalProperties: false,
      },
    },
  },
});

With strict: true, the response is constrained to this schema, so it parses as valid JSON in the shape you need. You should still check for a refusal or a truncated response before parsing.
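Even when the API constrains the output, I'd still validate the parsed result at the boundary, since the content can be empty or cut off and other providers may not enforce a schema at all. The sketch below is one way to do that with a hand-written type guard. `CandidateInfo` and `parseCandidate` are hypothetical names covering only a subset of the fields.

```typescript
interface CandidateInfo {
  name: string;
  skills: string[];
  email?: string;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns a typed object, or null when the payload is missing or malformed.
function parseCandidate(raw: string | null): CandidateInfo | null {
  if (raw === null) return null;

  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON (for example a truncated response)
  }

  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;

  if (typeof obj.name !== "string" || !isStringArray(obj.skills)) return null;

  // A null or missing email is treated as "not provided".
  const email = typeof obj.email === "string" ? obj.email : undefined;
  return { name: obj.name, skills: obj.skills, email };
}
```

Returning `null` instead of throwing lets the caller decide whether to retry, fall back, or surface an error.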

Cost Optimization

AI costs add up fast. Here's how to keep them under control:

1. Model Selection Strategy

typescript

type TaskComplexity = "simple" | "moderate" | "complex";

function selectModel(complexity: TaskComplexity): string {
  switch (complexity) {
    case "simple":
      return "gpt-4o-mini"; // $0.15/$0.60 per 1M tokens
    case "moderate":
      return "gpt-4o"; // $2.50/$10 per 1M tokens
    case "complex":
      return "o1-preview"; // More expensive but higher quality
  }
}

// Example usage
async function categorizeTicket(title: string, description: string) {
  const response = await ai.client.chat.completions.create({
    model: selectModel("simple"), // Categorization is simple
    messages: [
      {
        role: "system",
        content: "Categorize this support ticket into: Bug, Feature Request, or Question.",
      },
      {
        role: "user",
        content: `Title: ${title}\nDescription: ${description}`,
      },
    ],
  });

  return response.choices[0].message.content;
}

Rule of thumb:

Use gpt-4o-mini for classification, simple extraction, basic summarization
Use gpt-4o for complex reasoning, long documents, creative content
Use o1-* models only when you need advanced reasoning
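To see why model choice matters, it helps to run the arithmetic for a realistic batch. The sketch below uses the per-million-token prices quoted above (which may have changed since) and assumes a ticket averages about 500 input and 50 output tokens. `batchCost` and the token sizes are illustrative assumptions.

```typescript
interface Rate {
  input: number; // USD per 1M input tokens
  output: number; // USD per 1M output tokens
}

const RATES: Record<string, Rate> = {
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gpt-4o": { input: 2.5, output: 10 },
};

// Cost of `requests` calls that each use `inTok` input and `outTok` output tokens.
function batchCost(model: string, requests: number, inTok: number, outTok: number): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`No rate configured for ${model}`);

  const inputMillions = (requests * inTok) / 1_000_000;
  const outputMillions = (requests * outTok) / 1_000_000;
  return inputMillions * rate.input + outputMillions * rate.output;
}

// 10,000 tickets: 5M input tokens and 0.5M output tokens in total
console.log(batchCost("gpt-4o-mini", 10_000, 500, 50).toFixed(2)); // 1.05
console.log(batchCost("gpt-4o", 10_000, 500, 50).toFixed(2)); // 17.50
```

Under these assumptions the same workload costs roughly 17 times more on the larger model, which is why classification belongs on the cheapest tier that is accurate enough.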

2. Caching AI Responses

Cache AI results aggressively:

typescript

// Assumes app-level helpers: `redis` (a client exposing get/setex), `logger`,
// `hashContent` (a content digest), and `estimateCost`.
async function summarizeDocument(documentId: string, text: string) {
  const cacheKey = `summary:${documentId}:${hashContent(text)}`;

  // Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    logger.info("AI cache hit", { documentId });
    return cached;
  }

  // Cache miss: call AI
  const response = await ai.complete(
    `Summarize this document in 3 bullet points:\n\n${text}`
  );

  const summary = response.choices[0].message.content;
  if (!summary) {
    // Don't cache an empty result; surface it to the caller instead
    throw new Error("Summary generation returned no content");
  }

  // Cache for 30 days
  await redis.setex(cacheKey, 30 * 24 * 60 * 60, summary);

  logger.info("AI cache miss", { documentId, cost: estimateCost(response.usage) });

  return summary;
}
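The cache-aside flow above does not depend on Redis. As a dependency-free sketch of the same idea, here is an in-memory version with a content hash and per-entry expiry. Everything in it is an assumption made for illustration: `TtlCache` and `cachedCall` are hypothetical, and the FNV-1a-style hash is only a stand-in for a proper digest such as SHA-256.

```typescript
// FNV-1a-style hash over UTF-16 code units. A stand-in to keep this sketch
// dependency-free; a real cache key should use a cryptographic hash.
function hashContent(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

interface CacheEntry {
  value: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory cache with per-entry expiry.
class TtlCache {
  private entries = new Map<string, CacheEntry>();

  get(key: string, now: number = Date.now()): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlMs: number, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }
}

// Cache-aside: return the stored value, or compute, store, and return it.
async function cachedCall(
  cache: TtlCache,
  key: string,
  ttlMs: number,
  compute: () => Promise<string>
): Promise<string> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;

  const value = await compute();
  cache.set(key, value, ttlMs);
  return value;
}
```

Including the content hash in the key means an edited document misses the cache naturally, so there is nothing to invalidate by hand.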

3. Token Limiting

Prevent runaway costs with max_tokens:

typescript

async function generateJobDescription(input: string) {
  const response = await ai.client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: "Generate a job description based on the input.",
      },
      {
        role: "user",
        content: input,
      },
    ],
    max_tokens: 500, // Limit output length
  });

  return response.choices[0].message.content;
}
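One side effect of a hard token cap is that output can stop mid-sentence. The API reports this through `finish_reason`, which is `"length"` when the limit was hit. Here is a small sketch of surfacing that to the caller, using a simplified `Choice` shape and a hypothetical `readChoice` helper.

```typescript
// Simplified view of one chat completion choice.
interface Choice {
  message: { content: string | null };
  finish_reason: string | null;
}

interface GenerationResult {
  text: string;
  truncated: boolean;
}

function readChoice(choice: Choice): GenerationResult {
  return {
    text: choice.message.content ?? "",
    // "length" means generation stopped because it reached the token limit
    truncated: choice.finish_reason === "length",
  };
}
```

What you do with `truncated` depends on the feature: show a "continue" button, retry with a higher limit, or ask the model for a shorter answer.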

Error Handling and Reliability

AI APIs fail. Your code must handle it gracefully.

1. Retry with Exponential Backoff

typescript

// `retry` is a project-local helper, not a published package
import { retry } from "@/lib/retry";

async function robustAICall(prompt: string) {
  return retry(
    async () => {
      return await ai.complete(prompt);
    },
    {
      retries: 3, // up to 3 retries after the first attempt
      minTimeout: 1000, // first wait in ms
      factor: 2, // each wait doubles: 1s, 2s, 4s
      onRetry: (err: Error, attempt: number) => {
        logger.warn("AI call retry", { attempt, error: err.message });
      },
    }
  );
}
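The `retry` helper above lives in the project and isn't shown, so here is a minimal sketch of what such a helper could look like with the same option names. This is an assumption about its behavior, not the actual implementation. A production version would likely add jitter and retry only on errors that are worth retrying, such as rate limits and server errors.

```typescript
interface RetryOptions {
  retries: number; // retries after the first attempt
  minTimeout: number; // delay before the first retry, in ms
  factor: number; // multiplier applied per retry
  onRetry?: (err: Error, attempt: number) => void;
}

// Delay before retry number `attempt` (1-based): minTimeout * factor^(attempt - 1).
function backoffDelay(attempt: number, minTimeout: number, factor: number): number {
  return minTimeout * Math.pow(factor, attempt - 1);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retry<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  let lastError: Error = new Error("retry: no attempts made");

  for (let attempt = 0; attempt <= options.retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err instanceof Error ? err : new Error(String(err));
      if (attempt === options.retries) break; // out of retries

      options.onRetry?.(lastError, attempt + 1);
      await sleep(backoffDelay(attempt + 1, options.minTimeout, options.factor));
    }
  }

  throw lastError;
}
```

With `retries: 3`, `minTimeout: 1000`, and `factor: 2`, a call that keeps failing waits 1s, 2s, and 4s before giving up, so the worst case adds about seven seconds on top of the request time.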

2. Fallback Strategies

Always have a fallback:

typescript

async function categorizeTicket(ticket: Ticket): Promise<string> {
  try {
    // Try AI categorization (assumes the service layer exposes a categorize() helper)
    const category = await ai.categorize(ticket.description);
    return category;
  } catch (error) {
    logger.error("AI categorization failed", { ticketId: ticket.id, error });

    // Fallback to rule-based categorization
    return ruleBasedCategorization(ticket);
  }
}

// Crude keyword matching: less accurate than the model, but always available
function ruleBasedCategorization(ticket: Ticket): string {
  const desc = ticket.description.toLowerCase();

  if (desc.includes("bug") || desc.includes("error")) return "Bug";
  if (desc.includes("feature") || desc.includes("add")) return "Feature Request";
  return "Question";
}
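A thrown error is not the only failure mode: the model can also answer with something outside the allowed labels, such as "bug." or a full sentence. Here is a sketch of normalizing the answer before trusting it, so that anything unrecognized can take the same fallback path. `CATEGORIES` and `normalizeCategory` are hypothetical names.

```typescript
const CATEGORIES = ["Bug", "Feature Request", "Question"] as const;
type Category = (typeof CATEGORIES)[number];

// Maps a raw model answer onto a known label, or null if it doesn't match one.
function normalizeCategory(raw: string | null): Category | null {
  if (raw === null) return null;

  // Tolerate surrounding whitespace, quotes, trailing periods, and case differences.
  const cleaned = raw.trim().replace(/[."']/g, "").toLowerCase();
  return CATEGORIES.find((category) => category.toLowerCase() === cleaned) ?? null;
}
```

A `null` result can be handled exactly like an API error: log it and use the rule-based path.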

3. Timeout Protection

Don't let AI calls hang forever:

typescript

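async function aiWithTimeout<T>(
  promise: Promise<T>,
  timeoutMs: number = 30000
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("AI call timeout")), timeoutMs);
  });

  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer so it doesn't keep running after the call settles
    clearTimeout(timer);
  }
}

// Usage. Note that this stops waiting; it does not cancel the underlying request.
const result = await aiWithTimeout(
  ai.complete(prompt),
  15000 // 15 second timeout
);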

Streaming Responses for Better UX

For long-form generation, stream tokens to the user:

typescript

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const stream = await ai.client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  // Stream tokens back to client
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const text = chunk.choices[0]?.delta?.content || "";
          controller.enqueue(encoder.encode(text));
        }
        controller.close();
      } catch (err) {
        // Propagate mid-stream failures so the client's read() rejects
        controller.error(err);
      }
    },
  });

  return new Response(readable, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

Client-side handling:

typescript

async function streamAIResponse(prompt: string) {
  const response = await fetch("/api/ai/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  if (!response.ok || !response.body) {
    throw new Error(`AI request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  let fullText = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // `stream: true` keeps multi-byte characters intact across chunk boundaries
    fullText += decoder.decode(value, { stream: true });

    // Update UI with each chunk
    updateUI(fullText);
  }

  return fullText;
}
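One detail that is easy to miss when decoding a stream is that network chunks are split on byte boundaries, not character boundaries. The sketch below shows the difference between decoding each chunk independently and decoding with `{ stream: true }`, using a two-byte character deliberately split across chunks. The function names are just for illustration.

```typescript
// Decodes each chunk on its own: a character split across chunks is corrupted.
function decodeIndependently(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder();
  return chunks.map((chunk) => decoder.decode(chunk)).join("");
}

// Decodes as a stream: incomplete byte sequences are buffered until the next chunk.
function decodeAsStream(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder();
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush anything still buffered
}

// "é" is two bytes in UTF-8, so "café" is five bytes; split it inside the "é".
const bytes = new TextEncoder().encode("café");
const chunks = [bytes.slice(0, 4), bytes.slice(4)];

console.log(decodeAsStream(chunks) === "café"); // true
console.log(decodeIndependently(chunks) === "café"); // false
```

English-only test prompts rarely hit this, which is why it tends to show up only after launch with accented text, CJK, or emoji.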

Cost Monitoring and Alerts

Track AI spending in real time:

typescript

// Assumes app-level helpers: `db` (an ORM client), `getDailySpend`, and `sendAlert`.
async function logAIUsage(
  operation: string,
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  },
  model: string
) {
  const cost = calculateCost(usage, model);

  await db.aiLog.create({
    data: {
      operation,
      model,
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
      totalTokens: usage.total_tokens,
      cost,
      timestamp: new Date(),
    },
  });

  // Alert if daily spend exceeds threshold (in USD)
  const todaySpend = await getDailySpend();
  if (todaySpend > 100) {
    await sendAlert(`AI daily spend: $${todaySpend.toFixed(2)}`);
  }
}

function calculateCost(
  usage: { prompt_tokens: number; completion_tokens: number },
  model: string
): number {
  // USD per 1M tokens; typed as a Record so lookup by arbitrary model name compiles
  const rates: Record<string, { input: number; output: number }> = {
    "gpt-4o-mini": { input: 0.15, output: 0.60 },
    "gpt-4o": { input: 2.5, output: 10.0 },
  };

  // Unknown models fall back to gpt-4o rates
  const rate = rates[model] ?? rates["gpt-4o"];

  return (
    (usage.prompt_tokens / 1_000_000) * rate.input +
    (usage.completion_tokens / 1_000_000) * rate.output
  );
}
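Two pieces of the monitoring code are worth sketching separately: how a daily total might be computed, and how to avoid sending an alert on every call once the threshold has been passed. The helpers below are hypothetical and work on in-memory entries for illustration. In a real system the sum would be a database query.

```typescript
interface AiLogEntry {
  cost: number; // USD
  timestamp: Date;
}

// Calendar day in UTC, e.g. "2025-01-22".
function utcDay(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Sum of costs logged on the same UTC day as `day`.
function dailySpend(entries: AiLogEntry[], day: Date): number {
  const key = utcDay(day);
  return entries
    .filter((entry) => utcDay(entry.timestamp) === key)
    .reduce((sum, entry) => sum + entry.cost, 0);
}

// True only for the call that pushes spend over the threshold,
// so a busy day produces one alert instead of one per request.
function crossedThreshold(spendBefore: number, spendAfter: number, threshold: number): boolean {
  return spendBefore <= threshold && spendAfter > threshold;
}
```

Comparing spend before and after the current call is the simplest way to alert once. A stored "alert sent today" flag works too and holds up better under concurrent writes.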

Testing AI Features

Testing AI is hard because outputs are non-deterministic. Here's my approach:

1. Test the Integration, Not the AI

typescript

describe("AI Resume Parser", () => {
  afterEach(() => {
    jest.restoreAllMocks();
  });

  it("should call OpenAI with correct parameters", async () => {
    // Stub the call so the test never reaches the real API
    const mockComplete = jest
      .spyOn(ai, "complete")
      .mockResolvedValue({ choices: [{ message: { content: "{}" } }] } as any);

    await parseResume("sample resume text");

    expect(mockComplete).toHaveBeenCalledWith(
      expect.stringContaining("sample resume text"),
      expect.objectContaining({ model: "gpt-4o" })
    );
  });

  it("should handle AI errors gracefully", async () => {
    jest.spyOn(ai, "complete").mockRejectedValue(new Error("API error"));

    const result = await parseResume("text");

    expect(result.error).toBeDefined();
    expect(result.fallback).toBe(true);
  });
});

2. Use Fixtures for AI Responses

typescript

const MOCK_AI_RESPONSE = {
  id: "chatcmpl-123",
  choices: [
    {
      message: {
        role: "assistant",
        content: JSON.stringify({
          name: "John Doe",
          skills: ["JavaScript", "React", "Node.js"],
        }),
      },
      finish_reason: "stop",
    },
  ],
  usage: { prompt_tokens: 50, completion_tokens: 30, total_tokens: 80 },
};

describe("Resume Analysis", () => {
  it("should parse AI response correctly", async () => {
    // The fixture only includes the fields the code reads, hence the cast
    jest.spyOn(ai.client.chat.completions, "create")
      .mockResolvedValue(MOCK_AI_RESPONSE as any);

    const result = await analyzeResume("resume text");

    expect(result.name).toBe("John Doe");
    expect(result.skills).toHaveLength(3);
  });
});
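For the occasional test that does run against real model output, exact string matching won't work, but structural checks can. Here is a sketch of that idea for the "3 bullet points" summary prompt used earlier. `checkSummary` is a hypothetical helper and the limits are arbitrary assumptions.

```typescript
// Returns a list of problems; an empty list means the summary looks structurally fine.
function checkSummary(summary: string, expectedBullets: number = 3, maxBulletLength: number = 300): string[] {
  const problems: string[] = [];

  // A bullet is a line starting with "-", "*", or "•" followed by some text.
  const bullets = summary
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^[-*•]\s+\S/.test(line));

  if (bullets.length !== expectedBullets) {
    problems.push(`expected ${expectedBullets} bullets, got ${bullets.length}`);
  }

  for (const bullet of bullets) {
    if (bullet.length > maxBulletLength) {
      problems.push(`bullet too long: ${bullet.slice(0, 40)}...`);
    }
  }

  return problems;
}
```

Checks like this can't tell you whether a summary is good, but they catch the regressions that break a UI: missing bullets, runaway length, or prose where a list was expected.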

Key Takeaways

Create an AI service layer - centralize API calls, logging, and error handling
Choose the right model - use cheaper models for simple tasks
Cache aggressively - for the same input, a stored result is usually good enough, so reuse it instead of paying for a new call
Handle failures gracefully - always have a fallback strategy
Stream for better UX - don't make users wait for long generations
Monitor costs - set up alerts before your bill explodes
Test the integration - you can't test AI output, but you can test your code

AI is a powerful tool, but it requires careful engineering to work reliably in production. These patterns have helped me ship AI features that users trust and that don't break the bank.

Building AI-powered features? Let's chat through the contact form.
