Building ChatBot Foundation: A Production-Ready LLM Framework for TypeScript

How I architected a transport-agnostic, plug-and-play chatbot framework with conversation memory, tool calling, and streaming support.

June 30, 2026 · 21 min read · Talha Bilal

Introduction

Building AI-powered chat systems from scratch means reinventing the same infrastructure repeatedly: conversation memory, prompt management, tool execution, streaming responses, session lifecycle, and error handling. Every project becomes a custom implementation with subtle bugs that only surface in production.

ChatBot Foundation solves this by providing a production-grade, transport-agnostic chatbot framework for TypeScript. Like Auth.js abstracts authentication, ChatBot Foundation abstracts the complexity of building conversational AI systems. You bring your application logic and domain expertise—the framework handles the plumbing.

This case study walks through the architectural decisions, implementation details, and engineering tradeoffs behind building a framework that works equally well for REST APIs, WebSocket servers, CLI tools, and background workers.

Diagram Placeholder

TODO: Add a Mermaid flowchart showing the high-level system architecture: HTTP Layer → Service Layer → AI Layer (Chains, LLM, Memory, Tools, Prompts, Agents).

The Problem

Most chatbot implementations start simple but accumulate complexity:

Conversation memory requires session management, TTL expiry, and sliding window pruning
Tool calling needs schema validation, error handling, and result serialization
Streaming responses demand SSE setup, heartbeat management, and graceful disconnection
Prompt engineering becomes scattered across controllers, mixing business logic with AI primitives
Error handling leaks implementation details into HTTP responses
Configuration lives in dozens of environment variables without validation

Developers end up with tightly coupled code where HTTP concerns bleed into AI logic, making it impossible to reuse the same chatbot in CLI tools or background jobs. Testing becomes painful because everything depends on Express request/response objects.

Goals

The framework needed to achieve five core objectives:

1. Transport Agnosticism

The service layer must have zero HTTP dependencies. A chatbot created with createChatbot() should work identically whether called from an Express controller, WebSocket handler, CLI command, or background worker.

2. Production Readiness

The framework must include everything needed for production: structured logging, graceful shutdown, rate limiting, security headers, environment validation, and error handling that never leaks stack traces.

3. Developer Experience

The API surface should support both zero-config usage (createChatbot() just works) and deep customization (per-request LLM overrides, lifecycle hooks, custom tools, memory configuration).

4. Extensibility

Core architecture must support extensions without modification: RAG pipelines, multi-agent orchestration, Redis memory adapters, and alternative LLM providers should plug in without touching the framework code.

5. Type Safety

Full TypeScript coverage with Zod runtime validation. Invalid configurations should crash on startup, not fail silently in production.

Architecture Overview

The system follows a three-layer architecture with strict separation of concerns:

HTTP Layer

Express controllers, routes, and middleware. Handles rate limiting, request validation, error formatting, and SSE streaming setup. This layer knows about HTTP but nothing about AI primitives.

Service Layer

ChatService orchestrates LLM, memory, and tools. Contains all business logic but has zero HTTP dependencies. Exposes methods like chat(), stream(), getHistory(), and clearSession(). This layer is the integration point for any transport.

AI Layer

LLM factories, prompt templates, conversation memory, tool registry, chains, and agents. These primitives know nothing about HTTP, services, or sessions—they're pure functional components that compose into higher-level abstractions.

This layering makes it trivial to reuse the same chatbot across different transports:

typescript

// REST endpoint
app.post("/chat", async (req, res) => {
  const response = await chatbot.chat(req.body);
  res.json(response);
});

// CLI command
const response = await chatbot.chat({
  sessionId: "cli-user",
  message: process.argv[2],
});
console.log(response.content);

// Background worker
queue.process(async (job) => {
  return await chatbot.chat(job.data);
});

No changes to chatbot—it doesn't know or care about the transport.

Key Technical Decisions

1. LangChain as Orchestration Layer

Decision: Use LangChain for LLM orchestration rather than calling OpenAI SDK directly.

Rationale:

Provider abstraction: Switching from OpenAI to Anthropic requires changing one import, not rewriting prompt composition
Battle-tested primitives: ChatPromptTemplate, BufferWindowMemory, DynamicStructuredTool handle edge cases we'd otherwise miss
Agent support: LangChain's AgentExecutor provides multi-step reasoning and tool orchestration out of the box
Ecosystem: Extensions like vector store connectors and document loaders are already built

Tradeoff: LangChain adds bundle size and abstraction overhead. We mitigate this by wrapping LangChain with our own factory functions (createLLM, createPrompt) so consumers never import LangChain directly. If we need to swap orchestration layers later, only internal modules change.

2. Zod for Runtime Validation

Decision: Validate all environment variables, request bodies, and tool inputs with Zod schemas.

Rationale:

Fail-fast: Invalid configuration crashes on startup with detailed error messages rather than failing silently in production
Type inference: Zod schemas generate TypeScript types, eliminating manual type definitions
Coercion: Environment variables arrive as strings but need to be numbers/booleans—Zod coerces automatically
Tool safety: LLM-supplied tool arguments are validated against schemas before execution, preventing malformed inputs from bypassing application logic

Implementation: Every public-facing shape has a Zod schema co-located with its TypeScript type in src/types/chat.types.ts. Middleware validates request bodies before controllers run. The ToolRegistry validates tool inputs before calling execute().

3. In-Memory Store with TTL as Default Memory

Decision: Default to an in-memory Map with TTL timers rather than requiring Redis.

Rationale:

Zero dependencies: Framework works immediately without external services
Development simplicity: Local development doesn't need Docker or Redis setup
Sufficient for many use cases: Applications with < 10K concurrent sessions fit comfortably in memory
Easy upgrade path: MemoryStore interface is defined—implementing Redis means writing one adapter class

Tradeoff: In-memory storage doesn't survive restarts and doesn't scale horizontally. For production systems with high session counts, the Redis adapter should be implemented (interface is already defined, stub is documented).

Memory Management:

Sliding window pruning: Oldest messages dropped first when maxMessages is exceeded
TTL expiry: Sessions auto-delete after SESSION_TTL_SECONDS of inactivity
Lazy hydration: LangChain BufferWindowMemory instances are created on first access and hydrated from the backing MemoryStore (with the default in-memory store, that history is lost on restart)
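
The pruning and expiry rules above can be sketched without any dependencies. This is a minimal illustration, not the framework's MemoryStore: the names SessionStore, StoredMessage, and the injectable now clock are assumptions, and expiry is checked lazily on access rather than with the TTL timers the framework uses, which keeps the sketch deterministic.

```typescript
// Illustrative types; the framework's real message shape may differ.
type StoredMessage = { role: "user" | "assistant"; content: string };
type SessionEntry = { messages: StoredMessage[]; lastAccess: number };

class SessionStore {
  private readonly sessions = new Map<string, SessionEntry>();
  private readonly maxMessages: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxMessages: number, ttlMs: number, now: () => number = Date.now) {
    this.maxMessages = maxMessages;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock so expiry can be tested without waiting
  }

  append(sessionId: string, message: StoredMessage): void {
    const entry = this.touch(sessionId) ?? { messages: [], lastAccess: this.now() };
    entry.messages.push(message);
    // Sliding window: drop the oldest messages once the cap is exceeded.
    if (entry.messages.length > this.maxMessages) {
      entry.messages.splice(0, entry.messages.length - this.maxMessages);
    }
    this.sessions.set(sessionId, entry);
  }

  history(sessionId: string): StoredMessage[] {
    return this.touch(sessionId)?.messages.slice() ?? [];
  }

  // Returns the live entry and resets its idle clock, or deletes it if it has expired.
  private touch(sessionId: string): SessionEntry | undefined {
    const entry = this.sessions.get(sessionId);
    if (!entry) return undefined;
    if (this.now() - entry.lastAccess > this.ttlMs) {
      this.sessions.delete(sessionId);
      return undefined;
    }
    entry.lastAccess = this.now();
    return entry;
  }
}
```

Because every read or write passes through touch(), an active session keeps extending its own lifetime, while an idle one disappears the next time it is looked up.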

4. Streaming via Server-Sent Events

Decision: Use SSE for streaming responses rather than WebSockets.

Rationale:

Simpler protocol: SSE is unidirectional—client receives events, no need to handle client messages
Built-in reconnection: Browsers automatically reconnect dropped SSE connections
HTTP-friendly: Works through standard HTTP proxies and load balancers
Native API: EventSource API is built into browsers, no library needed

Implementation: The controller sets SSE headers, starts a 15-second heartbeat to keep the connection alive, and iterates over chatService.stream() (an async generator). Each chunk is written as data: {JSON}\n\n. Connection cleanup happens automatically on client disconnect via the close event.
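
The wire format described above is small enough to show as two pure functions. This is a sketch under assumptions: the SseChunk shape is modelled on the delta/done/error chunks used elsewhere in this article, and the function names are hypothetical. The only protocol facts relied on are that an SSE event is a "data:" line terminated by a blank line, and that lines starting with ":" are comments that clients ignore.

```typescript
// Hypothetical chunk shape, modelled on the chunks shown in this article.
type SseChunk =
  | { type: "delta"; content: string }
  | { type: "done" }
  | { type: "error"; error: string };

// One SSE event: a single "data:" line followed by a blank line.
// JSON.stringify escapes newlines in content, so the payload stays on one line.
function formatSseEvent(chunk: SseChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Comment frame used as a heartbeat: ignored by clients, but keeps
// intermediaries from treating the connection as idle.
function formatSseHeartbeat(): string {
  return ": heartbeat\n\n";
}
```

In a controller, each string would be passed to res.write() as soon as it is produced, with the heartbeat written from a 15-second interval that is cleared when the connection closes.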

5. Lifecycle Hooks for Cross-Cutting Concerns

Decision: Support beforeChat and afterChat hooks rather than baking auth/logging/moderation into the framework.

Rationale:

Separation of concerns: Authentication, rate limiting, content moderation, cost tracking, and analytics are application-specific—they don't belong in a chatbot framework
Composability: Hooks can be layered (auth → logging → moderation) without modifying framework code
Testability: Tests can inject mock hooks or skip them entirely

Usage:

typescript

const chatbot = createChatbot({
  hooks: {
    beforeChat: async (req) => {
      // Auth, rate limit checks, input sanitization
      if (!(await isAuthorized(req.sessionId))) {
        throw new ChatError("UNAUTHORIZED", "Invalid session");
      }
      return req;
    },
    afterChat: async (res) => {
      // Log tokens, track costs, run content filters
      await logTokenUsage(res.usage);
      return res;
    },
  },
});
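
Since the configuration takes a single beforeChat function, layering several concerns means composing them into one. A minimal sketch of such a helper follows; composeHooks, ChatRequestLike, and the two example hooks are hypothetical names for illustration and are not part of the framework's exports.

```typescript
// A hook receives a value and returns it (possibly modified), sync or async.
type Hook<T> = (value: T) => T | Promise<T>;

// Runs hooks left to right; each one receives the previous hook's result.
function composeHooks<T>(...hooks: Hook<T>[]): (value: T) => Promise<T> {
  return async (value: T) => {
    let current = value;
    for (const hook of hooks) {
      current = await hook(current); // a throwing hook aborts the rest of the chain
    }
    return current;
  };
}

// Hypothetical request shape and hooks, for illustration only.
type ChatRequestLike = { sessionId: string; message: string };

const requireSession: Hook<ChatRequestLike> = (req) => {
  if (!req.sessionId) throw new Error("UNAUTHORIZED");
  return req;
};

const trimMessage: Hook<ChatRequestLike> = (req) => ({
  ...req,
  message: req.message.trim(),
});

// The composed function has the same shape as a single beforeChat hook.
const beforeChat = composeHooks(requireSession, trimMessage);
```

Order matters here: placing the auth check first means later hooks never run for rejected requests.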

Core Features

Conversation Memory

Session-based message history with configurable retention and automatic expiry:

typescript

const chatbot = createChatbot({ memory: true });

await chatbot.chat({
  sessionId: "user-123",
  message: "My name is Alice.",
});

// Later turn: the chatbot remembers context
await chatbot.chat({
  sessionId: "user-123",
  message: "What's my name?",
});
// → "Your name is Alice."

Under the hood:

MemoryManager lazily creates a LangChain BufferWindowMemory for each session
Memory is hydrated from the backing store on first access (with a persistent adapter such as Redis this supports restarts and horizontal scaling; the default in-memory store does not)
After each exchange, user + assistant messages are saved via saveContext()
Raw messages are persisted to the backing store for history retrieval
Sliding window pruning keeps the last maxMessages
TTL timer resets on every access—idle sessions expire automatically

Dynamic Tool Calling

typescript

import { z } from "zod";
import { createChatbot } from "chatbot-foundation";

const weatherTool = {
  name: "getWeather",
  description: "Get current weather for a city.",
  schema: z.object({
    city: z.string(),
    units: z.enum(["celsius", "fahrenheit"]).default("celsius"),
  }),
  execute: async ({ city, units }) => {
    // weatherAPI stands in for whatever weather client the application uses
    const data = await weatherAPI.fetch(city);
    return { city, temperature: data.temp, units };
  },
};

const chatbot = createChatbot({ tools: [weatherTool] });

await chatbot.chat({
  sessionId: "user-456",
  message: "What's the weather in Tokyo?",
});
// LLM calls getWeather({ city: "Tokyo", units: "celsius" })
// → "The weather in Tokyo is 22°C, partly cloudy."

Safety: Every tool input is validated against its Zod schema before execute() runs. Invalid inputs return a ToolResult with an error field. The registry's execute() never throws, so a malformed tool call from the LLM cannot crash the application.

Streaming Responses

Async generator pattern for low-latency UX:

typescript

for await (const chunk of chatbot.stream({
  sessionId: "user-789",
  message: "Explain microservices.",
})) {
  if (chunk.type === "delta") process.stdout.write(chunk.content);
  if (chunk.type === "done") console.log("\n✅ Done");
  if (chunk.type === "error") console.error(chunk.error);
}

Controller implementation: The SSE endpoint sets Content-Type: text/event-stream, starts a heartbeat to prevent proxy timeouts, and writes each chunk as data: {JSON}\n\n. The controller never buffers the full response—chunks are flushed immediately.

Per-Request LLM Overrides

Customize model, temperature, and system prompt per call:

typescript

await chatbot.chat({
  sessionId: "creative-session",
  message: "Write a haiku about TypeScript.",
  options: {
    model: "gpt-4o",
    temperature: 1.2,
    systemPrompt: "You are a creative poet who loves programming.",
  },
});

Use cases:

A/B testing different models
User preference for verbose vs. concise responses
High temperature for creative tasks, low temperature for factual queries

Domain-Specific System Prompts

Built-in templates for common use cases:

typescript

import { createChatbot, SYSTEM_PROMPTS, buildSystemPrompt } from "./src/exports";

// Customer support chatbot
const supportBot = createChatbot({
  systemPrompt: buildSystemPrompt("customerSupport", {
    company: "TechCorp",
    additionalInstructions: "Always mention our 30-day return policy.",
  }),
});

// Coding assistant
const codingBot = createChatbot({
  systemPrompt: SYSTEM_PROMPTS.codingAssistant,
  llm: { temperature: 0.2 },
});

// RAG-augmented assistant
const ragBot = createChatbot({
  systemPrompt: SYSTEM_PROMPTS.ragAssistant,
});

Available prompts:

general — Sensible default for most applications
customerSupport — Polite, solution-focused, escalation-aware
codingAssistant — Opinionated, pragmatic, proactive about edge cases
dataAnalyst — Quantitative, precise, visualization-aware
ragAssistant — Cite sources, stay grounded in provided context

Diagram Placeholder

TODO: Add a Mermaid sequence diagram showing the full request lifecycle: Client → Controller → Service → Chain → LLM → Memory → Response.

Implementation Highlights

1. Environment Validation

All configuration lives in src/config/env.ts and is validated with Zod on startup:

typescript

import { z } from "zod";

const EnvSchema = z.object({
  OPENAI_API_KEY: z.string().min(1, "OPENAI_API_KEY is required"),
  PORT: z.coerce.number().default(3000),
  DEFAULT_MODEL: z.string().default("gpt-4.1-mini"),
  MEMORY_MAX_MESSAGES: z.coerce.number().positive().default(50),
  // ... 15+ more variables
});

// parse() throws on invalid input, which stops the process at startup
export const env = EnvSchema.parse(process.env);

Behavior: If validation fails, the application crashes immediately with a detailed error message listing every invalid variable. This is intentional—misconfiguration should never silently degrade into production bugs.

2. Structured Error Handling

All errors are wrapped in ChatError with semantic codes:

typescript

export class ChatError extends Error {
  constructor(
    public readonly code: ChatErrorCode,
    message: string,
    public readonly cause?: unknown
  ) {
    super(message);
    this.name = "ChatError";
  }
}

Error codes map to HTTP status:

INVALID_REQUEST → 400
UNAUTHORIZED → 401
RATE_LIMITED → 429
LLM_ERROR, TOOL_ERROR → 502 (upstream failure)
MEMORY_ERROR, INTERNAL_ERROR → 500
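
One way to encode this mapping is a Record keyed by the error-code union, so the compiler flags any code that is left unmapped. The sketch below is reconstructed from the list above, not copied from the framework; the helper name statusFor is an assumption.

```typescript
// Error codes as listed in this section.
type ChatErrorCode =
  | "INVALID_REQUEST"
  | "UNAUTHORIZED"
  | "RATE_LIMITED"
  | "LLM_ERROR"
  | "TOOL_ERROR"
  | "MEMORY_ERROR"
  | "INTERNAL_ERROR";

// Record<ChatErrorCode, number> forces every code to have a status.
const ERROR_STATUS_MAP: Record<ChatErrorCode, number> = {
  INVALID_REQUEST: 400,
  UNAUTHORIZED: 401,
  RATE_LIMITED: 429,
  LLM_ERROR: 502, // upstream failure
  TOOL_ERROR: 502, // upstream failure
  MEMORY_ERROR: 500,
  INTERNAL_ERROR: 500,
};

// Hypothetical helper: looks up a status for an arbitrary string.
function statusFor(code: string): number {
  // The own-property check keeps inherited keys like "toString" from matching.
  return Object.prototype.hasOwnProperty.call(ERROR_STATUS_MAP, code)
    ? ERROR_STATUS_MAP[code as ChatErrorCode]
    : 500; // unknown codes fall back to a generic server error
}
```

Adding a new member to ChatErrorCode without extending the map becomes a compile error, which fits the fail-fast approach used for configuration.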

Global error handler:

typescript

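import type { Request, Response, NextFunction } from "express";

// Assumption: production is detected via NODE_ENV
const isProd = process.env.NODE_ENV === "production";

// Express identifies error middleware by its four-parameter signature
export function errorHandler(err: unknown, req: Request, res: Response, next: NextFunction) {
  if (err instanceof ChatError) {
    const status = ERROR_STATUS_MAP[err.code] ?? 500;
    res.status(status).json({
      success: false,
      error: err.message,
      code: err.code,
      ...(isProd ? {} : { stack: err.stack }),
    });
    return;
  }
  // Handle other error types...
}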

Stack traces only appear in development. Production responses never leak implementation details.

3. Tool Registry Pattern

Tools are stored in a Map for O(1) lookup and converted to LangChain DynamicStructuredTool instances on demand:

typescript

export class ToolRegistry {
  private readonly tools = new Map<string, ToolDefinition>();

  register<TInput>(tool: ToolDefinition<TInput>): this {
    this.tools.set(tool.name, tool);
    return this; // Fluent API
  }

  async execute(name: string, rawInput: unknown): Promise<ToolResult> {
    const tool = this.tools.get(name);
    if (!tool) {
      return { toolName: name, output: null, error: "Tool not found" };
    }

    // Validate LLM-supplied arguments before running any tool code
    const parsed = tool.schema.safeParse(rawInput);
    if (!parsed.success) {
      return { toolName: name, output: null, error: parsed.error.message };
    }

    const start = Date.now(); // latency is measured around execute() only
    try {
      const output = await tool.execute(parsed.data);
      return { toolName: name, output, latencyMs: Date.now() - start };
    } catch (err) {
      // err is unknown in strict TypeScript, so narrow before reading .message
      const message = err instanceof Error ? err.message : String(err);
      return { toolName: name, output: null, error: message };
    }
  }
}

Key design:

Never throws—always returns ToolResult with error field
Schema validation happens before execution
Execution latency is tracked automatically
Errors are captured and returned to LLM as tool output

4. Prompt Composition

Prompts are built with LangChain's ChatPromptTemplate but wrapped in factory functions:

typescript

import {
  ChatPromptTemplate,
  HumanMessagePromptTemplate,
  MessagesPlaceholder,
  SystemMessagePromptTemplate,
} from "@langchain/core/prompts";

export function createPrompt(config: PromptConfig): ChatPromptTemplate {
  return ChatPromptTemplate.fromMessages([
    SystemMessagePromptTemplate.fromTemplate(config.system),
    new MessagesPlaceholder("history"), // Injected by memory manager
    HumanMessagePromptTemplate.fromTemplate(config.user ?? "{input}"),
  ]);
}

Variants:

createStatelessPrompt() — No history placeholder (for single-turn chains)
createFewShotPrompt() — Prepends example exchanges for classification tasks

Variable injection: Prompts use {variable} syntax. Variables are supplied at invocation time:

typescript

const prompt = createPrompt({
  system: "You are a {role} specializing in {domain}.",
});

await chain.invoke({
  input: "user message",
  role: "architect",
  domain: "distributed systems",
});

5. Chain Composition

ChatChain wraps the LangChain pipeline with domain-specific logic:

typescript

export class ChatChain {
  async invoke(request: ChatRequest): Promise<ChatResponse> {
    // 1. Resolve LLM (merges defaults + per-request overrides)
    const llm = createLLM({
      ...this.config.llmConfig,
      ...request.options,
    });

    // 2. Build prompt
    const prompt = createPrompt({ system: this.config.systemPrompt });

    // 3. Fetch history
    const memory = await this.memoryManager.getLangChainMemory(request.sessionId);
    const { history } = await memory.loadMemoryVariables({});

    // 4. Invoke chain
    const chain = RunnableSequence.from([prompt, llm, new StringOutputParser()]);
    const content = await chain.invoke({ input: request.message, history });

    // 5. Persist exchange
    await memory.saveContext(
      { input: request.message },
      { output: content }
    );

    return {
      sessionId: request.sessionId,
      messageId: uuidv4(),
      content,
      role: "assistant",
    };
  }
}

Streaming variant: The stream() method follows the same pattern but yields chunks as they arrive, then persists the full exchange after streaming completes.
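
The yield-then-persist pattern described above can be sketched with a plain async generator. Everything here is illustrative: fakeTokenStream stands in for a streaming LLM call, and streamAndPersist and ChainChunk are hypothetical names rather than the framework's API.

```typescript
type ChainChunk = { type: "delta"; content: string } | { type: "done" };

// Stand-in for a streaming LLM call; a real implementation would yield model tokens.
async function* fakeTokenStream(): AsyncGenerator<string> {
  for (const token of ["Micro", "services ", "are..."]) {
    yield token;
  }
}

async function* streamAndPersist(
  tokens: AsyncIterable<string>,
  persist: (fullText: string) => Promise<void>,
): AsyncGenerator<ChainChunk> {
  let fullText = "";
  for await (const token of tokens) {
    fullText += token; // accumulate for persistence
    yield { type: "delta", content: token }; // forward immediately, no buffering
  }
  await persist(fullText); // save the exchange only after the stream completes
  yield { type: "done" };
}
```

One consequence of this shape: if the consumer stops iterating early (for example on client disconnect), the persist step in this sketch never runs, so a real implementation has to decide whether partial responses should be saved.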

Extensions

RAG Pipeline

Location: extensions/rag/ragPipeline.ts

Architecture:

Documents are chunked and embedded with OpenAI text-embedding-3-small
Embeddings are stored in a vector store (in-memory for dev, Pinecone/Weaviate/pgvector for production)
On each query, the top-K most similar documents are retrieved
Retrieved context is injected into the system prompt
LLM answers based on context, cites sources
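
The retrieval step in the middle of that pipeline reduces to ranking documents by vector similarity. Below is a dependency-free sketch that assumes cosine similarity as the metric (the article does not name one) and takes embeddings as already computed; EmbeddedDoc, topK, and buildContext are hypothetical names.

```typescript
type EmbeddedDoc = { id: string; content: string; vector: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Returns the k documents most similar to the query, best match first.
function topK(query: number[], docs: EmbeddedDoc[], k: number): EmbeddedDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ doc }) => doc);
}

// Formats retrieved documents as numbered sources for injection into the system prompt.
function buildContext(docs: EmbeddedDoc[]): string {
  return docs.map((d, i) => `[Source ${i + 1}] ${d.content}`).join("\n");
}
```

Numbering the sources in the injected context is what lets the model produce citations like "[Source 1]" in its answer. A production vector store would replace the linear scan in topK with an index.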

Usage:

typescript

const rag = new RAGPipeline({ topK: 4 });

await rag.addDocuments([
  { id: "doc-1", content: "Our refund policy allows returns within 30 days." },
  { id: "doc-2", content: "Premium subscribers get priority support." },
]);

const request = await rag.buildRequest({
  sessionId: "user-123",
  message: "What is your refund policy?",
});

const response = await chatbot.chat(request);
// → "According to our policy, returns are allowed within 30 days. [Source 1]"

Why it's an extension: RAG isn't needed for most chatbots. Keeping it in extensions/ means it's not bundled unless explicitly imported. Applications that need RAG wire it in during the request flow—the framework never needs to know about it.

Multi-Agent Orchestration

Location: extensions/multiAgent/orchestrator.ts

Pattern: A router analyzes user intent and dispatches to specialist sub-agents (coding, support, data analysis, general).

Current implementation: Uses keyword matching. Production systems should replace this with an LLM intent classifier.
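
A keyword router of this kind might look like the sketch below. The keyword lists and the routeByKeyword name are invented for illustration; the article does not show the real table.

```typescript
type Specialist = "coding" | "support" | "data" | "general";

// Hypothetical keyword table; the real extension's keywords are not shown in the article.
const KEYWORDS: Record<Exclude<Specialist, "general">, string[]> = {
  coding: ["typescript", "function", "bug", "compile"],
  support: ["refund", "order", "account", "billing"],
  data: ["average", "chart", "dataset", "trend"],
};

// Picks the specialist with the most keyword hits, falling back to "general".
function routeByKeyword(message: string): Specialist {
  const text = message.toLowerCase();
  let best: Specialist = "general";
  let bestHits = 0;
  for (const [name, words] of Object.entries(KEYWORDS) as [Specialist, string[]][]) {
    const hits = words.filter((word) => text.includes(word)).length;
    if (hits > bestHits) {
      best = name;
      bestHits = hits;
    }
  }
  return best;
}
```

The weakness is visible in the code: substring matching would route a message containing "border" to support because it contains "order", which is the kind of error an LLM intent classifier avoids.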

Usage:

typescript

const orchestrator = new AgentOrchestrator();

const response = await orchestrator.route({
  sessionId: "user-456",
  message: "How do I write a TypeScript generic function?",
});
// → Routed to "coding" specialist (temperature: 0.2)

Why it's an extension: Most applications use a single chatbot persona. Multi-agent routing adds latency and complexity. By keeping it as an opt-in extension, simple use cases stay simple.

Screenshot Placeholder

TODO: Add screenshot showing example API response with session ID, message ID, content, model, token usage, and latency.

Performance Considerations

Memory Footprint

Measurement: Each session stores:

Session metadata: ~200 bytes
Message history: ~500 bytes per message pair (user + assistant)
LangChain memory overhead: ~1KB

Capacity: A session holding 50 message pairs consumes ~26KB (0.2KB metadata + 50 × 0.5KB history + 1KB LangChain overhead). A server with 1GB of memory available for sessions can hold ~38K such sessions.
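
The arithmetic behind these figures can be checked with a few lines. The constants come from the estimates listed above, using decimal units (1KB = 1,000 bytes, 1GB = 1,000,000,000 bytes) as an assumption. Note that the ~26KB figure corresponds to 50 message pairs; a session capped at 50 individual messages holds 25 pairs and is roughly half that size.

```typescript
// Per-session estimates from the list above, in bytes.
const BYTES_METADATA = 200;
const BYTES_PER_PAIR = 500; // one user message plus one assistant message
const BYTES_LANGCHAIN_OVERHEAD = 1_000;

function sessionBytes(messagePairs: number): number {
  return BYTES_METADATA + messagePairs * BYTES_PER_PAIR + BYTES_LANGCHAIN_OVERHEAD;
}

function sessionsThatFit(budgetBytes: number, messagePairs: number): number {
  return Math.floor(budgetBytes / sessionBytes(messagePairs));
}

console.log(sessionBytes(50)); // 26200 bytes, i.e. ~26KB
console.log(sessionsThatFit(1_000_000_000, 50)); // 38167 sessions, i.e. ~38K
console.log(sessionBytes(25)); // 13700 bytes for 25 pairs (50 messages)
```

These are upper bounds on what fits, since the process also needs memory for everything other than sessions.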

Scaling: For higher session counts, implement the Redis adapter (interface is defined). Redis-backed memory supports horizontal scaling and survives restarts.

Streaming Latency

First token latency: 200-500ms (depends on OpenAI API)

Chunk delivery: SSE writes chunks immediately—no buffering. The 15-second heartbeat prevents proxy timeouts without affecting latency.

Network optimization:

gzip compression is handled by Express middleware
SSE messages are minimal JSON ({ type: "delta", content: "..." })

Rate Limiting

Configured per-IP with express-rate-limit:

typescript

const limiter = rateLimit({
  windowMs: env.RATE_LIMIT_WINDOW_MS, // 60 seconds
  max: env.RATE_LIMIT_MAX_REQUESTS,   // 60 requests
  standardHeaders: true,
});

Why per-IP: Session-based rate limiting requires authentication. Per-IP is a reasonable default for public APIs. Applications with auth should add session-based rate limiting in the beforeChat hook.
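
A session-keyed limiter suitable for calling from a beforeChat hook could be as simple as a fixed-window counter. This is a sketch under assumptions: FixedWindowLimiter is a hypothetical class, not something the framework ships, and a fixed window is only one of several possible algorithms.

```typescript
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly windowMs: number;
  private readonly max: number;
  private readonly now: () => number;

  constructor(windowMs: number, max: number, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.max = max;
    this.now = now; // injectable clock for deterministic tests
  }

  // Returns true if the call is allowed, and counts it against the key's window.
  allow(key: string): boolean {
    const t = this.now();
    const current = this.windows.get(key);
    if (!current || t - current.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 }); // start a fresh window
      return true;
    }
    if (current.count >= this.max) return false;
    current.count += 1;
    return true;
  }
}
```

Inside beforeChat, a hook would call allow(req.sessionId) and throw a ChatError with code RATE_LIMITED when it returns false. The map in this sketch grows with the number of distinct keys, so a long-running service would also need to evict stale windows.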

Security Considerations

Input Validation

Every request body is validated with Zod before reaching the service layer:

typescript

router.post("/chat", validate(ChatRequestSchema), chat);

Invalid requests return 422 with field-level error details. Malformed JSON returns 400.

Helmet.js Security Headers

All responses include:

X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
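X-XSS-Protection: 0 (Helmet 4 and later disable the legacy browser XSS filter; 1; mode=block applies only to older Helmet versions)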
Content Security Policy (disabled in dev for easier debugging)

CORS Configuration

Development: origin: "*" (allow all origins for local testing)

Production: Reads ALLOWED_ORIGINS from environment (comma-separated list)
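
Turning the comma-separated variable into an allow-list takes only a small amount of parsing. The sketch below uses hypothetical helper names, and its handling of requests without an Origin header is an assumed policy, not something the article specifies.

```typescript
// Splits "https://a.example, https://b.example" into a clean list of origins.
function parseAllowedOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
}

function isOriginAllowed(origin: string | undefined, allowed: string[]): boolean {
  // Assumed policy: requests with no Origin header (curl, server-to-server)
  // are not browser cross-origin calls, so they are let through.
  if (origin === undefined) return true;
  return allowed.includes(origin); // exact match, no wildcard or suffix matching
}
```

Exact matching is deliberate: suffix or substring checks are an easy way to let an attacker-controlled origin such as https://a.example.evil.test slip through.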

Secrets Management

All secrets live in .env:

.env is gitignored
.env.example documents required variables without exposing values
Zod validation crashes on missing secrets

Tool Safety

Tools are the largest attack surface: they run application code with arguments chosen by the LLM. Mitigations:

Schema validation: LLM-supplied arguments are validated before execution
Error containment: Tools never throw—errors are captured and returned as tool output
Read-only marking: Tools can be marked readonly: true (useful for audit layers)
Permission system: Applications can filter tools based on user roles
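
Role-based filtering can happen before the tool list ever reaches the chatbot. In the sketch below, the readonly flag mirrors the marker mentioned above, while allowedRoles, Role, and the helper functions are hypothetical and not part of the framework's tool definition.

```typescript
type Role = "guest" | "member" | "admin";

interface GuardedTool {
  name: string;
  readonly: boolean; // mirrors the readonly marker described above
  allowedRoles: Role[]; // hypothetical field, added for this illustration
}

// Only the tools this role may call are handed to the chatbot.
function toolsForRole(tools: GuardedTool[], role: Role): GuardedTool[] {
  return tools.filter((tool) => tool.allowedRoles.includes(role));
}

// Stricter variant: the role filter plus a restriction to side-effect-free tools.
function readOnlyToolsForRole(tools: GuardedTool[], role: Role): GuardedTool[] {
  return toolsForRole(tools, role).filter((tool) => tool.readonly);
}
```

Filtering up front means the LLM never sees a tool the user cannot use, so it cannot be prompted into calling it.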

Developer Experience

Zero-Config Usage

typescript

import { createChatbot } from "chatbot-foundation";

const chatbot = createChatbot();

await chatbot.chat({
  sessionId: "user-123",
  message: "Hello!",
});

Works immediately. All defaults are read from .env.

Deep Customization

typescript

const chatbot = createChatbot({
  systemPrompt: "You are a helpful assistant.",
  llm: { model: "gpt-4o", temperature: 0.7 },
  memory: { maxMessages: 100, ttlSeconds: 7200 },
  tools: [weatherTool, calculatorTool],
  hooks: {
    beforeChat: async (req) => { /* auth */ },
    afterChat: async (res) => { /* logging */ },
  },
});

Every default can be overridden.

Public API Surface

Everything consumers need is exported from src/exports.ts:

typescript

export { createChatbot, ChatService } from "./services/chatService";
export { createLLM, createStreamingLLM } from "./ai/llm/openai";
export { SYSTEM_PROMPTS, buildSystemPrompt } from "./ai/prompts/systemPrompt";
export { registerTool, toolRegistry } from "./ai/tools/toolRegistry";
export { MemoryManager } from "./ai/memory/memoryManager";
export { createFunctionsAgent, runAgent } from "./ai/agents/agentFactory";
// + all types and schemas

Internal modules are not re-exported—implementation details stay private.

Examples

examples/usage.ts contains 10 runnable examples:

Minimal usage
Custom system prompts
Conversation memory
Custom tools
Streaming
Per-request overrides
Lifecycle hooks
Agents with tools
Domain-specific chatbots
Session management

Run with npx tsx examples/usage.ts.

Testing Strategy

Current state: No automated tests.

Recommended approach:

Unit tests: Test service layer methods with mocked LLM responses
Integration tests: Test full request flow with in-memory store
Contract tests: Validate OpenAI API assumptions (model names, response shapes)

Why testing is easier: Because the service layer has no HTTP dependencies, tests can call chatbot.chat() directly without starting a server or mocking Express.

Challenges & Tradeoffs

1. LangChain Abstraction Overhead

Challenge: LangChain adds ~2MB to the bundle and introduces indirection (e.g., RunnableSequence.from([prompt, llm, parser]) instead of direct API calls).

Tradeoff: We gain provider abstraction and battle-tested primitives at the cost of bundle size. For applications where bundle size matters, we could add a "lite" mode that uses OpenAI SDK directly.

Mitigation: All LangChain usage is wrapped in factory functions. If we need to swap orchestration layers, only internal modules change.

2. In-Memory Store Doesn't Scale Horizontally

Challenge: In-memory sessions don't survive restarts and can't be shared across instances.

Tradeoff: Zero dependencies for dev/testing vs. Redis requirement for production scale. Most applications don't need horizontal scaling on day one.

Mitigation: MemoryStore interface is defined. Implementing Redis means writing one adapter class—no framework changes needed.

3. Streaming Requires SSE, Not WebSockets

Challenge: SSE is unidirectional. Applications that need client-to-server messages during a stream (e.g., a "stop generation" button) need WebSockets or a separate HTTP request.

Tradeoff: Simpler protocol and better proxy compatibility vs. lack of client-to-server messaging.

Mitigation: The stream() method is transport-agnostic—it returns an async generator. Controllers can wire this to SSE, WebSockets, or any other transport.

4. No Built-In Authentication

Challenge: The framework doesn't handle auth because every application's requirements differ (JWT, session cookies, OAuth, API keys).

Tradeoff: Smaller framework surface vs. requiring integration work.

Mitigation: Authentication is injected via the beforeChat hook. The framework provides the integration point—applications provide the policy.

Diagram Placeholder

TODO: Add a Mermaid class diagram showing the relationships between ChatService, ChatChain, MemoryManager, ToolRegistry, and LLM factories.

Lessons Learned

1. Transport Agnosticism Requires Discipline

Early versions of ChatService accepted Express Request objects. This seemed harmless until we needed to use the chatbot in a CLI tool—suddenly the entire API was coupled to HTTP.

Lesson: Define service layer interfaces first, then build adapters. Never let transport concerns leak into business logic.

2. Fail-Fast Configuration Saves Debug Time

Silently defaulting missing environment variables led to confusing production bugs ("Why is the chatbot using the wrong model?"). Switching to validated configuration with detailed error messages eliminated an entire class of issues.

Lesson: Crash loudly on invalid config. The 10 seconds spent fixing .env saves hours of debugging.

3. Memory Management Is Subtle

Initial implementation stored raw LangChain messages directly. This worked until we needed to retrieve history for the /history endpoint—LangChain's internal format wasn't suitable for API responses.

Lesson: Maintain two representations: LangChain memory for chain invocation, raw messages for API retrieval. They're conceptually the same but serve different consumers.

4. Tool Validation Prevents Subtle Bugs

Without schema validation, the LLM once supplied { city: "Tokyo", unit: "celsius" } (note: unit not units). The tool threw, the chain failed, and the user saw a cryptic error.

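Lesson: Validate at the boundary. Zod schemas catch missing fields and type mismatches before they propagate. Note that z.object() strips unknown keys by default, so a misspelled key is only rejected when the schema uses .strict() or the correctly named field is required.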

5. Hooks Beat Inheritance

Early design used subclassing (CustomerSupportChatbot extends ChatService). This became unmaintainable when we needed auth + logging + moderation—multiple inheritance isn't supported in JS.

Lesson: Hooks compose better than inheritance. Middleware patterns win for cross-cutting concerns.

Future Improvements

1. Redis Memory Adapter

What: Implement RedisStore class that satisfies the MemoryStore interface.

Why: Enables horizontal scaling and survives restarts.

Effort: ~200 lines (ioredis setup + CRUD operations).

2. Observability Layer

What: Integrate LangSmith or LangFuse for tracing, token counting, and latency monitoring.

Why: Production systems need visibility into LLM calls—prompt versions, token costs, failure rates.

Effort: Moderate (requires LangChain callback setup).

3. Prompt Versioning

What: Store prompt templates in a database with version IDs. Log which version was used for each request.

Why: A/B testing and gradual rollout of prompt changes.

Effort: Moderate (requires prompt storage layer + migration from hardcoded templates).

4. Multi-LLM Support

What: Add factories for Anthropic, Mistral, Cohere, local models (Ollama).

Why: Different models have different strengths. Cost-sensitive applications can route cheap queries to gpt-4.1-mini, expensive queries to gpt-4o.

Effort: Low (add factory functions, LangChain handles the rest).

5. Test Suite

What: Unit tests for service layer, integration tests for full request flow, contract tests for OpenAI API.

Why: Confidence when refactoring, regression prevention, onboarding documentation.

Effort: High (requires test infrastructure + fixture setup).

Key Takeaways

Layered architecture enables reuse. By keeping the service layer transport-agnostic, the same chatbot works for REST, WebSocket, CLI, and background jobs without modification.
Fail-fast configuration prevents production bugs. Zod validation at startup crashes immediately with detailed errors rather than silently degrading in production.
Hooks compose better than inheritance. Middleware patterns (beforeChat/afterChat) support arbitrary composition of auth, logging, moderation, and analytics without framework changes.
LangChain provides leverage at the cost of abstraction. We gain provider abstraction and battle-tested primitives but pay in bundle size and indirection. Wrapping LangChain in factory functions keeps the door open for future swaps.
Type safety catches errors early. Zod schemas generate TypeScript types, validate runtime data, and coerce environment variables—eliminating manual type definitions and entire classes of bugs.

Conclusion

ChatBot Foundation demonstrates that production-grade infrastructure doesn't require custom implementations for every project. By abstracting common patterns—conversation memory, tool calling, streaming, prompt management—into a reusable framework, applications can focus on domain logic rather than plumbing.

The three-layer architecture (HTTP → Service → AI) ensures that the same chatbot works across any transport. Zod-based validation makes invalid configurations fail at startup instead of surfacing later as runtime bugs. Lifecycle hooks provide integration points for auth, logging, and moderation without coupling the framework to specific implementations.

The framework currently powers conversational AI systems but is designed to extend: RAG pipelines, multi-agent orchestration, Redis memory adapters, and alternative LLM providers all plug in without modifying core code.

For teams building AI-powered applications, ChatBot Foundation eliminates the infrastructure work and lets you ship faster—bring your business logic, the framework handles the rest.

View the complete source code and documentation on GitHub. Have questions about building production AI systems? Get in touch.


Goals

The framework needed to achieve five core objectives:

1. Transport Agnosticism

2. Production Readiness

3. Developer Experience

The API surface should support both zero-config usage (createChatbot() just works) and deep customization (per-request LLM overrides, lifecycle hooks, custom tools, memory configuration).

4. Extensibility

5. Type Safety

Full TypeScript coverage with Zod runtime validation. Invalid configurations should crash on startup, not fail silently in production.

Architecture Overview

The system follows a three-layer architecture with strict separation of concerns:

HTTP Layer

Express controllers, routes, and middleware. Handles rate limiting, request validation, error formatting, and SSE streaming setup. This layer knows about HTTP but nothing about AI primitives.

Service Layer

AI Layer

This layering makes it trivial to reuse the same chatbot across different transports:

```typescript
// REST endpoint
app.post("/chat", async (req, res) => {
  const response = await chatbot.chat(req.body);
  res.json(response);
});

// CLI command
const response = await chatbot.chat({
  sessionId: "cli-user",
  message: process.argv[2],
});
console.log(response.content);

// Background worker
queue.process(async (job) => {
  return await chatbot.chat(job.data);
});
```

No changes to chatbot—it doesn't know or care about the transport.

Key Technical Decisions

1. LangChain as Orchestration Layer

Decision: Use LangChain for LLM orchestration rather than calling OpenAI SDK directly.

Rationale:

Provider abstraction: Switching from OpenAI to Anthropic requires changing one import, not rewriting prompt composition
Battle-tested primitives: ChatPromptTemplate, BufferWindowMemory, DynamicStructuredTool handle edge cases we'd otherwise miss
Agent support: LangChain's AgentExecutor provides multi-step reasoning and tool orchestration out of the box
Ecosystem: Extensions like vector store connectors and document loaders are already built

2. Zod for Runtime Validation

Decision: Validate all environment variables, request bodies, and tool inputs with Zod schemas.

Rationale:

Fail-fast: Invalid configuration crashes on startup with detailed error messages rather than failing silently in production
Type inference: Zod schemas generate TypeScript types, eliminating manual type definitions
Coercion: Environment variables arrive as strings but need to be numbers/booleans—Zod coerces automatically
Tool safety: LLM-supplied tool arguments are validated against schemas before execution, preventing malformed inputs from bypassing application logic

3. In-Memory Store with TTL as Default Memory

Decision: Default to an in-memory Map with TTL timers rather than requiring Redis.

Rationale:

Zero dependencies: Framework works immediately without external services
Development simplicity: Local development doesn't need Docker or Redis setup
Sufficient for many use cases: Applications with < 10K concurrent sessions fit comfortably in memory
Easy upgrade path: MemoryStore interface is defined—implementing Redis means writing one adapter class

Memory Management:

Sliding window pruning: Oldest messages dropped first when maxMessages is exceeded
TTL expiry: Sessions auto-delete after SESSION_TTL_SECONDS of inactivity
Lazy hydration: LangChain BufferWindowMemory instances are created on first access and hydrated from persistent store
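
The pruning and expiry rules above can be sketched in a few lines. This is a minimal illustration, not the framework's actual store: the `MemoryStore` shape, `StoredMessage`, and `InMemoryStore` are hypothetical names, and the sketch checks expiry lazily on access (with an injected clock) instead of using TTL timers, so the behavior is deterministic.

```typescript
// Hypothetical shapes: the article names a MemoryStore interface but does not show it.
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
}

interface MemoryStore {
  append(sessionId: string, message: StoredMessage): void;
  history(sessionId: string): StoredMessage[];
}

interface SessionEntry {
  messages: StoredMessage[];
  lastAccess: number; // epoch milliseconds
}

class InMemoryStore implements MemoryStore {
  private readonly sessions = new Map<string, SessionEntry>();
  private readonly maxMessages: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injected so expiry can be exercised without real waiting.
  constructor(maxMessages: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxMessages = maxMessages;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  append(sessionId: string, message: StoredMessage): void {
    const entry = this.live(sessionId) ?? this.create(sessionId);
    entry.messages.push(message);
    // Sliding window: drop the oldest messages once the cap is exceeded.
    if (entry.messages.length > this.maxMessages) {
      entry.messages.splice(0, entry.messages.length - this.maxMessages);
    }
  }

  history(sessionId: string): StoredMessage[] {
    return [...(this.live(sessionId)?.messages ?? [])];
  }

  // Returns the session if it is still within its TTL; otherwise deletes it.
  private live(sessionId: string): SessionEntry | undefined {
    const entry = this.sessions.get(sessionId);
    if (!entry) return undefined;
    const current = this.now();
    if (current - entry.lastAccess > this.ttlMs) {
      this.sessions.delete(sessionId);
      return undefined;
    }
    entry.lastAccess = current; // every access resets the idle clock
    return entry;
  }

  private create(sessionId: string): SessionEntry {
    const entry: SessionEntry = { messages: [], lastAccess: this.now() };
    this.sessions.set(sessionId, entry);
    return entry;
  }
}
```

A Redis-backed adapter would implement the same two methods, which is what keeps the upgrade path to a single class.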

4. Streaming via Server-Sent Events

Decision: Use SSE for streaming responses rather than WebSockets.

Rationale:

Simpler protocol: SSE is unidirectional—client receives events, no need to handle client messages
Built-in reconnection: Browsers automatically reconnect dropped SSE connections
HTTP-friendly: Works through standard HTTP proxies and load balancers
Native API: EventSource API is built into browsers, no library needed

5. Lifecycle Hooks for Cross-Cutting Concerns

Decision: Support beforeChat and afterChat hooks rather than baking auth/logging/moderation into the framework.

Rationale:

Separation of concerns: Authentication, rate limiting, content moderation, cost tracking, and analytics are application-specific—they don't belong in a chatbot framework
Composability: Hooks can be layered (auth → logging → moderation) without modifying framework code
Testability: Tests can inject mock hooks or skip them entirely

Usage:

```typescript
const chatbot = createChatbot({
  hooks: {
    beforeChat: async (req) => {
      // Auth, rate limit checks, input sanitization
      if (!await isAuthorized(req.sessionId)) {
        throw new ChatError("UNAUTHORIZED", "Invalid session");
      }
      return req;
    },
    afterChat: async (res) => {
      // Log tokens, track costs, run content filters
      await logTokenUsage(res.usage);
      return res;
    },
  },
});
```

Core Features

Conversation Memory

Session-based message history with configurable retention and automatic expiry:

```typescript
const chatbot = createChatbot({ memory: true });

await chatbot.chat({
  sessionId: "user-123",
  message: "My name is Alice.",
});

// Later turn: the chatbot remembers context
await chatbot.chat({
  sessionId: "user-123",
  message: "What's my name?",
});
// → "Your name is Alice."
```

Under the hood:

MemoryManager lazily creates a LangChain BufferWindowMemory for each session
Memory is hydrated from the backing store on first access (with a persistent adapter such as Redis, this supports restarts and horizontal scaling; the default in-memory store does not)
After each exchange, user + assistant messages are saved via saveContext()
Raw messages are persisted to the backing store for history retrieval
Sliding window pruning keeps the last maxMessages
TTL timer resets on every access—idle sessions expire automatically

Dynamic Tool Calling

```typescript
import { z } from "zod";

const weatherTool = {
  name: "getWeather",
  description: "Get current weather for a city.",
  schema: z.object({
    city: z.string(),
    units: z.enum(["celsius", "fahrenheit"]).default("celsius"),
  }),
  execute: async ({ city, units }) => {
    const data = await weatherAPI.fetch(city);
    return { city, temperature: data.temp, units };
  },
};

const chatbot = createChatbot({ tools: [weatherTool] });

await chatbot.chat({
  sessionId: "user-456",
  message: "What's the weather in Tokyo?",
});
// LLM calls getWeather({ city: "Tokyo", units: "celsius" })
// → "The weather in Tokyo is 22°C, partly cloudy."
```

Streaming Responses

Async generator pattern for low-latency UX:

```typescript
for await (const chunk of chatbot.stream({
  sessionId: "user-789",
  message: "Explain microservices.",
})) {
  if (chunk.type === "delta") process.stdout.write(chunk.content);
  if (chunk.type === "done") console.log("\n✅ Done");
  if (chunk.type === "error") console.error(chunk.error);
}
```

Per-Request LLM Overrides

Customize model, temperature, and system prompt per call:

```typescript
await chatbot.chat({
  sessionId: "creative-session",
  message: "Write a haiku about TypeScript.",
  options: {
    model: "gpt-4o",
    temperature: 1.2,
    systemPrompt: "You are a creative poet who loves programming.",
  },
});
```

Use cases:

A/B testing different models
User preference for verbose vs. concise responses
High temperature for creative tasks, low temperature for factual queries

Domain-Specific System Prompts

Built-in templates for common use cases:

```typescript
// createChatbot is exported from the same module as the prompt helpers
import { createChatbot, SYSTEM_PROMPTS, buildSystemPrompt } from "./src/exports";

// Customer support chatbot
const supportBot = createChatbot({
  systemPrompt: buildSystemPrompt("customerSupport", {
    company: "TechCorp",
    additionalInstructions: "Always mention our 30-day return policy.",
  }),
});

// Coding assistant
const codingBot = createChatbot({
  systemPrompt: SYSTEM_PROMPTS.codingAssistant,
  llm: { temperature: 0.2 },
});

// RAG-augmented assistant
const ragBot = createChatbot({
  systemPrompt: SYSTEM_PROMPTS.ragAssistant,
});
```

Available prompts:

general — Sensible default for most applications
customerSupport — Polite, solution-focused, escalation-aware
codingAssistant — Opinionated, pragmatic, proactive about edge cases
dataAnalyst — Quantitative, precise, visualization-aware
ragAssistant — Cite sources, stay grounded in provided context

Diagram Placeholder

TODO: Add a Mermaid sequence diagram showing the full request lifecycle: Client → Controller → Service → Chain → LLM → Memory → Response.

Implementation Highlights

1. Environment Validation

All configuration lives in src/config/env.ts and is validated with Zod on startup:

```typescript
import { z } from "zod";

const EnvSchema = z.object({
  OPENAI_API_KEY: z.string().min(1, "OPENAI_API_KEY is required"),
  PORT: z.coerce.number().default(3000),
  DEFAULT_MODEL: z.string().default("gpt-4.1-mini"),
  MEMORY_MAX_MESSAGES: z.coerce.number().positive().default(50),
  // ... 15+ more variables
});

// Throws at import time if any variable is missing or malformed
export const env = EnvSchema.parse(process.env);
```

2. Structured Error Handling

All errors are wrapped in ChatError with semantic codes:

```typescript
export class ChatError extends Error {
  constructor(
    public readonly code: ChatErrorCode,
    message: string,
    public readonly cause?: unknown
  ) {
    super(message);
    this.name = "ChatError";
  }
}
```

Error codes map to HTTP status:

INVALID_REQUEST → 400
UNAUTHORIZED → 401
RATE_LIMITED → 429
LLM_ERROR, TOOL_ERROR → 502 (upstream failure)
MEMORY_ERROR, INTERNAL_ERROR → 500
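
The error handler looks codes up in `ERROR_STATUS_MAP`, whose definition is not shown in this article. A plausible sketch, assuming `ChatErrorCode` is a string-literal union of exactly the codes listed above, is a `Record` keyed by that union, so adding a new code without a status fails to compile:

```typescript
// Assumed union: built from the codes listed above, not copied from the framework source.
type ChatErrorCode =
  | "INVALID_REQUEST"
  | "UNAUTHORIZED"
  | "RATE_LIMITED"
  | "LLM_ERROR"
  | "TOOL_ERROR"
  | "MEMORY_ERROR"
  | "INTERNAL_ERROR";

// Record<ChatErrorCode, number> forces an entry for every member of the union.
const ERROR_STATUS_MAP: Record<ChatErrorCode, number> = {
  INVALID_REQUEST: 400,
  UNAUTHORIZED: 401,
  RATE_LIMITED: 429,
  LLM_ERROR: 502, // upstream failure
  TOOL_ERROR: 502, // upstream failure
  MEMORY_ERROR: 500,
  INTERNAL_ERROR: 500,
};
```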

Global error handler:

```typescript
export function errorHandler(err, req, res, next) {
  if (err instanceof ChatError) {
    const status = ERROR_STATUS_MAP[err.code] ?? 500;
    res.status(status).json({
      success: false,
      error: err.message,
      code: err.code,
      ...(isProd ? {} : { stack: err.stack }),
    });
    return;
  }
  // Handle other error types...
}
```

Stack traces only appear in development. Production responses never leak implementation details.

3. Tool Registry Pattern

Tools are stored in a Map for O(1) lookup and converted to LangChain DynamicStructuredTool instances on demand:

```typescript
export class ToolRegistry {
  private readonly tools = new Map<string, ToolDefinition>();

  register<TInput>(tool: ToolDefinition<TInput>): this {
    this.tools.set(tool.name, tool);
    return this; // Fluent API
  }

  async execute(name: string, rawInput: unknown): Promise<ToolResult> {
    const start = Date.now(); // Baseline for latency tracking
    const tool = this.tools.get(name);
    if (!tool) {
      return { toolName: name, output: null, error: "Tool not found" };
    }

    // Validate LLM-supplied arguments before running the tool
    const parsed = tool.schema.safeParse(rawInput);
    if (!parsed.success) {
      return { toolName: name, output: null, error: parsed.error.message };
    }

    try {
      const output = await tool.execute(parsed.data);
      return { toolName: name, output, latencyMs: Date.now() - start };
    } catch (err) {
      // Never throw: surface the failure to the LLM as tool output
      const message = err instanceof Error ? err.message : String(err);
      return { toolName: name, output: null, error: message };
    }
  }
}
```

Key design:

Never throws—always returns ToolResult with error field
Schema validation happens before execution
Execution latency is tracked automatically
Errors are captured and returned to LLM as tool output

4. Prompt Composition

Prompts are built with LangChain's ChatPromptTemplate but wrapped in factory functions:

```typescript
export function createPrompt(config: PromptConfig): ChatPromptTemplate {
  return ChatPromptTemplate.fromMessages([
    SystemMessagePromptTemplate.fromTemplate(config.system),
    new MessagesPlaceholder("history"), // Injected by memory manager
    HumanMessagePromptTemplate.fromTemplate(config.user ?? "{input}"),
  ]);
}
```

Variants:

createStatelessPrompt() — No history placeholder (for single-turn chains)
createFewShotPrompt() — Prepends example exchanges for classification tasks

Variable injection: Prompts use {variable} syntax. Variables are supplied at invocation time:

```typescript
const prompt = createPrompt({
  system: "You are a {role} specializing in {domain}.",
});

await chain.invoke({
  input: "user message",
  role: "architect",
  domain: "distributed systems",
});
```

5. Chain Composition

ChatChain wraps the LangChain pipeline with domain-specific logic:

```typescript
export class ChatChain {
  async invoke(request: ChatRequest): Promise<ChatResponse> {
    // 1. Resolve LLM (merges defaults + per-request overrides)
    const llm = createLLM({
      ...this.config.llmConfig,
      ...request.options,
    });

    // 2. Build prompt
    const prompt = createPrompt({ system: this.config.systemPrompt });

    // 3. Fetch history
    const memory = await this.memoryManager.getLangChainMemory(request.sessionId);
    const { history } = await memory.loadMemoryVariables({});

    // 4. Invoke chain
    const chain = RunnableSequence.from([prompt, llm, new StringOutputParser()]);
    const content = await chain.invoke({ input: request.message, history });

    // 5. Persist exchange
    await memory.saveContext(
      { input: request.message },
      { output: content }
    );

    return {
      sessionId: request.sessionId,
      messageId: uuidv4(),
      content,
      role: "assistant",
    };
  }
}
```

Streaming variant: The stream() method follows the same pattern but yields chunks as they arrive, then persists the full exchange after streaming completes.

Extensions

RAG Pipeline

Location: extensions/rag/ragPipeline.ts

Architecture:

Documents are chunked and embedded with OpenAI text-embedding-3-small
Embeddings are stored in a vector store (in-memory for dev, Pinecone/Weaviate/pgvector for production)
On each query, the top-K most similar documents are retrieved
Retrieved context is injected into the system prompt
LLM answers based on context, cites sources
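
The retrieval step (top-K most similar documents) can be illustrated without a vector database. The sketch below is an assumption-laden toy: `EmbeddedDoc`, `cosineSimilarity`, and `topK` are hypothetical names, and the two-dimensional vectors stand in for real embeddings. It is not the code in ragPipeline.ts.

```typescript
interface EmbeddedDoc {
  id: string;
  vector: number[]; // stand-in for a real embedding
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Scores every document against the query and keeps the k best.
function topK(query: number[], docs: EmbeddedDoc[], k: number): EmbeddedDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.doc);
}

const corpus: EmbeddedDoc[] = [
  { id: "doc-1", vector: [1, 0] },
  { id: "doc-2", vector: [0, 1] },
  { id: "doc-3", vector: [0.9, 0.1] },
];

const nearest = topK([1, 0], corpus, 2).map((d) => d.id);
// nearest is ["doc-1", "doc-3"]
```

A linear scan like this is fine for a development-sized corpus; a production vector store replaces it with an approximate nearest-neighbor index.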

Usage:

```typescript
const rag = new RAGPipeline({ topK: 4 });

await rag.addDocuments([
  { id: "doc-1", content: "Our refund policy allows returns within 30 days." },
  { id: "doc-2", content: "Premium subscribers get priority support." },
]);

const request = await rag.buildRequest({
  sessionId: "user-123",
  message: "What is your refund policy?",
});

const response = await chatbot.chat(request);
// → "According to our policy, returns are allowed within 30 days. [Source 1]"
```

Multi-Agent Orchestration

Location: extensions/multiAgent/orchestrator.ts

Pattern: A router analyzes user intent and dispatches to specialist sub-agents (coding, support, data analysis, general).

Current implementation: Uses keyword matching. Production systems should replace this with an LLM intent classifier.

Usage:

```typescript
const orchestrator = new AgentOrchestrator();

const response = await orchestrator.route({
  sessionId: "user-456",
  message: "How do I write a TypeScript generic function?",
});
// → Routed to "coding" specialist (temperature: 0.2)
```

Why it's an extension: Most applications use a single chatbot persona. Multi-agent routing adds latency and complexity. By keeping it as an opt-in extension, simple use cases stay simple.
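
To make the keyword-matching router concrete, here is a minimal sketch. The keyword table and the `routeByKeyword` function are hypothetical; the article does not list the keywords the orchestrator actually uses.

```typescript
type Specialist = "coding" | "support" | "data" | "general";

// Hypothetical keyword table, for illustration only.
const KEYWORDS: Record<Exclude<Specialist, "general">, string[]> = {
  coding: ["typescript", "function", "bug", "compile"],
  support: ["refund", "order", "account", "cancel"],
  data: ["average", "chart", "dataset", "trend"],
};

// Picks the specialist with the most keyword hits; falls back to "general".
function routeByKeyword(message: string): Specialist {
  const text = message.toLowerCase();
  let best: Specialist = "general";
  let bestHits = 0;
  for (const [name, words] of Object.entries(KEYWORDS)) {
    const hits = words.filter((word) => text.includes(word)).length;
    if (hits > bestHits) {
      best = name as Specialist;
      bestHits = hits;
    }
  }
  return best;
}

const target = routeByKeyword("How do I write a TypeScript generic function?");
// target is "coding"
```

Substring matching is brittle ("border" contains "order"), which is one reason an LLM intent classifier is the better choice for production routing.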

Screenshot Placeholder

TODO: Add screenshot showing example API response with session ID, message ID, content, model, token usage, and latency.

Performance Considerations

Memory Footprint

Measurement: Each session stores:

Session metadata: ~200 bytes
Message history: ~500 bytes per message pair (user + assistant)
LangChain memory overhead: ~1KB

Capacity: A session holding 50 message pairs consumes ~26KB (200 bytes of metadata + 50 × 500 bytes of history + ~1KB of LangChain overhead). A server with 1GB of memory available for sessions can hold roughly 38K such sessions.

Scaling: For higher session counts, implement the Redis adapter (interface is defined). Redis-backed memory supports horizontal scaling and survives restarts.
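
The capacity estimate above is simple enough to check in code. This sketch uses the per-session figures quoted in this section and treats the 50 as message pairs (user + assistant); the function names are illustrative, and 1GB is taken as 10^9 bytes.

```typescript
// Figures quoted in this section (approximate measurements).
const SESSION_METADATA_BYTES = 200;
const BYTES_PER_MESSAGE_PAIR = 500;
const LANGCHAIN_OVERHEAD_BYTES = 1_000;

function sessionBytes(messagePairs: number): number {
  return (
    SESSION_METADATA_BYTES +
    LANGCHAIN_OVERHEAD_BYTES +
    messagePairs * BYTES_PER_MESSAGE_PAIR
  );
}

function sessionsThatFit(availableBytes: number, messagePairs: number): number {
  return Math.floor(availableBytes / sessionBytes(messagePairs));
}

const perSession = sessionBytes(50); // 26200 bytes, about 26KB
const capacity = sessionsThatFit(1_000_000_000, 50); // 38167 sessions
```

Real headroom is lower, since the Node.js process, the framework, and garbage-collection slack also need memory.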

Streaming Latency

First token latency: 200-500ms (depends on OpenAI API)

Chunk delivery: SSE writes chunks immediately—no buffering. The 15-second heartbeat prevents proxy timeouts without affecting latency.

Network optimization:

gzip compression is handled by Express middleware
SSE messages are minimal JSON ({ type: "delta", content: "..." })

Rate Limiting

Configured per-IP with express-rate-limit:

```typescript
const limiter = rateLimit({
  windowMs: env.RATE_LIMIT_WINDOW_MS, // 60 seconds
  max: env.RATE_LIMIT_MAX_REQUESTS,   // 60 requests
  standardHeaders: true,
});
```

Security Considerations

Input Validation

Every request body is validated with Zod before reaching the service layer:

```typescript
router.post("/chat", validate(ChatRequestSchema), chat);
```

Invalid requests return 422 with field-level error details. Malformed JSON returns 400.

Helmet.js Security Headers

All responses include:

X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block (Helmet 4 and later send X-XSS-Protection: 0 instead, which disables the legacy browser filter)
Content Security Policy (disabled in dev for easier debugging)

CORS Configuration

Development: origin: "*" (allow all origins for local testing)

Production: Reads ALLOWED_ORIGINS from environment (comma-separated list)
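
Parsing the comma-separated list is a small step, but whitespace and trailing commas are easy to get wrong. A minimal sketch, with `parseAllowedOrigins` and `isOriginAllowed` as hypothetical helper names:

```typescript
// Splits "https://a.example, https://b.example" into a clean list of origins.
function parseAllowedOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
}

// Exact-match check; an empty allow-list rejects every origin.
function isOriginAllowed(origin: string, allowed: string[]): boolean {
  return allowed.includes(origin);
}

const allowed = parseAllowedOrigins("https://app.example.com, https://admin.example.com,");
```

The resulting array can be handed to the CORS middleware's origin option, or `isOriginAllowed` can back a custom origin callback.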

Secrets Management

All secrets live in .env:

.env is gitignored
.env.example documents required variables without exposing values
Zod validation crashes on missing secrets

Tool Safety

Tools are the largest attack surface (they execute arbitrary code). Mitigations:

Schema validation: LLM-supplied arguments are validated before execution
Error containment: Tools never throw—errors are captured and returned as tool output
Read-only marking: Tools can be marked readonly: true (useful for audit layers)
Permission system: Applications can filter tools based on user roles
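
Role-based filtering can happen before tools are handed to the chatbot, so the LLM never sees a tool the user may not call. The sketch below is an assumption: `allowedRoles` is a hypothetical field, not part of the tool definition shown earlier.

```typescript
// Hypothetical tool metadata; only the fields needed for filtering are shown.
interface GuardedTool {
  name: string;
  allowedRoles: string[];
}

// Keeps the tools that at least one of the user's roles is permitted to call.
function toolsForUser(tools: GuardedTool[], userRoles: string[]): GuardedTool[] {
  return tools.filter((tool) =>
    tool.allowedRoles.some((role) => userRoles.includes(role)),
  );
}

const allTools: GuardedTool[] = [
  { name: "getWeather", allowedRoles: ["user", "admin"] },
  { name: "deleteAccount", allowedRoles: ["admin"] },
];

const visibleToUser = toolsForUser(allTools, ["user"]).map((t) => t.name);
// visibleToUser is ["getWeather"]
```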

Developer Experience

Zero-Config Usage

```typescript
import { createChatbot } from "chatbot-foundation";

const chatbot = createChatbot();

await chatbot.chat({
  sessionId: "user-123",
  message: "Hello!",
});
```

Works immediately. All defaults are read from .env.

Deep Customization

```typescript
const chatbot = createChatbot({
  systemPrompt: "You are a helpful assistant.",
  llm: { model: "gpt-4o", temperature: 0.7 },
  memory: { maxMessages: 100, ttlSeconds: 7200 },
  tools: [weatherTool, calculatorTool],
  hooks: {
    beforeChat: async (req) => { /* auth */ },
    afterChat: async (res) => { /* logging */ },
  },
});
```

Every default can be overridden.

Public API Surface

Everything consumers need is exported from src/exports.ts:

```typescript
export { createChatbot, ChatService } from "./services/chatService";
export { createLLM, createStreamingLLM } from "./ai/llm/openai";
export { SYSTEM_PROMPTS, buildSystemPrompt } from "./ai/prompts/systemPrompt";
export { registerTool, toolRegistry } from "./ai/tools/toolRegistry";
export { MemoryManager } from "./ai/memory/memoryManager";
export { createFunctionsAgent, runAgent } from "./ai/agents/agentFactory";
// + all types and schemas
```

Internal modules are not re-exported—implementation details stay private.

Examples

examples/usage.ts contains 10 runnable examples:

Minimal usage
Custom system prompts
Conversation memory
Custom tools
Streaming
Per-request overrides
Lifecycle hooks
Agents with tools
Domain-specific chatbots
Session management

Run with npx tsx examples/usage.ts.

Testing Strategy

Current state: No automated tests.

Recommended approach:

Unit tests: Test service layer methods with mocked LLM responses
Integration tests: Test full request flow with in-memory store
Contract tests: Validate OpenAI API assumptions (model names, response shapes)

Why testing is easier: Because the service layer has no HTTP dependencies, tests can call chatbot.chat() directly without starting a server or mocking Express.
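
As a rough illustration of that point, the sketch below tests a stripped-down service by injecting a stub in place of the LLM. `createTestableChat` and `CompleteFn` are hypothetical stand-ins, not the framework's API; the real `createChatbot` may expose a different seam for stubbing.

```typescript
type CompleteFn = (prompt: string) => Promise<string>;

interface ChatRequest {
  sessionId: string;
  message: string;
}

interface ChatResponse {
  sessionId: string;
  content: string;
  role: "assistant";
}

// Hypothetical minimal service: the LLM call is injected so tests can stub it.
function createTestableChat(complete: CompleteFn) {
  return {
    async chat(req: ChatRequest): Promise<ChatResponse> {
      const content = await complete(req.message);
      return { sessionId: req.sessionId, content, role: "assistant" };
    },
  };
}

// Stub LLM: deterministic, no network, no API key.
const stubLlm: CompleteFn = async (prompt) => `echo: ${prompt}`;
const testBot = createTestableChat(stubLlm);

// No server, no Express mocks: the test calls chat() with a plain object.
const pending = testBot.chat({ sessionId: "t-1", message: "ping" });
```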

Challenges & Tradeoffs

1. LangChain Abstraction Overhead

Challenge: LangChain adds ~2MB to the bundle and introduces indirection (e.g., RunnableSequence.from([prompt, llm, parser]) instead of direct API calls).

Mitigation: All LangChain usage is wrapped in factory functions. If we need to swap orchestration layers, only internal modules change.

2. In-Memory Store Doesn't Scale Horizontally

Challenge: In-memory sessions don't survive restarts and can't be shared across instances.

Tradeoff: Zero dependencies for dev/testing vs. Redis requirement for production scale. Most applications don't need horizontal scaling on day one.

Mitigation: MemoryStore interface is defined. Implementing Redis means writing one adapter class—no framework changes needed.

3. Streaming Requires SSE, Not WebSockets

Challenge: SSE is unidirectional. Applications that need client-to-server messages on the same connection (e.g., a "stop generation" button that signals the server mid-stream) must use WebSockets or pair SSE with a separate HTTP request.

Tradeoff: Simpler protocol and better proxy compatibility vs. lack of client-to-server messaging.

Mitigation: The stream() method is transport-agnostic—it returns an async generator. Controllers can wire this to SSE, WebSockets, or any other transport.
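
A sketch of what such an adapter could look like for SSE: one function formats a chunk as an SSE frame, and another drains any async iterable into a `write` callback. The chunk shape is inferred from the streaming example earlier in this article, and `toSseFrame`, `pipeToSse`, and `fakeStream` are hypothetical names.

```typescript
// Chunk shape inferred from the streaming example; field names are assumptions.
type StreamChunk =
  | { type: "delta"; content: string }
  | { type: "done" }
  | { type: "error"; error: string };

// One SSE frame: a "data:" line terminated by a blank line.
function toSseFrame(chunk: StreamChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Transport adapter: consumes any chunk source and pushes frames to `write`.
async function pipeToSse(
  chunks: AsyncIterable<StreamChunk>,
  write: (frame: string) => void,
): Promise<void> {
  for await (const chunk of chunks) {
    write(toSseFrame(chunk));
  }
}

// Stand-in for chatbot.stream(); yields a fixed sequence.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { type: "delta", content: "Hello" };
  yield { type: "done" };
}

const frames: string[] = [];
const finished = pipeToSse(fakeStream(), (frame) => {
  frames.push(frame);
});
```

In an Express controller, `write` would wrap `res.write` after setting the `Content-Type: text/event-stream` header; a WebSocket adapter would pass the socket's send function instead.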

4. No Built-In Authentication

Challenge: The framework doesn't handle auth because every application's requirements differ (JWT, session cookies, OAuth, API keys).

Tradeoff: Smaller framework surface vs. requiring integration work.

Mitigation: Authentication is injected via the beforeChat hook. The framework provides the integration point—applications provide the policy.

Diagram Placeholder

TODO: Add a Mermaid class diagram showing the relationships between ChatService, ChatChain, MemoryManager, ToolRegistry, and LLM factories.

Lessons Learned

1. Transport Agnosticism Requires Discipline

Early versions of ChatService accepted Express Request objects. This seemed harmless until we needed to use the chatbot in a CLI tool—suddenly the entire API was coupled to HTTP.

Lesson: Define service layer interfaces first, then build adapters. Never let transport concerns leak into business logic.

2. Fail-Fast Configuration Saves Debug Time

Lesson: Crash loudly on invalid config. The 10 seconds spent fixing .env saves hours of debugging.

3. Memory Management Is Subtle

Lesson: Maintain two representations: LangChain memory for chain invocation, raw messages for API retrieval. They're conceptually the same but serve different consumers.

4. Tool Validation Prevents Subtle Bugs

Without schema validation, the LLM once supplied { city: "Tokyo", unit: "celsius" } (note: unit not units). The tool threw, the chain failed, and the user saw a cryptic error.

Lesson: Validate at the boundary. Zod schemas catch typos, missing fields, and type mismatches before they propagate.

5. Hooks Beat Inheritance

Early design used subclassing (CustomerSupportChatbot extends ChatService). This became unmaintainable when we needed auth + logging + moderation—multiple inheritance isn't supported in JS.

Lesson: Hooks compose better than inheritance. Middleware patterns win for cross-cutting concerns.
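
Layering several concerns into the single `beforeChat` slot needs only a small combinator. This is a minimal sketch; `composeBeforeChat` and the two sample hooks are hypothetical, and the request type is reduced to the two fields used throughout this article.

```typescript
// Hypothetical request shape; the real ChatRequest type may carry more fields.
interface ChatRequest {
  sessionId: string;
  message: string;
}

type BeforeChatHook = (req: ChatRequest) => Promise<ChatRequest>;

// Runs hooks left to right, feeding each hook's result into the next.
// A hook that throws stops the pipeline, so later hooks never run.
function composeBeforeChat(...hooks: BeforeChatHook[]): BeforeChatHook {
  return async (req) => {
    let current = req;
    for (const hook of hooks) {
      current = await hook(current);
    }
    return current;
  };
}

const requireSession: BeforeChatHook = async (req) => {
  if (req.sessionId.length === 0) throw new Error("UNAUTHORIZED");
  return req;
};

const trimMessage: BeforeChatHook = async (req) => ({
  ...req,
  message: req.message.trim(),
});

const beforeChat = composeBeforeChat(requireSession, trimMessage);
```

The composed function has the same signature as a single hook, so it can be passed as `hooks.beforeChat` without the framework knowing that several concerns are stacked behind it.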

Future Improvements

1. Redis Memory Adapter

What: Implement RedisStore class that satisfies the MemoryStore interface.

Why: Enables horizontal scaling and survives restarts.

Effort: ~200 lines (ioredis setup + CRUD operations).

2. Observability Layer

What: Integrate LangSmith or LangFuse for tracing, token counting, and latency monitoring.

Why: Production systems need visibility into LLM calls—prompt versions, token costs, failure rates.

Effort: Moderate (requires LangChain callback setup).

3. Prompt Versioning

What: Store prompt templates in a database with version IDs. Log which version was used for each request.

Why: A/B testing and gradual rollout of prompt changes.

Effort: Moderate (requires prompt storage layer + migration from hardcoded templates).

4. Multi-LLM Support

What: Add factories for Anthropic, Mistral, Cohere, local models (Ollama).

Why: Different models have different strengths, and multiple factories make cost-based routing possible. For example, a cost-sensitive application can send simple queries to gpt-4.1-mini and reserve gpt-4o for complex ones.

Effort: Low (add factory functions, LangChain handles the rest).

5. Test Suite

What: Unit tests for service layer, integration tests for full request flow, contract tests for OpenAI API.

Why: Confidence when refactoring, regression prevention, onboarding documentation.

Effort: High (requires test infrastructure + fixture setup).

