Building ChatBot Foundation: A Production-Ready LLM Framework for TypeScript
How I architected a transport-agnostic, plug-and-play chatbot framework with conversation memory, tool calling, and streaming support.
21 min read·Talha Bilal

Introduction
Building AI-powered chat systems from scratch means reinventing the same infrastructure repeatedly: conversation memory, prompt management, tool execution, streaming responses, session lifecycle, and error handling. Every project becomes a custom implementation with subtle bugs that only surface in production.
ChatBot Foundation solves this by providing a production-grade, transport-agnostic chatbot framework for TypeScript. Like Auth.js abstracts authentication, ChatBot Foundation abstracts the complexity of building conversational AI systems. You bring your application logic and domain expertise—the framework handles the plumbing.
This case study walks through the architectural decisions, implementation details, and engineering tradeoffs behind building a framework that works equally well for REST APIs, WebSocket servers, CLI tools, and background workers.
Diagram Placeholder
TODO: Add a Mermaid flowchart showing the high-level system architecture: HTTP Layer → Service Layer → AI Layer (Chains, LLM, Memory, Tools, Prompts, Agents).
The Problem
Most chatbot implementations start simple but accumulate complexity:
- Conversation memory requires session management, TTL expiry, and sliding window pruning
- Tool calling needs schema validation, error handling, and result serialization
- Streaming responses demand SSE setup, heartbeat management, and graceful disconnection
- Prompt engineering becomes scattered across controllers, mixing business logic with AI primitives
- Error handling leaks implementation details into HTTP responses
- Configuration lives in dozens of environment variables without validation
Developers end up with tightly coupled code where HTTP concerns bleed into AI logic, making it impossible to reuse the same chatbot in CLI tools or background jobs. Testing becomes painful because everything depends on Express request/response objects.
Goals
The framework needed to achieve five core objectives:
1. Transport Agnosticism
The service layer must have zero HTTP dependencies. A chatbot created with createChatbot() should work identically whether called from an Express controller, WebSocket handler, CLI command, or background worker.
2. Production Readiness
The framework must include everything needed for production: structured logging, graceful shutdown, rate limiting, security headers, environment validation, and error handling that never leaks stack traces.
3. Developer Experience
The API surface should support both zero-config usage (createChatbot() just works) and deep customization (per-request LLM overrides, lifecycle hooks, custom tools, memory configuration).
4. Extensibility
Core architecture must support extensions without modification: RAG pipelines, multi-agent orchestration, Redis memory adapters, and alternative LLM providers should plug in without touching the framework code.
5. Type Safety
Full TypeScript coverage with Zod runtime validation. Invalid configurations should crash on startup, not fail silently in production.
Architecture Overview
The system follows a three-layer architecture with strict separation of concerns:
HTTP Layer
Express controllers, routes, and middleware. Handles rate limiting, request validation, error formatting, and SSE streaming setup. This layer knows about HTTP but nothing about AI primitives.
Service Layer
ChatService orchestrates LLM, memory, and tools. Contains all business logic but has zero HTTP dependencies. Exposes methods like chat(), stream(), getHistory(), and clearSession(). This layer is the integration point for any transport.
AI Layer
LLM factories, prompt templates, conversation memory, tool registry, chains, and agents. These primitives know nothing about HTTP, services, or sessions—they're pure functional components that compose into higher-level abstractions.
This layering makes it trivial to reuse the same chatbot across different transports:
```typescript
// REST endpoint
app.post("/chat", async (req, res) => {
  const response = await chatbot.chat(req.body);
  res.json(response);
});

// CLI command
const response = await chatbot.chat({
  sessionId: "cli-user",
  message: process.argv[2],
});
console.log(response.content);

// Background worker
queue.process(async (job) => {
  return await chatbot.chat(job.data);
});
```
No changes to chatbot: it doesn't know or care about the transport.
Key Technical Decisions
1. LangChain as Orchestration Layer
Decision: Use LangChain for LLM orchestration rather than calling OpenAI SDK directly.
Rationale:
- Provider abstraction: Switching from OpenAI to Anthropic requires changing one import, not rewriting prompt composition
- Battle-tested primitives: ChatPromptTemplate, BufferWindowMemory, and DynamicStructuredTool handle edge cases we'd otherwise miss
- Agent support: LangChain's AgentExecutor provides multi-step reasoning and tool orchestration out of the box
- Ecosystem: Extensions like vector store connectors and document loaders are already built
Tradeoff: LangChain adds bundle size and abstraction overhead. We mitigate this by wrapping LangChain with our own factory functions (createLLM, createPrompt) so consumers never import LangChain directly. If we need to swap orchestration layers later, only internal modules change.
2. Zod for Runtime Validation
Decision: Validate all environment variables, request bodies, and tool inputs with Zod schemas.
Rationale:
- Fail-fast: Invalid configuration crashes on startup with detailed error messages rather than failing silently in production
- Type inference: Zod schemas generate TypeScript types, eliminating manual type definitions
- Coercion: Environment variables arrive as strings but need to be numbers/booleans—Zod coerces automatically
- Tool safety: LLM-supplied tool arguments are validated against schemas before execution, preventing malformed inputs from bypassing application logic
Implementation:
Every public-facing shape has a Zod schema co-located with its TypeScript type in src/types/chat.types.ts. Middleware validates request bodies before controllers run. The ToolRegistry validates tool inputs before calling execute().
3. In-Memory Store with TTL as Default Memory
Decision: Default to an in-memory Map with TTL timers rather than requiring Redis.
Rationale:
- Zero dependencies: Framework works immediately without external services
- Development simplicity: Local development doesn't need Docker or Redis setup
- Sufficient for many use cases: Applications with < 10K concurrent sessions fit comfortably in memory
- Easy upgrade path: The MemoryStore interface is defined, so implementing Redis means writing one adapter class
Tradeoff: In-memory storage doesn't survive restarts and doesn't scale horizontally. For production systems with high session counts, the Redis adapter should be implemented (interface is already defined, stub is documented).
Memory Management:
- Sliding window pruning: Oldest messages dropped first when maxMessages is exceeded
- TTL expiry: Sessions auto-delete after SESSION_TTL_SECONDS of inactivity
- Lazy hydration: LangChain BufferWindowMemory instances are created on first access and hydrated from the persistent store
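The pruning and expiry rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework's actual store: the StoredMessage shape, the InMemoryStore class, and the injectable clock are hypothetical, and expiry is checked lazily on access rather than with the TTL timers the framework uses.

```typescript
// Hypothetical message shape; the framework's real type may differ.
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
}

interface SessionEntry {
  messages: StoredMessage[];
  lastAccess: number; // epoch milliseconds of the most recent read or write
}

class InMemoryStore {
  private readonly sessions = new Map<string, SessionEntry>();
  private readonly maxMessages: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(maxMessages: number, ttlSeconds: number, now: () => number = () => Date.now()) {
    this.maxMessages = maxMessages;
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  append(sessionId: string, message: StoredMessage): void {
    const entry: SessionEntry =
      this.touch(sessionId) ?? { messages: [], lastAccess: this.now() };
    entry.messages.push(message);
    // Sliding window: drop the oldest messages once the cap is exceeded.
    if (entry.messages.length > this.maxMessages) {
      entry.messages.splice(0, entry.messages.length - this.maxMessages);
    }
    this.sessions.set(sessionId, entry);
  }

  getHistory(sessionId: string): StoredMessage[] {
    return [...(this.touch(sessionId)?.messages ?? [])];
  }

  // Returns the live entry and resets its idle clock, or deletes it if expired.
  private touch(sessionId: string): SessionEntry | undefined {
    const entry = this.sessions.get(sessionId);
    if (!entry) return undefined;
    if (this.now() - entry.lastAccess > this.ttlMs) {
      this.sessions.delete(sessionId);
      return undefined;
    }
    entry.lastAccess = this.now();
    return entry;
  }
}
```

Lazy expiry keeps the sketch deterministic and timer-free, at the cost of leaving idle sessions in the Map until they are next touched. A timer-based store like the one described above reclaims that memory sooner.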
4. Streaming via Server-Sent Events
Decision: Use SSE for streaming responses rather than WebSockets.
Rationale:
- Simpler protocol: SSE is unidirectional—client receives events, no need to handle client messages
- Built-in reconnection: Browsers automatically reconnect dropped SSE connections
- HTTP-friendly: Works through standard HTTP proxies and load balancers
- Native API: The EventSource API is built into browsers, no library needed
Implementation:
The controller sets SSE headers, starts a 15-second heartbeat to keep the connection alive, and iterates over chatService.stream() (an async generator). Each chunk is written as data: {JSON}\n\n. Connection cleanup happens automatically on client disconnect via the close event.
5. Lifecycle Hooks for Cross-Cutting Concerns
Decision: Support beforeChat and afterChat hooks rather than baking auth/logging/moderation into the framework.
Rationale:
- Separation of concerns: Authentication, rate limiting, content moderation, cost tracking, and analytics are application-specific—they don't belong in a chatbot framework
- Composability: Hooks can be layered (auth → logging → moderation) without modifying framework code
- Testability: Tests can inject mock hooks or skip them entirely
Usage:
```typescript
const chatbot = createChatbot({
  hooks: {
    beforeChat: async (req) => {
      // Auth, rate limit checks, input sanitization
      if (!(await isAuthorized(req.sessionId))) {
        throw new ChatError("UNAUTHORIZED", "Invalid session");
      }
      return req;
    },
    afterChat: async (res) => {
      // Log tokens, track costs, run content filters
      await logTokenUsage(res.usage);
      return res;
    },
  },
});
```
Core Features
Conversation Memory
Session-based message history with configurable retention and automatic expiry:
```typescript
const chatbot = createChatbot({ memory: true });

await chatbot.chat({
  sessionId: "user-123",
  message: "My name is Alice.",
});

// Later turn: the chatbot remembers context
await chatbot.chat({
  sessionId: "user-123",
  message: "What's my name?",
});
// → "Your name is Alice."
```
Under the hood:
- MemoryManager lazily creates a LangChain BufferWindowMemory for each session
- Memory is hydrated from the backing store on first access (with a persistent store such as the planned Redis adapter, this supports restarts and horizontal scaling; the default in-memory store does not)
- After each exchange, user + assistant messages are saved via saveContext()
- Raw messages are persisted to the backing store for history retrieval
- Sliding window pruning keeps the last maxMessages messages
- The TTL timer resets on every access, so idle sessions expire automatically
Dynamic Tool Calling
Register functions that the LLM can invoke with runtime validation:
```typescript
import { z } from "zod";

const weatherTool = {
  name: "getWeather",
  description: "Get current weather for a city.",
  schema: z.object({
    city: z.string(),
    units: z.enum(["celsius", "fahrenheit"]).default("celsius"),
  }),
  execute: async ({ city, units }) => {
    const data = await weatherAPI.fetch(city);
    return { city, temperature: data.temp, units };
  },
};

const chatbot = createChatbot({ tools: [weatherTool] });

await chatbot.chat({
  sessionId: "user-456",
  message: "What's the weather in Tokyo?",
});
// LLM calls getWeather({ city: "Tokyo", units: "celsius" })
// → "The weather in Tokyo is 22°C, partly cloudy."
```
Safety:
Every tool input is validated against its Zod schema before execute() runs. Invalid inputs return a ToolResult with an error field. The tool never throws, which prevents LLM errors from crashing the application.
Streaming Responses
Async generator pattern for low-latency UX:
```typescript
for await (const chunk of chatbot.stream({
  sessionId: "user-789",
  message: "Explain microservices.",
})) {
  if (chunk.type === "delta") process.stdout.write(chunk.content);
  if (chunk.type === "done") console.log("\n✅ Done");
  if (chunk.type === "error") console.error(chunk.error);
}
```
Controller implementation:
The SSE endpoint sets Content-Type: text/event-stream, starts a heartbeat to prevent proxy timeouts, and writes each chunk as data: {JSON}\n\n. The controller never buffers the full response; chunks are flushed immediately.
Per-Request LLM Overrides
Customize model, temperature, and system prompt per call:
```typescript
await chatbot.chat({
  sessionId: "creative-session",
  message: "Write a haiku about TypeScript.",
  options: {
    model: "gpt-4o",
    temperature: 1.2,
    systemPrompt: "You are a creative poet who loves programming.",
  },
});
```
Use cases:
- A/B testing different models
- User preference for verbose vs. concise responses
- High temperature for creative tasks, low temperature for factual queries
Domain-Specific System Prompts
Built-in templates for common use cases:
```typescript
import { createChatbot, SYSTEM_PROMPTS, buildSystemPrompt } from "./src/exports";

// Customer support chatbot
const supportBot = createChatbot({
  systemPrompt: buildSystemPrompt("customerSupport", {
    company: "TechCorp",
    additionalInstructions: "Always mention our 30-day return policy.",
  }),
});

// Coding assistant
const codingBot = createChatbot({
  systemPrompt: SYSTEM_PROMPTS.codingAssistant,
  llm: { temperature: 0.2 },
});

// RAG-augmented assistant
const ragBot = createChatbot({
  systemPrompt: SYSTEM_PROMPTS.ragAssistant,
});
```
Available prompts:
- general: Sensible default for most applications
- customerSupport: Polite, solution-focused, escalation-aware
- codingAssistant: Opinionated, pragmatic, proactive about edge cases
- dataAnalyst: Quantitative, precise, visualization-aware
- ragAssistant: Cite sources, stay grounded in provided context
Diagram Placeholder
TODO: Add a Mermaid sequence diagram showing the full request lifecycle: Client → Controller → Service → Chain → LLM → Memory → Response.
Implementation Highlights
1. Environment Validation
All configuration lives in src/config/env.ts and is validated with Zod on startup:
```typescript
import { z } from "zod";

const EnvSchema = z.object({
  OPENAI_API_KEY: z.string().min(1, "OPENAI_API_KEY is required"),
  PORT: z.coerce.number().default(3000),
  DEFAULT_MODEL: z.string().default("gpt-4.1-mini"),
  MEMORY_MAX_MESSAGES: z.coerce.number().positive().default(50),
  // ... 15+ more variables
});

export const env = EnvSchema.parse(process.env);
```
Behavior:
If validation fails, the application crashes immediately with a detailed error message listing every invalid variable. This is intentional—misconfiguration should never silently degrade into production bugs.
2. Structured Error Handling
All errors are wrapped in ChatError with semantic codes:
```typescript
export class ChatError extends Error {
  constructor(
    public readonly code: ChatErrorCode,
    message: string,
    public readonly cause?: unknown
  ) {
    super(message);
    this.name = "ChatError";
  }
}
```
Error codes map to HTTP status:
- INVALID_REQUEST → 400
- UNAUTHORIZED → 401
- RATE_LIMITED → 429
- LLM_ERROR, TOOL_ERROR → 502 (upstream failure)
- MEMORY_ERROR, INTERNAL_ERROR → 500
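The mapping above could be expressed as a typed lookup table. The sketch below is an assumption about how such a table might look; the ChatErrorCode union is reconstructed from the codes listed in this article, and statusFor is a hypothetical helper.

```typescript
// Error codes as listed above; the framework's real union may contain more.
type ChatErrorCode =
  | "INVALID_REQUEST"
  | "UNAUTHORIZED"
  | "RATE_LIMITED"
  | "LLM_ERROR"
  | "TOOL_ERROR"
  | "MEMORY_ERROR"
  | "INTERNAL_ERROR";

// Record<ChatErrorCode, number> makes the compiler reject a missing code.
const ERROR_STATUS_MAP: Record<ChatErrorCode, number> = {
  INVALID_REQUEST: 400,
  UNAUTHORIZED: 401,
  RATE_LIMITED: 429,
  LLM_ERROR: 502, // upstream failure
  TOOL_ERROR: 502, // upstream failure
  MEMORY_ERROR: 500,
  INTERNAL_ERROR: 500,
};

// Hypothetical helper: unknown codes fall back to 500.
function statusFor(code: string): number {
  const known = Object.prototype.hasOwnProperty.call(ERROR_STATUS_MAP, code);
  return known ? ERROR_STATUS_MAP[code as ChatErrorCode] : 500;
}
```

Typing the table as a Record over the union means that adding a new error code without choosing a status becomes a compile error rather than a silent 500.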
Global error handler:
```typescript
import type { NextFunction, Request, Response } from "express";

export function errorHandler(err: unknown, req: Request, res: Response, next: NextFunction) {
  if (err instanceof ChatError) {
    const status = ERROR_STATUS_MAP[err.code] ?? 500;
    res.status(status).json({
      success: false,
      error: err.message,
      code: err.code,
      // Include the stack trace only outside production
      ...(isProd ? {} : { stack: err.stack }),
    });
    return;
  }
  // Handle other error types...
}
```
Stack traces only appear in development. Production responses never leak implementation details.
3. Tool Registry Pattern
Tools are stored in a Map for O(1) lookup and converted to LangChain DynamicStructuredTool instances on demand:
```typescript
export class ToolRegistry {
  private readonly tools = new Map<string, ToolDefinition>();

  register<TInput>(tool: ToolDefinition<TInput>): this {
    this.tools.set(tool.name, tool);
    return this; // Fluent API
  }

  async execute(name: string, rawInput: unknown): Promise<ToolResult> {
    const tool = this.tools.get(name);
    if (!tool) {
      return { toolName: name, output: null, error: "Tool not found" };
    }

    const parsed = tool.schema.safeParse(rawInput);
    if (!parsed.success) {
      return { toolName: name, output: null, error: parsed.error.message };
    }

    const start = Date.now(); // Start timing once the input is known to be valid
    try {
      const output = await tool.execute(parsed.data);
      return { toolName: name, output, latencyMs: Date.now() - start };
    } catch (err) {
      // A thrown value is not guaranteed to be an Error instance
      const message = err instanceof Error ? err.message : String(err);
      return { toolName: name, output: null, error: message };
    }
  }
}
```
Key design:
- Never throws; always returns a ToolResult with an error field
- Schema validation happens before execution
- Execution latency is tracked automatically
- Errors are captured and returned to LLM as tool output
4. Prompt Composition
Prompts are built with LangChain's ChatPromptTemplate but wrapped in factory functions:
```typescript
export function createPrompt(config: PromptConfig): ChatPromptTemplate {
  return ChatPromptTemplate.fromMessages([
    SystemMessagePromptTemplate.fromTemplate(config.system),
    new MessagesPlaceholder("history"), // Injected by memory manager
    HumanMessagePromptTemplate.fromTemplate(config.user ?? "{input}"),
  ]);
}
```
Variants:
- createStatelessPrompt(): No history placeholder (for single-turn chains)
- createFewShotPrompt(): Prepends example exchanges for classification tasks
Variable injection:
Prompts use {variable} syntax. Variables are supplied at invocation time:
```typescript
const prompt = createPrompt({
  system: "You are a {role} specializing in {domain}.",
});

await chain.invoke({
  input: "user message",
  role: "architect",
  domain: "distributed systems",
});
```
5. Chain Composition
ChatChain wraps the LangChain pipeline with domain-specific logic:
```typescript
export class ChatChain {
  async invoke(request: ChatRequest): Promise<ChatResponse> {
    // 1. Resolve LLM (merges defaults + per-request overrides)
    const llm = createLLM({
      ...this.config.llmConfig,
      ...request.options,
    });

    // 2. Build prompt
    const prompt = createPrompt({ system: this.config.systemPrompt });

    // 3. Fetch history
    const memory = await this.memoryManager.getLangChainMemory(request.sessionId);
    const { history } = await memory.loadMemoryVariables({});

    // 4. Invoke chain
    const chain = RunnableSequence.from([prompt, llm, new StringOutputParser()]);
    const content = await chain.invoke({ input: request.message, history });

    // 5. Persist exchange
    await memory.saveContext(
      { input: request.message },
      { output: content }
    );

    return {
      sessionId: request.sessionId,
      messageId: uuidv4(),
      content,
      role: "assistant",
    };
  }
}
```
Streaming variant:
The stream() method follows the same pattern but yields chunks as they arrive, then persists the full exchange after streaming completes.
Extensions
RAG Pipeline
Location: extensions/rag/ragPipeline.ts
Architecture:
- Documents are chunked and embedded with OpenAI text-embedding-3-small
- Embeddings are stored in a vector store (in-memory for dev, Pinecone/Weaviate/pgvector for production)
- On each query, the top-K most similar documents are retrieved
- Retrieved context is injected into the system prompt
- LLM answers based on context, cites sources
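The retrieval and injection steps above can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: the EmbeddedDoc shape, the function names, and the use of cosine similarity as the ranking measure are not taken from ragPipeline.ts, and the embeddings are supplied by the caller rather than fetched from OpenAI.

```typescript
// Hypothetical document shape with a precomputed embedding vector.
interface EmbeddedDoc {
  id: string;
  content: string;
  embedding: number[];
}

// Cosine similarity between two vectors of the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Ranks every document against the query and keeps the topK best matches.
function retrieveTopK(query: number[], docs: EmbeddedDoc[], topK: number): EmbeddedDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, topK)
    .map(({ doc }) => doc);
}

// Appends the retrieved documents to the system prompt as numbered sources.
function buildContextPrompt(basePrompt: string, docs: EmbeddedDoc[]): string {
  const sources = docs.map((doc, i) => `[Source ${i + 1}] ${doc.content}`).join("\n");
  return `${basePrompt}\n\nContext:\n${sources}`;
}
```

A linear scan like this is only reasonable for small development corpora. The production vector stores named above replace retrieveTopK with an indexed nearest-neighbor query.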
Usage:
```typescript
const rag = new RAGPipeline({ topK: 4 });

await rag.addDocuments([
  { id: "doc-1", content: "Our refund policy allows returns within 30 days." },
  { id: "doc-2", content: "Premium subscribers get priority support." },
]);

const request = await rag.buildRequest({
  sessionId: "user-123",
  message: "What is your refund policy?",
});

const response = await chatbot.chat(request);
// → "According to our policy, returns are allowed within 30 days. [Source 1]"
```
Why it's an extension:
RAG isn't needed for most chatbots. Keeping it in extensions/ means it's not bundled unless explicitly imported. Applications that need RAG wire it in during the request flow, so the framework never needs to know about it.
Multi-Agent Orchestration
Location: extensions/multiAgent/orchestrator.ts
Pattern:
A router analyzes user intent and dispatches to specialist sub-agents (coding, support, data analysis, general).
Current implementation:
Uses keyword matching. Production systems should replace this with an LLM intent classifier.
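A keyword router of the kind described could look roughly like the sketch below. The keyword lists and the routeByKeyword function are hypothetical, since the orchestrator's real keywords are not shown here; only the four specialist categories come from the text above.

```typescript
type Specialist = "coding" | "support" | "dataAnalysis" | "general";

// Hypothetical keyword table; "general" is the fallback and has no keywords.
const KEYWORDS: Array<[Exclude<Specialist, "general">, string[]]> = [
  ["coding", ["typescript", "function", "bug", "compile"]],
  ["support", ["refund", "order", "account", "billing"]],
  ["dataAnalysis", ["average", "chart", "dataset", "trend"]],
];

// Picks the specialist with the most keyword hits; ties keep the earlier entry.
function routeByKeyword(message: string): Specialist {
  const text = message.toLowerCase();
  let best: Specialist = "general";
  let bestHits = 0;
  for (const [specialist, words] of KEYWORDS) {
    const hits = words.filter((word) => text.includes(word)).length;
    if (hits > bestHits) {
      best = specialist;
      bestHits = hits;
    }
  }
  return best;
}
```

Substring matching is cheap but brittle (for example, "order" also matches "border"), which is one reason an LLM intent classifier is the better choice for production.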
Usage:
```typescript
const orchestrator = new AgentOrchestrator();

const response = await orchestrator.route({
  sessionId: "user-456",
  message: "How do I write a TypeScript generic function?",
});
// → Routed to "coding" specialist (temperature: 0.2)
```
Why it's an extension:
Most applications use a single chatbot persona. Multi-agent routing adds latency and complexity. By keeping it as an opt-in extension, simple use cases stay simple.
Screenshot Placeholder
TODO: Add screenshot showing example API response with session ID, message ID, content, model, token usage, and latency.
Performance Considerations
Memory Footprint
Measurement:
Each session stores:
- Session metadata: ~200 bytes
- Message history: ~500 bytes per message pair (user + assistant)
- LangChain memory overhead: ~1KB
Capacity:
A session holding 50 message pairs consumes ~26KB (200 B + 50 × 500 B + 1 KB). At that size, a server with 1GB of available memory can hold roughly 38K concurrent sessions.
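The capacity estimate can be reproduced with a few lines of arithmetic. The sketch assumes the figures above are decimal (1KB = 1,000 bytes, 1GB = 10^9 bytes) and that the 50 messages are counted as user + assistant pairs, which is the reading that yields ~26KB. The function names are illustrative.

```typescript
const SESSION_METADATA_BYTES = 200;
const BYTES_PER_MESSAGE_PAIR = 500;
const LANGCHAIN_OVERHEAD_BYTES = 1_000;

// Estimated footprint of one session holding the given number of message pairs.
function sessionBytes(messagePairs: number): number {
  return (
    SESSION_METADATA_BYTES +
    messagePairs * BYTES_PER_MESSAGE_PAIR +
    LANGCHAIN_OVERHEAD_BYTES
  );
}

// Number of whole sessions that fit into the available memory.
function maxSessions(availableBytes: number, messagePairs: number): number {
  return Math.floor(availableBytes / sessionBytes(messagePairs));
}

const perSession = sessionBytes(50); // 26_200 bytes, about 26KB
const capacity = maxSessions(1_000_000_000, 50); // 38_167 sessions, about 38K
```

The estimate covers session data only. The Node.js runtime, in-flight requests, and garbage collector headroom all reduce the real ceiling.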
Scaling:
For higher session counts, implement the Redis adapter (interface is defined). Redis-backed memory supports horizontal scaling and survives restarts.
Streaming Latency
First token latency: 200-500ms (depends on OpenAI API)
Chunk delivery: SSE writes chunks immediately—no buffering. The 15-second heartbeat prevents proxy timeouts without affecting latency.
Network optimization:
- gzip compression is handled by Express middleware
- SSE messages are minimal JSON ({ type: "delta", content: "..." })
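The wire format behind these numbers is small enough to sketch. The StreamChunk union below is inferred from the chunk fields used earlier in this article, and the helper names and the SseSink interface are assumptions rather than the controller's actual code.

```typescript
// Chunk shapes inferred from the streaming example; the real types may differ.
type StreamChunk =
  | { type: "delta"; content: string }
  | { type: "done" }
  | { type: "error"; error: string };

// One SSE event: a "data:" field terminated by a blank line.
// JSON.stringify escapes newlines, so the payload always stays on one line.
function formatSseEvent(chunk: StreamChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Lines starting with ":" are SSE comments. Clients ignore them,
// which makes them suitable as keep-alive heartbeats.
function formatSseHeartbeat(): string {
  return ": heartbeat\n\n";
}

// Minimal writable surface so the sketch does not depend on Express types.
interface SseSink {
  write(data: string): unknown;
}

// Writes each chunk as soon as the generator yields it, without buffering.
async function pipeToSse(chunks: AsyncIterable<StreamChunk>, sink: SseSink): Promise<void> {
  for await (const chunk of chunks) {
    sink.write(formatSseEvent(chunk));
  }
}
```

In an Express controller the sink would be the response object, with the heartbeat written from a 15-second interval that is cleared when the connection closes.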
Rate Limiting
Configured per-IP with express-rate-limit:
```typescript
import rateLimit from "express-rate-limit";

const limiter = rateLimit({
  windowMs: env.RATE_LIMIT_WINDOW_MS, // 60 seconds
  max: env.RATE_LIMIT_MAX_REQUESTS, // 60 requests
  standardHeaders: true,
});
```
Why per-IP:
Session-based rate limiting requires authentication. Per-IP is a reasonable default for public APIs. Applications with auth should add session-based rate limiting in the beforeChat hook.
Security Considerations
Input Validation
Every request body is validated with Zod before reaching the service layer:
```typescript
router.post("/chat", validate(ChatRequestSchema), chat);
```
Invalid requests return 422 with field-level error details. Malformed JSON returns 400.
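A validate() middleware along these lines might be implemented as sketched below. This is an assumption about its shape, not the framework's code: SchemaLike mirrors the structure of a Zod schema's safeParse result so the example runs without Zod or Express installed, and the response body layout is illustrative.

```typescript
// Structural stand-in for the parts of Zod's safeParse result used here.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { issues: Array<{ path: PropertyKey[]; message: string }> } };

interface SchemaLike<T> {
  safeParse(input: unknown): ParseResult<T>;
}

// Minimal request/response surface standing in for Express types.
interface Req {
  body: unknown;
}
interface Res {
  status(code: number): Res;
  json(payload: unknown): unknown;
}

// Returns a middleware that rejects invalid bodies with 422 and field-level details.
function validate<T>(schema: SchemaLike<T>) {
  return (req: Req, res: Res, next: () => void): void => {
    const result = schema.safeParse(req.body);
    if (!result.success) {
      res.status(422).json({
        success: false,
        errors: result.error.issues.map((issue) => ({
          field: issue.path.map(String).join("."),
          message: issue.message,
        })),
      });
      return;
    }
    req.body = result.data; // hand the parsed value to the controller
    next();
  };
}
```

Replacing req.body with the parsed value means controllers see defaults and coercions already applied, so they never touch raw input.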
Helmet.js Security Headers
All responses include:
- X-Content-Type-Options: nosniff
- X-Frame-Options: SAMEORIGIN
- X-XSS-Protection: 1; mode=block
- Content Security Policy (disabled in dev for easier debugging)
CORS Configuration
Development: origin: "*" (allow all origins for local testing)
Production: Reads ALLOWED_ORIGINS from environment (comma-separated list)
Secrets Management
All secrets live in .env:
- .env is gitignored
- .env.example documents required variables without exposing values
- Zod validation crashes on missing secrets
Tool Safety
Tools are the largest attack surface (they execute arbitrary code). Mitigations:
- Schema validation: LLM-supplied arguments are validated before execution
- Error containment: Tools never throw—errors are captured and returned as tool output
- Read-only marking: Tools can be marked readonly: true (useful for audit layers)
- Permission system: Applications can filter tools based on user roles
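Role-based filtering of the tool list could be done before the tools are handed to the chatbot. In the sketch below, the readonly flag comes from the list above, while the allowedRoles field and the toolsForUser helper are hypothetical names introduced for illustration.

```typescript
// Tool metadata relevant to permission checks.
interface ToolMeta {
  name: string;
  readonly?: boolean; // mentioned above as an audit marker
  allowedRoles?: string[]; // assumption: not part of the documented tool shape
}

// Returns the tools a user may call, optionally restricted to read-only tools.
function toolsForUser<T extends ToolMeta>(
  tools: T[],
  roles: string[],
  readOnlyOnly = false,
): T[] {
  return tools.filter((tool) => {
    if (readOnlyOnly && tool.readonly !== true) return false;
    // Tools without an allow-list are available to everyone.
    if (!tool.allowedRoles) return true;
    return tool.allowedRoles.some((role) => roles.includes(role));
  });
}
```

Filtering before registration means the LLM never sees tools the user cannot call, which is stricter than rejecting the call at execution time.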
Developer Experience
Zero-Config Usage
```typescript
import { createChatbot } from "chatbot-foundation";

const chatbot = createChatbot();

await chatbot.chat({
  sessionId: "user-123",
  message: "Hello!",
});
```
Works immediately. All defaults are read from .env.
Deep Customization
```typescript
const chatbot = createChatbot({
  systemPrompt: "You are a helpful assistant.",
  llm: { model: "gpt-4o", temperature: 0.7 },
  memory: { maxMessages: 100, ttlSeconds: 7200 },
  tools: [weatherTool, calculatorTool],
  hooks: {
    beforeChat: async (req) => { /* auth */ },
    afterChat: async (res) => { /* logging */ },
  },
});
```
Every default can be overridden.
Public API Surface
Everything consumers need is exported from src/exports.ts:
```typescript
export { createChatbot, ChatService } from "./services/chatService";
export { createLLM, createStreamingLLM } from "./ai/llm/openai";
export { SYSTEM_PROMPTS, buildSystemPrompt } from "./ai/prompts/systemPrompt";
export { registerTool, toolRegistry } from "./ai/tools/toolRegistry";
export { MemoryManager } from "./ai/memory/memoryManager";
export { createFunctionsAgent, runAgent } from "./ai/agents/agentFactory";
// + all types and schemas
```
Internal modules are not re-exported, so implementation details stay private.
Examples
examples/usage.ts contains 10 runnable examples:
- Minimal usage
- Custom system prompts
- Conversation memory
- Custom tools
- Streaming
- Per-request overrides
- Lifecycle hooks
- Agents with tools
- Domain-specific chatbots
- Session management
Run with npx tsx examples/usage.ts.
Testing Strategy
Current state: No automated tests.
Recommended approach:
- Unit tests: Test service layer methods with mocked LLM responses
- Integration tests: Test full request flow with in-memory store
- Contract tests: Validate OpenAI API assumptions (model names, response shapes)
Why testing is easier:
Because the service layer has no HTTP dependencies, tests can call chatbot.chat() directly without starting a server or mocking Express.
Challenges & Tradeoffs
1. LangChain Abstraction Overhead
Challenge: LangChain adds ~2MB to the bundle and introduces indirection (e.g., RunnableSequence.from([prompt, llm, parser]) instead of direct API calls).
Tradeoff: We gain provider abstraction and battle-tested primitives at the cost of bundle size. For applications where bundle size matters, we could add a "lite" mode that uses OpenAI SDK directly.
Mitigation: All LangChain usage is wrapped in factory functions. If we need to swap orchestration layers, only internal modules change.
2. In-Memory Store Doesn't Scale Horizontally
Challenge: In-memory sessions don't survive restarts and can't be shared across instances.
Tradeoff: Zero dependencies for dev/testing vs. Redis requirement for production scale. Most applications don't need horizontal scaling on day one.
Mitigation: The MemoryStore interface is defined. Implementing Redis means writing one adapter class, with no framework changes needed.
3. Streaming Requires SSE, Not WebSockets
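Challenge: SSE is unidirectional. Applications that need client-to-server messages on the same connection must use WebSockets. A "stop generation" button can still work over SSE by closing the connection or sending a separate HTTP request.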
Tradeoff: Simpler protocol and better proxy compatibility vs. lack of client-to-server messaging.
Mitigation: The stream() method is transport-agnostic: it returns an async generator. Controllers can wire this to SSE, WebSockets, or any other transport.
4. No Built-In Authentication
Challenge: The framework doesn't handle auth because every application's requirements differ (JWT, session cookies, OAuth, API keys).
Tradeoff: Smaller framework surface vs. requiring integration work.
Mitigation: Authentication is injected via the beforeChat hook. The framework provides the integration point; applications provide the policy.
Diagram Placeholder
TODO: Add a Mermaid class diagram showing the relationships between ChatService, ChatChain, MemoryManager, ToolRegistry, and LLM factories.
Lessons Learned
1. Transport Agnosticism Requires Discipline
Early versions of ChatService accepted Express Request objects. This seemed harmless until we needed to use the chatbot in a CLI tool, at which point the entire API turned out to be coupled to HTTP.
Lesson: Define service layer interfaces first, then build adapters. Never let transport concerns leak into business logic.
2. Fail-Fast Configuration Saves Debug Time
Silently defaulting missing environment variables led to confusing production bugs ("Why is the chatbot using the wrong model?"). Switching to validated configuration with detailed error messages eliminated an entire class of issues.
Lesson: Crash loudly on invalid config. The 10 seconds spent fixing .env saves hours of debugging.
3. Memory Management Is Subtle
Initial implementation stored raw LangChain messages directly. This worked until we needed to retrieve history for the /history endpoint, where LangChain's internal format wasn't suitable for API responses.
Lesson: Maintain two representations: LangChain memory for chain invocation, raw messages for API retrieval. They're conceptually the same but serve different consumers.
4. Tool Validation Prevents Subtle Bugs
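Without schema validation, the LLM once supplied { city: "Tokyo", unit: "celsius" } (note: unit not units). The tool threw, the chain failed, and the user saw a cryptic error.
Lesson: Validate at the boundary. Zod schemas catch missing fields and type mismatches before they propagate. Note that a default Zod object schema strips unknown keys rather than rejecting them, so catching a misspelled key like unit requires .strict() or a required units field.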
5. Hooks Beat Inheritance
Early design used subclassing (CustomerSupportChatbot extends ChatService). This became unmaintainable when we needed auth + logging + moderation, because multiple inheritance isn't supported in JS.
Lesson: Hooks compose better than inheritance. Middleware patterns win for cross-cutting concerns.
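Layering several concerns into a single hook can be sketched with a small composition helper. Neither the Hook type nor composeHooks is part of the framework's documented API. They are hypothetical names showing one way to chain auth, logging, and moderation steps.

```typescript
// A hook receives a value and returns it (possibly modified), or throws to abort.
type Hook<T> = (value: T) => T | Promise<T>;

// Runs hooks in order, feeding each hook's result into the next one.
// If any hook throws, the remaining hooks are skipped and the error propagates.
function composeHooks<T>(...hooks: Array<Hook<T>>): Hook<T> {
  return async (value: T): Promise<T> => {
    let current = value;
    for (const hook of hooks) {
      current = await hook(current);
    }
    return current;
  };
}
```

Under these assumptions, an application could pass composeHooks(auth, logging, moderation) as its beforeChat hook, and reordering or removing a concern is a one-line change.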
Future Improvements
1. Redis Memory Adapter
What: Implement RedisStore class that satisfies the MemoryStore interface.
Why: Enables horizontal scaling and survives restarts.
Effort: ~200 lines (ioredis setup + CRUD operations).
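To give a sense of the adapter's shape, here is a heavily simplified sketch. The MemoryStore methods shown are assumptions, since the interface itself is not reproduced in this article, and RedisLike is a minimal stand-in for the three ioredis commands the sketch uses (get, set with EX, del) so that it can run against a fake client.

```typescript
// Hypothetical shapes; the framework's real MemoryStore may differ.
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
}

interface MemoryStore {
  getMessages(sessionId: string): Promise<StoredMessage[]>;
  saveMessages(sessionId: string, messages: StoredMessage[]): Promise<void>;
  clear(sessionId: string): Promise<void>;
}

// Subset of a Redis client, matching how ioredis exposes these commands.
interface RedisLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown>;
  del(key: string): Promise<unknown>;
}

class RedisStore implements MemoryStore {
  private readonly redis: RedisLike;
  private readonly ttlSeconds: number;
  private readonly maxMessages: number;

  constructor(redis: RedisLike, ttlSeconds: number, maxMessages: number) {
    this.redis = redis;
    this.ttlSeconds = ttlSeconds;
    this.maxMessages = maxMessages;
  }

  private key(sessionId: string): string {
    return `chat:session:${sessionId}`;
  }

  async getMessages(sessionId: string): Promise<StoredMessage[]> {
    const raw = await this.redis.get(this.key(sessionId));
    return raw ? (JSON.parse(raw) as StoredMessage[]) : [];
  }

  async saveMessages(sessionId: string, messages: StoredMessage[]): Promise<void> {
    // Sliding window: keep only the most recent maxMessages entries.
    const windowed = messages.slice(-this.maxMessages);
    // EX sets the expiry in seconds; rewriting the key on every save resets the TTL.
    await this.redis.set(this.key(sessionId), JSON.stringify(windowed), "EX", this.ttlSeconds);
  }

  async clear(sessionId: string): Promise<void> {
    await this.redis.del(this.key(sessionId));
  }
}
```

This sketch resets the TTL only on writes. Matching the reset-on-every-access behavior described earlier would also require refreshing the expiry on reads.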
2. Observability Layer
What: Integrate LangSmith or LangFuse for tracing, token counting, and latency monitoring.
Why: Production systems need visibility into LLM calls—prompt versions, token costs, failure rates.
Effort: Moderate (requires LangChain callback setup).
3. Prompt Versioning
What: Store prompt templates in a database with version IDs. Log which version was used for each request.
Why: A/B testing and gradual rollout of prompt changes.
Effort: Moderate (requires prompt storage layer + migration from hardcoded templates).
4. Multi-LLM Support
What: Add factories for Anthropic, Mistral, Cohere, local models (Ollama).
Why: Different models have different strengths. Cost-sensitive applications can route cheap queries to gpt-4.1-mini, expensive queries to gpt-4o.
Effort: Low (add factory functions, LangChain handles the rest).
5. Test Suite
What: Unit tests for service layer, integration tests for full request flow, contract tests for OpenAI API.
Why: Confidence when refactoring, regression prevention, onboarding documentation.
Effort: High (requires test infrastructure + fixture setup).
Key Takeaways
- Layered architecture enables reuse. By keeping the service layer transport-agnostic, the same chatbot works for REST, WebSocket, CLI, and background jobs without modification.
- Fail-fast configuration prevents production bugs. Zod validation at startup crashes immediately with detailed errors rather than silently degrading in production.
- Hooks compose better than inheritance. Middleware patterns (beforeChat/afterChat) support arbitrary composition of auth, logging, moderation, and analytics without framework changes.
- LangChain provides leverage at the cost of abstraction. We gain provider abstraction and battle-tested primitives but pay in bundle size and indirection. Wrapping LangChain in factory functions keeps the door open for future swaps.
- Type safety catches errors early. Zod schemas generate TypeScript types, validate runtime data, and coerce environment variables, eliminating manual type definitions and entire classes of bugs.
Conclusion
ChatBot Foundation demonstrates that production-grade infrastructure doesn't require custom implementations for every project. By abstracting common patterns—conversation memory, tool calling, streaming, prompt management—into a reusable framework, applications can focus on domain logic rather than plumbing.
The three-layer architecture (HTTP → Service → AI) ensures that the same chatbot works across any transport. Zod-based validation guarantees that invalid configurations never reach production. Lifecycle hooks provide integration points for auth, logging, and moderation without coupling the framework to specific implementations.
The framework currently powers conversational AI systems but is designed to extend: RAG pipelines, multi-agent orchestration, Redis memory adapters, and alternative LLM providers all plug in without modifying core code.
For teams building AI-powered applications, ChatBot Foundation eliminates the infrastructure work and lets you ship faster—bring your business logic, the framework handles the rest.
View the complete source code and documentation on GitHub. Have questions about building production AI systems? Get in touch.

