AI Backend & Architecture

Building PrismBot: A Multi-Tenant, Multi-Agent Chat Architecture

A multi-tenant AI chatbot backend built with NestJS and TypeScript that utilizes LangGraph for a supervisor-specialist agent routing system, delivering real-time responses and autonomous CRM updates via WebSockets.

Platform

SaaS Backend

Duration

Ongoing Development

3-Stage

RAG pipeline with parallel query expansion

100%

Tenant isolation across vectors, KB, and CRM

2

Specialist agents (RAG Expert & CRM Sales)

Project overview

PrismBot is a multi-tenant AI chatbot backend built with NestJS and TypeScript. A LangGraph supervisor routes each message to specialist agents (a RAG Expert for knowledge-base queries and a CRM Sales agent for autonomous profile updates), while a decoupled Socket.IO layer streams responses in real time. Tenant isolation is enforced across vectors, knowledge bases, and CRM data.


The challenge

Single-agent LLM systems struggle with context bloat and cannot simultaneously query complex knowledge bases and autonomously update customer relationship databases. Serving multiple enterprise tenants also requires strict data isolation to prevent leakage between clients.

Static AI pipelines cannot easily update CRM records based on conversation context.

Monolithic LLM prompts are vulnerable to jailbreaks and prompt injections.

Data leakage risks in shared vector databases and chat histories.

Tightly coupled WebSocket logic makes it hard to scale AI agent tools.

What we set out to do

  • 01

    Build a strict multi-tenant architecture scoping requests to specific tenants from the first step.

  • 02

    Implement a Supervisor LLM that never answers questions, but strictly routes to specialist agents.

  • 03

    Create a comprehensive RAG pipeline with query expansion and MMR reranking.

  • 04

    Automate CRM profile updates using conversational context without manual intervention.

  • 05

    Decouple Socket.io infrastructure from agent logic to allow safe admin interventions.

How we solved it

1

Strict Multi-Tenancy Architecture

Every request is scoped to a tenant by resolving the domain during the WebSocket handshake. Vector embeddings are isolated via namespace columns, and OpenAI API keys are decrypted per tenant.

Key decision

Injecting a custom YAML tenant context block into every LLM call.

Result

Absolute data isolation and highly customized bot personas per client.
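The tenancy flow above can be sketched in plain TypeScript. This is a minimal illustration, not PrismBot's actual code: the names `TenantConfig`, `resolveTenant`, and `buildTenantContext` are assumptions, and a real gateway would back the registry with a database lookup inside a Socket.IO handshake middleware.

```typescript
// Hypothetical sketch: resolve a tenant from the WebSocket handshake host,
// then render the YAML tenant-context block injected into every LLM call.

interface TenantConfig {
  id: string;             // also used as the PGVector namespace value
  persona: string;        // bot persona injected into every LLM call
  encryptedApiKey: string; // decrypted per tenant before calling OpenAI
}

// In production this registry would be a database lookup, not an in-memory map.
const tenantRegistry = new Map<string, TenantConfig>([
  ["acme.example.com", {
    id: "acme",
    persona: "Friendly sales assistant",
    encryptedApiKey: "enc:...",
  }],
]);

export function resolveTenant(handshakeHost: string): TenantConfig {
  const tenant = tenantRegistry.get(handshakeHost);
  if (!tenant) throw new Error(`Unknown tenant domain: ${handshakeHost}`);
  return tenant;
}

// Render the YAML context block prepended to every prompt for this tenant.
export function buildTenantContext(t: TenantConfig): string {
  return ["tenant:", `  id: ${t.id}`, `  persona: ${t.persona}`].join("\n");
}
```

Because the tenant is resolved once at handshake time, every downstream vector query and LLM call inherits the same scope instead of re-deriving it per message.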

2

LangGraph Supervisor Routing

Deployed a Supervisor LLM that only evaluates the flow state (Guardrails → Flow 2 → Flow 3 → Flow 1). It checks for greetings or CRM replies before routing genuine queries to specialized agents.

Key decision

Restricting the supervisor from answering questions directly.

Result

Efficient, modular execution where specialized agents handle focused tasks.
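The supervisor's contract can be illustrated with a plain routing function. The production system uses LangGraph's graph primitives; the state shape and node names below are assumptions made for the sketch. The key property is that the function only inspects state and returns the next node, never an answer.

```typescript
// Illustrative supervisor routing sketch: state in, node name out, no answers.

type AgentNode = "guardrails" | "greeting" | "crm_reply" | "rag_expert" | "end";

interface FlowState {
  message: string;
  passedGuardrails: boolean;
  isGreeting: boolean;
  awaitingCrmReply: boolean;
}

// The supervisor never generates text; it only decides which specialist runs.
export function supervise(state: FlowState): AgentNode {
  if (!state.passedGuardrails) return "guardrails"; // safety checks come first
  if (state.isGreeting) return "greeting";          // cheap path, no RAG needed
  if (state.awaitingCrmReply) return "crm_reply";   // continue an open CRM exchange
  if (state.message.trim().length > 0) return "rag_expert"; // genuine query
  return "end";
}
```

Keeping routing as a pure function of state makes each branch unit-testable in isolation, which is much harder when one monolithic prompt both routes and answers.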

3

Three-Stage RAG Pipeline

Built an advanced RAG Expert agent that runs a 3-stage pipeline: query expansion (generating 3 variants via OpenAI), parallel embedding and PGVector search, and MMR reranking (λ = 0.7) to balance relevance and diversity.

Key decision

Expanding fetched chunks to include ±1 neighboring chunks for coherence.

Result

Highly accurate, context-aware answers, deduplicated by chunk ID.
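The reranking stage can be sketched as a standalone MMR implementation with λ = 0.7. The `Chunk` shape and cosine helper are assumptions; in production this would run over the deduplicated PGVector results from the parallel searches.

```typescript
// Minimal MMR (maximal marginal relevance) reranker sketch, λ = 0.7.
// Greedily picks chunks that are relevant to the query but not redundant
// with chunks already selected.

interface Chunk { id: string; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard zero vectors
}

export function mmrRerank(query: number[], chunks: Chunk[], k: number, lambda = 0.7): Chunk[] {
  const selected: Chunk[] = [];
  const pool = [...chunks];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0, bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      const relevance = cosine(query, pool[i].embedding);
      // Redundancy = similarity to the closest already-selected chunk.
      const redundancy = selected.length
        ? Math.max(...selected.map(s => cosine(pool[i].embedding, s.embedding)))
        : 0;
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; bestIdx = i; }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

With λ = 0.7 the score is weighted 70% toward query relevance and 30% against redundancy, which is why near-duplicate chunks from the three expanded queries get pushed down rather than stacking up.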

4

Autonomous CRM Sales Agent

Created a mandatory CRM agent that runs after the RAG pipeline. It reads the MongoDB user record, infers new signals from the chat, and updates the database without overwriting existing maps.

Key decision

Utilizing dot-notation $set in MongoDB to cleanly merge new user data.

Result

Automated collection of user information for warmer human handoffs.
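The dot-notation merge can be shown with a small helper that flattens inferred signals into a MongoDB `$set` document. `flattenForSet` is an illustrative name, not PrismBot's API; the point is that setting `"profile.budget"` touches only that path, so sibling keys in the existing record survive.

```typescript
// Sketch: flatten a nested object of inferred CRM signals into dot-notation
// paths for MongoDB's $set, so nested maps are merged rather than replaced.

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

export function flattenForSet(
  obj: { [key: string]: Json },
  prefix = ""
): Record<string, Json> {
  const out: Record<string, Json> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into plain objects so each leaf gets its own dot path.
      Object.assign(out, flattenForSet(value as { [key: string]: Json }, path));
    } else {
      out[path] = value; // primitives and arrays are set as-is
    }
  }
  return out;
}

// Usage (illustrative): collection.updateOne({ _id }, { $set: flattenForSet(signals) })
// { profile: { budget: "10k" } } becomes { "profile.budget": "10k" },
// leaving other keys under `profile` untouched.
```

A plain `$set: { profile: signals.profile }` would replace the whole `profile` map; the dot-notation form is what makes the agent's incremental updates non-destructive.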

5

Decoupled WebSocket Delivery Bridge

Abstracted WebSocket delivery by creating GatewayCallbacks closures bound to the socket. The agents only know they are calling a shared send_response tool.

Key decision

Utilizing a single send_response tool for all message delivery across agents.

Result

Agents remain agnostic to Socket.io internals, enabling clean architecture.
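The bridge pattern can be sketched as a closure factory. The names `GatewayCallbacks` and `makeGatewayCallbacks` follow the text, but the shapes are assumptions: the live Socket.io socket is captured by the closure, and agents receive only the `sendResponse` callback.

```typescript
// Sketch of the delivery bridge: bind the transport once, hand agents a
// transport-agnostic callback that backs the shared send_response tool.

interface EmitterLike {
  emit(event: string, payload: unknown): void; // satisfied by a Socket.io socket
}

export interface GatewayCallbacks {
  sendResponse(text: string): void;
}

export function makeGatewayCallbacks(
  socket: EmitterLike,
  conversationId: string
): GatewayCallbacks {
  // The socket lives only inside this closure; agents never import Socket.io.
  return {
    sendResponse(text: string) {
      socket.emit("bot_message", { conversationId, text });
    },
  };
}
```

Because `EmitterLike` is all the bridge needs, an admin dashboard or a test harness can pass its own emitter and intercept agent output without touching agent code.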

Tech stack

NestJS · TypeScript · LangGraph · Socket.IO · MongoDB · PostgreSQL + PGVector · OpenAI

What we learned

The PrismBot architecture proves that moving away from monolithic LLM prompts toward a multi-agent, supervisor-driven model drastically improves reliability and feature depth. By strictly isolating tenant context and decoupling the WebSocket layer from the LLM execution, we created a highly scalable platform.

  • 01

    Parallel query expansion significantly improves the recall of relevant vector chunks.

  • 02

    Abstracting WebSockets via closures allows LLM agents to safely trigger UI updates without knowing the transport layer.

  • 03

    Running a mandatory CRM agent post-query ensures user profiles stay updated dynamically.

  • 04

    Pre-pipeline guardrails are essential to prevent costly and dangerous prompt injections.

Ready to build something that matters?

We solve problems that don't have Stack Overflow answers. Let's talk.

Book a Discovery Call