AI Integration

AI-Powered Support Intelligence Platform

Enterprise SaaS Platform

Key Outcome

68% reduction in ticket resolution time — $380K annual support cost saved

The Challenge

The client's 14-person support team was handling over 4,000 tickets per month. Average resolution time was 6.2 hours, the first-response SLA was being missed 31% of the time, and agent burnout was measurably affecting quality. Off-the-shelf chatbots had been tried and had failed because they couldn't handle the product's technical complexity.

Our Approach

We audited the existing knowledge base, identified structural gaps, and rebuilt the retrieval architecture from the ground up. Rather than replacing agents, we designed the system to augment them — surfacing the right context at the right moment so every agent operates at the level of the best agent on the team. We ran a 3-week parallel evaluation before full rollout, iterating on chunking strategy and prompt design based on real agent feedback.

Tech Stack

Claude Opus 4.8 · RAG · MCP · Pinecone · Next.js · FastAPI

System Architecture

How the system flows

Ticket (Inbound) → Classifier (Claude Haiku 4.5) → Vector DB (Pinecone) → Retriever (Hybrid + rerank) → MCP Tools (CRM · KB) → Claude Opus 4.8 (RAG draft) → Agent UI (Live assist)

Build Pipeline

How We Built It

01

Knowledge Base Ingestion & Chunking

Ingested 12,000+ documentation pages, support history, and product specs. Applied semantic chunking with 512-token windows and 20% overlap. Each chunk was tagged with product area, version, and confidence metadata to enable filtered retrieval.
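The windowing part of this step can be sketched as follows. This is a minimal illustration, not the production chunker: it assumes the text is already tokenized into a list, and it only shows the fixed 512-token window with 20% overlap, not the semantic boundary detection or the metadata tagging. The function name `chunk_tokens` is hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], window: int = 512, overlap: float = 0.20) -> List[List[str]]:
    """Split a token list into fixed-size windows that overlap by a fraction."""
    if window <= 0 or not 0 <= overlap < 1:
        raise ValueError("window must be positive and overlap must be in [0, 1)")
    # Each new window starts `stride` tokens after the previous one.
    # With window=512 and overlap=0.20 the stride is 409 tokens.
    stride = max(1, int(window * (1 - overlap)))
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With the defaults, consecutive chunks share their boundary tokens, so a sentence that straddles a window edge still appears whole in at least one chunk.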

02

Vector Embedding & Indexing

Embedded all chunks using Gemini 3.1 embeddings and stored them in Pinecone with a hybrid search configuration (dense + BM25 sparse vectors). Namespace separation by product line allowed targeted retrieval without cross-contamination of unrelated knowledge.
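One common way to balance dense and sparse signals in a hybrid query is a convex weighting of the two query vectors before they are sent to the index. The sketch below shows that idea in plain Python. It is an assumption about how the weighting could be done, not a description of the client's configuration: the `alpha` parameter and the function name `weight_hybrid_query` are hypothetical, and no Pinecone client calls are made.

```python
from typing import Dict, List, Tuple

# Sparse vectors are represented as {"indices": [...], "values": [...]}.
SparseVector = Dict[str, list]


def weight_hybrid_query(
    dense: List[float], sparse: SparseVector, alpha: float
) -> Tuple[List[float], SparseVector]:
    """Scale dense values by alpha and sparse values by (1 - alpha).

    alpha=1.0 means purely dense (semantic) search,
    alpha=0.0 means purely sparse (keyword/BM25) search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [v * alpha for v in dense]
    weighted_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return weighted_dense, weighted_sparse
```

A higher `alpha` favors paraphrased questions, while a lower one favors exact matches on error codes or product names, which is why the value is usually tuned against real tickets.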

03

Query Classification & MCP Routing

An incoming ticket is first classified by a lightweight Claude Haiku 4.5 call into one of 8 query types (billing, technical, onboarding, escalation, etc.). An orchestration layer exposes the CRM, knowledge base, and billing systems as MCP (Model Context Protocol) servers, so the router can pull account-specific context through standardised tool-calls without bespoke integrations per data source.
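The classify-then-route step can be sketched as a lookup from query type to the MCP servers worth consulting. Everything specific here is an assumption for illustration: the server names (`crm`, `kb`, `billing`), the mapping itself, and the `route_ticket` helper are hypothetical, and the classifier is passed in as a plain function standing in for the Claude Haiku 4.5 call. Only the four query types named above are included; the other four are not listed in this write-up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical mapping from query type to MCP servers to consult.
ROUTES: Dict[str, List[str]] = {
    "billing": ["crm", "billing"],
    "technical": ["crm", "kb"],
    "onboarding": ["kb"],
    "escalation": ["crm", "kb", "billing"],
}
DEFAULT_SERVERS: List[str] = ["kb"]  # fallback for unknown or unlisted types


@dataclass
class RoutedTicket:
    text: str
    query_type: str
    servers: List[str]


def route_ticket(text: str, classify: Callable[[str], str]) -> RoutedTicket:
    """Classify a ticket, then pick the MCP servers to pull context from."""
    query_type = classify(text).strip().lower()
    servers = ROUTES.get(query_type, DEFAULT_SERVERS)
    return RoutedTicket(text=text, query_type=query_type, servers=list(servers))
```

Keeping the mapping as data rather than branching logic means a new query type or data source only needs a new table entry.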

04

Retrieval-Augmented Generation

The top-k retrieved chunks (k=6, semantically reranked) are injected into a carefully designed system prompt that instructs Claude Opus 4.8 to draft a reply in the client's brand voice. The prompt enforces citation format and instructs the model to flag uncertainty rather than hallucinate.
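Prompt assembly for this step might look like the sketch below. The prompt wording, the bracketed citation format, and the `build_system_prompt` name are all assumptions made for illustration; the actual system prompt and brand-voice instructions are not reproduced in this write-up. The input is assumed to be already reranked, best match first.

```python
from typing import Dict, List


def build_system_prompt(chunks: List[Dict[str, str]], k: int = 6) -> str:
    """Build a system prompt from the top-k reranked chunks.

    Each chunk is a dict with "source" and "text" keys (hypothetical schema).
    """
    top = chunks[:k]  # keep only the k best chunks
    sources = "\n\n".join(
        f"[{i}] ({chunk['source']})\n{chunk['text']}"
        for i, chunk in enumerate(top, start=1)
    )
    return (
        "You draft support replies for a human agent to review.\n"
        "Use only the numbered sources below and cite them inline as [n].\n"
        "If the sources do not answer the question, say so explicitly "
        "instead of guessing.\n\n"
        f"Sources:\n{sources}"
    )
```

Numbering the sources in the prompt is what makes the citations checkable later, since each `[n]` in a draft maps back to a specific chunk.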

05

Agent UI Integration

The draft reply surfaces in a collapsible panel inside the existing Zendesk interface via a custom app built with the Zendesk Apps Framework. Agents see the draft, the source documents it drew from, and a confidence indicator. One click inserts the draft into the reply field — editable before sending.

06

Feedback Loop & Model Evaluation

Every accepted or edited reply is logged with the delta between AI draft and final sent message. We built a lightweight evaluation dashboard in LangSmith that tracks drift, edit rate by query type, and citation accuracy. Monthly model reviews use this data to refine prompts and update retrieval configuration.
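The draft-versus-sent delta and the edit rate by query type can be approximated with the standard library alone. This is a rough sketch under stated assumptions: it uses `difflib.SequenceMatcher` as the similarity measure, the log format `(query_type, draft, final)` is hypothetical, and it does not touch LangSmith or reflect how the actual dashboard computes its metrics.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, Iterable, Tuple


def edit_delta(draft: str, final: str) -> float:
    """Return 0.0 when the sent reply equals the draft, approaching 1.0 as they diverge."""
    return 1.0 - SequenceMatcher(None, draft, final).ratio()


def edit_rate_by_type(log: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Share of replies per query type where the agent changed the draft.

    `log` yields (query_type, draft, final) tuples.
    """
    totals: Dict[str, int] = defaultdict(int)
    edited: Dict[str, int] = defaultdict(int)
    for query_type, draft, final in log:
        totals[query_type] += 1
        if draft != final:
            edited[query_type] += 1
    return {qt: edited[qt] / totals[qt] for qt in totals}
```

A query type whose edit rate climbs over time is a candidate for prompt or retrieval changes at the next review.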

Results

What We Delivered

68%

Reduction in Resolution Time

Average ticket resolution dropped from 6.2 hours to under 2 hours within 8 weeks of full rollout.

$380K

Annual Cost Saving

Eliminated the need for 4 planned support hires as the existing team absorbed increased ticket volume with AI assistance.

94%

Agent Adoption Rate

Near-total voluntary adoption within 6 weeks — agents reported the tool as the most useful software introduced in two years.
