Enterprise SaaS Platform
Key Outcome
68% reduction in ticket resolution time — $380K annual support cost saved
The Challenge
The client's 14-person support team was handling over 4,000 tickets per month. Average resolution time was 6.2 hours, the first-response SLA was being missed 31% of the time, and agent burnout was measurably affecting quality. Off-the-shelf chatbots had been tried and had failed because they couldn't handle the product's technical complexity.
Our Approach
We audited the existing knowledge base, identified structural gaps, and rebuilt the retrieval architecture from the ground up. Rather than replacing agents, we designed the system to augment them — surfacing the right context at the right moment so every agent operates at the level of the best agent on the team. We ran a 3-week parallel evaluation before full rollout, iterating on chunking strategy and prompt design based on real agent feedback.
Tech Stack
System Architecture
Build Pipeline
Ingested 12,000+ documentation pages, support history, and product specs. Applied semantic chunking with 512-token windows and 20% overlap. Each chunk was tagged with product area, version, and confidence metadata to enable filtered retrieval.
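The windowing arithmetic in this step can be sketched as follows. This is a simplified illustration, not the production chunker: it slides a fixed window over a pre-tokenised list instead of finding semantic boundaries, and the names `Chunk` and `chunk_tokens` and the metadata keys are hypothetical. With a 512-token window and 20% overlap, each window starts 410 tokens after the previous one and shares 102 tokens with it.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    tokens: list[str]
    # Hypothetical tags, e.g. product area and version, used for filtered retrieval.
    metadata: dict[str, str] = field(default_factory=dict)


def chunk_tokens(tokens, window=512, overlap=0.2, metadata=None):
    """Split a token list into overlapping fixed-size windows.

    Assumption: the input is already tokenised; real semantic chunking
    would also respect sentence or section boundaries.
    """
    if window <= 0 or not 0 <= overlap < 1:
        raise ValueError("window must be positive and overlap in [0, 1)")
    # Tokens shared between neighbours: int(512 * 0.2) == 102, so stride == 410.
    stride = max(1, window - int(window * overlap))
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(Chunk(tokens[start:start + window], dict(metadata or {})))
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks
```

A larger overlap reduces the chance that an answer is split across a chunk boundary, at the cost of more vectors to store and search.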
Embedded all chunks using Gemini 3.1 embeddings and stored in Pinecone with a hybrid search configuration (dense + BM25 sparse vectors). Namespace separation by product line allowed targeted retrieval without cross-contamination of unrelated knowledge.
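One common way to combine dense and sparse signals is a weighted sum of the two scores. The sketch below shows that idea in plain Python; it is not the Pinecone API and not necessarily the weighting used in this project. The function name `hybrid_rank`, the `alpha` parameter, and the assumption that both score sets are already normalised to a comparable range are all illustrative.

```python
def hybrid_rank(dense_scores, sparse_scores, alpha=0.5, top_k=6):
    """Fuse dense and sparse relevance scores with a convex combination.

    dense_scores / sparse_scores: dicts mapping chunk id -> score.
    alpha=1.0 uses only dense similarity, alpha=0.0 only sparse (BM25-style).
    Assumption: scores are normalised so the two scales are comparable.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    ids = set(dense_scores) | set(sparse_scores)
    fused = {
        chunk_id: alpha * dense_scores.get(chunk_id, 0.0)
        + (1.0 - alpha) * sparse_scores.get(chunk_id, 0.0)
        for chunk_id in ids
    }
    # Highest fused score first; ties broken by id for a stable order.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

A chunk that matches an exact error code (strong sparse score) can outrank a chunk that is only topically similar (strong dense score), which is the usual motivation for hybrid retrieval over technical documentation.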
An incoming ticket is first classified by a lightweight Claude Haiku 4.5 call into one of 8 query types (billing, technical, onboarding, escalation, etc.). An orchestration layer exposes the CRM, knowledge base, and billing systems as MCP (Model Context Protocol) servers, so the router can pull account-specific context through standardised tool calls instead of a bespoke integration for each data source.
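The routing step can be pictured as a small function that validates the classifier's label and looks up which context sources to query. This is a hedged sketch: the `classify` callable stands in for the model call, only the four query types named above are listed (the remaining types are not specified here), and the mapping in `CONTEXT_SOURCES` and the fallback to escalation are assumptions for illustration.

```python
from typing import Callable

# Hypothetical mapping from query type to the context sources worth querying.
# Only the four query types named in the text are included.
CONTEXT_SOURCES = {
    "billing": ["crm", "billing"],
    "technical": ["crm", "knowledge_base"],
    "onboarding": ["crm", "knowledge_base"],
    "escalation": ["crm", "knowledge_base", "billing"],
}


def route_ticket(text: str, classify: Callable[[str], str]):
    """Return (query_type, sources) for a ticket.

    classify: any function that maps ticket text to a label, for example
    a thin wrapper around a small language-model call.
    """
    label = classify(text).strip().lower()
    if label not in CONTEXT_SOURCES:
        # Assumption: unrecognised labels are handed to a human-heavy path.
        label = "escalation"
    return label, CONTEXT_SOURCES[label]
```

Validating the label before using it keeps a malformed model output from selecting a source that does not exist.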
The top-k retrieved chunks (k=6, semantically reranked) are injected into a carefully designed system prompt that instructs Claude Opus 4.8 to draft a reply in the client's brand voice. The prompt enforces citation format and instructs the model to flag uncertainty rather than hallucinate.
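As a rough illustration of how retrieved chunks might be injected into a prompt, the sketch below numbers each chunk so the model can cite it. The wording of the instructions, the `[n]` citation format, the dictionary keys `source` and `text`, and the function name `build_system_prompt` are assumptions, not the production prompt.

```python
def build_system_prompt(chunks, brand_voice="friendly and concise", k=6):
    """Assemble a system prompt from the top-k reranked chunks.

    chunks: list of dicts with 'source' and 'text' keys, best match first.
    """
    top = chunks[:k]
    # Number the sources so the draft can reference them as [1], [2], ...
    sources = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(top, start=1)
    )
    return (
        f"You draft support replies in a {brand_voice} voice.\n"
        "Use only the numbered sources below and cite them inline by number.\n"
        "If the sources do not answer the question, say you are unsure "
        "and recommend escalation instead of guessing.\n\n"
        f"Sources:\n{sources}"
    )
```

Keeping the source numbers stable between the prompt and the agent-facing panel lets an agent check each cited claim against the document it came from.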
The draft reply surfaces in a collapsible panel inside the existing Zendesk interface via a custom app built with the Zendesk Apps Framework. Agents see the draft, the source documents it drew from, and a confidence indicator. One click inserts the draft into the reply field — editable before sending.
Every accepted or edited reply is logged with the delta between AI draft and final sent message. We built a lightweight evaluation dashboard in LangSmith that tracks drift, edit rate by query type, and citation accuracy. Monthly model reviews use this data to refine prompts and update retrieval configuration.
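The delta between draft and sent message can be quantified in many ways; one simple option is a word-level similarity ratio from Python's standard `difflib`. The sketch below is an assumption about how such a metric could be computed and aggregated, not the LangSmith dashboard itself, and the names `edit_rate` and `edit_rate_by_type` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from statistics import mean


def edit_rate(draft: str, final: str) -> float:
    """Fraction of the reply that changed, from 0.0 (accepted as-is) to 1.0."""
    draft_words, final_words = draft.split(), final.split()
    if not draft_words and not final_words:
        return 0.0
    matcher = SequenceMatcher(None, draft_words, final_words, autojunk=False)
    return 1.0 - matcher.ratio()


def edit_rate_by_type(records):
    """Average edit rate per query type.

    records: iterable of (query_type, ai_draft, final_sent_message) tuples.
    """
    grouped = defaultdict(list)
    for query_type, draft, final in records:
        grouped[query_type].append(edit_rate(draft, final))
    return {query_type: mean(rates) for query_type, rates in grouped.items()}
```

A query type whose average edit rate climbs over time is a natural candidate for a prompt or retrieval review.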
Results
68%
Reduction in Resolution Time
Average ticket resolution dropped from 6.2 hours to under 2 hours within 8 weeks of full rollout.
$380K
Annual Cost Saving
Eliminated the need for 4 planned support hires as the existing team absorbed increased ticket volume with AI assistance.
94%
Agent Adoption Rate
Near-total voluntary adoption within 6 weeks — agents reported the tool as the most useful software introduced in two years.