Folio · Case No. 03 UK anonymised
Grounded Q and A System for a UK Regulated Services Firm
How we built a grounded Q and A assistant that answers only from approved source text, refuses unsupported questions and logs every interaction for review.
A UK regulated services firm was answering the same set of questions by hand every week. Most of the questions were predictable: eligibility, timelines, rights, obligations, documents, next steps and edge cases that sounded urgent to the person asking. The staff knew the answers, but they also knew the risk. A quick unsupported answer could contradict the firm’s approved guidance or drift away from current regulation.
The client did not need a chatbot that sounded clever. They needed a Q and A system that could stay inside the evidence. If the answer was present in approved source text, the system should give a clear response and cite the basis for it internally. If the answer was not present, it should refuse gracefully and route the person to a human. That boundary was the whole project.
The problem
The firm had roughly forty recurring questions that created a steady admin burden. None of them justified a long advisory call on their own, but each one required care. Staff were copying fragments from guidance notes, editing them for tone, checking whether the question had a regulatory angle, and then deciding whether to answer or ask for more information.
The risk was not speed. The risk was confidence without support. In a regulated sector, a fluent wrong answer is worse than no answer at all. The client had already seen generic AI tools produce plausible wording that missed the qualifying conditions. That made them rightly sceptical. They wanted automation only if the system could be constrained.
The source material was also small enough to be dangerous. The first version contained about 18,000 characters of approved text, drawn from the live public guidance and the firm's internal explanations. That is enough to answer common questions, but not enough to improvise. The system had to know when it did not know.
The approach
We began by turning the source material into a controlled knowledge base. We removed duplicate wording, split the content into answerable sections, and gave each section a plain-English label. We also marked the boundaries: what the system could explain, what it should not advise on, and which questions should always pass to a human.
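A minimal sketch of what a controlled knowledge base of this kind could look like, assuming a simple in-memory structure. The names `Section`, `Scope`, `dedupe` and `answerable`, and the three boundary values, are our illustrative assumptions and not the production schema.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    EXPLAIN = "explain"        # the system may explain this from approved text
    NO_ADVICE = "no_advice"    # recognised topic, but the system must not advise
    HUMAN_ONLY = "human_only"  # always passed to a member of staff


@dataclass(frozen=True)
class Section:
    section_id: str
    label: str   # plain-English label for the section
    text: str    # approved wording
    scope: Scope


def dedupe(sections):
    """Drop sections whose wording repeats an earlier one (case and spacing ignored)."""
    seen = set()
    unique = []
    for section in sections:
        key = " ".join(section.text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(section)
    return unique


def answerable(sections):
    """Only sections marked EXPLAIN are ever offered to the answer step."""
    return [s for s in sections if s.scope is Scope.EXPLAIN]
```

Keeping the boundary on the section itself means the scope decision travels with the approved wording rather than living in a separate rule list.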
The AI orchestrator was designed around refusal, not only response. Every user question goes through a relevance check before an answer is attempted. If the question is outside the approved scope, the system says so clearly and offers the correct contact route. If the question is inside scope but needs personal facts the user has not provided, the system asks a narrow follow-up question instead of guessing.
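The three outcomes described above can be sketched as a small triage function. This is a simplified illustration: the keyword-based relevance check, the `TOPICS` table and the fact names are hypothetical stand-ins for whatever classifier a real deployment would use.

```python
import re
from enum import Enum


class Triage(Enum):
    OUT_OF_SCOPE = "out_of_scope"        # refuse and offer the contact route
    NEEDS_FOLLOW_UP = "needs_follow_up"  # ask one narrow question, do not guess
    ATTEMPT_ANSWER = "attempt_answer"    # hand over to the grounded answer step


# Hypothetical topic table: trigger keywords plus the personal facts each topic needs.
TOPICS = {
    "eligibility": {"keywords": {"eligible", "eligibility", "qualify"},
                    "required_facts": {"age"}},
    "timelines": {"keywords": {"deadline", "timeline", "timescale"},
                  "required_facts": set()},
}


def triage(question, known_facts):
    """Return (decision, topic, missing_facts) for a user question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for topic, rule in TOPICS.items():
        if words & rule["keywords"]:
            missing = rule["required_facts"] - set(known_facts)
            if missing:
                return Triage.NEEDS_FOLLOW_UP, topic, sorted(missing)
            return Triage.ATTEMPT_ANSWER, topic, []
    # Nothing matched an approved topic, so refusal is the default path.
    return Triage.OUT_OF_SCOPE, None, []
```

The point of the shape is that refusal is the fall-through case: a question has to earn its way into the answer step.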
The answer step is grounded in the approved source text. The system does not answer from general knowledge, memory or internet assumptions. It retrieves the relevant passages, checks whether they are sufficient, and writes a short answer in the firm’s tone. Where the evidence is thin, the answer is withheld and the interaction is routed for review.
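To illustrate the sufficiency check, here is a deliberately simple sketch that scores sections by word overlap and withholds an answer when coverage is low. The overlap measure, the stopword list and the `min_coverage` threshold are our assumptions for the example; a production system would likely use a stronger retrieval method.

```python
import re

# Small illustrative stopword list, not a complete one.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "for", "do",
             "i", "my", "you", "what", "how", "and", "in"}


def tokens(text):
    """Lower-cased content words of a text, with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def retrieve(question, sections, min_coverage=0.5):
    """Return (section_id, coverage) for the best match, or None if evidence is thin.

    `sections` maps a section id to its approved text. Coverage is the share of
    the question's content words that appear in the section.
    """
    q = tokens(question)
    if not q:
        return None
    best_id, best_cov = None, 0.0
    for section_id, text in sections.items():
        cov = len(q & tokens(text)) / len(q)
        if cov > best_cov:
            best_id, best_cov = section_id, cov
    if best_cov < min_coverage:
        return None  # withhold the answer and route for review
    return best_id, best_cov
```

Returning `None` rather than the weak best match is what keeps thin evidence from turning into a fluent answer.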
We also built an email gate because the client wanted every unresolved or high-intent question captured. If a person asks something the system cannot answer, or if the question suggests they need case-specific help, the widget collects an email address and passes the full conversation to the team. That means a refusal is not a dead end. It becomes a better handover.
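A sketch of the email gate as a pair of small functions, assuming the decision arrives as a string and the conversation as a list of messages. The `Handover` structure, the reason labels and the loose email pattern are hypothetical choices for illustration.

```python
import re
from dataclasses import dataclass

# Deliberately loose shape check; it does not verify that the mailbox exists.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class Handover:
    email: str
    reason: str        # e.g. "refused" or "case_specific"
    transcript: list   # the full conversation, in order


def needs_email_gate(decision, case_specific):
    """Gate when the system refused, or when the question needs case-specific help."""
    return decision == "refused" or case_specific


def build_handover(email, reason, transcript):
    """Validate the address and package the conversation for the team."""
    email = email.strip()
    if not EMAIL_RE.match(email):
        raise ValueError("a valid email address is needed before handover")
    return Handover(email=email, reason=reason, transcript=list(transcript))
```

Copying the transcript into the handover is what lets staff pick up the conversation without asking the person to repeat themselves.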
What the system actually does
The web widget accepts a natural-language question and sends it to the AI orchestrator. The orchestrator classifies the question, retrieves approved source sections, decides whether the evidence is strong enough, and then either answers, asks a follow-up, or refuses with a human contact path. Every interaction is stored in an audit log with the question, decision, answer type and handover status.
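The audit record can be pictured as one JSON line per interaction. This sketch assumes a JSON Lines log and adds a timestamp and a list of section ids; those two fields, the function name and the example values are our assumptions, while the question, decision, answer type and handover status come from the description above.

```python
import json
from datetime import datetime, timezone


def log_interaction(stream, question, decision, answer_type, handed_over,
                    section_ids=()):
    """Append one audit record to `stream` as a JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # assumed field
        "question": question,
        "decision": decision,                # e.g. "answer", "follow_up", "refuse"
        "answer_type": answer_type,
        "handed_over": handed_over,
        "section_ids": list(section_ids),    # assumed field: sources used
    }
    stream.write(json.dumps(record) + "\n")
    return record
```

Any writable text stream works here, for example a file opened in append mode or an `io.StringIO` buffer during testing.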
The team reviews the log weekly. They can see which questions were answered, which were refused, and which source sections were used most often. That review loop is important because it turns real user questions into maintenance work. When the same refused question appears repeatedly, the team can decide whether to add approved wording or keep it out of scope.
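As an illustration of the weekly review, the following sketch summarises a batch of log records. It assumes each record is a dictionary with `question`, `decision` and an optional `section_ids` list; the field names, the `"refuse"` label and the repeat threshold are hypothetical.

```python
from collections import Counter


def weekly_summary(records, repeat_threshold=2):
    """Count decisions, rank source sections, and surface repeated refusals."""
    decisions = Counter(r["decision"] for r in records)
    section_use = Counter(
        sid for r in records for sid in r.get("section_ids", [])
    )
    # Normalise case and spacing so near-identical refused questions group together.
    refused = Counter(
        " ".join(r["question"].lower().split())
        for r in records
        if r["decision"] == "refuse"
    )
    repeated = [q for q, n in refused.most_common() if n >= repeat_threshold]
    return {
        "decisions": dict(decisions),
        "top_sections": section_use.most_common(),
        "repeated_refusals": repeated,
    }
```

The `repeated_refusals` list is the maintenance queue: each entry is a candidate for new approved wording or an explicit decision to keep it out of scope.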
The system is deliberately modest. It does not pretend to be an adviser. It does not produce long speculative answers. It handles the common explanatory layer, captures the next step, and gives staff a clear record when human review is needed.
That modesty also made adoption easier. Staff could test the widget against questions they already knew, compare the answer with the approved guidance, and see why the system refused borderline cases. The audit log changed the conversation from "trust us" to "inspect the evidence". For a regulated team, that difference mattered.
The outcome
About eighteen thousand characters of approved source text now answer the majority of routine questions. Across six months of logged usage, 82 percent of frequently asked questions were resolved without staff intervention. The client recorded zero hallucinations over that period, which we attribute to unsupported questions being refused rather than padded out with invented detail.
The anonymised client quote captured the value well: “The useful part is not that it answers everything. It is that it knows when to stop, and our team can see exactly what happened.”
What we would do differently next time
We would build the editorial workflow earlier. The first version worked because the source material was small and carefully prepared, but the client quickly discovered new questions from real users. Updating the knowledge base was easy enough for us, but the client's internal approval process for new wording should have been part of the first delivery.
We would also separate public guidance from internal guidance more explicitly. Some answers were safe to show publicly, while others were useful for staff but not appropriate for an automated web response. In the second version, we would tag each source section by audience from day one.
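Audience tagging of this kind could be as small as one extra field and one filter. The sketch below is an assumption about how a second version might look; `Audience`, `TaggedSection` and `visible_to` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    PUBLIC = "public"  # safe to show in the automated web response
    STAFF = "staff"    # useful internally, never shown by the widget


@dataclass(frozen=True)
class TaggedSection:
    section_id: str
    text: str
    audience: Audience


def visible_to(sections, channel):
    """Public channels see only PUBLIC sections; staff tools see everything."""
    if channel is Audience.STAFF:
        return list(sections)
    return [s for s in sections if s.audience is Audience.PUBLIC]
```

Filtering before retrieval, rather than after the answer is written, means staff-only wording never reaches the answer step for a public question.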
The main lesson is that grounded systems need a narrower ambition than most people expect. The win is not a chatbot that can talk about anything. The win is a reliable assistant that answers a known set of questions, refuses the rest, and gives the human team a better starting point.
For regulated sectors, that restraint is the product.