Retrieval-augmented generation, the way it should be done.
AI assistants that answer from your own documents, policies and data. Grounded in the source, with citations. Built, hosted and operated by WiseSolutions in London for UK teams that need evidence, not guesses.
Section 01 · What it means
A general-purpose AI model does not know your prices, your contracts, your internal policies or last week’s board minutes. Ask it about those and you get a fluent guess. That is the wrong failure mode for a London firm answering client questions, a UK clinic checking procedures, or a lettings team explaining a policy to landlords.
A RAG system fixes this by giving the model a librarian. When a question arrives, the system first searches a private index of your material, retrieves the passages that matter, and only then asks the model to compose an answer using those passages. Every answer can carry a citation back to the source paragraph, file or policy version.
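A minimal sketch of that search-then-compose flow, in Python. Word overlap stands in for the embeddings and vector database a real system would use. The `Passage`, `retrieve` and `build_prompt` names, the sample handbook text and the prompt wording are illustrative assumptions, not WiseSolutions code.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical citation label: file name and paragraph
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, index: list[Passage], top_k: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the question and keep the best few."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p.text)), p) for p in index]
    scored = [item for item in scored if item[0] > 0]  # drop passages with no overlap
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(question: str, passages: list[Passage]) -> str:
    """Compose the model prompt from retrieved passages, each tagged with its source."""
    context = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the numbered passages and cite them like [1].\n"
        "If the passages do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


index = [
    Passage("handbook.pdf, para 12", "Annual leave requests need five working days notice."),
    Passage("handbook.pdf, para 40", "Expenses are reimbursed within 30 days of submission."),
    Passage("it-policy.pdf, para 3", "Laptops must be encrypted before leaving the office."),
]
question = "How much notice do annual leave requests need?"
passages = retrieve(question, index)
prompt = build_prompt(question, passages)
```

Because each passage enters the prompt with its source label, an answer that cites [1] can be resolved back to the paragraph it came from.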
Under the surface, that means embeddings, semantic search, hybrid retrieval, a vector database, re-ranking, a chunking strategy and a grounding policy. WiseSolutions handles that plumbing so the end user sees a plain answer and a source link.
The result is an assistant that sounds useful because it has done the reading. For regulated industries, that is the whole point. UK teams do not need a chatbot that improvises. They need a system that can say: here is the answer, here is the citation, and here is where the source stops.
Three things RAG always needs
Get any of these wrong and the system either hallucinates or refuses to answer simple questions. We have seen both.
Section 02 · When it’s the right answer
01
Prices, policies, contracts, fixtures, regulation. Anything you would have to retrain a model for, you can just re-index instead.
02
Compliance, legal, healthcare, finance. The answer has to be traceable back to a paragraph in a document. Every time.
03
A 600-page handbook, three years of meeting notes, a property portfolio. Retrieval surfaces the few paragraphs that matter for each question.
04
Refunds, lawsuits, regulator letters, reputational damage. Grounded answers with refusals beat fluent guesses every time.
And when RAG is not the right answer: when you just want a chatbot that sounds friendly, when your data is small enough to fit in a single prompt, or when what you actually need is a tone or behaviour change. Those are different problems. WiseSolutions will tell you before we build the wrong thing.
Section 03 · RAG vs alternatives
RAG versus fine-tuning is the first decision. Fine-tuning can teach an AI model a pattern: a classification style, a format, a tone, a repeated decision boundary. It is not the cleanest way to keep a policy library current. If the office handbook changes on Friday, we would rather re-index the source than retrain behaviour into a model and hope everyone remembers which version is live.
RAG versus long-context is the second decision. Long-context models are useful when the user needs to reason over a large pack of material in one sitting. They are less useful when a business needs repeatable answers over thousands of changing files. A retrieval layer gives us control over chunking strategy, source filters, hybrid retrieval, re-ranking and citations. It also keeps running cost predictable.
RAG versus traditional search is the third decision. Search returns documents. A grounded assistant returns an answer, explains its basis and links back to the evidence. For many London and UK teams, the winning setup is not pure semantic search or pure keyword search. It is hybrid retrieval, where exact names, dates and policy codes still matter alongside meaning.
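One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion. The sketch below assumes that technique for illustration; the text above does not say which fusion method is used, and the document IDs and the constant `k = 60` are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Keyword search finds the exact policy code; semantic search finds related guidance.
keyword_ranking = ["policy-FEE-07", "policy-FEE-02", "guide-deposits"]
semantic_ranking = ["guide-tenant-fees", "policy-FEE-07", "guide-deposits"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
```

A document that appears in both lists outranks one that tops only a single list, so an exact policy code and a plain-language match reinforce each other.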
We also check governance before architecture. The ICO guidance on AI and data protection, the UK government introduction to AI assurance, and the OWASP Top 10 for LLM Applications point to the same need: know what data is used, test the system, document the risk and keep humans accountable.
How we decide
Most production systems combine two of these. The mistake is pretending one pattern solves every knowledge problem.
Section 04 · How we build one
Phase 01
Where the material actually lives. Drive, SharePoint, Notion, Airtable, a folder of PDFs. We index every format and flag the documents that need cleaning before they are useful.
Phase 02
Chunking, embeddings, metadata and the search layer. We choose the index strategy that fits the corpus, not the other way around. Long-form policy reads differently from a flat product catalogue.
Phase 03
How the assistant answers when retrieval is confident, weak, or empty. Refuse, defer, escalate to a human. Citations on every answer. The wrong policy here is where most failed RAG projects break.
Phase 04
A web widget, a team chat bot, a WhatsApp number, an internal admin tool. Then logging, an evaluation suite, an alert when answer quality drifts, and a re-index cadence that matches how often the source changes.
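The Phase 03 answer policy can be sketched as a small decision function. The thresholds, the 0-to-1 score scale and the `Action` names are assumptions for illustration; a real policy would be tuned against the corpus and agreed with the client.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer with citations"
    ESCALATE = "defer and escalate to a human"
    REFUSE = "refuse and say the source does not cover it"


# Hypothetical thresholds on a 0-to-1 relevance score; tune per corpus.
CONFIDENT = 0.75
WEAK = 0.40


def decide(scores: list[float]) -> Action:
    """Map the retrieval scores for one question to one of three behaviours."""
    if not scores:
        return Action.REFUSE      # retrieval came back empty
    best = max(scores)
    if best >= CONFIDENT:
        return Action.ANSWER      # strong evidence: answer and cite
    if best >= WEAK:
        return Action.ESCALATE    # weak evidence: hand off rather than guess
    return Action.REFUSE          # nothing relevant enough to ground an answer
```

For example, `decide([0.9, 0.2])` answers, `decide([0.5])` escalates and `decide([])` refuses.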
Section 05 · What we’ve shipped
Regulated services sector · UK · anonymised
A property services firm needed a public-facing tool that could answer tenant and landlord questions about the UK Renters’ Rights Act 2025. We indexed the Act and supporting guidance, wired the assistant to refuse anything outside scope, and gated access with an email step that quietly logged every question into a database for the team to review.
Professional services firm · UK · anonymised
A firm with offices in three UK cities needed staff to stop asking the same compliance questions in Slack. We indexed the firm’s policy library and built an internal assistant that answers in plain language with a citation to the exact paragraph. When the answer is not in the policy, it refuses and tells the staff member who to ask.
Lettings agency · UK · anonymised
A lettings agency was answering the same questions across email, WhatsApp and internal notes: what applies to this property, which fee is allowed, what changed after the latest policy update. We designed a property knowledge base that ties each answer to portfolio notes, tenancy guidance and approved operating procedures, with semantic search for plain-language questions and keyword filters for property codes.
Names redacted out of respect for our clients. We can share full details on a call.
Further reading: the RAG mistake everyone makes in 2026 · how we test models against a knowledge base before shipping.
Section 06 · What it costs
Proof of concept
from £6,000
Production system
from £15,000
Managed operations
from £400/month
Model usage and hosting are passed through at cost.
Section 07 · Production detail
The demo version of RAG is usually simple: upload documents, ask questions, get impressive answers. The production version is different. It has to know which documents are approved, which users can see which material, how long logs are kept, how answers are reviewed, and who owns a correction when the system finds weak evidence.
WiseSolutions starts that work before the first index is built. We ask where the source of truth lives, which documents should never be indexed, which teams need separate access, and which questions the assistant must refuse. A UK business does not need a clever demo if the operating rules are missing.
The hardest production decision is usually not the vector database. It is document ownership. If HR owns one policy, compliance owns another and operations keeps a working copy in Drive, the assistant needs a rule for conflicts. Otherwise the system will confidently cite the wrong version.
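One possible conflict rule, sketched in Python: each topic has a single designated owner, only that owner's approved copies count, and the most recently updated one wins. The `TOPIC_OWNER` mapping, the `PolicyCopy` fields and the file locations are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PolicyCopy:
    topic: str
    owner: str       # team that maintains this copy
    approved: bool   # False for drafts and working copies
    updated: date
    location: str


# Hypothetical ownership map: exactly one team is the source of truth per topic.
TOPIC_OWNER = {"leave": "hr", "gifts": "compliance"}


def authoritative(topic: str, copies: list[PolicyCopy]) -> Optional[PolicyCopy]:
    """Return the newest approved copy held by the topic's owner, or None."""
    owner = TOPIC_OWNER.get(topic)
    candidates = [
        c for c in copies
        if c.topic == topic and c.owner == owner and c.approved
    ]
    return max(candidates, key=lambda c: c.updated, default=None)


copies = [
    PolicyCopy("leave", "hr", True, date(2024, 3, 1), "SharePoint/HR/leave-v3.pdf"),
    PolicyCopy("leave", "hr", True, date(2025, 1, 15), "SharePoint/HR/leave-v4.pdf"),
    PolicyCopy("leave", "operations", False, date(2025, 2, 2), "Drive/ops/leave-working-copy.docx"),
]
chosen = authoritative("leave", copies)
```

Here the operations working copy is newer but is never cited, and a topic with no approved owner copy returns `None`, which the assistant can treat as a refusal.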
We also design the review loop. A grounded answer can still be incomplete if the right document was missing, the chunk was too small, the metadata was wrong or the user asked in a way the retrieval layer did not expect. The system should expose those misses instead of hiding them behind a polished paragraph.
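Chunk size is one of the misses named above that is easy to make concrete. The sketch below assumes fixed-size word windows with overlap, which is only one of several chunking strategies; long-form policy is often better split on headings. The `Chunk` fields and default sizes are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    position: int  # order of the chunk within its document
    text: str


def chunk_words(doc_id: str, text: str, size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[Chunk] = []
    for position, start in enumerate(range(0, len(words), step)):
        chunks.append(Chunk(doc_id, position, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words("handbook", sample, size=5, overlap=2)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of a slightly larger index.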
Production questions
The answers to these questions shape the architecture more than any vendor choice.
Section 08 · Quality control
01
We collect real questions from staff, support tickets, sales calls and documents. Synthetic tests help, but live wording tells us how people actually ask.
02
For each question, we define which passages should be retrieved. If the right evidence does not appear, the answer should not be trusted.
03
We test questions the assistant should not answer, including missing policies, personal data requests, legal judgement and outdated documents.
04
We rerun the same questions after document updates, embedding changes and retrieval tuning so the team can see when quality moves.
This is where many RAG projects fail. The interface looks finished, but nobody has measured whether the right passages are being retrieved. We treat retrieval quality as the product.
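Measuring retrieval separately from answer quality can be as simple as the check below. It is a sketch under assumptions: `retrieve` is any function that returns ranked passage IDs, the question and passage IDs are placeholders, and the metric (every expected passage must appear in the top `k`) is one reasonable choice among several.

```python
from typing import Callable


def evaluate_retrieval(
    cases: list[tuple[str, set[str]]],
    retrieve: Callable[[str], list[str]],
    k: int = 3,
) -> tuple[float, list[str]]:
    """Return the share of questions whose expected passages all appear in the top k, plus the misses."""
    if not cases:
        raise ValueError("need at least one test case")
    misses: list[str] = []
    for question, expected in cases:
        top_k = set(retrieve(question)[:k])
        if not expected <= top_k:
            misses.append(question)
    return 1 - len(misses) / len(cases), misses


# Placeholder retriever: canned results standing in for the real search layer.
fake_results = {
    "q1": ["p1", "p9"],
    "q2": ["p4", "p2"],
    "q3": ["p7"],
    "q4": ["p5", "p6", "p8"],
}
cases = [("q1", {"p1"}), ("q2", {"p2"}), ("q3", {"p3"}), ("q4", {"p5", "p6"})]

hit_rate, misses = evaluate_retrieval(cases, lambda q: fake_results.get(q, []))
```

Rerunning the same cases after a document update or an embedding change and comparing `hit_rate` with the previous run shows whether quality moved, and `misses` names the questions to review first.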
Section 09 · Data handling
A RAG system is only as clean as the material it reads. PDFs can hide tables, scans can lose footnotes, folders can contain drafts, and old policies can sit beside current versions. We inspect that mess directly instead of pretending the index will sort it out.
The ingestion process usually needs document type detection, OCR for scanned files, metadata extraction, chunking rules, deduplication and access filters. For a London office with ten folders, that can be simple. For a UK services group with years of operational files, it becomes a controlled pipeline.
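The deduplication step, for instance, can start from content fingerprints. This sketch assumes exact duplicates after whitespace and case normalisation; near-duplicates such as lightly edited drafts need a fuzzier method. The file paths and the keep-the-first-path rule are illustrative.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(docs: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Keep one copy per fingerprint; report each dropped path and the path it duplicates."""
    first_seen: dict[str, str] = {}   # fingerprint -> path that was kept
    kept: dict[str, str] = {}
    duplicates: dict[str, str] = {}   # dropped path -> kept path
    for path in sorted(docs):         # sorted so the result does not depend on input order
        fp = fingerprint(docs[path])
        if fp in first_seen:
            duplicates[path] = first_seen[fp]
        else:
            first_seen[fp] = path
            kept[path] = docs[path]
    return kept, duplicates


docs = {
    "b/policy.txt": "Fees  are capped.\n",
    "a/policy.txt": "fees are capped.",
    "a/other.txt": "Deposits are protected.",
}
kept, duplicates = deduplicate(docs)
```

Reporting the duplicates rather than silently dropping them gives the document owners a list to clean up at the source.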
We also decide how citations should look. A staff assistant may only need a file name and paragraph. A client-facing assistant may need a source title, date, version and a short explanation of why that source was used. The citation is part of the trust mechanism.
If the source changes often, we build the update path into the system. That may be a nightly re-index, a webhook from the document store, a manual approval queue or a staged release where new documents are tested before the public assistant sees them.
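Whichever trigger is used, the update path needs to know what actually changed. A minimal sketch, assuming the index stores a content hash per document ID; the `plan_reindex` name and the sample documents are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored hashes with the current documents and list what to add, update or delete."""
    current_hashes = {doc_id: content_hash(text) for doc_id, text in current.items()}
    return {
        "add": sorted(d for d in current_hashes if d not in indexed),
        "update": sorted(
            d for d in current_hashes if d in indexed and indexed[d] != current_hashes[d]
        ),
        "delete": sorted(d for d in indexed if d not in current_hashes),
    }


# Hashes recorded at the last index run.
indexed = {
    "fees.md": content_hash("Holding deposits are capped at one week of rent."),
    "pets.md": content_hash("Pets need written consent."),
    "old-notice.md": content_hash("Superseded notice template."),
}
# Documents as they stand today.
current = {
    "fees.md": "Holding deposits are capped at one week of rent.",
    "pets.md": "Pets need written consent, which cannot be unreasonably refused.",
    "renewals.md": "Renewal terms are reviewed annually.",
}
plan = plan_reindex(indexed, current)
```

Only the changed and new documents are re-embedded, and deleted sources are removed from the index so the assistant cannot cite them.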
Ingestion checks
The better the source pipeline, the less pressure we put on the model to guess.
Section 10 · Governance
We do not want a private knowledge system to feel mysterious. The team should know which sources are indexed, how many documents failed ingestion, which questions are being refused, and where users are still asking a person because the assistant is not good enough yet.
For GDPR-sensitive work, we document personal data flows, retention, access controls and deletion paths. That is practical governance, not paperwork for its own sake. It tells the business what the assistant is allowed to know and how that knowledge can be removed.
We prefer dashboards that show a few useful numbers: total questions, answered questions, refused questions, low-confidence retrievals, document updates and reviewed failures. If those numbers are moving in the wrong direction, the system needs attention.
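Several of those numbers can come straight from the question log. The sketch below assumes each log entry records an `outcome` and the best retrieval score as `top_score`; both field names and the low-confidence threshold are illustrative.

```python
from collections import Counter

LOW_CONFIDENCE = 0.40  # hypothetical threshold on a 0-to-1 retrieval score


def summarise(log: list[dict]) -> dict[str, int]:
    """Count total, answered, refused and low-confidence questions from a log."""
    outcomes = Counter(entry["outcome"] for entry in log)
    return {
        "total_questions": len(log),
        "answered": outcomes["answered"],
        "refused": outcomes["refused"],
        "low_confidence": sum(1 for entry in log if entry["top_score"] < LOW_CONFIDENCE),
    }


log = [
    {"outcome": "answered", "top_score": 0.82},
    {"outcome": "answered", "top_score": 0.35},
    {"outcome": "refused", "top_score": 0.10},
    {"outcome": "refused", "top_score": 0.00},
    {"outcome": "answered", "top_score": 0.91},
]
summary = summarise(log)
```

An answered question with a low retrieval score, like the second entry here, is the kind of case worth pulling into a reviewed-failures queue.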
The goal is a knowledge system that earns trust slowly. It answers well, cites clearly, refuses when the source is weak and gives the client a way to improve it month by month.
Monthly review
RAG improves when real usage is reviewed. It decays when nobody looks.
Section 11 · Questions we get asked
A RAG system is an AI assistant that answers questions using your own documents, policies and data instead of generic internet knowledge. We index the source material, the AI model looks up the most relevant passages at query time, and the answer is grounded in those passages with citations back to the originals.
RAG wins when your information changes often (policies, prices, contracts), when you need citations back to the source, when the dataset is too large to fit in a prompt, or when the cost of a wrong answer is high. Fine-tuning makes more sense when you need a model to adopt a tone or behaviour. The two are not mutually exclusive.
You choose. We can run the document store, vector database, logs and workflow layer on UK or EU infrastructure. For sensitive content we default to a setup where source documents stay inside your environment or a UK-hosted environment WiseSolutions operates.
A focused proof of concept on a single document corpus usually takes four to six weeks. A production system with access controls, audit logs and a user-facing surface (web, Slack, WhatsApp) typically takes eight to twelve weeks.
Hosting and model usage for a small-team RAG system typically lands between £200 and £800 per month, depending on volume and which model is used. We size and quote this upfront so there are no surprises.
By default, no. We configure the system to refuse, or to clearly mark answers that go beyond the source. You decide the policy. For regulated industries we usually lock it to source-only answers with citations.
RAG changes what the AI model can look up. Fine-tuning changes how a model behaves. If the question is about current policies, contract clauses, price lists or internal procedures, RAG is usually the first answer. If the issue is tone, formatting or a repeated classification style, fine-tuning may be useful later.
Yes, but quality varies. Native PDFs are usually straightforward. Scanned PDFs and images need OCR, layout detection and a review step because tables, stamps and handwritten notes can be misread. We flag low-confidence documents instead of pretending every page indexed cleanly.
Before launch we build a test set of real questions, expected source passages, refusal cases and edge cases, and measure retrieval accuracy, citation quality, answer usefulness and refusal behaviour against it. After launch we also track support-ticket reduction.
Yes. We choose the vector database around the corpus, compliance needs and operating budget. Pinecone and Weaviate are both valid choices. For some UK businesses, Postgres with vector search is simpler and cheaper to operate.
We set a re-index cadence that matches the source. Some systems re-index nightly. Others watch folders or databases and update within minutes. Each answer can show which source version it used so old material is easier to detect.
Yes, if it is designed that way. We map personal data, define lawful basis and retention, restrict access, log usage, support deletion, and run a data protection impact assessment when the risk profile calls for one.
Enquiry
Tell us what you are trying to ground an assistant in. We will reply within one working day with first questions or a discovery call slot.
A 30-minute call.
We’ll tell you whether we’re the right fit and what a project would look like.