Retrieval-augmented generation, the way it should be done.

Private RAG systems for UK businesses.

AI assistants that answer from your own documents, policies and data. Grounded in the source, with citations. Built, hosted and operated by WiseSolutions in London for UK teams that need evidence, not guesses.

Section 01 · What it means

A model that looks things up before it answers.

A general-purpose AI model does not know your prices, your contracts, your internal policies or last week’s board minutes. Ask it about those and you get a fluent guess. That is the wrong failure mode for a London firm answering client questions, a UK clinic checking procedures, or a lettings team explaining a policy to landlords.

A RAG system fixes this by giving the model a librarian. When a question arrives, the system first searches a private index of your material, retrieves the passages that matter, and only then asks the model to compose an answer using those passages. Every answer can carry a citation back to the source paragraph, file or policy version.

Under the surface, that means embeddings, semantic search, hybrid retrieval, a vector database, re-ranking, a chunking strategy and a grounding policy. WiseSolutions handles that plumbing so the end user sees a plain answer and a source link.
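As a rough illustration of the look-up-then-answer flow described above, here is a minimal sketch. It assumes a toy three-passage corpus and uses plain word-overlap cosine similarity in place of a real embedding model. The passage text, source ids and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus of (source_id, passage) pairs.
# A production index would store embeddings in a vector database instead.
PASSAGES = [
    ("handbook.pdf#p12", "Annual leave is 25 days plus bank holidays."),
    ("handbook.pdf#p30", "Expenses must be submitted within 30 days."),
    ("fees.md#s2", "The holding deposit is capped at one week of rent."),
]

def tokens(text):
    """Lowercased word counts, standing in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=2):
    """Return up to k (score, source_id, passage) tuples, best match first."""
    q = tokens(question)
    scored = [(cosine(q, tokens(text)), source, text) for source, text in PASSAGES]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [item for item in scored[:k] if item[0] > 0]

def build_prompt(question, hits):
    """Compose the prompt the model would see: sources first, then the question."""
    context = "\n".join(f"[{source}] {text}" for _, source, text in hits)
    return (
        "Answer using only the sources below and cite the source id.\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Calling retrieve("How many days of annual leave do I get?") and passing the result to build_prompt produces a prompt whose only evidence is the retrieved passages, each tagged with the source id that becomes the citation.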

The result is an assistant that sounds useful because it has done the reading. For regulated industries, that is the whole point. UK teams do not need a chatbot that improvises. They need a system that can say: here is the answer, here is the citation, and here is where the source stops.

Three things RAG always needs

An index of your source material, kept fresh as documents change.
A retrieval step that finds the right passages for a given question.
A grounding policy that tells the model how to behave when the answer is not in the source.

Get any of these wrong and the system either hallucinates or refuses to answer simple questions. We have seen both.

Section 02 · When it’s the right answer

RAG is the right tool when one of these is true.

Your knowledge changes often.

Prices, policies, contracts, fixtures, regulation. Anything you would have to retrain a model for, you can just re-index instead.

You need citations.

Compliance, legal, healthcare, finance. The answer has to be traceable back to a paragraph in a document. Every time.

The source is too big for a prompt.

A 600-page handbook, three years of meeting notes, a property portfolio. Retrieval surfaces the few paragraphs that matter for each question.

A wrong answer is expensive.

Refunds, lawsuits, regulator letters, reputational damage. Grounded answers with refusals beat fluent guesses every time.

RAG is not the right answer when you just want a chatbot that sounds friendly, when your data is small enough to fit in a single prompt, or when what you actually need is a change of tone or behaviour. Those are different problems. WiseSolutions will tell you before we build the wrong thing.

Section 03 · RAG vs alternatives

RAG is not the only pattern. It is the one we use when the source matters.

RAG versus fine-tuning is the first decision. Fine-tuning can teach an AI model a pattern: a classification style, a format, a tone, a repeated decision boundary. It is not the cleanest way to keep a policy library current. If the office handbook changes on Friday, we would rather re-index the source than retrain behaviour into a model and hope everyone remembers which version is live.

RAG versus long-context is the second decision. Long-context models are useful when the user needs to reason over a large pack of material in one sitting. They are less useful when a business needs repeatable answers over thousands of changing files. A retrieval layer gives us control over chunking strategy, source filters, hybrid retrieval, re-ranking and citations. It also keeps running cost predictable.

RAG versus traditional search is the third decision. Search returns documents. A grounded assistant returns an answer, explains its basis and links back to the evidence. For many London and UK teams, the winning setup is not pure semantic search or pure keyword search. It is hybrid retrieval, where exact names, dates and policy codes still matter alongside meaning.
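One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion. The sketch below is an illustration of that general technique, not a description of a specific WiseSolutions build. The document ids are hypothetical, and k=60 is the constant conventionally used with this method.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranked list.

    Each document scores 1 / (k + rank) in every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for one question from two separate searches.
keyword_hits = ["policy-HR-014", "policy-HR-002", "memo-2024-03"]
semantic_hits = ["policy-HR-002", "faq-leave", "policy-HR-014"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Here the two policies found by both searches finish ahead of the documents found by only one, which is the behaviour that lets exact policy codes and meaning both count.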

We also check governance before architecture. The ICO guidance on AI and data protection, the UK government introduction to AI assurance, and the OWASP Top 10 for LLM Applications point to the same need: know what data is used, test the system, document the risk and keep humans accountable.

How we decide

Use RAG when evidence, freshness and citations matter.
Use fine-tuning when the behaviour or format is the product.
Use long-context when the work is a one-off review, not a system that has to run every day.
Use search when people need documents, not answers.

Most production systems combine two of these. The mistake is pretending one pattern solves every knowledge problem.

Section 04 · How we build one

From source documents to a grounded assistant, in four phases.

Phase 01

Source audit

Where the material actually lives. Drive, SharePoint, Notion, Airtable, a folder of PDFs. We index every format and flag the documents that need cleaning before they are useful.

Phase 02

Retrieval design

Chunking, embeddings, metadata and the search layer. We choose the index strategy that fits the corpus, not the other way around. Long-form policy reads differently from a flat product catalogue.

Phase 03

Grounding and policy

How the assistant answers when retrieval is confident, weak, or empty. Refuse, defer, escalate to a human. Citations on every answer. The wrong policy here is where most failed RAG projects break.

Phase 04

Surface and operations

A web widget, a team chat bot, a WhatsApp number, an internal admin tool. Then logging, an evaluation suite, an alert when answer quality drifts, and a re-index cadence that matches how often the source changes.
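The Phase 03 behaviour (confident, weak or empty retrieval) can be written down as a small decision rule. This is a sketch under stated assumptions: the thresholds, the Hit type and the action names are hypothetical, and real thresholds would come from testing against the actual corpus.

```python
from dataclasses import dataclass

# Hypothetical thresholds on a 0-to-1 retrieval score.
CONFIDENT = 0.75
WEAK = 0.40

@dataclass
class Hit:
    source: str   # citation target, e.g. a file and paragraph
    score: float  # retrieval similarity between 0 and 1

def decide(hits):
    """Return (action, citations) for the hits retrieved for one question."""
    if not hits:
        return ("refuse", [])          # empty retrieval: nothing to ground on
    best = max(hit.score for hit in hits)
    usable = [hit.source for hit in hits if hit.score >= WEAK]
    if best >= CONFIDENT:
        return ("answer", usable)      # confident: answer with citations
    if best >= WEAK:
        return ("escalate", usable)    # weak: hand to a human with the evidence
    return ("refuse", [])              # nothing relevant enough to cite
```

Keeping the rule outside the prompt makes it testable: the same refusal cases can be replayed after every index or model change.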

Section 05 · What we’ve shipped

Grounded systems in production, anonymised.

Regulated services sector · UK · anonymised

Public Q&A grounded in 18,000 characters of UK legislation.

A property services firm needed a public-facing tool that could answer tenant and landlord questions about the UK Renters’ Rights Act 2025. We indexed the act and supporting guidance, wired the assistant to refuse anything outside scope, and gated access with an email step that quietly logged every question into a database for the team to review.

Source: full text of the Renters’ Rights Act 2025 plus internal guidance notes.
Surface: branded web page, embedded on the firm’s site.
Operations: every conversation logged, every refusal flagged, weekly review of edge cases.

Professional services firm · UK · anonymised

Internal knowledge assistant over three years of policy documents.

A firm with offices in three UK cities needed staff to stop asking the same compliance questions in Slack. We indexed the firm’s policy library and built an internal assistant that answers in plain language with a citation to the exact paragraph. When the answer is not in the policy, it refuses and tells the staff member who to ask.

Source: 280 internal policy documents across HR, compliance and operations.
Surface: Slack bot for staff, web admin for the compliance lead.
Operations: daily incremental re-index, monthly accuracy review against a held-out test set.

Lettings agency · UK · anonymised

Property knowledge base for a lettings team handling repeat landlord questions.

A lettings agency was answering the same questions across email, WhatsApp and internal notes: what applies to this property, which fee is allowed, what changed after the latest policy update. We designed a property knowledge base that ties each answer to portfolio notes, tenancy guidance and approved operating procedures, with semantic search for plain-language questions and keyword filters for property codes.

Source: property notes, approved scripts, tenancy process documents and update memos.
Surface: internal web assistant for negotiators and property managers.
Operations: document-owner review queue, version tracking and weekly retrieval-quality checks.

Names redacted out of respect for our clients. We can share full details on a call.

Further reading: the RAG mistake everyone makes in 2026 · how we test models against a knowledge base before shipping.

Section 06 · How we quote

Sized to the work, quoted before we build.

Proof of concept

Four to six weeks

One source corpus, up to ~500 documents.
Internal surface (web or Slack).
Citations and refusal policy from day one.
Fixed quote before we start.

Production system

Eight to twelve weeks

Multiple source systems, access controls, audit log.
Public or staff-facing surface, branded.
Evaluation suite and quality alerting.
Fixed quote after discovery.

Managed operations

Rolling monthly

UK or EU hosting on infrastructure we operate.
Re-indexing cadence matched to your sources.
Monthly accuracy review and tuning.
One named point of contact.

Each build is quoted as a fixed figure once we have seen the corpus and the risk. Model usage and hosting are passed through at cost.

Section 07 · Production detail

What changes when a RAG system moves from demo to production.

The demo version of RAG is usually simple: upload documents, ask questions, get impressive answers. The production version is different. It has to know which documents are approved, which users can see which material, how long logs are kept, how answers are reviewed, and who owns a correction when the system finds weak evidence.

WiseSolutions starts that work before the first index is built. We ask where the source of truth lives, which documents should never be indexed, which teams need separate access, and which questions the assistant must refuse. A UK business does not need a clever demo if the operating rules are missing.

The hardest production decision is usually not the vector database. It is document ownership. If HR owns one policy, compliance owns another and operations keeps a working copy in Drive, the assistant needs a rule for conflicts. Otherwise the system will confidently cite the wrong version.
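One possible conflict rule, sketched below, is to index only the copy held by the registered owner of each policy and, among those, the most recent one. The ownership register, field names and example policies are hypothetical. Other rules, such as an explicit approval flag, would be equally valid.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Copy:
    policy_id: str
    held_by: str      # team whose folder this copy came from
    effective: date   # date this version took effect
    location: str

# Hypothetical ownership register: one named owner per policy.
OWNERS = {"expenses": "compliance", "annual-leave": "hr"}

def authoritative(copies):
    """Keep, per policy, the newest copy held by its registered owner."""
    chosen = {}
    for copy in copies:
        if OWNERS.get(copy.policy_id) != copy.held_by:
            continue  # working copies from non-owners are not indexed
        current = chosen.get(copy.policy_id)
        if current is None or copy.effective > current.effective:
            chosen[copy.policy_id] = copy
    return chosen
```

Under this rule a newer working copy in an operations folder never overrides the compliance team's published version, so the assistant cannot cite it.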

We also design the review loop. A grounded answer can still be incomplete if the right document was missing, the chunk was too small, the metadata was wrong or the user asked in a way the retrieval layer did not expect. The system should expose those misses instead of hiding them behind a polished paragraph.

Production questions

Who owns each source? Every policy, folder and dataset needs a named owner.
Who can ask? Internal staff, clients and public users usually need different access rules.
What should be refused? A good refusal policy is part of the product, not an afterthought.
How is quality reviewed? Real questions need to be sampled and checked against source material.
What happens after a bad answer? The team needs a correction path, not a vague promise that the model will improve.

The answers to these questions shape the architecture more than any vendor choice.
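To show how the "who can ask" question turns into architecture, here is a minimal sketch of filtering by audience before retrieval runs. The audience labels and source names are hypothetical, and the sketch assumes each passage is tagged with its permitted audiences at ingestion time.

```python
# Hypothetical audience labels attached to each passage at ingestion time.
PASSAGES = [
    {"source": "public-faq.md", "audiences": {"public", "client", "staff"}},
    {"source": "client-terms.pdf", "audiences": {"client", "staff"}},
    {"source": "hr-grievance.pdf", "audiences": {"staff"}},
]

def visible_to(audience, passages=PASSAGES):
    """Return the sources this audience may search.

    Filtering happens before retrieval, so restricted text
    never reaches the model for this user.
    """
    return [p["source"] for p in passages if audience in p["audiences"]]
```

Filtering before the search, rather than asking the model to withhold restricted material, means an unknown or unauthorised audience simply sees an empty index.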

Section 08 · Quality control

We test retrieval before we trust the answers.

Question set

We collect real questions from staff, support tickets, sales calls and documents. Synthetic tests help, but live wording tells us how people actually ask.

Expected evidence

For each question, we define which passages should be retrieved. If the right evidence does not appear, the answer should not be trusted.

Refusal cases

We test questions the assistant should not answer, including missing policies, personal data requests, legal judgement and outdated documents.

Drift checks

We rerun the same questions after document updates, embedding changes and retrieval tuning so the team can see when quality moves.

This is where many RAG projects fail. The interface looks finished, but nobody has measured whether the right passages are being retrieved. We treat retrieval quality as the product.
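A simple way to measure this is recall at k: for each test question, what fraction of the expected passages appear in the top k retrieved results. The sketch below assumes a hypothetical case format and takes whatever retrieval function the system uses as a parameter. Refusal cases would be checked separately.

```python
def recall_at_k(retrieved, expected, k=5):
    """Fraction of expected passage ids found in the top-k retrieved ids."""
    if not expected:
        raise ValueError("expected must list at least one passage id")
    top = set(retrieved[:k])
    return sum(1 for passage_id in expected if passage_id in top) / len(expected)

def evaluate(cases, retrieve, k=5, threshold=1.0):
    """Score every case and list the questions that fall below the threshold.

    cases: list of {"question": str, "expected": [passage ids]}
    retrieve: function mapping a question to a ranked list of passage ids
    """
    scores, misses = [], []
    for case in cases:
        score = recall_at_k(retrieve(case["question"]), case["expected"], k)
        scores.append(score)
        if score < threshold:
            misses.append(case["question"])
    mean = sum(scores) / len(scores) if scores else 0.0
    return mean, misses
```

Storing the mean score from each run and rerunning the same cases after a document update or embedding change gives the drift check a number to compare against.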

Section 09 · Data handling

The document pipeline matters as much as the assistant.

A RAG system is only as clean as the material it reads. PDFs can hide tables, scans can lose footnotes, folders can contain drafts, and old policies can sit beside current versions. We inspect that mess directly instead of pretending the index will sort it out.

The ingestion process usually needs document type detection, OCR for scanned files, metadata extraction, chunking rules, deduplication and access filters. For a London office with ten folders, that can be simple. For a UK services group with years of operational files, it becomes a controlled pipeline.
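Two of those steps, chunking rules and deduplication, are small enough to sketch. The example assumes word-based windows with a fixed overlap and exact-match deduplication on normalised text. The sizes are hypothetical defaults, and real pipelines often split on headings or paragraphs instead.

```python
import hashlib

def chunk(text, size=200, overlap=40):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

def dedupe(chunks):
    """Drop chunks whose case- and whitespace-normalised text was already seen."""
    seen, unique = set(), []
    for piece in chunks:
        normalised = " ".join(piece.lower().split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(piece)
    return unique
```

The overlap keeps a sentence that straddles a boundary retrievable from either side, and the hash check stops the same paragraph, copied across several folders, from crowding the top results.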

We also decide how citations should look. A staff assistant may only need a file name and paragraph. A client-facing assistant may need a source title, date, version and a short explanation of why that source was used. The citation is part of the trust mechanism.

If the source changes often, we build the update path into the system. That may be a nightly re-index, a webhook from the document store, a manual approval queue or a staged release where new documents are tested before the public assistant sees them.

Ingestion checks

Format: native PDF, scanned PDF, spreadsheet, web page, database row or image.
Freshness: current version, draft, archive, duplicate or superseded source.
Access: who can read the source and who can see answers based on it.
Metadata: document owner, date, department, topic, location and version.
Traceability: source link, citation text and review status.

The better the source pipeline, the less pressure we put on the model to guess.

Section 10 · Governance

A useful RAG assistant should make risk visible.

We do not want a private knowledge system to feel mysterious. The team should know which sources are indexed, how many documents failed ingestion, which questions are being refused, and where users are still asking a person because the assistant is not good enough yet.

For GDPR-sensitive work, we document personal data flows, retention, access controls and deletion paths. That is practical governance, not paperwork for its own sake. It tells the business what the assistant is allowed to know and how that knowledge can be removed.

We prefer dashboards that show a few useful numbers: total questions, answered questions, refused questions, low-confidence retrievals, document updates and reviewed failures. If those numbers are moving in the wrong direction, the system needs attention.
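Those numbers can come straight from the question log. The sketch below assumes a hypothetical log format in which each row records the outcome and the best retrieval score. The field names and the low-confidence cut-off are illustrative.

```python
from collections import Counter

# Hypothetical log rows, one per question asked.
LOG = [
    {"outcome": "answered", "top_score": 0.82},
    {"outcome": "answered", "top_score": 0.41},
    {"outcome": "refused", "top_score": 0.10},
    {"outcome": "answered", "top_score": 0.77},
]

def summarise(log, low_confidence=0.5):
    """Count the headline figures for a review dashboard."""
    outcomes = Counter(row["outcome"] for row in log)
    # Answers that went out even though the best evidence scored poorly.
    low = sum(
        1 for row in log
        if row["outcome"] == "answered" and row["top_score"] < low_confidence
    )
    total = len(log)
    return {
        "total": total,
        "answered": outcomes["answered"],
        "refused": outcomes["refused"],
        "low_confidence": low,
        "refusal_rate": outcomes["refused"] / total if total else 0.0,
    }
```

Comparing summarise output week over week is enough to see whether refusals or low-confidence answers are moving in the wrong direction.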

The goal is a knowledge system that earns trust slowly. It answers well, cites clearly, refuses when the source is weak and gives the client a way to improve it month by month.

Monthly review

Top unanswered questions: decide whether the source is missing or the assistant should refuse.
Weak citations: inspect answers where the cited passages were thin or ambiguous.
Source drift: check whether new policies have replaced old ones in the index.
Access exceptions: review attempts to ask about restricted material.
Business outcome: compare usage against outcomes such as fewer tickets, faster replies or less manual review.

RAG improves when real usage is reviewed. It decays when nobody looks.

Section 11 · Questions we get asked

RAG, answered honestly.

What is a RAG system, in plain terms?

A RAG system is an AI assistant that answers questions using your own documents, policies and data instead of generic internet knowledge. We index the source material, the AI model looks up the most relevant passages at query time, and the answer is grounded in those passages with citations back to the originals.

When does RAG make more sense than a regular chatbot or a fine-tuned model?

RAG wins when your information changes often (policies, prices, contracts), when you need citations back to the source, when the dataset is too large to fit in a prompt, or when the cost of a wrong answer is high. Fine-tuning makes more sense when you need a model to adopt a tone or behaviour. The two are not mutually exclusive.

Where does the data sit? Is it sent outside the UK?

You choose. We can run the document store, vector database, logs and workflow layer on UK or EU infrastructure. For sensitive content we default to a setup where source documents stay inside your environment or a UK-hosted environment WiseSolutions operates.

How long does a first RAG system take to build?

A focused proof of concept on a single document corpus is usually four to six weeks. A production system with access controls, audit logs and a user-facing surface (web, Slack, WhatsApp) is typically eight to twelve weeks.

What does it cost to run, ongoing?

It depends on volume and which model the system uses, so we size the running cost to your corpus and quote it upfront as part of the proposal, before anything is committed. Hosting and model usage are passed through at cost and itemised, so you always see exactly what the system consumes.

Can the system answer outside the source material?

By default, no. We configure the system to refuse, or to clearly mark answers that go beyond the source. You decide the policy. For regulated industries we usually lock it to source-only answers with citations.

What is the difference between RAG and fine-tuning?

RAG changes what the AI model can look up. Fine-tuning changes how a model behaves. If the question is about current policies, contract clauses, price lists or internal procedures, RAG is usually the first answer. If the issue is tone, formatting or a repeated classification style, fine-tuning may be useful later.

Can you index PDFs, images and scanned documents?

Yes, but quality varies. Native PDFs are usually straightforward. Scanned PDFs and images need OCR, layout detection and a review step because tables, stamps and handwritten notes can be misread. We flag low-confidence documents instead of pretending every page indexed cleanly.

How do you measure whether a RAG system works?

Before launch we build a test set of real questions, expected source passages, refusal cases and edge cases. We measure retrieval accuracy, citation quality, answer usefulness and refusal behaviour against that set. After launch we also track support-ticket reduction.

Do you work with Pinecone, Weaviate or other vector databases?

Yes. We choose the vector database around the corpus, compliance needs and operating budget. Pinecone and Weaviate are both valid choices. For some UK businesses, Postgres with vector search is simpler and cheaper to operate.

What happens when documents are updated?

We set a re-index cadence that matches the source. Some systems re-index nightly. Others watch folders or databases and update within minutes. Each answer can show which source version it used so old material is easier to detect.
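An incremental re-index only needs to know what changed since the last run. A minimal sketch, assuming the index keeps a content fingerprint per document. The function names and the dictionary shapes are hypothetical.

```python
import hashlib

def fingerprint(text):
    """Stable content hash used to detect changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(indexed, current):
    """Work out which documents to add, update and delete.

    indexed: {doc_id: fingerprint} as stored at the last index run
    current: {doc_id: text} as read from the source system now
    """
    now = {doc_id: fingerprint(text) for doc_id, text in current.items()}
    return {
        "add": sorted(d for d in now if d not in indexed),
        "update": sorted(d for d in now if d in indexed and indexed[d] != now[d]),
        "delete": sorted(d for d in indexed if d not in now),
    }
```

The same plan works whether it is triggered nightly or by a folder watcher. The delete list matters as much as the update list, because a withdrawn policy left in the index is still citable.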

Can a RAG system be GDPR compliant?

Yes, if it is designed that way. We map personal data, define lawful basis and retention, restrict access, log usage, support deletion, and run a data protection impact assessment when the risk profile calls for one.

Have a document pile that should already be answering questions?

Tell us what you are trying to ground an assistant in. We will reply within one working day with first questions or a discovery call slot.