Folio · Case No. 01 UK · anonymised
Finance Operations Automation: Invoice Triage for a UK Professional Services Firm
How we replaced four hours a week of manual invoice triage with an AI workflow that captures, classifies, files and reports, without disturbing how the finance team works.
A professional services firm in the UK was losing roughly four hours a week to invoice triage. Not to processing invoices, which finance handled cleanly. The cost was in the work that came before: opening every email, deciding whether it was a supplier invoice, an HMRC notice, a receipt or noise, downloading attachments, renaming files to match the firm’s convention, dropping them into the right Drive folder, logging them in a spreadsheet, and forwarding the relevant ones to the bookkeeper.
It was the kind of work that nobody enjoys, nobody notices when it goes well, and nobody can skip without something falling through. Exactly the kind of work AI is genuinely good at.
The problem
The firm had three recurring pain points. Invoices arrived in a single shared inbox that also received contracts, marketing emails and personal messages. Attachments came in inconsistent formats: PDFs with sensible filenames, photos of paper receipts from the founder’s phone, supplier invoices with reference numbers buried inside the body of the email rather than in the file. And the monthly bookkeeping handover required a manual sweep through three months of mail, double-checking that nothing had been missed before the books closed.
The cost was not one big number. It was the constant background drag of an admin task that did not match anyone’s job description. The partner who ended up doing it bills client work at a rate many times the cost of an automation. The team had looked at off-the-shelf accounting tools and rejected them because the firm uses a custom chart of accounts and a specific Drive structure that any standard tool would have flattened.
They needed something that respected the existing process and removed the manual work, not something that replaced the process.
The approach
We started with a source-of-truth audit, which is how we begin every workflow build at WiseSolutions. We mapped which system owned which fact. Gmail is the canonical inbox. Drive is the canonical document store. The custom Airtable base is the canonical record of what has been processed. The bookkeeper does the actual accounting in their own software and does not need our automation to write into it. What they need is a clean handover at month end.
That mapping shaped the design. The automation reads from Gmail, writes to Drive and Airtable, and produces a monthly report. It does not touch the bookkeeping system at all. That single decision removed half the integration risk and ninety percent of the questions about data ownership.
From there we built the workflow in four passes. First, classification: every new email in the shared inbox is checked, the body and any attachments are read, and an AI step decides whether the message is an invoice, a receipt, a regulatory document, a contract, marketing noise, or personal mail. Second, extraction: for documents that are invoices or receipts, the AI pulls supplier name, invoice reference, date, total amount and VAT details. Third, filing: each document is renamed to the firm’s convention (year, month, supplier, reference), dropped into the right Drive folder, and recorded in Airtable with a link back to the original email. Fourth, escalation: anything the system cannot classify with high confidence is sent to a review queue for the partner to glance at on Friday afternoons.
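The routing between the four passes and the filing convention can be sketched in a few lines of Python. This is a minimal illustration, not the production workflow: the threshold value, the route names, the `slug` helper and the exact filename separators are all assumptions, since the case only states the order of the fields (year, month, supplier, reference).

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical values: the real threshold and label names are not published.
REVIEW_THRESHOLD = 0.90
EXTRACTABLE = {"invoice", "receipt"}


@dataclass
class Classification:
    label: str         # e.g. "invoice", "receipt", "contract", "marketing"
    confidence: float  # 0.0 to 1.0, as reported by the AI step


def route(result: Classification) -> str:
    """Decide which pass handles a classified message next."""
    if result.confidence < REVIEW_THRESHOLD:
        return "review"         # fourth pass: escalation to the partner
    if result.label in EXTRACTABLE:
        return "extract"        # second and third passes: extraction, then filing
    return "no_extraction"      # classified only; nothing to extract


def slug(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def filed_name(issued: date, supplier: str, reference: str, ext: str = "pdf") -> str:
    """Build a filename in year, month, supplier, reference order."""
    return f"{issued:%Y}-{issued:%m}_{slug(supplier)}_{slug(reference)}.{ext}"


print(route(Classification("invoice", 0.97)))
print(filed_name(date(2024, 3, 7), "Acme Ltd.", "INV/0042"))
```

Checking confidence before the label means a low-confidence "invoice" still reaches a human instead of being filed on a guess.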
We tested every classification rule against three months of historical mail before launch. That cost us an extra week and saved us a much longer period of finger-pointing about wrong calls. The confidence threshold ended up being set high enough that around five percent of emails route to the review queue. Most of those are edge cases: emails where a supplier has changed format, or where an invoice has been forwarded twice and we want a human to confirm the canonical copy.
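A backtest of this kind can be expressed as a small function that replays labelled historical mail against a candidate threshold. The sketch below is an assumption about how such a check might look, not the firm’s actual test harness; the `Prediction` fields and the choice to measure accuracy only on automatically handled mail are illustrative.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    predicted: str     # label chosen by the AI step
    confidence: float  # 0.0 to 1.0
    actual: str        # label a person assigned at the time


def backtest(history: list[Prediction], threshold: float) -> tuple[float, float]:
    """Return (review_rate, accuracy_on_auto_handled) for one threshold."""
    if not history:
        raise ValueError("history must not be empty")
    # Messages at or above the threshold would be handled without review.
    auto = [p for p in history if p.confidence >= threshold]
    review_rate = 1 - len(auto) / len(history)
    accuracy = sum(p.predicted == p.actual for p in auto) / len(auto) if auto else 0.0
    return review_rate, accuracy


# Made-up sample data, for illustration only.
sample = [
    Prediction("invoice", 0.99, "invoice"),
    Prediction("receipt", 0.95, "receipt"),
    Prediction("invoice", 0.92, "contract"),
    Prediction("marketing", 0.60, "marketing"),
]
print(backtest(sample, threshold=0.90))
```

Running the same function over a range of thresholds shows the trade-off directly: a higher threshold sends more mail to review and makes fewer wrong calls on the rest.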
What the system actually does
Every five minutes the workflow polls the shared inbox. New messages are pulled into the pipeline, where they are classified by an AI step. Invoices and receipts trigger the extraction sub-flow. Attachments are renamed and filed. Airtable gets a new row with all the structured fields plus a link to the original email thread, so anyone reviewing the record can jump back to the source in one click.
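Because the inbox is polled repeatedly, each run has to skip messages that earlier runs already handled. One way to do that, assuming each message carries a stable ID and the set of processed IDs can be read back from the record of what has been processed, is sketched below. The function name and the ID format are hypothetical.

```python
def select_unprocessed(polled_ids: list[str], processed_ids: set[str]) -> list[str]:
    """Return polled message IDs not yet processed, in order, without repeats."""
    seen = set(processed_ids)  # copy, so the caller's set is not modified
    fresh = []
    for message_id in polled_ids:
        if message_id not in seen:
            fresh.append(message_id)
            seen.add(message_id)  # guards against the same ID appearing twice in one poll
    return fresh


print(select_unprocessed(["m1", "m2", "m2", "m3"], {"m1"}))
```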
Once a month a separate flow generates a payment report: supplier breakdown, category totals, comparison with the prior month, flagged exceptions. That report is the handover to the bookkeeper. Before this build, the same report was assembled by hand. Now it lands in their inbox on the first of the month, ready to be reconciled.
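The supplier breakdown and the comparison with the prior month reduce to a grouped sum and a difference. The sketch below assumes each processed record is a dictionary with `supplier` and `total` fields, which are illustrative names and not the firm’s actual schema. It uses `Decimal` so that currency totals are not subject to floating-point rounding.

```python
from collections import defaultdict
from decimal import Decimal


def supplier_totals(rows: list[dict]) -> dict[str, Decimal]:
    """Sum invoice totals per supplier for one month of records."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in rows:
        totals[row["supplier"]] += Decimal(row["total"])
    return dict(totals)


def month_over_month(current: dict[str, Decimal], previous: dict[str, Decimal]) -> dict[str, Decimal]:
    """Change per supplier; suppliers missing from one month count as zero."""
    suppliers = sorted(set(current) | set(previous))
    return {s: current.get(s, Decimal(0)) - previous.get(s, Decimal(0)) for s in suppliers}


# Made-up records, for illustration only.
march = [
    {"supplier": "Acme", "total": "120.00"},
    {"supplier": "Acme", "total": "30.50"},
    {"supplier": "Birch", "total": "75.00"},
]
february = [
    {"supplier": "Acme", "total": "100.00"},
    {"supplier": "Cedar", "total": "40.00"},
]
print(month_over_month(supplier_totals(march), supplier_totals(february)))
```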
The workflow itself runs on the firm’s own infrastructure. Beyond the tools the firm already uses, it does not depend on a third-party SaaS that could disappear or change pricing. The AI calls go to a single provider with a documented retention policy. The data lives in Gmail, Drive and Airtable, all under the firm’s control.
The outcome
Four hours a week of manual triage gone, validated three months in. Ninety-six percent classification accuracy measured against the previous quarter of manual decisions. Zero invoices lost in six months of production. The monthly bookkeeping handover, which used to consume most of a Friday afternoon, now takes about twenty minutes, and that time is spent reviewing flagged exceptions, not assembling the report.
More important than the numbers is what the partner who used to do this work said when we asked them at the three-month mark: “I forget the system exists. That’s the highest compliment I can give it.”
What we’d do differently next time
We over-engineered the review queue. Our first version sent anything below a strict confidence threshold to the partner, and they spent the first two weeks impatiently approving the same kinds of edge cases. The fix was to add a feedback loop: each case the partner approves teaches the classifier what counts as a low-risk decision, so the set of cases that still go to review gradually narrows. By the end of month two the queue was where it should have been all along: small enough to handle in five minutes a week, big enough to catch real edge cases.
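The case does not describe how the feedback loop is implemented. One simple mechanism with the same effect, shown here purely as an assumption, is to track consecutive approvals per supplier and label, and to apply a more permissive threshold once a pattern has been approved often enough. All constants and names below are hypothetical.

```python
from collections import Counter

BASE_THRESHOLD = 0.90     # hypothetical starting threshold
RELAXED_THRESHOLD = 0.75  # hypothetical threshold once a pattern is trusted
APPROVALS_NEEDED = 3      # hypothetical number of consecutive approvals


class ReviewPolicy:
    """Tracks partner decisions and relaxes review for repeatedly approved patterns."""

    def __init__(self) -> None:
        self._streak = Counter()  # (supplier, label) -> consecutive approvals

    def record_review(self, supplier: str, label: str, approved: bool) -> None:
        key = (supplier, label)
        if approved:
            self._streak[key] += 1
        else:
            self._streak[key] = 0  # a correction resets the streak

    def threshold_for(self, supplier: str, label: str) -> float:
        trusted = self._streak[(supplier, label)] >= APPROVALS_NEEDED
        return RELAXED_THRESHOLD if trusted else BASE_THRESHOLD

    def needs_review(self, supplier: str, label: str, confidence: float) -> bool:
        return confidence < self.threshold_for(supplier, label)


policy = ReviewPolicy()
print(policy.needs_review("Acme", "invoice", 0.80))  # before any approvals
for _ in range(APPROVALS_NEEDED):
    policy.record_review("Acme", "invoice", approved=True)
print(policy.needs_review("Acme", "invoice", 0.80))  # after repeated approvals
```

Resetting the streak on a single correction keeps the loop conservative: trust builds slowly and is withdrawn immediately.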
We also learned that the most valuable AI step is not extraction or classification. It is the small reasoning step that decides whether a given email is complete: whether the attachment is the actual invoice or just a confirmation, whether the body has the figures that the PDF is missing, whether two messages in a thread are duplicates of the same document. That kind of judgment is what separates a workflow that works ninety percent of the time from one that the team genuinely stops thinking about.
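The judgment itself is an AI step, but part of the duplicate check can be illustrated with a deterministic comparison on extracted fields. The sketch below assumes two documents are the same if supplier, normalised reference and total all match. That rule and the field names are illustrative, not the firm’s actual logic.

```python
from decimal import Decimal


def document_key(supplier: str, reference: str, total: str) -> tuple[str, str, Decimal]:
    """Normalise the fields that identify a document so formatting differences are ignored."""
    normalised_ref = "".join(ch for ch in reference.upper() if ch.isalnum())
    return (supplier.strip().casefold(), normalised_ref, Decimal(total))


def split_duplicates(docs: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (first occurrences, later copies) based on the normalised key."""
    seen = set()
    first_seen, duplicates = [], []
    for doc in docs:
        key = document_key(doc["supplier"], doc["reference"], doc["total"])
        (duplicates if key in seen else first_seen).append(doc)
        seen.add(key)
    return first_seen, duplicates


# Made-up documents, for illustration only.
docs = [
    {"supplier": "Acme Ltd", "reference": "INV-0042", "total": "120.00"},
    {"supplier": " acme ltd", "reference": "inv 0042", "total": "120.0"},
    {"supplier": "Birch", "reference": "B-7", "total": "75.00"},
]
unique, copies = split_duplicates(docs)
print(len(unique), len(copies))
```

A check like this only catches copies whose extracted fields agree. Cases where the figures sit in the email body and are missing from the PDF still need the reasoning step described above.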
This case is one of several we have built in the same shape: different sector, different documents, same underlying pattern. If your team is losing time to repetitive triage of an inbox, a folder or a queue, we can usually tell within a discovery call whether the same approach will work.
Does this look like the kind of system your team needs?
Tell us the constraint and we’ll tell you what we’d build, in what order, and what it would cost. 30 minutes on a call.