Chris Garlick 14 min read

How to Choose an LLM for Business Use (UK 2026)

How to choose an LLM for business use in 2026. Honest trade-offs across Claude, GPT-5, Gemini, and open-source. When closed wins, when open wins, why.

The shortest, laziest answer to "which LLM should we use" in 2026 is "Claude." It's the answer most consultants will give you. It might even be the right answer for your specific use case.

It's also a wrong answer often enough that giving it as a default is bad advice. The honest answer depends on volume, what the model is doing, where the data needs to live, how much vendor risk you're willing to absorb, and whether you have any in-house engineering capacity at all.

This is the comparison I'd want a UK business owner or technical director to read before they signed an annual contract with anyone. I build with Claude often. I also build with open-source models on UK infrastructure for the clients where that's the better answer. The deciding factor isn't brand loyalty. It's structural fit.

The short answer

For most UK SMEs running fewer than 10 million tokens a month through varied prompts, the right setup is a frontier API as the daily driver (Claude Sonnet 4.6 is the safest default, GPT-5.4 if you need agentic tool use, Gemini 3.5 Flash if you want price headroom), with a cheap fallback model for high-volume bulk tasks (DeepSeek V4 API, or Haiku 4.5).

For UK businesses handling regulated data (SRA-supervised law firms, FCA-regulated financial services, NHS-adjacent admin), or anyone processing more than 50 million tokens a month, self-hosted open-source models (Llama 4, Mistral Large 3, Qwen 3.5 under Apache 2.0) on UK or EU infrastructure are increasingly the better commercial and compliance answer. Not the "free LLM" of three years ago. Production-grade systems that close the quality gap and never let your data leave your servers.

Below is why, with the trade-offs spelled out honestly.

The frontier model line-up in 2026

Four serious closed-API options, with genuinely different strengths.

Claude (Anthropic). Opus 4.7 leads on coding and production reliability. SWE-bench Verified at 83.5% and SWE-bench Pro at 64.3% puts Opus 4.7 ahead of GPT-5.4 (57.7%) and Gemini (54.2%) on real software engineering tasks. On production code review and factual writing, Opus 4.7's 36% hallucination rate beats GPT-5.5's 86%. API pricing sits at $5/$25 per million input/output tokens for Opus, $3/$15 for Sonnet 4.6, $1/$5 for Haiku 4.5.

GPT (OpenAI). GPT-5.5 leads on multi-hour autonomous agent work. Terminal-Bench at 82.7%, OSWorld at 78.7%, ARC-AGI-2 at 85% are the benchmarks where it pulls ahead of Claude. Pricing is roughly comparable to Claude Sonnet, with the Batch API at 50% of standard for non-latency-critical workloads. The ecosystem (function calling, plugins, Codex) is the deepest of any provider.

Gemini (Google). Gemini 3.1 Pro at $12 per million output tokens is the cheapest of the three frontier models. Strong multimodal support, 2M token context window, native integration with Google Workspace. For high-volume batch jobs, classification, or any workload where price is the primary lever, Gemini wins.

DeepSeek (Chinese, but open-source available). DeepSeek V4 API at $0.44 input / $0.87 output per million tokens is roughly 10x cheaper than Claude Sonnet for similar quality on many tasks. DeepSeek V4 Pro currently leads BenchLM's Chinese leaderboard at 87. The catch (and it is a real catch) is in the next section.

The case for closed-API frontier models

Three reasons businesses keep choosing them despite the cost.

Quality on hard tasks. Frontier closed models lead the open-source field on the hardest 5% of benchmarks. Long-context reasoning, multi-step agentic work, novel problem solving. If your workflow has those characteristics, paying for a frontier API is currently the cheapest way to get the quality you need.

Operational simplicity. A closed API is one HTTP call. No GPU procurement, no model serving infrastructure, no monitoring, no version pinning, no security patching. For most UK SMEs without dedicated MLOps capacity, the API model removes 80% of the engineering work.

Caching and batch discounts. Anthropic's prompt caching, OpenAI's automatic prefix cache, and Gemini's context caching mean repeat traffic is now 2-10x cheaper than the headline rate. The economics of using Claude on a RAG system are nothing like they were two years ago.

The case for open-source LLMs (and why it's stronger than most consultants will tell you)

Open-source models are not the "free but worse" option anymore. The shape of the market changed in 2025 and the consequences are still being underestimated.

2025 was the year open-source LLMs closed the gap with proprietary models. In 2026, they're on par in many areas, or better. Five practical reasons that matters for a UK business.

1. Your data never leaves your infrastructure

This is the biggest one and it's not hypothetical. When you call a closed API, your prompt and the model's response transit through that vendor's systems. For UK SRA-regulated law firms, FCA-regulated financial services, NHS-adjacent administration, or anyone processing genuinely sensitive client data, that's a problem the vendor's "we don't train on your data" clause doesn't fully solve.

A self-hosted open-source model running on UK or EU infrastructure means data residency is automatic by design, not a contractual promise. GDPR Article 25 requires data protection to be built into the design of your processing systems. Local LLM inference is the technical implementation of that requirement.

For a UK law firm processing client matter notes, that's not a "nice to have" feature. It's the difference between AI being legally usable and not.

2. The cost curve flips at moderate scale

For low volume, closed APIs are cheaper than self-hosting. The break-even point used to be high. In 2026 it isn't.

At 500+ requests per hour with 300 tokens average output, a single A100 GPU running vLLM saves approximately 70% compared to GPT-4o API costs, with the break-even point typically around 150-200 requests/hour. One fintech reported cutting monthly AI spend from $47,000 to $8,000 (an 83% reduction) by moving predictable workloads to self-hosted infrastructure while keeping frontier API access for the harder edge cases.

For a UK SME processing 50 million tokens a month, the maths often favours self-hosting before the project is six months old. The decision isn't "API vs self-host." It's "API now, self-host the predictable workloads once the volume justifies it."

3. You stop renting a model and start owning a capability

Lock-in is when switching costs or risk are high enough that you effectively cannot switch. With closed APIs, you build prompts tuned to a specific model's quirks, you couple to that vendor's SDK and tool-calling format, you train your team's expectations around that model's behaviour. When a provider shuts down a model you depend on, your production system breaks overnight.

Anthropic deprecating Claude 3 was the warning shot. Plenty of business systems built on Claude 3 broke when it was retired. The same will happen to Claude 4.6 in 18 to 24 months. And Claude 4.7. And whatever 4.8 will be.

With open-source models under Apache 2.0 (Mistral Large 3, Qwen 3.5, Gemma 4) or MIT (DeepSeek V4), you own the weights. The model is still on your server in five years if you want it there. Your prompts and your fine-tunes are portable. The capability stays with you.

4. Fine-tuning is genuinely possible

You can technically fine-tune Claude via the API. In practice, the constraints (cost, opacity, vendor dependency) make it a niche choice. With open weights, you can take a base model, fine-tune it on 10,000 of your own customer service emails or contract clauses or accounting workpapers, and deploy the specialised version on your own infrastructure.

For narrow high-volume business tasks (entity extraction, classification, format conversion), a fine-tuned 7B-13B parameter open-source model can match or beat a frontier closed model on the specific task, at a fraction of the runtime cost.

5. The open-source field is now genuinely competitive

Almost every flagship open model in 2026 is a sparse Mixture-of-Experts: DeepSeek V4-Pro (1.6T total / 49B active), Llama 4 Maverick (400B / 17B), Qwen 3.5 (397B / 17B), Mistral Large 3 (675B / 41B). On several benchmarks that matter to developers (coding, math, instruction following, long-context reasoning), Qwen 3.5 is winning, not just competing.

In September 2025 Alibaba's Qwen model family surpassed Llama to become the most downloaded LLM family on Hugging Face, with over 700 million downloads. The "open-source is the underdog" framing is two years out of date.

The Chinese model question (DeepSeek, Qwen, Kimi)

This is where the conversation gets nuanced and most "just use Claude" advice falls apart.

There are two completely different ways to use a Chinese-trained open-weight model, and they have completely different risk profiles.

Via the vendor's hosted API. Do not do this for any UK business handling client data. Each call to DeepSeek's or Qwen's API transmits a context window containing the user's query and any background data to servers in China. Under China's 2017 National Intelligence Law, companies must support, assist, and cooperate with state intelligence work. Neither DeepSeek nor Qwen has a GDPR-required representative in the EU. DeepSeek's publicly accessible database was leaked in early 2025, exposing chat history, API keys, and backend data.

Self-hosted from open weights. This is the interesting case. Once a model is downloaded from Hugging Face and running on your own GPU, it cannot phone home. The weights are static. The model can't transmit your data anywhere. Your data residency is wherever your server is.

The remaining concern is whether the model itself has been deliberately backdoored or trained to behave badly on specific topics. There's evidence this is a real risk. DeepSeek-R1 produces code with up to 50% more severe security vulnerabilities when prompted on topics the CCP considers politically sensitive. For UK businesses doing general operational work (contract summary, customer support, document extraction), this is unlikely to surface. For anyone whose workflows touch geopolitical content, it's a real consideration.

Pragmatic UK position. Qwen 3.5 or DeepSeek V4 self-hosted under Apache 2.0 or MIT licence, running on a UK or EU box, with the same evaluation harness you'd run on any model, is a defensible choice. Calling the vendor-hosted versions of these models is not.

The cost reality at scale

Like-for-like comparison for a UK SME processing 50 million tokens a month (typical for a mid-sized RAG system or production agent workload):

Option	Monthly cost	What you get	Best for
Claude Sonnet 4.6 via API	£350 - £700	Frontier quality, zero ops	Low volume, varied tasks
GPT-5.4 via API	£300 - £600	Frontier quality, deep tooling	Agent-heavy workflows
Gemini 3.5 Flash via API	£80 - £200	Good quality, cheap	High-volume classification
DeepSeek V4 via API	£30 - £80	Good quality, very cheap	Bulk processing (with data caveats)
Self-hosted Llama 4 / Qwen 3.5 / Mistral Large 3 on UK VPS or GPU server	£400 - £1,500 (mostly hardware amortisation)	Full data control, no per-token fees	Regulated data, predictable workloads
Hybrid: frontier API + self-hosted fallback	£200 - £600	Best of both	Most production teams

The hybrid option in the last row is what serious teams actually run in 2026. Frontier API for the 20% of queries that need the quality. Self-hosted open-source for the 80% that are predictable and don't justify the markup.

How to actually choose

A practical decision tree.

Step 1: how often does the source data change?

Static or rarely changing -> any model, prompt engineering matters more than choice
Constantly changing -> RAG architecture, model choice secondary

Step 2: what's the volume per month?

Under 5M tokens -> closed API. Don't overthink it. Pick Claude Sonnet or Gemini Flash.
5M - 50M tokens -> closed API still, but start measuring which queries are expensive
50M+ tokens -> seriously evaluate self-hosting for predictable workloads
200M+ tokens -> self-hosting is almost certainly cheaper

Step 3: what's the data sensitivity?

General business operations -> closed API is fine
Client-confidential material under professional regulation -> open-source self-hosted, UK or EU infrastructure
Genuinely classified or competitive data -> open-source self-hosted, air-gapped

Step 4: what's the task type?

Heavy coding -> Claude Opus 4.7 or self-hosted Qwen 3.5 Coder
Multi-hour agent work -> GPT-5.4 or self-hosted Llama 4 Maverick
High-volume classification or extraction -> fine-tuned 7B-13B open-source model
General "answer questions about our docs" -> Claude Sonnet 4.6 or self-hosted Mistral Large 3
Multimodal (vision, audio, video) -> Gemini 3.1 Pro

Step 5: how much engineering capacity do you have?

Zero in-house engineering -> closed API. Self-hosting is not for you.
One or two developers -> closed API now, plan to self-host once volume justifies it
Full engineering team -> design for portability from day one. Use a routing layer.

The mistake most UK businesses make

Picking a model before defining the workflow. "We want to use AI" leads to a Claude subscription before anyone has mapped what the AI is meant to do. Six months later, the bill is £2,000 a month, the use case has narrowed to one repetitive task, and the company is paying frontier prices for a workload a fine-tuned 7B Mistral model would handle for £200 a month on a Hetzner box.

The second mistake. Treating "which LLM" as a one-time decision. The right answer in 2026 is almost always a routing layer that sends different queries to different models. Claude Sonnet for the hard 20%. Gemini Flash for the cheap classification. A self-hosted open-source model for the bulk. Tied together with a thin abstraction layer so you can swap models without rewriting your application.

The third mistake. Defaulting to "the most expensive model is the safest choice." Frontier doesn't equal correct. Claude Opus on a workload that a Haiku 4.5 call could handle is just wasteful. The right level of model for the task is the question worth answering.

My honest take

The "just use Claude" advice you'll hear from most consultants in 2026 is a reasonable starting heuristic and a bad ending answer. For your first build, on low volume, on general business operations, with no engineering capacity, it's fine. For anything that grows beyond that, the trade-offs deserve a real conversation.

Open-source models in 2026 are not the underdog. They're the better commercial answer for any UK business with predictable workload patterns, sensitive data, or budgets that need to scale linearly with usage rather than per-token. The engineering bar to run them is genuinely lower than it was 18 months ago. The quality gap is smaller than the marketing makes it look.

For UK regulated firms specifically (law, financial services, accountancy, healthcare admin), self-hosted open-source on UK or EU infrastructure is increasingly the only legally defensible long-term answer. The compliance team will thank you in two years even if they don't yet know they will.

If you want a second opinion on which model setup actually fits the work you're trying to ship, book a free scoping call. I'll give you the honest answer, including when that answer is "you don't need any of this yet."

Frequently asked questions

Which LLM is best for business use in 2026?

There is no single best LLM for all business use. For low-volume general work, Claude Sonnet 4.6 is the safest default. For agentic multi-step work, GPT-5.4 leads. For high-volume cheap classification, Gemini 3.5 Flash. For regulated data or 50M+ tokens a month, self-hosted open-source models like Llama 4, Mistral Large 3, or Qwen 3.5 under Apache 2.0 are increasingly the better answer.

Are open-source LLMs good enough for business use?

Yes, in 2026 they are. 2025 was the year open-source closed the gap with proprietary models, and on several benchmarks open-source models like Qwen 3.5 are now winning rather than just competing. For predictable workloads, regulated data, or any business processing 50M+ tokens a month, self-hosted open-source is often the better commercial choice. The quality gap on the hardest 5% of tasks still favours closed frontier models, but most business workloads aren't in that 5%.

How do I deploy an open-source LLM for a UK business?

For 1 to 5 concurrent users on small models, Ollama on a single machine or VPS is the fastest to ship. For production serving 5+ concurrent users, vLLM delivers roughly 2.3x higher throughput than Ollama and is the more reliable choice. UK or EU infrastructure (AWS London, Hetzner Helsinki, dedicated GPU hosts) keeps data residency automatic, which is the main reason UK regulated firms self-host.

Is it safe to use Chinese LLMs (DeepSeek, Qwen) for business?

The honest answer is "depends how you use them." Calling DeepSeek's or Qwen's hosted API transmits your data to servers in China and exposes you to GDPR risk because neither has a GDPR representative in the EU. Running the same models self-hosted from open weights on UK or EU infrastructure is structurally safer because the weights cannot phone home. There remains a residual concern about deliberately backdoored behaviour on specific topics, which matters most for any workflow touching politically sensitive content.

When should I avoid self-hosting an LLM?

Avoid self-hosting if you have no in-house engineering capacity, your token volume is under 10M a month, your workloads are highly varied (so caching doesn't help), or your data isn't sensitive enough to justify the operational overhead. For most UK SMEs in those situations, a closed API is the right answer for at least the first 12 to 18 months. Self-hosting becomes the better choice once volume, data sensitivity, or vendor lock-in concerns cross the threshold.

Want this for your business?

I build software like what's described above. Fixed pricing, transparent process.

Get in touch

Back to blog