Possible Setup · HR & Operations

Model Router Gateway — the smallest model that passes the test, per step

A model-agnostic gateway that routes each task to the cheapest model passing your eval — SLM for routine, frontier only for the hard step. Plus a cost dashboard and one-click provider swap. Demo reference architecture.

Industry: Cross-industry / AI operations
Implementation: 3–5 weeks

At a glance

It starts with: Task comes in
The machine handles: 8 of 8 steps
A person keeps: Nothing — it runs end to end
Bottom line: smallest model that passes the test

The problem

Most automations send every step to the same expensive large model, even the trivial "is this an invoice? yes/no." That's like sending a truck to pick up bread.

This demo shows the opposite: a gateway that decides per step which model is enough — and calls the frontier only when the task genuinely needs it.

The industry is shifting from "more parameters" to "intelligence per euro." A 7B SLM is roughly 10–30× cheaper than a 70–175B model (NVIDIA, 2025) and is enough for the repetitive routine tasks that make up most of agent work — classify, extract, route, format. Meanwhile the price for a fixed capability level is falling dramatically (LLMflation: roughly 10×/year).
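What a per-call price gap means for the whole bill can be checked with simple arithmetic. The sketch below assumes a flat price per call (token counts ignored) and treats the SLM share and the price ratio as hypothetical inputs, not measurements. It only shows how the remaining frontier share dominates the blended cost.

```python
def blended_cost_factor(slm_share: float, price_ratio: float) -> float:
    """Routed cost as a fraction of sending every call to the frontier model.

    Assumes a flat price per call (token counts ignored).
    slm_share:   fraction of calls the router sends to the SLM, 0..1
    price_ratio: how many times cheaper one SLM call is (e.g. 10 to 30)
    """
    if not 0.0 <= slm_share <= 1.0:
        raise ValueError("slm_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    # SLM calls cost 1/price_ratio of a frontier call; the rest pay full price.
    return slm_share / price_ratio + (1.0 - slm_share)


if __name__ == "__main__":
    # Hypothetical task mixes, not measured values.
    for share in (0.5, 0.8, 0.95):
        for ratio in (10, 30):
            factor = blended_cost_factor(share, ratio)
            print(f"SLM share {share:.0%}, SLM {ratio}x cheaper: "
                  f"{factor:.1%} of frontier-only cost")
```

With 80% of calls on an SLM that is 20× cheaper, the bill drops to 0.8/20 + 0.2 = 24% of the frontier-only cost, and the 20% of frontier calls make up most of what remains. Under these assumptions, moving more steps onto the SLM matters more than whether the SLM is 10× or 30× cheaper.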

The catch: most mid-market companies don't take advantage of this. They have (a) a hard-wired model, which means lock-in and no way to switch when prices fall, (b) no eval that tells them whether a smaller model would do, and (c) no visibility into per-step cost. The result: they pay frontier prices for SLM work.

Honest limit: for hard, broad tasks (open-ended reasoning) the large model remains superior — inference compute substitutes for parameters only partially. The routing must respect that boundary, not ignore it.

How the process runs — step by step


01
Task comes in
step in the workflow
A step in your workflow needs a model — classify, extract, summarize, or a difficult synthesis. Instead of blindly calling the large model, the task goes to the router.
02
Classify & eval
type · difficulty
The router determines the task's type and difficulty and checks it against the stored eval — the small, honest test per task type.
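A minimal sketch of this step, assuming a rule-based classifier. The task types, the `MAX_ROUTINE_CHARS` threshold, and the eval names in `EVAL_REGISTRY` are hypothetical placeholders; in a real setup the rules (or a small classifier model) would be tuned against your own evals. Task types without a stored eval are treated as hard, because there is no measurement that would justify a smaller model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only.
ROUTINE_TYPES = {"classify", "extract", "route", "format"}
MAX_ROUTINE_CHARS = 4_000
EVAL_REGISTRY = {"classify": "eval_classify_v1", "extract": "eval_extract_v1"}


@dataclass(frozen=True)
class Task:
    step: str        # name of the workflow step, e.g. "invoice_check"
    task_type: str   # e.g. "classify", "extract", "summarize", "synthesize"
    prompt: str


@dataclass(frozen=True)
class Classification:
    task_type: str
    difficulty: str            # "routine" or "hard"
    eval_name: Optional[str]   # stored eval for this task type, if any


def classify(task: Task) -> Classification:
    eval_name = EVAL_REGISTRY.get(task.task_type)
    routine = (
        task.task_type in ROUTINE_TYPES
        and len(task.prompt) <= MAX_ROUTINE_CHARS
        and eval_name is not None  # no eval, no evidence a small model suffices
    )
    return Classification(task.task_type, "routine" if routine else "hard", eval_name)


print(classify(Task("invoice_check", "classify", "Is this an invoice? yes/no")))
print(classify(Task("quarterly_report", "synthesize", "Combine these findings...")))
```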
03
Model router
OpenRouter · LiteLLM
The model-agnostic gateway (OpenRouter / LiteLLM) abstracts over 400+ models. The provider is configuration, not hard-wired code — you switch when prices fall.
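"Provider as configuration" can be sketched as a lookup from a logical tier to a model identifier that the workflow code never hard-codes. The JSON string stands in for a config file; the model strings follow LiteLLM's `provider/model` naming style but are hypothetical placeholders, not recommendations. The resolved string is what would be passed as the `model` argument of a gateway call such as `litellm.completion(model=..., messages=...)`.

```python
import json
from typing import Dict

# Stand-in for a config file; the model identifiers are hypothetical placeholders.
CONFIG_JSON = """
{
  "slm": "openrouter/example-org/small-model",
  "frontier": "openrouter/example-org/frontier-model"
}
"""


def load_routing_config(raw: str) -> Dict[str, str]:
    config = json.loads(raw)
    if not isinstance(config, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in config.items()
    ):
        raise ValueError("routing config must map tier names to model strings")
    return config


def resolve_model(tier: str, config: Dict[str, str]) -> str:
    try:
        return config[tier]
    except KeyError:
        raise ValueError(f"no model configured for tier {tier!r}") from None


config = load_routing_config(CONFIG_JSON)
print(resolve_model("slm", config))

# Swapping the provider changes data, not code: here a self-hosted model
# (placeholder name) replaces the hosted SLM.
swapped = {**config, "slm": "ollama/example-small-model"}
print(resolve_model("slm", swapped))
```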
04 (branch)
SLM enough?
The decision: is a small model (SLM) enough for this task — does it pass the eval — or does it need the frontier?
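The branch itself reduces to "the cheapest tier whose stored eval clears the bar." This sketch assumes that pass rates per tier are already stored and that the threshold is a business decision per task type; tier names and numbers are hypothetical. If no tier passes, this version falls back to the most capable tier, which is only one possible policy (raising an alert would be another).

```python
from typing import Mapping, Sequence


def smallest_passing_tier(
    tiers_by_cost: Sequence[str],      # cheapest first, most capable last
    pass_rates: Mapping[str, float],   # stored eval pass rate per tier, 0..1
    threshold: float,                  # minimum pass rate for this task type
) -> str:
    if not tiers_by_cost:
        raise ValueError("at least one tier is required")
    for tier in tiers_by_cost:
        rate = pass_rates.get(tier)
        # A tier without an eval result is never chosen on faith.
        if rate is not None and rate >= threshold:
            return tier
    # Nothing passes: fall back to the most capable tier (assumed policy).
    return tiers_by_cost[-1]


TIERS = ["slm", "mid", "frontier"]                     # hypothetical tiers
RATES = {"slm": 0.96, "mid": 0.98, "frontier": 0.99}   # hypothetical eval results

print(smallest_passing_tier(TIERS, RATES, threshold=0.95))
print(smallest_passing_tier(TIERS, RATES, threshold=0.97))
print(smallest_passing_tier(TIERS, {"frontier": 0.99}, threshold=0.95))
```

The three calls print `slm`, `mid`, and `frontier`: a stricter threshold or a missing eval pushes the task up a tier.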
05
Run SLM
cheap · fast
Routine tasks (the bulk) run on a cheap, fast SLM. Roughly 10–30× cheaper than a large model and entirely sufficient for classify, extract, route.
06
Frontier model
escalation · hard task
Only the demonstrably hard cases escalate to a frontier model. You pay the top price deliberately — not blanket for every call.
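Escalation at run time can be sketched as a loop over tiers that stops at the first answer passing a cheap runtime check. Here `call_model` is a hypothetical hook (in practice it would wrap the gateway call), and the yes/no format check is only one example of such a check; other validators could be plugged in the same way.

```python
from typing import Callable, List, Sequence, Tuple


def run_with_escalation(
    prompt: str,
    tiers: Sequence[str],                   # start at the routed tier, cheapest first
    call_model: Callable[[str, str], str],  # (tier, prompt) -> answer; hypothetical hook
    is_acceptable: Callable[[str], bool],   # cheap runtime check on the answer
) -> Tuple[str, str, List[str]]:
    """Return (answer, tier that produced it, tiers tried in order)."""
    if not tiers:
        raise ValueError("at least one tier is required")
    tried: List[str] = []
    answer = ""
    for tier in tiers:
        answer = call_model(tier, prompt)
        tried.append(tier)
        if is_acceptable(answer):
            break  # good enough: no need to pay for a higher tier
    # If no tier passed, the answer of the last (most capable) tier is returned.
    return answer, tried[-1], tried


def fake_model(tier: str, prompt: str) -> str:
    # Stand-in for real model calls: the SLM hedges, the frontier answers.
    return "maybe" if tier == "slm" else "yes"


answer, tier, tried = run_with_escalation(
    "Is this an invoice? yes/no",
    ["slm", "frontier"],
    fake_model,
    is_acceptable=lambda a: a.strip().lower() in {"yes", "no"},
)
print(answer, tier, tried)  # yes frontier ['slm', 'frontier']
```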
07
Log cost & eval
per step
Every call is logged with model, cost, and eval result. You see spend per step, per model, per day — no surprise on the monthly bill.
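A sketch of the log and the per-step, per-model, per-day view. It uses Python's built-in `sqlite3` in memory as a stand-in for the PostgreSQL log named in the stack, with plain `GROUP BY` SQL; the table layout, column names, and cost figures are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL cost/eval log
conn.execute(
    """
    CREATE TABLE call_log (
        day         TEXT    NOT NULL,  -- ISO date, e.g. '2025-01-15'
        step        TEXT    NOT NULL,  -- workflow step name
        model       TEXT    NOT NULL,  -- model identifier that served the call
        cost_eur    REAL    NOT NULL,  -- cost of this call
        eval_passed INTEGER NOT NULL   -- 1 if the output passed its check
    )
    """
)


def log_call(day: str, step: str, model: str, cost_eur: float, eval_passed: bool) -> None:
    with conn:  # commits the insert
        conn.execute(
            "INSERT INTO call_log VALUES (?, ?, ?, ?, ?)",
            (day, step, model, cost_eur, int(eval_passed)),
        )


def spend_report():
    """Spend, call count, and pass rate per day, step, and model."""
    return conn.execute(
        """
        SELECT day, step, model,
               SUM(cost_eur)    AS spend_eur,
               COUNT(*)         AS calls,
               AVG(eval_passed) AS pass_rate
        FROM call_log
        GROUP BY day, step, model
        ORDER BY day, step, model
        """
    ).fetchall()


# Hypothetical calls, not measured prices.
log_call("2025-01-15", "invoice_check", "slm", 0.0002, True)
log_call("2025-01-15", "invoice_check", "slm", 0.0002, False)
log_call("2025-01-15", "invoice_check", "frontier", 0.004, True)

for row in spend_report():
    print(row)
```

A dashboard such as Grafana would then query the same kind of aggregate from the real database.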
08
Result + cost view
back into the workflow
The result goes back into the workflow — produced by the smallest model that passes the task, with full cost visibility.

Our Solution

A gateway that routes per task instead of blanket-calling the large model — self-hosted, model-agnostic (OpenRouter/LiteLLM as an abstraction over 400+ models). Three building blocks:

1. Eval-driven routing. Each task type has a small, honest test. The gateway picks the smallest model that passes it — SLM for routine, frontier only for the demonstrably hard step. "Passes my test" instead of "feels safe."

2. Frontier on demand with escalation. Uncertain or difficult cases automatically escalate a tier upward. So you pay the top price only for the cases that need it.

3. Cost & switching layer. A dashboard shows cost per step, model, and day. Because the model sits behind an abstraction, switching providers is configuration, not a rebuild — you profit from the price collapse instead of being at its mercy.
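What "a small, honest test per task type" could look like in code: a fixed set of labelled cases and an exact-match pass rate per model. This is a minimal sketch under assumptions: exact match only suits closed-form outputs such as the yes/no invoice check, the cases are invented, and `fake_slm` stands in for a real model call. Free-text tasks would need a different scorer.

```python
from typing import Callable, Dict, Sequence, Tuple

EvalCase = Tuple[str, str]  # (input, expected output)

# Invented examples for an "is this an invoice?" task.
INVOICE_EVAL: Sequence[EvalCase] = [
    ("Invoice no. 4711, amount due: 120.00 EUR", "yes"),
    ("Meeting notes from Tuesday", "no"),
    ("Bill #88, total 45.00 EUR, payable in 14 days", "yes"),
    ("Newsletter: spring offers", "no"),
]


def pass_rate(model_fn: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Share of cases where the normalised output equals the expected label."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = sum(
        1 for prompt, expected in cases
        if model_fn(prompt).strip().lower() == expected.lower()
    )
    return passed / len(cases)


def fake_slm(prompt: str) -> str:
    # Stand-in model: keyword match, so it misses invoices that never say "invoice".
    return "yes" if "invoice" in prompt.lower() else "no"


def evaluate_tiers(models: Dict[str, Callable[[str], str]],
                   cases: Sequence[EvalCase]) -> Dict[str, float]:
    return {tier: pass_rate(fn, cases) for tier, fn in models.items()}


print(evaluate_tiers({"slm": fake_slm}, INVOICE_EVAL))  # {'slm': 0.75}
```

The stand-in fails the "Bill #88" case, so its pass rate is 0.75; whether that is enough is a threshold decision per task type, not a property of the model.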

Honest framing: the gateway is not a quality guarantee; it's only as good as your evals. And a routing layer that stays invisible to the workflow is not free of dependency: a router you don't own is a new lock-in. Hence self-hosted and with documented evals.

Eval-driven routing: A small, honest test per task type. The gateway picks the smallest model that passes it — a decision by measurement, not gut feel.
Automatic escalation: Uncertain or difficult cases climb tier by tier to the frontier. You pay the top price only for the tasks that genuinely need it.
Model abstraction: 400+ models behind one API (OpenRouter / LiteLLM). Switching providers is configuration, not a code rebuild — you follow the price collapse instead of sitting in lock-in.
Cost dashboard: Spend per step, model, and day — live. No surprise on the monthly bill, but a cost curve you can steer.

What comes out of it

Possible setup, not a packaged product

The figures shown are target values and expected magnitudes for a possible setup – based on industry benchmarks, public studies of comparable setups, and our own tests on a real stack. They are not measured outcomes from a specific customer project; actual results depend on company size, process maturity, and integration depth. We do not offer this setup as a packaged product. We help teams design, automate, and run such processes themselves – through architecture consulting, workshops, and implementation support with n8n. For regulated third-party systems with certification or license requirements (e.g. HIS, gematik, DATEV-certified), we partner with specialized providers.

Routine steps: on SLM
Frontier calls: hard cases only
Provider swap: configuration
Cost per step: visible

From "everything to the expensive large model" to "the smallest model that passes the test per step, frontier only on demand" — with full cost visibility and one-click provider swap. Demo values, illustrative.

Before vs. After

Aspect	Before	After
Model choice	one large model for everything	routed to fit per step
Routine tasks	frontier price	SLM price
Hard tasks	same model as routine	targeted frontier escalation
Provider switch	code rebuild	configuration
Cost visibility	lump bill at month-end	cost per step, live


Technical facts

Technology Stack

n8n (self-hosted) · OpenRouter / LiteLLM · small + frontier models · Eval suite · PostgreSQL (cost/eval log) · Grafana (cost dashboard)

Integrations

Seamless connection to your existing infrastructure

n8n (self-hosted) · Orchestration: central engine for classification, routing rules, escalation, and logging
OpenRouter / LiteLLM · Model abstraction: one API over 400+ models; switch providers by configuration
SLM + frontier models · Inference: small models for routine work, frontier on demand for hard tasks
Eval suite · Quality control: task-specific tests decide which model a task is routed to
PostgreSQL · Cost/eval log: record of every call with model, cost, and test result
Grafana · Cost dashboard: spend per step, model, and day, visible live

Security & Compliance

Enterprise-ready with highest security standards

Data sovereignty through self-hosting: Gateway, evals, and logs run self-hosted. With LiteLLM even the model abstraction stays in-house — no obligation to send prompts through someone else's router.
No lock-in by design: Models and providers sit behind an abstraction and are swappable by configuration. The most expensive lock-in is the one you write into your code — this setup deliberately avoids it.
Auditable cost & eval log: Every call is logged with model, cost, and eval result — traceable which model handled which task with what outcome.
No quality promise without evals: The routing is only as good as the tests behind it. Framed honestly: the evals are the core — without them, "smaller model" is a gamble; with them, a measurement.

Frequently Asked Questions

Does routing really lower model costs?
In the demo, qualitatively yes, because routine work moves to cheap models. The actual saving depends on your task mix, which is why we measure it with your evals instead of promising a number.

Doesn't quality drop with smaller models?
Only if your eval is poor. The router uses a smaller model only when it passes the test; otherwise it escalates to the frontier. Quality depends on the test, not the model name.

Isn't the gateway itself a new lock-in?
The abstraction is itself swappable (self-hosted LiteLLM as an alternative to OpenRouter). That's the whole point: neither a model nor a gateway becomes a lock-in. Self-hosting keeps control with you.

Is this a finished product?
No, it is a demo reference architecture. It shows the principle; routing rules and evals are tailored to your tasks. We offer architecture, workshops, and implementation, not an off-the-shelf product.

Does this look like your setup?

If this workflow resembles yours, we will look at it together — and tell you honestly whether automating it pays off.


Related Showcases

Agent Permission Gateway — securing AI agents with least privilege

A self-hosted control layer between AI agents and their tools: narrow per-tool rights, short-lived just-in-time tokens, per-step authorization, a human-in-the-loop gate before irreversible actions, and a tamper-evident audit log. Demo reference architecture, not a certified product.


EU AI Act Compliance Cockpit – Prove Your Obligations, Automatically

Automate the EU AI Act and GDPR obligations themselves: a central AI registry, an immutable audit trail, a human-in-the-loop gate for automated decisions, and automatic Article 50 transparency notices. Deadline-safe instead of a spreadsheet.
