Model Router Gateway — the smallest model that passes the test, per step
A model-agnostic gateway that routes each task to the cheapest model passing your eval — SLM for routine, frontier only for the hard step. Plus a cost dashboard and one-click provider swap. Demo reference architecture.
Most automations send every step to the same expensive large model — even the trivial "is this an invoice? yes/no." That's a truck for the bread run.
This demo shows the opposite: a gateway that decides per step which model is enough — and calls the frontier only when the task genuinely needs it.
How the gateway routes a task
Each task is classified and sent through the model-agnostic gateway. An eval decides: is a cheap SLM enough — or does the task escalate to the frontier? Cost and result land in the log.
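Below is a minimal sketch of the classification step. It assumes simple keyword rules and a hypothetical `ROUTING_TABLE` that maps each task type to the tier where it starts (0 = smallest model). In a real setup the classifier could itself be a cheap SLM call. The task types and rules here are illustrative only.

```python
import re

# Hypothetical task types and the tier each one starts at (0 = smallest model).
ROUTING_TABLE = {
    "classify": 0,
    "extract": 0,
    "format": 0,
    "reasoning": 2,  # open-ended work starts at the frontier tier
}

def classify_task(prompt: str) -> str:
    """Assign a task type with keyword rules (illustrative only)."""
    text = prompt.lower()
    words = set(re.findall(r"[a-z]+", text))
    if text.startswith(("is this", "does this")):
        return "classify"
    if "extract" in words:
        return "extract"
    if words & {"format", "convert"}:
        return "format"
    return "reasoning"  # unknown tasks default to the safe, expensive tier

def starting_tier(prompt: str) -> int:
    return ROUTING_TABLE[classify_task(prompt)]

print(starting_tier("Is this an invoice? yes/no"))  # 0
print(starting_tier("Draft a negotiation strategy for our supplier"))  # 2
```

Sending unrecognized tasks to the top tier is a deliberately conservative choice. If a task is misclassified, the cost is money, not quality.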
Before vs. After
| Aspect | Before | After |
|---|---|---|
| Model choice | one large model for everything | routed to fit per step |
| Routine tasks | frontier price | SLM price |
| Hard tasks | same model as routine | targeted frontier escalation |
| Provider switch | code rebuild | configuration |
| Cost visibility | lump bill at month-end | cost per step, live |
The Challenge
The industry is shifting from "more parameters" to "intelligence per euro." A 7B SLM is roughly 10–30× cheaper than a 70–175B model (NVIDIA, 2025) and is enough for the repetitive routine tasks that make up most of agent work — classify, extract, route, format. Meanwhile the price for a fixed capability level is falling dramatically (LLMflation: roughly 10×/year).
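The "intelligence per euro" argument can be checked with back-of-the-envelope arithmetic. The numbers below are hypothetical parameters, not measurements. The sketch assumes that 80% of steps are routine and that an SLM call costs 1/20 of a frontier call, which falls inside the 10–30× range cited above.

```python
def blended_cost_ratio(routine_share: float, slm_price_ratio: float) -> float:
    """Cost of a routed workload relative to sending every step to the frontier.

    routine_share: fraction of steps an SLM can handle (hypothetical).
    slm_price_ratio: SLM cost per call divided by frontier cost per call.
    The frontier-only baseline is 1.0 by definition.
    """
    return routine_share * slm_price_ratio + (1 - routine_share) * 1.0

ratio = blended_cost_ratio(routine_share=0.8, slm_price_ratio=1 / 20)
print(f"routed cost = {ratio:.0%} of frontier-only")  # routed cost = 24% of frontier-only
```

In this illustrative case, the 20% of hard steps account for 0.20 of the 0.24. Once routing is in place, the share of steps that truly need the frontier drives the bill far more than the SLM price does.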
The catch: most mid-market companies don't take advantage of this. They have (a) a hard-wired model, which means lock-in and no way to switch when prices fall, (b) no eval to tell whether a smaller model would do, and (c) no visibility into per-step cost. The result: they pay frontier prices for SLM work.
Honest limit: for hard, broad tasks (open-ended reasoning) the large model remains superior — inference compute substitutes for parameters only partially. The routing must respect that boundary, not ignore it.
Our Solution
A gateway that routes per task instead of blanket-calling the large model — self-hosted, model-agnostic (OpenRouter/LiteLLM as an abstraction over 400+ models). Three building blocks:
1. Eval-driven routing. Each task type has a small, honest test. The gateway picks the smallest model that passes it — SLM for routine, frontier only for the demonstrably hard step. "Passes my test" instead of "feels safe."
2. Frontier on demand with escalation. Uncertain or difficult cases automatically escalate one tier up, so you pay the top price only for the cases that need it.
3. Cost & switching layer. A dashboard shows cost per step, model, and day. Because the model sits behind an abstraction, switching providers is configuration, not a rebuild — you profit from the price collapse instead of being at its mercy.
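Building block 1 can be sketched as an offline selection step. For each task type, run its eval set against the candidate models, ordered from cheapest to most expensive, and keep the first model that meets a pass threshold. The model names, the `PASS_THRESHOLD` of 0.95, and the stub `fake_call` are hypothetical stand-ins for real models and API calls.

```python
from typing import Callable

# Candidate models, cheapest first (hypothetical names).
CANDIDATES = ["slm-7b", "mid-70b", "frontier"]
PASS_THRESHOLD = 0.95  # hypothetical: share of eval cases that must pass

EvalCase = tuple[str, str]  # (input, expected answer)
ModelFn = Callable[[str, str], str]  # (model, prompt) -> answer

def pass_rate(model: str, cases: list[EvalCase], call_model: ModelFn) -> float:
    """Fraction of eval cases the model answers exactly right."""
    passed = sum(
        call_model(model, prompt).strip().lower() == expected.lower()
        for prompt, expected in cases
    )
    return passed / len(cases)

def pick_smallest_passing(cases: list[EvalCase], call_model: ModelFn) -> str:
    """Return the cheapest candidate whose pass rate meets the threshold."""
    for model in CANDIDATES:
        if pass_rate(model, cases, call_model) >= PASS_THRESHOLD:
            return model
    return CANDIDATES[-1]  # nothing passed: fall back to the largest model

def fake_call(model: str, prompt: str) -> str:
    # Stub standing in for a real API call: answers the invoice check correctly.
    return "yes" if "INV-" in prompt else "no"

cases = [("Is this an invoice? INV-123", "yes"),
         ("Is this an invoice? Lunch menu", "no")]
print(pick_smallest_passing(cases, fake_call))  # slm-7b
```

Exact-match scoring only suits closed answers like yes/no. Extraction or formatting tasks would need their own comparison, which is the "small, honest test" per task type described above.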
Honest framing: the gateway is not a quality guarantee; it is only as good as your evals. And a router that is invisible to your application is not free of dependency: a router you don't own is a new lock-in. Hence the gateway is self-hosted and its evals are documented.
Key Features
Eval-driven routing
A small, honest test per task type. The gateway picks the smallest model that passes it — a decision by measurement, not gut feel.
Automatic escalation
Uncertain or difficult cases climb tier by tier to the frontier. You pay the top price only for the tasks that genuinely need it.
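Automatic escalation can be sketched as a loop. It starts at a task's tier and climbs one tier whenever a check rejects the answer. Here the check is a hypothetical confidence score returned by a model stub. A real setup might instead use schema validation or a task-specific test. The tier names, the 0.8 threshold, and `call_model` are assumptions.

```python
from typing import Callable

TIERS = ["slm-7b", "mid-70b", "frontier"]  # hypothetical, cheapest first
MIN_CONFIDENCE = 0.8  # hypothetical acceptance threshold

# call_model returns (answer, confidence); a stand-in for a real API call.
ModelFn = Callable[[str, str], tuple[str, float]]

def answer_with_escalation(prompt: str, start_tier: int,
                           call_model: ModelFn) -> tuple[str, str]:
    """Return (answer, model_used), escalating one tier at a time."""
    answer, model = "", TIERS[start_tier]
    for model in TIERS[start_tier:]:
        answer, confidence = call_model(model, prompt)
        if confidence >= MIN_CONFIDENCE:
            return answer, model
    # Even the top tier was unsure: return its answer. A real gateway could
    # flag this case for human review instead.
    return answer, model

def stub(model: str, prompt: str) -> tuple[str, float]:
    # Stub: only the frontier model is confident on this prompt.
    return ("42", 0.9) if model == "frontier" else ("?", 0.4)

print(answer_with_escalation("hard question", 0, stub))  # ('42', 'frontier')
```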
Model abstraction
400+ models behind one API (OpenRouter / LiteLLM). Switching providers is configuration, not a code rebuild — you follow the price collapse instead of sitting in lock-in.
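The point of the abstraction is that code refers only to logical tiers, while a configuration maps each tier to a concrete provider and model. The sketch below assumes a hypothetical JSON config. The `provider/model` strings are placeholders, not real model IDs.

```python
import json

# Hypothetical routing configs: application code only knows the tier names.
CONFIG_V1 = '{"routine": "provider-a/small-model", "hard": "provider-b/large-model"}'
CONFIG_V2 = '{"routine": "provider-c/small-model", "hard": "provider-b/large-model"}'

class ModelRegistry:
    def __init__(self, config_json: str) -> None:
        self._tiers: dict[str, str] = json.loads(config_json)

    def resolve(self, tier: str) -> str:
        """Map a logical tier to the concrete model ID from the config."""
        return self._tiers[tier]

before = ModelRegistry(CONFIG_V1)
after = ModelRegistry(CONFIG_V2)  # provider swap: the config changed, the code did not
print(before.resolve("routine"))  # provider-a/small-model
print(after.resolve("routine"))   # provider-c/small-model
```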
Cost dashboard
Spend per step, model, and day — live. No surprise on the monthly bill, but a cost curve you can steer.
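The dashboard needs a cost figure for every call. One common way to derive it is token usage multiplied by per-token prices. The sketch assumes prices quoted per million input and output tokens. The price table is entirely hypothetical and does not reflect any provider's actual pricing.

```python
from dataclasses import dataclass

# Hypothetical prices in EUR per 1M tokens: (input, output). Not real pricing.
PRICES = {
    "slm-7b": (0.10, 0.20),
    "frontier": (2.50, 10.00),
}

@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int

def call_cost_eur(u: Usage) -> float:
    """Cost of one call from its token counts and the per-million prices."""
    price_in, price_out = PRICES[u.model]
    return (u.input_tokens * price_in + u.output_tokens * price_out) / 1_000_000

print(f"{call_cost_eur(Usage('slm-7b', 1_200, 5)):.6f}")    # 0.000121
print(f"{call_cost_eur(Usage('frontier', 1_200, 5)):.6f}")  # 0.003050
```

With these made-up prices, the same 1,200-token yes/no classification costs about 25 times as much on the frontier tier.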
Results
Possible setup, not a packaged product
The figures shown are target values and expected magnitudes for a possible setup – based on industry benchmarks, public studies of comparable setups, and our own tests on a real stack. They are not measured outcomes from a specific customer project; actual results depend on company size, process maturity, and integration depth. We do not offer this setup as a packaged product. We help teams design, automate, and run such processes themselves – through architecture consulting, workshops, and implementation support with n8n. For regulated third-party systems with certification or license requirements (e.g. HIS, gematik, DATEV-certified), we partner with specialized providers.
From "everything to the expensive large model" to "the smallest model that passes the test per step, frontier only on demand" — with full cost visibility and one-click provider swap. Demo values, illustrative.
Integrations
Seamless connection to your existing infrastructure
n8n (self-hosted)
Orchestration: the central engine for classification, routing rules, escalation, and logging
OpenRouter / LiteLLM
Model abstraction: one API over 400+ models; switch providers by configuration
SLM + frontier models
Inference: small models for routine tasks, frontier models on demand for hard ones
Eval suite
Quality control: task-specific tests decide which model a task is routed to
PostgreSQL
Cost/eval log: record of every call with model, cost, and test result
Grafana
Cost dashboard: spend per step, model, and day, visible live
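The cost/eval log and the dashboard panels boil down to one table and a grouped query. The sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a server. The table name `calls`, its columns, and the rows are hypothetical. The same `GROUP BY` should carry over to PostgreSQL with minor type changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE calls (
        day TEXT, step TEXT, model TEXT,
        cost_eur REAL, eval_passed INTEGER
    )
""")
rows = [  # illustrative rows, not real data
    ("2025-01-01", "classify", "slm-7b", 0.0001, 1),
    ("2025-01-01", "classify", "slm-7b", 0.0001, 1),
    ("2025-01-01", "reasoning", "frontier", 0.0030, 1),
    ("2025-01-02", "classify", "frontier", 0.0030, 1),  # an escalated call
]
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?, ?)", rows)

# Spend and eval pass rate per day, step, and model: the shape a panel would query.
spend = conn.execute("""
    SELECT day, step, model, COUNT(*) AS calls,
           SUM(cost_eur) AS spend, AVG(eval_passed) AS pass_rate
    FROM calls
    GROUP BY day, step, model
    ORDER BY day, step, model
""").fetchall()
for row in spend:
    print(row)
```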
Security & Compliance
Built for enterprise security requirements
Data sovereignty through self-hosting
Gateway, evals, and logs run self-hosted. With LiteLLM even the model abstraction stays in-house — no obligation to send prompts through someone else's router.
No lock-in by design
Models and providers sit behind an abstraction and are swappable by configuration. The most expensive lock-in is the one you write into your code — this setup deliberately avoids it.
Auditable cost & eval log
Every call is logged with model, cost, and eval result — traceable which model handled which task with what outcome.
No quality promise without evals
The routing is only as good as the tests behind it. Framed honestly: the evals are the core — without them, "smaller model" is a gamble; with them, a measurement.
Related Showcases
Agent Permission Gateway — securing AI agents with least privilege
A self-hosted control layer between AI agents and their tools: narrow per-tool rights, short-lived just-in-time tokens, per-step authorization, a human-in-the-loop gate before irreversible actions, and a tamper-evident audit log. Demo reference architecture, not a certified product.
EU AI Act Compliance Cockpit – Prove Your Obligations, Automatically
Automate the EU AI Act and GDPR obligations themselves: a central AI registry, an immutable audit trail, a human-in-the-loop gate for automated decisions, and automatic Article 50 transparency notices. Deadline-safe instead of a spreadsheet.