
Model Router Gateway — the smallest model that passes the test, per step

A model-agnostic gateway that routes each task to the cheapest model passing your eval — SLM for routine, frontier only for the hard step. Plus a cost dashboard and one-click provider swap. Demo reference architecture.

Tags: Model routing, SLM, cost control, eval-driven, model-agnostic, n8n
Industry: Cross-industry / AI operations
Implementation: 3-5 weeks
Model choice: smallest that passes the test

Most automations send every step to the same expensive large model, even the trivial "is this an invoice? yes/no." That is like taking a truck on the bread run.

This demo shows the opposite: a gateway that decides per step which model is enough — and calls the frontier only when the task genuinely needs it.

How the gateway routes a task

Each task is classified and sent through the model-agnostic gateway. An eval decides: is a cheap SLM enough — or does the task escalate to the frontier? Cost and result land in the log.
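The flow above can be sketched in a few lines. This is a minimal illustration, not the demo's implementation: the routing table, the keyword classifier, the stub models, and the prices are all assumptions made up for the example. In the setup described here, the routing table would be the output of the eval rather than hand-written.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical routing table: task type -> tier (would come from the eval).
ROUTES = {"classify_invoice": "slm", "summarize_contract": "frontier"}


@dataclass
class LogEntry:
    task_type: str
    tier: str
    cost_eur: float
    result: str


def classify(task_text: str) -> str:
    """Toy classifier (assumption): keyword match instead of a real model."""
    if "invoice" in task_text.lower():
        return "classify_invoice"
    return "summarize_contract"


def route(task_text: str,
          models: dict[str, Callable[[str], tuple[str, float]]],
          log: list[LogEntry]) -> str:
    task_type = classify(task_text)
    # XOR gateway: exactly one branch is taken; unknown types go to the frontier.
    tier = ROUTES.get(task_type, "frontier")
    result, cost = models[tier](task_text)
    # Cost and result land in the log, one entry per call.
    log.append(LogEntry(task_type, tier, cost, result))
    return result


# Stub models returning (answer, cost in EUR); the prices are invented.
models = {
    "slm": lambda text: ("yes", 0.0001),
    "frontier": lambda text: ("summary of clauses", 0.01),
}
log: list[LogEntry] = []
route("Is this an invoice? Total due: 120 EUR", models, log)
route("Summarize the liability clauses of this agreement", models, log)
```

Defaulting unknown task types to the frontier is a deliberately conservative choice in this sketch: a task without an eval has no evidence that a smaller model is enough.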

BPMN elements in the diagram: Trigger (start event), Processing (task), Integration (service task), Output (end event), Gateway (XOR, exclusive).

Before vs. After

Aspect | Before | After
Model choice | one large model for everything | routed to fit per step
Routine tasks | frontier price | SLM price
Hard tasks | same model as routine | targeted frontier escalation
Provider switch | code rebuild | configuration
Cost visibility | lump bill at month-end | cost per step, live

The Challenge

The industry is shifting from "more parameters" to "intelligence per euro." A 7B SLM is roughly 10–30× cheaper than a 70–175B model (NVIDIA, 2025) and is enough for the repetitive routine tasks that make up most of agent work — classify, extract, route, format. Meanwhile the price for a fixed capability level is falling dramatically (LLMflation: roughly 10×/year).
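To see what the 10–30× price gap means for a mixed workload, a rough back-of-the-envelope model helps. The sketch below assumes every call has the same token volume and that a share of calls can move to the SLM; the 80% share used in the loop is a made-up illustration, not a measured value.

```python
def relative_cost(slm_share: float, price_ratio: float) -> float:
    """Blended cost as a fraction of the all-frontier baseline.

    slm_share:   fraction of calls handled by the SLM (0..1), hypothetical input.
    price_ratio: how many times cheaper the SLM is per call (e.g. 10 or 30).
    """
    if not 0.0 <= slm_share <= 1.0:
        raise ValueError("slm_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    # SLM calls cost 1/price_ratio of a frontier call; the rest stay at full price.
    return slm_share / price_ratio + (1.0 - slm_share)


for ratio in (10, 30):
    print(ratio, round(relative_cost(0.8, ratio), 3))
```

Under these assumptions the bill drops to roughly 28% of the baseline at 10× and roughly 23% at 30×. The term `1 - slm_share` is a floor: however cheap the SLM becomes, the frontier share of the workload still sets the lower bound.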

The catch: most mid-market companies don't use this. They have (a) a model hard-wired — lock-in, no switching when prices fall, (b) no eval to know whether a smaller model would do, and (c) no visibility into per-step cost. The result: they pay frontier prices for SLM work.

Honest limit: for hard, broad tasks (open-ended reasoning) the large model remains superior — inference compute substitutes for parameters only partially. The routing must respect that boundary, not ignore it.

Our Solution

A gateway that routes per task instead of blanket-calling the large model — self-hosted, model-agnostic (OpenRouter/LiteLLM as an abstraction over 400+ models). Three building blocks:

1. Eval-driven routing. Each task type has a small, honest test. The gateway picks the smallest model that passes it — SLM for routine, frontier only for the demonstrably hard step. "Passes my test" instead of "feels safe."

2. Frontier on demand with escalation. Uncertain or difficult cases automatically escalate a tier upward. So you pay the top price only for the cases that need it.

3. Cost & switching layer. A dashboard shows cost per step, model, and day. Because the model sits behind an abstraction, switching providers is configuration, not a rebuild — you profit from the price collapse instead of being at its mercy.

Honest framing: the gateway is not a quality guarantee; it is only as good as your evals. And a router that stays invisible to your workflows is not free of dependency: a router you don't own is a new lock-in. Hence self-hosted and with documented evals.

Key Features

Eval-driven routing

A small, honest test per task type. The gateway picks the smallest model that passes it — a decision by measurement, not gut feel.
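A minimal sketch of "smallest model that passes the test", assuming exact-match scoring on a handful of labeled cases and candidates ordered from cheapest to most expensive. The stub models, the sample cases, and the 0.95 threshold are illustrative assumptions; a real eval would call actual models and use far more cases.

```python
from typing import Callable, Optional

Model = Callable[[str], str]


def pass_rate(model: Model, cases: list[tuple[str, str]]) -> float:
    """Share of eval cases where the model output equals the expected label."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    hits = sum(1 for prompt, expected in cases
               if model(prompt).strip().lower() == expected)
    return hits / len(cases)


def smallest_passing(candidates: list[tuple[str, Model]],
                     cases: list[tuple[str, str]],
                     threshold: float) -> Optional[str]:
    """Return the first (cheapest) candidate whose pass rate meets the threshold."""
    for name, model in candidates:
        if pass_rate(model, cases) >= threshold:
            return name
    return None  # nothing passes: fix the eval or extend the candidate list


# Stub models standing in for real ones (assumption), cheapest first.
def tiny(prompt: str) -> str:
    return "yes" if "invoice" in prompt.lower() else "no"


def small(prompt: str) -> str:
    words = ("invoice", "rechnung")
    return "yes" if any(w in prompt.lower() for w in words) else "no"


candidates = [("tiny-stub", tiny), ("small-stub", small), ("large-stub", small)]
cases = [
    ("Invoice no. 4711, total 120 EUR", "yes"),
    ("Meeting notes from Tuesday", "no"),
    ("Rechnung 2024-17, Betrag 80 EUR", "yes"),
]
chosen = smallest_passing(candidates, cases, threshold=0.95)
```

Here `tiny-stub` misses the German invoice and fails the threshold, so the selection lands on `small-stub`. The larger model is never evaluated, which keeps the eval run itself cheap.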

Automatic escalation

Uncertain or difficult cases climb tier by tier to the frontier. You pay the top price only for the tasks that genuinely need it.
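Tier-by-tier escalation could look roughly like the sketch below. It assumes each tier returns an answer together with a confidence score between 0 and 1; how that score is obtained (a validator, a self-check, a second eval) is left open. The three stub tiers, their cost units, and the 0.8 threshold are made up for illustration.

```python
from typing import Callable

# A tier takes a prompt and returns (answer, confidence 0..1, cost in units).
Tier = Callable[[str], tuple[str, float, float]]


def run_with_escalation(prompt: str,
                        ladder: list[tuple[str, Tier]],
                        min_confidence: float) -> dict:
    total_cost = 0.0
    for i, (name, tier) in enumerate(ladder):
        answer, confidence, cost = tier(prompt)
        total_cost += cost  # attempts on lower tiers are paid for as well
        is_last = i == len(ladder) - 1
        # Accept a confident answer, or whatever the top tier says.
        if confidence >= min_confidence or is_last:
            return {"answer": answer, "tier": name,
                    "cost": total_cost, "attempts": i + 1}
    raise ValueError("ladder must contain at least one tier")


def slm(prompt: str) -> tuple[str, float, float]:
    # Toy confidence (assumption): short prompts count as easy.
    return ("yes", 0.9 if len(prompt) < 40 else 0.4, 1.0)


def mid(prompt: str) -> tuple[str, float, float]:
    return ("maybe", 0.6, 5.0)


def frontier(prompt: str) -> tuple[str, float, float]:
    return ("detailed answer", 0.95, 30.0)


ladder = [("slm", slm), ("mid", mid), ("frontier", frontier)]
easy = run_with_escalation("Is this an invoice?", ladder, 0.8)
hard = run_with_escalation(
    "Compare the liability clauses across these three supplier contracts",
    ladder, 0.8)
```

One consequence worth noting: an escalated task costs more than a direct frontier call, because the failed lower-tier attempts are added on top. Escalation pays off only when most tasks stop at a lower tier, which is another reason to route known-hard task types straight to the frontier.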

Model abstraction

400+ models behind one API (OpenRouter / LiteLLM). Switching providers is configuration, not a code rebuild — you follow the price collapse instead of sitting in lock-in.
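"Configuration, not a code rebuild" can be illustrated with tier aliases: workflows ask for a tier, and only the configuration knows which concrete model sits behind it. In this sketch the class name `Gateway`, the JSON layout, and the model identifiers are hypothetical placeholders, and the transport is an injected function rather than a real OpenRouter or LiteLLM call.

```python
import json
from typing import Callable

# Hypothetical configuration; model identifiers are placeholders.
CONFIG_V1 = '{"slm": "provider-a/small-7b", "frontier": "provider-a/large"}'
# Swapping the SLM provider is a one-line config change:
CONFIG_V2 = '{"slm": "provider-b/small-8b", "frontier": "provider-a/large"}'


class Gateway:
    def __init__(self, config_json: str, send: Callable[[str, str], str]):
        self.aliases = json.loads(config_json)
        # Injected transport (assumption): in practice this would wrap
        # the HTTP call to the model abstraction layer.
        self.send = send

    def complete(self, tier: str, prompt: str) -> str:
        model = self.aliases[tier]  # raises KeyError for an unknown tier
        return self.send(model, prompt)


def fake_send(model: str, prompt: str) -> str:
    """Stand-in transport that just echoes which model was addressed."""
    return f"[{model}] {prompt}"


def workflow(gateway: Gateway) -> str:
    # Workflow code only knows the tier name, never the provider.
    return gateway.complete("slm", "Is this an invoice?")


before = workflow(Gateway(CONFIG_V1, fake_send))
after = workflow(Gateway(CONFIG_V2, fake_send))
```

The `workflow` function is identical in both runs; only the configuration string differs. A swap should still be followed by a rerun of the evals, since a new model behind the same alias has not yet passed the test.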

Cost dashboard

Spend per step, model, and day, live. Instead of a surprise on the monthly bill, you get a cost curve you can steer.
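The dashboard view boils down to one grouped query over the call log. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere; the table name `call_log`, its columns, and the sample rows are assumptions for illustration, not the demo's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE call_log (
        called_at   TEXT,     -- ISO timestamp of the call
        step        TEXT,     -- workflow step, e.g. 'classify'
        model       TEXT,     -- tier or model that handled the call
        cost_eur    REAL,
        eval_passed INTEGER   -- 1 if the eval accepted the result
    )
""")

# Invented sample rows.
rows = [
    ("2025-01-06T09:00:00", "classify", "slm", 0.0002, 1),
    ("2025-01-06T09:05:00", "classify", "slm", 0.0002, 1),
    ("2025-01-06T09:06:00", "summarize", "frontier", 0.0150, 1),
    ("2025-01-07T10:00:00", "classify", "slm", 0.0002, 0),
]
conn.executemany("INSERT INTO call_log VALUES (?, ?, ?, ?, ?)", rows)

# Spend per day, step, and model: the grouping a cost panel would chart.
SPEND_QUERY = """
    SELECT substr(called_at, 1, 10) AS day, step, model,
           COUNT(*) AS calls,
           ROUND(SUM(cost_eur), 6) AS spend_eur
    FROM call_log
    GROUP BY day, step, model
    ORDER BY day, step, model
"""
spend = conn.execute(SPEND_QUERY).fetchall()
for row in spend:
    print(row)
```

In PostgreSQL the day would typically come from a real timestamp column (for example via a cast to date) instead of `substr`, and Grafana would run a query of this shape against the log table on each refresh.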

Results

Possible setup, not a packaged product

The figures shown are target values and expected magnitudes for a possible setup – based on industry benchmarks, public studies of comparable setups, and our own tests on a real stack. They are not measured outcomes from a specific customer project; actual results depend on company size, process maturity, and integration depth. We do not offer this setup as a packaged product. We help teams design, automate, and run such processes themselves – through architecture consulting, workshops, and implementation support with n8n. For regulated third-party systems with certification or license requirements (e.g. HIS, gematik, DATEV-certified), we partner with specialized providers.

Routine steps: on SLM
Frontier calls: hard cases only
Provider swap: configuration
Cost per step: visible

From "everything to the expensive large model" to "the smallest model that passes the test per step, frontier only on demand" — with full cost visibility and one-click provider swap. Demo values, illustrative.

Integrations

Seamless connection to your existing infrastructure

Component | Role | What it does
n8n (self-hosted) | Orchestration | Central engine: classification, routing rules, escalation, and logging
OpenRouter / LiteLLM | Model abstraction | One API over 400+ models; switch providers by configuration
SLM + frontier models | Inference | Small models for routine, frontier on demand for hard tasks
Eval suite | Quality control | Task-specific tests decide which model is allowed to handle a task type
PostgreSQL | Cost/eval log | Record of all calls with model, cost, and test result
Grafana | Cost dashboard | Spend per step, model, and day, visible live

Security & Compliance

Enterprise-ready, with the highest security standards

Data sovereignty through self-hosting

Gateway, evals, and logs run self-hosted. With LiteLLM even the model abstraction stays in-house — no obligation to send prompts through someone else's router.

No lock-in by design

Models and providers sit behind an abstraction and are swappable by configuration. The most expensive lock-in is the one you write into your code — this setup deliberately avoids it.

Auditable cost & eval log

Every call is logged with model, cost, and eval result — traceable which model handled which task with what outcome.
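A log record of this kind might be built as follows. The sketch assumes cost is derived from token counts and per-million-token prices; the price table, the field names, and the `CallRecord` class are hypothetical, and real prices would come from the provider's current price list.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical prices in EUR per million tokens; not real provider prices.
PRICES = {
    "slm": {"input": 0.10, "output": 0.20},
    "frontier": {"input": 3.00, "output": 12.00},
}


@dataclass(frozen=True)
class CallRecord:
    step: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_eur: float
    eval_passed: bool


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times price, scaled from per-million pricing."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def make_record(step: str, model: str, input_tokens: int,
                output_tokens: int, eval_passed: bool) -> CallRecord:
    cost = call_cost(model, input_tokens, output_tokens)
    return CallRecord(step, model, input_tokens, output_tokens, cost, eval_passed)


record = make_record("classify", "slm", input_tokens=800,
                     output_tokens=5, eval_passed=True)
line = json.dumps(asdict(record))  # one JSON line per call, ready to store
```

Storing token counts next to the computed cost keeps the log auditable after a price change: the cost can be recomputed from the raw counts instead of being trusted as a stored number.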

No quality promise without evals

The routing is only as good as the tests behind it. Framed honestly: the evals are the core — without them, "smaller model" is a gamble; with them, a measurement.

Technology Stack

n8n (self-hosted), OpenRouter / LiteLLM, small + frontier models, eval suite, PostgreSQL (cost/eval log), Grafana (cost dashboard)

Frequently Asked Questions

In the demo, qualitatively yes, because routine work moves to cheap models. The real saving depends on your task mix — which is why we measure it with your evals instead of promising a number.
Only if your eval is poor. The routing uses a smaller model exclusively when it passes the test — otherwise it escalates to the frontier. Quality hangs on the test, not the model name.
The abstraction is itself swappable (LiteLLM self-hosted as an alternative). That's the whole point: neither a model nor a gateway as lock-in. Self-hosted keeps control with you.
No — a demo reference architecture. It shows the principle; routing rules and evals are tailored to your tasks. We offer architecture, workshops, and implementation, not an off-the-shelf product.
