Benchmark-Driven Routing
for OpenClaw

Stop defaulting to one model for everything. The OpenMark router uses real evaluation data from your own tasks to route each prompt to the best model — with fallbacks, cost savings, and full visibility.

See It In Action

Watch how the router classifies tasks and picks optimal models from your benchmark data.

What the Router Does

Most routing solutions use "simple vs complex" heuristics or generic capability tiers to pick a model. That's a guess dressed as a system. The OpenMark router for OpenClaw takes a different approach: it uses your actual benchmark results to make every routing decision.

You benchmark your recurring tasks on OpenMark AI, export the results, and the router uses that data to match incoming prompts to the best-performing model for each task category. No keyword matching, no complexity scoring — just measured performance on your real work.

Key difference: the router doesn't guess which model is "good enough" — it knows, because you already tested it on your own task with deterministic scoring.

What You See

You send a prompt. The router classifies it, finds the matching benchmark, picks the winner, and the routed model answers — all in a single turn. A routing card shows what happened:

Routed to gpt-5.4-nano (openai) — Content Creation Benchmark
Benchmark: 92.9% score  |  $0.002731/call  |  30.28s

Why this route: better score than gemini-3.1-pro, 97.6% cheaper, 4.2x faster
Over 10K calls: $27.31 vs $1148.36

Strategy: balanced  |  Benchmark data: fresh

[actual response from gpt-5.4-nano follows here...]

The routed model generates the real reply. The classifier only identifies the task category — it never produces the user-visible answer.
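The savings figures on the routing card are plain arithmetic over the benchmark's per-call costs. A minimal sketch, using the example numbers from the card above (the baseline per-call cost is implied by the 10K-call total):

```python
# Per-call costs from the example routing card.
routed_cost = 0.002731    # gpt-5.4-nano, $/call (from the card)
baseline_cost = 0.114836  # gemini-3.1-pro, $/call (implied by $1148.36 / 10K)

calls = 10_000
routed_total = routed_cost * calls      # projected spend with the routed model
baseline_total = baseline_cost * calls  # projected spend with the baseline

savings_pct = (1 - routed_cost / baseline_cost) * 100
print(f"${routed_total:.2f} vs ${baseline_total:.2f}, {savings_pct:.1f}% cheaper")
```

Running this reproduces the card's "$27.31 vs $1148.36" and "97.6% cheaper" figures.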

How It Works

The plugin uses an internal two-phase architecture. To the user, it looks like a single reply.

Phase 1
Classify & Route — A lightweight LLM call (through OpenClaw's gateway) classifies the user message against your benchmark category names. The deterministic routing engine then ranks available models by your chosen strategy and selects the winner plus fallbacks. This takes ~60ms after classification.
Phase 2
Generate — OpenClaw immediately runs the real reply with the routed model, using full session context, system prompt, and conversation history. Authentication and streaming are handled by OpenClaw.

No direct provider API calls from the plugin. Classification goes through the OpenClaw gateway. Provider authentication and model execution stay inside OpenClaw — you don't hand API keys to the router.
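The two-phase flow above can be sketched as follows. This is an illustrative outline, not the plugin's actual API: names like `classify_task` and `rank_models` are hypothetical, the classifier is stubbed out, and the real engine uses the 6-step cascade sort described later rather than a single-key sort.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    fallbacks: list[str]

def classify_task(message: str, categories: list[str]) -> str:
    """Phase 1a: a lightweight LLM call through OpenClaw's gateway maps the
    message to one of your benchmark category names. Stubbed here."""
    return categories[0]  # placeholder for the gateway classification call

def rank_models(benchmark_rows: list[dict], strategy: str = "balanced") -> Route:
    """Phase 1b: deterministic ranking over your benchmark rows.
    A simple sort by score stands in for the real cascade sort."""
    ranked = sorted(benchmark_rows, key=lambda r: r["score"], reverse=True)
    return Route(model=ranked[0]["model"],
                 fallbacks=[r["model"] for r in ranked[1:]])

def route(message: str, benchmarks: dict[str, list[dict]]) -> Route:
    category = classify_task(message, list(benchmarks))
    return rank_models(benchmarks[category])

# Phase 2 (generation) then runs inside OpenClaw with the routed model and
# full session context; the plugin never calls a provider API directly.
```

The key property is that Phase 1 is cheap and deterministic after classification, while Phase 2 is ordinary OpenClaw model execution.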

Quick Start

  1. Benchmark your recurring tasks on OpenMark AI — test across 100+ models with deterministic scoring.
  2. Export — click Export → OpenClaw on the Results tab. The CSV includes dual model keys, scores, costs, and metadata.
  3. Install — run openclaw plugins install openmark-router and restart the gateway.
  4. Import — place CSVs in the benchmarks directory, or use the local dashboard's import flow. The router activates automatically.

That's it. The router registers as a provider, sets openmark/auto as your default model, and starts routing. Unmatched tasks pass through to your original default model unchanged.

Five Routing Strategies

Choose how the router ranks models from your benchmark data:

balanced

Weighted composite: accuracy (40%) + cost-efficiency (20%) + speed (25%) + stability (15%). Best for most workloads.

best_score

Highest benchmark accuracy regardless of cost or speed.

best_cost_efficiency

Best accuracy per dollar among viable models. Models below the viability floor are excluded.

best_under_budget

Highest score within your cost ceiling. Set cost_ceiling in config.

best_under_latency

Highest score within your latency ceiling. Set latency_ceiling_s in config.

All strategies use a 6-step cascade sort and a viability floor (max(top_score - 15pp, top_score * 0.5)) to exclude underperforming models. Fallback models are ranked from the same benchmark data.
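The viability floor and the `balanced` weights can be sketched directly from the numbers above. The floor formula and the 40/20/25/15 weights come from this page; how the engine normalizes cost-efficiency, speed, and stability onto a common scale is not specified, so the 0-100 inputs below are assumptions:

```python
def viability_floor(top_score: float) -> float:
    # From the docs: max(top_score - 15pp, top_score * 0.5), scores in percent.
    return max(top_score - 15.0, top_score * 0.5)

def balanced_score(accuracy: float, cost_eff: float,
                   speed: float, stability: float) -> float:
    # 'balanced' composite: accuracy 40%, cost-efficiency 20%,
    # speed 25%, stability 15%. Inputs assumed pre-normalized to 0-100.
    return 0.40 * accuracy + 0.20 * cost_eff + 0.25 * speed + 0.15 * stability

# Hypothetical normalized benchmark rows for two models.
models = {
    "gpt-5.4-nano":   dict(accuracy=92.9, cost_eff=98.0, speed=90.0, stability=95.0),
    "gemini-3.1-pro": dict(accuracy=91.0, cost_eff=40.0, speed=55.0, stability=97.0),
}

floor = viability_floor(max(m["accuracy"] for m in models.values()))  # 77.9
viable = {name: m for name, m in models.items() if m["accuracy"] >= floor}
winner = max(viable, key=lambda name: balanced_score(**viable[name]))
```

With a top score of 92.9%, the floor is max(77.9, 46.45) = 77.9, so both models stay viable and the composite decides the winner.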

Why Custom Benchmarks Matter for Routing

Every routing solution that uses generic categorization breaks in practice. "Email tasks" lumps cold outreach, complaint triage, and legal notices together — but model performance varies dramatically across these subtypes.

Generic benchmarks are equally broad. MMLU, Arena Elo, and HumanEval test general capabilities. A model scoring well on "writing" tells you nothing about your email templates with your tone requirements.

When you benchmark on OpenMark AI, you test models on your specific task, with your prompts, against your criteria. That's the data the router needs to make decisions you can trust.

Local Dashboard

The router ships with a local dashboard at http://127.0.0.1:2098/dashboard, which includes the benchmark CSV import flow used in the Quick Start steps above.

Works With Your Existing Setup

The router detects which providers your OpenClaw install can use and filters benchmark candidates accordingly. Direct provider keys are preferred; if a model's direct provider isn't available but OpenRouter is, the router falls back to the OpenRouter key for that row.

Single-provider setups still benefit — you can benchmark and route within one provider's model lineup. The router is also useful with subscriptions, hosted access, or OAuth-backed providers, as long as OpenClaw can execute the model IDs involved.

Frequently Asked Questions

What is the OpenClaw Model Router?

An open-source plugin for OpenClaw that routes prompts to the best AI model for each task category, using benchmark results from OpenMark AI. It uses a lightweight classifier to identify the task, then deterministically selects the optimal model from your data.

Do I need API keys from multiple providers?

No. The router works with whatever providers your OpenClaw install already has configured. Single-provider setups still benefit by routing within that provider's lineup. OpenRouter fallback is supported when benchmark rows include OR keys.

How much can I save?

Savings depend on your workload. Teams overusing flagship models for routine tasks commonly see 50-80% cost reduction. The router doesn't guarantee specific savings — it routes based on your measured benchmark data.

Is it open source?

Yes. Apache-2.0 licensed on GitHub. The Python routing engine uses only stdlib — no pip dependencies, no external network calls for model ranking.

Does it handle my API keys?

No. All model execution goes through OpenClaw's existing auth and gateway. The plugin never asks for or directly uses provider API keys. It reads local benchmark CSVs and communicates with OpenClaw's local gateway for classification.

Start Routing With Real Data

Benchmark your recurring tasks, export for OpenClaw, and let the router handle model selection.
100+ models, deterministic scoring, real cost tracking.
50 free credits — no API keys, no setup.

Start Benchmarking — Free →