Build a Task-Aware LLM Router in an Afternoon With the Promptster API
By Promptster Team · 2026-05-03
In our cost-per-quality analysis we showed that the 300x price spread between the cheapest and priciest models doesn't buy you 300x quality. For a lot of tasks it buys you nothing. The teams getting the best cost-to-quality ratio don't pick a model — they route work. Easy prompts go to nano tier. Hard prompts go to frontier. Classification decides.
This post builds that router. 60 lines of Python. Uses the Promptster API so you don't have to maintain eleven SDKs. Runs today.
The Architecture
┌───────────────┐
│ user prompt │
└───────┬───────┘
│
▼
┌───────────────────────────┐
│ classifier (nano model) │
│ → returns task_type label │
└───────┬───────────────────┘
│
▼
┌───────────────────────────┐
│ route to appropriate tier │
│ code → GPT-4o Mini │
│ math → Claude Sonnet │
│ extraction → Gemini Lite │
│ creative → Sonnet │
│ factual → Perplexity │
│ default → GPT-4o Mini │
└───────┬───────────────────┘
│
▼
┌───────────────┐
│ final answer │
└───────────────┘
Two LLM calls per request: one nano-cheap classifier (~$0.0001), one execution at the right tier. Classification overhead is roughly 1-2% of total request cost, and the savings from keeping easy work off frontier models are 10-100x.
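A quick back-of-envelope check (the per-call prices below are illustrative assumptions consistent with the ~$0.0001 classifier figure above, not anyone's published rates):

classifier_cost = 0.0001   # one nano-tier classification call
execution_cost = 0.005     # assumed cost of a typical mid-tier execution call
frontier_cost = 0.05       # assumed cost of the same request on a frontier model

overhead = classifier_cost / (classifier_cost + execution_cost)
print(f"classification overhead: {overhead:.1%}")     # ~2%

saving = frontier_cost / (classifier_cost + execution_cost)
print(f"saving vs. always-frontier: {saving:.0f}x")   # ~10x at these prices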
The Code
Requires a Promptster API key (pk_live_* or pk_test_*) and the requests library.
import os
import requests
from typing import Literal
PROMPTSTER_KEY = os.environ["PROMPTSTER_API_KEY"]
BASE_URL = "https://www.promptster.dev/v1"
TaskType = Literal["code", "math", "extraction", "creative", "factual", "general"]
# Routing table — tier per task type.
# Values are (provider, model) pairs to send the prompt to.
ROUTES: dict[TaskType, tuple[str, str]] = {
"code": ("openai", "gpt-4o-mini"),
"math": ("anthropic", "claude-sonnet-4-5"),
"extraction": ("google", "gemini-2.5-flash-lite"),
"creative": ("anthropic", "claude-sonnet-4-5"),
"factual": ("perplexity","sonar"), # web-connected
"general": ("openai", "gpt-4o-mini"),
}
CLASSIFIER_MODEL = ("openai", "gpt-4o-mini") # could be gpt-5-nano if available
def call_promptster(provider: str, model: str, prompt: str,
temperature: float = 0.3, max_tokens: int = 1000) -> dict:
"""Send a prompt to the Promptster API and return the response payload."""
r = requests.post(
f"{BASE_URL}/prompts/test",
headers={"Authorization": f"Bearer {PROMPTSTER_KEY}"},
json={
"provider": provider,
"model": model,
"prompt": prompt,
"temperature": temperature,
"max_tokens": max_tokens,
},
timeout=60,
)
r.raise_for_status()
return r.json()
def classify(prompt: str) -> TaskType:
"""Ask a cheap model to classify the task shape."""
classifier_prompt = f"""Classify this prompt into exactly one of these task types:
- code: writing, debugging, or refactoring source code
- math: arithmetic, logic, or quantitative reasoning
- extraction: pulling structured fields from unstructured text
- creative: writing, storytelling, persuasion, rewriting tone
- factual: answering factual questions about the real world
- general: everything else
Prompt: {prompt}
Reply with ONLY the label, lowercase, no punctuation.
"""
provider, model = CLASSIFIER_MODEL
result = call_promptster(provider, model, classifier_prompt,
temperature=0.0, max_tokens=10)
label = result["response"].strip().lower()
if label not in ROUTES:
return "general"
return label # type: ignore[return-value]
def route(prompt: str) -> dict:
"""Classify, route, execute — return full payload incl. metadata."""
task_type = classify(prompt)
provider, model = ROUTES[task_type]
answer = call_promptster(provider, model, prompt)
answer["_router"] = {"task_type": task_type, "routed_to": f"{provider}/{model}"}
return answer
if __name__ == "__main__":
import sys
result = route(sys.argv[1])
print(f"Routed to: {result['_router']['routed_to']}")
print(f"Cost: ${result.get('cost_usd', 0):.6f}")
print(f"Response:\n{result['response']}")
That's the whole thing. Save as router.py, export PROMPTSTER_API_KEY, run:
python router.py "Extract the company name, date, and speaker list from this press release: ..."
# → Routed to: google/gemini-2.5-flash-lite
python router.py "Write a function that validates an IPv4 address in Python"
# → Routed to: openai/gpt-4o-mini
python router.py "What's the boiling point of water at 4000m altitude?"
# → Routed to: perplexity/sonar
Why It Works
The routing table encodes the findings from our task-type decision framework. Each entry reflects a measurement, not a guess:
- Code → GPT-4o Mini: 8-provider IPv4 benchmark showed cheap models match frontier quality on well-specified code. (Data.)
- Math → Claude Sonnet: cleaner chain-of-thought than budget tier; no silent truncation. (Budget-tier math results were correct but verbose to the point of truncation; frontier is more predictable here.)
- Extraction → Gemini 2.5 Flash Lite: cheapest, fastest, and schema-accurate. (Data.)
- Creative → Claude Sonnet: outright winner on formal constraint following in our task-framework test.
- Factual → Perplexity: only model in our 11-provider consensus study that scored 5/5 on a recent-knowledge factual task, because it has web retrieval.
- General → GPT-4o Mini: safe default. Passed every test in our cheap-fast-smart triangle without pathological failure.
If your workload shape differs from ours, replace the routing table. The architecture is what matters; the mappings are tunable.
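For example, if schema errors are expensive in your pipeline and you rarely send creative work, the change is two lines (the model choices below are placeholders, not recommendations; substitute whatever your own benchmarks pick):

# Hypothetical overrides; swap in whatever your own measurements favor.
ROUTES["extraction"] = ("anthropic", "claude-sonnet-4-5")   # pay up where schema errors are costly
ROUTES["creative"] = ("openai", "gpt-4o-mini")              # downgrade a tier you rarely use

Adding a brand-new label (say, "legal") also means extending TaskType and the classifier prompt, since the classifier can only return labels it has been shown.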
When the Router Misfires
Real-world prompts don't always classify cleanly. A few failure modes and fixes:
Ambiguous prompts. "Summarize this code file's logic." Is that extraction or code? Answer: either works — you won't lose quality. The classifier will pick one, and either route is cheap.
Classifier misclassification. The nano classifier has maybe 95% accuracy. The remaining 5% get routed wrong. Usually the wrong route is slightly suboptimal, not broken. If you see consistent misroutes, tighten the classifier prompt or add a second-pass verifier (e.g., if classifier says "creative" and the prompt contains def or {, override to "code").
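A minimal version of that override, layered on the classify() function above (the regex is an illustrative heuristic, not a tested threshold):

import re

CODE_SIGNALS = re.compile(r"\b(def|class|return|import)\b|[{};]")

def classify_with_override(prompt: str) -> TaskType:
    """Run the nano classifier, then apply a cheap deterministic sanity check."""
    label = classify(prompt)
    # If the model says "creative" but the prompt is full of code tokens,
    # trust the code signal: misrouting code to the creative tier hurts more
    # than the reverse.
    if label == "creative" and CODE_SIGNALS.search(prompt):
        return "code"
    return label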
Frontier tier still wrong. Sometimes the prompt is hard enough that even Sonnet/GPT-4o makes errors. Detect with a confidence signal — for extraction tasks, schema validation catches it; for math, a second run with the reasoning tier can serve as a check. We'll cover this in the 3-judge consensus pattern post on May 11.
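For the extraction case specifically, here is a sketch of that check, assuming your extraction prompts ask for raw JSON (the escalation model and required_fields are placeholders):

import json

def extract_with_escalation(prompt: str, required_fields: list[str]) -> dict:
    """Try the cheap extraction route; escalate one tier if validation fails."""
    provider, model = ROUTES["extraction"]
    result = call_promptster(provider, model, prompt)
    try:
        parsed = json.loads(result["response"])
        if all(field in parsed for field in required_fields):
            return result
    except (json.JSONDecodeError, TypeError):
        pass
    # Output didn't parse or is missing fields: re-run once on a stronger model.
    return call_promptster("anthropic", "claude-sonnet-4-5", prompt)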
Scaling the Pattern
The 60-line version above is enough to ship. For production scale, add:
- Caching on classification. Many prompts repeat. Cache hash(prompt) → task_type for 24 hours (see the sketch after this list).
- Fallbacks. If the routed provider returns 5xx, fall back to a different provider at the same tier. The Promptster test endpoint doesn't multi-route on its own; wrap it with retry + alt-provider logic.
- Observability. Log task_type, routed_to, cost_usd, and latency_ms per call. After a week you'll see which task types dominate your traffic and whether your routing table is well-calibrated.
- Async / batch mode. For high-volume ingestion, use the Promptster compare endpoint to route multiple prompts in parallel.
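A sketch of the first two items, caching and same-tier fallback, building on the router above (the TTL, the in-memory cache, and the alternate-provider table are all placeholders; in production you'd likely back the cache with Redis or whatever store you already run):

import hashlib
import time

_CACHE: dict[str, tuple[TaskType, float]] = {}   # prompt hash -> (label, expiry time)
CACHE_TTL = 24 * 3600

FALLBACKS = {   # same-tier alternates; placeholder choices
    ("openai", "gpt-4o-mini"): ("google", "gemini-2.5-flash-lite"),
    ("anthropic", "claude-sonnet-4-5"): ("openai", "gpt-4o"),
}

def classify_cached(prompt: str) -> TaskType:
    """Classify with a 24-hour memo so repeated prompts skip the classifier call."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    hit = _CACHE.get(key)
    if hit and hit[1] > time.time():
        return hit[0]
    label = classify(prompt)
    _CACHE[key] = (label, time.time() + CACHE_TTL)
    return label

def call_with_fallback(provider: str, model: str, prompt: str) -> dict:
    """Execute on the routed provider; on a 5xx, retry once on a same-tier alternate."""
    try:
        return call_promptster(provider, model, prompt)
    except requests.HTTPError as exc:
        alt = FALLBACKS.get((provider, model))
        if alt and exc.response is not None and exc.response.status_code >= 500:
            return call_promptster(*alt, prompt)
        raise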
For monitoring routing quality over time, schedule a weekly comparison test across your top task types so you notice when a provider deprecates a model or a newer model dethrones your current pick.
The Real Win
You don't need LangChain to build an LLM router. You don't need a vector DB. You don't need a meta-learner. You need a cheap classifier, a routing table backed by data, and consistent monitoring.
60 lines of Python and a few dollars a month of API spend will outperform most "we use Claude for everything" production setups on cost — by 10-40x for mixed workloads — with no quality loss on routed work and better quality on hard work.
For the data behind this routing table, start with our 11-provider consensus study, the 300x price spread analysis, and the task-type decision framework.
Code tested against Promptster API v1 as of 2026-04-19. Requires a pk_live_* or pk_test_* API key; get one at /developer/api-keys.