Show HN: NadirClaw, LLM router that cuts costs by routing prompts right
Open-source LLM router that saves you money. Simple prompts go to cheap/local models, complex prompts go to premium models -- automatically.
NadirClaw sits between your AI tool and your LLM providers as an OpenAI-compatible proxy. It classifies every prompt in ~10ms and routes it to the right model. Works with any tool that speaks the OpenAI API: OpenClaw, Codex, Claude Code, Continue, Cursor, or plain curl.
How does NadirClaw compare to OpenRouter? See NadirClaw vs OpenRouter.
Or install from source:
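```bash
# Script URL is illustrative -- use the install script from the project's repository
curl -fsSL https://raw.githubusercontent.com/nadirclaw/nadirclaw/main/install.sh | bash
```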
Then run the interactive setup wizard:
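```bash
nadirclaw setup
```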
This guides you through selecting providers, entering API keys, and choosing models for each routing tier. Then start the router:
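```bash
nadirclaw serve
```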
That's it. NadirClaw starts on http://localhost:8856 with sensible defaults (Gemini 3 Flash for simple, OpenAI Codex for complex). If you skip nadirclaw setup, the serve command will offer to run it on first launch.
This clones the repo to ~/.nadirclaw, creates a virtual environment, installs dependencies, and adds nadirclaw to your PATH. Run it again to update.
NadirClaw loads configuration from ~/.nadirclaw/.env. Create or edit this file to set API keys and model preferences:
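```bash
# ~/.nadirclaw/.env -- the tier variable names below are illustrative;
# the setup wizard writes the exact ones for you
GEMINI_API_KEY=your-key-here
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
```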
If ~/.nadirclaw/.env does not exist, NadirClaw falls back to .env in the current directory.
NadirClaw supports multiple ways to provide LLM credentials, checked in a fixed order of precedence:
Set API keys in ~/.nadirclaw/.env:
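```bash
# Standard provider key names, as used by the underlying SDKs and LiteLLM
GEMINI_API_KEY=...
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
```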
Configure which model handles each tier:
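```bash
# Variable names are illustrative -- check the setup wizard for the exact ones
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
NADIRCLAW_REASONING_MODEL=o3   # optional; used for reasoning-heavy prompts
```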
Gemini models are called natively via the Google GenAI SDK. All other models go through LiteLLM, which supports 100+ providers.
Gemini is the default simple model and, as noted above, is called natively via the Google GenAI SDK for best performance.
If the primary model hits a 429 rate limit, NadirClaw automatically retries once, then falls back to the other tier's model. For example, if gemini-3-flash-preview is exhausted, NadirClaw will try gemini-2.5-pro (or whatever your complex model is). If both models are rate-limited, it returns a friendly error message instead of crashing.
If you're running Ollama locally, NadirClaw works out of the box with no API keys:
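```bash
# Model ids use LiteLLM's "ollama/" prefix; the model choices are illustrative
NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b
NADIRCLAW_COMPLEX_MODEL=ollama/qwen2.5-coder:32b
```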
Or mix local + cloud:
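```bash
NADIRCLAW_SIMPLE_MODEL=ollama/llama3.1:8b   # local model for cheap prompts
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro      # cloud model for hard ones
GEMINI_API_KEY=...
```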
OpenClaw is a personal AI assistant that bridges messaging services to AI coding agents. NadirClaw integrates as a model provider so OpenClaw's requests are automatically routed to the right model.
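To connect them, run the onboarding command:

```bash
nadirclaw openclaw onboard
```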
This writes NadirClaw as a provider in ~/.openclaw/openclaw.json with model nadirclaw/auto. If OpenClaw is already running, it will auto-reload the config -- no restart needed.
nadirclaw openclaw onboard adds this to your OpenClaw config:
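```json
{
  "providers": {
    "nadirclaw": {
      "baseUrl": "http://localhost:8856/v1",
      "models": ["nadirclaw/auto"]
    }
  }
}
```

(The key layout above is a sketch -- the onboard command writes whatever schema your OpenClaw version expects; the base URL and model id come from this README.)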
NadirClaw supports the SSE streaming format that OpenClaw expects (stream: true), handling multi-modal content and tool definitions in system prompts.
Codex is OpenAI's CLI coding agent. NadirClaw integrates as a custom model provider.
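Onboarding follows the same pattern as OpenClaw (the subcommand name here is an assumption by analogy):

```bash
nadirclaw codex onboard
```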
This writes ~/.codex/config.toml:
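```toml
# Sketch of the generated file -- field names follow Codex's
# model_providers convention; verify against your Codex version
model = "nadirclaw/auto"
model_provider = "nadirclaw"

[model_providers.nadirclaw]
name = "NadirClaw"
base_url = "http://localhost:8856/v1"
wire_api = "chat"
```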
To use your ChatGPT subscription instead of an API key:
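```bash
# Subcommand name is an assumption -- check `nadirclaw --help` for the real one
nadirclaw codex login
```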
This delegates to the Codex CLI for the OAuth flow and stores the credentials in ~/.nadirclaw/credentials.json. Tokens are automatically refreshed when they expire.
NadirClaw exposes a standard OpenAI-compatible API. Point any tool at it:
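```bash
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nadirclaw/auto",
    "messages": [{"role": "user", "content": "Explain the CAP theorem tradeoffs"}]
  }'
```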
Choose your routing strategy by setting the model field:
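```
"model": "nadirclaw/auto"      # classify and route automatically (from this README)
"model": "nadirclaw/simple"    # always use the simple-tier model (assumed name)
"model": "nadirclaw/complex"   # always use the complex-tier model (assumed name)
```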
Use short names instead of full model IDs:
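```
# Illustrative aliases -- the actual mapping depends on your configuration
"model": "flash"   # resolves to gemini-3-flash-preview
"model": "pro"     # resolves to gemini-2.5-pro
```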
Beyond basic simple/complex classification, NadirClaw applies routing modifiers that can override the base decision:
NadirClaw detects agentic requests (coding agents, multi-step tool use) and forces them to the complex model, even if the individual message looks simple.
This prevents a message like "now add tests" from being routed to the cheap model when it's part of an ongoing agentic refactoring session.
Prompts with two or more reasoning markers are routed to the reasoning model (or to the complex model if no reasoning model is configured).
Once a conversation is routed to a model, subsequent messages in the same session reuse that model. This prevents jarring mid-conversation model switches. Sessions are keyed by system prompt + first user message, with a 30-minute TTL.
If the estimated token count of a request exceeds a model's context window, NadirClaw automatically swaps to a model with a larger context. For example, a 150k-token conversation targeting gpt-4o (128k context) will be redirected to gemini-2.5-pro (1M context).
Analyze request logs and print a summary report:
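```bash
# Subcommand name is an assumption
nadirclaw report
```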
Example output:
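```
Requests:        1,482
  simple tier:   1,201 (81%)
  complex tier:    281 (19%)
Estimated cost:  $4.12 (vs $41.80 if everything went to the complex model)
```

(The numbers above are illustrative, not real measurements.)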
Classify a prompt locally without running the server. Useful for testing your setup:
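```bash
# Subcommand name is an assumption -- check `nadirclaw --help`
nadirclaw classify "Refactor the auth module to use async IO"
```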
Most LLM usage doesn't need a premium model. NadirClaw routes each prompt to the right tier automatically.
NadirClaw uses a binary complexity classifier based on sentence embeddings:
Pre-computed centroids: Ships two tiny centroid vectors (~1.5 KB each) derived from ~170 seed prompts, pre-computed and included in the package -- no training step required.
Classification: For each incoming prompt, computes its embedding using all-MiniLM-L6-v2 (~80 MB, downloaded once on first use) and measures cosine similarity to both centroids. If the prompt is closer to the complex centroid, it routes to your complex model; otherwise to your simple model.
Borderline handling: When confidence is below the threshold (default 0.06), the classifier defaults to complex -- over-serving a simple prompt wastes a little money, while under-serving a complex one degrades quality.
Routing modifiers: After classification, NadirClaw applies the overrides described above -- agentic detection, reasoning-marker routing, session stickiness, and context-overflow handling.
Dispatch: Calls the selected model via the appropriate backend -- the Google GenAI SDK for Gemini models, LiteLLM for everything else.
Rate limit fallback: If the selected model returns a 429 rate limit error, NadirClaw retries once, then automatically falls back to the other tier's model. If both are rate-limited, it returns a user-friendly error message.
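For intuition, here's a minimal sketch of the centroid approach in Python, assuming sentence-transformers and unit-normalized centroid vectors; the names and file paths are illustrative, not NadirClaw's actual code:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # ~80 MB, downloaded once

# NadirClaw ships these pre-computed from ~170 seed prompts (~1.5 KB each);
# the file names here are illustrative.
simple_centroid = np.load("simple_centroid.npy")    # unit-normalized
complex_centroid = np.load("complex_centroid.npy")  # unit-normalized

def route(prompt: str, threshold: float = 0.06) -> str:
    # Normalized embedding, so a dot product equals cosine similarity.
    v = encoder.encode(prompt, normalize_embeddings=True)
    margin = float(np.dot(v, complex_centroid) - np.dot(v, simple_centroid))
    # Borderline prompts default to complex: over-serving is the safer error.
    if abs(margin) < threshold:
        return "complex"
    return "complex" if margin > 0 else "simple"
```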
Classification takes ~10ms on a warm encoder. The first request takes ~2-3 seconds to load the embedding model.
Auth is disabled by default (local-only). Set NADIRCLAW_AUTH_TOKEN to require a bearer token.
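For example:

```bash
export NADIRCLAW_AUTH_TOKEN="change-me"
# Clients must then send: Authorization: Bearer change-me
```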
NadirClaw supports optional distributed tracing via OpenTelemetry. Install the extras and set an OTLP endpoint:
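```bash
# The "[telemetry]" extra name is an assumption; the OTLP variable is the standard one
pip install "nadirclaw[telemetry]"
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```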
When enabled, NadirClaw emits spans for each stage of the request pipeline -- classification, routing, and model dispatch.
Spans include GenAI semantic conventions (gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) plus custom nadirclaw.* attributes for routing metadata.
If the telemetry packages are not installed or OTEL_EXPORTER_OTLP_ENDPOINT is not set, all tracing is a no-op with zero overhead.
MIT
Marco Rodriguez