
Show HN: NadirClaw, LLM router that cuts costs by routing prompts right

February 17, 2026

Open-source LLM router that saves you money. Simple prompts go to cheap/local models, complex prompts go to premium models -- automatically.

NadirClaw sits between your AI tool and your LLM providers as an OpenAI-compatible proxy. It classifies every prompt in ~10ms and routes it to the right model. Works with any tool that speaks the OpenAI API: OpenClaw, Codex, Claude Code, Continue, Cursor, or plain curl.

How does NadirClaw compare to OpenRouter? See NadirClaw vs OpenRouter.

The quickest install is the one-line script from the repository, which clones the repo to ~/.nadirclaw, creates a virtual environment, installs dependencies, and adds nadirclaw to your PATH; run it again to update. Or install from source:
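A sketch of the from-source path (the repository URL is a placeholder -- substitute the project's actual repo):

```bash
# Placeholder URL -- use the project's actual repository
git clone https://github.com/<org>/nadirclaw.git
cd nadirclaw
pip install -e .
```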

Then run the interactive setup wizard:
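```bash
nadirclaw setup
```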

This guides you through selecting providers, entering API keys, and choosing models for each routing tier. Then start the router:
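```bash
nadirclaw serve
```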

That's it. NadirClaw starts on http://localhost:8856 with sensible defaults (Gemini 3 Flash for simple, OpenAI Codex for complex). If you skip nadirclaw setup, the serve command will offer to run it on first launch.

NadirClaw loads configuration from ~/.nadirclaw/.env. Create or edit this file to set API keys and model preferences:

If ~/.nadirclaw/.env does not exist, NadirClaw falls back to .env in the current directory.

NadirClaw supports multiple ways to provide LLM credentials: API keys in ~/.nadirclaw/.env (or a .env in the current directory), plus OAuth credentials stored in ~/.nadirclaw/credentials.json for ChatGPT accounts (see the Codex section below).

Set API keys in ~/.nadirclaw/.env:
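For example (key names follow standard provider conventions; set only the providers you use):

```bash
# ~/.nadirclaw/.env
GEMINI_API_KEY=...
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
```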

Configure which model handles each tier:
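A sketch of what this might look like -- the NADIRCLAW_* variable names here are illustrative, so check the file the setup wizard writes for the exact keys:

```bash
# Illustrative variable names
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
NADIRCLAW_REASONING_MODEL=o3   # optional; falls back to the complex model
```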

Gemini models are called natively via the Google GenAI SDK for best performance. All other models go through LiteLLM, which supports 100+ providers.

Gemini is the default simple model.

If the primary model hits a 429 rate limit, NadirClaw automatically retries once, then falls back to the other tier's model. For example, if gemini-3-flash-preview is exhausted, NadirClaw will try gemini-2.5-pro (or whatever your complex model is). If both models are rate-limited, it returns a friendly error message instead of crashing.

If you're running Ollama locally, NadirClaw works out of the box with no API keys:
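For example (the ollama/ prefix is LiteLLM's convention; the model names and NADIRCLAW_* keys are illustrative):

```bash
NADIRCLAW_SIMPLE_MODEL=ollama/llama3.2
NADIRCLAW_COMPLEX_MODEL=ollama/qwen2.5-coder:32b
```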

Or mix local + cloud:
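For instance, a local model for the simple tier and a cloud model for the complex tier (same caveat on variable names):

```bash
NADIRCLAW_SIMPLE_MODEL=ollama/llama3.2
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
```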

OpenClaw is a personal AI assistant that bridges messaging services to AI coding agents. NadirClaw integrates as a model provider so OpenClaw's requests are automatically routed to the right model.
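Onboard it with:

```bash
nadirclaw openclaw onboard
```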

This writes NadirClaw as a provider in ~/.openclaw/openclaw.json with model nadirclaw/auto. If OpenClaw is already running, it will auto-reload the config -- no restart needed.

nadirclaw openclaw onboard adds this to your OpenClaw config:
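A hypothetical sketch of that entry -- the field names are illustrative, since the real schema is defined by OpenClaw:

```jsonc
// Illustrative field names; only the nadirclaw/auto model ID and the
// default port are confirmed above.
{
  "providers": {
    "nadirclaw": {
      "baseUrl": "http://localhost:8856/v1",
      "models": ["nadirclaw/auto"]
    }
  }
}
```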

NadirClaw supports the SSE streaming format that OpenClaw expects (stream: true), handling multi-modal content and tool definitions in system prompts.

Codex is OpenAI's CLI coding agent. NadirClaw integrates as a custom model provider.
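Assuming the onboarding subcommand mirrors the OpenClaw one (the name is a guess):

```bash
# Hypothetical subcommand, by analogy with `nadirclaw openclaw onboard`
nadirclaw codex onboard
```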

This writes ~/.codex/config.toml:
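A sketch of what that file might contain, using Codex's model_providers schema (the exact values NadirClaw writes may differ):

```toml
# Illustrative contents -- exact keys and values may differ
model = "nadirclaw/auto"
model_provider = "nadirclaw"

[model_providers.nadirclaw]
name = "NadirClaw"
base_url = "http://localhost:8856/v1"
wire_api = "chat"
```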

To use your ChatGPT subscription instead of an API key:
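The subcommand name below is hypothetical:

```bash
# Hypothetical name for the OAuth login flow
nadirclaw codex login
```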

This delegates to the Codex CLI for the OAuth flow and stores the credentials in ~/.nadirclaw/credentials.json. Tokens are automatically refreshed when they expire.

NadirClaw exposes a standard OpenAI-compatible API. Point any tool at it:
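For example, with curl (the /v1 path follows the OpenAI convention; the port is NadirClaw's default):

```bash
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nadirclaw/auto",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```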

Choose your routing strategy by setting the model field:
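Only nadirclaw/auto is confirmed above; tier-pinned aliases like the one below are an assumption:

```bash
# nadirclaw/auto routes automatically; a hypothetical tier-pinned alias
# would bypass classification entirely
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nadirclaw/simple", "messages": [{"role": "user", "content": "hi"}]}'
```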

Use short names instead of full model IDs:
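Presumably the bare alias maps to the prefixed ID (an assumption):

```bash
# Assumption: "auto" resolves to "nadirclaw/auto"
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "hi"}]}'
```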

Beyond basic simple/complex classification, NadirClaw applies routing modifiers that can override the base decision:

NadirClaw detects agentic requests (coding agents, multi-step tool use) and forces them to the complex model, even if the individual message looks simple. Signals include tool definitions in the request and agent-style system prompts.

This prevents a message like "now add tests" from being routed to the cheap model when it's part of an ongoing agentic refactoring session.

Prompts with 2+ reasoning markers are routed to the reasoning model (or the complex model if no reasoning model is configured).

Once a conversation is routed to a model, subsequent messages in the same session reuse that model. This prevents jarring mid-conversation model switches. Sessions are keyed by system prompt + first user message, with a 30-minute TTL.
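A minimal sketch of that mechanism, assuming an in-memory map (the names and hashing scheme are illustrative, not NadirClaw's internals):

```python
import hashlib
import time

SESSION_TTL = 30 * 60  # seconds, matching the 30-minute TTL above
_sessions: dict[str, tuple[str, float]] = {}  # key -> (model, expires_at)

def session_key(system_prompt: str, first_user_msg: str) -> str:
    # Sessions are keyed by system prompt + first user message
    raw = system_prompt + "\x00" + first_user_msg
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def sticky_model(key: str, freshly_routed: str) -> str:
    now = time.time()
    entry = _sessions.get(key)
    if entry is not None and now < entry[1]:
        return entry[0]  # reuse the model chosen earlier in this session
    _sessions[key] = (freshly_routed, now + SESSION_TTL)
    return freshly_routed
```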

If the estimated token count of a request exceeds a model's context window, NadirClaw automatically swaps to a model with a larger context. For example, a 150k-token conversation targeting gpt-4o (128k context) will be redirected to gemini-2.5-pro (1M context).

Analyze request logs and print a summary report:
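The subcommand name below is a guess; check nadirclaw --help for the real one:

```bash
# Hypothetical subcommand name
nadirclaw stats
```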

Classify a prompt locally without running the server. Useful for testing your setup:
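Presumably something like (subcommand name assumed from the description):

```bash
# Assumed subcommand name
nadirclaw classify "Refactor this module to use async IO"
```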

Most LLM usage doesn't need a premium model, so NadirClaw routes each prompt to the right tier automatically.

NadirClaw uses a binary complexity classifier based on sentence embeddings:

Pre-computed centroids: Ships two tiny centroid vectors (~1.5 KB each) derived from ~170 seed prompts. These are pre-computed and included in the package — no training step required.

Classification: For each incoming prompt, computes its embedding using all-MiniLM-L6-v2 (~80 MB, downloaded once on first use) and measures cosine similarity to both centroids. If the prompt is closer to the complex centroid, it routes to your complex model; otherwise to your simple model.

Borderline handling: When confidence is below the threshold (default 0.06), the classifier defaults to complex -- it's cheaper to over-serve a simple prompt than to under-serve a complex one.
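A minimal sketch of this decision rule -- the centroid file names are illustrative and the centroids are assumed unit-normalized (so a dot product equals cosine similarity); this is not NadirClaw's actual API:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
simple_centroid = np.load("centroid_simple.npy")    # hypothetical file name
complex_centroid = np.load("centroid_complex.npy")  # hypothetical file name

def classify(prompt: str, threshold: float = 0.06) -> str:
    emb = encoder.encode(prompt, normalize_embeddings=True)
    sim_simple = float(np.dot(emb, simple_centroid))
    sim_complex = float(np.dot(emb, complex_centroid))
    # Borderline margins default to complex: over-serving a simple prompt
    # is cheaper than under-serving a complex one.
    if abs(sim_complex - sim_simple) < threshold:
        return "complex"
    return "complex" if sim_complex > sim_simple else "simple"
```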

Routing modifiers: After classification, NadirClaw applies the overrides described above -- agentic detection forces multi-step tool-use sessions to the complex model, 2+ reasoning markers escalate to the reasoning model, session stickiness keeps a conversation on its original model, and context-window overflow swaps in a larger-context model.

Dispatch: Calls the selected model via the appropriate backend -- the Google GenAI SDK for Gemini models, LiteLLM for everything else.

Rate limit fallback: If the selected model returns a 429 rate limit error, NadirClaw retries once, then automatically falls back to the other tier's model. If both are rate-limited, it returns a user-friendly error message.

Classification takes ~10ms on a warm encoder. The first request takes ~2-3 seconds to load the embedding model.

Auth is disabled by default (local-only). Set NADIRCLAW_AUTH_TOKEN to require a bearer token.
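For example:

```bash
# Require a bearer token on every request
export NADIRCLAW_AUTH_TOKEN=changeme
# Clients then send: Authorization: Bearer changeme
```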

NadirClaw supports optional distributed tracing via OpenTelemetry. Install the extras and set an OTLP endpoint:
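The extras name below is an assumption; OTEL_EXPORTER_OTLP_ENDPOINT is the standard OpenTelemetry variable:

```bash
# The "otel" extras name is a guess -- check the project's pyproject/README
pip install 'nadirclaw[otel]'
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
nadirclaw serve
```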

When enabled, NadirClaw emits spans for each stage of the request pipeline.

Spans include GenAI semantic conventions (gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) plus custom nadirclaw.* attributes for routing metadata.

If the telemetry packages are not installed or OTEL_EXPORTER_OTLP_ENDPOINT is not set, all tracing is a no-op with zero overhead.

License: MIT
