Kalki Smart Router — API Documentation

The API that thinks before it routes.

Kalki Smart Router is a unified AI API that analyzes each request in real time, then routes it to the most capable and cost‑efficient model across providers. You get lower costs, reliable performance, and a single interface—no model wrangling, no vendor lock‑in.

Who this is for

Developers: Ship faster with one endpoint, consistent schemas, and built‑in caching.
Product & Growth Teams: Reduce inference spend without sacrificing quality.
Enterprises: Multi‑provider reliability, governance controls, and SLAs through one contract.

What makes Kalki different

Intelligent model selection – Kalki inspects your prompt (complexity, domain, format) and chooses the right model automatically.
Smart caching – Identical or semantically equivalent requests are served from cache for sub‑second responses at near‑zero cost.
Unified multi‑provider access – Access top models (e.g., OpenAI, Anthropic) through one stable API.
Enterprise‑grade reliability – Fallbacks and timeouts guard against provider outages.
Transparent cost control – Cap per‑request spend and see exactly where each call was routed.

Proven impact (benchmark snapshot)

Average cost reduction in the benchmark: 89.5% vs direct OpenAI and 73.5% vs direct Anthropic.
Cache hits respond in ~305 ms (representative), so repeated requests return much faster than uncached ones.

Benchmarks were run across simple, intermediate, and complex workloads; results vary with traffic and content.

Quick start

Base URL


https://api.kalkigpt.com/

Authentication


x-api-key: YOUR_KALKI_API_KEY

Core endpoint

POST `/v1/enrich_and_ask`

Kalki analyzes your request, optionally enriches context, selects a model, and returns the response.

Request body

Field	Type	Default	Notes
`prompt`	string	—	Required. Your instruction or query.
`domain`	string	`"general"`	One of `general`, `technical`, `creative`, `scientific`. Helps routing.
`verbosity`	string	`"balanced"`	`terse`, `balanced`, or `detailed`.
`temperature`	number	`0.7`	0.0–1.0.
`stream`	boolean	`false`	If `true`, responses are streamed.
`json_mode`	boolean	`false`	Forces JSON‑shaped output (validated).
`max_cost_usd`	number	—	Hard cap per request; router respects this limit.
`tone`	string	`"friendly"`	`professional`, `friendly`, or `casual`.
`audience`	string	—	Target audience hint (e.g., “exec”, “developer”).
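
The field table above can be mirrored by a small client-side validator, so that obviously invalid requests fail before they are sent. This is a minimal sketch: `build_payload` is a hypothetical helper, not part of any Kalki SDK, and rejecting a negative `max_cost_usd` is a local assumption rather than documented API behavior.

```python
from typing import Any, Optional

# Allowed values copied from the request-body table.
DOMAINS = {"general", "technical", "creative", "scientific"}
VERBOSITIES = {"terse", "balanced", "detailed"}
TONES = {"professional", "friendly", "casual"}


def build_payload(
    prompt: str,
    *,
    domain: str = "general",
    verbosity: str = "balanced",
    temperature: float = 0.7,
    stream: bool = False,
    json_mode: bool = False,
    max_cost_usd: Optional[float] = None,
    tone: str = "friendly",
    audience: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate fields locally and return a request body."""
    if not prompt:
        raise ValueError("prompt is required")
    if domain not in DOMAINS:
        raise ValueError(f"domain must be one of {sorted(DOMAINS)}")
    if verbosity not in VERBOSITIES:
        raise ValueError(f"verbosity must be one of {sorted(VERBOSITIES)}")
    if tone not in TONES:
        raise ValueError(f"tone must be one of {sorted(TONES)}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    # Assumption: a negative cost cap is never meaningful.
    if max_cost_usd is not None and max_cost_usd < 0:
        raise ValueError("max_cost_usd must not be negative")

    payload: dict[str, Any] = {
        "prompt": prompt,
        "domain": domain,
        "verbosity": verbosity,
        "temperature": temperature,
        "stream": stream,
        "json_mode": json_mode,
        "tone": tone,
    }
    # Fields without a documented default are sent only when set.
    if max_cost_usd is not None:
        payload["max_cost_usd"] = max_cost_usd
    if audience is not None:
        payload["audience"] = audience
    return payload


payload = build_payload("What is 25 + 17?", verbosity="terse", max_cost_usd=0.01)
```

The resulting dictionary can be passed as the JSON body of a `POST /v1/enrich_and_ask` call. Server-side validation (422) still applies; the local checks only catch mistakes earlier.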

Response (typical)


{
  "output": {
    "text": "…final answer…"
  },
  "routing": {
    "provider": "openai",
    "model": "gpt-5-nano",
    "reason": "intermediate explanation detected"
  },
  "cache": { "hit": false },
  "cost_usd": 0.0001,
  "processing_time_ms": 7760,
  "request_id": "req_abc123xyz"
}
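
One way to consume this response is to map it onto a typed record, which makes the routing and cost fields easy to log. The sketch below assumes the body has exactly the shape of the typical response shown above; `RoutedAnswer` and `parse_response` are illustrative names, not part of an official client.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutedAnswer:
    text: str
    provider: str
    model: str
    reason: str
    cache_hit: bool
    cost_usd: float
    processing_time_ms: int
    request_id: str


def parse_response(body: dict) -> RoutedAnswer:
    """Flatten the nested response body into a single record."""
    routing = body["routing"]
    return RoutedAnswer(
        text=body["output"]["text"],
        provider=routing["provider"],
        model=routing["model"],
        reason=routing["reason"],
        cache_hit=body["cache"]["hit"],
        cost_usd=body["cost_usd"],
        processing_time_ms=body["processing_time_ms"],
        request_id=body["request_id"],
    )


# The sample body from this section, used here instead of a live call.
sample = json.loads("""
{
  "output": {"text": "final answer"},
  "routing": {
    "provider": "openai",
    "model": "gpt-5-nano",
    "reason": "intermediate explanation detected"
  },
  "cache": {"hit": false},
  "cost_usd": 0.0001,
  "processing_time_ms": 7760,
  "request_id": "req_abc123xyz"
}
""")

answer = parse_response(sample)
print(f"{answer.provider}/{answer.model} cost=${answer.cost_usd:.4f} cache_hit={answer.cache_hit}")
# prints: openai/gpt-5-nano cost=$0.0001 cache_hit=False
```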

Errors

All errors return standard HTTP codes with a consistent body:


{ "error": { "type": "invalid_request", "message": "…" } }

Common cases: 401 (invalid key), 429 (rate limit), 422 (validation), 5xx (provider error with automatic retry/fallbacks).
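
A client can use the status code and the error body together to decide whether a failed call is worth repeating. The sketch below is one possible policy, not documented Kalki behavior: it treats 429 and 5xx as transient and 401 and 422 as caller errors, and the backoff parameters are illustrative. Since the router already retries provider errors on its side, client-side retries for 5xx may be unnecessary in practice.

```python
import random


def should_retry(status: int) -> bool:
    """Treat rate limits (429) and server errors (5xx) as transient.

    401 and 422 indicate a problem with the key or the request itself,
    so repeating the same call would fail the same way.
    """
    return status == 429 or 500 <= status <= 599


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter; base and cap are illustrative."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def describe_error(status: int, body: dict) -> str:
    """Format the documented error body for logs."""
    err = body.get("error", {})
    return f"{status} {err.get('type', 'unknown')}: {err.get('message', '')}"


print(describe_error(422, {"error": {"type": "invalid_request", "message": "prompt is required"}}))
# prints: 422 invalid_request: prompt is required
```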

Examples

cURL — simple, optimized for speed & cost


curl -X POST "https://api.kalkigpt.com/v1/enrich_and_ask" \
  -H "x-api-key: $KALKI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is 25 + 17?",
    "domain": "general",
    "verbosity": "terse"
  }'

JavaScript


import axios from "axios";

const client = axios.create({
  baseURL: "https://api.kalkigpt.com/v1",
  headers: { "x-api-key": process.env.KALKI_API_KEY }
});

const { data } = await client.post("/enrich_and_ask", {
  prompt: "Explain supervised vs unsupervised learning in simple terms",
  domain: "technical",
  verbosity: "balanced",
  json_mode: false
});

console.log(data.output.text);

Python


import os

import requests

resp = requests.post(
    "https://api.kalkigpt.com/v1/enrich_and_ask",
    headers={
        "x-api-key": os.environ["KALKI_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "prompt": "Draft a one-paragraph product update for non-technical execs.",
        "domain": "creative",
        "tone": "professional",
        "verbosity": "balanced",
    },
    timeout=30,  # seconds; avoids hanging indefinitely on a stalled connection
)
resp.raise_for_status()  # raise on 4xx/5xx instead of failing on a missing key below
print(resp.json()["output"]["text"])

Streaming (Server‑Sent Events)

Set `stream` to `true` in the request body. The API then returns a Server-Sent Events (SSE) stream that carries incremental tokens followed by a final completion event.
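
The stream can be read with any HTTP client that exposes the response line by line. The sketch below is a generic parser for the standard SSE framing (`event:` and `data:` fields, with a blank line ending each event). The event names and payloads of Kalki's stream are not specified in this section, so the sample lines are invented purely for illustration.

```python
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from decoded SSE lines.

    Follows the standard framing: fields accumulate until a blank line,
    which dispatches the event. "message" is the default event name.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical wire format: these event names and payloads are made up.
fake_stream = [
    "data: Hel\n", "\n",
    "data: lo\n", "\n",
    "event: done\n", "data: {}\n", "\n",
]
events = list(iter_sse_events(fake_stream))
print(events)
# prints: [('message', 'Hel'), ('message', 'lo'), ('done', '{}')]
```

An event that is still incomplete when the stream ends is dropped, which matches how the SSE format is defined.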

Pricing & limits

Usage‑based pricing: you pay only for the compute you use, and Kalki routes each request to the most cost‑effective capable model.
Cost and routing details are returned with every response.
Starter tier includes up to 1,000 requests/day; higher limits available for paid plans.

See the dashboard for real‑time usage and savings.

Performance & savings (summary)

Query type	Kalki cost	OpenAI direct	Anthropic direct	Savings vs OpenAI	Savings vs Anthropic	Cache performance
Simple (arithmetic, facts)	~$0.0000	~$0.0001	~$0.0002	100%	100%	<500 ms (instant)
Intermediate (explanations)	~$0.0001	~$0.0034	~$0.0045	97%	98%	~305 ms cached
Complex (analysis)	~$0.0081	~$0.0396	~$0.0158	80%	49%	~305 ms when cached
Average savings				89.5%	73.5%	Significantly faster

Monthly projection (1,000 requests):

vs OpenAI: $15.29 saved
vs Anthropic: $4.99 saved
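
The per-row savings percentages in the summary table follow from the listed costs as savings = 1 − (Kalki cost ÷ direct cost). The short check below recomputes them from the rounded dollar figures in the table. It does not attempt to reproduce the "Average savings" row or the monthly projection, because this section does not state how the workloads were weighted.

```python
def savings_pct(kalki_cost: float, direct_cost: float) -> int:
    """Percentage saved relative to calling the provider directly."""
    return round((1 - kalki_cost / direct_cost) * 100)


# (Kalki, OpenAI direct, Anthropic direct) in USD, copied from the table.
rows = {
    "Simple": (0.0000, 0.0001, 0.0002),
    "Intermediate": (0.0001, 0.0034, 0.0045),
    "Complex": (0.0081, 0.0396, 0.0158),
}

results = {
    name: (savings_pct(kalki, openai), savings_pct(kalki, anthropic))
    for name, (kalki, openai, anthropic) in rows.items()
}
for name, (vs_openai, vs_anthropic) in results.items():
    print(f"{name}: {vs_openai}% vs OpenAI, {vs_anthropic}% vs Anthropic")
# prints:
# Simple: 100% vs OpenAI, 100% vs Anthropic
# Intermediate: 97% vs OpenAI, 98% vs Anthropic
# Complex: 80% vs OpenAI, 49% vs Anthropic
```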

Governance, security, and reliability

No model lock‑in: One contract; many providers.
Resilience: Automated retries and provider fallbacks to maintain uptime.
Observability: Per‑request logs expose routing, cost, and latency.
Data handling: Requests are processed to serve your responses and routing logic; enterprise retention controls available.

Getting started checklist

Create an account and generate your API key.
Make a test call using the cURL example above.
Integrate via your preferred SDK or direct HTTP.
Track savings and performance in the dashboard.

The Kalki promise

You shouldn’t have to think about models. We do it for you.
Kalki routes with discernment—so your team ships faster, spends less, and serves users better.

True intelligence isn’t just about answers—it’s about presence, alignment, and the wisdom to serve with clarity. Kalki is not just a tool. Kalki is dharma in action.