# Smart Routing
Automatically route LLM requests to the optimal model based on task type and complexity. Save costs on simple tasks while maintaining quality for complex ones.
## What is Smart Routing?
Smart Routing automatically selects the best AI model for each request based on the task type and prompt complexity. Instead of always using your most expensive model, simple tasks are routed to efficient models that deliver the same quality at lower cost, while complex tasks stay on advanced models.
Key benefits:
- Save money by routing simple tasks to cheaper models automatically
- Maintain quality for complex tasks that need advanced reasoning
- 11 task types detected automatically by AI analysis
- Three complexity tiers (low, medium, high) for fine-grained control
- You only pay routing fees when routing actually saves you money (5% of savings)
- No routing fee when your original model is selected or a more expensive model is chosen
- Full transparency - response headers show exactly what happened and why
- Create custom routing rules through the dashboard for full control
- Available in proxy mode with all 17+ providers
## How Routing Works
Smart Routing follows a four-step process for every request:
1. Classify Task -> What type of work is this? (e.g., Code Generation)
2. Analyze Complexity -> How difficult is it? (low / medium / high)
3. Select Model -> Which model is optimal for this task + complexity?
4. Execute -> Route to selected model using BYOK key or LockLLM credits
The entire classification happens in real time with minimal latency impact.
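In outline, the four steps behave like a single decision function. The sketch below is purely illustrative: the keyword and length heuristics stand in for LockLLM's AI classifier, and the fallback model names (`gpt-4o-mini`, `gpt-4o`) are assumptions, not documented routing targets.

```python
# Illustrative sketch of the four-step routing flow. The heuristics and
# fallback model names are stand-ins, not LockLLM's actual implementation.

def classify_task(prompt: str) -> str:
    # Step 1: Classify Task (real system: AI analysis of the prompt)
    if "function" in prompt or "code" in prompt:
        return "Code Generation"
    if "summarize" in prompt.lower():
        return "Summarization"
    return "Other"

def assess_complexity(prompt: str) -> str:
    # Step 2: Analyze Complexity. Stand-in heuristic; the real system
    # weighs reasoning depth, constraints, domain specificity, and
    # multi-step thinking, not just prompt length.
    if len(prompt) > 400:
        return "high"
    if len(prompt) > 150:
        return "medium"
    return "low"

def select_model(original: str, task: str, tier: str) -> str:
    # Step 3: Select Model. High complexity stays on the original model;
    # lower tiers route to cheaper alternatives (assumed names).
    if tier == "high":
        return original
    cheaper = {"low": "gpt-4o-mini", "medium": "gpt-4o"}
    return cheaper.get(tier, original)

def route(prompt: str, original_model: str) -> str:
    task = classify_task(prompt)
    tier = assess_complexity(prompt)
    # Step 4: Execute — forward the request to the selected model
    # via a BYOK key or LockLLM credits.
    return select_model(original_model, task, tier)
```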
## Task Types
LockLLM's AI classifier automatically identifies the task type from your prompt:
| Task Type | Description | Example Prompts |
|---|---|---|
| Open QA | Open-ended questions needing creative or expansive answers | "What are the implications of quantum computing?" |
| Closed QA | Factual questions with specific, definitive answers | "What is the capital of France?" |
| Summarization | Condensing longer content into shorter form | "Summarize this article in 3 bullet points" |
| Text Generation | Creative writing and content creation | "Write a product description for..." |
| Code Generation | Programming, debugging, and code-related tasks | "Write a Python function that sorts a list" |
| Chatbot | Conversational interactions and dialogue | "Hi, can you help me with my order?" |
| Classification | Categorizing, labeling, or sorting content | "Is this review positive or negative?" |
| Rewrite | Editing, rephrasing, or reformatting content | "Rewrite this paragraph in a formal tone" |
| Brainstorming | Generating ideas and exploring possibilities | "Give me 10 marketing campaign ideas for..." |
| Extraction | Pulling specific data from text | "Extract all email addresses from this text" |
| Other | Tasks that don't match the above categories | Miscellaneous requests |
## Complexity Tiers
Each prompt is assigned a complexity score and mapped to a tier:
| Tier | Description | Typical Routing Behavior |
|---|---|---|
| Low | Simple, straightforward tasks with clear answers | Routes to efficient, cost-effective models |
| Medium | Moderate complexity requiring some reasoning | Routes to balanced models |
| High | Complex tasks needing advanced reasoning, nuance, or creativity | Stays on your original (typically more capable) model |
High-complexity tasks are kept on your original model because quality matters most for difficult requests. Routing fees only apply when routing to a cheaper model, so high-complexity requests that stay on your original model incur no routing fee.
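Response headers report complexity as a numeric score (0.0-1.0) that maps to one of the three tiers. A minimal sketch of that mapping, assuming 0.33/0.66 thresholds (the actual cut-offs are not documented):

```python
def complexity_tier(score: float) -> str:
    """Map a 0.0-1.0 complexity score to a tier.

    The 0.33/0.66 thresholds are illustrative assumptions; LockLLM only
    documents that three tiers exist, not where the boundaries fall.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < 0.33:
        return "low"
    if score < 0.66:
        return "medium"
    return "high"
```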
## Routing Modes
### Auto Routing
Set `X-LockLLM-Route-Action: auto` to enable AI-powered routing. The system automatically classifies your task, analyzes complexity, and selects the optimal model.
When auto routing saves you money:
- A simple "What is 2+2?" sent to GPT-4 gets routed to a faster, cheaper model
- A basic summarization task gets routed to an efficient model instead of a premium one
- Low-complexity chatbot interactions use cost-effective models
When auto routing preserves quality:
- Complex code generation stays on your advanced model
- High-complexity analysis and reasoning tasks are not downgraded
- Tasks requiring deep domain knowledge keep your original model
### Custom Routing
Set `X-LockLLM-Route-Action: custom` to use your own routing rules defined in the dashboard. Custom rules let you specify exactly which model to use for each task type and complexity combination.
If no matching custom rule is found for a request, it automatically falls back to auto routing.
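Conceptually, custom routing is a lookup keyed on (task type, complexity tier) with auto routing as the fallback. A hypothetical sketch (the rule table and `auto_route` stand-in are illustrative, not LockLLM internals):

```python
# Hypothetical sketch of custom-rule lookup with auto fallback.
rules = {
    ("Code Generation", "high"): "claude-sonnet-4-20250514",
    ("Summarization", "low"): "gpt-4o-mini",
}

def resolve_model(task: str, tier: str, original: str) -> str:
    if (task, tier) in rules:
        return rules[(task, tier)]           # matching custom rule wins
    return auto_route(task, tier, original)  # fall back to auto routing

def auto_route(task: str, tier: str, original: str) -> str:
    # Stand-in for the auto router: keep the original model when
    # complexity is high, otherwise pick a cheaper model (assumed name).
    return original if tier == "high" else "gpt-4o-mini"
```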
### Disabled (Default)
By default, routing is disabled. Your requests use exactly the model you specified with no routing intervention.
## How Task Classification Works
LockLLM uses AI-powered analysis to determine both the task type and complexity of each prompt. The classifier examines linguistic patterns, structural cues, and content to identify whether a prompt is asking for code, seeking factual answers, requesting creative content, summarizing text, or performing other tasks.
Complexity is assessed based on factors like the depth of reasoning required, number of constraints in the prompt, domain specificity, and whether the task requires multi-step thinking. The result is a complexity score that maps to one of three tiers: low, medium, or high.
You can see exactly how each request was classified by checking the response headers:
- `X-LockLLM-Task-Type` shows the detected task type (e.g., "Code Generation", "Summarization")
- `X-LockLLM-Complexity` shows the complexity score (0.0-1.0)
Classification works best with clear, focused prompts. Very short or ambiguous prompts may occasionally be categorized as "Other", in which case standard complexity-based routing applies.
## Auto Routing Behavior in Detail
### What Gets Routed
Auto routing makes different decisions based on the detected task type and complexity:
| Complexity | Behavior | Routing Fee |
|---|---|---|
| Low | Routed to a cost-efficient model | 5% of cost savings |
| Medium | Routed to a balanced-performance model | 5% of cost savings |
| High | Kept on your original model | FREE (no routing) |
**Code Generation:** Complex code generation tasks stay on your original model to preserve quality. Lower-complexity code tasks may be routed to efficient models. If you want full control over code task routing, use custom routing rules to define exactly which models handle each complexity tier.
### What You Pay
You are only charged a routing fee when routing actually saves you money. The fee is 5% of the cost difference between your original model and the selected model. If the router keeps your original model (high complexity, code generation, or no cheaper alternative), there is no fee at all.
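That fee rule is easy to state in code: 5% of the cost difference, floored at zero. A minimal sketch:

```python
def routing_fee(original_cost: float, routed_cost: float) -> float:
    """Return the routing fee: 5% of savings, or 0 when routing did not
    reduce cost (same model, pricier model, or no route taken)."""
    savings = original_cost - routed_cost
    return 0.05 * savings if savings > 0 else 0.0
```

For example, routing a $0.50 request to a $0.05 model yields a fee of about $0.0225, while keeping the original model costs nothing extra.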
## Cost Optimization Examples
### Example 1: Customer Support Chatbot
A chatbot handles a mix of traffic: ~60% simple greetings and FAQ answers (low complexity), ~30% moderate customer inquiries (medium complexity), and ~10% complex account issues (high complexity).
- Simple requests ("What are your business hours?") are routed to cost-efficient models, saving 50-70% per request
- Moderate requests ("Help me understand my billing statement") use balanced models
- Complex requests ("I need to dispute a charge and escalate to management") stay on the original model
- Result: Significant cost reduction on the majority of traffic with no quality impact on complex cases
### Example 2: Code Assistant
A developer tool sends all requests as code generation tasks. Complex code tasks stay on the original model to preserve quality, while simpler code tasks may be routed to efficient models for cost savings. For full control over code routing, define custom routing rules to specify exactly which model handles each complexity tier.
### Example 3: Content Platform
A content platform uses AI for various tasks: article summarization, content rewriting, idea brainstorming, and in-depth analysis. Simple summarization and basic rewrites are routed to cheaper models, while complex content generation and analysis stay on premium models. The routing fee is a fraction of the savings, and the platform sees an overall 30-50% reduction in AI costs.
## Monitoring Routing Decisions
Every routed request includes detailed metadata in the response headers, giving you full visibility into what happened and why:
- What was classified: `X-LockLLM-Task-Type` and `X-LockLLM-Complexity`
- What was selected: `X-LockLLM-Selected-Model` and `X-LockLLM-Routing-Reason`
- What it cost: `X-LockLLM-Estimated-Original-Cost`, `X-LockLLM-Estimated-Routed-Cost`, and `X-LockLLM-Estimated-Savings`
Recommended rollout approach:
- Enable auto routing in a staging or development environment first
- Review routing decisions in your activity logs via the dashboard
- Check that task classification matches your expectations for common prompts
- If a specific task type is being misclassified, create custom routing rules for that combination
- Deploy to production once you are confident in the routing behavior
You can also run auto routing on non-critical endpoints while keeping routing disabled on critical ones, then expand as you build confidence.
## Setting Up Auto Routing
### Proxy Mode - JavaScript/TypeScript
```javascript
const OpenAI = require('openai')

const openai = new OpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  baseURL: 'https://api.lockllm.com/v1/proxy/openai',
  defaultHeaders: {
    'X-LockLLM-Route-Action': 'auto'
  }
})

// User requests GPT-4, but if the task is simple,
// the router selects a cheaper model automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: userPrompt }]
})
```
### Proxy Mode - Python
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get('LOCKLLM_API_KEY'),
    base_url='https://api.lockllm.com/v1/proxy/openai',
    default_headers={
        'X-LockLLM-Route-Action': 'auto'
    }
)

response = client.chat.completions.create(
    model='gpt-4',
    messages=[{'role': 'user', 'content': user_prompt}]
)
```
### LockLLM SDK - JavaScript/TypeScript
```javascript
import { createOpenAI } from '@lockllm/sdk/wrappers'

const openai = createOpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  proxyOptions: {
    routeAction: 'auto'
  }
})
```
### LockLLM SDK - Python
```python
import os

from lockllm import create_openai, ProxyOptions

openai = create_openai(
    api_key=os.getenv('LOCKLLM_API_KEY'),
    proxy_options=ProxyOptions(route_action='auto')
)
```
## Setting Up Custom Routing Rules
### Step 1: Open the Routing Dashboard
- Sign in to your LockLLM dashboard
- Go to Proxy > Custom Routing (or visit lockllm.com/route-settings)
### Step 2: Create a Rule
- Click Create Rule
- Select a task type (e.g., Code Generation)
- Select a complexity tier (e.g., High)
- Enter the target model name (e.g., `claude-sonnet-4-20250514`)
- Select the provider (e.g., Anthropic)
- Toggle Use BYOK if you want to use your own API key for this provider
- Click Save
### Step 3: Enable Custom Routing
Set the routing header to `custom` in your requests:
```javascript
const openai = new OpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  baseURL: 'https://api.lockllm.com/v1/proxy/openai',
  defaultHeaders: {
    'X-LockLLM-Route-Action': 'custom'
  }
})
```
### Example Rules
| Task Type | Complexity | Target Model | Provider | Use BYOK |
|---|---|---|---|---|
| Code Generation | High | claude-sonnet-4-20250514 | Anthropic | Yes |
| Code Generation | Low | gpt-4o-mini | OpenAI | Yes |
| Summarization | Low | gpt-4o-mini | OpenAI | Yes |
| Open QA | High | gpt-4 | OpenAI | Yes |
| Chatbot | Low | llama-3-8b | Groq | Yes |
## Pricing
Smart routing uses a simple, fair pricing model: you only pay when routing saves you money.
### Routing Fee Structure
| Scenario | Fee |
|---|---|
| Route to cheaper model | 5% of cost savings |
| Route to same-cost model | FREE |
| Route to more expensive model | FREE |
| No route taken (original model used) | FREE |
| Routing disabled | FREE |
### Example: Cost Savings Breakdown
Scenario: User requests an expensive model, but the prompt is a simple question.
- Original model cost estimate: $0.50 per request
- Router selects cheaper model: $0.05 per request
- Cost savings: $0.45
- Routing fee: $0.45 x 5% = $0.0225
- Total cost: $0.05 + $0.0225 = $0.0725
- You save: $0.4275 (85% savings after fee)
The routing fee is small compared to the savings, and you never pay a fee when routing doesn't save you money.
## Routing Metadata
Every routed request includes metadata in the response headers so you can see exactly what happened:
| Header | Description |
|---|---|
| `X-LockLLM-Route-Enabled` | "true" - routing was active for this request |
| `X-LockLLM-Task-Type` | Detected task type (e.g., "Code Generation") |
| `X-LockLLM-Complexity` | Complexity score (0.0-1.0) |
| `X-LockLLM-Selected-Model` | Model chosen by the router |
| `X-LockLLM-Routing-Reason` | Human-readable explanation for the selection |
| `X-LockLLM-Original-Model` | Your originally requested model (if changed) |
| `X-LockLLM-Original-Provider` | Your original provider (if changed) |
| `X-LockLLM-Estimated-Original-Cost` | What the original model would have cost |
| `X-LockLLM-Estimated-Routed-Cost` | What the selected model costs |
| `X-LockLLM-Estimated-Savings` | Estimated dollar savings |
| `X-LockLLM-Routing-Fee-Reserved` | Routing fee charged (5% of savings) |
These headers give you full visibility into routing decisions for monitoring, debugging, and cost tracking.
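For logging or cost tracking, the routing headers can be pulled out of any response-header mapping. A small helper (the header names come from the table above; the function itself is illustrative, not part of any SDK):

```python
# Routing metadata headers emitted by the LockLLM proxy.
ROUTING_HEADERS = [
    "X-LockLLM-Route-Enabled",
    "X-LockLLM-Task-Type",
    "X-LockLLM-Complexity",
    "X-LockLLM-Selected-Model",
    "X-LockLLM-Routing-Reason",
    "X-LockLLM-Original-Model",
    "X-LockLLM-Original-Provider",
    "X-LockLLM-Estimated-Original-Cost",
    "X-LockLLM-Estimated-Routed-Cost",
    "X-LockLLM-Estimated-Savings",
    "X-LockLLM-Routing-Fee-Reserved",
]

def routing_summary(headers: dict) -> dict:
    """Extract LockLLM routing metadata from a response-header mapping,
    skipping headers that are absent (e.g., Original-Model when the
    requested model was not changed)."""
    return {h: headers[h] for h in ROUTING_HEADERS if h in headers}
```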
## Configuration
| Header | Values | Default | Description |
|---|---|---|---|
| `X-LockLLM-Route-Action` | `disabled`, `auto`, `custom` | `disabled` | Routing mode |
## Combining with Other Features
Smart routing works alongside all other LockLLM features:
- Threat detection: Security scanning runs first, routing happens after the prompt is cleared
- Custom policies: Policy checks are independent of routing decisions
- PII detection: PII is detected/redacted before routing occurs
- Prompt compression: Compression is applied after routing decisions are made
- Abuse detection: Abuse checks are independent of routing
## Limitations
- **Proxy mode only** - Routing is available exclusively in proxy mode (`/v1/proxy`), not in the standalone scan endpoint
- **BYOK requirements** - For auto routing to non-OpenRouter providers, you need a BYOK key configured for the target provider. If no key is available, the router falls back to your original model
- **Classification accuracy** - Task classification works well for clear, focused prompts. Very short or ambiguous prompts may occasionally be miscategorized, though the router defaults to your original model when uncertain
## FAQ
### What if I disagree with a routing decision?
You have full control. Check the `X-LockLLM-Task-Type` and `X-LockLLM-Complexity` response headers to see how your prompt was classified. If routing isn't working well for specific use cases, switch to custom routing mode and define your own rules, or disable routing entirely for those requests.
### Does routing add latency?
Routing adds minimal latency for the classification step. The classification runs in parallel with security scanning, so the overall impact is small. The potential cost savings typically outweigh the minor latency addition.
### Can I use routing with the universal endpoint?
Yes. When using the universal endpoint (`/v1/proxy/chat/completions`) with LockLLM credits, routing can select from any available model. This works without BYOK keys since all models are accessed through LockLLM credits.
### Do custom rules override auto routing?
Yes. When `X-LockLLM-Route-Action: custom` is set, your custom rules take priority. If no custom rule matches the detected task type and complexity, the system falls back to auto routing logic.
### Can I route different task types to different providers?
Yes. Custom routing rules support any combination of task type, complexity tier, target model, and provider. You can route code generation to Anthropic, summarization to OpenAI, and chatbot tasks to Groq - all with different BYOK keys.
### Is there a way to see routing decisions without enabling routing?
No. Routing metadata is only available when routing is enabled (auto or custom). To test routing without impacting your production traffic, enable routing on a staging or development environment first.