# Smart Routing
Automatically route LLM requests to the optimal model based on task type and complexity. Save costs on simple tasks while maintaining quality for complex ones.
## What is Smart Routing?
Smart Routing automatically selects the best AI model for each request based on the task type and prompt complexity. Instead of always using your most expensive model, simple tasks are routed to efficient models that deliver the same quality at lower cost, while complex tasks stay on advanced models.
Key benefits:
- Save money by routing simple tasks to cheaper models automatically
- Maintain quality for complex tasks that need advanced reasoning
- 11 task types detected automatically by AI analysis
- Three complexity tiers (low, medium, high) for fine-grained control
- You only pay routing fees when routing actually saves you money (5% of savings)
- No routing fee when your original model is selected or a more expensive model is chosen
- Full transparency - response headers show exactly what happened and why
- Create custom routing rules through the dashboard for full control
- Available in proxy mode with all 17+ providers
## How Routing Works
Smart Routing follows a four-step process for every request:
1. Classify Task -> What type of work is this? (e.g., Code Generation)
2. Analyze Complexity -> How difficult is it? (low / medium / high)
3. Select Model -> Which model is optimal for this task + complexity?
4. Execute -> Route to selected model using BYOK key or LockLLM credits
The entire classification happens in real time with minimal latency impact.
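In outline, the four steps behave like a single decision function. The sketch below is purely illustrative: the keyword and length heuristics stand in for LockLLM's AI classifier, and the fallback model names (`gpt-4o-mini`, `gpt-4o`) are assumptions, not documented routing targets.

```python
# Illustrative sketch of the four-step routing flow. The heuristics and
# fallback model names are stand-ins, not LockLLM's actual implementation.

def classify_task(prompt: str) -> str:
    # Step 1: Classify Task (real system: AI analysis of the prompt)
    if "function" in prompt or "code" in prompt:
        return "Code Generation"
    if "summarize" in prompt.lower():
        return "Summarization"
    return "Other"

def assess_complexity(prompt: str) -> str:
    # Step 2: Analyze Complexity. Stand-in heuristic; the real system
    # weighs reasoning depth, constraints, domain specificity, and
    # multi-step thinking, not just prompt length.
    if len(prompt) > 400:
        return "high"
    if len(prompt) > 150:
        return "medium"
    return "low"

def select_model(original: str, task: str, tier: str) -> str:
    # Step 3: Select Model. High complexity stays on the original model;
    # lower tiers route to cheaper alternatives (assumed names).
    if tier == "high":
        return original
    cheaper = {"low": "gpt-4o-mini", "medium": "gpt-4o"}
    return cheaper.get(tier, original)

def route(prompt: str, original_model: str) -> str:
    task = classify_task(prompt)
    tier = assess_complexity(prompt)
    # Step 4: Execute — forward the request to the selected model
    # via a BYOK key or LockLLM credits.
    return select_model(original_model, task, tier)
```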
## Task Types
LockLLM's AI classifier automatically identifies the task type from your prompt:
| Task Type | Description | Example Prompts |
|---|---|---|
| Open QA | Open-ended questions needing creative or expansive answers | "What are the implications of quantum computing?" |
| Closed QA | Factual questions with specific, definitive answers | "What is the capital of France?" |
| Summarization | Condensing longer content into shorter form | "Summarize this article in 3 bullet points" |
| Text Generation | Creative writing and content creation | "Write a product description for..." |
| Code Generation | Programming, debugging, and code-related tasks | "Write a Python function that sorts a list" |
| Chatbot | Conversational interactions and dialogue | "Hi, can you help me with my order?" |
| Classification | Categorizing, labeling, or sorting content | "Is this review positive or negative?" |
| Rewrite | Editing, rephrasing, or reformatting content | "Rewrite this paragraph in a formal tone" |
| Brainstorming | Generating ideas and exploring possibilities | "Give me 10 marketing campaign ideas for..." |
| Extraction | Pulling specific data from text | "Extract all email addresses from this text" |
| Other | Tasks that don't match the above categories | Miscellaneous requests |
## Complexity Tiers
Each prompt is assigned a complexity score and mapped to a tier:
| Tier | Description | Typical Routing Behavior |
|---|---|---|
| Low | Simple, straightforward tasks with clear answers | Routes to efficient, cost-effective models |
| Medium | Moderate complexity requiring some reasoning | Routes to balanced models |
| High | Complex tasks needing advanced reasoning, nuance, or creativity | Stays on your original (typically more capable) model |
High-complexity tasks are kept on your original model because quality matters most for difficult requests. Routing fees only apply when routing to a cheaper model, so high-complexity requests that stay on your original model incur no routing fee.
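Response headers report complexity as a numeric score (0.0-1.0) that maps to one of the three tiers. A minimal sketch of that mapping, assuming 0.33/0.66 thresholds (the actual cut-offs are not documented):

```python
def complexity_tier(score: float) -> str:
    """Map a 0.0-1.0 complexity score to a tier.

    The 0.33/0.66 thresholds are illustrative assumptions; LockLLM only
    documents that three tiers exist, not where the boundaries fall.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < 0.33:
        return "low"
    if score < 0.66:
        return "medium"
    return "high"
```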
## Routing Modes
### Auto Routing
Set `X-LockLLM-Route-Action: auto` to enable AI-powered routing. The system automatically classifies your task, analyzes complexity, and selects the optimal model.
When auto routing saves you money:
- A simple "What is 2+2?" sent to GPT-4 gets routed to a faster, cheaper model
- A basic summarization task gets routed to an efficient model instead of a premium one
- Low-complexity chatbot interactions use cost-effective models
When auto routing preserves quality:
- Complex code generation stays on your advanced model
- High-complexity analysis and reasoning tasks are not downgraded
- Tasks requiring deep domain knowledge keep your original model
### Custom Routing
Set `X-LockLLM-Route-Action: custom` to use your own routing rules defined in the dashboard. Custom rules let you specify exactly which model to use for each task type and complexity combination.
If no matching custom rule is found for a request, it automatically falls back to auto routing.
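Conceptually, custom routing is a lookup keyed on (task type, complexity tier) with auto routing as the fallback. A hypothetical sketch (the rule table and `auto_route` stand-in are illustrative, not LockLLM internals):

```python
# Hypothetical sketch of custom-rule lookup with auto fallback.
rules = {
    ("Code Generation", "high"): "claude-sonnet-4-20250514",
    ("Summarization", "low"): "gpt-4o-mini",
}

def resolve_model(task: str, tier: str, original: str) -> str:
    if (task, tier) in rules:
        return rules[(task, tier)]           # matching custom rule wins
    return auto_route(task, tier, original)  # fall back to auto routing

def auto_route(task: str, tier: str, original: str) -> str:
    # Stand-in for the auto router: keep the original model when
    # complexity is high, otherwise pick a cheaper model (assumed name).
    return original if tier == "high" else "gpt-4o-mini"
```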
### Disabled (Default)
By default, routing is disabled. Your requests use exactly the model you specified with no routing intervention.
## How Task Classification Works
LockLLM uses AI-powered analysis to determine both the task type and complexity of each prompt. The classifier examines linguistic patterns, structural cues, and content to identify whether a prompt is asking for code, seeking factual answers, requesting creative content, summarizing text, or performing other tasks.
Complexity is assessed based on factors like the depth of reasoning required, number of constraints in the prompt, domain specificity, and whether the task requires multi-step thinking. The result is a complexity score that maps to one of three tiers: low, medium, or high.
You can see exactly how each request was classified by checking the response headers:
- `X-LockLLM-Task-Type` shows the detected task type (e.g., "Code Generation", "Summarization")
- `X-LockLLM-Complexity` shows the complexity score (0.0-1.0)
Classification works best with clear, focused prompts. Very short or ambiguous prompts may occasionally be categorized as "Other", in which case standard complexity-based routing applies.
## Auto Routing Behavior in Detail
### What Gets Routed
Auto routing makes different decisions based on the detected task type and complexity:
| Complexity | Behavior | Routing Fee |
|---|---|---|
| Low | Routed to a cost-efficient model | 5% of cost savings |
| Medium | Routed to a balanced-performance model | 5% of cost savings |
| High | Kept on your original model | FREE (no routing) |
**Code Generation:** Complex code generation tasks stay on your original model to preserve quality. Lower-complexity code tasks may be routed to efficient models. If you want full control over code task routing, use custom routing rules to define exactly which models handle each complexity tier.
### What You Pay
You are only charged a routing fee when routing actually saves you money. The fee is 5% of the cost difference between your original model and the selected model. If the router keeps your original model (high complexity, code generation, or no cheaper alternative), there is no fee at all.
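That fee rule is easy to state in code: 5% of the cost difference, floored at zero. A minimal sketch:

```python
def routing_fee(original_cost: float, routed_cost: float) -> float:
    """Return the routing fee: 5% of savings, or 0 when routing did not
    reduce cost (same model, pricier model, or no route taken)."""
    savings = original_cost - routed_cost
    return 0.05 * savings if savings > 0 else 0.0
```

For example, routing a $0.50 request to a $0.05 model yields a fee of about $0.0225, while keeping the original model costs nothing extra.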
## Cost Optimization Examples
### Example 1: Customer Support Chatbot
A chatbot handles a mix of traffic: ~60% simple greetings and FAQ answers (low complexity), ~30% moderate customer inquiries (medium complexity), and ~10% complex account issues (high complexity).
- Simple requests ("What are your business hours?") are routed to cost-efficient models, saving 50-70% per request
- Moderate requests ("Help me understand my billing statement") use balanced models
- Complex requests ("I need to dispute a charge and escalate to management") stay on the original model
- Result: Significant cost reduction on the majority of traffic with no quality impact on complex cases
### Example 2: Code Assistant
A developer tool sends all requests as code generation tasks. Complex code tasks stay on the original model to preserve quality, while simpler code tasks may be routed to efficient models for cost savings. For full control over code routing, define custom routing rules to specify exactly which model handles each complexity tier.
### Example 3: Content Platform
A content platform uses AI for various tasks: article summarization, content rewriting, idea brainstorming, and in-depth analysis. Simple summarization and basic rewrites are routed to cheaper models, while complex content generation and analysis stay on premium models. The routing fee is a fraction of the savings, and the platform sees an overall 30-50% reduction in AI costs.
## Monitoring Routing Decisions
Every routed request includes detailed metadata in the response headers, giving you full visibility into what happened and why:
- What was classified: `X-LockLLM-Task-Type` and `X-LockLLM-Complexity`
- What was selected: `X-LockLLM-Selected-Model` and `X-LockLLM-Routing-Reason`
- What it cost: `X-LockLLM-Estimated-Original-Cost`, `X-LockLLM-Estimated-Routed-Cost`, and `X-LockLLM-Estimated-Savings`
Recommended rollout approach:
- Enable auto routing in a staging or development environment first
- Review routing decisions in your activity logs via the dashboard
- Check that task classification matches your expectations for common prompts
- If a specific task type is being misclassified, create custom routing rules for that combination
- Deploy to production once you are confident in the routing behavior
You can also run auto routing on non-critical endpoints while keeping routing disabled on critical ones, then expand as you build confidence.
## Setting Up Auto Routing
### Proxy Mode - JavaScript/TypeScript
```javascript
const OpenAI = require('openai')

const openai = new OpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  baseURL: 'https://api.lockllm.com/v1/proxy/openai',
  defaultHeaders: {
    'X-LockLLM-Route-Action': 'auto'
  }
})

// User requests GPT-4, but if the task is simple,
// the router selects a cheaper model automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: userPrompt }]
})
```
### Proxy Mode - Python
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get('LOCKLLM_API_KEY'),
    base_url='https://api.lockllm.com/v1/proxy/openai',
    default_headers={
        'X-LockLLM-Route-Action': 'auto'
    }
)

response = client.chat.completions.create(
    model='gpt-4',
    messages=[{'role': 'user', 'content': user_prompt}]
)
```
### LockLLM SDK - JavaScript/TypeScript
```javascript
import { createOpenAI } from '@lockllm/sdk/wrappers'

const openai = createOpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  proxyOptions: {
    routeAction: 'auto'
  }
})
```
### LockLLM SDK - Python
```python
import os

from lockllm import create_openai, ProxyOptions

openai = create_openai(
    api_key=os.getenv('LOCKLLM_API_KEY'),
    proxy_options=ProxyOptions(route_action='auto')
)
```
## Setting Up Custom Routing Rules
### Step 1: Open the Routing Dashboard
- Sign in to your LockLLM dashboard
- Go to Proxy > Custom Routing (or visit lockllm.com/route-settings)
### Step 2: Create a Rule
- Click Create Rule
- Select a task type (e.g., Code Generation)
- Select a complexity tier (e.g., High)
- Enter the target model name (e.g., `claude-sonnet-4-20250514`)
- Select the provider (e.g., Anthropic)
- Toggle Use BYOK if you want to use your own API key for this provider
- Click Save
### Step 3: Enable Custom Routing
Set the routing header to `custom` in your requests:
```javascript
const openai = new OpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  baseURL: 'https://api.lockllm.com/v1/proxy/openai',
  defaultHeaders: {
    'X-LockLLM-Route-Action': 'custom'
  }
})
```
### Example Rules
| Task Type | Complexity | Target Model | Provider | Use BYOK |
|---|---|---|---|---|
| Code Generation | High | claude-sonnet-4-20250514 | Anthropic | Yes |
| Code Generation | Low | gpt-4o-mini | OpenAI | Yes |
| Summarization | Low | gpt-4o-mini | OpenAI | Yes |
| Open QA | High | gpt-4 | OpenAI | Yes |
| Chatbot | Low | llama-3-8b | Groq | Yes |
## Pricing
Smart routing uses a simple, fair pricing model: you only pay when routing saves you money.
### Routing Fee Structure
| Scenario | Fee |
|---|---|
| Route to cheaper model | 5% of cost savings |
| Route to same-cost model | FREE |
| Route to more expensive model | FREE |
| No route taken (original model used) | FREE |
| Routing disabled | FREE |
### Example: Cost Savings Breakdown
Scenario: User requests an expensive model, but the prompt is a simple question.
- Original model cost estimate: $0.50 per request
- Router selects cheaper model: $0.05 per request
- Cost savings: $0.45
- Routing fee: $0.45 x 5% = $0.0225
- Total cost: $0.05 + $0.0225 = $0.0725
- You save: $0.4275 (85% savings after fee)
The routing fee is small compared to the savings, and you never pay a fee when routing doesn't save you money.
## Routing Metadata
Every routed request includes metadata in the response headers so you can see exactly what happened:
| Header | Description |
|---|---|
| `X-LockLLM-Route-Enabled` | "true" - routing was active for this request |
| `X-LockLLM-Task-Type` | Detected task type (e.g., "Code Generation") |
| `X-LockLLM-Complexity` | Complexity score (0.0-1.0) |
| `X-LockLLM-Selected-Model` | Model chosen by the router |
| `X-LockLLM-Routing-Reason` | Human-readable explanation for the selection |
| `X-LockLLM-Original-Model` | Your originally requested model (if changed) |
| `X-LockLLM-Original-Provider` | Your original provider (if changed) |
| `X-LockLLM-Estimated-Original-Cost` | What the original model would have cost |
| `X-LockLLM-Estimated-Routed-Cost` | What the selected model costs |
| `X-LockLLM-Estimated-Savings` | Estimated dollar savings |
| `X-LockLLM-Routing-Fee-Reserved` | Routing fee charged (5% of savings) |
These headers give you full visibility into routing decisions for monitoring, debugging, and cost tracking.
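For logging or cost tracking, the routing headers can be pulled out of any response-header mapping. A small helper (the header names come from the table above; the function itself is illustrative, not part of any SDK):

```python
# Routing metadata headers emitted by the LockLLM proxy.
ROUTING_HEADERS = [
    "X-LockLLM-Route-Enabled",
    "X-LockLLM-Task-Type",
    "X-LockLLM-Complexity",
    "X-LockLLM-Selected-Model",
    "X-LockLLM-Routing-Reason",
    "X-LockLLM-Original-Model",
    "X-LockLLM-Original-Provider",
    "X-LockLLM-Estimated-Original-Cost",
    "X-LockLLM-Estimated-Routed-Cost",
    "X-LockLLM-Estimated-Savings",
    "X-LockLLM-Routing-Fee-Reserved",
]

def routing_summary(headers: dict) -> dict:
    """Extract LockLLM routing metadata from a response-header mapping,
    skipping headers that are absent (e.g., Original-Model when the
    requested model was not changed)."""
    return {h: headers[h] for h in ROUTING_HEADERS if h in headers}
```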
## Configuration
| Header | Values | Default | Description |
|---|---|---|---|
| `X-LockLLM-Route-Action` | `disabled`, `auto`, `custom` | `disabled` | Routing mode |
## Combining with Other Features
Smart routing works alongside all other LockLLM features:
- Threat detection: Security scanning runs first, routing happens after the prompt is cleared
- Custom policies: Policy checks are independent of routing decisions
- PII detection: PII is detected/redacted before routing occurs
- Prompt compression: Compression is applied after routing decisions are made
- Abuse detection: Abuse checks are independent of routing
## Limitations
- **Proxy mode only** - Routing is available exclusively in proxy mode (`/v1/proxy`), not in the standalone scan endpoint
- **BYOK requirements** - For auto routing to non-OpenRouter providers, you need a BYOK key configured for the target provider. If no key is available, the router falls back to your original model
- **Classification accuracy** - Task classification works well for clear, focused prompts. Very short or ambiguous prompts may occasionally be miscategorized, though the router defaults to your original model when uncertain
## FAQ
### What if I disagree with a routing decision?
You have full control. Check the `X-LockLLM-Task-Type` and `X-LockLLM-Complexity` response headers to see how your prompt was classified. If routing isn't working well for specific use cases, switch to custom routing mode and define your own rules, or disable routing entirely for those requests.
### Does routing add latency?
Routing adds minimal latency for the classification step. The classification runs in parallel with security scanning, so the overall impact is small. The potential cost savings typically outweigh the minor latency addition.
### Can I use routing with the universal endpoint?
Yes. When using the universal endpoint (`/v1/proxy/chat/completions`) with LockLLM credits, routing can select from any available model. This works without BYOK keys since all models are accessed through LockLLM credits.
### Do custom rules override auto routing?
Yes. When `X-LockLLM-Route-Action: custom` is set, your custom rules take priority. If no custom rule matches the detected task type and complexity, the system falls back to auto routing logic.
### Can I route different task types to different providers?
Yes. Custom routing rules support any combination of task type, complexity tier, target model, and provider. You can route code generation to Anthropic, summarization to OpenAI, and chatbot tasks to Groq - all with different BYOK keys.
### Is there a way to see routing decisions without enabling routing?
No. Routing metadata is only available when routing is enabled (auto or custom). To test routing without impacting your production traffic, enable routing on a staging or development environment first.