Smart Routing

Automatically route LLM requests to the optimal model based on task type and complexity. Save costs on simple tasks while maintaining quality for complex ones.

What is Smart Routing?

Smart Routing automatically selects the best AI model for each request based on the task type and prompt complexity. Instead of always using your most expensive model, simple tasks are routed to efficient models that deliver the same quality at lower cost, while complex tasks stay on advanced models.

Key benefits:

  • Save money by routing simple tasks to cheaper models automatically
  • Maintain quality for complex tasks that need advanced reasoning
  • 11 task types detected automatically by AI analysis
  • Three complexity tiers (low, medium, high) for fine-grained control
  • You only pay routing fees when routing actually saves you money (5% of savings)
  • No routing fee when your original model is selected or a more expensive model is chosen
  • Full transparency - response headers show exactly what happened and why
  • Create custom routing rules through the dashboard for full control
  • Available in proxy mode with all 17+ providers

How Routing Works

Smart Routing follows a four-step process for every request:

1. Classify Task       ->  What type of work is this? (e.g., Code Generation)
2. Analyze Complexity  ->  How difficult is it? (low / medium / high)
3. Select Model        ->  Which model is optimal for this task + complexity?
4. Execute             ->  Route to the selected model using a BYOK key or LockLLM credits

The entire classification happens in real time with minimal latency impact.
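
The four-step flow can be sketched as a single routing function. Note that the classifier stubs and model names below are simplified stand-ins for illustration only; LockLLM's real classifier is AI-based and its model table is not public:

```javascript
// Toy stand-ins for LockLLM's AI classifier -- these heuristics exist
// only to make the sketch runnable, not to mimic real behavior.
function classifyTask(prompt) {
  return /\b(function|code|debug)\b/i.test(prompt) ? 'Code Generation' : 'Other'
}

function scoreComplexity(prompt) {
  // Crude proxy; the real analysis weighs reasoning depth, constraints,
  // and domain specificity, not just prompt length.
  return prompt.length > 200 ? 'high' : 'low'
}

function selectModel(tier, originalModel) {
  // High-complexity requests keep the original model; cheaper tiers go
  // to a hypothetical efficient model.
  return tier === 'high' ? originalModel : 'gpt-4o-mini'
}

function routeRequest(prompt, originalModel) {
  const taskType = classifyTask(prompt)           // 1. Classify Task
  const tier = scoreComplexity(prompt)            // 2. Analyze Complexity
  const model = selectModel(tier, originalModel)  // 3. Select Model
  return { taskType, tier, model }                // 4. Execute against `model`
}
```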

Task Types

LockLLM's AI classifier automatically identifies the task type from your prompt:

| Task Type | Description | Example Prompts |
| --- | --- | --- |
| Open QA | Open-ended questions needing creative or expansive answers | "What are the implications of quantum computing?" |
| Closed QA | Factual questions with specific, definitive answers | "What is the capital of France?" |
| Summarization | Condensing longer content into shorter form | "Summarize this article in 3 bullet points" |
| Text Generation | Creative writing and content creation | "Write a product description for..." |
| Code Generation | Programming, debugging, and code-related tasks | "Write a Python function that sorts a list" |
| Chatbot | Conversational interactions and dialogue | "Hi, can you help me with my order?" |
| Classification | Categorizing, labeling, or sorting content | "Is this review positive or negative?" |
| Rewrite | Editing, rephrasing, or reformatting content | "Rewrite this paragraph in a formal tone" |
| Brainstorming | Generating ideas and exploring possibilities | "Give me 10 marketing campaign ideas for..." |
| Extraction | Pulling specific data from text | "Extract all email addresses from this text" |
| Other | Tasks that don't match the above categories | Miscellaneous requests |

Complexity Tiers

Each prompt is assigned a complexity score and mapped to a tier:

| Tier | Description | Typical Routing Behavior |
| --- | --- | --- |
| Low | Simple, straightforward tasks with clear answers | Routes to efficient, cost-effective models |
| Medium | Moderate complexity requiring some reasoning | Routes to balanced models |
| High | Complex tasks needing advanced reasoning, nuance, or creativity | Stays on your original (typically more capable) model |

High-complexity tasks are kept on your original model because quality matters most for difficult requests. Routing fees only apply when routing to a cheaper model, so high-complexity requests that stay on your original model incur no routing fee.
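
The score-to-tier mapping behaves roughly like the function below. The 0.33/0.66 cutoffs are placeholders for illustration; LockLLM does not publish the exact thresholds:

```javascript
// Map a complexity score (0.0-1.0, as reported in X-LockLLM-Complexity)
// to a tier. The threshold values here are assumed, not documented.
function complexityTier(score) {
  if (score < 0.33) return 'low'
  if (score < 0.66) return 'medium'
  return 'high'
}
```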

Routing Modes

Auto Routing

Set X-LockLLM-Route-Action: auto to enable AI-powered routing. The system automatically classifies your task, analyzes complexity, and selects the optimal model.

When auto routing saves you money:

  • A simple "What is 2+2?" sent to GPT-4 gets routed to a faster, cheaper model
  • A basic summarization task gets routed to an efficient model instead of a premium one
  • Low-complexity chatbot interactions use cost-effective models

When auto routing preserves quality:

  • Complex code generation stays on your advanced model
  • High-complexity analysis and reasoning tasks are not downgraded
  • Tasks requiring deep domain knowledge keep your original model

Custom Routing

Set X-LockLLM-Route-Action: custom to use your own routing rules defined in the dashboard. Custom rules let you specify exactly which model to use for each task type and complexity combination.

If no matching custom rule is found for a request, it automatically falls back to auto routing.
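
That fallback can be pictured as a simple lookup. The rule shape below mirrors the dashboard fields, and `autoRoute` is a stand-in for the built-in auto routing logic:

```javascript
// Find a custom rule matching the detected task type and complexity tier.
// If none matches, defer to auto routing (passed in as a function here).
function resolveModel(rules, taskType, tier, autoRoute) {
  const rule = rules.find(r => r.taskType === taskType && r.complexity === tier)
  return rule ? rule.targetModel : autoRoute(taskType, tier)
}

// Example rule, shaped like the dashboard fields:
const rules = [
  { taskType: 'Code Generation', complexity: 'high', targetModel: 'claude-sonnet-4-20250514' }
]
```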

Disabled (Default)

By default, routing is disabled. Your requests use exactly the model you specified with no routing intervention.

How Task Classification Works

LockLLM uses AI-powered analysis to determine both the task type and complexity of each prompt. The classifier examines linguistic patterns, structural cues, and content to identify whether a prompt is asking for code, seeking factual answers, requesting creative content, summarizing text, or performing other tasks.

Complexity is assessed based on factors like the depth of reasoning required, number of constraints in the prompt, domain specificity, and whether the task requires multi-step thinking. The result is a complexity score that maps to one of three tiers: low, medium, or high.

You can see exactly how each request was classified by checking the response headers:

  • X-LockLLM-Task-Type shows the detected task type (e.g., "Code Generation", "Summarization")
  • X-LockLLM-Complexity shows the complexity score (0.0-1.0)

Classification works best with clear, focused prompts. Very short or ambiguous prompts may occasionally be categorized as "Other", in which case standard complexity-based routing applies.
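
When calling the proxy directly (e.g. with fetch), the classification headers can be read off the response. This helper only assumes a Headers-like object with a `.get()` method:

```javascript
// Extract the classification metadata from a proxy response's headers.
function readClassification(headers) {
  return {
    taskType: headers.get('X-LockLLM-Task-Type'),
    complexity: parseFloat(headers.get('X-LockLLM-Complexity')),
  }
}

// Usage after a fetch through the LockLLM proxy:
// const { taskType, complexity } = readClassification(res.headers)
```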

Auto Routing Behavior in Detail

What Gets Routed

Auto routing makes different decisions based on the detected task type and complexity:

| Complexity | Behavior | Routing Fee |
| --- | --- | --- |
| Low | Routed to a cost-efficient model | 5% of cost savings |
| Medium | Routed to a balanced-performance model | 5% of cost savings |
| High | Kept on your original model | FREE (no routing) |

Code Generation: Complex code generation tasks stay on your original model to preserve quality. Lower-complexity code tasks may be routed to efficient models. If you want full control over code task routing, use custom routing rules to define exactly which models handle each complexity tier.

What You Pay

You are only charged a routing fee when routing actually saves you money. The fee is 5% of the cost difference between your original model and the selected model. If the router keeps your original model (high complexity, code generation, or no cheaper alternative), there is no fee at all.
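
The fee rule reduces to a small formula. This is a sketch of the documented billing rule, not LockLLM's billing code:

```javascript
// 5% of the savings when routing is cheaper; zero otherwise.
function routingFee(originalCost, routedCost) {
  const savings = originalCost - routedCost
  return savings > 0 ? savings * 0.05 : 0
}

// e.g. a $0.50 request routed to a $0.05 model:
// savings = $0.45, fee = $0.0225, total paid = $0.0725
```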

Cost Optimization Examples

Example 1: Customer Support Chatbot

A chatbot handles a mix of traffic: ~60% simple greetings and FAQ answers (low complexity), ~30% moderate customer inquiries (medium complexity), and ~10% complex account issues (high complexity).

  • Simple requests ("What are your business hours?") are routed to cost-efficient models, saving 50-70% per request
  • Moderate requests ("Help me understand my billing statement") use balanced models
  • Complex requests ("I need to dispute a charge and escalate to management") stay on the original model
  • Result: Significant cost reduction on the majority of traffic with no quality impact on complex cases

Example 2: Code Assistant

A developer tool sends all requests as code generation tasks. Complex code tasks stay on the original model to preserve quality, while simpler code tasks may be routed to efficient models for cost savings. For full control over code routing, define custom routing rules to specify exactly which model handles each complexity tier.

Example 3: Content Platform

A content platform uses AI for various tasks: article summarization, content rewriting, idea brainstorming, and in-depth analysis. Simple summarization and basic rewrites are routed to cheaper models, while complex content generation and analysis stay on premium models. The routing fee is a fraction of the savings, and the platform sees an overall 30-50% reduction in AI costs.

Monitoring Routing Decisions

Every routed request includes detailed metadata in the response headers, giving you full visibility into what happened and why:

  • What was classified: X-LockLLM-Task-Type and X-LockLLM-Complexity
  • What was selected: X-LockLLM-Selected-Model and X-LockLLM-Routing-Reason
  • What it cost: X-LockLLM-Estimated-Original-Cost, X-LockLLM-Estimated-Routed-Cost, and X-LockLLM-Estimated-Savings

Recommended rollout approach:

  1. Enable auto routing in a staging or development environment first
  2. Review routing decisions in your activity logs via the dashboard
  3. Check that task classification matches your expectations for common prompts
  4. If a specific task type is being misclassified, create custom routing rules for that combination
  5. Deploy to production once you are confident in the routing behavior

You can also run auto routing on non-critical endpoints while keeping routing disabled on critical ones, then expand as you build confidence.
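
One way to implement that split is to decide the routing header per endpoint and pass it with the OpenAI SDK's per-request options. The endpoint names below are hypothetical:

```javascript
// Hypothetical list of endpoints where routing should stay off.
const CRITICAL_ENDPOINTS = new Set(['/legal/review', '/billing/dispute'])

function routeActionFor(endpoint) {
  return CRITICAL_ENDPOINTS.has(endpoint) ? 'disabled' : 'auto'
}

// Usage with the OpenAI SDK's per-request request options:
// await openai.chat.completions.create(
//   { model: 'gpt-4', messages },
//   { headers: { 'X-LockLLM-Route-Action': routeActionFor(endpoint) } }
// )
```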

Setting Up Auto Routing

Proxy Mode - JavaScript/TypeScript

```javascript
const OpenAI = require('openai')

const openai = new OpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  baseURL: 'https://api.lockllm.com/v1/proxy/openai',
  defaultHeaders: {
    'X-LockLLM-Route-Action': 'auto'
  }
})

// User requests GPT-4, but if the task is simple,
// the router selects a cheaper model automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: userPrompt }]
})
```

Proxy Mode - Python

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get('LOCKLLM_API_KEY'),
    base_url='https://api.lockllm.com/v1/proxy/openai',
    default_headers={
        'X-LockLLM-Route-Action': 'auto'
    }
)

response = client.chat.completions.create(
    model='gpt-4',
    messages=[{'role': 'user', 'content': user_prompt}]
)
```

LockLLM SDK - JavaScript/TypeScript

```javascript
import { createOpenAI } from '@lockllm/sdk/wrappers'

const openai = createOpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  proxyOptions: {
    routeAction: 'auto'
  }
})
```

LockLLM SDK - Python

```python
import os

from lockllm import create_openai, ProxyOptions

openai = create_openai(
    api_key=os.getenv('LOCKLLM_API_KEY'),
    proxy_options=ProxyOptions(route_action='auto')
)
```

Setting Up Custom Routing Rules

Step 1: Open the Routing Dashboard

  1. Sign in to your LockLLM dashboard
  2. Go to Proxy > Custom Routing (or visit lockllm.com/route-settings)

Step 2: Create a Rule

  1. Click Create Rule
  2. Select a task type (e.g., Code Generation)
  3. Select a complexity tier (e.g., High)
  4. Enter the target model name (e.g., claude-sonnet-4-20250514)
  5. Select the provider (e.g., Anthropic)
  6. Toggle Use BYOK if you want to use your own API key for this provider
  7. Click Save

Step 3: Enable Custom Routing

Set the routing header to custom in your requests:

```javascript
const openai = new OpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  baseURL: 'https://api.lockllm.com/v1/proxy/openai',
  defaultHeaders: {
    'X-LockLLM-Route-Action': 'custom'
  }
})
```

Example Rules

| Task Type | Complexity | Target Model | Provider | Use BYOK |
| --- | --- | --- | --- | --- |
| Code Generation | High | claude-sonnet-4-20250514 | Anthropic | Yes |
| Code Generation | Low | gpt-4o-mini | OpenAI | Yes |
| Summarization | Low | gpt-4o-mini | OpenAI | Yes |
| Open QA | High | gpt-4 | OpenAI | Yes |
| Chatbot | Low | llama-3-8b | Groq | Yes |

Pricing

Smart routing uses a simple, fair pricing model: you only pay when routing saves you money.

Routing Fee Structure

| Scenario | Fee |
| --- | --- |
| Route to cheaper model | 5% of cost savings |
| Route to same-cost model | FREE |
| Route to more expensive model | FREE |
| No route taken (original model used) | FREE |
| Routing disabled | FREE |

Example: Cost Savings Breakdown

Scenario: User requests an expensive model, but the prompt is a simple question.

  • Original model cost estimate: $0.50 per request
  • Router selects cheaper model: $0.05 per request
  • Cost savings: $0.45
  • Routing fee: $0.45 x 5% = $0.0225
  • Total cost: $0.05 + $0.0225 = $0.0725
  • You save: $0.4275 (85% savings after fee)

The routing fee is small compared to the savings, and you never pay a fee when routing doesn't save you money.

Routing Metadata

Every routed request includes metadata in the response headers so you can see exactly what happened:

| Header | Description |
| --- | --- |
| X-LockLLM-Route-Enabled | "true" - routing was active for this request |
| X-LockLLM-Task-Type | Detected task type (e.g., "Code Generation") |
| X-LockLLM-Complexity | Complexity score (0.0-1.0) |
| X-LockLLM-Selected-Model | Model chosen by the router |
| X-LockLLM-Routing-Reason | Human-readable explanation for the selection |
| X-LockLLM-Original-Model | Your originally requested model (if changed) |
| X-LockLLM-Original-Provider | Your original provider (if changed) |
| X-LockLLM-Estimated-Original-Cost | What the original model would have cost |
| X-LockLLM-Estimated-Routed-Cost | What the selected model costs |
| X-LockLLM-Estimated-Savings | Estimated dollar savings |
| X-LockLLM-Routing-Fee-Reserved | Routing fee charged (5% of savings) |

These headers give you full visibility into routing decisions for monitoring, debugging, and cost tracking.
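
For cost tracking, the estimate headers can be accumulated across requests. A minimal sketch, assuming a Headers-like object with a `.get()` method:

```javascript
// Fold one response's routing metadata into running totals.
// Missing headers (e.g. when no route was taken) count as zero.
function recordSavings(totals, headers) {
  const saved = parseFloat(headers.get('X-LockLLM-Estimated-Savings') || '0')
  const fee = parseFloat(headers.get('X-LockLLM-Routing-Fee-Reserved') || '0')
  return {
    gross: totals.gross + saved,
    fees: totals.fees + fee,
    net: totals.net + (saved - fee),
  }
}
```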

Configuration

| Header | Values | Default | Description |
| --- | --- | --- | --- |
| X-LockLLM-Route-Action | disabled, auto, custom | disabled | Routing mode |

Combining with Other Features

Smart routing works alongside all other LockLLM features:

  • Threat detection: Security scanning runs first, routing happens after the prompt is cleared
  • Custom policies: Policy checks are independent of routing decisions
  • PII detection: PII is detected/redacted before routing occurs
  • Prompt compression: Compression is applied after routing decisions are made
  • Abuse detection: Abuse checks are independent of routing

Limitations

  • Proxy mode only - Routing is available exclusively in proxy mode (/v1/proxy), not in the standalone scan endpoint
  • BYOK requirements - For auto routing to non-OpenRouter providers, you need a BYOK key configured for the target provider. If no key is available, the router falls back to your original model
  • Classification accuracy - Task classification works well for clear, focused prompts. Very short or ambiguous prompts may occasionally be miscategorized, though the router defaults to your original model when uncertain

FAQ

What if I disagree with a routing decision?

You have full control. Check the X-LockLLM-Task-Type and X-LockLLM-Complexity response headers to see how your prompt was classified. If routing isn't working well for specific use cases, switch to custom routing mode and define your own rules, or disable routing entirely for those requests.

Does routing add latency?

Routing adds minimal latency for the classification step. The classification runs in parallel with security scanning, so the overall impact is small. The potential cost savings typically outweigh the minor latency addition.

Can I use routing with the universal endpoint?

Yes. When using the universal endpoint (/v1/proxy/chat/completions) with LockLLM credits, routing can select from any available model. This works without BYOK keys since all models are accessed through LockLLM credits.
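
A raw request to the universal endpoint looks roughly like the sketch below. The path and routing header come from this page; the bearer-token Authorization scheme is an assumption about the raw HTTP shape, so check the API reference before relying on it:

```javascript
// Build fetch options for the universal endpoint with auto routing enabled.
// The Authorization scheme is assumed, not documented on this page.
function buildUniversalRequest(apiKey, model, messages) {
  return {
    url: 'https://api.lockllm.com/v1/proxy/chat/completions',
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
      'X-LockLLM-Route-Action': 'auto',
    },
    body: JSON.stringify({ model, messages }),
  }
}

// Usage: const req = buildUniversalRequest(key, 'gpt-4', msgs)
//        const res = await fetch(req.url, req)
```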

Do custom rules override auto routing?

Yes. When X-LockLLM-Route-Action: custom is set, your custom rules take priority. If no custom rule matches the detected task type and complexity, the system falls back to auto routing logic.

Can I route different task types to different providers?

Yes. Custom routing rules support any combination of task type, complexity tier, target model, and provider. You can route code generation to Anthropic, summarization to OpenAI, and chatbot tasks to Groq - all with different BYOK keys.

Is there a way to see routing decisions without enabling routing?

No. Routing metadata is only available when routing is enabled (auto or custom). To test routing without impacting your production traffic, enable routing on a staging or development environment first.
