Python SDK

Native Python SDK for prompt security with both sync and async support. Drop-in replacements for 17+ providers, including OpenAI, Anthropic, and custom endpoints. Supports scan modes, custom policy enforcement, abuse detection, smart routing, PII detection and redaction, and response caching.

Introduction

The LockLLM Python SDK is a production-ready library that provides comprehensive AI security for your LLM applications. Built with Python type hints and designed for modern Python development, it offers synchronous and asynchronous APIs, drop-in replacements for popular AI provider SDKs, and automatic prompt injection detection and jailbreak prevention.

Key features:

  • Real-time security scanning with minimal latency (<250ms)
  • Dual sync/async API for maximum flexibility
  • Drop-in replacements for 17+ AI providers (custom endpoint support for each)
  • Configurable scan modes with custom policy enforcement
  • AI abuse detection (opt-in) to protect against automated misuse
  • Smart routing for automatic model selection by task type and complexity
  • Response caching for cost optimization (enabled by default in proxy mode)
  • Universal proxy mode supporting 200+ models without provider API keys
  • PII detection and redaction (names, emails, phone numbers, SSNs, credit cards, and more)
  • Full type hints with mypy support
  • Works with Python 3.8 through 3.12
  • Streaming-compatible with all providers
  • Context manager support
  • Free tier available with generous limits

Use cases:

  • Production LLM applications requiring security
  • AI agents and autonomous systems
  • Chatbots and conversational interfaces
  • RAG (Retrieval Augmented Generation) systems
  • Multi-tenant AI applications
  • Enterprise AI deployments
  • FastAPI, Django, and Flask applications

Installation

Install the SDK using your preferred package manager:

# pip
pip install lockllm

# pip3
pip3 install lockllm

# poetry
poetry add lockllm

# pipenv
pipenv install lockllm

Requirements:

  • Python 3.8 or higher (supports 3.8, 3.9, 3.10, 3.11, 3.12)
  • requests and httpx (installed automatically as dependencies)
  • typing-extensions (installed automatically for Python < 3.11)

Optional Dependencies

For provider wrapper functions, install the relevant official SDKs:

# For OpenAI and OpenAI-compatible providers (16 providers)
pip install openai

# For Anthropic Claude
pip install anthropic

Provider SDK mapping:

  • openai - OpenAI, Groq, DeepSeek, Mistral, Perplexity, OpenRouter, Together AI, xAI, Fireworks, Anyscale, Hugging Face, Gemini, Cohere, Azure, Bedrock, Vertex AI
  • anthropic - Anthropic Claude

These SDKs are only required if you use the wrapper functions for those providers.
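Because these SDKs are optional, a wrapper call can fail at import time if the underlying package is missing. If you want to check availability up front, a small stdlib helper works (this helper is illustrative, not part of the LockLLM SDK):

```python
import importlib.util

def sdk_available(package: str) -> bool:
    """Return True if the named package can be imported in this environment."""
    return importlib.util.find_spec(package) is not None

# Warn about any missing optional dependency before using its wrappers
for pkg in ("openai", "anthropic"):
    if not sdk_available(pkg):
        print(f"Optional dependency '{pkg}' is missing; install it to use its wrappers")
```

`find_spec` only inspects the import machinery, so the check is cheap and does not actually import the package.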

Verify Installation

from lockllm import __version__
print(__version__)  # Prints the installed SDK version

Quick Start

Step 1: Get Your API Keys

  1. Visit lockllm.com and create a free account
  2. Navigate to API Keys section and copy your LockLLM API key
  3. Go to Proxy Settings and add your provider API keys (OpenAI, Anthropic, etc.)

Your provider keys are encrypted and stored securely. You'll only need your LockLLM API key in your code.

Step 2: Basic Usage

Choose from four integration methods: wrapper functions (easiest), the direct scan API, official SDKs pointed at a custom base URL, or the universal proxy wrapper when you don't bring your own provider keys (non-BYOK).

Wrapper functions are the simplest way to add security: just replace your SDK initialization.

Synchronous:

from lockllm import create_openai
import os

# Before:
# from openai import OpenAI
# openai = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# After (one line change):
openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

# Everything else works exactly the same
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)

print(response.choices[0].message.content)

Asynchronous:

from lockllm import create_async_openai
import os
import asyncio

async def main():
    openai = create_async_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )

    print(response.choices[0].message.content)

asyncio.run(main())

Direct Scan API

For manual control and custom workflows:

Synchronous:

from lockllm import LockLLM
import os

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

# Scan user input before processing
result = lockllm.scan(
    input=user_prompt,
    sensitivity="medium"  # "low" | "medium" | "high"
)

if not result.safe:
    print("Malicious input detected!")
    print(f"Injection score: {result.injection}%")
    print(f"Confidence: {result.confidence}%")
    print(f"Request ID: {result.request_id}")

    # Handle the security incident (raise, log, or return an error
    # response from inside your request handler)
    raise ValueError("Invalid input detected")

# Safe to proceed
response = your_llm_call(user_prompt)

Asynchronous:

from lockllm import AsyncLockLLM
import os
import asyncio

async def main():
    lockllm = AsyncLockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

    result = await lockllm.scan(
        input=user_prompt,
        sensitivity="medium"
    )

    if not result.safe:
        print(f"Malicious prompt detected: {result.injection}%")
        return

    # Safe to proceed
    response = await your_llm_call(user_prompt)

asyncio.run(main())

See Scan Modes below for advanced scanning options including custom policy checks, abuse detection, PII detection, and smart routing.

Official SDKs with Proxy

Use official SDKs with LockLLM's proxy:

from openai import OpenAI
from lockllm import get_proxy_url
import os

client = OpenAI(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    base_url=get_proxy_url('openai')
)

# Works exactly like the official SDK
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

Universal Proxy

If you haven't configured provider API keys, use the universal proxy endpoint, which draws on your LockLLM credits. You can browse all supported models and their IDs on the Model List page in your dashboard. When making requests, you must use the exact model ID shown there (e.g., openai/gpt-4).

Using the OpenAI SDK:

from openai import OpenAI
from lockllm import get_universal_proxy_url
import os

client = OpenAI(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    base_url=get_universal_proxy_url()
)

response = client.chat.completions.create(
    model="openai/gpt-4",  # Use the model ID from the Model List page
    messages=[{"role": "user", "content": "Hello!"}]
)

Using LockLLM Wrapper (Recommended):

The SDK provides dedicated wrapper functions for the universal proxy endpoint. These are the simplest way to get started without configuring provider API keys:

Synchronous:

from lockllm import create_client, ProxyOptions
import os

# No BYOK required - uses LockLLM credits
client = create_client(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(scan_action="block")
)

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": user_input}]
)

print(response.choices[0].message.content)

Asynchronous:

from lockllm import create_async_client, ProxyOptions
import os
import asyncio

async def main():
    client = create_async_client(
        api_key=os.getenv("LOCKLLM_API_KEY"),
        proxy_options=ProxyOptions(scan_action="block")
    )

    response = await client.chat.completions.create(
        model="openai/gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )

    print(response.choices[0].message.content)

asyncio.run(main())

create_client() and create_async_client() default to the universal proxy URL (https://api.lockllm.com/v1/proxy). You can override this with the base_url parameter.

Custom OpenAI-Compatible Endpoint Wrapper

For custom endpoints that follow the OpenAI API format but are not one of the 17 built-in providers:

Synchronous:

from lockllm import create_openai_compatible, ProxyOptions
import os

# base_url is required for custom endpoints
client = create_openai_compatible(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    base_url="https://api.lockllm.com/v1/proxy/custom",
    proxy_options=ProxyOptions(scan_action="block")
)

response = client.chat.completions.create(
    model="your-custom-model",
    messages=[{"role": "user", "content": user_input}]
)

Asynchronous:

from lockllm import create_async_openai_compatible
import os
import asyncio

async def main():
    client = create_async_openai_compatible(
        api_key=os.getenv("LOCKLLM_API_KEY"),
        base_url="https://api.lockllm.com/v1/proxy/custom"
    )

    response = await client.chat.completions.create(
        model="your-custom-model",
        messages=[{"role": "user", "content": user_input}]
    )

asyncio.run(main())

Unlike provider-specific wrappers, create_openai_compatible() requires a base_url parameter since there is no default endpoint for custom providers.

Provider Wrappers

LockLLM provides drop-in replacements for 17+ AI providers, each with custom endpoint support. The wrappers behave identically to the official SDKs, with automatic security scanning added.

OpenAI (Sync)

from lockllm import create_openai
import os

openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

# Chat completions
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)

# Streaming
stream = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Count from 1 to 10"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')

# Function calling
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    functions=[{
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }]
)

OpenAI (Async)

from lockllm import create_async_openai
import os
import asyncio

async def main():
    openai = create_async_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

    # Chat completions
    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )

    # Async streaming
    stream = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Write a story"}],
        stream=True
    )

    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end='')

asyncio.run(main())

Anthropic Claude (Sync)

from lockllm import create_anthropic
import os

anthropic = create_anthropic(api_key=os.getenv("LOCKLLM_API_KEY"))

# Messages API
message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": user_input}
    ]
)

print(message.content[0].text)

# Streaming
with anthropic.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem"}]
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)

Anthropic Claude (Async)

from lockllm import create_async_anthropic
import os
import asyncio

async def main():
    anthropic = create_async_anthropic(api_key=os.getenv("LOCKLLM_API_KEY"))

    # Async messages
    message = await anthropic.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_input}]
    )

    # Async streaming
    async with anthropic.messages.stream(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a poem"}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end='', flush=True)

asyncio.run(main())

Groq, DeepSeek, Perplexity

from lockllm import create_groq, create_deepseek, create_perplexity
import os

# Groq - Fast inference with Llama models
groq = create_groq(api_key=os.getenv("LOCKLLM_API_KEY"))
response = groq.chat.completions.create(
    model='llama-3.1-70b-versatile',
    messages=[{'role': 'user', 'content': user_input}]
)

# DeepSeek - Advanced reasoning models
deepseek = create_deepseek(api_key=os.getenv("LOCKLLM_API_KEY"))
response = deepseek.chat.completions.create(
    model='deepseek-chat',
    messages=[{'role': 'user', 'content': user_input}]
)

# Perplexity - Models with internet access
perplexity = create_perplexity(api_key=os.getenv("LOCKLLM_API_KEY"))
response = perplexity.chat.completions.create(
    model='llama-3.1-sonar-huge-128k-online',
    messages=[{'role': 'user', 'content': user_input}]
)

All Supported Providers

LockLLM supports 17+ providers with ready-to-use wrappers. All providers support custom endpoint URLs configured via the dashboard.

Import any wrapper function:

Synchronous wrappers:

from lockllm import (
    create_openai,        # OpenAI GPT models
    create_anthropic,     # Anthropic Claude
    create_groq,          # Groq LPU inference
    create_deepseek,      # DeepSeek models
    create_perplexity,    # Perplexity (with internet)
    create_mistral,       # Mistral AI
    create_openrouter,    # OpenRouter (multi-provider)
    create_together,      # Together AI
    create_xai,           # xAI Grok
    create_fireworks,     # Fireworks AI
    create_anyscale,      # Anyscale Endpoints
    create_huggingface,   # Hugging Face Inference
    create_gemini,        # Google Gemini
    create_cohere,        # Cohere
    create_azure,         # Azure OpenAI
    create_bedrock,       # AWS Bedrock
    create_vertex_ai      # Google Vertex AI
)

Asynchronous wrappers:

from lockllm import (
    create_async_openai,
    create_async_anthropic,
    create_async_groq,
    create_async_deepseek,
    create_async_perplexity,
    create_async_mistral,
    create_async_openrouter,
    create_async_together,
    create_async_xai,
    create_async_fireworks,
    create_async_anyscale,
    create_async_huggingface,
    create_async_gemini,
    create_async_cohere,
    create_async_azure,
    create_async_bedrock,
    create_async_vertex_ai
)

Provider compatibility:

  • 16 providers use OpenAI-compatible API (require openai package)
  • Anthropic uses its own SDK (requires anthropic)
  • All providers support custom endpoint URLs via dashboard

Configuring Proxy Options

All wrapper functions accept a proxy_options parameter to configure scanning, routing, and caching behavior for every request made through that client:

from lockllm import create_openai, ProxyOptions
import os

# Configure scanning and routing for all requests through this client
openai = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(
        scan_action="block",          # Block malicious requests
        policy_action="block",        # Block policy violations
        route_action="auto",          # Enable smart routing
        sensitivity="high"            # Maximum protection
    )
)

# All requests through this client use the configured options
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)

This works with all sync and async wrappers. See Proxy Options for the full list of configurable fields.

Configuration

LockLLM Client Configuration

Synchronous:

from lockllm import LockLLM, LockLLMConfig
import os

config = LockLLMConfig(
    api_key=os.getenv("LOCKLLM_API_KEY"),  # Required
    base_url="https://api.lockllm.com",     # Optional: custom endpoint
    timeout=60.0,                            # Optional: request timeout (seconds)
    max_retries=3                            # Optional: max retry attempts
)

lockllm = LockLLM(
    api_key=config.api_key,
    base_url=config.base_url,
    timeout=config.timeout,
    max_retries=config.max_retries
)

Asynchronous:

from lockllm import AsyncLockLLM
import os

lockllm = AsyncLockLLM(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    base_url="https://api.lockllm.com",
    timeout=60.0,
    max_retries=3
)

Sensitivity Levels

Control detection strictness with the sensitivity parameter:

from lockllm import LockLLM
import os

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

# Low sensitivity - fewer false positives
# Use for: creative applications, exploratory use cases
low_result = lockllm.scan(input=user_prompt, sensitivity="low")

# Medium sensitivity - balanced detection - DEFAULT
# Use for: general user inputs, standard applications
medium_result = lockllm.scan(input=user_prompt, sensitivity="medium")

# High sensitivity - maximum protection
# Use for: sensitive operations, admin panels, data exports
high_result = lockllm.scan(input=user_prompt, sensitivity="high")

Choosing sensitivity:

  • High: Critical systems (admin, payments, sensitive data)
  • Medium: General applications (default, recommended)
  • Low: Creative tools (writing assistants, brainstorming)
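If your application serves several contexts, the guidance above can be encoded as a simple lookup so the sensitivity choice lives in one place. The context names below are hypothetical, not part of the SDK:

```python
# Illustrative mapping from application context to LockLLM sensitivity level.
SENSITIVITY_BY_CONTEXT = {
    "admin": "high",        # critical systems: admin, payments, sensitive data
    "payments": "high",
    "chat": "medium",       # general applications
    "brainstorm": "low",    # creative tools: writing assistants, brainstorming
}

def pick_sensitivity(context: str) -> str:
    # Fall back to the documented default ("medium") for unknown contexts
    return SENSITIVITY_BY_CONTEXT.get(context, "medium")

print(pick_sensitivity("admin"))  # high
```

The chosen value can then be passed straight through, e.g. `lockllm.scan(input=user_prompt, sensitivity=pick_sensitivity(context))`.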

Scan Modes

Control which security checks are performed with the scan_mode parameter:

from lockllm import LockLLM
import os

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

# Normal mode - core injection detection only
result = lockllm.scan(input=user_prompt, scan_mode="normal")

# Policy-only mode - custom content policies only (skips core injection scan)
result = lockllm.scan(input=user_prompt, scan_mode="policy_only")

# Combined mode - both core injection and custom policies (default, maximum security)
result = lockllm.scan(input=user_prompt, scan_mode="combined")

Available modes:

  • "normal" - Core injection detection only (prompt injection, jailbreak, instruction override, etc.)
  • "policy_only" - Custom content policies only (checks your policies configured in the dashboard)
  • "combined" - Both core injection and custom policies (default, recommended for maximum protection)

Scan Actions

Control what happens when threats or violations are detected:

# Block mode - raises an error, request is stopped
result = lockllm.scan(
    input=user_prompt,
    scan_action="block",       # Block on core injection
    policy_action="block"      # Block on policy violations
)

# Allow with warning mode - request proceeds, warnings included in response (default)
result = lockllm.scan(
    input=user_prompt,
    scan_action="allow_with_warning",
    policy_action="allow_with_warning"
)

Available actions:

  • "block" - Raises PromptInjectionError or PolicyViolationError when detected
  • "allow_with_warning" - Request proceeds, warning details included in the response (default)
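The two actions differ only in whether an unsafe result stops the request. As a rough local sketch of that decision logic (the dict shape and the SecurityBlock exception here are illustrative stand-ins, not the SDK's types):

```python
class SecurityBlock(Exception):
    """Illustrative stand-in for the SDK's block-mode errors."""

def apply_scan_action(result: dict, scan_action: str) -> list:
    """Mirror 'block' vs 'allow_with_warning' semantics on a scan result."""
    warnings = []
    if not result["safe"]:
        if scan_action == "block":
            # Block mode: stop the request entirely
            raise SecurityBlock(f"injection score {result['injection']}%")
        # Allow-with-warning mode: the request proceeds, but is flagged
        warnings.append(f"unsafe input allowed with warning ({result['injection']}%)")
    return warnings

unsafe = {"safe": False, "injection": 91}
print(apply_scan_action(unsafe, "allow_with_warning"))  # one warning, no exception
```

In practice the SDK performs this for you; in block mode you would wrap the call in try/except and catch the documented errors instead.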

Abuse Detection

Opt-in abuse detection protects against automated misuse and resource exhaustion:

# Enable abuse detection
result = lockllm.scan(
    input=user_prompt,
    abuse_action="block"              # Block abusive requests
)

# Or allow with warnings
result = lockllm.scan(
    input=user_prompt,
    abuse_action="allow_with_warning" # Allow but flag abusive requests
)

Abuse detection is disabled by default. Set abuse_action to enable it.

PII Detection

Opt-in PII (Personally Identifiable Information) detection scans prompts for sensitive data before processing. When enabled, it identifies entity types such as names, email addresses, phone numbers, Social Security numbers, credit card numbers, street addresses, dates of birth, driver's license numbers, and more.

Scan API usage:

from lockllm import LockLLM
import os

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

# Strip mode - replaces PII with placeholders
result = lockllm.scan(
    input="My name is John Smith and my email is [email protected]",
    pii_action="strip"
)

if result.pii_result and result.pii_result.detected:
    print(f"PII detected: {result.pii_result.entity_count} entities")
    print(f"Entity types: {', '.join(result.pii_result.entity_types)}")
    print(f"Redacted text: {result.pii_result.redacted_input}")

# Block mode - raises PIIDetectedError if PII is found
result = lockllm.scan(
    input="Call me at 555-0123",
    pii_action="block"
)

# Allow with warning mode - request proceeds, PII info included in response
result = lockllm.scan(
    input="My SSN is 123-45-6789",
    pii_action="allow_with_warning"
)

if result.pii_result and result.pii_result.detected:
    print(f"Warning: {result.pii_result.entity_count} PII entities found")
    print(f"Types: {', '.join(result.pii_result.entity_types)}")

Async usage:

from lockllm import AsyncLockLLM
import os
import asyncio

async def main():
    lockllm = AsyncLockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

    result = await lockllm.scan(
        input="My credit card is 4111-1111-1111-1111",
        pii_action="strip"
    )

    if result.pii_result and result.pii_result.detected:
        print(f"Redacted: {result.pii_result.redacted_input}")

asyncio.run(main())

Proxy mode usage:

from lockllm import create_openai, ProxyOptions
import os

openai = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(
        scan_action="block",
        pii_action="strip"   # Strip PII before sending to provider
    )
)

# PII is automatically stripped from the prompt before reaching the AI provider
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)

Available PII actions:

  • "strip" - Detects PII and replaces identified entities with [TYPE] placeholders (e.g., [Email], [Phone Number]) before forwarding to the AI provider (proxy mode) or returns redacted text in scan response
  • "block" - Blocks the request entirely if PII is detected, raising PIIDetectedError
  • "allow_with_warning" - Allows the request through but includes PII detection results in the response
  • None (default) - PII detection is disabled

Supported entity types:

  • Account Number
  • Building Number
  • City
  • Credit Card Number
  • Date of Birth
  • Driver's License Number
  • Email Address
  • First Name
  • Last Name
  • ID Card Number
  • Password
  • Phone Number
  • Social Security Number
  • Street Address
  • Tax ID Number
  • Username
  • Zip Code

PII detection is disabled by default. Set pii_action to enable it.

Prompt Compression

Opt-in prompt compression reduces token count before sending prompts to AI providers. Three compression methods are available: TOON for structured JSON data, Compact for general text, and Combined for maximum compression.

Scan API usage:

from lockllm import LockLLM
import os

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

# TOON - converts JSON to compact notation (free)
result = lockllm.scan(
    input='{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}',
    compression="toon"
)

if result.compression_result:
    print(f"Method: {result.compression_result.method}")
    print(f"Original: {result.compression_result.original_length} chars")
    print(f"Compressed: {result.compression_result.compressed_length} chars")
    print(f"Ratio: {result.compression_result.compression_ratio:.2f}")
    print(f"Compressed text: {result.compression_result.compressed_input}")

# Compact - ML-based compression for any text ($0.0001/use)
result = lockllm.scan(
    input="A long prompt with detailed instructions that could be compressed...",
    compression="compact",
    compression_rate=0.5  # Optional: 0.3-0.7 (default 0.5, lower = more aggressive)
)

if result.compression_result:
    print(f"Compressed to {result.compression_result.compression_ratio:.0%} of original")
    print(f"Compressed text: {result.compression_result.compressed_input}")

# Combined - TOON then ML-based compression ($0.0001/use, maximum compression)
# Best for JSON data: applies TOON first, then ML compression on the result
result = lockllm.scan(
    input='{"data": [{"id": 1, "value": "long text content here..."}, {"id": 2, "value": "more content"}]}',
    compression="combined",
    compression_rate=0.5  # Optional: controls the ML compression stage
)

if result.compression_result:
    print(f"Compressed to {result.compression_result.compression_ratio:.0%} of original")
    print(f"Compressed text: {result.compression_result.compressed_input}")

Async usage:

from lockllm import AsyncLockLLM
import os
import asyncio

async def main():
    lockllm = AsyncLockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

    result = await lockllm.scan(
        input='{"data": [{"key": "value1"}, {"key": "value2"}]}',
        compression="toon"
    )

    if result.compression_result:
        print(f"Compressed: {result.compression_result.compressed_input}")

asyncio.run(main())

Proxy mode usage:

from lockllm import create_openai, ProxyOptions
import os

openai = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(
        scan_action="block",
        compression="toon"  # Compress JSON prompts before sending to provider
    )
)

# JSON content in prompts is automatically compressed before reaching the AI provider
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": json_prompt}]
)

# For ML-based compression with custom rate
openai_compact = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(
        compression="compact",
        compression_rate=0.4  # More aggressive compression
    )
)

Available compression methods:

  • "toon" - JSON-to-compact notation (free). Converts structured JSON to a token-efficient format with 30-60% token savings. Non-JSON input is returned unchanged - it will not error or crash on free text.
  • "compact" - ML-based compression ($0.0001/use). Works on any text type. Uses token-level classification to remove non-essential tokens while preserving meaning. Configurable compression rate (0.3-0.7, default 0.5).
  • "combined" - Maximum compression ($0.0001/use). Applies TOON first, then runs ML-based Compact on the result. Non-JSON input skips the TOON stage and goes directly to ML compression. Best when you want maximum token reduction.
  • None (default) - Compression is disabled

Compression rate (compact and combined methods):

  • 0.3 - Most aggressive compression (removes more tokens)
  • 0.5 - Balanced compression (default)
  • 0.7 - Conservative compression (preserves more tokens)

Compression is disabled by default. Set compression to enable it.
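As a quick sanity check on the numbers reported in compression_result, the arithmetic is straightforward (assuming, as the examples above suggest, that the ratio is compressed length over original length):

```python
def compression_stats(original_length: int, compressed_length: int) -> dict:
    """Compute the ratio and percent savings for a compression result."""
    ratio = compressed_length / original_length
    return {
        "compression_ratio": round(ratio, 2),
        "savings_pct": round((1 - ratio) * 100, 1),
    }

# A 1000-char prompt compressed to 450 chars
print(compression_stats(1000, 450))
# {'compression_ratio': 0.45, 'savings_pct': 55.0}
```

A ratio of 0.45 therefore corresponds to the "Compressed to 45% of original" message in the earlier examples.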

Chunked Scanning

For long prompts, enable chunked scanning to process input in segments:

result = lockllm.scan(
    input=long_prompt,
    chunk=True    # Enable chunked scanning for long inputs
)
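LockLLM performs the chunking server-side, but the underlying idea is simply to scan a long input as overlapping segments so that an attack split across a boundary is still seen whole. A rough illustration (the segment size and overlap below are arbitrary, not the service's actual values):

```python
def split_into_chunks(text: str, size: int = 100, overlap: int = 20) -> list:
    """Split text into fixed-size segments that overlap at the boundaries."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 250-character input becomes 3 overlapping segments
chunks = split_into_chunks("x" * 250, size=100, overlap=20)
print(len(chunks))  # 3
```

The overlap means each boundary region appears in two segments, so content straddling a cut is never scanned only in halves.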

Reusable Scan Configuration

Use ScanOptions to create reusable scan configurations:

from lockllm import LockLLM, ScanOptions
import os

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

# Create reusable scan configuration
opts = ScanOptions(
    scan_mode="combined",
    scan_action="block",
    policy_action="block",
    abuse_action="allow_with_warning",
    pii_action="strip"
)

# Use the same options for multiple scans
result1 = lockllm.scan(input=prompt1, scan_options=opts)
result2 = lockllm.scan(input=prompt2, scan_options=opts)

# Override individual options when needed (takes precedence over ScanOptions)
result3 = lockllm.scan(input=prompt3, scan_options=opts, sensitivity="high")

Proxy Options

When using wrapper functions, configure proxy behavior with ProxyOptions:

from lockllm import ProxyOptions

options = ProxyOptions(
    scan_mode="combined",         # "normal" | "policy_only" | "combined"
    scan_action="block",          # "block" | "allow_with_warning"
    policy_action="block",        # "block" | "allow_with_warning"
    abuse_action="block",         # "block" | "allow_with_warning" | None (disabled)
    route_action="auto",          # "disabled" | "auto" | "custom"
    sensitivity="medium",         # "low" | "medium" | "high"
    cache_response=True,          # Enable/disable response caching
    cache_ttl=3600,               # Cache TTL in seconds (max 86400)
    chunk=None,                   # Enable/disable chunked scanning
    pii_action="strip"            # "strip" | "block" | "allow_with_warning" | None (disabled)
)

Fields:

  • scan_mode - Which security checks to run (see Scan Modes)
  • scan_action - Action on core injection detection
  • policy_action - Action on policy violations
  • abuse_action - Action on abuse detection (disabled if None)
  • route_action - Smart routing mode (see below)
  • sensitivity - Detection sensitivity level
  • cache_response - Enable response caching to reduce costs and latency (enabled by default, streaming requests are not cached)
  • cache_ttl - Cache time-to-live in seconds, max 86400 (24 hours)
  • chunk - Enable chunked scanning for long inputs
  • pii_action - PII detection behavior (disabled if None). See PII Detection for details and supported entity types

Smart routing: Automatically selects the best AI model based on task type and complexity to optimize cost and quality. Available only in proxy mode.

  • "disabled" - No routing, use the model you specified (default)
  • "auto" - Automatic routing based on task type and complexity analysis
  • "custom" - Use your custom routing rules configured in the dashboard

Note: ProxyOptions is for wrapper functions (create_openai, etc.) and direct proxy usage. For the scan API, use ScanOptions or individual parameters instead.

Custom Endpoints

All providers support custom endpoint URLs for:

  • Self-hosted LLM deployments (OpenAI-compatible APIs)
  • Azure OpenAI resources with custom endpoints
  • Alternative API gateways and reverse proxies
  • Private cloud or air-gapped deployments
  • Development and staging environments

How it works: Configure custom endpoints in the LockLLM dashboard when adding any provider API key. The SDK wrappers automatically use your custom endpoint URL.

# The wrapper automatically uses your custom endpoint
azure = create_azure(api_key=os.getenv("LOCKLLM_API_KEY"))

# Your custom Azure endpoint is configured in the dashboard:
# - Endpoint: https://your-resource.openai.azure.com
# - Deployment: gpt-4
# - API Version: 2024-10-21

Example - Self-hosted model: If you have a self-hosted model with an OpenAI-compatible API, configure it in the dashboard using one of the OpenAI-compatible provider wrappers (e.g., OpenAI, Groq) with your custom endpoint URL.

# Use OpenAI wrapper with custom endpoint configured in dashboard
openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

# Dashboard configuration:
# - Provider: OpenAI
# - Custom Endpoint: https://your-self-hosted-llm.com/v1
# - API Key: your-model-api-key

Link to section: Request OptionsRequest Options

Override configuration per-request:

# Per-request timeout
result = lockllm.scan(
    input=user_prompt,
    sensitivity="high",
    timeout=30.0  # 30 second timeout for this request
)

Link to section: Advanced FeaturesAdvanced Features

Link to section: Streaming ResponsesStreaming Responses

All provider wrappers support streaming:

Synchronous streaming:

openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

stream = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')

Asynchronous streaming:

import asyncio

async def main():
    openai = create_async_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

    stream = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Write a story"}],
        stream=True
    )

    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end='')

asyncio.run(main())

Link to section: Function CallingFunction Calling

OpenAI function calling works seamlessly. This example uses the legacy functions/function_call parameters, which the OpenAI API still accepts:

openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    functions=[{
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }],
    function_call="auto"
)

if response.choices[0].message.function_call:
    function_call = response.choices[0].message.function_call
    import json
    args = json.loads(function_call.arguments)

    # Call your function with the parsed arguments
    weather = get_weather(args['location'], args.get('unit', 'celsius'))

    # Send function result back to LLM
    final_response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": "What's the weather in Boston?"},
            response.choices[0].message,
            {"role": "function", "name": "get_weather", "content": json.dumps(weather)}
        ]
    )

Link to section: Multi-Turn ConversationsMulti-Turn Conversations

Maintain conversation context with message history:

openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

response = openai.chat.completions.create(
    model="gpt-4",
    messages=messages
)

# Add assistant response to history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})

# Continue conversation
messages.append({
    "role": "user",
    "content": "What is its population?"
})

response = openai.chat.completions.create(
    model="gpt-4",
    messages=messages
)

Link to section: Response Metadata (Proxy Mode)Response Metadata (Proxy Mode)

When using wrapper functions or the proxy directly, you can extract detailed scan and routing metadata from response headers:

from lockllm import parse_proxy_metadata, decode_detail_field

# After a proxy request using official SDKs with proxy base URL,
# parse metadata from response headers
metadata = parse_proxy_metadata(dict(response.headers))

# Core info
print(f"Request ID: {metadata.request_id}")
print(f"Scanned: {metadata.scanned}")
print(f"Safe: {metadata.safe}")
print(f"Provider: {metadata.provider}")
print(f"Model: {metadata.model}")
print(f"Sensitivity: {metadata.sensitivity}")

# Check if request was blocked
if metadata.blocked:
    print("Request was blocked by LockLLM")

# Check for scan warnings
if metadata.scan_warning:
    print(f"Injection score: {metadata.scan_warning.injection_score}")
    print(f"Confidence: {metadata.scan_warning.confidence}")
    # Decode detailed scan info
    detail = decode_detail_field(metadata.scan_warning.detail)
    if detail:
        print(f"Scan detail: {detail}")

# Check for policy warnings
if metadata.policy_warnings:
    print(f"Policy violations: {metadata.policy_warnings.count}")
    print(f"Policy confidence: {metadata.policy_warnings.confidence}")
    # Decode detailed policy violation info
    detail = decode_detail_field(metadata.policy_warnings.detail)
    if detail:
        print(f"Policy detail: {detail}")

# Check abuse detection
if metadata.abuse_detected:
    print(f"Abuse confidence: {metadata.abuse_detected.confidence}")
    print(f"Abuse types: {metadata.abuse_detected.types}")

# Check routing info
if metadata.routing:
    print(f"Routed to: {metadata.routing.selected_model}")
    print(f"Original model: {metadata.routing.original_model}")
    print(f"Task type: {metadata.routing.task_type}")
    print(f"Complexity: {metadata.routing.complexity}")
    print(f"Estimated savings: ${metadata.routing.estimated_savings}")

# Check PII detection results
if metadata.pii_detected:
    print(f"PII detected: {metadata.pii_detected.detected}")
    print(f"Entity types: {metadata.pii_detected.entity_types}")
    print(f"Entity count: {metadata.pii_detected.entity_count}")
    print(f"Action taken: {metadata.pii_detected.action}")

# Check credit usage
if metadata.credits_deducted is not None:
    print(f"Credits charged: ${metadata.credits_deducted}")
    print(f"Balance remaining: ${metadata.balance_after}")

# Check cache status
if metadata.cache_status == "HIT":
    print(f"Cache hit - saved {metadata.tokens_saved} tokens")
    print(f"Cost saved: ${metadata.cost_saved}")

Link to section: FastAPI IntegrationFastAPI Integration

Integrate with FastAPI for automatic request scanning:

from fastapi import FastAPI, HTTPException, Depends
from lockllm import AsyncLockLLM, create_async_openai
from pydantic import BaseModel
import os

app = FastAPI()
lockllm = AsyncLockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

class ChatRequest(BaseModel):
    prompt: str

async def scan_prompt(request: ChatRequest):
    """Dependency to scan prompts"""
    result = await lockllm.scan(
        input=request.prompt,
        sensitivity="medium"
    )

    if not result.safe:
        raise HTTPException(
            status_code=400,
            detail={
                "error": "Malicious input detected",
                "injection": result.injection,
                "confidence": result.confidence,
                "request_id": result.request_id
            }
        )

    return result

@app.post("/chat")
async def chat(
    request: ChatRequest,
    scan_result = Depends(scan_prompt)
):
    # Request already scanned by dependency
    openai = create_async_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": request.prompt}]
    )

    return {
        "response": response.choices[0].message.content,
        "scan_result": {
            "safe": scan_result.safe,
            "request_id": scan_result.request_id
        }
    }

Link to section: Django IntegrationDjango Integration

Integrate with Django middleware:

# middleware.py
from lockllm import LockLLM
import os
import json

class LockLLMMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
        self.lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

    def __call__(self, request):
        if request.method == 'POST' and request.content_type == 'application/json':
            try:
                body = json.loads(request.body)
                if 'prompt' in body:
                    result = self.lockllm.scan(
                        input=body['prompt'],
                        sensitivity="medium"
                    )

                    if not result.safe:
                        from django.http import JsonResponse
                        return JsonResponse({
                            'error': 'Malicious input detected',
                            'details': {
                                'injection': result.injection,
                                'confidence': result.confidence,
                                'request_id': result.request_id
                            }
                        }, status=400)

                    # Attach scan result to request
                    request.scan_result = result
            except Exception:
                # Fail open on malformed bodies or scan errors
                pass

        response = self.get_response(request)
        return response

# settings.py
MIDDLEWARE = [
    # ... other middleware
    'yourapp.middleware.LockLLMMiddleware',
]

Link to section: Flask IntegrationFlask Integration

from flask import Flask, request, jsonify
from lockllm import LockLLM, create_openai
import os

app = Flask(__name__)
lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

@app.before_request
def scan_request():
    if request.method == 'POST' and request.is_json:
        data = request.get_json()
        if 'prompt' in data:
            result = lockllm.scan(
                input=data['prompt'],
                sensitivity="medium"
            )

            if not result.safe:
                return jsonify({
                    'error': 'Malicious input detected',
                    'details': {
                        'injection': result.injection,
                        'confidence': result.confidence,
                        'request_id': result.request_id
                    }
                }), 400

            # Store scan result for the request
            request.scan_result = result

@app.route('/chat', methods=['POST'])
def chat():
    # Request already scanned by before_request
    data = request.get_json()

    openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": data['prompt']}]
    )

    return jsonify({
        'response': response.choices[0].message.content
    })

if __name__ == '__main__':
    app.run()

Link to section: Batch ProcessingBatch Processing

Process multiple requests concurrently with asyncio:

import asyncio
from lockllm import create_async_openai
import os

async def process_prompt(openai, prompt):
    """Process a single prompt"""
    try:
        response = await openai.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

async def batch_process(prompts):
    """Process multiple prompts concurrently"""
    openai = create_async_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

    # Process all prompts concurrently
    tasks = [process_prompt(openai, prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)

    return results

# Usage
prompts = [
    "What is AI?",
    "Explain machine learning",
    "What is deep learning?"
]

results = asyncio.run(batch_process(prompts))
for i, result in enumerate(results):
    print(f"Prompt {i+1}: {result}\n")

Link to section: Context Manager SupportContext Manager Support

Use context managers for automatic resource cleanup:

Synchronous:

from lockllm import LockLLM
import os

# Context manager ensures proper cleanup
with LockLLM(api_key=os.getenv("LOCKLLM_API_KEY")) as client:
    result = client.scan(input="test prompt")
    print(f"Safe: {result.safe}")

Asynchronous:

from lockllm import AsyncLockLLM
import os
import asyncio

async def main():
    # Async context manager
    async with AsyncLockLLM(api_key=os.getenv("LOCKLLM_API_KEY")) as client:
        result = await client.scan(input="test prompt")
        print(f"Safe: {result.safe}")

asyncio.run(main())

Link to section: Custom Policy EnforcementCustom Policy Enforcement

LockLLM lets you upload custom content policies through the dashboard and enforce them at runtime via policy_action. This allows you to block or flag responses that violate your application-specific rules in addition to the built-in safety checks.

Scan API - enforce custom policies with scan mode:

Synchronous:

from lockllm import LockLLM, PolicyViolationError
import os

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

try:
    result = lockllm.scan(
        input=user_prompt,
        scan_mode="combined",      # Run both injection and policy checks
        scan_action="block",       # Block detected injection attempts
        policy_action="block",     # Block custom policy violations
        sensitivity="medium"
    )

    if result.safe:
        # No issues found - safe to proceed
        response = your_llm_call(user_prompt)
    else:
        # With scan_action="allow_with_warning", unsafe input reaches this
        # branch instead of raising an exception
        print("Warning: potential threat detected")

except PolicyViolationError as e:
    # Raised when policy_action="block" and a violation is detected
    print(f"Custom policy violated: {e.message}")
    if e.violated_policies:
        for policy in e.violated_policies:
            print(f"  Policy: {policy.get('policy_name')}")
            for category in policy.get('violated_categories', []):
                print(f"    Category: {category.get('name')}")
    # Respond with an error payload, e.g. {"error": "Request blocked by content policy"}

Asynchronous:

from lockllm import AsyncLockLLM, PolicyViolationError
import os
import asyncio

async def main():
    lockllm = AsyncLockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

    try:
        result = await lockllm.scan(
            input=user_prompt,
            scan_mode="combined",
            scan_action="block",
            policy_action="block",
            sensitivity="medium"
        )
        response = await your_llm_call(user_prompt)

    except PolicyViolationError as e:
        print(f"Policy violation: {e.message}")

asyncio.run(main())

Proxy mode - enforce policies on all requests transparently:

from lockllm import create_openai, ProxyOptions
import os

# All requests through this client are checked against your custom policies
openai = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(
        scan_mode="combined",
        scan_action="block",
        policy_action="block",
        sensitivity="medium"
    )
)

# PolicyViolationError raised automatically if a violation is detected
from lockllm import PolicyViolationError

try:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )
    print(response.choices[0].message.content)

except PolicyViolationError as e:
    print(f"Blocked by policy: {e.message}")

You can also use scan_mode="policy_only" if you only want to check custom policies without running the core injection scan.

Link to section: Smart RoutingSmart Routing

Smart routing automatically selects the optimal AI model for each request based on the detected task type and prompt complexity. This helps reduce costs by routing simple tasks to lighter models while ensuring complex tasks use more capable ones.

Routing is only available in proxy mode and is configured via route_action in ProxyOptions.

Auto routing (recommended):

Synchronous:

from lockllm import create_openai, ProxyOptions
import os

openai = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(
        route_action="auto",    # Automatic task-based model selection
        scan_action="block"
    )
)

response = openai.chat.completions.create(
    model="gpt-4",  # Starting point; router may select a different model
    messages=[{"role": "user", "content": user_input}]
)

print(response.choices[0].message.content)

Asynchronous:

from lockllm import create_async_openai, ProxyOptions
import os
import asyncio

async def main():
    openai = create_async_openai(
        api_key=os.getenv("LOCKLLM_API_KEY"),
        proxy_options=ProxyOptions(route_action="auto")
    )

    response = await openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )

    print(response.choices[0].message.content)

asyncio.run(main())

Custom routing rules:

Set route_action="custom" to apply your own routing rules configured in the dashboard. Custom rules let you map specific task types and complexity tiers to models of your choosing. If no matching rule is found, the router falls back to auto mode.

from lockllm import create_openai, ProxyOptions
import os

openai = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(route_action="custom")
)
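The rule semantics can be pictured as a lookup keyed by task type and complexity tier, with the auto router as the fallback. This is illustrative only; the actual rules (and the models below) live in your dashboard configuration:

```python
# Hypothetical custom routing table: (task type, complexity tier) -> model.
# Real rules are configured in the LockLLM dashboard, not in code.
RULES = {
    ("Code Generation", "high"): "gpt-4",
    ("Summarization", "low"): "gpt-4o-mini",
}

def select_model(task_type: str, tier: str, auto_choice: str) -> str:
    """Apply a matching rule, else fall back to the auto router's choice."""
    return RULES.get((task_type, tier), auto_choice)
```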

Accessing routing metadata from the scan API:

When you use scan() directly, the ScanResponse.routing field contains routing metadata if routing was applied:

from lockllm import LockLLM
import os

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

result = lockllm.scan(
    input=user_prompt,
    sensitivity="medium"
)

if result.routing and result.routing.enabled:
    print(f"Task type: {result.routing.task_type}")
    print(f"Complexity: {result.routing.complexity:.2f}")
    print(f"Selected model: {result.routing.selected_model}")
    print(f"Routing reason: {result.routing.reasoning}")
    print(f"Estimated cost: ${result.routing.estimated_cost}")

For proxy mode, use parse_proxy_metadata() to read routing information from response headers (see Response Metadata).

Link to section: Response CachingResponse Caching

Response caching is enabled by default in proxy mode. When an identical request is received within the cache window, LockLLM returns the cached response directly, saving tokens and reducing cost. Streaming requests are never cached.
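Conceptually, an "identical request" is one whose serialized body matches a prior request. The sketch below shows one way such a cache key could be derived; it is illustrative only and not LockLLM's actual keying scheme:

```python
import hashlib
import json

def cache_key(body: dict) -> str:
    """Derive a deterministic key from the canonicalized request body."""
    canonical = json.dumps(body, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

a = cache_key({"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]})
b = cache_key({"messages": [{"role": "user", "content": "Hi"}], "model": "gpt-4"})
print(a == b)  # True: field order does not affect the key
```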

Default behavior (caching on):

from lockllm import create_openai
import os

# Caching is enabled by default - no configuration needed
openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)

Custom cache TTL:

from lockllm import create_openai, ProxyOptions
import os

openai = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(
        cache_response=True,
        cache_ttl=1800  # Cache responses for 30 minutes (max: 86400 seconds)
    )
)

Disable caching:

from lockllm import create_openai, ProxyOptions
import os

openai = create_openai(
    api_key=os.getenv("LOCKLLM_API_KEY"),
    proxy_options=ProxyOptions(cache_response=False)
)

Checking cache status from response headers:

from lockllm import create_openai, parse_proxy_metadata
import os

openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)

metadata = parse_proxy_metadata(dict(response.headers))

if metadata.cache_status == "HIT":
    print(f"Cache hit - response served from cache")
    print(f"Cache age: {metadata.cache_age}s")
    print(f"Tokens saved: {metadata.tokens_saved}")
    print(f"Cost saved: ${metadata.cost_saved:.6f}")
else:
    print("Cache miss - response generated fresh")

Link to section: Error HandlingError Handling

LockLLM provides typed exceptions for comprehensive error handling:

Link to section: Error TypesError Types

from lockllm import (
    LockLLMError,             # Base error class
    AuthenticationError,      # 401 - Invalid API key
    RateLimitError,           # 429 - Rate limit exceeded
    PromptInjectionError,     # 400 - Malicious input detected
    PolicyViolationError,     # 403 - Custom policy violation
    AbuseDetectedError,       # 400 - Abuse pattern detected
    PIIDetectedError,         # 403 - PII detected (when pii_action is "block")
    InsufficientCreditsError, # 402 - Insufficient credits
    UpstreamError,            # 502 - Provider API error
    ConfigurationError,       # 400 - Invalid configuration
    NetworkError              # 0 - Network/connection error
)

Link to section: Complete Error HandlingComplete Error Handling

from lockllm import create_openai
from lockllm import (
    PromptInjectionError,
    PolicyViolationError,
    AbuseDetectedError,
    PIIDetectedError,
    InsufficientCreditsError,
    AuthenticationError,
    RateLimitError,
    UpstreamError,
    ConfigurationError,
    NetworkError
)
import os

openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

try:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )

    print(response.choices[0].message.content)

except PromptInjectionError as error:
    # Security threat detected
    print("Malicious input detected!")
    print(f"Injection score: {error.scan_result.injection}%")
    print(f"Confidence: {error.scan_result.confidence}%")
    print(f"Request ID: {error.request_id}")

    # Log security incident
    import logging
    logging.warning(f"Prompt injection blocked: {error.request_id}")

    # Return user-friendly error
    # Respond with a user-friendly error, e.g.:
    # {"error": "Your input could not be processed for security reasons."}

except PolicyViolationError as error:
    # Custom content policy violation
    print("Content policy violation detected!")
    print(f"Request ID: {error.request_id}")
    if error.violated_policies:
        for policy in error.violated_policies:
            print(f"  Policy: {policy.get('policy_name')}")

except AbuseDetectedError as error:
    # Abuse pattern detected (bot content, repetition, resource exhaustion)
    print("Abuse detected!")
    print(f"Request ID: {error.request_id}")
    if error.abuse_details:
        print(f"  Confidence: {error.abuse_details.get('confidence')}%")

except PIIDetectedError as error:
    # Personally identifiable information detected (when pii_action is "block")
    print("PII detected in input!")
    print(f"Request ID: {error.request_id}")
    if error.entity_types:
        print(f"  Entity types: {', '.join(error.entity_types)}")
    print(f"  Entity count: {error.entity_count}")

    # Return user-friendly error
    # Respond with a user-friendly error, e.g.:
    # {"error": "Your input contains personal information that cannot be processed."}

except InsufficientCreditsError as error:
    # Not enough credits for the request
    print("Insufficient credits!")
    if error.current_balance is not None:
        print(f"  Current balance: ${error.current_balance}")
    if error.estimated_cost is not None:
        print(f"  Estimated cost: ${error.estimated_cost}")
    print("Top up at: https://www.lockllm.com/billing")

except AuthenticationError as error:
    print("Invalid API key")
    # Check your LOCKLLM_API_KEY environment variable

except RateLimitError as error:
    print("Rate limit exceeded")
    print(f"Retry after (ms): {error.retry_after}")

    # Wait and retry
    if error.retry_after:
        import time
        time.sleep(error.retry_after / 1000)
        # Retry request...

except UpstreamError as error:
    print("Provider API error")
    print(f"Provider: {error.provider}")
    print(f"Status: {error.upstream_status}")
    print(f"Message: {error.message}")

    # Handle provider-specific errors
    if error.provider == 'openai' and error.upstream_status == 429:
        # OpenAI rate limit
        pass

except ConfigurationError as error:
    print(f"Configuration error: {error.message}")
    # Check provider key is added in dashboard

except NetworkError as error:
    print(f"Network error: {error.message}")
    # Check internet connection, firewall, etc.

Link to section: Exponential BackoffExponential Backoff

Implement exponential backoff for transient errors:

import os
import time
from lockllm import create_openai, RateLimitError, NetworkError

def call_with_backoff(fn, max_retries=5):
    """Call function with exponential backoff"""
    for attempt in range(max_retries):
        try:
            return fn()
        except (RateLimitError, NetworkError) as error:
            if attempt == max_retries - 1:
                raise

            delay = min(1.0 * (2 ** attempt), 30.0)
            print(f"Retry attempt {attempt + 1} after {delay}s")
            time.sleep(delay)

    raise Exception('Max retries exceeded')

# Usage
openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))

response = call_with_backoff(lambda: openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
))
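The delay formula above doubles on each attempt and caps at 30 seconds:

```python
# Delay before each retry, shown for six attempts to illustrate the cap
delays = [min(1.0 * (2 ** attempt), 30.0) for attempt in range(6)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```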

Link to section: Type HintsType Hints

Full type hint support with mypy:

Link to section: Type AnnotationsType Annotations

from lockllm import (
    LockLLM,
    LockLLMConfig,
    RequestOptions,
    ScanRequest,
    ScanResponse,
    ScanResult,
    ScanOptions,
    ScanMode,
    ScanAction,
    RouteAction,
    PIIAction,
    PIIResult,
    ProxyOptions,
    ProxyResponseMetadata,
    ProxyScanWarning,
    ProxyPolicyWarnings,
    ProxyAbuseDetected,
    ProxyPIIDetected,
    ProxyRoutingMetadata,
    Sensitivity,
    ProviderName,
    TaskType,
    ComplexityTier,
    Usage,
    Debug,
)
from typing import Optional

# Configuration types
def create_client(api_key: str, base_url: Optional[str] = None) -> LockLLM:
    return LockLLM(api_key=api_key, base_url=base_url)

# Scan function with type hints
def scan_prompt(text: str, level: Sensitivity = "medium") -> ScanResponse:
    lockllm: LockLLM = LockLLM(api_key="...")
    result: ScanResponse = lockllm.scan(input=text, sensitivity=level)
    return result

# Response types
response: ScanResponse = scan_prompt("test")
is_safe: bool = response.safe
score: float = response.injection
request_id: str = response.request_id

Link to section: mypy Supportmypy Support

The SDK includes a py.typed marker for mypy:

# Run mypy type checking
mypy your_code.py

# Example output:
# your_code.py:10: error: Argument "sensitivity" has incompatible type "str"; expected "Literal['low', 'medium', 'high']"

Link to section: IDE AutocompleteIDE Autocomplete

Full IDE autocomplete with type stubs:

from lockllm import LockLLM

lockllm = LockLLM(api_key="...")

# IDE suggests: scan(...)
lockllm.s  # <-- autocomplete suggestions

# IDE suggests: input, sensitivity, scan_mode, scan_action, policy_action, ...
lockllm.scan(i  # <-- autocomplete for parameters

# IDE suggests: "low", "medium", "high"
lockllm.scan(input="test", sensitivity="m  # <-- autocomplete for values

Link to section: API ReferenceAPI Reference

Link to section: Type AliasesType Aliases

from lockllm import (
    Sensitivity,      # Literal["low", "medium", "high"]
    ScanMode,         # Literal["normal", "policy_only", "combined"]
    ScanAction,       # Literal["block", "allow_with_warning"]
    RouteAction,      # Literal["disabled", "auto", "custom"]
    PIIAction,        # Literal["strip", "block", "allow_with_warning"]
    ProviderName,     # Literal["openai", "anthropic", "gemini", "cohere", ...]
    TaskType,         # Literal["Open QA", "Closed QA", "Summarization", ...]
    ComplexityTier,   # Literal["low", "medium", "high"]
)

Sensitivity - Detection sensitivity threshold level

Sensitivity = Literal["low", "medium", "high"]

ScanMode - Which security checks to perform

ScanMode = Literal["normal", "policy_only", "combined"]

ScanAction - Behavior when threats or violations are detected

ScanAction = Literal["block", "allow_with_warning"]

RouteAction - Smart routing mode

RouteAction = Literal["disabled", "auto", "custom"]

PIIAction - PII detection behavior

PIIAction = Literal["strip", "block", "allow_with_warning"]

ProviderName - All supported AI provider identifiers

ProviderName = Literal[
    "openai", "anthropic", "gemini", "cohere", "openrouter",
    "perplexity", "mistral", "groq", "deepseek", "together",
    "xai", "fireworks", "anyscale", "huggingface", "azure",
    "bedrock", "vertex-ai",
]

TaskType - Supported task types for smart routing

TaskType = Literal[
    "Open QA", "Closed QA", "Summarization", "Text Generation",
    "Code Generation", "Chatbot", "Classification", "Rewrite",
    "Brainstorming", "Extraction", "Other",
]

ComplexityTier - Complexity tiers for routing

ComplexityTier = Literal["low", "medium", "high"]
# Mapped from complexity score: low (0-0.4), medium (0.4-0.7), high (0.7-1.0)
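The score-to-tier mapping in the comment can be sketched as a small helper (a hypothetical utility for illustration, not part of the SDK):

```python
def complexity_tier(score: float) -> str:
    """Map a 0-1 complexity score to a tier using the documented cutoffs."""
    if score < 0.4:
        return "low"
    if score < 0.7:
        return "medium"
    return "high"
```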

Link to section: ConstantsConstants

from lockllm import PROVIDER_BASE_URLS, UNIVERSAL_PROXY_URL

# Provider-specific proxy URLs (dict mapping ProviderName -> URL)
# e.g., PROVIDER_BASE_URLS["openai"] -> "https://api.lockllm.com/v1/proxy/openai"

# Universal proxy URL for non-BYOK users
# UNIVERSAL_PROXY_URL -> "https://api.lockllm.com/v1/proxy"

Link to section: ClassesClasses

Link to section: LockLLMLockLLM

Synchronous client class for scanning prompts.

class LockLLM:
    def __init__(
        self,
        api_key: str,
        base_url: Optional[str] = None,
        timeout: Optional[float] = None,
        max_retries: Optional[int] = None
    )

    def scan(
        self,
        input: str,
        sensitivity: Literal["low", "medium", "high"] = "medium",
        scan_mode: Optional[Literal["normal", "policy_only", "combined"]] = None,
        scan_action: Optional[Literal["block", "allow_with_warning"]] = None,
        policy_action: Optional[Literal["block", "allow_with_warning"]] = None,
        abuse_action: Optional[Literal["block", "allow_with_warning"]] = None,
        pii_action: Optional[Literal["strip", "block", "allow_with_warning"]] = None,
        compression: Optional[Literal["toon", "compact", "combined"]] = None,
        compression_rate: Optional[float] = None,
        chunk: Optional[bool] = None,
        scan_options: Optional[ScanOptions] = None,
        **options
    ) -> ScanResponse

    @property
    def config(self) -> LockLLMConfig

    def close(self) -> None

Link to section: AsyncLockLLMAsyncLockLLM

Asynchronous client class for scanning prompts.

class AsyncLockLLM:
    def __init__(
        self,
        api_key: str,
        base_url: Optional[str] = None,
        timeout: Optional[float] = None,
        max_retries: Optional[int] = None
    )

    async def scan(
        self,
        input: str,
        sensitivity: Literal["low", "medium", "high"] = "medium",
        scan_mode: Optional[Literal["normal", "policy_only", "combined"]] = None,
        scan_action: Optional[Literal["block", "allow_with_warning"]] = None,
        policy_action: Optional[Literal["block", "allow_with_warning"]] = None,
        abuse_action: Optional[Literal["block", "allow_with_warning"]] = None,
        pii_action: Optional[Literal["strip", "block", "allow_with_warning"]] = None,
        compression: Optional[Literal["toon", "compact", "combined"]] = None,
        compression_rate: Optional[float] = None,
        chunk: Optional[bool] = None,
        scan_options: Optional[ScanOptions] = None,
        **options
    ) -> ScanResponse

    @property
    def config(self) -> LockLLMConfig

    async def close(self) -> None

Link to section: Data ClassesData Classes

Link to section: LockLLMConfigLockLLMConfig

@dataclass
class LockLLMConfig:
    api_key: str              # Your LockLLM API key (required)
    base_url: str             # API endpoint (default: "https://api.lockllm.com")
    timeout: float            # Request timeout in seconds (default: 60.0)
    max_retries: int          # Max retry attempts (default: 3)

Link to section: RequestOptionsRequestOptions

Per-request configuration overrides:

@dataclass
class RequestOptions:
    headers: Optional[Dict[str, str]] = None  # Additional HTTP headers to include
    timeout: Optional[float] = None           # Override default timeout for this request

ScanResult

Base class for scan responses, containing the core detection fields shared by ScanResponse and PromptInjectionError.scan_result:

@dataclass
class ScanResult:
    safe: bool                    # True if safe, False if malicious
    label: Literal[0, 1]          # 0=safe, 1=malicious
    confidence: Optional[float]   # Confidence score 0-100 (None in policy_only mode)
    injection: Optional[float]    # Injection risk score 0-100 (None in policy_only mode)
    sensitivity: str              # Sensitivity level used for this scan

ScanResponse

@dataclass
class ScanResponse(ScanResult):
    safe: bool                                       # True if safe, False if malicious
    label: Literal[0, 1]                             # 0=safe, 1=malicious
    confidence: Optional[float]                      # Confidence score 0-100
    injection: Optional[float]                       # Injection risk score 0-100
    sensitivity: str                                 # Sensitivity level used
    request_id: str                                  # Unique request identifier
    usage: Usage                                     # Usage statistics
    debug: Optional[Debug]                           # Debug info (optional)
    policy_confidence: Optional[float]               # Policy check confidence (0-100)
    policy_warnings: Optional[List[PolicyViolation]] # Policy violations (if any)
    scan_warning: Optional[ScanWarning]              # Injection warning details
    abuse_warnings: Optional[AbuseWarning]           # Abuse detection results
    routing: Optional[RoutingInfo]                   # Routing metadata
    pii_result: Optional[PIIResult]                  # PII detection result (when enabled)

ScanRequest

A single scan input with an optional sensitivity override. Used when constructing scan payloads programmatically:

@dataclass
class ScanRequest:
    input: str                                    # The text prompt to scan (required)
    sensitivity: Sensitivity = "medium"           # Detection sensitivity level

ScanOptions

Reusable scan configuration for the scan API:

@dataclass
class ScanOptions:
    scan_mode: Optional[ScanMode] = None        # "normal" | "policy_only" | "combined"
    scan_action: Optional[ScanAction] = None    # "block" | "allow_with_warning"
    policy_action: Optional[ScanAction] = None  # "block" | "allow_with_warning"
    abuse_action: Optional[ScanAction] = None   # "block" | "allow_with_warning"
    chunk: Optional[bool] = None                # Enable/disable chunked scanning
    pii_action: Optional[PIIAction] = None      # "strip" | "block" | "allow_with_warning"
    compression: Optional[CompressionAction] = None  # "toon" | "compact" | "combined"
    compression_rate: Optional[float] = None         # 0.3-0.7 (compact/combined only)

ProxyOptions

Configuration for wrapper functions and proxy requests:

@dataclass
class ProxyOptions:
    scan_mode: Optional[str] = None         # "normal" | "policy_only" | "combined"
    scan_action: Optional[str] = None       # "block" | "allow_with_warning"
    policy_action: Optional[str] = None     # "block" | "allow_with_warning"
    abuse_action: Optional[str] = None      # "block" | "allow_with_warning"
    route_action: Optional[str] = None      # "disabled" | "auto" | "custom"
    sensitivity: Optional[str] = None       # "low" | "medium" | "high"
    cache_response: Optional[bool] = None   # Enable/disable response caching
    cache_ttl: Optional[int] = None         # Cache TTL in seconds (max 86400)
    chunk: Optional[bool] = None            # Enable/disable chunked scanning
    pii_action: Optional[str] = None        # "strip" | "block" | "allow_with_warning"
    compression: Optional[str] = None       # "toon" | "compact" | "combined"
    compression_rate: Optional[float] = None  # 0.3-0.7 (compact/combined only)

Usage

@dataclass
class Usage:
    requests: int      # Number of upstream inference requests used
    input_chars: int   # Number of characters in the input

Debug

Debug information (available on certain plans):

@dataclass
class Debug:
    duration_ms: int                       # Total processing time in milliseconds
    inference_ms: int                      # ML inference time in milliseconds
    mode: Literal["single", "chunked"]     # Processing mode used

PolicyViolation

@dataclass
class PolicyViolation:
    policy_name: str                             # Name of the violated policy
    violated_categories: List[ViolatedCategory]  # List of violated categories
    violation_details: Optional[str]             # Text that triggered the violation

ViolatedCategory

@dataclass
class ViolatedCategory:
    name: str                      # Category name
    description: Optional[str]     # Category description

ScanWarning

@dataclass
class ScanWarning:
    message: str            # Warning description
    injection_score: float  # Injection risk score (0-100)
    confidence: float       # Detection confidence (0-100)
    label: int              # 0=safe, 1=unsafe

AbuseWarning

@dataclass
class AbuseWarning:
    detected: bool                 # Whether abuse was detected
    confidence: float              # Abuse confidence (0-100)
    abuse_types: List[str]         # Types of abuse detected
    indicators: Dict[str, float]   # Scores per abuse category
    recommendation: Optional[str]  # Suggested action

RoutingInfo

@dataclass
class RoutingInfo:
    enabled: bool                      # Whether routing was applied
    task_type: str                     # Detected task classification
    complexity: float                  # Complexity score (0-1)
    selected_model: Optional[str]      # Model chosen by router
    reasoning: Optional[str]           # Why this model was selected
    estimated_cost: Optional[float]    # Estimated cost

PIIResult

@dataclass
class PIIResult:
    detected: bool                       # Whether PII was detected
    entity_types: List[str]              # Types of PII entities found
    entity_count: int                    # Number of PII entities found
    redacted_input: Optional[str]        # Redacted text (only when pii_action is "strip")

CompressionResult

@dataclass
class CompressionResult:
    method: str                    # "toon", "compact", or "combined"
    compressed_input: str          # The compressed text
    original_length: int           # Original text length in characters
    compressed_length: int         # Compressed text length in characters
    compression_ratio: float       # Ratio of compressed/original (lower = better)
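
As an arithmetic check on the ratio semantics (the lengths below are illustrative numbers, not SDK output):

```python
# compression_ratio is compressed_length / original_length (lower = better)
original_length = 1200
compressed_length = 480
compression_ratio = compressed_length / original_length  # 0.4, i.e. 60% smaller
```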

ProxyScanWarning

Scan warning metadata parsed from proxy response headers:

@dataclass
class ProxyScanWarning:
    injection_score: float   # Injection score from scan (0-100)
    confidence: float        # Confidence level of the detection (0-100)
    detail: str              # Base64-encoded JSON with detailed scan info

Use decode_detail_field(warning.detail) to decode the detail field into a Python dict.
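
The detail payload is base64-encoded UTF-8 JSON, so decoding it by hand looks roughly like the stdlib sketch below (decode_detail_field is the supported way; this only illustrates the wire format):

```python
import base64
import json

def decode_detail(detail: str):
    """Illustrative equivalent of decode_detail_field: base64 -> UTF-8 -> JSON."""
    try:
        return json.loads(base64.b64decode(detail).decode("utf-8"))
    except (ValueError, UnicodeDecodeError):
        return None  # malformed or non-JSON payload

# Round-trip a sample payload the way the proxy would encode it
encoded = base64.b64encode(json.dumps({"injection_score": 87.5}).encode()).decode()
decoded = decode_detail(encoded)  # {'injection_score': 87.5}
```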

ProxyPolicyWarnings

Policy warning metadata parsed from proxy response headers:

@dataclass
class ProxyPolicyWarnings:
    count: int               # Number of policy violations detected
    confidence: float        # Confidence level of the detection (0-100)
    detail: str              # Base64-encoded JSON with violation details

Use decode_detail_field(warnings.detail) to decode the detail field into a Python dict.

ProxyAbuseDetected

Abuse detection metadata parsed from proxy response headers:

@dataclass
class ProxyAbuseDetected:
    confidence: float        # Confidence level of abuse detection (0-100)
    types: str               # Comma-separated abuse types detected
    detail: str              # Base64-encoded JSON with abuse details

Use decode_detail_field(abuse.detail) to decode the detail field into a Python dict.

ProxyPIIDetected

PII detection metadata parsed from proxy response headers:

@dataclass
class ProxyPIIDetected:
    detected: bool           # Whether PII was detected
    entity_types: str        # Comma-separated PII entity types (e.g., "Email,Phone Number")
    entity_count: int        # Number of PII entities found
    action: str              # PII action taken ("strip", "block", "allow_with_warning")

ProxyCompressionMetadata

Compression metadata parsed from proxy response headers:

@dataclass
class ProxyCompressionMetadata:
    method: str                    # "toon", "compact", or "combined"
    applied: bool                  # Whether compression was actually applied
    ratio: Optional[float]        # Compression ratio (only when applied)

ProxyRoutingMetadata

Detailed routing metadata parsed from proxy response headers. Contains more information than RoutingInfo (which is returned from the scan API):

@dataclass
class ProxyRoutingMetadata:
    enabled: bool                                  # Whether routing was applied
    task_type: str                                 # Detected task classification
    complexity: float                              # Prompt complexity score (0-1)
    selected_model: str                            # Model chosen by router
    routing_reason: str                            # Explanation for model selection
    original_provider: str                         # Original provider requested
    original_model: str                            # Original model requested
    estimated_savings: float                       # Estimated cost savings
    estimated_original_cost: Optional[float]       # Estimated cost with original model
    estimated_routed_cost: Optional[float]         # Estimated cost with routed model
    estimated_input_tokens: Optional[int]          # Estimated input tokens for routing
    estimated_output_tokens: Optional[int]         # Estimated output tokens for routing
    routing_fee_reason: Optional[str]              # Reason for routing fee or waiver

ProxyResponseMetadata

Comprehensive metadata extracted from proxy response headers. Use parse_proxy_metadata() to parse response headers into this typed object:

@dataclass
class ProxyResponseMetadata:
    # Core
    request_id: str                                    # Unique request identifier
    scanned: bool                                      # Whether request was scanned
    safe: bool                                         # Whether request was safe
    scan_mode: str                                     # Scan mode used
    credits_mode: str                                  # "lockllm_credits" | "byok"
    provider: str                                      # AI provider name
    model: Optional[str]                               # Model identifier
    sensitivity: Optional[str]                         # Detection sensitivity level used
    label: Optional[int]                               # Binary classification (0=safe, 1=unsafe)
    policy_confidence: Optional[float]                 # Policy check confidence (0-100)
    blocked: Optional[bool]                            # Whether the request was blocked

    # Warnings
    scan_warning: Optional[ProxyScanWarning]           # Injection warning details
    policy_warnings: Optional[ProxyPolicyWarnings]     # Policy violation details
    abuse_detected: Optional[ProxyAbuseDetected]       # Abuse detection details
    pii_detected: Optional[ProxyPIIDetected]           # PII detection details (when enabled)

    # Routing
    routing: Optional[ProxyRoutingMetadata]            # Routing metadata

    # Credits
    credits_reserved: Optional[float]                  # Credits reserved for request
    routing_fee_reserved: Optional[float]              # Routing fee reserved
    routing_fee_reason: Optional[str]                  # Reason for routing fee or waiver
    credits_deducted: Optional[float]                  # Credits actually deducted
    balance_after: Optional[float]                     # Balance after request

    # Cost estimates
    estimated_original_cost: Optional[float]           # Estimated cost with original model
    estimated_routed_cost: Optional[float]             # Estimated cost with routed model
    estimated_input_tokens: Optional[int]              # Estimated input tokens
    estimated_output_tokens: Optional[int]             # Estimated output tokens

    # Cache
    cache_status: Optional[str]                        # "HIT" | "MISS"
    cache_age: Optional[int]                           # Cache age in seconds
    tokens_saved: Optional[int]                        # Tokens saved by cache hit
    cost_saved: Optional[float]                        # Cost saved by cache hit

    # Decoded detail fields
    scan_detail: Optional[Any]                         # Decoded scan detail from header
    policy_detail: Optional[Any]                       # Decoded policy warning detail from header
    abuse_detail: Optional[Any]                        # Decoded abuse detail from header

Functions

Wrapper Functions (Sync)

def create_client(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_openai_compatible(api_key: str, base_url: str, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_openai(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_anthropic(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> Anthropic
def create_groq(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_deepseek(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_perplexity(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_mistral(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_openrouter(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_together(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_xai(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_fireworks(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_anyscale(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_huggingface(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_gemini(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_cohere(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_azure(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_bedrock(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI
def create_vertex_ai(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> OpenAI

Wrapper Functions (Async)

def create_async_client(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_openai_compatible(api_key: str, base_url: str, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_openai(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_anthropic(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncAnthropic
def create_async_groq(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_deepseek(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_perplexity(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_mistral(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_openrouter(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_together(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_xai(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_fireworks(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_anyscale(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_huggingface(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_gemini(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_cohere(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_azure(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_bedrock(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI
def create_async_vertex_ai(api_key: str, base_url: Optional[str] = None, proxy_options: Optional[ProxyOptions] = None, **kwargs) -> AsyncOpenAI

Utility Functions

def get_proxy_url(provider: ProviderName) -> str
    # Get proxy URL for a specific provider
    # Example: get_proxy_url('openai') -> 'https://api.lockllm.com/v1/proxy/openai'

def get_all_proxy_urls() -> Dict[ProviderName, str]
    # Get all provider proxy URLs as a dict

def get_universal_proxy_url() -> str
    # Get universal proxy URL for non-BYOK users
    # Returns: 'https://api.lockllm.com/v1/proxy'

def build_lockllm_headers(options: ProxyOptions) -> Dict[str, str]
    # Convert ProxyOptions to X-LockLLM-* HTTP headers
    # Useful when making raw HTTP requests to the proxy

def parse_proxy_metadata(headers: Dict[str, str]) -> ProxyResponseMetadata
    # Parse response headers into typed metadata object

def decode_detail_field(detail: str) -> Optional[Any]
    # Decode base64-encoded detail fields from proxy response headers
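
For raw HTTP calls, build_lockllm_headers turns each set ProxyOptions field into one X-LockLLM-* request header. The sketch below illustrates the idea with a trimmed, local stand-in dataclass; the exact header names shown are assumptions for illustration only — use build_lockllm_headers for the real mapping:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ProxyOpts:
    """Local stand-in for lockllm.ProxyOptions, trimmed to three fields."""
    scan_mode: Optional[str] = None
    cache_response: Optional[bool] = None
    cache_ttl: Optional[int] = None

def to_lockllm_headers(opts: ProxyOpts) -> dict:
    # Hypothetical scheme: snake_case field -> "X-LockLLM-Kebab-Case" header,
    # booleans lowercased, everything else stringified; unset fields omitted.
    headers = {}
    for f in fields(opts):
        value = getattr(opts, f.name)
        if value is None:
            continue
        name = "X-LockLLM-" + "-".join(part.capitalize() for part in f.name.split("_"))
        headers[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return headers

to_lockllm_headers(ProxyOpts(scan_mode="combined", cache_response=True, cache_ttl=3600))
# → {'X-LockLLM-Scan-Mode': 'combined', 'X-LockLLM-Cache-Response': 'true',
#    'X-LockLLM-Cache-Ttl': '3600'}
```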

Exception Classes

class LockLLMError(Exception):
    message: str
    type: str
    code: Optional[str]
    status: Optional[int]
    request_id: Optional[str]

class AuthenticationError(LockLLMError): pass       # 401
class RateLimitError(LockLLMError):                  # 429
    retry_after: Optional[int]                       # milliseconds
class PromptInjectionError(LockLLMError):            # 400
    scan_result: ScanResult
class PolicyViolationError(LockLLMError):            # 403
    violated_policies: Optional[List[Dict[str, Any]]]
class AbuseDetectedError(LockLLMError):              # 400
    abuse_details: Optional[Dict[str, Any]]
class PIIDetectedError(LockLLMError):                # 403
    entity_types: List[str]                          # PII entity types detected
    entity_count: int                                # Number of PII entities found
class InsufficientCreditsError(LockLLMError):        # 402
    current_balance: Optional[float]
    estimated_cost: Optional[float]
class UpstreamError(LockLLMError):                   # 502
    provider: Optional[str]
    upstream_status: Optional[int]
class ConfigurationError(LockLLMError): pass         # 400
class NetworkError(LockLLMError):                    # 0
    cause: Optional[Exception]
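
Note that RateLimitError.retry_after is in milliseconds, so convert before sleeping. A minimal retry sketch, using a local stand-in exception so the snippet is self-contained (real code would catch lockllm's RateLimitError instead):

```python
import time

class RateLimitError(Exception):
    """Stand-in for lockllm.RateLimitError; real code imports the SDK's class."""
    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # milliseconds, matching the SDK field

def call_with_retry(fn, max_attempts=3):
    """Retry fn() on rate limits, honoring the server-suggested delay."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay_ms = err.retry_after if err.retry_after is not None else 1000
            time.sleep(delay_ms / 1000)  # retry_after is milliseconds, sleep wants seconds
```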

Best Practices

Security

  1. Never hardcode API keys - Use environment variables or secret managers
  2. Log security incidents - Track blocked requests with request IDs
  3. Set appropriate sensitivity - Balance security needs with false positives
  4. Use block mode for critical paths - Set scan_action="block" and policy_action="block" for sensitive operations
  5. Handle errors gracefully - Provide user-friendly error messages
  6. Monitor request patterns - Watch for attack trends in dashboard
  7. Rotate keys regularly - Update API keys periodically
  8. Use HTTPS only - Never send API keys over unencrypted connections

Performance

  1. Use wrapper functions - Most efficient integration method
  2. Use async for I/O-bound workloads - Better concurrency with AsyncLockLLM
  3. Keep response caching enabled - Caching is on by default in proxy mode and reduces cost and latency for repeated queries; tune it with ProxyOptions(cache_response=..., cache_ttl=...)
  4. Set reasonable timeouts - Balance user experience with reliability
  5. Connection pooling - The SDK handles this automatically
  6. Batch when possible - Group similar requests with asyncio.gather()

Async Programming

  1. Use async context managers - Ensure proper resource cleanup
  2. Avoid blocking calls in async - Don't mix sync and async code
  3. Handle event loop properly - Use asyncio.run() or existing loop
  4. Be cautious with global state - Async can expose race conditions
  5. Use asyncio.gather() for concurrency - Process multiple requests in parallel

Production Deployment

  1. Test sensitivity levels - Validate with real user data
  2. Implement monitoring - Track blocked requests and false positives
  3. Set up alerting - Get notified of security incidents
  4. Review logs regularly - Analyze attack patterns
  5. Keep SDK updated - Benefit from latest improvements (pip install -U lockllm)
  6. Document incidents - Maintain security incident log
  7. Load test - Verify performance under expected load

Migration Guides

From Direct API Integration

If you're currently calling LLM APIs directly:

# Before: Direct OpenAI API call
from openai import OpenAI
import os

openai = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)

# After: With LockLLM security (one line change)
from lockllm import create_openai
import os

openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))  # Only change

# Everything else stays the same
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)

From OpenAI Library

Minimal changes required:

# Before
from openai import OpenAI
openai = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# After
from lockllm import create_openai
openai = create_openai(api_key=os.getenv("LOCKLLM_API_KEY"))  # Use LockLLM key

# All other code remains unchanged

From Anthropic Library

Minimal changes required:

# Before
from anthropic import Anthropic
anthropic = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

# After
from lockllm import create_anthropic
anthropic = create_anthropic(api_key=os.getenv("LOCKLLM_API_KEY"))  # Use LockLLM key

# All other code remains unchanged

Async Programming Guide

Event Loop Management

import asyncio
from lockllm import AsyncLockLLM

# Method 1: Using asyncio.run() (recommended for scripts)
async def main():
    lockllm = AsyncLockLLM(api_key="...")
    result = await lockllm.scan(input="test")
    print(result.safe)

asyncio.run(main())

# Method 2: Using an existing event loop (for frameworks)
lockllm = AsyncLockLLM(api_key="...")
loop = asyncio.get_event_loop()
result = loop.run_until_complete(
    lockllm.scan(input="test")
)

# Method 3: In Jupyter notebooks (an event loop is already running)
lockllm = AsyncLockLLM(api_key="...")
await lockllm.scan(input="test")  # Top-level await works directly

Concurrency Patterns

import asyncio
from lockllm import AsyncLockLLM

async def concurrent_scans():
    lockllm = AsyncLockLLM(api_key="...")

    # Scan multiple prompts concurrently
    prompts = ["prompt1", "prompt2", "prompt3"]

    # Method 1: asyncio.gather()
    results = await asyncio.gather(*[
        lockllm.scan(input=prompt)
        for prompt in prompts
    ])

    # Method 2: asyncio.create_task()
    tasks = [
        asyncio.create_task(lockllm.scan(input=prompt))
        for prompt in prompts
    ]
    results = await asyncio.gather(*tasks)

    return results

Resource Cleanup

from lockllm import AsyncLockLLM
import asyncio

# Always use async context managers
async def main():
    async with AsyncLockLLM(api_key="...") as client:
        result = await client.scan(input="test")
        # Client is automatically closed when exiting the block
        return result

asyncio.run(main())

Troubleshooting

Common Issues

"Invalid API key" error (401)

  • Verify your LockLLM API key is correct
  • Check the key hasn't been revoked in the dashboard
  • Ensure you're using your LockLLM key, not your provider key

"No provider API key configured" error (400)

  • Add your provider API key (OpenAI, Anthropic, etc.) in the dashboard
  • Navigate to Proxy Settings and configure provider keys
  • Ensure the provider key is enabled (toggle switch on)

"Could not extract prompt from request" error (400)

  • Verify request body format matches provider API spec
  • Check you're using the correct SDK version
  • Ensure messages array is properly formatted

"Custom policy violation" error (403)

  • Check your custom policies in the dashboard
  • If using policy_action="block", violations will raise PolicyViolationError
  • Use scan_mode="combined" or scan_mode="policy_only" to enable policy checks

"Insufficient credits" error (402)

  • Check your credit balance in the dashboard billing page
  • Top up credits at https://www.lockllm.com/billing
  • The error includes current_balance and estimated_cost for reference

"Abuse detected" error (400)

  • Your request matched abuse patterns (bot content, repetition, resource exhaustion)
  • Review the abuse_details in the error for specifics
  • Abuse detection is opt-in - remove the abuse_action parameter to disable it

"PII detected" error (403)

  • Your input contains personally identifiable information and pii_action is set to "block"
  • Review the entity_types and entity_count in the error for specifics
  • Use pii_action="strip" to automatically redact PII instead of blocking
  • Use pii_action="allow_with_warning" to allow the request while flagging PII
  • PII detection is opt-in - remove the pii_action parameter to disable

High latency

  • Check your network connection
  • Verify LockLLM API status
  • Consider adjusting timeout settings
  • Review provider API latency

mypy errors

  • Ensure Python 3.8+ is installed
  • Check peer dependencies are installed (openai, anthropic)
  • Run pip install types-requests
  • Verify SDK is installed: pip show lockllm

Async event loop issues

  • Don't mix sync and async code
  • Use asyncio.run() for scripts
  • Use existing event loop in frameworks
  • Close resources properly with async context managers

Debugging Tips

Enable detailed logging:

import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

from lockllm import LockLLM

lockllm = LockLLM(api_key=os.getenv("LOCKLLM_API_KEY"))

try:
    result = lockllm.scan(input=user_prompt)

    print(f"Scan result: safe={result.safe}, injection={result.injection}%, request_id={result.request_id}")
except Exception as error:
    print(f"Error: {type(error).__name__}: {error}")
    if hasattr(error, 'request_id'):
        print(f"Request ID: {error.request_id}")

Getting Help

FAQ

How do I install the SDK?

Install using pip, poetry, or pipenv:

pip install lockllm
poetry add lockllm
pipenv install lockllm

The SDK requires Python 3.8 or newer and works with Python 3.8 through 3.12.

Does the SDK work as a drop-in replacement for OpenAI and Anthropic?

Yes. Use create_openai() or create_anthropic() to get wrapped clients that work exactly like the official SDKs. All methods, streaming, and function calling are supported. Prompts are automatically scanned before being sent to the provider.

What's the difference between sync and async?

The SDK provides both synchronous (LockLLM, create_openai) and asynchronous (AsyncLockLLM, create_async_openai) APIs. Use async for I/O-bound workloads and better concurrency. Use sync for simple scripts and synchronous applications.

What type hint support is available?

The SDK includes full type hints with a py.typed marker for mypy. It supports mypy strict mode, provides IDE autocomplete, and includes type stubs for all APIs. Python 3.8+ type hints are used throughout.

Which AI providers are supported?

17+ providers are supported with both sync and async variants: OpenAI, Anthropic, Groq, DeepSeek, Perplexity, Mistral, OpenRouter, Together AI, xAI (Grok), Fireworks AI, Anyscale, Hugging Face, Google Gemini, Cohere, Azure OpenAI, AWS Bedrock, Google Vertex AI. All providers support custom endpoint URLs for self-hosted and private deployments.

How do I handle errors?

The SDK provides 11 typed exception classes: AuthenticationError, RateLimitError, PromptInjectionError, PolicyViolationError, AbuseDetectedError, PIIDetectedError, InsufficientCreditsError, UpstreamError, ConfigurationError, NetworkError, and base LockLLMError. Use try-except blocks with specific exception types for proper error handling.

Does the SDK support streaming?

Yes. All provider wrappers fully support streaming responses in both sync and async modes. Use stream=True in your requests and iterate with for (sync) or async for (async) loops.

What are scan modes?

Scan modes control which security checks are performed on your requests. Use "normal" for core injection detection only, "policy_only" for custom content policies only, or "combined" (default) for maximum protection with both. Set the mode via scan_mode parameter on the scan API or via ProxyOptions for wrapper functions.

Link to section: How does smart routing work?How does smart routing work?

Smart routing automatically selects the best AI model based on task type and complexity to optimize cost and quality. Enable it with route_action="auto" in ProxyOptions. You can also configure custom routing rules in the dashboard and use route_action="custom". Routing is available only in proxy mode.

Link to section: Can I cache LLM responses?Can I cache LLM responses?

Yes. Response caching is enabled by default in proxy mode to reduce costs and latency for repeated queries. Use ProxyOptions(cache_response=False) to disable it, or set a custom TTL with cache_ttl (max 86400 seconds / 24 hours). Streaming requests are not cached.

Link to section: How does PII detection work?How does PII detection work?

PII (Personally Identifiable Information) detection is an opt-in feature that scans prompts for sensitive data such as names, email addresses, phone numbers, Social Security numbers, credit card numbers, and other personal information. Enable it by setting pii_action to "strip" (replaces PII with placeholders), "block" (blocks the request), or "allow_with_warning" (allows with PII metadata in response). PII detection works in both the scan API and proxy mode. See PII Detection for details and supported entity types.

Link to section: Is there a free tier?Is there a free tier?

Yes. LockLLM offers a free tier that includes scanning and 300 requests per minute. Higher tiers with increased rate limits and free monthly credits are available as usage grows. See pricing for details.

Link to section: GitHub RepositoryGitHub Repository

View the source code, report issues, and contribute:

Repository: https://github.com/lockllm/lockllm-pip

PyPI Package: https://pypi.org/project/lockllm/

Link to section: Next StepsNext Steps
