Meet LockLLM: A Lightweight AI Security API

Sarah H.

LockLLM v1 is officially released.

If you’ve ever shipped an LLM feature (chat, agents, tool calling, RAG, customer support bots), you’ve already met the problem: users will try to steer the model outside your intended behavior. Sometimes it’s accidental. Sometimes it’s creative. Sometimes it’s malicious.

LockLLM exists to solve that with a simple idea:

Scan every input before it reaches your model, and block or quarantine anything that looks like a prompt injection attempt.

What LockLLM v1 does

LockLLM v1 is a lightweight security layer you place in front of your AI stack. It works independently of your model - no retraining, no prompt rewrites, and no changes to your downstream provider.

With v1, you can:

  • Detect prompt injection & jailbreak attempts (e.g., “ignore all previous instructions,” “reveal system prompt,” “bypass guardrails,” etc.)
  • Score risk and decide what to do (block, warn, log, or route to a safer path; a quick sketch follows this list)
  • Add protection to any model/provider (OpenAI, Anthropic, Gemini, OpenRouter, local models, doesn’t matter)
  • Centralize security for multiple apps using a single API key and consistent policy
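
As a rough sketch of what “decide what to do” can look like in practice (the thresholds and the `label`/`score` field names below are illustrative placeholders, not documented LockLLM fields):

```typescript
// Minimal policy sketch. The field names and threshold values are
// illustrative placeholders, not documented LockLLM fields.
type ScanResult = { label: "safe" | "injection"; score: number };
type Action = "allow" | "block" | "warn" | "human_review";

function decide(scan: ScanResult): Action {
  if (scan.label === "safe") return "allow";
  if (scan.score >= 0.9) return "block";        // high-confidence injection
  if (scan.score >= 0.6) return "human_review"; // borderline: route to a safer path
  return "warn";                                // low confidence: log it and allow
}
```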

How it works

LockLLM sits between your user and your LLM.

  1. Your app sends the user message to LockLLM.
  2. LockLLM scans the message with a dedicated prompt-injection detector.
  3. LockLLM returns a decision + risk score.
  4. Your app either:
    • forwards the message to your LLM (safe), or
    • blocks / asks for clarification / routes to human review (unsafe).

This pattern is simple, but it changes everything: your LLM never sees the dangerous input.
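
Here’s roughly what that flow looks like from a Node/TypeScript backend. The endpoint URL, auth header, request body, and response fields below are placeholders for illustration; check the LockLLM docs for the exact shapes.

```typescript
// Sketch of the scan-then-forward pattern. URL, headers, and field names
// below are assumptions for illustration, not the documented API.
async function callYourLLM(message: string): Promise<string> {
  // Your existing provider call (OpenAI, Anthropic, Gemini, a local model, ...).
  return `model reply to: ${message}`;
}

async function handleUserMessage(message: string): Promise<string> {
  // 1. Send the user message to LockLLM.
  const scanRes = await fetch("https://api.lockllm.example/v1/scan", { // placeholder URL
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LOCKLLM_API_KEY}`, // hypothetical auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ input: message }),
  });

  // 2-3. LockLLM scans the message and returns a decision + risk score.
  const scan = await scanRes.json(); // e.g. { safe, label, score, request_id }

  // 4. Forward to your LLM if safe; otherwise block (or clarify / escalate).
  if (!scan.safe) {
    return "Sorry, I can't process that request.";
  }
  return callYourLLM(message);
}
```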

What’s included in v1

1) Scan endpoint (the core)

LockLLM v1 exposes a scan endpoint that returns:

  • whether the input is considered safe
  • a label (safe vs injection)
  • a confidence / injection score
  • a request id for tracing & support

This is designed to fit cleanly into any backend - serverless, Node, Go, Python, you name it.
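
In practice, that maps to a response shape roughly like the one below; the field names are illustrative, not the published contract:

```typescript
// Illustrative response shape only; the actual LockLLM field names may differ.
interface ScanResponse {
  safe: boolean;                // whether the input is considered safe
  label: "safe" | "injection";  // classification label
  score: number;                // confidence / injection score, e.g. 0.97
  request_id: string;           // id for tracing & support
}
```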

2) Middleware-friendly design

LockLLM v1 is built to feel like “drop-in security”:

  • one request in
  • one response out
  • your app stays in control of the final decision

You decide whether you want a strict block policy, or a softer “warn + log” policy for edge cases.
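
As one way to wire that up, here’s a sketch of an Express guard; the `lockllmGuard` name, the `mode` option, and the `scanWithLockLLM` helper are illustrative, not a published LockLLM SDK:

```typescript
import { Request, Response, NextFunction } from "express";

// Hypothetical helper wrapping the scan endpoint (see the earlier sketch).
declare function scanWithLockLLM(input: string): Promise<{ safe: boolean; request_id: string }>;

// "Drop-in" guard sketch: one request in, one response out, and your app
// still makes the final call. `mode` implements the strict vs. warn+log choice.
function lockllmGuard(mode: "strict" | "warn") {
  return async (req: Request, res: Response, next: NextFunction) => {
    const scan = await scanWithLockLLM(String(req.body?.message ?? ""));
    if (!scan.safe) {
      if (mode === "strict") {
        return res.status(400).json({ error: "Input rejected by security policy." });
      }
      console.warn("LockLLM flagged input; allowing under warn policy", scan.request_id);
    }
    next();
  };
}

// Usage (requires express.json() so req.body is populated):
// app.post("/chat", lockllmGuard("strict"), chatHandler);
```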

3) Developer-focused output

Instead of returning a generic “blocked,” v1 is designed for clarity:

  • predictable JSON output
  • stable fields for dashboards and analytics
  • easy-to-log metadata (request ids, scores, etc.)
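
For example, you might log every scan as one structured event for dashboards and audits; the fields below mirror the illustrative response shape above rather than a documented schema:

```typescript
// Structured-logging sketch: stable fields make scans easy to chart and audit.
// Field names mirror the illustrative response shape, not a documented schema.
function logScan(scan: { request_id: string; label: string; score: number }, route: string) {
  console.log(JSON.stringify({
    event: "lockllm_scan",
    route,                       // which app endpoint the input came from
    request_id: scan.request_id, // correlate with tracing & support
    label: scan.label,
    score: scan.score,
    ts: new Date().toISOString(),
  }));
}
```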

4) Performance-minded defaults

Security tools only work if people actually keep them enabled. v1 focuses on:

  • fast scan latency
  • clear failure modes
  • safe handling for large payloads (truncate instead of hard failing)
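
From the caller’s side, keeping the check enabled mostly comes down to bounding latency and choosing a failure mode up front. Here’s a sketch; the timeout, the character cap, and the fail-open default are assumptions you should tune to your own risk tolerance:

```typescript
// Caller-side sketch: bound scan latency and pick an explicit failure mode.
// The timeout, the character cap, and the fail-open default are assumptions.
const SCAN_TIMEOUT_MS = 2_000;
const MAX_INPUT_CHARS = 20_000;

async function isSafe(input: string): Promise<boolean> {
  const payload = input.slice(0, MAX_INPUT_CHARS); // mirror the truncate-don't-fail idea client-side
  try {
    const res = await fetch("https://api.lockllm.example/v1/scan", { // placeholder URL
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: payload }),
      signal: AbortSignal.timeout(SCAN_TIMEOUT_MS), // keep added latency bounded
    });
    const scan = await res.json();
    return scan.safe === true;
  } catch {
    // Explicit failure mode: fail open (allow) shown here; fail closed (block)
    // if your risk tolerance demands it.
    return true;
  }
}
```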

Who v1 is for

LockLLM v1 is useful anywhere you accept user-controlled text that will influence an LLM:

  • Chatbots and support assistants
  • Tool-using agents (especially with web browsing, code execution, or API actions)
  • RAG systems (prompt injection through retrieved documents is real; see the sketch below)
  • Internal copilots and enterprise assistants
  • Any “AI form field” embedded in a product workflow

If a user can type into it, someone will try to break it.
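
Retrieved documents deserve the same treatment as typed input. Here’s a sketch that scans RAG chunks one at a time before they reach the prompt (batch scanning is on the roadmap below); the `scanWithLockLLM` helper is hypothetical:

```typescript
// Sketch: scan retrieved chunks before they enter the prompt context.
// `scanWithLockLLM` is a hypothetical helper wrapping the scan endpoint.
declare function scanWithLockLLM(input: string): Promise<{ safe: boolean; request_id: string }>;

async function filterRetrievedChunks(chunks: string[]): Promise<string[]> {
  const kept: string[] = [];
  for (const chunk of chunks) {
    const scan = await scanWithLockLLM(chunk);
    if (scan.safe) {
      kept.push(chunk);
    } else {
      // Drop the chunk; keep the request id so you can inspect it later.
      console.warn("Dropped retrieved chunk flagged as injection", scan.request_id);
    }
  }
  return kept;
}
```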

Quick start mindset

LockLLM is not trying to replace your model, your prompts, or your existing safety layers.

It’s a front-door filter - one extra check that catches the most common and most damaging category of LLM attacks before they propagate through your system.

That means:

  • fewer jailbreaks reaching your core prompt
  • fewer "system prompt leaks"
  • fewer tool-call exploits
  • fewer weird edge cases that turn into incident reports

What’s next

v1 is the foundation. Next up on the roadmap:

  • Policy modes (strict / balanced / permissive) with per-route overrides
  • Better explainability (why something was flagged, without leaking detection tricks)
  • Batch scanning for documents and RAG corpora
  • Dashboard analytics (attack trends, top signatures, false positive tuning)
  • Team + org features (keys, roles, usage visibility)

Try LockLLM v1

If you already have an LLM feature in production, you can add LockLLM in minutes and immediately reduce your exposure to prompt injection.

LockLLM v1 is live, and we’re shipping fast from here.

Have a feature request or a tricky injection you want us to test against? Send it over. We’re building this alongside real developers shipping real AI products.