PII Detection & Redaction

Detect and protect personally identifiable information in prompts with multilingual ML-based scanning. Automatically redact PII before it reaches your LLM provider.

What is PII Detection?

PII (Personally Identifiable Information) Detection is an opt-in feature that scans prompts for sensitive personal data before they reach your LLM provider. Using an advanced ML-based model, it identifies 21 types of personal information - from names and emails to credit card numbers and Social Security numbers - and gives you three ways to handle detected PII.

Key benefits:

  • Multilingual support - detects PII across multiple languages automatically, not just English
  • 21 entity types detected including names, financial data, government IDs, and more
  • Three flexible actions - warn, block, or automatically strip PII from prompts
  • Strip mode ensures your LLM never sees actual personal information
  • Privacy compliance support for GDPR, HIPAA, CCPA, and other regulations
  • Works in both scan endpoint and proxy mode
  • Fail-open design - detection issues never block your requests
  • Combines seamlessly with threat detection, custom policies, and prompt compression

Multilingual Capabilities

LockLLM's PII detection model is built on a multilingual ML architecture, so it works across multiple languages out of the box. No configuration is needed - multilingual detection is always active.

What this means for you:

  • Names, addresses, phone numbers, and other entities are detected regardless of the language they are written in
  • International formats are recognized (e.g., European phone numbers, non-US address formats, names in various scripts)
  • Mixed-language prompts are handled naturally - PII is detected even when the prompt switches between languages
  • Ideal for global applications serving users in different countries and languages

Supported languages include (but are not limited to): English, Spanish, French, German, Italian, Portuguese, Dutch, and other major languages. The ML model handles variations in formatting, abbreviations, and regional conventions.

Supported Entity Types

LockLLM detects 21 types of personally identifiable information, grouped by category:

Identity

| Entity Type | Examples |
|---|---|
| First Name | John, Maria, Wei |
| Last Name | Smith, Garcia, Chen |
| Date of Birth | 01/15/1990, January 15, 1990 |
| Username | john_doe, user123 |

Contact Information

| Entity Type | Examples |
|---|---|
| Email | john@example.com |
| Phone Number | (555) 123-4567, +44 20 7946 0958 |
| Street Address | 123 Main Street, Apt 4B |
| City | New York, London, Tokyo |
| State | California, CA |
| Zip Code | 90210, SW1A 1AA |
| Building Number | Suite 400, Floor 12 |
| Secondary Address | P.O. Box 1234 |

Financial

| Entity Type | Examples |
|---|---|
| Credit Card | 4111-1111-1111-1111 |
| Account Number | 1234567890 |
| Tax ID | 12-3456789 |

Government IDs

| Entity Type | Examples |
|---|---|
| Social Security Number | 123-45-6789 |
| Driver's License | D1234567 |
| ID Card Number | AB1234567 |

Security & Network

| Entity Type | Examples |
|---|---|
| Password | P@ssw0rd123 |
| IP Address | 192.168.1.1, 2001:db8::1 |
| URL | https://example.com/profile |

Actions

PII detection is opt-in and controlled via the X-LockLLM-PII-Action header. Three actions are available:

allow_with_warning

Detect PII and include the results in the response, but forward the original (unmodified) request to your LLM provider.

Use when: You want visibility into what PII is being sent but don't want to modify or block requests. Good for monitoring and auditing.

block

Reject any request that contains PII with a 403 Forbidden error. The request is never forwarded to your LLM provider.

Use when: You have strict compliance requirements and personal data must never reach your AI provider under any circumstances.

strip

Automatically detect PII and replace each entity with a [TYPE] placeholder before forwarding the request. Your LLM receives the redacted text and never sees the actual personal information.

Use when: You want the best of both worlds - your LLM can still understand the context of the request while actual personal data stays protected.

Strip Mode Deep Dive

Strip mode is the recommended action for most privacy-conscious applications. Here's how it works:

Before and After

Original prompt (what the user sends):

My name is John Smith and I live at 742 Evergreen Terrace, Springfield.
You can reach me at john.smith@example.com or call (555) 123-4567.
My SSN is 123-45-6789.

Redacted prompt (what your LLM receives):

My name is [GIVENNAME] [SURNAME] and I live at [STREETADDRESS], [CITY].
You can reach me at [EMAIL] or call [TELEPHONENUM].
My SSN is [SOCIALNUM].

The LLM can still understand that the user is sharing contact information and asking about something related to their personal details, but it never sees the actual values. This is especially valuable for:

  • Customer support chatbots that process user inquiries
  • Healthcare applications where patient information must be protected
  • Financial services where personal data appears in user queries
  • Any application where users might accidentally share sensitive information
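The redaction step can be sketched with plain regular expressions. This is illustrative only - LockLLM's detection is ML-based and multilingual, while these hand-written patterns only catch a few rigidly formatted entity types:

```python
import re

# Illustrative strip-mode sketch. These regexes are NOT how LockLLM
# detects PII (it uses an ML model); they only demonstrate the
# placeholder-substitution behavior for pattern-based entities.
PATTERNS = {
    "[EMAIL]": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "[SOCIALNUM]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[TELEPHONENUM]": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}

def strip_pii(text: str) -> str:
    """Replace pattern-matched PII with [TYPE] placeholders."""
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

prompt = "Reach me at john.smith@example.com or call (555) 123-4567. SSN: 123-45-6789."
print(strip_pii(prompt))
# → "Reach me at [EMAIL] or call [TELEPHONENUM]. SSN: [SOCIALNUM]."
```

The semantic structure of the prompt survives the substitution, which is why the LLM can still act on the redacted text.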

Placeholder Types

Each detected entity is replaced with a descriptive placeholder that preserves context:

| Detected Entity | Placeholder |
|---|---|
| First Name | [GIVENNAME] |
| Last Name | [SURNAME] |
| Email | [EMAIL] |
| Phone Number | [TELEPHONENUM] |
| Street Address | [STREETADDRESS] |
| City | [CITY] |
| Credit Card | [CREDITCARDNUMBER] |
| Social Security Number | [SOCIALNUM] |
| Date of Birth | [DATEOFBIRTH] |
| Driver's License | [DRIVERSLICENSE] |
| Password | [PASSWORD] |
| IP Address | [IPADDRESS] |
| Account Number | [ACCOUNTNUM] |
| Tax ID | [TAXID] |
| ID Card Number | [IDCARDNUM] |
| Username | [USERNAME] |
| Zip Code | [ZIPCODE] |
| Building Number | [BUILDINGNUM] |
| Secondary Address | [SECONDARYADDRESS] |
| State | [STATE] |
| URL | [URL] |

Configuration

Headers

| Header | Values | Default | Description |
|---|---|---|---|
| X-LockLLM-PII-Action | allow_with_warning, block, strip | Not set (disabled) | How to handle detected PII |

Scan Endpoint

curl -X POST https://api.lockllm.com/v1/scan \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-LockLLM-PII-Action: strip" \
  -d '{
    "input": "Contact John Smith at john@example.com or 555-123-4567"
  }'

Proxy Mode - JavaScript/TypeScript

import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  baseURL: 'https://api.lockllm.com/v1/proxy/openai',
  defaultHeaders: {
    'X-LockLLM-PII-Action': 'strip'
  }
})

// User sends: "Contact John Smith at john@example.com"
// LLM receives: "Contact [GIVENNAME] [SURNAME] at [EMAIL]"
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: userPrompt }]
})

Proxy Mode - Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get('LOCKLLM_API_KEY'),
    base_url='https://api.lockllm.com/v1/proxy/openai',
    default_headers={
        'X-LockLLM-PII-Action': 'strip'
    }
)

response = client.chat.completions.create(
    model='gpt-4',
    messages=[{'role': 'user', 'content': user_prompt}]
)

LockLLM SDK - JavaScript/TypeScript

import { createOpenAI } from '@lockllm/sdk/wrappers'

const openai = createOpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  proxyOptions: {
    piiAction: 'strip'
  }
})

LockLLM SDK - Python

import os

from lockllm import create_openai, ProxyOptions

openai = create_openai(
    api_key=os.getenv('LOCKLLM_API_KEY'),
    proxy_options=ProxyOptions(pii_action='strip')
)

Response Format

Scan Endpoint Response

When PII is detected, the response includes a pii_result object:

{
  "request_id": "req_abc123",
  "safe": true,
  "confidence": 95,
  "injection": 3,
  "pii_result": {
    "detected": true,
    "entity_types": ["First Name", "Last Name", "Email", "Phone Number"],
    "entity_count": 4,
    "redacted_input": "Contact [GIVENNAME] [SURNAME] at [EMAIL] or [TELEPHONENUM]"
  }
}

The redacted_input field is only included when the action is strip.
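A minimal sketch of consuming this response in Python, using the example payload above. The prompt_to_forward helper is hypothetical application code, not part of any LockLLM SDK:

```python
# Example /v1/scan response body, copied from the documentation above.
scan_response = {
    "request_id": "req_abc123",
    "safe": True,
    "confidence": 95,
    "injection": 3,
    "pii_result": {
        "detected": True,
        "entity_types": ["First Name", "Last Name", "Email", "Phone Number"],
        "entity_count": 4,
        "redacted_input": "Contact [GIVENNAME] [SURNAME] at [EMAIL] or [TELEPHONENUM]",
    },
}

def prompt_to_forward(response: dict, original: str) -> str:
    """Prefer the redacted prompt when strip produced one, else keep the original."""
    pii = response.get("pii_result") or {}
    if pii.get("detected") and "redacted_input" in pii:
        return pii["redacted_input"]
    return original

forwarded = prompt_to_forward(scan_response, "Contact John Smith at john@example.com or 555-123-4567")
print(forwarded)
```

Because redacted_input only appears under the strip action, falling back to the original prompt keeps the same code path working for allow_with_warning.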

Proxy Mode Response Headers

| Header | Description |
|---|---|
| X-LockLLM-PII-Detected | "true" or "false" |
| X-LockLLM-PII-Types | Comma-separated entity types found (e.g., "Email,Phone Number") |
| X-LockLLM-PII-Count | Number of PII entities detected |
| X-LockLLM-PII-Action | The action that was applied |
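A small sketch of reading these headers in application code. The header names come from the table above; the parsing helper itself is hypothetical:

```python
# Parse LockLLM's proxy-mode PII metadata headers into a plain dict.
# Header names are documented; this helper is illustrative application code.
def parse_pii_headers(headers: dict) -> dict:
    types = headers.get("X-LockLLM-PII-Types", "")
    return {
        "detected": headers.get("X-LockLLM-PII-Detected") == "true",
        "entity_types": [t.strip() for t in types.split(",") if t.strip()],
        "entity_count": int(headers.get("X-LockLLM-PII-Count", 0)),
        "action": headers.get("X-LockLLM-PII-Action"),
    }

meta = parse_pii_headers({
    "X-LockLLM-PII-Detected": "true",
    "X-LockLLM-PII-Types": "Email,Phone Number",
    "X-LockLLM-PII-Count": "2",
    "X-LockLLM-PII-Action": "strip",
})
```

This is useful for logging PII metadata (types and counts only, never values) alongside your normal request telemetry.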

Block Mode Error Response

When X-LockLLM-PII-Action: block and PII is detected:

{
  "error": {
    "message": "Request blocked due to personal information detected",
    "type": "lockllm_pii_error",
    "code": "pii_detected",
    "pii_details": {
      "entity_types": ["Email", "Phone Number", "Social Security Number"],
      "entity_count": 3
    },
    "request_id": "req_abc123"
  }
}

Combining with Other Features

PII detection integrates with all other LockLLM features. The processing order in proxy mode is:

Security scan -> PII detection/redaction -> Prompt compression -> Forward to provider

Important interactions:

  • Prompt compression: When PII stripping is enabled, compression is applied to the redacted text. This means compressed prompts never contain original PII values.
  • Threat detection: Security scanning runs on the original text before PII processing, ensuring no attacks are missed.
  • Custom policies: Policy checks run on the original text alongside threat detection.
  • Smart routing: Routing decisions are made independently of PII detection.

Use Cases

Healthcare (HIPAA Compliance)

Protect patient information in medical AI applications:

import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.LOCKLLM_API_KEY,
  baseURL: 'https://api.lockllm.com/v1/proxy/openai',
  defaultHeaders: {
    'X-LockLLM-PII-Action': 'strip',     // Redact patient PII
    'X-LockLLM-Scan-Action': 'block',     // Block injection attacks
    'X-LockLLM-Policy-Action': 'block'    // Block policy violations
  }
})

Financial Services

Prevent credit card numbers, account numbers, and tax IDs from reaching your LLM:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get('LOCKLLM_API_KEY'),
    base_url='https://api.lockllm.com/v1/proxy/openai',
    default_headers={
        'X-LockLLM-PII-Action': 'block'   # Reject requests with financial PII
    }
)

Customer Support

Allow customer inquiries through while protecting personal information:

// User: "My name is Jane Doe, order #12345, email jane@example.com"
// LLM receives: "My name is [GIVENNAME] [SURNAME], order #12345, email [EMAIL]"
// The LLM can help with the order without knowing the customer's real identity

Education

Protect student information in educational AI tools while preserving the learning context.

Privacy Compliance

PII detection helps you meet the requirements of major privacy regulations by preventing personal data from reaching third-party AI providers.

GDPR (EU)

The General Data Protection Regulation requires data minimization (Article 5) and data protection by design (Article 25). PII detection directly supports both principles:

  • Strip mode ensures only the minimum necessary data reaches your AI provider - personal identifiers are replaced with placeholders before the request leaves LockLLM
  • Block mode prevents any request containing personal data from being processed, supporting strict data handling policies
  • LockLLM does not store prompt content or detected PII values - data is processed in memory and immediately discarded, supporting your data retention compliance

HIPAA (US Healthcare)

Protected Health Information (PHI) includes patient names, dates, contact information, and identification numbers. PII detection catches names, dates of birth, Social Security numbers, phone numbers, emails, and addresses. Combine with custom policies to add healthcare-specific restrictions (e.g., blocking medical diagnosis requests).

CCPA (California)

The California Consumer Privacy Act defines "personal information" broadly to include identifiers, contact details, financial data, and more. PII detection covers the key categories: names, emails, phone numbers, addresses, Social Security numbers, driver's licenses, and financial account information.

Key Compliance Benefit

Across all regulations, a critical advantage of PII detection is that LockLLM never stores the personal data it detects. The detection runs in memory, results are returned immediately, and the original data is discarded. This minimizes your data processing footprint and reduces compliance risk.

Detection Accuracy by Entity Type

Detection accuracy varies by entity type based on how structured the data is. Understanding these differences helps you choose the right action mode for your use case.

| Accuracy Level | Entity Types | Why |
|---|---|---|
| Very high | Email, Phone Number, Credit Card, Social Security Number, IP Address, URL | These follow strict, well-defined patterns that are reliably detected regardless of context |
| High | Date of Birth, Zip Code, Tax ID, Driver's License, Account Number, ID Card Number | Structured data with some format variation across regions and conventions |
| Context-dependent | First Name, Last Name, City, State, Street Address, Building Number, Secondary Address, Username, Password | Detection relies on surrounding text and context to distinguish these from ordinary words |

For maximum protection with context-dependent entities, use block mode and handle any edge cases in your application. For pattern-based entities (very high and high accuracy), strip mode works reliably with minimal false positives.
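That guidance can be encoded as a small helper. This is illustrative only - the tier set mirrors the accuracy table above, and the recommendation logic is a suggested heuristic, not a LockLLM API:

```python
# Pattern-based entity types (the "very high" and "high" accuracy tiers
# from the table above). The recommendation heuristic below is an
# example, not an official LockLLM rule.
PATTERN_BASED = {
    "Email", "Phone Number", "Credit Card", "Social Security Number",
    "IP Address", "URL", "Date of Birth", "Zip Code", "Tax ID",
    "Driver's License", "Account Number", "ID Card Number",
}

def recommended_action(required_entities: set[str]) -> str:
    """Suggest strip when every required entity is pattern-based, else block."""
    return "strip" if required_entities <= PATTERN_BASED else "block"
```

For example, an app that only needs to catch emails and card numbers can rely on strip, while one that must also catch names would get block recommended.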

Strip Mode in Practice

Strip mode replaces detected PII with descriptive placeholders while preserving the meaning of the request. Here are examples across different use cases:

Customer Support

Original:

Hi, I'm Sarah Chen. My order #45678 was shipped to 42 Oak Lane, Portland, OR 97201.
Can you check the status? My phone is 503-555-0147.

What your LLM receives:

Hi, I'm [GIVENNAME] [SURNAME]. My order #45678 was shipped to [STREETADDRESS], [CITY], [STATE] [ZIPCODE].
Can you check the status? My phone is [TELEPHONENUM].

The order number passes through because it is not PII. The LLM can still help with the order inquiry without knowing the customer's real identity.

Financial Query

Original:

Transfer $500 from account 1234567890 to John Doe, routing number 021000021.

What your LLM receives:

Transfer $500 from account [ACCOUNTNUM] to [GIVENNAME] [SURNAME], routing number [ACCOUNTNUM].

Dollar amounts and transaction types pass through. Account numbers and names are redacted.

Healthcare Intake

Original:

Patient: Maria Garcia, DOB: 03/15/1985, SSN: 456-78-9012.
Complaint: persistent headache for 3 days, no prior history of migraines.

What your LLM receives:

Patient: [GIVENNAME] [SURNAME], DOB: [DATEOFBIRTH], SSN: [SOCIALNUM].
Complaint: persistent headache for 3 days, no prior history of migraines.

Medical symptoms and clinical details pass through because they are not personally identifiable on their own. Patient identifiers are redacted.

Multilingual Detection Examples

PII detection works across multiple languages automatically. No configuration changes are needed - the ML model handles different languages, scripts, and regional formats out of the box.

Spanish:

Me llamo Carlos Rodriguez, mi correo es carlos@ejemplo.com

Detected: First Name, Last Name, Email

French:

Mon numéro de téléphone est +33 1 23 45 67 89, j'habite à Paris

Detected: Phone Number, City

German:

Meine Adresse ist Berliner Str. 42, 10115 Berlin

Detected: Street Address, Zip Code, City

Mixed-language prompts are also handled naturally. If a prompt switches between English and another language, PII is detected in both languages within the same request.

Pricing

| Scenario | Cost |
|---|---|
| No PII detected | FREE |
| PII detected (any action) | $0.0001 per detection |

  • PII detection is opt-in (disabled by default)
  • You are only charged when PII is actually found
  • The fee applies regardless of which action you choose (warn, block, or strip)
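A quick back-of-envelope estimate based on this pricing (hypothetical helper, not an official calculator):

```python
# Per the pricing table: scans with no PII are free; each scan where
# PII is found costs $0.0001, whatever action is configured.
FEE_PER_DETECTION = 0.0001

def monthly_pii_cost(scans_with_pii: int) -> float:
    """Estimated monthly PII-detection fees in USD."""
    return scans_with_pii * FEE_PER_DETECTION

# e.g. 100,000 prompts containing PII per month:
print(f"${monthly_pii_cost(100_000):.2f}")  # prints $10.00
```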

Limitations

  • Detection accuracy may vary by entity type - common entities like emails and phone numbers are detected with very high accuracy, while more ambiguous entities depend on surrounding context
  • Works best with clearly structured text; heavily obfuscated or encoded PII may not be detected
  • The ML model is continuously improved to expand language coverage and detection accuracy

FAQ

Does PII detection work with non-English text?

Yes. LockLLM's PII detection model is multilingual and detects personal information across multiple languages automatically. No additional configuration is needed - simply enable PII detection and it works regardless of the input language.

Can I choose which entity types to detect?

Currently, PII detection scans for all 21 supported entity types in every request. You cannot selectively enable or disable individual entity types. However, you can use the allow_with_warning action and filter the results in your application based on the entity_types reported in the response.
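A sketch of that filtering approach (hypothetical application code; the sensitive-type set is an example):

```python
# With allow_with_warning, the request is forwarded unmodified but the
# response reports what was found. Filter on entity_types in your app
# to react only to the types you care about.
SENSITIVE = {"Social Security Number", "Credit Card", "Account Number"}

def needs_review(pii_result: dict) -> bool:
    """True if any detected entity type is in our sensitive set."""
    if not pii_result.get("detected"):
        return False
    return any(t in SENSITIVE for t in pii_result.get("entity_types", []))

result = {"detected": True, "entity_types": ["Email", "Credit Card"], "entity_count": 2}
print(needs_review(result))  # prints True
```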

What happens if PII detection is temporarily unavailable?

PII detection uses a fail-open design. If the detection service is temporarily unreachable, your request proceeds normally without PII scanning. This ensures your application's availability is never impacted. The response headers will indicate that PII detection was not applied.

Does stripping PII affect response quality?

In most cases, the LLM can understand the context of the request even with PII replaced by placeholders. For example, "Help [GIVENNAME] [SURNAME] update their [EMAIL]" is clear enough for the model to provide a helpful response. The placeholders preserve the semantic structure of the original text.

Can I use PII detection in the scan endpoint?

Yes. PII detection works in both the scan endpoint (/v1/scan) and proxy mode (/v1/proxy). In the scan endpoint, the pii_result object is included in the response body. In proxy mode, PII metadata is provided via response headers.

Is detected PII stored or logged?

No. LockLLM does not store prompt content or detected PII values. Only metadata is logged (entity types, counts, and whether PII was detected). The actual personal information is processed in memory and immediately discarded.

Does PII detection work with all providers?

Yes. PII detection works with all 17+ supported providers in proxy mode, and independently in the scan endpoint. It is applied before the request reaches any provider, so it works universally.
