# TOON vs JSON: Save LLM Tokens With This New Format

Large language model prompts routinely embed structured data for context. Customer records, product catalogs, RAG retrieval results, agent tool outputs - all of it typically arrives as JSON. But JSON was designed for machines parsing network responses, not for AI models parsing natural language prompts. Every brace, bracket, colon, and quote burns a token, and those tokens add up fast. A 500-row customer table in JSON can easily consume over 11,000 tokens before the model even reads a single instruction.
This matters because LLM API pricing is per-token. With flagship models charging $15-75 per million output tokens and several dollars per million on the input side, shaving unnecessary syntax from your prompts isn't a micro-optimization. It's a direct line item on your infrastructure bill. A newer format called TOON (Token-Oriented Object Notation) tackles this head-on, encoding the same data in a leaner tabular form that typically saves 30-60% of tokens while actually improving model accuracy on structured extraction tasks.
## Why JSON Is Expensive Inside LLM Prompts
Before diving into alternatives, it helps to understand why JSON is so costly when fed to a language model.
### How Tokenizers Handle JSON Syntax
Modern LLMs use Byte Pair Encoding (BPE) tokenizers like OpenAI's tiktoken or Google's SentencePiece. These tokenizers split text into subword units, and JSON's structural characters create significant overhead. Quotes, colons, commas, and braces each become separate subwords (or get merged with adjacent characters in unpredictable ways). A simple key-value pair like `"name": "Alice"` doesn't tokenize as two meaningful tokens. The tokenizer splits it into fragments: the opening quote, the key string, the closing quote, the colon, a space, another opening quote, the value, and another closing quote.
When that pattern repeats across hundreds of rows, every field name gets re-tokenized from scratch each time. In a 500-row array of objects with five fields, the string `"name"` appears 500 times - along with 500 colons, 1,000 quotes just for that one key, and thousands more braces and brackets for the structure itself.
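The repetition is easy to quantify with nothing but the standard library. The dataset below is synthetic, built just to mirror the example above - counting how often a single key and the structural braces recur in the serialized output makes the overhead concrete:

```python
import json

# Synthetic 500-row array of uniform objects, mirroring the example above.
rows = [
    {"name": f"user{i}", "age": 20 + i % 50, "email": f"user{i}@example.com"}
    for i in range(500)
]
blob = json.dumps({"users": rows})

# Every field name is serialized again for every single row.
print(blob.count('"name"'))  # → 500
print(blob.count('{'))       # → 501 (one brace per row, plus the wrapper object)
```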
### The Numbers Tell the Story
Real-world benchmarks paint a clear picture. A 500-row customer dataset in pretty-printed JSON consumes 11,842 tokens. The same data in minified JSON still uses around 8,000 tokens. Most of those tokens carry zero information for the model - they're pure syntactic overhead that exists only because JSON parsers need them.
For perspective, that's equivalent to roughly 4-5 pages of English prose worth of tokens, spent entirely on curly braces and repeated field names. At GPT-4o pricing, processing that JSON blob in 1,000 daily requests costs roughly $1,740 per month just for the structured context portion of your prompts.
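The arithmetic behind that estimate is simple to reproduce. The $5-per-million blended rate below is an assumption for illustration (actual input pricing varies by model and provider), but it lands in the same range as the figure quoted above:

```python
# Back-of-the-envelope monthly cost for the structured portion of prompts.
# price_per_million is an assumed blended rate; substitute your provider's.
def monthly_prompt_cost(tokens_per_request, requests_per_day,
                        price_per_million, days=30):
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * price_per_million

cost = monthly_prompt_cost(11_842, 1_000, price_per_million=5.0)
print(f"${cost:,.2f}/month")  # → $1,776.30/month
```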
## What is TOON (Token-Oriented Object Notation)?
TOON is a compact, human-readable encoding of the JSON data model created by Johann Schopplich and released in late 2025. It was purpose-built for LLM prompts, preserving complete data fidelity while stripping out every token that doesn't carry actual information.
The core insight behind TOON is simple: when you have an array of objects that share the same fields, declaring those fields once is enough. The model doesn't need to see "name", "age", and "email" repeated for every single record.
### How TOON Syntax Works
TOON combines two layout styles depending on the data shape:
For uniform arrays (the most common case in LLM prompts), TOON uses a CSV-like tabular layout with a schema header:
```
users[3]{name,age,email}:
  Alice,30,alice@example.com
  Bob,25,bob@example.com
  Charlie,35,charlie@example.com
```
Compare that to the equivalent JSON:
```json
{
  "users": [
    {"name": "Alice", "age": 30, "email": "alice@example.com"},
    {"name": "Bob", "age": 25, "email": "bob@example.com"},
    {"name": "Charlie", "age": 35, "email": "charlie@example.com"}
  ]
}
```
The TOON version declares `users[3]{name,age,email}:` once at the top - specifying the array name, item count, and field names in a single header line. Each subsequent line contains just the comma-separated values. No braces, no quotes (unless a value contains special characters), no repeated keys.
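The tabular rule is simple enough to sketch in a few lines. This is an illustrative toy, not the official `@toon-format` encoder - it assumes a uniform array of flat objects and omits the quoting and escaping rules for values that contain commas or other special characters:

```python
# Toy encoder for TOON's tabular rule (uniform array of flat objects only).
def encode_uniform_array(name, rows):
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

users = [
    {"name": "Alice", "age": 30, "email": "alice@example.com"},
    {"name": "Bob", "age": 25, "email": "bob@example.com"},
]
print(encode_uniform_array("users", users))
# users[2]{name,age,email}:
#   Alice,30,alice@example.com
#   Bob,25,bob@example.com
```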
For nested objects, TOON uses YAML-style indentation:
```
config:
  model: gpt-4o
  temperature: 0.7
  max_tokens: 2000
```
This hybrid approach means TOON handles both flat tabular data and deeply nested configuration objects efficiently.
### Why This Works for LLMs
You might wonder whether stripping structure confuses the model. It doesn't. Benchmarks across four major model families (Claude, GPT, Gemini, and Grok) show that TOON actually improves structured data extraction accuracy compared to JSON. On a suite of 209 data retrieval questions, TOON achieved 73.9% accuracy versus JSON's 69.7% while using 39.6% fewer tokens.
The likely reason: TOON's explicit field alignment and declared row counts make data relationships clearer for models to parse. Instead of navigating nested brackets to find a value, the model reads a clean header and then scans aligned rows - similar to how humans read spreadsheets faster than nested code.
## Benchmarks: TOON vs JSON at Scale
Let's look at hard numbers across multiple scenarios.
### Token Efficiency Comparison
On a 500-row dataset, here's how each format performed:
- JSON (pretty-printed) - 11,842 tokens, baseline, 69.7% extraction accuracy
- JSON (minified) - ~8,000 tokens, ~32% savings, 70.7% accuracy
- YAML - ~7,200 tokens, ~39% savings, 69.0% accuracy
- TRON - ~5,900 tokens, ~50% savings
- TOON - 4,617 tokens, 61% savings, 73.9% accuracy
- CSV - ~3,800 tokens, ~68% savings
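The headline TOON figure follows directly from the measured token counts:

```python
json_tokens, toon_tokens = 11_842, 4_617
savings = (json_tokens - toon_tokens) / json_tokens
print(f"{savings:.1%}")  # → 61.0%
```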
TOON sits in a sweet spot: nearly as compact as raw CSV, but with full support for nested structures, explicit schemas, and named fields that CSV can't provide.
### Model-Specific Accuracy Results
Testing 209 structured extraction questions across different models revealed that TOON either matched or beat JSON on every model tested:
- GPT-5 Nano: 99.4% accuracy with TOON vs 92.5% with JSON (table reconstruction)
- Claude Haiku: 59.8% with TOON vs 57.4% with JSON
- Gemini Flash: Comparable accuracy, 40%+ token savings
- Grok: Comparable accuracy, consistent token savings
The GPT-5 Nano result is particularly striking - a nearly 7-point accuracy gain while using roughly half the tokens. TOON's tabular structure appears to give smaller, more efficient models a clear parsing advantage.
### Cost Impact at Production Scale
For a production RAG pipeline processing 1,000 prompts daily with embedded structured data:
- JSON - $1,740/month, $20,880/year
- TOON - $680/month, $8,160/year - saving $12,720 annually
These numbers assume GPT-4o pricing. With more expensive models like Claude Opus or GPT-5 Pro, the absolute savings multiply proportionally.
### Context Window Gains
Token savings also translate to fitting more data into each request. With a 128K token context window:
- JSON: ~17,000 records
- TOON: ~70,000 records (4x more)
- CSV: ~85,000 records
That 4x improvement means you can feed dramatically more context to your model per request, which is particularly valuable for RAG systems, analytics pipelines, and multi-agent workflows where context richness directly impacts output quality.
## How Does TOON Compare to Other Formats?
TOON isn't the only alternative to JSON for LLM prompts. Each format has distinct trade-offs, and the right choice depends on your data shape and use case.
### YAML: Human-Readable but Still Verbose
YAML replaces braces with indentation, which makes it more pleasant to read and write by hand. For LLM prompts, YAML typically saves around 39% of tokens compared to pretty-printed JSON - a meaningful improvement, but roughly half what TOON achieves.
YAML's bigger issue for LLM use is ambiguity. The format has multiple ways to represent strings, and its escape sequences can trip up both models and developers. Values like yes, no, on, off, and bare numbers get auto-typed in surprising ways. For a data format fed to AI models, this unpredictability introduces unnecessary risk.
```yaml
users:
  - name: Alice
    age: 30
    email: alice@example.com
  - name: Bob
    age: 25
    email: bob@example.com
```
YAML works well for configuration files and small nested objects. For large arrays of structured data - the most common use case in LLM prompts - TOON is substantially more efficient.
### CSV/TSV: Maximum Compactness, Minimum Structure
CSV is the most token-efficient format for flat tabular data, beating even TOON by a small margin. If your data is a simple table with no nesting, CSV is hard to beat on raw token count.
```
name,age,email
Alice,30,alice@example.com
Bob,25,bob@example.com
```
The catch is that CSV can't represent nested objects, mixed types, or hierarchical relationships. It also lacks an explicit schema declaration - the header row is just another row of text, and the model has to infer that the first line contains field names. For simple flat data this works fine, but for anything with depth or mixed structure, CSV breaks down.
TSV (tab-separated values) performs similarly to CSV in token count but uses tabs instead of commas as delimiters, which can reduce ambiguity when values contain commas.
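The delimiter difference is easy to see with Python's standard `csv` module: with a comma delimiter, a value containing a comma must be quoted, while a tab delimiter leaves it untouched:

```python
import csv
import io

# One row whose first field contains the CSV delimiter itself.
row = ["Smith, John", "30", "smith@example.com"]

def write_row(delimiter):
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(row)
    return buf.getvalue().strip()

print(write_row(","))   # → "Smith, John",30,smith@example.com
print(write_row("\t"))  # tab-separated: same fields, no quoting needed
```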
### TRON: JSON-Compatible Compression
TRON (Token-Reduced Object Notation) takes a different approach than TOON. Instead of inventing new syntax, TRON extends JSON with "class definitions" that let you define an object schema once and reference it for each record. The output remains valid JSON (or close to it), which means existing JSON parsers can still process TRON data with minimal modification.
TRON typically achieves 20-40% token reduction compared to standard JSON. That's less than TOON's 30-60%, but TRON has a distinct advantage: backward compatibility. If your pipeline already relies heavily on JSON parsers, validators, or schemas, TRON lets you compress tokens without rewriting your parsing logic.
TRON outperforms TOON in one specific scenario: deeply nested uniform structures common in RAG pipelines, where its class definition approach achieves up to 38% savings while maintaining JSON compatibility.
### XML: Universal but Token-Heavy
XML is the elephant in the room. It's the most widely understood structured format, and some LLMs (particularly those trained on web data) handle XML surprisingly well. But XML's verbosity is legendary - closing tags alone double the structural overhead compared to JSON.
In one benchmark, XML consumed 5,167 tokens on a dataset that TOON encoded in 2,744 - nearly double. XML also scored the lowest accuracy (67.1%) among tested formats for structured extraction tasks.
Unless your LLM application specifically requires XML for compatibility reasons, there's no efficiency argument for using it in prompts.
### Markdown Tables: The Overlooked Middle Ground
Simple markdown tables deserve a mention as an informal alternative:
```markdown
| name  | age | email             |
|-------|-----|-------------------|
| Alice | 30  | alice@example.com |
| Bob   | 25  | bob@example.com   |
```
Markdown tables are reasonably compact and highly readable to both humans and LLMs. They work well for small datasets embedded in natural language prompts. Their main limitation is the same as CSV: no support for nesting, and the pipe-character syntax adds some overhead that TOON avoids.
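Generating such a table from records is a few lines of code. The helper below is a minimal illustrative sketch that assumes flat, uniform dicts (it does no escaping of pipe characters inside values):

```python
# Minimal markdown-table renderer for flat, uniform records.
def to_markdown_table(rows):
    fields = list(rows[0].keys())
    lines = [
        "| " + " | ".join(fields) + " |",
        "|" + "|".join("---" for _ in fields) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[f]) for f in fields) + " |")
    return "\n".join(lines)

print(to_markdown_table([
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]))
# | name | age |
# |---|---|
# | Alice | 30 |
# | Bob | 25 |
```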
### Which Format Should You Pick?
The right format depends on what your data looks like:
- Flat table with simple values - CSV or TOON for maximum token savings
- Array of uniform objects - TOON, since the schema header and tabular rows eliminate repetition
- Deeply nested configuration - YAML or TOON, both handle indentation-based nesting well
- Mixed nested and tabular data - TOON, because it handles both layout styles in one format
- Must stay JSON-compatible - TRON, which compresses within the JSON spec
- Small inline data in prose - Markdown tables or plain text, readable and good enough for small payloads
## Beyond Format Changes: ML-Based Prompt Compression
Switching data formats is a straightforward optimization, but it only addresses structured data. For natural language portions of prompts - system instructions, few-shot examples, conversation history - a different class of techniques exists: ML-based prompt compression.
### LLMLingua and LLMLingua-2
Microsoft's LLMLingua family represents the state of the art in prompt compression. LLMLingua uses a small language model (like GPT-2 or LLaMA-7B) to evaluate the "importance" of each token in your prompt based on information entropy. Unimportant tokens - filler words, redundant phrasing, repetitive examples - get pruned, while critical information stays intact.
LLMLingua achieves up to 20x compression with minimal performance loss. The follow-up, LLMLingua-2, uses a BERT-level encoder trained via data distillation from GPT-4, delivering 3-6x faster compression speeds while handling out-of-domain data more robustly.
In practical terms, light compression (2-3x) delivers roughly 80% cost reduction with less than 5% accuracy impact. More aggressive compression (5-7x) achieves 85-90% cost reduction, acceptable for many production applications.
The key distinction from format-based compression: LLMLingua works on natural language text, not just structured data. It's complementary to TOON - you can use TOON for your JSON payloads and LLMLingua for your system prompts and examples.
### Gist Tokens
A more experimental approach, Gist tokens involve fine-tuning a language model to compress entire prompts into a small set of special tokens that encode the same semantic information. Research on LLaMA-7B demonstrated up to 26x compression of prompts with minimal quality loss, plus a 40% reduction in compute (FLOPs).
The downside is that Gist tokens require fine-tuning the specific model performing inference, which makes them impractical for API-based LLM usage where you don't control the model weights.
### Selective Context
SelectiveContext evaluates token importance using information entropy from a causal language model, then drops low-information tokens. It's similar in spirit to LLMLingua but uses a different selection strategy. The approach works well for removing redundancy in long context windows.
### Where Format Compression Fits
These techniques form a spectrum of options:
- Format compression (TOON, TRON) - Lossless, works on structured data, zero accuracy impact, no model dependency
- Extractive compression (LLMLingua, SelectiveContext) - Near-lossless, works on natural language, minimal accuracy impact, requires a small helper model
- Learned compression (Gist tokens, AutoCompressor) - High compression ratios, requires model fine-tuning, not usable with API-based LLMs
For most production applications, format compression is the lowest-hanging fruit: zero risk, zero accuracy loss, immediate cost savings.
## Developer Tools and Implementation
Adopting TOON doesn't require rewriting your application. Libraries exist for every major language, and the conversion is a one-line operation.
### JavaScript / TypeScript
```javascript
import { encode } from '@toon-format/toon';

const data = {
  users: [
    { name: "Alice", age: 30, email: "alice@example.com" },
    { name: "Bob", age: 25, email: "bob@example.com" }
  ]
};

const toonText = encode(data);
// Output:
// users[2]{name,age,email}:
//   Alice,30,alice@example.com
//   Bob,25,bob@example.com
```
### Python
```python
from toon import encode

data = {
    "users": [
        {"name": "Alice", "age": 30, "email": "alice@example.com"},
        {"name": "Bob", "age": 25, "email": "bob@example.com"}
    ]
}

toon_text = encode(data)
# Same compact output as above
```
Libraries are also available for Go, Rust, .NET, and other languages. The encoding is deterministic and lossless - `decode(encode(data))` always returns the original data structure.
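Decoding the tabular form is just as mechanical as encoding it. The sketch below is a toy, not the official decoder - it assumes unquoted values and a single flat array, and it keeps every value as a string (the real libraries also restore types):

```python
import re

# Toy decoder for TOON's tabular form (single flat array, unquoted values).
def decode_uniform_array(text):
    header, *rows = text.strip().splitlines()
    m = re.match(r"(\w+)\[(\d+)\]\{([^}]*)\}:", header)
    name, count, fields = m.group(1), int(m.group(2)), m.group(3).split(",")
    records = [dict(zip(fields, line.strip().split(","))) for line in rows]
    assert len(records) == count, "declared row count should match the data"
    return {name: records}

toon = """users[2]{name,age,email}:
  Alice,30,alice@example.com
  Bob,25,bob@example.com"""
print(decode_uniform_array(toon))
```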
### When to Convert and When Not To
TOON's biggest gains come from uniform arrays of objects - the exact data shape that appears in RAG retrieval results, database query outputs, API responses, and tool call results. If your prompt embeds a list of products, search results, customer records, or log entries, TOON will deliver substantial savings.
For single nested objects (like a configuration block or a user profile), the savings are smaller since there's no array repetition to eliminate. TOON still helps by removing quotes and braces, but the percentage improvement is modest.
Don't convert data that isn't going into an LLM prompt. TOON is specifically designed for the prompt-to-model pipeline. Your APIs, databases, and inter-service communication should continue using JSON, Protocol Buffers, or whatever format they already use.
## Common Pitfalls When Optimizing Prompt Formats
### Pitfall 1: Compressing Output Format Instructions
If your prompt instructs the LLM to return JSON, don't also convert those format instructions to TOON. The model needs to understand what output format you expect, and mixing TOON input with JSON output instructions can cause confusion. Keep your output schema instructions in the format you actually want the model to produce.
### Pitfall 2: Ignoring Structured Data in Favor of Natural Language Compression
Developers sometimes jump to aggressive natural language compression (pruning tokens from system prompts) while leaving massive JSON blobs untouched. Format compression is safer, lossless, and often delivers larger absolute token savings than pruning a few words from instructions.
### Pitfall 3: Assuming All Models Handle Formats Equally
While TOON improves accuracy on average, smaller or older models may struggle with unfamiliar formats. Always benchmark your specific model and use case. The published benchmarks cover GPT-5 Nano, Claude Haiku, Gemini Flash, and Grok - if you're using a different model, run your own accuracy tests before deploying.
### Pitfall 4: Forgetting About Streaming and Caching
If your application caches LLM responses keyed by prompt hash, switching to TOON changes your cache keys. Plan for a cache warm-up period. For streaming responses, TOON input works identically to JSON input - the output stream is unaffected by the input format.
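The cache-key effect is easy to demonstrate: the two prompts below carry identical information, but because the bytes differ, they hash to different keys and every previously cached entry misses once:

```python
import hashlib
import json

# Cache keys are typically a hash of the full prompt text.
def cache_key(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

data = {"users": [{"name": "Alice", "age": 30}]}
json_prompt = "Summarize: " + json.dumps(data)
toon_prompt = "Summarize: users[1]{name,age}:\n  Alice,30"

# Same information, different bytes - the keys diverge.
print(cache_key(json_prompt) == cache_key(toon_prompt))  # → False
```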
## Built-In TOON Support With LockLLM
LockLLM's prompt compression feature includes native TOON support as a free, automatic compression option. Instead of adding a library dependency and converting data yourself, you can let LockLLM handle the conversion at the proxy layer.
To enable it, add a single header to your API request:
```bash
curl -X POST https://api.lockllm.com/v1/proxy/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-LockLLM-Compression: toon" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "{\"users\":[{\"name\":\"Alice\",\"age\":30}]}"}
    ]
  }'
```
LockLLM detects JSON content in your prompt, converts it to TOON, and forwards the compressed version to the model. The response includes compression metadata so you can track exactly how many tokens you saved. Since TOON is lossless, the model's answers remain identical - you're just paying for fewer tokens to deliver the same information.
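For reference, the same request can be assembled in Python. This mirrors the curl example above without sending anything - drop the dicts into whatever HTTP client your application already uses:

```python
import json

# Headers and payload matching the curl example; nothing is sent here.
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
    "X-LockLLM-Compression": "toon",  # opt in to TOON conversion at the proxy
}
payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {"role": "user",
         "content": json.dumps({"users": [{"name": "Alice", "age": 30}]})},
    ],
}
body = json.dumps(payload)
```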
For even deeper compression on mixed prompts (JSON data combined with natural language), LockLLM also offers ML-based compact compression and a combined mode that applies TOON first, then runs compact compression on the result. Check our guide to reducing AI costs for more strategies beyond format optimization.
## Key Takeaways
- TOON saves 30-60% of tokens on structured data vs JSON, with benchmarks showing up to 75% reduction on large datasets. A 500-row table drops from 11,842 tokens (JSON) to 4,617 tokens (TOON).
- Accuracy improves, not just cost: TOON scored 73.9% accuracy on structured extraction tasks vs JSON's 69.7% across four major model families. GPT-5 Nano hit 99.4% accuracy with TOON vs 92.5% with JSON.
- Choose the right format for your data: TOON excels at uniform arrays. CSV beats it for pure flat tables. TRON is best when you need JSON compatibility. YAML works for human-edited configs.
- Format compression complements ML compression: Use TOON for structured data and techniques like LLMLingua for natural language portions. They address different parts of the prompt.
- Production cost impact is real: At 1,000 daily requests, switching from JSON to TOON can save over $12,000 annually on a single pipeline - before accounting for the latency improvements from shorter prompts.
- Integration is minimal: One-line library calls in JavaScript, Python, Go, and Rust. Or use LockLLM's built-in TOON compression with a single HTTP header.
The gap between JSON's token cost and what's actually necessary for LLM comprehension is large and, for most production applications, entirely avoidable. Whether you adopt TOON directly through its open-source libraries or let LockLLM's proxy handle the conversion automatically, the savings are immediate and the accuracy trade-off is nonexistent - or even positive. For teams running structured data through LLM pipelines at scale, optimizing your data format is one of the highest-ROI changes you can make today.