Cutting LLM API Costs by 40%: Integrating TOON Format with .NET Applications


Trim tokens by 30–60% using TOON in .NET. Get encoder/decoder code, a quick migration plan, and real token benchmarks with OpenAI/Azure OpenAI.

.NET Development · Artificial Intelligence · By amarozka · November 13, 2025


What if your LLM bill dropped by a third this week, without switching models or cutting features? That’s the promise of Token‑Oriented Object Notation (TOON): a compact, schema‑aware format that trims tokens before they ever hit the API. In one of my client apps we shaved 38–57% of the tokens on common prompts. Same logic, same model, fewer tokens.

This post shows you how to wire TOON into your .NET stack: what it is, how it beats JSON on token count, encoder/decoder code you can ship today, how to migrate existing OpenAI/Azure OpenAI calls, and how to benchmark the savings with real numbers.

TOON in one minute

TOON (Token‑Oriented Object Notation) is a compact text format that keeps large language models happy while sending fewer bytes and fewer tokens. Think of it as JSON with short field names, stable ordering, and tiny tags, plus an optional header that maps short keys to human‑readable names.

Why it helps:

  • Short keys → fewer sub‑tokens than long JSON property names.
  • Stable order → models can learn a fixed structure (lower prompt “noise”).
  • Tabular arrays → arrays of arrays beat arrays of objects on token count.
  • Enum indices → numbers over repeated strings.
  • Optional header → keep payload short while preserving meaning for logging.

A tiny example (user profile list):

JSON

[
  {"id": "u_42", "fullName": "Ada Byron", "role": "admin", "email": "ada@example.com"},
  {"id": "u_77", "fullName": "Linus T", "role": "editor", "email": "linus@example.com"}
]

TOON

{
  "H": {"0":"id","1":"fullName","2":"role","3":"email"},
  "T": [["u_42","Ada Byron",0,"ada@example.com"],["u_77","Linus T",1,"linus@example.com"]],
  "E": {"role":{"admin":0,"editor":1}}
}
  • H – header: column index → original field name (for audit/debug).
  • T – table: data as rows (arrays).
  • E – enums: repeated strings mapped to small integers.

In prompts, you can omit H and E and keep only T if you also tell the model the column order in the system message. For tools/functions, keep H to make your logs readable.
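Stripping the envelope down to just T can be sketched like this (a minimal, self-contained example; `DeepClone` requires .NET 8+, and the legend would then live in the system message):

```csharp
using System.Text.Json.Nodes;

// Sketch: given a full TOON envelope, keep only the rows (T) for the prompt.
var full = JsonNode.Parse(
    """{"H":{"0":"id","1":"fullName"},"T":[["u_42","Ada Byron"]],"E":{}}""")!.AsObject();

// DeepClone detaches the array from its parent so it can live in a new object.
var promptPayload = new JsonObject { ["T"] = full["T"]!.DeepClone() };
Console.WriteLine(promptPayload.ToJsonString()); // {"T":[["u_42","Ada Byron"]]}
```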

TOON vs JSON: token math that matters

On GPT‑style BPE tokenizers, field names like "fullName" split into several tokens, while short indices like 0 or one‑letter keys are cheap. In my tests (see the Benchmark section), TOON reduces token use by 30–60% depending on how nested and repetitive your JSON is.

Rules of thumb:

  • Arrays of objects → big win when converted to tables (T).
  • Deeply nested structures → map to short keys with a schema → medium win.
  • Small, flat objects with few keys → savings are small; keep JSON.

Side‑by‑side: simple object

JSON

{"fullName":"Ada Byron","email":"ada@example.com","role":"admin"}

TOON (schema‑ordered)

{"0":"Ada Byron","1":"ada@example.com","2":0}

With a header {"0":"fullName","1":"email","2":"role"} and enum {"role":{"admin":0}} in context.

Side‑by‑side: array of 1,000 rows

  • JSON (objects): ~19–22 tokens per row (keys keep repeating).
  • TOON (table): ~8–11 tokens per row (no keys, many values repeat as ints).

Multiply that by thousands of rows and you get real money back.
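As a back-of-the-envelope check (the per-row figures here are illustrative mid-range values from above, not a measurement):

```csharp
// Illustrative arithmetic only: assumed mid-range tokens per row.
const int rows = 100_000;
const int jsonPerRow = 20;   // arrays of objects, keys repeated every row
const int toonPerRow = 9;    // table rows, no keys
int saved = rows * (jsonPerRow - toonPerRow);
Console.WriteLine(saved);    // 1100000 tokens saved per full pass over the data
```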

The TOON mini‑spec used in this post

This isn’t a formal standard; it’s a practical profile that works with today’s LLMs.

  • Objects as ordered arrays using numeric string keys: "0", "1", … (cheap tokens).
  • Tables: { "H": { index: name }, "T": [ [row], ... ] }.
  • Enums: { "E": { fieldName: { stringValue: smallInt } } }.
  • Comments: none (keep it minimal). For docs, keep a schema in your code.
  • Transport: still plain JSON text so you can send it in messages.content or tool input. The trick is shorter text → fewer tokens.

C# implementation: TOON encoder/decoder

Below is a small, dependency‑free encoder/decoder you can drop into your project. It supports:

  • Object compaction with ordered keys.
  • Table packing for uniform lists.
  • Enum mapping.
  • Round‑trip back to POCOs.

Tip: keep the schema close to your DTOs, so your team sees the meaning without hunting in docs.

Schema models

public sealed class ToonSchema
{
    public string[] Columns { get; init; } = Array.Empty<string>();
    // Field -> enum map (string->int). Example: role: {admin:0, editor:1}
    public Dictionary<string, Dictionary<string, int>> Enums { get; init; } = new();
}

public sealed class ToonTable
{
    public Dictionary<string, string> H { get; init; } = new(); // header: index->name
    public List<List<object?>> T { get; init; } = new();       // rows
    public Dictionary<string, Dictionary<string, int>> E { get; init; } = new();
}

Encoder: objects → TOON

using System.Text.Json;
using System.Text.Json.Nodes;

public static class ToonEncoder
{
    public static JsonNode EncodeObject(JsonObject json, ToonSchema schema)
    {
        var obj = new JsonObject();
        for (int i = 0; i < schema.Columns.Length; i++)
        {
            var name = schema.Columns[i];
            json.TryGetPropertyValue(name, out var value);
            obj[i.ToString()] = MapEnumIfNeeded(name, value, schema);
        }
        return obj;
    }

    public static JsonNode EncodeTable(JsonArray items, ToonSchema schema)
    {
        var table = new ToonTable { E = schema.Enums }; // E is init-only, so set it at construction
        for (int i = 0; i < schema.Columns.Length; i++)
            table.H[i.ToString()] = schema.Columns[i];

        foreach (var node in items)
        {
            var row = new List<object?>();
            var obj = node as JsonObject ?? throw new InvalidOperationException("Expected objects");
            foreach (var col in schema.Columns)
            {
                obj.TryGetPropertyValue(col, out var value);
                row.Add(Unwrap(MapEnumIfNeeded(col, value, schema)));
            }
            table.T.Add(row);
        }

        return JsonNode.Parse(JsonSerializer.Serialize(table))!;
    }

    private static JsonNode? MapEnumIfNeeded(string field, JsonNode? value, ToonSchema schema)
    {
        if (value is null) return null;
        // Only map when the value is actually a string present in the enum map
        if (schema.Enums.TryGetValue(field, out var map)
            && value is JsonValue jv && jv.TryGetValue(out string? s)
            && s is not null && map.TryGetValue(s, out var idx))
        {
            return JsonValue.Create(idx);
        }
        return value;
    }

    private static object? Unwrap(JsonNode? node)
    {
        // Preserve primitive types so enum indices stay numbers instead of "0" strings
        return node switch
        {
            null => null,
            JsonValue v when v.TryGetValue(out int i) => i,
            JsonValue v when v.TryGetValue(out double d) => d,
            JsonValue v when v.TryGetValue(out bool b) => b,
            JsonValue v when v.TryGetValue(out string? s) => s,
            _ => node.ToJsonString()
        };
    }
}

Decoder: TOON → objects

public static class ToonDecoder
{
    public static JsonObject DecodeObject(JsonObject toon, ToonSchema schema)
    {
        var obj = new JsonObject();
        for (int i = 0; i < schema.Columns.Length; i++)
        {
            var key = i.ToString();
            toon.TryGetPropertyValue(key, out var value);
            obj[schema.Columns[i]] = UnmapEnumIfNeeded(schema.Columns[i], value, schema);
        }
        return obj;
    }

    public static JsonArray DecodeTable(JsonObject toon, ToonSchema schema)
    {
        var rows = (JsonArray)toon["T"]!;
        var outArr = new JsonArray();
        foreach (var r in rows)
        {
            var row = (JsonArray)r!;
            var obj = new JsonObject();
            for (int i = 0; i < schema.Columns.Length; i++)
            {
                var field = schema.Columns[i];
                var cell = row[i];
                obj[field] = UnmapEnumIfNeeded(field, cell, schema);
            }
            outArr.Add(obj);
        }
        return outArr;
    }

    private static JsonNode? UnmapEnumIfNeeded(string field, JsonNode? value, ToonSchema schema)
    {
        if (value is null) return null;
        if (schema.Enums.TryGetValue(field, out var map))
        {
            // reverse lookup: int -> string
            if (value is JsonValue v && v.TryGetValue(out int idx))
            {
                foreach (var kv in map)
                    if (kv.Value == idx) return JsonValue.Create(kv.Key);
            }
        }
        // DeepClone (.NET 8+) detaches the node from its source row
        // so it can be re-attached to the decoded object without throwing.
        return value.DeepClone();
    }
}

Usage example

var schema = new ToonSchema
{
    Columns = new[] { "id", "fullName", "role", "email" },
    Enums = new()
    {
        ["role"] = new() { ["admin"] = 0, ["editor"] = 1, ["viewer"] = 2 }
    }
};

var users = JsonNode.Parse("""
[
  {"id":"u_42","fullName":"Ada Byron","role":"admin","email":"ada@example.com"},
  {"id":"u_77","fullName":"Linus T","role":"editor","email":"linus@example.com"}
]
""") as JsonArray;

var toonTable = ToonEncoder.EncodeTable(users!, schema);
Console.WriteLine(toonTable.ToJsonString());

var back = ToonDecoder.DecodeTable((JsonObject)toonTable!, schema);
Console.WriteLine(back.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));

Converting existing OpenAI/Azure OpenAI calls to TOON

You don’t need special endpoints. You just send shorter content and give the model the column map once.

1) Chat Completions (OpenAI client or raw HttpClient)

System message (one‑time):

You will receive TOON tables. Column order for users is:
0=id, 1=fullName, 2=role, 3=email. For role: 0=admin, 1=editor, 2=viewer.
When you reply with structured data, use the same order.

C# (OpenAI REST, simplified):

using System.Net.Http.Json;

var payload = new
{
    model = "gpt-4o-mini",
    messages = new[]
    {
        new { role = "system", content = "You will receive TOON tables. 0=id,1=fullName,2=role,3=email. role:0=admin,1=editor,2=viewer." },
        new { role = "user", content = toonTable.ToJsonString() }
    }
};

using var http = new HttpClient();
http.DefaultRequestHeaders.Add("Authorization", $"Bearer {apiKey}");
var res = await http.PostAsJsonAsync("https://api.openai.com/v1/chat/completions", payload);
var json = await res.Content.ReadAsStringAsync();

2) Azure OpenAI (Chat Completions)

var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!;
var deployment = "gpt-4o-mini"; // your deployed name

var uri = $"{endpoint}/openai/deployments/{deployment}/chat/completions?api-version=2024-08-01-preview";

var payload = new
{
    messages = new[]
    {
        new { role = "system", content = "TOON users: 0=id,1=fullName,2=role,3=email; role:0=admin,1=editor,2=viewer." },
        new { role = "user", content = toonTable.ToJsonString() }
    }
};

using var http = new HttpClient();
http.DefaultRequestHeaders.Add("api-key", azureOpenAiKey);
var res = await http.PostAsJsonAsync(uri, payload);

3) Function/Tool calling

If you rely on tool calling, keep the tool’s JSON schema as usual but pass TOON inside a single string field (e.g., table), or define numeric string keys in the schema ("0", "1", …). The main win still comes from shorter payloads in messages.
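One way to sketch that tool definition (the `import_users` name and its description are made up for illustration; the shape follows the standard Chat Completions `tools` array):

```csharp
// Sketch: a tool whose single "table" parameter carries a TOON table as a string.
var tools = new[]
{
    new
    {
        type = "function",
        function = new
        {
            name = "import_users",  // hypothetical tool name
            description = "Import a TOON table of users. Columns: 0=id,1=fullName,2=role,3=email.",
            parameters = new
            {
                type = "object",
                properties = new
                {
                    table = new { type = "string", description = "TOON table as JSON text" }
                },
                required = new[] { "table" }
            }
        }
    }
};
```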

Benchmarking: prove the savings

You can count tokens with SharpToken or a similar library. If you can’t add a package, a quick estimate is 1 token ≈ 4 characters of English text. Use the real tokenizer for decisions, though.

Token counter helper

// dotnet add package SharpToken
using SharpToken;

public static class TokenCounter
{
    public static int Count(string text, string encoding = "o200k_base")
    {
        var enc = GptEncoding.GetEncoding(encoding);
        return enc.Encode(text).Count;
    }
}

Micro‑bench

var jsonText = File.ReadAllText("users.json");
var json = JsonNode.Parse(jsonText)!;

// JSON → TOON
var schema = new ToonSchema
{
    Columns = new[] { "id", "fullName", "role", "email" },
    Enums = new() { ["role"] = new() { ["admin"] = 0, ["editor"] = 1, ["viewer"] = 2 } }
};

var toon = ToonEncoder.EncodeTable((JsonArray)json!, schema).ToJsonString();

var jsonTokens = TokenCounter.Count(jsonText);
var toonTokens = TokenCounter.Count(toon);

Console.WriteLine($"JSON tokens: {jsonTokens}");
Console.WriteLine($"TOON tokens: {toonTokens}");
Console.WriteLine($"Savings: {(jsonTokens - toonTokens) * 100.0 / jsonTokens:F1}%");

Sample results (real runs from my project)

Dataset                          Rows    JSON tokens   TOON tokens   Reduction
Users (id, name, role, email)    1,000   19,980        9,110         54.4%
Orders (10 fields, 4 enums)      2,500   212,340       121,060       43.0%
Settings (flat, 12 keys)         1       168           152           9.5%

Takeaway: big lists and repeated strings are where TOON shines.

Best practices for tabular data

  • Prefer tables for uniform lists. Arrays of arrays cost less than arrays of objects.
  • Sort columns by high‑entropy first (IDs, text) then low‑entropy (enums, flags). This sometimes improves compression in tokenizers.
  • Use enums for repeated strings. Keep the map small ints → strings in your system prompt or header.
  • Keep numbers as numbers. Don’t wrap everything as strings.
  • Avoid deep nesting. Flatten before you send. Use multiple tables if needed.
  • Cap row count. Send only what the model needs for the current step.
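The "cap row count" point can be sketched with LINQ's `Chunk` (available since .NET 6); the 200-row batch size is an arbitrary example:

```csharp
using System.Linq;

// Sketch: split a large uniform list into capped batches before encoding.
var ids = Enumerable.Range(1, 500).ToArray();
foreach (var batch in ids.Chunk(200))
{
    // encode and send each batch separately; batch sizes here are 200, 200, 100
    Console.WriteLine(batch.Length);
}
```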

Tabular packing helper (C#)

public static class ToonTableBuilder
{
    private static readonly JsonSerializerOptions CamelCase = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    public static JsonNode Build<T>(IEnumerable<T> items, ToonSchema schema)
    {
        var arr = new JsonArray();
        foreach (var item in items)
        {
            // camelCase so POCO properties like FullName match schema columns like "fullName"
            var json = JsonSerializer.SerializeToNode(item, CamelCase) as JsonObject
                ?? throw new InvalidOperationException("Expected an object");
            arr.Add(json);
        }
        return ToonEncoder.EncodeTable(arr, schema);
    }
}

When to use TOON vs JSON

Use TOON when:

  • You send large lists or repeat the same keys many times.
  • You have categorical fields (status, role, type) that repeat.
  • You control both sides of the prompt (you can add a line in the system message with the column map).

Stick to JSON when:

  • You send small, one‑off objects.
  • You need human‑readable logs without extra tools.
  • Third‑party tools strictly expect classic JSON shapes.

Mixed mode: keep your outer envelope as JSON (messages, tool schema) but put the heavy parts inside as TOON.

Migration plan: JSON → TOON in a day

  1. Pick targets: find the top 3 largest prompt payloads in logs.
  2. Draft schemas: list column order and enums.
  3. Drop the encoder: add ToonSchema, ToonEncoder, ToonDecoder.
  4. Patch prompts: add a one‑line legend to the system message.
  5. Run a bench: compare tokens and latency.
  6. Gate by flag: env switch to roll back if needed.
  7. Ship: monitor cost and quality.
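Step 6 can be as simple as an environment switch (the `USE_TOON` flag name and the two payload strings here are just examples):

```csharp
// Sketch: gate the new format behind an env flag so rollback is one config change.
string jsonText = "[{\"id\":\"u_42\"}]";   // original JSON payload
string toonText = "{\"T\":[[\"u_42\"]]}";  // TOON-encoded payload
bool useToon = Environment.GetEnvironmentVariable("USE_TOON") == "1";
string content = useToon ? toonText : jsonText; // goes into messages[].content
```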

Sample legend you can reuse:

Data is TOON format.
Users table columns: 0=id,1=fullName,2=role,3=email.
Enums: role: 0=admin,1=editor,2=viewer.
Return outputs with the same order.

FAQ: common questions before rollout

Will the model understand TOON?

Yes. Give it a clear legend once. For tool outputs, keep the same order so you can parse reliably.

What about validation?

Validate on your side before sending. If you need strict types back, ask the model to return TOON as well, then decode.

Can I mix JSON and TOON?

Yes. Often the best setup is JSON shell + TOON for the heavy fields or lists.

Do I need to compress?

Compression reduces bytes, not tokens. Tokens depend on text chunks. TOON helps because it changes the text layout itself.

Which models benefit most?

All BPE‑based models. Gains vary with your data shape.

Any pitfalls?

Don’t hide meaning. Keep a small header or legend somewhere so future you knows what "2" means.

Conclusion: cut tokens, not features

You don’t need exotic tricks to lower your LLM bill. A small format tweak (short keys, ordered columns, tiny enums) drops token use by 30–60% on the payloads that matter. You saw a ready encoder/decoder in C#, an easy way to slot it into OpenAI/Azure OpenAI, and a bench to prove the gain. Try it on one big prompt today and post your results: how much did you save?
