Get started

Cut token usage in one line.

Quanta is an OpenAI-compatible gateway. Point your base URL at it, keep using your own provider key, and every request is compressed on the way to the model. No SDK changes, no rewrites.

1. Change your base URL

Use your existing OpenAI client. Set the base URL to the gateway and keep your own key.

from openai import OpenAI

client = OpenAI(
    base_url="https://quanta.ryshe.com/api/v1",
    api_key="YOUR_PROVIDER_KEY",   # your own key, never stored
)

resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[...],
)

Python · the only change is base_url

2. Read the savings

Every response reports the savings for that call in its headers.

x-quanta-tokens-before: 8410
x-quanta-tokens-after:  3120
x-quanta-saved-pct:     62.9
x-quanta-mode:          inline

Response headers added by the gateway
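To use these numbers in code rather than read them by eye, a small parser helps. The following is a minimal sketch, not part of any Quanta SDK: the `Savings` class and both helper functions are hypothetical names. It assumes the header values are plain integers and decimals as in the sample above, and that the percentage is (before - after) / before, which matches the sample but is not stated by the gateway.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Savings:
    """Hypothetical container for the x-quanta-* response headers."""
    tokens_before: int
    tokens_after: int
    saved_pct: float
    mode: str


def parse_savings(headers: Mapping[str, str]) -> Savings:
    """Extract the savings headers from any str-to-str header mapping."""
    # HTTP header names are case-insensitive, so normalise keys first.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    return Savings(
        tokens_before=int(h["x-quanta-tokens-before"]),
        tokens_after=int(h["x-quanta-tokens-after"]),
        saved_pct=float(h["x-quanta-saved-pct"]),
        mode=h["x-quanta-mode"],
    )


def computed_saved_pct(before: int, after: int) -> float:
    """Assumed formula: share of tokens removed, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


# Sample values copied from the headers shown above.
sample_headers = {
    "x-quanta-tokens-before": "8410",
    "x-quanta-tokens-after": "3120",
    "x-quanta-saved-pct": "62.9",
    "x-quanta-mode": "inline",
}
savings = parse_savings(sample_headers)
print(savings.tokens_before - savings.tokens_after, "tokens saved")
```

With the OpenAI Python SDK, one way to get at the headers is `client.chat.completions.with_raw_response.create(...)`: the returned object exposes `.headers`, which can be passed to `parse_savings`, and `.parse()` gives back the usual completion.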

Try it now, no key required

This dry-run call returns the compression report without forwarding to a model, so you can see it work right away. Paste it into a terminal.

curl -s https://quanta.ryshe.com/api/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model":"gpt-5.5","messages":[
    {"role":"system","content":"You are helpful.\n\n\nYou are helpful.\nBe concise."},
    {"role":"user","content":"Summarize this."}
  ]}'

Dry-run · returns the measured compression report
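The same dry-run can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above. The function names are illustrative, and because the shape of the compression report is not documented here, the body is returned as raw text rather than parsed into fields.

```python
import json
import urllib.request

GATEWAY_URL = "https://quanta.ryshe.com/api/v1/chat/completions"


def build_dry_run_request() -> urllib.request.Request:
    """Build the same POST request as the curl example (no key attached)."""
    payload = {
        "model": "gpt-5.5",
        "messages": [
            # The repeated sentence and blank lines give the compressor
            # something obvious to remove.
            {
                "role": "system",
                "content": "You are helpful.\n\n\nYou are helpful.\nBe concise.",
            },
            {"role": "user", "content": "Summarize this."},
        ],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> tuple[dict, str]:
    """Send the request; return (response headers, raw response body)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return dict(resp.headers.items()), resp.read().decode("utf-8")


# Usage (performs a network call):
#   headers, body = send(build_dry_run_request())
#   print(body)
```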

Three ways to adopt

Live

Drop-in base URL

Change one line in any OpenAI-compatible SDK. Nothing else changes.

Self-host

Local proxy

Run the gateway next to your app and route traffic through it. The engine layer is built on open standards.

Coming soon

MCP server

Compress tool outputs and context for agent frameworks over MCP.
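Since the drop-in and self-hosted options differ only in where the base URL points, one way to switch between them is to read the URL from the environment. This is a sketch under assumptions: `QUANTA_BASE_URL` is a hypothetical variable name chosen for this example, not something the gateway reads, and the localhost address in the usage note is a placeholder, not a documented default.

```python
import os
from typing import Mapping, Optional

HOSTED_BASE_URL = "https://quanta.ryshe.com/api/v1"


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the gateway base URL, preferring an override from the environment.

    QUANTA_BASE_URL is a hypothetical variable name used only in this sketch.
    """
    env = os.environ if env is None else env
    override = env.get("QUANTA_BASE_URL", "").strip()
    # Fall back to the hosted gateway; drop any trailing slash so the SDK
    # can append paths cleanly.
    return (override or HOSTED_BASE_URL).rstrip("/")


print(resolve_base_url({}))
```

The result can be passed as `base_url` to the OpenAI client. Setting the variable to a placeholder such as `http://localhost:8080/api/v1` would route the same code through a local proxy.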

Accounts, usage metering, and a savings dashboard are in private beta. For now the gateway works with your own provider key and reports savings in headers. The compression engine is built to wrap open-standards and open-source components, which are credited in our notices.
