
Quanta is an OpenAI-compatible gateway. Point your base URL at it, keep using your own provider key, and every request is compressed on the way to the model. No SDK changes, no rewrites.
Use your existing OpenAI client. Set the base URL to the gateway and keep your own key.
from openai import OpenAI

client = OpenAI(
    base_url="https://quanta.ryshe.com/api/v1",
    api_key="YOUR_PROVIDER_KEY",  # your own key, never stored
)
resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[...],
)

Every response carries the savings on that call in its headers.
x-quanta-tokens-before: 8410
x-quanta-tokens-after: 3120
x-quanta-saved-pct: 62.9
x-quanta-mode: inline
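To act on these numbers in code, the headers can be parsed into a small record. The following is a minimal sketch, not part of any Quanta client: the `Savings` class and `parse_savings` function are hypothetical names introduced here, and the only header names used are the four shown above. It also recomputes the percentage from the two token counts as a sanity check.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Savings:
    """Hypothetical container for the four savings headers."""
    tokens_before: int
    tokens_after: int
    saved_pct: float
    mode: str


def parse_savings(headers: Mapping[str, str]) -> Savings:
    """Extract the x-quanta-* savings headers from a response header mapping."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return Savings(
        tokens_before=int(lowered["x-quanta-tokens-before"]),
        tokens_after=int(lowered["x-quanta-tokens-after"]),
        saved_pct=float(lowered["x-quanta-saved-pct"]),
        mode=lowered["x-quanta-mode"],
    )


# The sample values from the headers shown above.
example = parse_savings({
    "x-quanta-tokens-before": "8410",
    "x-quanta-tokens-after": "3120",
    "x-quanta-saved-pct": "62.9",
    "x-quanta-mode": "inline",
})

# Recompute the percentage from the token counts to confirm it matches.
recomputed_pct = round(
    100 * (example.tokens_before - example.tokens_after) / example.tokens_before, 1
)
print(example, recomputed_pct)
```

With recent versions of the official `openai` Python package, one way to obtain the headers is `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and a `.parse()` method that returns the usual completion object. Passing those headers to `parse_savings` is an assumption about how you would wire this up, not a documented Quanta workflow.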
This dry-run call returns the compression report without forwarding to a model, so you can see it work right away. Paste it into a terminal.
curl -s https://quanta.ryshe.com/api/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model":"gpt-5.5","messages":[
    {"role":"system","content":"You are helpful.\n\n\nYou are helpful.\nBe concise."},
    {"role":"user","content":"Summarize this."}
  ]}'

Change one line in any OpenAI-compatible SDK. Nothing else changes.
Run the gateway next to your app and route traffic through it. The engine layer is built on open standards.
Compress tool outputs and context for agent frameworks over MCP.
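If you would rather issue the dry-run call from Python than from a terminal, the curl command shown earlier can be mirrored with the standard library alone. This is a minimal sketch: `build_dry_run_request` and `send` are hypothetical helper names, the URL, model, and messages are copied from the curl example, and the format of the compression report is not assumed, so the response body is returned as raw text.

```python
import json
import urllib.request

# URL taken from the curl example; nothing else about the API is assumed.
GATEWAY_URL = "https://quanta.ryshe.com/api/v1/chat/completions"


def build_dry_run_request(messages, model="gpt-5.5"):
    """Build (but do not send) the same POST request the curl example issues."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Same payload as the curl example: a system prompt with a repeated line.
messages = [
    {"role": "system", "content": "You are helpful.\n\n\nYou are helpful.\nBe concise."},
    {"role": "user", "content": "Summarize this."},
]
request = build_dry_run_request(messages)
```

Calling `print(send(request))` performs the actual network call. Building the request separately from sending it keeps the payload easy to inspect before anything leaves your machine.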