What your AI client sees when it's close to a limit.
Reference for developers integrating the Ansvar Gateway. Rate limits are abuse protection, not a product gate — they're sized so legitimate workloads never approach them. Every tier has a daily search quota; Free is set apart by its size (100 calls per day) and its single-source scope. Every limit fails with the same machine-readable JSON-RPC error.
The live enforcement contract
Four tiers, three mechanisms: a per-user token bucket on request rate, a per-tier concurrency cap, and a daily search budget on every tier — per seat on Free and Premium, pooled per organisation on Team and Company. These are the numbers production enforces today.
| Tier | Daily quota | Concurrency | Sustained rate | Burst capacity |
|---|---|---|---|---|
| Free | 100 search calls / day | 1 | 60 rpm (≈ 1 req / s) | 100 tokens |
| Premium | 5,000 search calls / day per seat | 4 | 600 rpm (≈ 10 req / s) | 1,000 tokens |
| Team | 50,000 search calls / day, pooled per organisation | 16 | 1,200 rpm (≈ 20 req / s) | 2,000 tokens |
| Company | 500,000 search calls / day, pooled per organisation | 64 | 2,400 rpm (≈ 40 req / s) | 4,000 tokens |
Hosted onboarding. Regulated teams on guided onboarding are provisioned at Team tier in the authentication layer, so they get Team-tier capabilities and Team-tier limits — including structured workflows.
Burst behaviour. Every tier's bucket holds roughly 100 seconds of full-throttle calls before refill matters (100 tokens at 1 req/s, 1,000 at 10 req/s, 2,000 at 20 req/s, 4,000 at 40 req/s). Once empty, refill happens continuously at the sustained rate.
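The burst arithmetic above can be modelled on the client side. The following is a minimal sketch of a token bucket using the capacities and sustained rates from the table; `TokenBucket`, `TIERS`, and `burst_window_seconds` are illustrative names rather than gateway APIs, and the server-side limiter may be implemented differently.

```python
from dataclasses import dataclass

# (burst capacity in tokens, sustained refill rate in requests per second),
# copied from the tier table above.
TIERS = {
    "free": (100, 1.0),
    "premium": (1_000, 10.0),
    "team": (2_000, 20.0),
    "company": (4_000, 40.0),
}


@dataclass
class TokenBucket:
    """Illustrative client-side model of a token bucket (hypothetical)."""
    capacity: float
    refill_per_sec: float
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self):
        self.tokens = self.capacity  # the bucket starts full

    def try_acquire(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def burst_window_seconds(tier: str) -> float:
    """Seconds of sustained-rate traffic that one full bucket represents."""
    capacity, rate = TIERS[tier]
    return capacity / rate
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the model deterministic, which makes it easy to simulate a planned workload before running it against the gateway.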
Three limits, one error shape
Every limit on this page fails the same way: the tool call returns a JSON-RPC error with code -32000 and the machine-readable string cap_exceeded. The failure arrives inside the MCP session — there is no HTTP 429 on this path.
100 search calls per day on Free and 5,000 on Premium, each per seat; 50,000 on Team and 500,000 on Company, pooled across the organisation. On the pooled tiers every seat draws from one shared counter, and a refusal says so. Paid budgets are abuse ceilings sized well above legitimate workloads, not product gates.
On all tiers, burst capacity is generous (≈ 100 seconds of full-throttle calls), so well-behaved AI clients rarely see cap_exceeded. If yours is hitting a limit, the cause is almost always parallel fan-out without coordination: serialise calls or reduce concurrency.
Two machine-readable failures
Limits and tier gating surface as JSON-RPC errors, not as HTTP status codes. Match on the error code and the cap_exceeded string — never on the human-readable message text.
-32000 · cap_exceeded (Caps & quotas)
Any limit on this page: the rate limiter, a daily search budget, or the concurrency cap. The error object carries JSON-RPC code -32000 with the machine-readable code "cap_exceeded". Back off and retry; a daily quota won't clear until the window resets.
-32601 · Method not found (Tier-hidden tools)
The tool exists, but not on your tier: for example search_case_law on Free, or workflow tools on Premium. Tier-hidden tools are also absent from tools/list, so a well-behaved AI client never calls them. Retrying never helps.
cap_exceeded is the machine-readable signal; pattern-match this rather than parsing the message. If you see -32601, check tools/list: the gateway only advertises the tools your tier can actually call.
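As a rough illustration of matching on the code and the cap_exceeded token, the sketch below classifies an error object. It assumes the object follows the standard JSON-RPC 2.0 layout (`code`, `message`, optional `data`); which field the gateway puts "cap_exceeded" in is not stated on this page, so the function searches the whole object for the token. `classify_error` and its return labels are hypothetical names.

```python
import json

CAP_EXCEEDED = "cap_exceeded"


def classify_error(error: dict) -> str:
    """Map a JSON-RPC error object to 'retry', 'not_on_tier', or 'fatal'.

    Assumption: the exact field carrying "cap_exceeded" is unknown here,
    so the serialised object is searched for the token instead of one field.
    """
    code = error.get("code")
    if code == -32601:
        # Tool is hidden on this tier; waiting will not change that.
        return "not_on_tier"
    if code == -32000 and CAP_EXCEEDED in json.dumps(error):
        # Rate limiter, concurrency cap, or daily quota.
        return "retry"
    return "fatal"
```

If your SDK exposes the machine-readable code in a dedicated field, comparing that field directly is stricter than the substring search used here.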
What well-behaved client code looks like
Back off on cap_exceeded with bounded retries so a transient cap doesn't become a retry storm. Never retry -32601 — the tool isn't on your tier, and no amount of waiting changes that.
```python
import time

MAX_RETRIES = 3
CAP_EXCEEDED = "cap_exceeded"


def call_with_backoff(call_tool, name: str, arguments: dict):
    """call_tool is your MCP client's tool-call function.

    JSON-RPC errors surface differently per SDK (return value
    or exception); normalise to (code, text, result) first."""
    for attempt in range(MAX_RETRIES):
        code, text, result = call_tool(name, arguments)
        if code is None:
            return result
        if code == -32601:
            # Tool is not on your tier. Retrying never helps.
            raise PermissionError(f"{name} is not available on this tier")
        if code == -32000 and CAP_EXCEEDED in text:
            # Rate limiter, concurrency cap, or a daily quota.
            # A daily quota will not clear until the window resets,
            # so keep retries bounded.
            time.sleep(2 ** attempt)
            continue
        raise RuntimeError(f"Gateway error {code}: {text}")
    raise RuntimeError(f"{CAP_EXCEEDED} after {MAX_RETRIES} retries")
```
If your AI client hits cap_exceeded frequently, the right move is usually to reduce parallel fan-out (run searches sequentially within a workflow rather than firing all jurisdictions at once). If the sustained workload genuinely needs more capacity than your tier provides, talk to us about the Company tier.
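One way to reduce fan-out without fully serialising is to gate calls behind a semaphore sized to your tier's concurrency cap. The sketch below assumes an asyncio-based client; `search` is a hypothetical async callable wrapping your MCP tool call, and `bounded_fan_out` is an illustrative helper, not part of any SDK.

```python
import asyncio

# Concurrency caps from the tier table; pick the one matching your tier.
CONCURRENCY = {"free": 1, "premium": 4, "team": 16, "company": 64}


async def bounded_fan_out(search, jurisdictions, tier: str = "premium"):
    """Run search(j) for each jurisdiction without exceeding the tier's cap.

    `search` is a hypothetical async callable wrapping your MCP tool call.
    """
    gate = asyncio.Semaphore(CONCURRENCY[tier])

    async def one(jurisdiction):
        async with gate:  # wait locally instead of tripping the concurrency cap
            return await search(jurisdiction)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(one(j) for j in jurisdictions))
```

On Free the cap of 1 makes this equivalent to running the searches sequentially. The semaphore only bounds concurrency; the rate limiter and daily budget still apply to the total number of calls.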
Pick a tier without a sales call
A typical workflow run fans out to 10–50 gateway calls. Both rate and concurrency bind: parallel fan-out counts against your tier's concurrency cap before the rate limiter matters.
Counts are flat: every successful gateway tool call is one unit, regardless of how many downstream MCPs the call fans out to. On Premium and above, a search that hits 30 MCPs counts the same as a single get_provision. Free-tier search is single-source, and each call counts one of the 100 per day.
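For a first estimate, the daily quotas can be checked against expected load with simple arithmetic. The sketch below is a rough sizing aid under stated assumptions: it treats every gateway call in a workflow run as counting against the daily budget (a worst case), and it ignores tool availability, Free's single-source scope, and the rate and concurrency limits. `smallest_tier` and its parameters are hypothetical names.

```python
# Daily search quotas from the tier table. Free and Premium are per seat;
# Team and Company are pooled across the organisation.
DAILY_QUOTA = {"free": 100, "premium": 5_000, "team": 50_000, "company": 500_000}
PER_SEAT = {"free", "premium"}


def smallest_tier(runs_per_seat_per_day: int, calls_per_run: int, seats: int = 1):
    """Return the first tier whose daily quota covers the estimate, or None.

    Assumption: every call in a run counts against the daily budget.
    """
    per_seat = runs_per_seat_per_day * calls_per_run
    for tier, quota in DAILY_QUOTA.items():  # dicts keep insertion order
        # Per-seat tiers compare one seat's load; pooled tiers compare the total.
        demand = per_seat if tier in PER_SEAT else per_seat * seats
        if demand <= quota:
            return tier
    return None
```

For example, 20 runs a day at 50 calls each is 1,000 calls per seat, which fits Premium's per-seat budget; ten seats each doing 200 such runs is 100,000 pooled calls, which exceeds Team's budget and lands on Company.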
Shared-bucket enforcement
The open enforcement item is the per-worker loophole: counters are currently kept per process, so the nominal per-user numbers are not yet exact. Closing it means moving the rate limiter, daily quota, and concurrency counters to a shared, Redis-backed store. This page tracks the live contract; when enforcement changes, the table above changes with it.
If your workload is bumping against the current limits, get in touch — Company-tier capacity is contracted, and we'd rather size it with you than have you engineer around a cap.
Need different limits?
Compare tiers on the pricing page, or talk to us about a Company contract with custom capacity.