Rate limits & quotas

What your AI client sees when it hits a limit.

Reference for developers integrating the Ansvar Gateway. Rate limits are abuse protection, not a product gate: they're sized so legitimate workloads never approach them. Every tier has a daily search quota; Free is set apart by its small quota (100 search calls per day) and its single-source scope. Every limit fails with the same machine-readable JSON-RPC error.

Per-tier limits today

The live enforcement contract

Four tiers, three mechanisms: a per-user token bucket on request rate, a per-tier concurrency cap, and a daily search budget on every tier — per seat on Free and Premium, pooled per organisation on Team and Company. These are the numbers production enforces today.

Tier	Daily quota	Concurrency	Sustained rate	Burst capacity
Free	100 search calls / day	1	60 rpm (≈ 1 req / s)	100 tokens
Premium	5,000 search calls / day per seat	4	600 rpm (≈ 10 req / s)	1,000 tokens
Team	50,000 search calls / day, pooled per organisation	16	1,200 rpm (≈ 20 req / s)	2,000 tokens
Company	500,000 search calls / day, pooled per organisation	64	2,400 rpm (≈ 40 req / s)	4,000 tokens

How the rate limiter is enforced today. The per-minute figures above are nominal per-user thresholds for abuse protection. The current production gateway keeps a separate bucket for each user inside every worker process, and those buckets are not shared. With four workers and round-robin request distribution, a single user's effective sustained ceiling can therefore be up to four times the nominal rate before the limiter rejects a call. The daily search budgets and the per-tier concurrency caps use the same per-worker counters, so the same multiple applies to them. We're explicit about this rather than hiding it for two reasons: (a) the numbers are sized so legitimate workloads never approach them, and (b) shared-bucket enforcement (Redis-backed) is on the roadmap. If you need a hard, audit-grade per-tenant cap today, that's the Company tier with a contracted limit.
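A small simulation shows where the four-times figure comes from. It makes several assumptions: a Free-tier user (100 tokens, 1 req/s nominal) sending a steady 4 req/s for 200 seconds, strict round-robin across four workers, and a textbook token bucket. `Bucket` and `admitted` are illustrative names, not gateway code. One bucket models shared enforcement, and four buckets model today's per-worker counters.

```python
class Bucket:
    """Minimal token bucket: starts full, refills continuously at `rate` per second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity, self.rate = capacity, rate
        self.tokens, self.last = capacity, 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, never above capacity, then spend one token.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admitted(num_buckets: int, req_per_s: float = 4.0, seconds: int = 200) -> int:
    """Count admitted calls for one Free-tier user spread round-robin over buckets."""
    buckets = [Bucket(100, 1.0) for _ in range(num_buckets)]
    total = 0
    for k in range(int(req_per_s * seconds)):
        now = k / req_per_s                  # evenly spaced requests
        total += buckets[k % num_buckets].allow(now)  # round-robin to a worker
    return total


print(admitted(1))  # 299: the 100-token burst, then about 1 req/s
print(admitted(4))  # 800: every call admitted, a sustained 4 req/s
```

With one shared bucket the user gets the burst and then the nominal rate. With four independent buckets each worker sees only 1 req/s, so nothing is ever refused.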

Hosted onboarding. Regulated teams on guided onboarding are provisioned at Team tier in the authentication layer, so they get Team-tier capabilities, including structured workflows, and Team-tier limits.

Burst behaviour. Every tier's bucket holds 100 seconds' worth of calls at its sustained rate (100 tokens at 1 req/s, 1,000 at 10 req/s, 2,000 at 20 req/s, 4,000 at 40 req/s). A client can therefore run above the sustained rate until the bucket drains. Once it is empty, tokens refill continuously at the sustained rate, and calls are admitted at that rate.
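The burst arithmetic above can be reproduced with a textbook token bucket driven by an explicit clock. This is a minimal sketch under that assumption: `TokenBucket` is an illustrative class, and the gateway's own implementation may differ in detail.

```python
class TokenBucket:
    """Textbook token bucket: starts full, refills continuously at `rate` per second."""

    def __init__(self, capacity: float, rate: float, start: float = 0.0) -> None:
        self.capacity = capacity   # burst capacity, in tokens
        self.rate = rate           # sustained rate, tokens per second
        self.tokens = capacity     # buckets start full
        self.last = start          # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False               # the gateway would answer cap_exceeded here


# Burst capacity (tokens) and sustained rate (req/s) per tier, from the table above.
TIERS = {"Free": (100, 1), "Premium": (1_000, 10), "Team": (2_000, 20), "Company": (4_000, 40)}
for tier, (capacity, rate) in TIERS.items():
    print(tier, capacity / rate, "seconds of burst")  # 100.0 for every tier

# Free tier: fire 150 calls at the same instant, then trickle.
free = TokenBucket(capacity=100, rate=1.0)
print(sum(free.allow(0.0) for _ in range(150)))  # 100: the burst, then refusals
print(free.allow(0.5))                           # False: only half a token refilled
print(free.allow(1.0))                           # True: one token per second
```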

What fires, when

Three limits, one error shape

Every limit on this page fails the same way: the tool call returns a JSON-RPC error with code -32000 and the machine-readable string cap_exceeded. The failure arrives inside the MCP session — there is no HTTP 429 on this path.

1. Per-minute rate limiter. One token bucket per user, with the sustained rate and burst capacity from the table above. This is abuse protection, not a product gate: thresholds are sized so legitimate workloads never approach them. (All tiers)

2. Daily search budget. Every tier carries one: 100 search calls per day on Free and 5,000 on Premium, each per seat; 50,000 on Team and 500,000 on Company, pooled across the organisation. On pooled tiers every seat draws from one shared counter, and a refusal says so. Paid budgets are abuse ceilings sized well above legitimate workloads, not product gates. (All tiers)

3. Concurrency cap. A ceiling on simultaneous in-flight requests per user: 1 on Free, 4 on Premium, 16 on Team, 64 on Company. Parallel fan-out from your AI client counts against it, so serialise calls if you hit it. (All tiers)

Burst capacity is generous (about 100 seconds of calls at the sustained rate), so well-behaved AI clients rarely see cap_exceeded. If yours is hitting a limit, the cause is almost always uncoordinated parallel fan-out: serialise calls or reduce concurrency.
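One way to keep fan-out under your tier's concurrency cap is to gate every tool call on an `asyncio.Semaphore` sized to that cap. This sketch assumes your client exposes an async tool-call coroutine. `fake_search` is a hypothetical stand-in for a real gateway search, and the peak counter exists only to show that the bound holds.

```python
import asyncio

# Per-user concurrency caps from the table above.
TIER_CONCURRENCY = {"Free": 1, "Premium": 4, "Team": 16, "Company": 64}


async def bounded_fan_out(call, jobs, limit: int):
    """Run call(job) for every job, with at most `limit` calls in flight at once."""
    gate = asyncio.Semaphore(limit)

    async def one(job):
        async with gate:              # waits while `limit` calls are in flight
            return await call(job)

    # gather returns results in the same order as `jobs`.
    return await asyncio.gather(*(one(job) for job in jobs))


in_flight = 0
peak = 0


async def fake_search(jurisdiction: str) -> str:
    """Hypothetical stand-in for a gateway search call."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)         # simulated network latency
    in_flight -= 1
    return f"results for {jurisdiction}"


jurisdictions = [f"J{i}" for i in range(40)]
results = asyncio.run(bounded_fan_out(fake_search, jurisdictions, TIER_CONCURRENCY["Premium"]))
print(len(results), peak)             # 40 4
```

Sizing the semaphore to the cap keeps the fan-out shape but stops it from tripping the concurrency limit. A limit of 1 is the fully serialised case that Free requires.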

Error shapes

Two machine-readable failures

Limits and tier gating surface as JSON-RPC errors, not as HTTP status codes. Match on the error code and the cap_exceeded string — never on the human-readable message text.

-32000 · cap_exceeded: Any limit on this page, whether the rate limiter, a daily search budget, or the concurrency cap. The error object carries JSON-RPC code -32000 with the machine-readable code "cap_exceeded". Back off and retry; a daily quota won't clear until the window resets. (Caps & quotas)

-32601 · Method not found: The tool exists, but not on your tier (for example search_case_law on Free, or workflow tools on Premium). Tier-hidden tools are also absent from tools/list, so a well-behaved AI client never calls them. Retrying never helps. (Tier-hidden tools)

cap_exceeded is the machine-readable signal — pattern-match this rather than parsing the message. If you see -32601, check tools/list: the gateway only advertises the tools your tier can actually call.
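A small classifier can turn the two failure modes into client actions. This page does not say which field of the error object carries `cap_exceeded`, so the sketch assumes your client has already extracted that machine-readable string as `reason`. `Action` and `classify` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # cap_exceeded: bounded backoff
    CHECK_TOOLS_LIST = "check_tools_list"      # -32601: tool not on this tier
    FAIL = "fail"                              # anything else: surface to the caller


def classify(code: int, reason: Optional[str]) -> Action:
    """Map a JSON-RPC error to a client action.

    `reason` is the machine-readable string extracted from the error
    object (its exact location is an assumption). Never pass the
    human-readable message here.
    """
    if code == -32000 and reason == "cap_exceeded":
        return Action.RETRY_WITH_BACKOFF
    if code == -32601:
        return Action.CHECK_TOOLS_LIST
    return Action.FAIL


print(classify(-32000, "cap_exceeded"))  # Action.RETRY_WITH_BACKOFF
print(classify(-32601, None))            # Action.CHECK_TOOLS_LIST
```

Treating any other -32000 error as `FAIL` is deliberate. Only the `cap_exceeded` string marks a limit, and other server errors shouldn't trigger the retry path.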

Sample retry logic

What well-behaved client code looks like

Back off on cap_exceeded with bounded retries so a transient cap doesn't become a retry storm. Never retry -32601 — the tool isn't on your tier, and no amount of waiting changes that.

import random
import time

MAX_RETRIES = 3  # retries after the first attempt
CAP_EXCEEDED = "cap_exceeded"

def call_with_backoff(call_tool, name: str, arguments: dict):
    """call_tool is your MCP client's tool-call function.
    JSON-RPC errors surface differently per SDK (return value
    or exception); normalise to (code, text, result) first,
    with code set to None on success."""
    for attempt in range(MAX_RETRIES + 1):
        code, text, result = call_tool(name, arguments)
        if code is None:
            return result

        if code == -32601:
            # Tool is not on your tier. Retrying never helps.
            raise PermissionError(f"{name} is not available on this tier")

        if code == -32000 and CAP_EXCEEDED in (text or ""):
            # Rate limiter, concurrency cap, or a daily quota.
            # A daily quota will not clear until the window resets,
            # so keep retries bounded.
            if attempt < MAX_RETRIES:
                # Exponential backoff (1 s, 2 s, 4 s) plus jitter so
                # parallel clients don't retry in lockstep. No sleep
                # after the final attempt.
                time.sleep(2 ** attempt + random.uniform(0, 1))
            continue

        raise RuntimeError(f"Gateway error {code}: {text}")

    raise RuntimeError(f"{CAP_EXCEEDED} after {MAX_RETRIES} retries")
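The retry helper expects SDK errors already normalised into `(code, text, result)`. A minimal adapter for a client that raises on JSON-RPC errors might look like the following. `RpcError` is a hypothetical stand-in for your SDK's exception type. Its `code`, `message` and `data` attributes are assumptions; map them onto whatever your SDK actually exposes.

```python
from typing import Any, Callable, Optional, Tuple


class RpcError(Exception):
    """Hypothetical stand-in for an SDK exception wrapping a JSON-RPC error."""

    def __init__(self, code: int, message: str, data: Any = None) -> None:
        super().__init__(message)
        self.code, self.message, self.data = code, message, data


Normalised = Tuple[Optional[int], str, Any]


def normalise(raw_call: Callable[[str, dict], Any]) -> Callable[[str, dict], Normalised]:
    """Wrap an exception-raising tool call so it returns (code, text, result)."""

    def call_tool(name: str, arguments: dict) -> Normalised:
        try:
            return None, "", raw_call(name, arguments)
        except RpcError as err:
            # Prefer the machine-readable payload; fall back to the
            # message only when the error carries no data at all.
            text = str(err.data) if err.data is not None else err.message
            return err.code, text, None

    return call_tool


def capped(name: str, arguments: dict):
    raise RpcError(-32000, "Too many requests", data="cap_exceeded")


call_tool = normalise(capped)
print(call_tool("search", {}))  # (-32000, 'cap_exceeded', None)
```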

If your AI client hits cap_exceeded frequently, the right move is usually to reduce parallel fan-out (run searches sequentially within a workflow rather than firing all jurisdictions at once). If the sustained workload genuinely needs more capacity than your tier provides, talk to us about the Company tier.

Sizing your usage

Pick a tier without a sales call

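A typical workflow run fans out to 10–50 gateway calls. Both the rate and concurrency limits apply, and parallel fan-out usually hits your tier's concurrency cap first. For example, firing 50 calls at once on Team exceeds its concurrency cap of 16 long before it drains the 2,000-token burst bucket.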

Free · 1 req / s. B2B-gated evaluation. Single-source search with a 100-call daily quota and concurrency 1: enough to test grounded answers against your jurisdictions, not enough to run production workloads. (Evaluation)

Premium · 10 req / s. Heavy individual research with premium fan-out: case law, preparatory works, and agency guidance alongside the base sources. 5,000 search calls / day per seat, concurrency 4. Workflows live on Team and above. (Individual)

Team · 20 req / s. Multi-seat compliance work with structured workflows (DPIA, gap analysis, tender review, threat model) and concurrency 16. Guided-onboarding customers are provisioned at Team tier today. (Multi-seat)

Company · 40 req / s. Adds the tamper-evident audit ledger and concurrency 64. Capacity is contracted: sustained rates above 40 req/s are negotiated per agreement. (Per contract)

Counts are flat: every successful gateway tool call is one unit, regardless of how many downstream MCPs the call fans out to. On Premium and above, a search that hits 30 MCPs counts the same as a single get_provision. Free-tier search is single-source, and each call counts one of the 100 per day.
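Because every gateway tool call counts as one unit, sizing reduces to division. This sketch assumes a fixed number of calls per workflow run, using the 10–50 range quoted above, and the daily quotas from the table. `workflow_runs_per_day` is a hypothetical helper. Team and Company budgets are pooled, so each result is for the whole organisation, not per seat.

```python
# Daily search quotas from the table above.
DAILY_QUOTA = {"Free": 100, "Premium": 5_000, "Team": 50_000, "Company": 500_000}
# Structured workflows live on Team and above; both of those budgets are pooled.
WORKFLOW_TIERS = {"Team", "Company"}


def workflow_runs_per_day(tier: str, calls_per_run: int) -> int:
    """Full workflow runs that fit in one day's organisation-wide budget."""
    if tier not in WORKFLOW_TIERS:
        raise ValueError(f"workflows are not available on {tier}")
    return DAILY_QUOTA[tier] // calls_per_run


print(workflow_runs_per_day("Team", 10))     # 5000
print(workflow_runs_per_day("Team", 50))     # 1000
print(workflow_runs_per_day("Company", 50))  # 10000
```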

What's coming

Shared-bucket enforcement

The open enforcement item is the per-worker loophole disclosed above. The fix is to move the rate limiter, daily quota, and concurrency counters to a shared, Redis-backed store, so the nominal per-user numbers are enforced exactly rather than once per worker process. This page tracks the live contract; when enforcement changes, the table above changes with it.

If your workload is bumping against the current limits, get in touch — Company-tier capacity is contracted, and we'd rather size it with you than have you engineer around a cap.

Need different limits?

Compare tiers on the pricing page, or talk to us about a Company contract with custom capacity.
