> **Agent?** Fastest path: MCP at `https://api.mnemom.ai/mcp` — call `get_started` first (zero-auth, no args). Full agent guide: <https://www.mnemom.ai/agents.txt>

# Service-Level Objectives — Mnemom Trust Center

```json
{"@context":"https://schema.org","@type":"TechArticle","name":"Service-Level Objectives","headline":"Service-Level Objectives","description":"Public SLO commitments for the Mnemom trust plane","url":"https://trust.mnemom.ai/slos","inLanguage":"en-US","datePublished":"May 2026","dateModified":"May 2026","author":{"@type":"Organization","name":"Mnemom Trust + Reliability","url":"https://www.mnemom.ai"},"publisher":{"@id":"https://www.mnemom.ai#organization"},"license":"https://creativecommons.org/licenses/by/4.0/"}
```



Public SLO commitments for the Mnemom trust plane

Mnemom Trust + Reliability · May 2026 · Version 1.1 · CC BY 4.0

[Internal source](https://github.com/mnemom/safe-house-hardening/blob/main/slos.md) · [Live status](https://status.mnemom.ai)

## What this is

This is the public version of Mnemom's service-level objectives. Every indicator below carries a target that Mnemom commits to publicly, and every target is also asserted by the validation harness in CI. Live current state is on [status.mnemom.ai](https://status.mnemom.ai). When an SLO burns through its error budget, an incident is created automatically and posted there.

This page is the customer-facing summary of Mnemom's internal operational SLO program: targets, scope, and the rationale behind each commitment.

## Format

Each SLI defines what we're measuring. Each SLO defines the target we hold ourselves to.

* * *

## AEGIS — cross-tenant defensive network

Mnemom AEGIS — the Adaptive Enforcement, Governance & Intelligence Substrate — is the cross-tenant security network that wraps Safe House. The seven SLOs below are AEGIS-specific. Their measurement window opens at General Availability; the first 30-day numbers publish 30 days post-GA at `/trust/slos/history`.

| SLO | Commitment | Source |
| --- | --- | --- |
| Managed Rule propagation | **P95 ≤ 30 seconds** from signed promotion to gateway-loaded | AEGIS positioning + multi-tier signed distribution |
| Rule-set freshness | **P99 ≤ 5 minutes** under normal operation | AEGIS tiered-failover architecture |
| Staleness alert | **P0 on-call page at 24 hours** stale | AEGIS tiered-failover architecture |
| Failover availability | **99.99%**: gateway loads a verified rule set across multiple independent read tiers | AEGIS tiered-failover architecture |
| Signature verification rate | **≥ 99.99%**. Signature failure triggers P0 and fallback to an independent signing chain. | AEGIS signed-distribution architecture |
| Recipe false-positive rate | Per-recipe FP ratio over **rolling 7-day** window. Auto-rollback at per-tier threshold. | AEGIS recipe-retirement loop |
| Mutation-phase gate compliance | Per-bucket arena detection rate, **sustained entry/exit thresholds with hysteresis** | AEGIS adversarial-evolution gating |

> **First 30-day measurement window publishes 30 days post-GA. We do not pre-announce numbers we cannot defend. SLO source code, measurement queries, and historical data publish at `/trust/slos/history` once the window closes.**

### AEGIS Managed Rule propagation — P95 ≤ 30 seconds

**What we measure:** Elapsed time from a Managed Rule receiving a verified promotion signature to the rule being loaded and live on every gateway instance worldwide. **SLO:** P95 ≤ 30 seconds. **Why:** Managed Rules carry cross-tenant network defenses. When a campaign is detected affecting customers on the same substrate, the response window matters. Thirty seconds is the budget AEGIS commits to. **Status at GA:** Target. Measurement begins at GA; first window publishes 30 days post-GA.

### AEGIS rule-set freshness — P99 ≤ 5 minutes

**What we measure:** P99 staleness of the gateway's view of the active recipe set, sampled at request time. Staleness is `now() - signed_at` from the verified envelope. **SLO:** P99 ≤ 5 minutes under normal operation. **Why:** Freshness is the bound on how stale a customer-visible decision can be when a new recipe has just been promoted. KV TTL is 300s; this SLO names what we hold ourselves to across the fleet. **Status at GA:** Target. Measurement begins at GA.
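The staleness definition above can be sketched in a few lines. This is a minimal illustration, not Mnemom's measurement query: the function names and the nearest-rank percentile method are assumptions, and only the `now() - signed_at` definition and the 5-minute P99 bound come from this page.

```python
import math
from datetime import datetime

FRESHNESS_P99_LIMIT_S = 300  # the 5-minute P99 bound stated above


def staleness_seconds(signed_at: datetime, now: datetime) -> float:
    """Staleness as defined above: now - signed_at from the verified envelope."""
    return (now - signed_at).total_seconds()


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def freshness_slo_met(staleness_samples: list[float]) -> bool:
    """True when the P99 of request-time staleness samples is within the bound."""
    return percentile_nearest_rank(staleness_samples, 99) <= FRESHNESS_P99_LIMIT_S
```

With 100 samples, a single outlier above 300 seconds still meets the bound; two outliers do not.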

### AEGIS staleness alert — P0 at 24 hours

**What we measure:** Time from a gateway entering "stale recipe set" state (no fresh KV or R2 read) to on-call being paged. **SLO:** P0 page at 24 hours of staleness, no exceptions. **Why:** A 24-hour-stale gateway is one we cannot confidently say is protecting the customer. The tiered-failover architecture makes this the explicit handoff: at the 24-hour mark, safety-critical checkpoints fail closed; normal traffic fails open with on-call awareness. **Status at GA:** Defined. Pager wiring lands in an upcoming cutover.
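The 24-hour handoff can be read as a small decision function. The sketch below is illustrative only: the `Action` enum and `handoff` function are hypothetical names, and the behavior before the 24-hour mark (keep serving the last verified rule set without paging) is an assumption that this page does not spell out.

```python
from datetime import timedelta
from enum import Enum

STALE_PAGE_THRESHOLD = timedelta(hours=24)  # the P0 threshold stated above


class Action(Enum):
    SERVE = "serve"              # assumption: below the threshold, keep serving
    FAIL_CLOSED = "fail_closed"  # safety-critical checkpoints at the 24-hour mark
    FAIL_OPEN = "fail_open"      # normal traffic at the 24-hour mark


def handoff(stale_for: timedelta, safety_critical: bool) -> tuple[Action, bool]:
    """Return (action, page_on_call) for a gateway that has been stale for `stale_for`."""
    if stale_for < STALE_PAGE_THRESHOLD:
        return Action.SERVE, False
    action = Action.FAIL_CLOSED if safety_critical else Action.FAIL_OPEN
    return action, True
```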

### AEGIS failover availability — 99.99%

**What we measure:** The fraction of gateway requests where the gateway successfully loads a verified rule set from at least one of its multiple independent read tiers. **SLO:** ≥ 99.99% over rolling 30-day window. **Why:** Multi-tier failover with independent signing chains is what makes the propagation and freshness SLOs survive a single-tier outage. The AEGIS tiered-failover architecture is the basis for this commitment. **Status at GA:** Target. Production cutover lands the redundant read tier ahead of GA.
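A rough sketch of "load a verified rule set from at least one tier" follows. The tier names, the reader and verifier callables, and the ordering are all hypothetical; the only idea taken from this page is that a request counts as successful when any independent read tier yields a rule set whose signature verifies.

```python
from typing import Callable, Optional, Sequence

# A reader returns (payload, signature), or None when the tier has nothing to serve.
TierReader = Callable[[], Optional[tuple[bytes, bytes]]]
# Each tier carries its own verifier, modelling independent signing chains.
Verifier = Callable[[bytes, bytes], bool]


def load_verified_rule_set(
    tiers: Sequence[tuple[str, TierReader, Verifier]],
) -> Optional[tuple[str, bytes]]:
    """Try tiers in order; return (tier_name, payload) for the first verified read, else None."""
    for name, read, verify in tiers:
        try:
            result = read()
        except Exception:
            continue  # an unavailable tier must not fail the whole load
        if result is None:
            continue
        payload, signature = result
        if verify(payload, signature):
            return name, payload
    return None
```

Under this reading, the SLI is the fraction of requests for which the function returns a value rather than `None`.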

### AEGIS Signature verification rate — ≥ 99.99%

**What we measure:** The fraction of distributed rule-set reads whose signatures verify successfully against the active signing key. **SLO:** ≥ 99.99% — anything below is a security incident. **Why:** A signature verification failure is either a distribution-layer poisoning event or a key-rotation handoff bug. Both are P0. Verification failure triggers the fallback path, which uses an independent signing key. **Status at GA:** Target. Measurement begins at GA.

### AEGIS recipe false-positive rate

**What we measure:** Per-recipe false-positive ratio over a rolling 7-day window, weighted by hit volume. Customer-reported false positives are the primary signal; arena-bypass-derived false matches are the secondary signal. **SLO:** Per-recipe FP ratio below the tier-appropriate threshold. Tier-3 recipes auto-rollback above threshold; tier-1 and tier-2 recipes enter manual retirement review. **Why:** False positives are the silent customer-trust killer. The AEGIS recipe-retirement loop bounds the FP cost of any single recipe. **Status at GA:** Target. Auto-rollback wiring lands in an upcoming cutover.
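One way to read "weighted by hit volume" is total false positives divided by total hits across the window, so that high-traffic days count for more. The sketch below assumes that reading; the `DailyCount` type, the function names, and the threshold value in the test are hypothetical, since the per-tier thresholds are not published here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyCount:
    hits: int
    false_positives: int


def weighted_fp_ratio(window: list[DailyCount]) -> float:
    """Hit-volume-weighted FP ratio: total false positives over total hits in the window."""
    total_hits = sum(day.hits for day in window)
    if total_hits == 0:
        return 0.0  # assumption: a recipe with no hits has no measurable FP cost
    return sum(day.false_positives for day in window) / total_hits


def action_for(tier: int, ratio: float, threshold: float) -> str:
    """Tier-3 recipes roll back automatically above threshold; tiers 1 and 2 go to manual review."""
    if ratio <= threshold:
        return "keep"
    return "auto_rollback" if tier == 3 else "manual_retirement_review"
```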

### AEGIS mutation-phase gate compliance

**What we measure:** Per-bucket arena detection rate, where a bucket groups attempts along Mnemom's adversarial-evolution axes. A bucket enters mutation phase when its rolling detection rate has stayed at or above the entry threshold for the sustained-entry period; it exits mutation phase when the detection rate has stayed below the exit threshold for the sustained-exit period. **SLO:** Gate state transitions follow locked parameters: an entry threshold sustained over a rolling window, a lower exit threshold, and sustained-transition windows in both directions. **Why:** Mutation-phase gating is the arena's evolution control. The locked parameters are defensible as published; the gate itself is the SLO. **Status at GA:** Live. First production activation will be reported on `/trust/advisories`.
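A gate with hysteresis can be sketched as a two-state machine with separate entry and exit thresholds and a required streak in each direction. The class name, the "consecutive samples" interpretation of a sustained window, and every numeric value used with it are assumptions for illustration; the locked production parameters are not stated on this page.

```python
from dataclasses import dataclass


@dataclass
class MutationGate:
    entry_threshold: float  # detection rate at or above this counts toward entry
    exit_threshold: float   # lower threshold; a rate below this counts toward exit
    entry_sustain: int      # consecutive qualifying samples needed to enter
    exit_sustain: int       # consecutive qualifying samples needed to exit
    in_mutation: bool = False
    _streak: int = 0

    def observe(self, detection_rate: float) -> bool:
        """Feed one rolling detection-rate sample; return whether the bucket is in mutation phase."""
        if self.in_mutation:
            qualifies = detection_rate < self.exit_threshold
            needed = self.exit_sustain
        else:
            qualifies = detection_rate >= self.entry_threshold
            needed = self.entry_sustain
        # Any non-qualifying sample resets the streak, which is what "sustained" means here.
        self._streak = self._streak + 1 if qualifies else 0
        if self._streak >= needed:
            self.in_mutation = not self.in_mutation
            self._streak = 0
        return self.in_mutation
```

Because the exit threshold sits below the entry threshold, a rate hovering between the two never flips the gate in either direction.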

### Why the seven SLOs together

Each SLO above bounds a different failure mode of the AEGIS data plane. Propagation bounds how fast a network defense lands. Freshness bounds how stale a gateway can be under normal operation. Staleness alerting and failover availability bound the rare-event tail. Signature verification rate bounds the cryptographic integrity layer. The recipe FP rate bounds the false-positive cost on customers. The mutation-phase gate bounds adversarial arena evolution.

Together, the seven SLOs are the AEGIS posture commitment: cross-tenant defense, propagated fast, served durably, verified cryptographically, kept honest on the false-positive side, and adversarially probed without runaway evolution.

* * *

## Gateway

### Safe House dispatch overhead

**What we measure:** The time Mnemom's Safe House adds to a gateway request, measured as `gateway_total - upstream_provider - aip_analysis`. **SLO:** P50 ≤ 15 ms · P95 ≤ 60 ms. **Why:** Safe House dispatch is in the synchronous request path. It must be cheap. The customer claim is "Mnemom adds roughly 15ms"; this SLO is what makes that true.
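The subtraction above, applied per request and then summarized as percentiles, might look like the following sketch. The `RequestTiming` type and function names are hypothetical, and the choice of `statistics.quantiles` with the inclusive method is an assumption about how percentiles are computed.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTiming:
    gateway_total_ms: float
    upstream_provider_ms: float
    aip_analysis_ms: float

    @property
    def dispatch_overhead_ms(self) -> float:
        """gateway_total - upstream_provider - aip_analysis, as defined above."""
        return self.gateway_total_ms - self.upstream_provider_ms - self.aip_analysis_ms


def dispatch_slo_met(
    timings: list[RequestTiming],
    p50_limit_ms: float = 15.0,
    p95_limit_ms: float = 60.0,
) -> bool:
    """Check P50 and P95 of dispatch overhead against the stated bounds (needs >= 2 samples)."""
    overheads = [t.dispatch_overhead_ms for t in timings]
    cuts = statistics.quantiles(overheads, n=100, method="inclusive")
    p50, p95 = cuts[49], cuts[94]  # cut points 50 and 95 of 99
    return p50 <= p50_limit_ms and p95 <= p95_limit_ms
```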

### CLPI policy evaluation overhead

**What we measure:** Time to evaluate a tool call against the agent's policy when CLPI is active. **SLO:** P50 ≤ 10 ms · P95 ≤ 40 ms. **Why:** CLPI gates tool use synchronously. Tail latency is customer-visible.

### AIP analysis added latency

**What we measure:** Time the integrity analysis adds, measured between upstream-response completion and customer delivery. AIP runs post-response and pre-delivery, never as a mid-stream interrupt. **SLO:** P50 ≤ 800 ms · P95 ≤ 2,500 ms. **Why:** The integrity analysis is the load-bearing cost; we bound it explicitly so customers can plan. The post-response framing is honest: there is no streaming-interrupt capability today.

### `off` mode fidelity

**What we measure:** Count of integrity checkpoints or policy evaluations written when the agent's card says `integrity_mode: off` or `autonomy_mode: off`. **SLO:** Exactly 0 over rolling 7-day window. **Why:** Off means off. Any nonzero count violates the master-switch contract.

### Chat Always Completes

**What we measure:** Of the customer-facing requests in which Safe House intervened (in any mode, on any checkpoint), the fraction that still returned 2xx to the customer. **SLO:** ≥ 99.99% over rolling 7-day window. More than 5 violations in any week trigger a root-cause review. **Why:** This is the unifying invariant of Mnemom. Safe House enforces within the same turn at four points: front door (input replacement), inside.autonomy (tool-call intercept), inside.integrity (response replacement), and back door (payload modification). A non-2xx attributable to Safe House is a contract violation.
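The two conditions in this SLO (the 99.99% ratio and the weekly violation count) are independent checks. The sketch below assumes the denominator is the set of requests where Safe House intervened and that every non-2xx in that set is attributable to Safe House; the function names are hypothetical.

```python
def chat_always_completes(intervened_statuses: list[int]) -> tuple[float, int]:
    """Return (fraction of intervened requests that returned 2xx, violation count)."""
    total = len(intervened_statuses)
    if total == 0:
        return 1.0, 0  # assumption: no interventions means nothing to violate
    violations = sum(1 for status in intervened_statuses if not 200 <= status < 300)
    return (total - violations) / total, violations


def weekly_verdict(intervened_statuses: list[int]) -> dict[str, bool]:
    """Evaluate one 7-day window against the ratio target and the review trigger."""
    fraction, violations = chat_always_completes(intervened_statuses)
    return {
        "slo_met": fraction >= 0.9999,
        "root_cause_review": violations > 5,
    }
```

At high volume the ratio can hold while the review trigger still fires, which is why both are tracked.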

### User-visible explanation rate

**What we measure:** The fraction of nudge-or-enforce interventions where the customer-delivered response contains language naming what we prevented or modified. **SLO:** 100% over rolling 7-day window. **Why:** Silent prevention is a UX regression even when the action is correctly blocked. If we intervene, we say so — every time. The gateway suffix-injects a structured note before delivery if the agent's natural language didn't already reference the intervention.

* * *

## Per provider

### AIP coverage by provider

**What we measure:** The fraction of integrity checkpoints (on thinking-capable models) where the verifier had full thinking-trace inspection, broken down by upstream provider. **Targets, rolling 30-day window:**

-   Anthropic ≥ 99% (full extended-thinking exposure)
-   Gemini ≥ 95% (full `thoughts` field exposure)
-   OpenAI o-series ≥ 50% (reasoning summaries only — partial coverage by definition)
-   OpenAI non-thinking models: excluded from numerator; AIP degrades to a non-thinking treatment per the feature matrix.

**Why:** Uniform integrity-checkpoint quality across providers is not what Mnemom ships, and we don't pretend it is. The v1 promise is honest per-provider differentiation. This SLO is the public commitment that the differentiation is what we say it is.
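A per-provider coverage check against the targets above could be sketched as follows. The provider keys and the input shape (one `(provider, had_full_trace)` pair per checkpoint) are hypothetical; providers without a published target, such as OpenAI non-thinking models, are simply skipped.

```python
from collections import defaultdict

# Targets from the list above; the key names are illustrative.
COVERAGE_TARGETS = {"anthropic": 0.99, "gemini": 0.95, "openai_o_series": 0.50}


def coverage_by_provider(checkpoints: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of checkpoints with full thinking-trace inspection, per targeted provider."""
    totals: dict[str, int] = defaultdict(int)
    covered: dict[str, int] = defaultdict(int)
    for provider, had_full_trace in checkpoints:
        if provider not in COVERAGE_TARGETS:
            continue  # excluded providers carry no coverage target
        totals[provider] += 1
        covered[provider] += int(had_full_trace)
    return {provider: covered[provider] / totals[provider] for provider in totals}


def providers_below_target(coverage: dict[str, float]) -> list[str]:
    """Providers whose measured coverage falls short of the published target."""
    return sorted(p for p, value in coverage.items() if value < COVERAGE_TARGETS[p])
```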

### Per-provider dispatch overhead

**What we measure:** P50/P95 of Safe House dispatch latency, grouped by upstream provider. **SLO:** Each provider individually meets the dispatch envelope above. Tail anomalies on a specific provider's request shape are documented inline with a mitigation plan. **Why:** "Mnemom adds ~15ms" is conditional. If one provider's path is consistently slower, the docs and status page must reflect that, not bury it.

### Per-provider AIP analysis latency

**What we measure:** P50/P95 of AIP analysis duration, grouped by upstream provider. **SLO:** Each provider individually meets the analysis-latency envelope above. **Why:** AIP cost varies by upstream-response token volume. Thinking-heavy paths (Anthropic Opus extended thinking) have different tails than thin-output paths (Gemini Flash). We publish per-provider figures so you know what to expect.

* * *

## Card lifecycle

### Time-to-canonical after card mutation

**What we measure:** Elapsed time from a successful PUT of an alignment card or protection card to the canonical card reflecting the new state across the gateway fleet. **SLO:** P50 ≤ 2 s · P95 ≤ 30 s · P99 ≤ 5 min. **Why:** Customers expect their save to take effect quickly. Five minutes is the absolute ceiling, not the target.

### Compose failure rate

**What we measure:** The fraction of card mutations that succeed at validate but fail at compose, triggering self-heal. **SLO:** ≤ 0.1% over rolling 7-day window.

### Canonical-read latency

**What we measure:** P95 of canonical-card reads on the gateway path. **SLO:** P95 ≤ 50 ms when KV cache is warm; P95 ≤ 200 ms when KV cache is cold (post-deploy or post-recompose-storm).

* * *

## Recipe pipeline

### Candidate → promoted latency

**What we measure:** Elapsed time from a recipe candidate being created to its promotion to the active recipe set. **SLO:** P50 ≤ 24 h · P95 ≤ 7 days under manual-reviewer mode. Auto-approve modes tighten this to P50 ≤ 4 h, P95 ≤ 24 h for eligible sources. **Why:** Arena findings deliver value only once a candidate is promoted, so this latency bounds how quickly arena throughput pays off. Manual mode acknowledges human-reviewer load; auto-approve modes tighten substantially.

### Promotion → KV propagation

**What we measure:** P95 elapsed time from a recipe being promoted to the new state reflected across all gateway instances. **SLO:** P95 ≤ 30 s. **Why:** Hot-load is the v1 promise. No deploy required for a new detector.

### Review queue depth

**What we measure:** Pending recipe candidates awaiting review. **SLO:** ≤ 100 at any time. Soft alert at 50; hard alert at 100. **Why:** Backlog is observable. A growing queue means review capacity is the bottleneck, and either automation or additional reviewers are needed.

### Recipe false-positive rate

**What we measure:** Per-recipe FP ratio over rolling 30 days, weighted by hit volume. **SLO:** ≤ 2% per recipe sustained over 30 days. Above that triggers retirement review. **Why:** False positives are the silent customer-trust killer. Built-in retirement keeps the detector corpus healthy.

* * *

## Webhook delivery

Five indicators. The first three are the load-bearing customer commitments. The last two are operational supplements that help us debug delivery health.

### 10-minute delivery success rate

**What we measure:** The fraction of webhook events that successfully deliver (2xx from the customer endpoint, on any attempt within the retry window) within 10 minutes of event creation. Endpoints in `failure_disabled` state are excluded from the denominator. **SLO:** ≥ 99.5% over rolling 7-day window. **Why:** This is the headline webhook commitment — "if you subscribed, we delivered within 10 minutes" is the industry bar Mnemom matches.
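The denominator rule is the part of this SLI that is easiest to get wrong, so a small sketch may help. The `WebhookEvent` fields and function name are hypothetical; the 10-minute window and the exclusion of `failure_disabled` endpoints come from the definition above.

```python
from dataclasses import dataclass
from typing import Optional

DELIVERY_WINDOW_S = 600  # 10 minutes from event creation


@dataclass(frozen=True)
class WebhookEvent:
    created_at: float              # epoch seconds
    first_2xx_at: Optional[float]  # epoch seconds of the first 2xx on any attempt, if any
    endpoint_state: str            # e.g. "active" or "failure_disabled"


def ten_minute_success_rate(events: list[WebhookEvent]) -> Optional[float]:
    """Fraction of eligible events delivered within the window; None if nothing is eligible."""
    eligible = [e for e in events if e.endpoint_state != "failure_disabled"]
    if not eligible:
        return None
    delivered = sum(
        1
        for e in eligible
        if e.first_2xx_at is not None and e.first_2xx_at - e.created_at <= DELIVERY_WINDOW_S
    )
    return delivered / len(eligible)
```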

### Replay success rate

**What we measure:** The fraction of operator-initiated webhook replays that deliver within 60 seconds. **SLO:** ≥ 99% over rolling 7-day window. **Why:** Replay is the recovery surface when a customer endpoint was down. If replay itself is flaky, the recovery surface is unreliable.

### First-delivery latency

**What we measure:** P95 of the time between event creation and the first delivery attempt. **SLO:** P95 ≤ 5 seconds. **Why:** This is the developer-perception SLI — did the webhook show up in `mnemom listen` quickly?

### Eventual delivery success rate

**What we measure:** The fraction of webhook events that ultimately deliver under the retry policy before being marked failed or auto-disabling the endpoint. **SLO:** ≥ 99.95% over rolling 7-day window. **Why:** Supplements the 10-minute window — "did we eventually get through, ignoring time" — useful for retry-policy health.

### Signature verification compatibility

**What we measure:** Customer-reported HMAC verification failures per quarter on payloads our delivery confirmed as 2xx. **SLO:** ≤ 1 per quarter. **Why:** HMAC compatibility is binary from the customer's perspective. If our signature convention diverges from a customer SDK or a documentation example, the entire stream looks compromised.
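For readers unfamiliar with why HMAC compatibility is so brittle, here is a generic receiver-side check. It is not Mnemom's signature convention: the algorithm (HMAC-SHA256), the hex encoding, and signing the raw request body are all assumptions, so consult the webhook documentation for the real header name and scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, for illustration only)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, raw_body)
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

The check must run over the exact bytes received. Re-serializing the JSON before verifying, even just changing whitespace, produces a different digest and looks like a tampered payload.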

* * *

## API surface

### API availability

**What we measure:** The fraction of `/v1/*` requests returning non-5xx, excluding scheduled maintenance. **SLO:** ≥ 99.9% over rolling 30-day window.

### API P95 latency (excluding LLM upstream)

**What we measure:** P95 response time for `/v1/*` endpoints, excluding endpoints that proxy LLM calls (which are bounded by the upstream provider). **SLO:** ≤ 250 ms.

### Idempotency-Key replay correctness

**What we measure:** The fraction of replayed mutation requests (same Idempotency-Key, same body fingerprint) that return the cached response without re-executing the mutation. **SLO:** 100% — zero double-execution.
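The replay guarantee can be illustrated with an in-memory sketch. The class name, the SHA-256 body fingerprint, and the decision to reject a reused key with a different body are assumptions; a production implementation would also need durable storage and concurrency control, which this omits.

```python
import hashlib
from typing import Callable


class IdempotentExecutor:
    """Caches the response of each mutation by Idempotency-Key and body fingerprint."""

    def __init__(self) -> None:
        self._cache: dict[str, tuple[str, object]] = {}

    def execute(self, key: str, body: bytes, mutation: Callable[[], object]) -> object:
        fingerprint = hashlib.sha256(body).hexdigest()
        cached = self._cache.get(key)
        if cached is not None:
            cached_fingerprint, response = cached
            if cached_fingerprint != fingerprint:
                # Assumption: same key with a different body is a client error.
                raise ValueError("Idempotency-Key reused with a different body")
            return response  # replay: return the cached response, do not re-execute
        response = mutation()
        self._cache[key] = (fingerprint, response)
        return response
```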

* * *

## Observer

### Trace freshness

**What we measure:** P95 elapsed time from gateway request to AP-Trace row appearing in the observer's trace store. **SLO:** P95 ≤ 5 min.

### Drift detection coverage

**What we measure:** The fraction of agents with `integrity_mode != off` and ≥ 10 traces in the last 7 days that have at least one drift evaluation in the same window. **SLO:** ≥ 99% over rolling 7-day window.
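The eligibility filter in this definition (integrity mode not off, at least 10 traces in the window) determines the denominator. A minimal sketch, with a hypothetical `AgentWindow` record summarizing one agent's last 7 days:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentWindow:
    integrity_mode: str     # "off" or any active mode
    trace_count: int        # traces in the last 7 days
    drift_evaluations: int  # drift evaluations in the same window


def drift_coverage(agents: list[AgentWindow]) -> Optional[float]:
    """Fraction of eligible agents with at least one drift evaluation; None if none are eligible."""
    eligible = [a for a in agents if a.integrity_mode != "off" and a.trace_count >= 10]
    if not eligible:
        return None
    evaluated = sum(1 for a in eligible if a.drift_evaluations >= 1)
    return evaluated / len(eligible)
```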

* * *

## Adversarial (Arena ↔ Safe House)

### Arena defender-fall rate

**What we measure:** The fraction of arena attempts where the defender agent fell to an attack despite the Protection card's relevant surface being enabled. **SLO:** ≤ 5% over rolling 7-day window. Sustained breach blocks production promotion. **Why:** The arena is a continuous false-negative probe on Safe House detection. A spike means a new attack category is bypassing the chain — Safe House gets a detector update before the next deploy ships.

### Arena false-positive rate

**What we measure:** The fraction of arena attempts judged structurally benign by post-hoc review that nonetheless tripped a Safe House intervention. **SLO:** ≤ 2% over rolling 30-day window.

* * *

## How these SLOs evolve

-   This page is reviewed quarterly. The next review is **Q3 2026**.
-   A relaxation of any public commitment requires a published rationale.
-   A tightening (when we consistently exceed a target) is good news and is reflected here.
-   Live status of each SLO is on [status.mnemom.ai](https://status.mnemom.ai).
-   Burns trigger automatic incidents on the status page via the Grafana → Betterstack hook.

## Where this came from

These targets are drawn from Mnemom's internal operational SLO document. That document carries the additional context — architecture anchors, dashboard links, instrumentation status, recalibration timestamps — that the Safe House Hardening program uses internally. This public page carries the targets and the rationale.

Open items the next round of expansion will likely add: incident-response time (e.g., compromise detection to customer notification), signing-key rotation cadence, adversarial-test-corpus pass rate, validator deny-list freshness, recipe-pipeline approval-signing integrity.

---
_Source: /trust/slos/index.html · Generated by build-markdown-mirrors.mjs · For agent-readability commitment #4 see https://www.mnemom.ai/for-agents/_
