Consensus-Based Agent Memory

2026-05-12

Speculative design exploration — not planned for implementation. For the argument that motivates this (why self-authored memory fails), see Path-Dependent Memory. For the related but distinct question of whether a single investigation’s conclusion is reliable, see Convergence Testing.

The Starting Observation

Human institutional knowledge has the same path-dependence problem — different people notice different things, walk different paths, form different interpretations. Yet org knowledge is robust to employee turnover. Why?

Because consensus, not individual memory, is what persists.

Org knowledge isn’t the union of everyone’s memories. It’s the negotiated overlap — what survived collision between differently-minded people:

Architecture decision records (multiple people argued, agreed)
Post-incident reviews (multiple perspectives converged on root cause)
Verbal shorthand everyone uses (“the Thursday batch thing”) — only exists because multiple people independently encountered it

Individual memory is a cache. The consensus layer is the durable store.

Why Consensus Artifacts Are More Model-Agnostic

Negotiation strips private framing — if two minds disagree on how to frame something, they find language that works for both. That shared language is more likely to work for a third.
Verification through disagreement — one mind’s over-reaction gets challenged. “Service X is unreliable” only becomes org knowledge if a second mind independently confirms it.
Multiple paths confirming one conclusion — if mind A reached conclusion P via path X and mind B reached P via path Y, P is more likely to be a property of the territory than an artifact of one path through it.
Shared language over private notation — consensus produces artifacts in common terms, not any individual’s idiosyncratic shorthand.

The Agent Equivalent

Model A investigates → forms interpretation P
Model B investigates same raw data → forms interpretation Q
                    ↓
        Agreement gate: what do P and Q share?
                    ↓
        Consensus artifact: the overlap
        (validated by ≥2 cognitive processes)
                    ↓
        THIS is what you persist.
        Not A's memory. Not B's memory. The agreement.

Factosis already does this in miniature: the completion audit has Opus validate Sonnet’s conclusions. But that audit is one-directional (Sonnet never checks Opus) and terminal (it runs once, at the end of an investigation). A full agreement-based memory would run the gate throughout:

┌─────────────────────────────────────────────────┐
│  INVESTIGATION (Model A)                        │
│  Forms interpretations, writes raw findings     │
└──────────────────────┬──────────────────────────┘
                       │ candidate knowledge
                       ▼
┌─────────────────────────────────────────────────┐
│  AGREEMENT GATE (Model B — adversarial)         │
│  Re-derives from same raw data independently.   │
│                                                 │
│  AGREE    → consensus artifact (persist)        │
│  DISAGREE → individual opinion (discard)        │
│  PARTIAL  → persist only the agreed subset      │
└──────────────────────┬──────────────────────────┘
                       │ validated
                       ▼
┌─────────────────────────────────────────────────┐
│  CONSENSUS STORE                                │
│  - Model-agnostic (verified by ≥2 processes)    │
│  - Shared language (not private notation)       │
│  - Re-verifiable (raw data still accessible)    │
│  - Expirable (re-run gate periodically)         │
└─────────────────────────────────────────────────┘
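
The gate in the diagram can be sketched in a few lines. This is a minimal illustration, not Factosis code: it assumes each model's findings have already been normalised into comparable claim strings (a strong assumption, since real outputs would need semantic matching), and the names `Verdict`, `GateResult`, and `agreement_gate` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    AGREE = "agree"        # every candidate claim was re-derived
    PARTIAL = "partial"    # only some candidate claims were re-derived
    DISAGREE = "disagree"  # no candidate claim was re-derived


@dataclass(frozen=True)
class GateResult:
    verdict: Verdict
    persisted: frozenset[str]  # goes to the consensus store
    discarded: frozenset[str]  # remains individual opinion


def agreement_gate(candidate: set[str], rederived: set[str]) -> GateResult:
    """Compare Model A's candidate claims with Model B's independent re-derivation.

    Assumption: claims are already normalised to comparable strings.
    """
    shared = frozenset(candidate & rederived)
    dropped = frozenset(candidate - rederived)
    if not shared:
        verdict = Verdict.DISAGREE
    elif dropped:
        verdict = Verdict.PARTIAL
    else:
        verdict = Verdict.AGREE
    return GateResult(verdict, shared, dropped)


# Hypothetical findings from two models looking at the same raw data.
model_a = {"service X times out under load", "the Thursday batch causes the spike"}
model_b = {"service X times out under load", "the cache is undersized"}
result = agreement_gate(model_a, model_b)
print(result.verdict, sorted(result.persisted))
```

In this sketch Model B's extra claim (the cache one) is not persisted either: it is B's individual opinion until another process re-derives it.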

What This Mitigates

Confirmation cascade — one model’s over-reaction can’t compound unchallenged
Model-swap safety — consensus artifacts were never one model’s private notation
Stale knowledge — re-run the agreement gate against current data; disagreement flags staleness
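
The staleness check could look something like the sketch below. It assumes each artifact carries the time it was last verified, and that `rederive` is a hypothetical stand-in for a second model re-deriving claims from current raw data; `Artifact`, `recheck`, and `max_age` are likewise illustrative names, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Artifact:
    claim: str
    verified_at: datetime
    stale: bool = False


def recheck(store: list[Artifact], rederive: Callable[[], set[str]],
            now: datetime, max_age: timedelta) -> list[Artifact]:
    """Re-run the gate for artifacts older than max_age; return those flagged stale.

    `rederive` is a hypothetical stand-in for a model call against current data.
    """
    due = [a for a in store if now - a.verified_at >= max_age]
    if not due:
        return []
    current = rederive()  # one re-derivation shared by all due artifacts
    flagged = []
    for artifact in due:
        if artifact.claim in current:
            artifact.verified_at = now  # still agreed: refresh the timestamp
        else:
            artifact.stale = True       # disagreement flags staleness
            flagged.append(artifact)
    return flagged


t0 = datetime(2026, 1, 1)
store = [
    Artifact("service X times out under load", t0),
    Artifact("deploys happen on Thursdays", t0),
    Artifact("queue depth alarms are noisy", t0 + timedelta(days=25)),
]
flagged = recheck(
    store,
    rederive=lambda: {"service X times out under load"},
    now=t0 + timedelta(days=30),
    max_age=timedelta(days=14),
)
print([a.claim for a in flagged])
```

Flagging rather than deleting is a deliberate choice in this sketch: a stale artifact is a disagreement to investigate, not necessarily a falsehood.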

What This Doesn’t Solve

Shared blind spots — two models might agree on something wrong (same training data, same biases)
Cost — every candidate memory requires a second model call
Negotiation protocol — how do models “argue” when they disagree? Humans use conversation; models would need structured rounds or a mediator
Granularity — what unit of knowledge goes through the gate? Too coarse = all-or-nothing; too fine = expensive
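
The shared-blind-spot limit can be made concrete with a toy probability model. Everything here is an illustrative assumption rather than a measurement: claims are binary, each model errs at the same rate, and with probability `shared` both models draw their answer from the same source (same training data, same bias) instead of deriving it independently.

```python
def p_wrong_given_agreement(error_rate: float, shared: float) -> float:
    """Chance that an agreed claim is wrong, in a toy two-model setup.

    Assumptions (illustrative only): binary claims, identical per-model
    error rate, and a `shared` probability that both answers come from
    one common source rather than two independent derivations.
    """
    both_wrong = shared * error_rate + (1 - shared) * error_rate ** 2
    both_right = shared * (1 - error_rate) + (1 - shared) * (1 - error_rate) ** 2
    return both_wrong / (both_wrong + both_right)


for shared in (0.0, 0.5, 1.0):
    p = p_wrong_given_agreement(0.2, shared)
    print(f"shared={shared:.1f}  P(wrong | agree)={p:.3f}")
```

With a 20% per-model error rate, fully independent models leave roughly 6% of agreed claims wrong. Fully shared models leave 20%, which is no better than a single model. Under these assumptions the second call buys verification only to the extent that the two derivations are actually independent.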

Pathologies: Recreating Corporate Dysfunction

Agreement-based persistence inherits the failure modes of the human orgs it mimics:

Conformity pressure — the agreement gate systematically discards divergent interpretations. The model whose unique path caught real signal gets labelled “disagreeing with consensus” and its contribution is discarded.
Ossification — once something enters the consensus store, it becomes unchallengeable canon. Nobody re-verifies. The store drifts from ground truth without error signal.
Manager-bot pathology — disagreements require a tiebreaker. The tiebreaker model can’t independently verify domain-specific claims, so it defaults to siding with the majority. You’ve rebuilt middle management.
Optimising for consensus, not truth — the mechanism selects for agreement, which is shared subjectivity, not objectivity. Two models trained on similar data share similar biases. Their agreement confirms the bias, not the territory.

The uncomfortable conclusion: follow this thread far enough and you’ve rebuilt corporate politics — performance reviews, culture fit, normie consensus, and the systematic firing of mavericks who were right but outnumbered.

This improves on unchallenged self-authored memory in one respect: one model’s hallucination can’t compound unchecked. But it is not a solution to the fundamental problem. It trades individual hallucination for collective blind spots, and that failure mode is quieter, slower, and harder to detect, which in some cases may make it worse.


The Devil’s Advocate Problem

The pathologies above suggest a fix: keep a dedicated “divergent thinker” model whose job is to disagree. This is not a new idea — org theory has studied it for decades under various names:

10th Man Rule (Israeli intelligence, apocryphal) — if 9 agree, the 10th must argue the opposite
Red Teams (military/infosec) — dedicated adversarial group whose job is to break consensus
Devil’s Advocate (Catholic Church, literal canonisation role) — formalised dissent so the institution doesn’t canonise frauds
Skunkworks (Lockheed Martin) — isolate divergent thinkers from the org so consensus can’t kill them
Psychological Safety (Edmondson, HBS) — people only voice dissent if they won’t be punished for it

The core finding across all of these: constructive dissent is disproportionately valuable but systematically undervalued, because:

Delayed feedback — dissent’s value is only visible when the risk materialises (which may be never, or may be catastrophic)
Attribution failure — when the dissenter is vindicated, the org rarely traces it back to “that person we almost fired for disagreeing”
Indistinguishable from noise — the maverick who’s right looks identical to the crank who disagrees with everything, right up until the moment they’re proven right
Fat-tailed returns — 90% of dissent is wrong/wasteful, 10% prevents catastrophe. You can’t get the 10% without tolerating the 90%.
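
The arithmetic behind the last point can be sketched as follows. All numbers are hypothetical, including the 90/10 split, which reads as an illustration rather than a measured rate. `dissent_ledger` is an invented helper that contrasts how a dissenter looks on a hit-rate metric with how it looks on net value.

```python
def dissent_ledger(challenges: int, hit_rate: float,
                   cost_per_challenge: float,
                   loss_prevented_per_hit: float) -> dict[str, float]:
    """Compare a dissenter's hit rate with its net value.

    All inputs are hypothetical; the point is the shape of the arithmetic.
    """
    hits = challenges * hit_rate
    total_cost = challenges * cost_per_challenge
    total_prevented = hits * loss_prevented_per_hit
    return {
        "hit_rate": hit_rate,
        "total_cost": total_cost,
        "total_prevented": total_prevented,
        "net_value": total_prevented - total_cost,
        # Smallest loss per hit at which the dissenter pays for itself.
        "break_even_loss_per_hit": cost_per_challenge / hit_rate,
    }


ledger = dissent_ledger(challenges=100, hit_rate=0.10,
                        cost_per_challenge=1.0, loss_prevented_per_hit=50.0)
print(ledger)
```

A dissenter that is wrong nine times out of ten still comes out ahead here, because each hit prevents far more than the break-even loss. The catch is that `loss_prevented_per_hit` is only known in retrospect, if at all, so this ledger cannot be filled in at the time the keep-or-cut decision is made.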

For agents this maps exactly:

Option A: Keep a "red team" model that challenges consensus
  Cost: extra model call on every candidate memory
  Noise: mostly disagrees pointlessly
  Value: occasionally catches the blind spot that would have compounded silently
  Measurable? No. Not until the counterfactual materialises.

Option B: Don't keep it
  Cost: zero
  Risk: collective blind spots ossify into unchallengeable canon
  Measurable? Also no. You never see the catastrophe you didn't prevent.

The valuation problem is identical to the human org version: you cannot measure the value of dissent before it’s proven right. Any metric that tracks “agreement rate” or “consensus contribution” will systematically eliminate the most valuable divergent signals.

This is the unsolved problem at the bottom of agreement-based persistence. The orgs that have come closest to managing it (with red teams, skunkworks, or a 10th man) did so by making dissent a structural role, not an emergent behaviour, and by explicitly protecting that role from the consensus mechanism’s natural tendency to eliminate it.

The agent equivalent would be: a model that is prompted to disagree, whose disagreements are preserved regardless of whether the consensus gate accepts them, and whose track record is evaluated only in retrospect against ground truth. Expensive. Mostly noise. Occasionally the only thing between you and a silent failure cascade.
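
One way to picture that structural protection is an append-only dissent log kept outside the consensus store. The sketch below is hypothetical throughout (`Dissent`, `DissentLog`, and their fields are invented for illustration). It records every objection whether or not the gate accepted it, and scores objections only once ground truth arrives.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dissent:
    claim: str                         # the consensus claim being challenged
    objection: str                     # the red-team model's counter-argument
    gate_accepted: bool                # recorded, but never used to prune the log
    vindicated: Optional[bool] = None  # unknown until ground truth arrives


@dataclass
class DissentLog:
    entries: list[Dissent] = field(default_factory=list)

    def record(self, claim: str, objection: str, gate_accepted: bool) -> Dissent:
        """Append-only: objections the gate rejected are kept too."""
        entry = Dissent(claim, objection, gate_accepted)
        self.entries.append(entry)
        return entry

    def resolve(self, claim: str, consensus_was_wrong: bool) -> None:
        """Score objections to `claim` once ground truth is known."""
        for entry in self.entries:
            if entry.claim == claim:
                entry.vindicated = consensus_was_wrong

    def track_record(self) -> tuple[int, int]:
        """Return (vindicated, resolved), ignoring unresolved entries."""
        resolved = [e for e in self.entries if e.vindicated is not None]
        return sum(e.vindicated for e in resolved), len(resolved)


log = DissentLog()
log.record("service X is unreliable", "failures cluster on one client", gate_accepted=False)
log.record("the cache is undersized", "hit rate looks normal", gate_accepted=False)
log.resolve("service X is unreliable", consensus_was_wrong=True)
print(log.track_record())
```

The track record deliberately ignores unresolved entries, so the dissenter is never penalised for objections whose counterfactual has not yet materialised.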

Implication for Factosis

The current architecture already has the primitives:

Raw artifacts persist (git history, tool outputs) — the shared ground truth
Completion audit is a one-shot agreement gate (Opus validates Sonnet)
Structured findings are controller-validated against raw data (ground-truth anchoring)

A future cross-investigation memory layer could extend this: persist only conclusions that a second model independently derives from the same raw artifacts. Not “what Sonnet thinks” — “what Sonnet and Opus both conclude from the same evidence.”

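This is more robust than self-authored memory against a single model’s errors, but strictly more expensive, and it inherits the collective blind spots described above. The tradeoff becomes the cost of the agreement protocol (a second, independent derivation per candidate) against the cost of re-deriving conclusions from raw artifacts in every investigation. For high-value, long-lived org context, the agreement cost may be justified.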