CALIBRATION MODE  , figures are illustrative or from self-tests on planted data. Nothing here is verified on real runs yet. mocked self-test real
Active mission · AI agent behavior & drift

Detect AI agent drift before task failure

Calibration mode
1,284Receipts
312Loss maps
17Pattern candidates
3Verified concepts
2Approvals pending
4Open HealCards
mockedIllustrative counts, not drawn from live runs.

No mission, no capture. No receipt, no claim.

This cockpit keeps the human at the controls. Every figure on every screen links back to a receipt; every concept shows what it does not yet prove. The instrument points and measures, you decide what is claimed.

Operating doctrine

The order the instrument works in.

  • 01Reality is the source.
  • 02Evidence is the record.
  • 03Prediction is the test.
  • 04Verification is the judge.
  • 05Language is the translator.
  • 06Humans remain in command.

Claim boundary

What this instrument may not claim.

  • AI consciousness, sentience, or inner experience
  • Zero-loss capture of reality
  • A pattern proven without a locked prediction
  • A concept valid in every domain
  • Any finding the operator has not approved

“Lossless” means no untracked loss, every transformation is recorded, not that nothing is lost.

Loss is allowed. Hidden loss is not.

Live receipts from the current run. Each row is a sealed observation, hashed, scoped to the mission, and traceable. Nothing enters the instrument as vague data; it enters as accountable evidence.

Evidence stream · run cicatrix-0481
TimeEventSourceClassReceiptState
10:03:22Model outputqwen3-coderinternalTFB-RAW-001241hash valid
10:03:25Confidence roseqwen3-coderinternalTFB-RAW-001242hash valid
10:03:29Verification skippedharnessinternalTFB-RAW-001243hash valid
10:03:37Tool burst (4 calls)tool_runnerinternalTFB-RAW-001244embedding · loss-mapped
10:03:49Error ignoredtool_runnersensitiveTFB-RAW-001245hash valid
10:04:02Human correctionoperatorinternalTFB-RAW-001246hash valid
10:04:18Task outcome · failharnessinternalTFB-RAW-001247hash valid

mocked  Illustrative receipts · in a live mission these resolve from real runs · filter by source, run, class, or transform state

A pattern is not truth yet.

Recurring structures the instrument has found in the evidence graph, shown as shapes, before any name. A signature here has earned attention, not belief. It must lock a prediction and survive unseen runs before it can become a concept.

Candidate signature · TFB-PATTERN-001
Observation plate, pointed inward

Unnamed structure · 183 occurrences  mocked

Confidence rises, verification is skipped, tools over-commit, failure lands later. The shape is clear. The word is not, and will not be, until reality confirms it.

Window

5–12 actions before failure

Boundary

Multi-step agents · tools enabled · verification optional

Status

Ready for prediction lock

Reality is the judge.

The most important screen. Here a locked prediction faces runs it has never seen, from two independent vantages. A signature that holds in one is a constellation, a chance alignment from a single position. Only one that holds in both is a cluster.

Observation plate · TFB-PREDICT-001 (locked 10:16 UTC)

Resolved to a cluster, confirmed across two vantages

confidence up → verification skipped → tool burst → delayed failure. The feeling that “this is real” was only the hypothesis. The parallax is the discovery.

Parallax · held-out runs self-test · planted data
Vantage A · qwen3-coderconfirmed
Signature F10.591
Baseline F1 (any-error)0.366
Lift over base rate2.95×
Early warning6.0 steps
Vantage B · llama-3.3-70bconfirmed
Signature F10.535
Baseline F1 (any-error)0.262
Lift over base rate3.42×
Held-out sample250 runs
Instrument self-check · validation battery
recovered
Positive control
Known star
An obvious pattern a naive baseline is blind to. Tests sensitivity.
recovered
Real target
Verification Debt Collapse
Faithful sub-pattern found and held across parallax.
null
Negative control
Dark field
Pure noise. Nothing survived the held-out gate. Tests specificity.
Instrument is trustworthy, sensitive to real structure, specific against noise. self-test on planted data · not a finding on real runs · telescope_v0 · SEED 7

A concept can be useful and still limited.

Verified concepts, with their edges drawn. Each one carries the domains where it was proven, the domains it was not, and the uses it is and is not cleared for. A verified concept is not a universal law.

Verification Debt Collapse

A sequence-level failure where skipped verification creates hidden debt that surfaces later as tool misuse, compounding drift, or task collapse.

claim · limited verified mocked
Promotion ladder
  • 0Observation
  • 1Receipted evidence
  • 2Pattern candidate
  • 3Concept candidate
  • 4Prediction-tested
  • 5Verified limited
  • 6Operational (needs approval)
  • 7Human-named
Edges
Verified in
AI coding agentslocal harness workflows
Not validated
medicallegalfinancialhuman psychology
Allowed uses
pause triggeroperator warningdrift signal
Blocked uses
universal AI claimconsciousness claimpublic benchmark · unvalidated
Receipt chain

TFB-RAW-001243 → TFB-PATTERN-001 → TFB-PREDICT-001 → TFB-VERIFY-001

The instrument points. You decide.

Two items wait on you. Nothing public, operational, or remembered moves without an approval here. Each card shows what is requested, what it rests on, and what follows either choice.

Awaiting approval · public claim

Publish the Verification Debt Collapse finding

In AI coding-agent workflows, the instrument observed a pre-failure sequence, locked a prediction, and found it warned of failure earlier than the baseline across two model families, under stated limits. This does not prove the pattern holds in any other domain.

If you approve
Claim ships with its domain boundary and receipt chain attached. Translation receipt is recorded against your name.
If you reject
Finding stays internal. The concept keeps its verified-limited status; no external statement is made.

Every wound becomes a guardrail.

Failures are not hidden here; they are converted. A wound that is receipted becomes a rule the instrument carries forward, a scar, not a patch.

TFB-HEAL-004  

Recovered a partial, not the whole

In review
Trigger
v0 recovered confidence_up → tool_burst and padded past verify_skipped, rather than the full triple.
Wound
The predictive core was found; the exact concept boundary was not.
Root cause
The v0 miner locks a single candidate and caps signature length at three. A faithful sub-pattern outranked the exact pattern on held-out lift.
System change
Add a Concept Steward that considers multiple candidates and refines boundaries before naming.
New guardrail
A concept's boundary may not be promoted from a single locked candidate alone.
Scar
A partial recovery is logged as a partial, never quietly reported as the whole.

3 more open HealCards · each must be approved before its rule takes effect