TFBthumb · Playground

Browser agents that don't flake. Website audits that don't lie.

Two modes on one substrate. Drive a page by accessible name and the substrate handles the selectors, the timing, the retries, the identity, and the security. Audit any URL end-to-end and seven lenses surface the real bugs: dead buttons, sub-44 tap targets, broken settles, occluded affordances, aria-stuffing, viewport-only loss of functionality. One CLI, one report, three viewports.

0 / 200 flakes on the canonical harness

7.7× fewer tokens per decision

7 audit lenses, 3 viewports

v0.3.25 (Heals A-F: the doctrine demanded the ratchet). Six substrate heals absorbed from the 2026-06-20 outside-developer round-3 review. The recursive shape: the v0.3.24-minted doctrine D-MIGRATION-GLUE-INHERITS-THE-WOUND, applied to its own minting case, demanded the v0.3.25 ratchet. The dev's freeform caught it: "Heal A is correct: a sight-bound store cannot emit thumb.*. But walk the residue. _normalize_kind still defaults to 'thumb'; __init__ still defaults to 'thumb'. The A2 wound moved from 'a sight store silently writes thumb' (closed) to 'a consumer who forgets the namespace kwarg ENTIRELY gets a thumb store' (open). The default IS the shim."
HEAL A (THE P1 DIVIDEND): register_consumer was check-then-act on shared mutable state. Two threads racing with different vocabs could both pass `existing is None` before either wrote, and the last writer would silently clobber the first. Latent under the CLI; real under the FastAPI wrapper. The heal is a module-level threading.Lock around the get-compare-set. SUB-F12 proves it: 32 threads race, exactly 1 wins, 31 raise a conflict, no silent clobber. The idempotency-root pushback we sent, which the reviewer CONCEDED, is what sent them looking for this exact race: the framing predicted the bug.
HEAL B (namespace REQUIRED): RegistryStore.__init__(path, *, namespace) is keyword-only with no default. All 18 callsites (10 in registry + 8 in gate) pass namespace="thumb" explicitly. "Forgot the kwarg → silent thumb store" is now a TypeError at construction. SUB-F13 proves it.
HEAL C (verify-at-construction): __init__ checks that the namespace is in _CONSUMER_REGISTRY and raises an error that names the BEFORE-construction contract. SUB-F14 proves it.
HEAL D (real bugs first, per the dev's B1): the registry except-block `e` that leaked across cmd_status/cmd_verify is renamed `e → engine`; last_verdict None-narrowing added; 7 mypy findings closed in one pass. Baseline 33→26.
HEAL E (sharpen both v0.3.24 carves and mark them as N=1 candidates, per the dev's E2): D-MIGRATION-GLUE-INHERITS-THE-WOUND now leads with the STRUCTURAL heal (required-not-defaulted) per A3; D-SELF-NARRATING-DECOMPOSITION gains the behavioral-vs-empirical boundary per D2 (name-as-doctrine for behavior; an INLINE prose receipt for measured constants, so the receipt for the 1200ms in type_into stays beside the number). Both are marked CANDIDATE, N=1 explicitly.
HEAL F (00b refreshed in place, per the dev's C3): the SUPERSEDES notice was itself doc-level migration glue, so 00b was refreshed directly.
28/28 sub-gates green + 3/3 self-tests. Round-3 scorecard on the three pushbacks we sent: P1 CONCEDED (and found a bug: the dividend above); P3 CONCEDED (with their refutation test sharpening our doctrine); P2 PARTIALLY CONCEDED (a narrower drift risk survives: generate the views from source). Two clean, one partial. The cycle compounds: round 1 = 3 heals; round 2 = 5 heals + 2 carves + silent-data-loss closed; round 3 = 6 heals + concurrency race closed + 2 carves sharpened + 00b refreshed. The doctrine carved one round ago named what the next ratchet had to do.
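A minimal sketch of Heals A through C, assuming an in-memory dict stands in for the module-level _CONSUMER_REGISTRY. register_consumer and RegistryStore are the names used above, but their bodies here are illustrative; ConsumerConflict and _REGISTRY_LOCK are hypothetical names. The point is the shape: one lock around the get-compare-set, and a keyword-only namespace with no default that is checked before the store exists.

```python
import threading

# Hypothetical module state: namespace -> frozenset of allowed bare event kinds.
_CONSUMER_REGISTRY: dict = {}
# Hypothetical name for the module-level lock described in Heal A.
_REGISTRY_LOCK = threading.Lock()


class ConsumerConflict(ValueError):
    """Hypothetical error: a namespace re-registered with a different vocabulary."""


def register_consumer(namespace: str, allowed_event_kinds) -> None:
    vocab = frozenset(allowed_event_kinds)
    # Get, compare and set all happen under one lock, so two racing threads
    # cannot both observe "no entry yet" and then overwrite each other.
    with _REGISTRY_LOCK:
        existing = _CONSUMER_REGISTRY.get(namespace)
        if existing is None:
            _CONSUMER_REGISTRY[namespace] = vocab
        elif existing != vocab:
            raise ConsumerConflict(
                f"namespace {namespace!r} is already registered with a different vocab"
            )
        # Same vocab again: idempotent no-op.


class RegistryStore:
    # Heal B: namespace is keyword-only with no default, so omitting it is a
    # TypeError at construction instead of a silent "thumb" store.
    def __init__(self, path: str, *, namespace: str) -> None:
        # Heal C: verify at construction that the namespace was registered first.
        with _REGISTRY_LOCK:
            known = namespace in _CONSUMER_REGISTRY
        if not known:
            raise ValueError(
                f"namespace {namespace!r} is not registered; call "
                "register_consumer() BEFORE constructing a RegistryStore"
            )
        self.path = path
        self.namespace = namespace
```

The SUB-F12 race can be reproduced against this sketch with a threading.Barrier releasing 32 register_consumer calls at once: exactly one wins and the other 31 raise.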

v0.3.24 (Heals A-E: round-2 absorption + the silent-data-loss receipt). Five substrate heals + two doctrine candidate carves, all absorbed from the 2026-06-20 outside-developer round-2 review. The centerpiece was a SILENT DATA LOSS BUG the reviewer found in code: the v0.3.23 Heal C migration glue's _normalize_kind prefixed any bare kind with the GLOBAL default namespace ("thumb"). When a sight-context caller forgot to bind their namespace AND wrote a bare kind that collided with thumb vocab ("register"), the event silently became "thumb.register", passed validation, and OVERWROTE a thumb patient record on a colliding patient_id. Cherry's welcome page record was gone, and no error fired. The reviewer named the recursive pattern: "a heal that absorbs a doctrine at altitude N can reintroduce the same doctrine's violation at altitude N-1, in the migration glue, where nobody is looking. The migration path is where doctrines go to die quietly."
HEAL A (store-bound namespace): RegistryStore(path, namespace="sight") binds at construction; a bare "register" from a sight store → "sight.register" → REJECTED loudly. SUB-F10 proves the wound is structurally impossible.
HEAL B: the events_for_namespace helper enforces the namespace+"." boundary, so consumers can't open-code startswith("sight") and silently match "sightseeing.*". SUB-F11 verifies it.
HEAL D (the reviewer's F2): heal one real bug in the same ship as Heal A. retina.start() now binds the freshly acquired CDPSession to a narrowed local variable; same runtime behavior, mypy-clean; 7 baseline findings closed in one strike.
HEAL E (the reviewer's B3): mypy.ini promotes governor.py to strict, plus 5 trivial annotations; closes 2 more. Net: TOLERATED_BASELINE 42 → 33. 25/25 sub-gates green + 3/3 substrate self-tests, zero regression.
DOCTRINE CARVES: D-MIGRATION-GLUE-INHERITS-THE-WOUND (the reviewer's recursive finding) + D-SELF-NARRATING-DECOMPOSITION-OVER-COMMENT-ARCHAEOLOGY (the CEO's pushback on the founder's first answer to the heal archaeology in thumb.py:type_into: name-as-doctrine makes future scars structurally small; relocating the history to an off-code ledger is fix-shape). Three pushbacks sent back to the reviewer (C2: IDEMPOTENCY vs ATOMICITY split; F4: the sprawl is 3 surfaces, not 4; E2(1): decompose YES, relocate NO). The reviewer-packet doctrine compounds: round 1 produced 3 heals; round 2 produced 5 heals + 2 doctrine carves and closed a real silent-data-loss bug.
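A sketch of the Heal A and Heal B boundaries, assuming a hypothetical BoundStore class in place of the real RegistryStore and a hypothetical sight event kind named "observe". A bare kind inherits the store's own namespace, anything outside that namespace's vocabulary is rejected loudly, and namespace filtering matches on the trailing dot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundStore:
    """Hypothetical stand-in for a registry store bound to ONE namespace."""
    namespace: str
    allowed_kinds: frozenset  # bare kinds this namespace registered

    def normalize_kind(self, kind: str) -> str:
        # A bare kind takes THIS store's namespace, never a global default.
        full = kind if "." in kind else f"{self.namespace}.{kind}"
        ns, _, bare = full.partition(".")
        if ns != self.namespace or bare not in self.allowed_kinds:
            # Fail loudly: no silent re-routing into another consumer's vocab.
            raise ValueError(f"rejected event kind {full!r} for store {self.namespace!r}")
        return full


def events_for_namespace(events, namespace: str):
    # Match on "namespace." so "sight" can never match "sightseeing.*".
    prefix = namespace + "."
    return [e for e in events if e["kind"].startswith(prefix)]
```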

v0.3.23 (Heals A + B + C: outside-review absorption). Three heals, all absorbed from a 2026-06-20 outside-developer review of the TFBthumb intro packet. The reviewer's structured-verdict findings became the v0.3.23 heal arc: D-EVERY-SHIP-CARRIES-A-REVIEWER-PACKET compounding into D-HEAL-NOT-FIX in code.
HEAL A (C1 REFUTED: "wire mypy now, mis-filed as debt"): mypy 2.1.0 wired at the substrate module seams via mypy.ini; strict on tfb_substrate + tfb_patient_registry, gradual on partially-typed modules. gate_mypy_strict_seams.py runs mypy and asserts no regression against TOLERATED_BASELINE=42 (the substrate's pre-existing typing debt, now mechanically tracked). New errors fail the gate; healed errors also fail it, with a humane "lower the baseline" message. Per D-HEAL-NOT-FIX, the substrate ratchets down with every seam-touching heal.
HEAL B (B1 NEEDS_REVISION: "36 doctrines flat where it should be tiered"): docs/doctrine_hierarchy.md carved in the parent TFB repo: 5 roots (D-HEAL-NOT-FIX / D-ABSENCE-IS-AMBIGUOUS / D-VALIDATION-STRATIFIES / D-NARRATE-DON'T-COLLAPSE / D-DETECTIVE-OVER-PREVENTIVE) + 31 corollaries mapped + 2 root-adjacent candidates named (D-CONTINUITY-OVER-EXTRACTION + D-CONSEQUENCE-IS-DIFFERENT-FROM-IDENTITY). The reviewer's freeform observation put it most cleanly: "absence is ambiguous wears six costumes" (identity_break / reach=None / inconclusive≠fail / detector_status / capture-off / scorer_kind), and the hierarchy now names it as the spine.
HEAL C (E3 REFUTED: "registry composition breaks at the validation layer"): a per-consumer namespace primitive in tfb_patient_registry.py. New register_consumer(namespace, allowed_event_kinds) + list_consumers() module-level functions. Bare-kind callers (cmd_register, cmd_verify) auto-prefix to "thumb.*" for migration compat; legacy un-prefixed events on disk auto-prefix at read time; thumb's patients() only aggregates thumb.* events. TFBwitness now registers its sight.* namespace at init without editing TFBthumb's source, so the cross-organ coupling the dev caught is structurally healed. gate_registry_namespacing.py: 9/9 SUB-F checks green (migration compat / unknown-namespace rejects / idempotent / conflicting-vocab rejects / prefix required / unknown-kind-in-known-ns rejects / sight isolation E2E / legacy events aggregate / list_consumers inspectable). gate_patient_registry.py: 14/14 SUB-E still green (one update to SUB-E7 for the migration).
The reviewer's review is itself the receipt: D-EVERY-SHIP-CARRIES-A-REVIEWER-PACKET compounds across review rounds; three findings absorbed in code; the next packet quotes the dev verbatim. 23/23 sub-gates + 3/3 substrate self-tests green. The dev can now start TFBwitness Phase 2.
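One way the ratchet in gate_mypy_strict_seams.py could work, sketched under assumptions: the gate parses mypy's closing summary line (the "Found N errors in M files" and "Success: no issues found" wording is mypy's own), and count_mypy_errors / ratchet_verdict are hypothetical helper names. Both directions of drift fail, which is what makes the baseline a ratchet rather than a ceiling.

```python
import re

# mypy's closing summary line is either
#   "Found 42 errors in 7 files (checked 30 source files)"
# or
#   "Success: no issues found in 30 source files".
_FOUND = re.compile(r"^Found (\d+) errors? in \d+ files?", re.MULTILINE)


def count_mypy_errors(mypy_output: str) -> int:
    m = _FOUND.search(mypy_output)
    if m:
        return int(m.group(1))
    if "Success: no issues found" in mypy_output:
        return 0
    # Neither summary line present (crash, config error): fail closed.
    raise RuntimeError("could not parse a mypy summary line")


def ratchet_verdict(errors: int, baseline: int):
    """Return (passed, message). Both directions of drift fail the gate."""
    if errors > baseline:
        return False, f"{errors - baseline} new mypy error(s) above TOLERATED_BASELINE={baseline}"
    if errors < baseline:
        return False, (f"{baseline - errors} error(s) healed; "
                       f"lower TOLERATED_BASELINE to {errors} to lock it in")
    return True, f"mypy at TOLERATED_BASELINE={baseline}"
```

In a gate, the input would be the stdout of running mypy (for example via `python -m mypy`) over the configured seams.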

Coming soon: TFBwitness, the Pixel-RAG sibling to TFBthumb (CEO-ratified name 2026-06-20; the other finalists were TFBsight / TFBscry / TFBglance; picked for doctrine fit, since it attests, produces receipts, and composes with TFBthumb's legibility/verification ethos). TFBthumb verifies pages; TFBwitness reads them. It targets a separate wound that the Berkeley/Princeton/Databricks paper documented: the HTML-to-text translation step destroys more than 1/3 of QA-benchmark answers before retrieval. TFBwitness never converts the page to text. It renders to pixels, slices the render into tiles, embeds each tile with a vision model (ColPali primary), stores the embeddings in LanceDB, retrieves the top-k tiles at query time, and hands them to a VLM at or above the 4B floor (TFB's M5 Max already holds mlx-community/gemma-4-31b-it-4bit, 17GB, multimodal-confirmed, verified on-system 2026-06-20 per D-VERIFY-BIND-BEFORE-ANNOUNCING; 4B is the floor for non-M5 deployers). It composes with TFBthumb's existing capture layer (capture_clip + retina settle + atomic_write + dispatch_browser_type) and the patient registry (new event kinds: pixel_rag_ingested / pixel_rag_query / pixel_rag_reembedded / pixel_rag_invalidated). The net architecture: ~6,000 lines of TFBthumb substrate reused, ~1,200 lines of new glue (tile slicer + vision embedder + pixel store + ingest pipeline + retrieval+VLM + CLI + FastAPI on 127.0.0.1:8772). 4 build phases (smallest E2E loop → patient registry composition → PDF + journey-shaped ingest + 2-stage retrieval → FastAPI + gates + reviewer packet). NOT shipped: the build packet was handed to the outside developer 2026-06-20; the architecture is locked.
ANSWERING DEV'S H1 FINDING: the 130/0 number is from harness.py against a SYNTHETIC randomized-latency page, NOT a Wikipedia journey (my earlier docs conflated the customer-facing demo with the internal benchmark; corrected per D-VERIFY-BIND-BEFORE-ANNOUNCING). Both sides uncalibrated; no registry involved (the registry didn't exist at v0.2.2). The number is real and reproducible (canonical_harness_n200_v0_2_2.log.txt).
One registry, two consumers, one patient_id per URL: the Cherry welcome page gets BOTH visual-regression verification (TFBthumb) AND pixel-grounded Q&A (TFBwitness) under the same patient.
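TFBwitness is not shipped, so this is only a sketch of two steps named above: slicing a render into tiles and ranking tiles for a query. It assumes toy vectors in place of real ColPali embeddings and skips LanceDB entirely; the scoring is ColBERT-style late interaction (MaxSim), the scheme ColPali uses, and every function name here is hypothetical.

```python
def slice_tiles(width: int, height: int, tile: int):
    """Row-major (x, y, w, h) rects covering a width x height render; edge tiles are clipped."""
    return [
        (x, y, min(tile, width - x), min(tile, height - y))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, tile_patches):
    # Late interaction: each query-token vector takes its best dot product
    # against any patch vector in the tile; the per-token maxima are summed.
    return sum(max(dot(q, p) for p in tile_patches) for q in query_tokens)


def top_k_tiles(query_tokens, tiles, k):
    """Indices of the k highest-scoring tiles, best first."""
    order = sorted(range(len(tiles)), key=lambda i: maxsim(query_tokens, tiles[i]), reverse=True)
    return order[:k]
```

The retrieved tile indices map back to slice_tiles rects, so the crops handed to the VLM are exact regions of the rendered page.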

v0.3.22 (Heal 5: stack closeout). The patient registry, the architectural payoff that closes the v0.3.22 5-heal stack. A new tfb_patient_registry.py module: an append-only JSONL event store (register / verify_result / calibrate_result) with last-write-wins aggregation. A PatientRecord pins URL + affordance + engine matrix + viewport set + tolerance preset + named masks + calibration receipt; closed-enum validation runs at register time (patient_id regex, EngineLabel.parse, ViewportLabel.parse, tfb_visual.TOLERANCES) per D-FAIL-FAST-ON-REJECTED-INPUT. CLI subcommands: register / list / status / verify / calibrate. verify reconstructs the Inspector command from the registered contract (--browser per engine, --viewport per size, --mask per name, --check NAME visual_match TOLERANCE) and dispatches it as a subprocess; the verdict appends a verify_result event with passed/rc/stdout_tail. calibrate dispatches calibrate_tolerance.calibrate() directly and stores the receipt; --apply carries the recommendation into the tolerance_preset field.
Dr. LLM's patient-registry pattern extends one architectural layer outward: TFBthumb becomes a peer of Dr. LLM / sandbox inspector / vision_harness / model_identity_gate. Cherry's welcome page, Jonathan's dashboard, the receipts / harness / org-chart pages, and future conductor2 status pages each register once with their calibrated shape, and the registry carries the contract forward. The substrate's wisdom compounds at the patient layer, not just the tool layer. New TFB surfaces join the doctrine by REGISTERING, not by re-deriving.
8 / 8 SUB-E wiring gates green (self-test, register/list/status round-trip, garbage rejection, verify PASS captured + verify FAIL captured after color flip, calibrate --apply updates tolerance, append-only history preserved, --registry-path + $TFB_PATIENT_REGISTRY honored) + 6 / 6 SUB-D + 6 / 6 SUB-C + 8 / 8 SUB-B + 6 / 6 SUB-A + 6 / 6 visual_match + 24 / 24 console-capture + 5 / 5 v0.3.15 inspector = 69 sub-gates regression-clean.
The v0.3.22 stack closed an architectural arc: substrate unification (foundation) → multi-engine fan-out (capability) → named masks (the TFB-way variant of a Playwright move) → calibrated tolerance (doctrine evolution via data) → patient registry (substrate wisdom compounds across organs). Five heals, ordered, each double-layering the next.
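A minimal sketch of the registry's two core moves, assuming hypothetical field names (url, engines, viewports, masks, affordance, tolerance_preset) for the PatientRecord contract and the un-namespaced v0.3.22 event kinds: append events to a JSONL file and fold them so the latest event wins, then rebuild the Inspector argv from the registered contract. Validation and the calibrate path are omitted.

```python
import json
import os


def append_event(path: str, event: dict) -> None:
    # Append-only: one JSON object per line; earlier lines are never rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())


def aggregate(path: str) -> dict:
    """Fold the event log: the latest register / verify_result per patient wins."""
    patients: dict = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            ev = json.loads(line)
            slot = patients.setdefault(ev["patient_id"], {"record": None, "last_verdict": None})
            if ev["kind"] == "register":
                slot["record"] = ev["record"]
            elif ev["kind"] == "verify_result":
                slot["last_verdict"] = ev["passed"]
    return patients


def inspector_argv(record: dict) -> list:
    # Rebuild the Inspector invocation from the registered contract.
    argv = ["python3", "tfbthumb_inspect.py", record["url"]]
    for engine in record["engines"]:
        argv += ["--browser", engine]
    for viewport in record["viewports"]:
        argv += ["--viewport", viewport]
    for mask in record.get("masks", []):
        argv += ["--mask", mask]
    argv += ["--check", record["affordance"], "visual_match", record["tolerance_preset"]]
    return argv
```

The verify step would then hand inspector_argv(record) to subprocess.run and append a verify_result event carrying the return code.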

v0.3.22 (Heal 4): calibrate_tolerance.py measures the empirical noise floor and returns the tightest doctrine preset that fits, OR a doctrine candidate when none fits. It closes the long-tail tolerance gap WITHOUT going Playwright-numeric-shaped.
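A sketch of the calibration rule, assuming the noise floor is the worst differing-pixel fraction across repeat renders of an unchanged page. The preset names other than "exact" and all the numeric limits are invented for illustration; the real five doctrine presets are not listed in this changelog.

```python
# Hypothetical presets, tightest first: (name, max fraction of differing pixels).
# Only "exact" is named in this changelog; the other names and limits are invented.
PRESETS = [("exact", 0.0), ("tight", 0.001), ("standard", 0.01), ("loose", 0.05)]


def calibrate(noise_samples, presets=PRESETS):
    """Pick the tightest preset that tolerates the observed noise floor."""
    if not noise_samples:
        raise ValueError("need at least one repeat-render diff to measure noise")
    floor = max(noise_samples)  # worst diff seen between renders of an unchanged page
    for name, limit in presets:
        if floor <= limit:
            return {"preset": name, "noise_floor": floor}
    # Nothing fits: surface a doctrine candidate instead of inventing a number.
    return {"preset": None, "noise_floor": floor, "doctrine_candidate": True}
```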

v0.3.22 (Heal 3): named masks. --mask "live timestamp" resolves via retina at journey time; diff_pixels applies masks symmetrically; a missing mask is reported INCONCLUSIVE, by name. Playwright's mask takes selector-engineered locators; ours is name-bound, so the named-affordance doctrine extends to masking.
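A sketch of symmetric, name-resolved masking, assuming equal-size grayscale images given as nested lists and a masks dict mapping each mask name to the rect retina resolved for it (or None when it could not). diff_with_masks and apply_mask are hypothetical names, not the real diff_pixels signature.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, w, h
Image = List[List[int]]           # rows of grayscale pixel values


def apply_mask(img: Image, rects) -> Image:
    out = [row[:] for row in img]
    for x, y, w, h in rects:
        for yy in range(max(y, 0), min(y + h, len(out))):
            for xx in range(max(x, 0), min(x + w, len(out[yy]))):
                out[yy][xx] = 0  # masked pixels compare equal on both sides
    return out


def diff_with_masks(baseline: Image, actual: Image, masks: Dict[str, Optional[Rect]]) -> dict:
    # A mask name retina could not resolve makes the verdict INCONCLUSIVE, by name.
    missing = sorted(name for name, rect in masks.items() if rect is None)
    if missing:
        return {"verdict": "INCONCLUSIVE", "reason": "mask_not_found", "masks": missing}
    rects = list(masks.values())
    # Symmetric: the same rects are blanked in BOTH images before comparing.
    a, b = apply_mask(baseline, rects), apply_mask(actual, rects)
    changed = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return {"verdict": "PASS" if changed == 0 else "FAIL", "changed_pixels": changed}
```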

v0.3.22 (Heal 2): browser as a journey verb; multi-engine fan-out lands. The EngineLabel closed-enum primitive (chromium | firefox | webkit) and the dispatch_browser_type helper join ViewportLabel in tfb_substrate.py. The --browser flag wires through verify_self (single) and the Inspector (repeatable; cartesian product over engine × viewport). baseline_path keys {engine}/{viewport}/{name}.png, so each engine's pixels live in their own tree.
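A sketch of the closed-enum engine primitive and the engine × viewport fan-out, assuming a str-valued Enum stands in for the real EngineLabel. The dispatch_browser_type body relies only on Playwright exposing chromium, firefox and webkit attributes on its playwright object; fan_out is a hypothetical name.

```python
import itertools
from enum import Enum
from pathlib import PurePosixPath


class EngineLabel(str, Enum):
    CHROMIUM = "chromium"
    FIREFOX = "firefox"
    WEBKIT = "webkit"

    @classmethod
    def parse(cls, raw: str) -> "EngineLabel":
        try:
            return cls(raw.strip().lower())
        except ValueError:
            allowed = " | ".join(e.value for e in cls)
            raise ValueError(f"unknown engine {raw!r}; expected {allowed}") from None


def dispatch_browser_type(playwright, engine: EngineLabel):
    # Playwright exposes one BrowserType per engine: .chromium, .firefox, .webkit.
    return getattr(playwright, engine.value)


def fan_out(engines, viewports):
    """Cartesian product: one run per (engine, viewport) pair."""
    return list(itertools.product(engines, viewports))


def baseline_path(root: str, engine: EngineLabel, viewport: str, name: str) -> PurePosixPath:
    # Each engine's baselines live in their own tree.
    return PurePosixPath(root) / engine.value / viewport / f"{name}.png"
```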

v0.3.22 (Heal 1): substrate unification. The new tfb_substrate.py owns ViewportLabel (closed-enum label generator + parser, round-trip invariant) and the pid-unique atomic_write_bytes; verify_self and the Inspector both route through it (24 inline viewport interpolations swept, 1 inline parse on each side, 4 atomic-write sites now pid-unique). Closes receipt §9 gaps #2 + #3.
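A minimal sketch of the two primitives, assuming a WIDTHxHEIGHT grammar (as in --viewport 390x844) rather than the real closed set of labels. The round-trip invariant is ViewportLabel.parse(str(v)) == v; the atomic write puts the pid in the temp name and publishes with os.replace, which is atomic on POSIX. Pid-uniqueness only separates processes: two threads in one process would still share a temp name.

```python
import os
import re
from dataclasses import dataclass

_VIEWPORT = re.compile(r"^([1-9]\d*)x([1-9]\d*)$")


@dataclass(frozen=True)
class ViewportLabel:
    width: int
    height: int

    def __str__(self) -> str:
        return f"{self.width}x{self.height}"

    @classmethod
    def parse(cls, raw: str) -> "ViewportLabel":
        m = _VIEWPORT.match(raw.strip())
        if m is None:
            raise ValueError(f"bad viewport {raw!r}; expected WIDTHxHEIGHT, e.g. 390x844")
        return cls(int(m.group(1)), int(m.group(2)))


def atomic_write_bytes(path: str, data: bytes) -> None:
    # The pid in the temp name keeps two processes writing the same target
    # from sharing (and corrupting) one temp file.
    tmp = f"{path}.{os.getpid()}.tmp"
    try:
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # readers see the old file or the new one, never half
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)
```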

v0.3.21: the pixel lens grown INTO TFBthumb. A visual_match gate predicate + a --snapshot-visual journey verb + 5 doctrine-named tolerance presets replace Playwright's toHaveScreenshot at the substrate level, so customers don't have to pick between TFBthumb-for-state and Playwright-for-pixels. 6 / 6 SUB-Y gates + 24 / 24 console-capture + 28 prior gates regression-clean. Adversarial review caught 7 blockers before ship (brightness chromatic predicate, motion doctrine prose, alpha-channel RGBA walk, exact preset reachable through real Chromium, size-mismatch closed-enum kind + NaN sentinels, display-layer visual_match formatter at 4 sites, perceptual JND filter for Playwright threshold=0.2 parity); all healed, gates re-green, no regression.
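A sketch of two of the blocker heals listed above: an RGBA walk that counts alpha-only changes, and a closed-set result kind with a NaN sentinel when sizes differ. compare_rgba and VisualResult are hypothetical names, and the per-channel tolerance is a plain absolute difference, not the perceptual JND filter used for Playwright threshold=0.2 parity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualResult:
    kind: str          # closed set: "match" | "mismatch" | "size_mismatch"
    diff_ratio: float  # NaN sentinel when sizes differ: no ratio is meaningful


def compare_rgba(expected, actual, size_expected, size_actual, max_ratio, channel_tol=0):
    """expected/actual: row-major lists of (r, g, b, a) tuples; sizes are (w, h)."""
    if size_expected != size_actual or len(expected) != len(actual):
        return VisualResult("size_mismatch", math.nan)
    differing = 0
    for pe, pa in zip(expected, actual):
        # Walk all four channels: an alpha-only change still counts.
        if any(abs(ce - ca) > channel_tol for ce, ca in zip(pe, pa)):
            differing += 1
    ratio = differing / len(expected) if expected else 0.0
    return VisualResult("match" if ratio <= max_ratio else "mismatch", ratio)
```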

The way you'd write it without TFBthumb

Wikipedia changes the DOM under your feet. Its Vue-based Codex search component swaps the bare <input> for a wrapped variant the instant focus lands; the same name resolves to two different inputs (one visible, one hidden for the responsive layout); the success state lives in <title> and <h1>, not in an aria-live region. The same shape shows up across Material UI, Radix, Chakra, and other modern design-system pages. Here is what those four problems cost you, side by side.

Playwright (what you write)

~30 lines of selector engineering, timing logic, navigation waits, and assertion. Breaks the next time Wikipedia swaps a class name.

import asyncio
from playwright.async_api import async_playwright

async def search_wiki(query):
    async with async_playwright() as p:
        b = await p.chromium.launch()
        try:
            page = await b.new_page()
            await page.goto("https://en.wikipedia.org/")
            await page.wait_for_load_state("networkidle")
            # Which input? Two have aria-label "Search Wikipedia".
            # The visible one is the first; the hidden duplicate
            # would error on .fill(). Hope nothing re-renders.
            box = page.locator(
                'input[aria-label="Search Wikipedia"]'
            ).first
            await box.click()
            await asyncio.sleep(0.05)  # Codex swap on focus
            # The box you held is now an orphaned node. Re-resolve.
            box = page.locator('.cdx-text-input__input').first
            await box.fill(query)
            await page.get_by_role("button",
                name="Search").click()
            await page.wait_for_url("**/wiki/**")
            # Did it work? Parse the title yourself.
            title = await page.title()
            return query.lower() in title.lower()
        finally:
            # close() must live in finally: after the return it never runs.
            await b.close()

if __name__ == "__main__":
    print(asyncio.run(search_wiki("Pizza")))

TFBthumb (what you call)

One API call. Selectors are accessible names: what humans see. The substrate handles the Codex swap, the hidden duplicate, the navigation wait, and the title-level success observation. The receipt comes back signed and structured.

curl -X POST https://projecttfb.com/api/v1/runs \
  -H "Authorization: Bearer tfb_beta_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://en.wikipedia.org/",
    "fields": {"Search Wikipedia": "Pizza"},
    "submit_target": "Search",
    "success_text": "Pizza"
  }'

# Returns:
# {"run_id": "run_...", "status": "queued", ...}
#
# Poll GET /runs/{run_id}:
# {
#   "status": "done",
#   "final_summary": "observed 'Pizza'",
#   "trace": [
#     {"outcome":"acted", "action":"type",
#      "target":"Search Wikipedia"},
#     {"outcome":"acted", "action":"click",
#      "target":"Search"},
#     {"outcome":"done", "detail":"observed 'Pizza'"}
#   ],
#   "ledger_receipts": [...]
# }
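The same call from Python using only the standard library, as a sketch: the base URL, request body and run_id / status / trace fields come from the example above, while the poll interval, timeout, and every terminal status except "done" are assumptions. The helper names are hypothetical.

```python
import json
import time
import urllib.request

API_BASE = "https://projecttfb.com/api/v1"
# Assumption: only "queued" and "done" appear above; the other terminal
# status names here are guesses.
TERMINAL_STATUSES = {"done", "failed", "refused"}


def build_run_payload(url, fields, submit_target, success_text):
    return {"url": url, "fields": fields,
            "submit_target": submit_target, "success_text": success_text}


def _call(method, path, token, body=None):
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        API_BASE + path, data=data, method=method,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def run_and_wait(token, payload, poll_s=2.0, timeout_s=120.0):
    run_id = _call("POST", "/runs", token, payload)["run_id"]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = _call("GET", f"/runs/{run_id}", token)
        if state.get("status") in TERMINAL_STATUSES:
            return state
        time.sleep(poll_s)
    raise TimeoutError(f"run {run_id} still not finished after {timeout_s}s")


def acted_targets(state):
    """Targets the agent actually acted on, in trace order."""
    return [s["target"] for s in state.get("trace", []) if s.get("outcome") == "acted"]
```

For the Wikipedia example, run_and_wait(token, build_run_payload("https://en.wikipedia.org/", {"Search Wikipedia": "Pizza"}, "Search", "Pizza")) would return the final run state, and acted_targets lists "Search Wikipedia" then "Search".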

Try it right now

Paste a URL, describe what you want done, click Run. The agent runs in a sandboxed headless Chromium on our server. The playground tier refuses every consequential action (submit, send, delete, pay) at the wrapper, exactly the boundary you'd want for an evaluation. The beta opens the human-in-the-loop gate when you're ready to actually submit.


Audit any URL end-to-end

The Inspector runs seven lenses across three viewports (mobile, tablet, desktop) and emits one structured report. Each finding names the affordance, the lens that surfaced it, the severity (BLOCKER, FAIL, WARN, NOTE), the evidence, and a one-line suggested heal. Real bugs from the canonical run set, none cherry-picked:

settling

Diagnoses pages that never quiesce. Caught NPR.org failing to settle on all three viewports under five seconds of Stripe and ad-pixel noise; the remaining lenses correctly skipped the unsettled page.

tap_target

Sub-44 affordances at mobile per Apple HIG, sub-48 per Material. Caught a 35x21 "Go" button and a 101x29 search input on weather.gov, plus 108 sub-spec anchors on Hacker News in a single run.
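A sketch of the size rule, assuming element boxes measured in CSS pixels at the mobile viewport are compared directly against Apple's 44 pt and Material's 48 dp minimums. Affordance and tap_target_findings are hypothetical names.

```python
from dataclasses import dataclass

APPLE_HIG_MIN = 44  # pt, Apple Human Interface Guidelines
MATERIAL_MIN = 48   # dp, Material Design


@dataclass(frozen=True)
class Affordance:
    name: str
    width: float   # CSS px at the mobile viewport (assumed to approximate pt / dp)
    height: float


def tap_target_findings(affordances, minimum=APPLE_HIG_MIN):
    """One finding per affordance whose width OR height is under the minimum."""
    return [
        f"{a.name!r} is {a.width:g}x{a.height:g}, under {minimum}x{minimum}"
        for a in affordances
        if a.width < minimum or a.height < minimum
    ]
```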

reach

Center hit-test refused on a visible affordance. Caught a "No Thanks" survey button on usa.gov occluded behind another modal, and tiny "+" expand icons hidden under sticky elements on craigslist.
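A geometric model of the center hit-test, as a sketch: in a browser the equivalent probe is document.elementFromPoint at the element's center, followed by a check that the hit is the element or one of its descendants. Here boxes with an explicit paint order stand in for the DOM, boxes are assumed to have positive size, and Box / occluder_at_center are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    name: str
    x: float
    y: float
    w: float
    h: float
    z: int  # paint order: higher is on top

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def occluder_at_center(target: Box, scene) -> Optional[str]:
    """None if the target wins its own center hit-test, else the topmost box's name."""
    cx, cy = target.x + target.w / 2, target.y + target.h / 2
    hits = [b for b in (*scene, target) if b.contains(cx, cy)]
    top = max(hits, key=lambda b: b.z)
    return None if top == target else top.name
```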

dead_button

Clicks each non-consequential button or link and observes four channels (URL, world map, aria-live, DOM mutation). Catches dead-href links (href="#", javascript:void(0)) too. Refuses any affordance whose name matches a consequential intent (Send, Pay, Submit, Sign in, Delete).
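A sketch of the lens's decision order, assuming each of the four channels can be snapshotted before and after a click. Consequential names are refused before anything is clicked; a click that moves any channel is live; when nothing moves, a "#" or javascript:void(0) href is reported as a dead href. That last ordering is an assumption: a "#" link whose JavaScript handler does move a channel is not treated as dead. Channels and classify_click are hypothetical names.

```python
import re
from dataclasses import dataclass

# Names that signal a consequential intent: never clicked by the lens.
CONSEQUENTIAL = re.compile(r"\b(send|pay|submit|sign in|delete)\b", re.IGNORECASE)
# href="#" or javascript:void(0): links that navigate nowhere by construction.
DEAD_HREF = re.compile(r"^\s*(#|javascript:\s*void\s*\(\s*0\s*\)\s*;?)\s*$", re.IGNORECASE)


@dataclass(frozen=True)
class Channels:
    url: str
    world_map: frozenset  # affordance names currently perceived
    aria_live: tuple      # text of aria-live regions
    dom_mutations: int    # mutation count since load


def classify_click(name, href, before: Channels, after: Channels) -> str:
    if CONSEQUENTIAL.search(name):
        return "REFUSED"  # decided before any click happens
    if before != after:
        return "LIVE"     # at least one of the four channels moved
    return "DEAD_HREF" if href is not None and DEAD_HREF.match(href) else "DEAD"
```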

name_warnings

Unnamed actionables fail (screen-reader hostile). Aria-stuffing fails (hidden label disagrees with visible text). Descriptive prose stays a NOTE so good a11y practice never gets flagged as a bug.
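A sketch of the three outcomes, assuming the accessible name and visible text are already extracted. The aria-stuffing rule follows the WCAG 2.5.3 "Label in Name" idea that the accessible name should contain the visible label; the eight-word threshold for descriptive prose is invented for illustration.

```python
import re


def _norm(s: str) -> str:
    return re.sub(r"\s+", " ", s or "").strip().lower()


def name_finding(visible_text: str, accessible_name: str, prose_words: int = 8):
    """Severity for one actionable's name (thresholds are illustrative)."""
    name, visible = _norm(accessible_name), _norm(visible_text)
    if not name:
        return ("FAIL", "unnamed actionable")
    if visible and visible not in name:
        # The hidden label disagrees with what sighted users read.
        return ("FAIL", "aria-stuffing: accessible name omits the visible text")
    if len(name.split()) >= prose_words:
        # Long but honest names are good practice: surface, don't penalize.
        return ("NOTE", "descriptive name")
    return ("OK", "")
```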

viewport_compare

Affordances at desktop that are gone at mobile. Responsive swaps (nav collapses to hamburger menu) are NOTEs. Pure vanish with no replacement is a WARN: real loss of functionality on a smaller screen.
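A sketch of the NOTE-versus-WARN split, assuming a crude heuristic: if any mobile affordance is named like a menu, affordances missing at mobile are treated as a responsive swap. The real lens may verify the swap more directly; MENU_NAMES and viewport_compare are hypothetical names.

```python
# Assumption: these names mark a collapsed mobile nav (hamburger).
MENU_NAMES = frozenset({"menu", "main menu", "open menu", "navigation"})


def viewport_compare(desktop_names, mobile_names, menu_names=MENU_NAMES):
    """Findings for affordances present at desktop but absent at mobile."""
    has_menu = any(n.strip().lower() in menu_names for n in mobile_names)
    findings = []
    for name in sorted(set(desktop_names) - set(mobile_names)):
        if has_menu:
            findings.append(("NOTE", name, "responsive swap: likely behind the mobile menu"))
        else:
            findings.append(("WARN", name, "gone at mobile with no replacement"))
    return findings
```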

canvas_disclosure

The substrate is honest: DOM perception is blind to <canvas> contents. Where canvas is present, the Inspector tells you to bring a pixel lens. Silent on pages that have none.

One CLI call. One report.

python3 tfbthumb_inspect.py https://your-site.com
#  -> 7 lenses x 3 viewports, perf + canvas pixel metrics, evidence screenshots
#  -> structured findings: severity / lens / affordance / observation / heal
#  -> exit 0 (clean) | 4 (BLOCKER) | 5 (journey step failed) | 6 (gate failed)

# Restrict to a subset:
python3 tfbthumb_inspect.py https://your-site.com --lens tap_target,reach,dead_button

# Mobile only:
python3 tfbthumb_inspect.py https://your-site.com --viewport 390x844

# JSON for CI:
python3 tfbthumb_inspect.py https://your-site.com --json > report.json
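A CI step can key off the exit codes listed above. This sketch assumes tfbthumb_inspect.py sits in the working directory and that --json writes the whole report to stdout (as the redirect above suggests); run_inspector and describe_exit are hypothetical helper names.

```python
import json
import subprocess
import sys

# Exit codes as documented above.
EXIT_MEANING = {0: "clean", 4: "BLOCKER finding", 5: "journey step failed", 6: "gate failed"}


def describe_exit(code: int) -> str:
    return EXIT_MEANING.get(code, f"unexpected exit code {code}")


def run_inspector(url: str, *extra_args: str):
    """Run the Inspector with --json; return (exit_code, parsed_report_or_None)."""
    proc = subprocess.run(
        [sys.executable, "tfbthumb_inspect.py", url, "--json", *extra_args],
        capture_output=True, text=True,
    )
    report = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, report
```

A CI job would call run_inspector(url), print describe_exit(code), and then exit with that same code so that 4, 5 and 6 fail the step.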

The lantern, specifically, lit on a dark scene, on mobile and desktop, gated end-to-end

v0.3.3 drove past the title screen. v0.3.4 healed the doctrine-pivot footnote: the customer can now say "the lantern, specifically," not "the canvas has any bright spot somewhere." --canvas-roi declares a named rect inside the canvas; --check canvas ... in NAME reads metrics from that rect alone. Plus reachable-any for the responsive nav pattern (Main Menu is a button on desktop and a hamburger on mobile, and both count as honestly reachable). The Lanternlight-shape customer writes one invocation and gets one exit code:

python3 tfbthumb_inspect.py http://localhost:3001/ \
    --viewport 390x844 \
    --click "begin" \
    --wait-for "Wonder Book" reachable \
    --canvas-roi lantern 280,140,60,60 \
    --canvas-roi sky 0,0,400,100 \
    --gate "milestone-0.1" \
    --check "Wonder Book" reachable \
    --check "Main Menu" reachable-any \
    --check perf fps>=55 \
    --check canvas brightness>=0.6 in lantern \
    --check canvas brightspot=present in lantern \
    --check canvas brightness<=0.3 in sky
#
#  gate 'milestone-0.1': PASS
#    [ok] 'Wonder Book' reachable             measured=reachable_on_all_viewports
#    [ok] 'Main Menu' reachable-any           measured=reachable_on 2/3_viewports
#    [ok] perf fps>=55                        measured=58.2
#    [ok] canvas brightness>=0.6 in lantern   measured=0.71
#    [ok] canvas brightspot=present in lantern measured=present
#    [ok] canvas brightness<=0.3 in sky       measured=0.18
#  exit 0
#
#  When any check fails, exit 6 with the measured value AND the ROI on
#  the failed line. exit 5 if a --click step missed (silent skips are
#  gone). fail-closed per D-FAIL-CLOSED: if perf couldn't measure because
#  the page never settled, the gate says INCONCLUSIVE (page_never_settled),
#  not a silent null.
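A sketch of how a numeric check over a named ROI could be evaluated, assuming brightness is mean Rec. 709 luma scaled to [0, 1] (the Inspector's actual brightness definition isn't stated here) and covering only the numeric comparison form, not brightspot=present or reachable. A missing measurement comes back INCONCLUSIVE with a named reason rather than a silent null; roi_brightness and evaluate are hypothetical names.

```python
import operator
import re

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt, "=": operator.eq}
# Numeric form only, e.g. "brightness>=0.6" or "fps>=55". Two-char operators first.
_CHECK = re.compile(r"^(\w+)(>=|<=|>|<|=)(\d+(?:\.\d+)?)$")


def roi_brightness(frame, roi):
    """Mean Rec. 709 luma in [0, 1] over roi=(x, y, w, h); frame is rows of (r, g, b) 0-255."""
    x, y, w, h = roi
    lumas = [(0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0
             for row in frame[y:y + h] for (r, g, b) in row[x:x + w]]
    return sum(lumas) / len(lumas) if lumas else None  # None: ROI fell outside the frame


def evaluate(expr: str, measured):
    m = _CHECK.match(expr.replace(" ", ""))
    if m is None:
        raise ValueError(f"unsupported check {expr!r}")
    metric, op, threshold = m.groups()
    if measured is None:
        # Fail closed, but name the reason instead of passing or failing silently.
        return ("INCONCLUSIVE", f"{metric} not measured")
    ok = _OPS[op](measured, float(threshold))
    return ("PASS" if ok else "FAIL", f"{expr} measured={measured:.2f}")
```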

v0.3.3 receipts document the journey-on-Inspector heal plus WebGL canvas decode validated end-to-end (Pillow correctly reads gl.clearColor swapchain frames), NPR-shape fail-closed naming the inconclusive reason, the new 2-token --check grammar, per-finding screenshot_capped traceability, and two conflict-flag warnings. v0.3.1 calibration receipts: HEALS_v0_3_0_to_v0_3_1.md. Same canonical URL set across every version: inspector_run_v0_3_0_*.json through inspector_run_v0_3_3_*.json, no cherry-picked results.

What you actually use it for

Six shapes the same substrate solves. Drive-mode for agents and tests, audit-mode for reviewers.

QA on real apps

Replace flaky Playwright suites that re-break every time the design system swaps a class name. Selectors are accessible names; the substrate handles re-renders, hidden duplicates, and title-level success observation.

Pre-launch website audits

Run the Inspector on the staging URL before a release. Catch the dead button, the 35x21 mobile tap target, the modal that occludes another modal, the page that never settles because of a third-party tracker. Real bugs, not Lighthouse noise.

LLM browser agents

Give your model a structured world map instead of screenshots. 7.7× fewer tokens per decision and a security ceiling that refuses anything consequential without a human-minted single-use token.

Accessibility reviews

Aria-stuffing flagged where it actually hurts. Descriptive link prose passes through as a NOTE so screen-reader-friendly long names don't read as bugs. Sub-44 tap targets, unnamed actionables, and reach failures are sorted into actionable severities for a triage list.

Lead enrichment and scrape

Pull values off pages without writing CSS selectors. The substrate addresses elements the way a human would describe them: "the Email field," "the Subscribe button."

Compliance and audit

Every action emits a hash-chained, fsync'd ledger receipt, so a single-byte tamper with any past receipt is detectable. The Ceiling refuses consequential dispatch without a single-use Ed25519-signed token.
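A minimal sketch of a hash-chained, fsync'd receipt log, assuming SHA-256 over the previous hash plus canonical JSON; append_receipt and verify_chain are hypothetical names, and this is not the product's ledger format. verify_chain returns the first line number that fails, or None. A chain alone catches an edit to a past receipt; stopping someone able to rewrite every later entry as well would also need an external anchor, which this sketch does not include.

```python
import hashlib
import json
import os

GENESIS = "0" * 64


def _digest(prev_hash: str, payload) -> str:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_receipt(path: str, payload: dict) -> str:
    prev = GENESIS
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        if lines:
            prev = json.loads(lines[-1])["hash"]
    entry = {"prev": prev, "payload": payload, "hash": _digest(prev, payload)}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())  # durable before the action is reported done
    return entry["hash"]


def verify_chain(path: str):
    """Return the 1-based line number of the first broken link, or None if intact."""
    prev = GENESIS
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                return lineno
            if entry.get("prev") != prev or entry.get("hash") != _digest(prev, entry.get("payload")):
                return lineno
            prev = entry["hash"]
    return None
```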

How it actually works

Three primitives, one loop.

Continuous perception

The DOM, network traffic, animations, and accessibility tree fuse into one stream. The substrate addresses every actionable element by stable id; the id survives re-renders via signature inheritance.
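One possible reading of signature inheritance, sketched under assumptions: an element's signature is (role, accessible name, ordinal among identical pairs), and a new node whose signature was seen before inherits the old id. IdAssigner is hypothetical, and the real substrate's signature likely includes more than this; one known weakness of the ordinal is that it shifts when an earlier duplicate disappears.

```python
import itertools
from collections import Counter


class IdAssigner:
    """Hypothetical: keeps element ids stable across re-renders by signature."""

    def __init__(self) -> None:
        self._ids: dict = {}
        self._next = itertools.count(1)

    def assign(self, elements):
        """elements: (role, accessible_name) pairs in document order -> ids."""
        seen: Counter = Counter()
        ids = []
        for role, name in elements:
            signature = (role, name, seen[(role, name)])  # ordinal splits duplicates
            seen[(role, name)] += 1
            if signature not in self._ids:
                self._ids[signature] = f"e{next(self._next)}"  # first sighting: mint
            ids.append(self._ids[signature])  # later sightings inherit
        return ids
```

After a Codex-style swap replaces the search input node, the new node still reads as ("textbox", "Search Wikipedia") and therefore keeps its id.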

Settle by observation

The substrate never reports "ready" until the page has actually quiesced: no fixed sleeps, no networkidle guesses. The same engine catches CSS animations, async route transitions, and focus-induced component swaps.
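A sketch of settle-by-observation as a quiet window, assuming the perception stream can report the time of its most recent event. The loop polls, but it never treats elapsed time alone as readiness: it returns only once quiet_s has passed with no activity, or names page_never_settled at the timeout. The quiet-window and timeout defaults are illustrative, and wait_for_settle is a hypothetical name.

```python
import time


def wait_for_settle(last_activity, quiet_s=0.5, timeout_s=5.0, poll_s=0.05,
                    clock=time.monotonic, sleep=time.sleep):
    """Settled once nothing has happened for quiet_s; named INCONCLUSIVE at timeout.

    last_activity() returns the clock time of the most recent observed event
    (DOM mutation, in-flight request, running animation, focus-driven swap).
    """
    start = clock()
    while True:
        now = clock()
        if now - last_activity() >= quiet_s:
            return ("settled", now - start)
        if now - start >= timeout_s:
            return ("INCONCLUSIVE", "page_never_settled")
        sleep(poll_s)
```

The clock and sleep parameters exist so the loop can be driven by a fake clock in tests; a page whose trackers fire continuously never opens a quiet window and ends as page_never_settled.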

The Ceiling

Every consequential action is refused without a single-use, action-bound, Ed25519-signed human token. The model can't self-certify. The ledger detects tampering. Mislabeled actions get caught at the wire.
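A sketch of the token checks, with one plainly swapped-in technique: HMAC-SHA256 from the standard library replaces Ed25519 so the example has no dependencies. That swap matters: with HMAC, anything able to verify can also mint, which is exactly the self-certification hole an Ed25519 key pair closes (the approver holds the private key; the Ceiling holds only the public key). Single-use nonces, action binding and expiry carry over unchanged. mint_token and Ceiling are hypothetical names.

```python
import hashlib
import hmac
import json
import secrets
import time


def _sign(key: bytes, claims: dict) -> str:
    body = json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def mint_token(key: bytes, action: dict, ttl_s: float = 300.0, now=None) -> dict:
    """Human-approver side: bind a token to exactly one action."""
    issued = time.time() if now is None else now
    claims = {"action": action, "nonce": secrets.token_hex(16), "exp": issued + ttl_s}
    return {"claims": claims, "sig": _sign(key, claims)}


class Ceiling:
    def __init__(self, key: bytes):
        self._key = key
        self._spent = set()  # nonces already used

    def authorize(self, token: dict, action: dict, now=None) -> bool:
        claims = token["claims"]
        if not hmac.compare_digest(_sign(self._key, claims), token["sig"]):
            return False  # tampered or forged
        if claims["action"] != action:
            return False  # action-bound: a "Send" token cannot authorize "Delete"
        if (time.time() if now is None else now) > claims["exp"]:
            return False  # expired
        if claims["nonce"] in self._spent:
            return False  # single-use
        self._spent.add(claims["nonce"])
        return True
```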

What the playground will not do

The playground tier refuses every consequential intent at the wrapper, before it ever reaches the Ceiling. Specifically: no submitting forms, no clicking Send/Pay/Subscribe/Delete-classified buttons, no POST/PUT/PATCH/DELETE requests at the wire. If you want a real end-to-end task with a human-in-the-loop approver, that is the beta.
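A sketch of the wrapper-level refusal, assuming intent is read from the accessible name and from the HTTP method. guard_route is shaped as a Playwright route handler (route.request.method, route.continue_() and route.abort() are Playwright's own), registered with page.route("**/*", guard_route); the other names are hypothetical.

```python
import re

# Intent names the playground wrapper refuses (per the list above).
REFUSED_NAMES = re.compile(r"\b(submit|send|pay|subscribe|delete)\b", re.IGNORECASE)
MUTATING_METHODS = frozenset({"POST", "PUT", "PATCH", "DELETE"})


def allows_click(accessible_name: str) -> bool:
    return REFUSED_NAMES.search(accessible_name) is None


def allows_request(method: str) -> bool:
    return method.upper() not in MUTATING_METHODS


async def guard_route(route):
    # Refuse mutating requests at the wire; let reads through untouched.
    if allows_request(route.request.method):
        await route.continue_()
    else:
        await route.abort("blockedbyclient")
```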