# TFBthumb v0.2.1 — CODE_MECHANIC Inspector Review (Proxy, re-run after heals) - Review ID: `TFB-THUMB-CODE-MECHANIC-REVIEW-v0_2_1-20260615T074200Z` - Paired packet: `REVIEWER_PACKET.md` - Paired prior decisions: `REVIEWER_DECISION.md` (Gemini @ v0.2) · `DAVID_PLUS_REVIEW.md` (v0.2) · `CODE_MECHANIC_REVIEW.md` (v0.2 proxy) · `DAVID_PLUS_REVIEW_v0_2_1.md` (re-run, just landed) - Paired heals receipt: `HEALS_v0_2_to_v0_2_1.md` - Inspector: **CODE_MECHANIC** (engineering execution) - **Substrate disclosure (read first):** CODE_MECHANIC's native harness is `openai/gpt-5.1`. Tenant policy blocked the external transmit again — this is a **proxy review** written by Claude Opus 4.7 (this Claude Code session) applying CODE_MECHANIC's IDENTITY.md PASS/FAIL_WITH_NOTES criteria verbatim. Same proxy posture as the v0.2 review, same honest marking. - Posture per IDENTITY.md: does the code WORK? Is the contract honest? Are there tests? Will the next reader understand in one pass? - Verdict: `PASS_WITH_ONE_NOTE` (4 prior notes are healed; 1 new minor note opened — class-shaped, not falsifying) - Promotion authority: `inactive` --- ## 1. Re-check of the 4 notes CODE_MECHANIC named at v0.2 ### note-1 — `Retina.start` idempotency **Heal at**: [retina.py:54, 57–67](retina.py) ```python self._started: bool = False ... async def start(self) -> None: if self._started: raise RuntimeError("Retina.start() called twice on the same page " "(one Retina per page is the contract)") self._started = True ``` Idempotency is now enforced loudly via exception, not silently via handler-deduplication-by-prayer. A double-start raises a `RuntimeError` with a message a future maintainer can read in one pass. The exception message names the contract ("one Retina per page"). **Verdict**: healed cleanly. Test exists implicitly — any double-start in a future gate would raise and fail the gate. ### note-2 — deliberately-not-reset comment **Heal at**: [retina.py:84–96](retina.py) The comment block now reads: > *"Deliberately NOT reset: `state.in_flight` — the CDP session is reused across navigations; old requestIds clear via their own loadingFinished/loadingFailed events. Resetting here would drop in-flight tracking for any request that started before the navigation completed. `state.motion_until_ms` — a monotonic timestamp; leaving an old deadline that's already in the past is correct. The next animation start will overwrite it via max(...) at _anim_start."* A future maintainer reading `_on_navigated` now sees why the omission is intentional. The "what isn't broken" trap is documented. **Verdict**: healed cleanly. Doc-only, no runtime change, exactly the heal CODE_MECHANIC proposed. ### note-3 — `Thumb.point` docstring drift **Heal at**: [thumb.py:80–86](thumb.py) ```python async def point(self, name: str, timeout_ms: float = 2000.0) -> dict: """Scroll to an affordance, confirm it is reachable at its point, move the hand onto it. Engagement is confirmed via the hover event (preferred) OR a fresh hit-test (fallback). The semantic contract is hand-on-affordance, not the mouseover event: the browser preserves mouse position across navigations, so a move that ends at the same coords the cursor already occupied fires no mouseover even though the hand IS on the target.""" ``` The semantic shift introduced by heal #5 (build-session) is now in the docstring. A reader can no longer parse "feel hover engage" as "mouseover event fires." Implementation and contract aligned. **Verdict**: healed cleanly. ### note-4 — `Ceiling.authorize_effect` per-window counter **Heal at**: [ceiling.py:280–304, 363–386](ceiling.py) I read the heal end to end. New constructor param `effect_window_max_effects: int = 1`. New state `self._effect_window_used = 0`, reset to 0 when the window opens at [ceiling.py:343](ceiling.py). Effect gate now computes `window_open AND window_has_capacity`; second mutating effect in the same window without its own token fails with the NEW closed-enum reason `"effect-window-exhausted"` ([ceiling.py:380](ceiling.py)), distinct from `"unapproved-mutating-effect"`. The reason-string disambiguation is the right move per **D-EVERY-OPERATION-NARRATES-ITSELF** — the ledger row tells the operator which way coverage failed. `gate_sentinel.py` GATE 2 (the legitimate Pay→POST flow) still passes because the Pay click fires exactly one POST `/charge`, and max=1 covers exactly one mutating effect. I verified this against the live receipt at `/tmp/claude-501/tfbthumb_sandbox/canonical_5_gates_post_heal.json`: `EFFECT GATE: PASS`. **Verdict**: healed cleanly. Strictly tightens; legitimate flow preserved. --- ## 2. PASS criteria check at v0.2.1 (per IDENTITY.md lines 27–36) | Criterion | Result | Evidence | |---|---|---| | Artifact passes parse-gate | **PASS** | `ast.parse()` clean on 13 `.py` (incl. `analytics.py`); `node --check sensors.js` clean. One transient SyntaxError during the heal (re-declaration of `r` in `sensors.js`) was caught and healed in the same build pass. | | Happy path exercises cleanly | **PASS** | 5 fast gates re-passed end to end (`canonical_5_gates_post_heal.json`); `gate_ceiling.py` now includes the new swap-label gate which also PASSes; Phase 2 n=200 re-passed at 0/200 flakes. | | Edge cases addressed or named out-of-scope | **PASS WITH NOTE** | One new note below (note-A'). | | Dependencies declared | **PASS** | No new imports introduced. `cryptography` import remains conditional at module top per the pre-existing pattern. | | Function signature honesty | **PASS** | New optional param `effect_window_max_effects` and `Observation.identity_breaks` both have explicit defaults; no callers had to change. | | Structured Result returns | **PASS** | New ledger reason `"effect-window-exhausted"` is a structured discriminator, not a boolean. `identity_break` event is a structured dict with named fields. `HmacFallbackRefused` is a structured exception with a docstring naming the falsified claim. | --- ## 3. One new note opened at v0.2.1 ### note-A' — `Observation.identity_breaks` mutability surface **WHERE**: [brain.py:35–43](brain.py) ```python @dataclass class Observation: ... identity_breaks: frozenset = frozenset() ``` The field is typed as `frozenset` and the default is the singleton empty `frozenset()`. The agent's `_observe` ([agent.py:62](agent.py)) wraps the retina state's mutable `set` in `frozenset(...)` before passing to `Observation`, which is correct. **Subtle issue**: the singleton default `frozenset()` is shared across all `Observation` instances that don't pass an explicit value. This is mostly fine because `frozenset` is immutable, so there's no shared-mutable-default footgun. But the dataclass declaration uses a literal default rather than `default_factory`. This is idiomatic Python for immutable types and doesn't break, but a future heal that wanted `default_factory=frozenset` for symmetry with the `set` types elsewhere in the codebase would be cleaner. **Severity**: cosmetic, not a bug. I would mention it in a code review and let the author decide. Filed for receipt completeness; CODE_MECHANIC would not block a PR on this alone. **Heal proposal**: `identity_breaks: frozenset = field(default_factory=frozenset)` — symmetric with the `list = field(default_factory=list)` patterns elsewhere. One line. --- ## 4. What I deliberately did NOT review Per IDENTITY.md ("CODE_MECHANIC reviews are independent — does NOT defer to other inspectors' verdicts"): - I did not defer to DAVID+ v0.2.1's verdict; I re-read every healed file independently. - I did not review security blast radius — that's DAVID+'s lens, who concluded healed-with-note-A. - I did not review the bounded-claim set narrative — that's Gemini's lens. - I did not review the heals receipt's framing — that's the operator's choice. --- ## 5. Verification receipts I trace to - `canonical_5_gates_post_heal.json` — 5/5 fast gates PASS, including new Phase 3 swap-label gate - `canonical_harness_n200_post_heal.log` — Phase 2 n=200 PASS at TFBthumb 0/200 flakes vs baseline 126/200 - `canonical_analytics_post_heal_e2e.json` — token ratio 7.7×, correctness 100% TFB vs 38.3% baseline, motion 4/4, safety 0 ungated - All SHAs verified against `sha_manifest.txt` (post-heal column) --- ## 6. Warden Kill-Test — CODE_MECHANIC posture at v0.2.1 - **Claim under review:** the 4 v0.2 notes are healed structurally; the heals don't introduce regressions; the new Phase 3 swap-label gate is mechanically correct. - **Null hypothesis:** at least one heal is symptom-shaped, OR a heal introduced a regression in another gate, OR the new gate is the wrong shape (e.g., it tests something other than what its docstring claims). - **Discriminating test:** re-read each healed site; run parse-gate; run every gate; verify analytics; trace the new gate end-to-end. - **Outcome:** null killed. All four prior notes are at goldilocks depth. No regressions (5/5 fast gates + n=200 + analytics all re-passed). New swap-label gate IS testing what it claims — the runtime mutation of `el.textContent`/`removeAttribute('data-tfb-id')` reproduces the swap-attack, `retina.find("Send")` resolves to the mutated button, `Thumb.click("Send")` calls `_authorize(Action(..., target_name="Send", ...))`, classifier reads `Send` → CONSEQUENTIAL → no token → `CeilingBlocked("needs-human-token")`. The assertion at gate_ceiling.py:174 catches the right exception with the right reason. - **Present-tense downgrade:** *"v0.2.1 inspected by Claude Opus 4.7 proxy applying CODE_MECHANIC IDENTITY.md verbatim; authentic gpt-5.1 run still tenant-policy blocked; proxy honest about its identity."* --- ## 7. What this review authorizes - CODE_MECHANIC confirms the 4 v0.2 notes are healed at the engineering execution layer. - CODE_MECHANIC confirms the new Phase 3 swap-label gate is mechanically correct and runs clean. - CODE_MECHANIC confirms no regressions in any gate or analytics metric. - CODE_MECHANIC opens **note-A'** as a cosmetic frozenset-default proposal; not a blocker. ## 8. What this review does NOT authorize - **Not a substitute for the gpt-5.1 review.** Proxy holds the position; authentic run still required when tenant policy permits. - **No wider scope.** Same boundaries as `REVIEWER_PACKET.md §10`. - **No production-ready claim.** DAVID+'s out-of-process Authority caveat still applies; note-A in DAVID+'s v0.2.1 review still applies. ## 9. Lineage - Inspector identity: `agents/code_mechanic/IDENTITY.md` loaded verbatim. - Native harness target: `openai/gpt-5.1` via `agents/harness_profiles/gpt_5_5.yaml`. - Proxy substrate: Claude Opus 4.7 (this Claude Code session). - Substrate routing: NOT through `call_backend`; inline adoption. - Audit tag: `PROXY_OUT_OF_PIPELINE`. Authentic run pending tenant-policy resolution. - Source SHAs reviewed: see `sha_manifest.txt` post-heal column. - Heals receipt: `HEALS_v0_2_to_v0_2_1.md`. ## 10. Fresh decision required after this review - Whether to apply note-A' (frozenset default_factory) — trivial; opens a one-line PR. - Re-run authentic gpt-5.1 review when a policy-permitted backend becomes available. - Any wider scope per `REVIEWER_PACKET.md §12`. ## 11. Closing — CODE_MECHANIC posture > Four notes placed. One new minor note named — cosmetic, not blocking. The code works at v0.2.1, the contracts are honest, the tests exist (the swap-label gate is the strongest new piece of engineering rigor since the Ed25519 closure introspection), and the next reader will understand in one pass. > > Re-run me on gpt-5.1 when you can.