2026‑07‑03
latest
v2: sleep harness, escapes 0 → 4 levels, RHAE, trace explorer
A harness coordinate bug was silently eating every click. Eight fixes later, the agent clears level 4 on ft09 — with a data-driven run viewer over the real traces.
level 4 cleared
RHAE 17.9
21 runs · 8 fixes
read the week →
2026‑06‑30
v1: code-native baseline, failure ladder, first architecture
The first honest log: one autonomous CodeAct loop, an LLM-free skill gate, and the failure-mode chain — built and genuinely code-native, not yet solving a level from scratch.
0 levels from scratch
failure-mode ladder
LLM-free skill gate
read the week →