research build log · ARC-AGI-3 · weekly snapshots

▣ code‑native/arc‑agi‑3 — archive

An agent that plays ARC‑AGI‑3 games by writing Python against the frame, with an execution-grounded “sleep” phase that mines skills and lets the game itself be the judge. Each card below is one week’s snapshot of the log — newest first.

2026‑07‑03 latest

v2: sleep harness, escapes 0 → 4 levels, RHAE, trace explorer

A harness coordinate bug was silently eating every click. Eight fixes later, the agent clears level 4 on ft09 — with a data-driven run viewer over the real traces.

level 4 cleared RHAE 17.9 21 runs · 8 fixes
read the week →
2026‑06‑30

v1: code-native baseline, failure ladder, first architecture

The first honest log: one autonomous CodeAct loop, an LLM-free skill gate, and the failure-mode chain — built and genuinely code-native, not yet solving a level from scratch.

0 levels from scratch failure-mode ladder LLM-free skill gate
read the week →