
Inductive biases (within reason).

An approach to the ARC-AGI-3 Reasoning Challenge.

Active research · ARC-AGI-3 · Symbolica × Arcgentica · Opus 4.6 agent

why this matters

Two bottlenecks in AI reasoning:

1. Adapting to unseen environments and situations (e.g. a robot doing household tasks in my kitchen when it was trained in Bob’s kitchen).
2. Discovering new knowledge that we haven’t yet observed in the world but that is consistent with what we already believe to be true (e.g. valid hypotheses for closed-loop labs, or new mathematical equations).

I’m using ARC-AGI-3 to explore some facets I think could be part of the solution.

my approach

· Impose inductive bias at the structural level instead of the primitive level. Rather than hand-coding the mechanical primitives of each game, we want the agent to discover them itself. To shrink the large search space, though, we still need to impose a human bias: build perception and embedding modules that let the agent view and represent the puzzle through a human lens. For ARC-AGI-3, instead of only raw pixel values, the agent sees objects, similarity classes, and recurring pair relationships. The right priors collapse the hypothesis space before the model starts reasoning.
· Reason within a framework. Build a harness that can use the primitives it discovers to reason about how to solve the task. The human-designed framework balances exploring the unknown against exploiting what the agent currently knows about the likely goal and its environment. The harness poses hypotheses, then uses the modules to run experiments and update those beliefs.
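The hypothesize, experiment, update cycle in the second point can be illustrated with a toy belief table. This is a minimal sketch under stated assumptions, not the harness itself: the hypothesis names, the likelihood values, and the `confidence` threshold are all hypothetical, and a plain Bayesian update stands in for whatever belief revision the agent actually performs.

```python
import math


def normalise(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: w / total for h, w in weights.items()}


def update(beliefs, likelihoods):
    """Bayes rule: posterior is proportional to prior * P(observation | hypothesis)."""
    return normalise({h: beliefs[h] * likelihoods[h] for h in beliefs})


def entropy(beliefs):
    """Shannon entropy in bits; lower means the agent is more certain."""
    return -sum(p * math.log2(p) for p in beliefs.values() if p > 0)


def choose_mode(beliefs, confidence=0.8):
    """Exploit once one hypothesis dominates; otherwise keep experimenting."""
    best, p = max(beliefs.items(), key=lambda kv: kv[1])
    return ("exploit", best) if p >= confidence else ("explore", best)


# Hypothetical goal hypotheses, starting from a uniform prior.
beliefs = normalise({"reach_exit": 1, "collect_keys": 1, "match_colours": 1})
mode_before = choose_mode(beliefs)

# Hypothetical experiment outcome: far more likely if the goal is "reach_exit".
posterior = update(
    beliefs, {"reach_exit": 0.9, "collect_keys": 0.05, "match_colours": 0.05}
)
mode_after = choose_mode(posterior)
```

With a uniform prior the agent stays in explore mode; one strongly diagnostic experiment is enough to tip it into exploiting the leading hypothesis. The threshold makes the explore-versus-exploit trade-off an explicit, tunable decision.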

current progress

A relationship hierarchy built from frame 0.

Identify the objects on the grid and analyze their 1st, 2nd, and 3rd order relationships.

Next: track how these relationships change over time to identify the game mechanics.


what i built

Two technical components.

01
A relationship hierarchy over the objects in each puzzle.
It extracts every object from the frame, then scores how similar each pair is on a fuzzy scale. Similarity is computed not just on the objects themselves (1st order) but also on each object's relationships with other objects (2nd order), and the patterns those relationships form (3rd order).
Objects come out of the frame as exact pixel sets. The 1st-order score for a pair is a fuzzy combination of shape (canonical-resize IoU), colour, size, and orientation. The 2nd-order score lifts this to how each object sits in its neighbourhood: which kinds of objects it touches, how they're arranged. The 3rd-order score compares the relationship patterns themselves. Together, these scores cluster objects that are the same kind of thing even when raw pixels disagree, and surface pair motifs that recur across the scene.
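The 1st-order score described above can be sketched as follows. This is an illustrative approximation, not the project's code: it assumes objects are 4-connected single-colour regions, uses an 8×8 nearest-neighbour canonical resize, combines the features with hypothetical fixed weights, and omits the orientation term for brevity. All function names are made up for the example.

```python
from collections import deque

import numpy as np


def extract_objects(grid, background=0):
    """Return objects as (colour, pixel set) using a 4-connected flood fill."""
    grid = np.asarray(grid)
    seen = np.zeros(grid.shape, dtype=bool)
    objects = []
    for r, c in zip(*np.nonzero(grid != background)):
        if seen[r, c]:
            continue
        colour = int(grid[r, c])
        pixels, queue = set(), deque([(int(r), int(c))])
        seen[r, c] = True
        while queue:
            y, x = queue.popleft()
            pixels.add((y, x))
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if (0 <= ny < grid.shape[0] and 0 <= nx < grid.shape[1]
                        and not seen[ny, nx] and grid[ny, nx] == colour):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        objects.append((colour, pixels))
    return objects


def canonical_mask(pixels, side=8):
    """Crop to the bounding box, then nearest-neighbour resize to side x side."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    r0, c0 = min(rows), min(cols)
    h, w = max(rows) - r0 + 1, max(cols) - c0 + 1
    mask = np.zeros((h, w), dtype=bool)
    for r, c in pixels:
        mask[r - r0, c - c0] = True
    ys = np.arange(side) * h // side  # source row for each target row
    xs = np.arange(side) * w // side  # source column for each target column
    return mask[np.ix_(ys, xs)]


def shape_iou(pixels_a, pixels_b, side=8):
    """Intersection over union of the two canonical masks (scale-invariant)."""
    ma, mb = canonical_mask(pixels_a, side), canonical_mask(pixels_b, side)
    union = (ma | mb).sum()
    return float((ma & mb).sum() / union) if union else 0.0


def first_order_similarity(obj_a, obj_b, weights=(0.5, 0.3, 0.2)):
    """Weighted blend of shape, colour, and size agreement, in [0, 1]."""
    (colour_a, px_a), (colour_b, px_b) = obj_a, obj_b
    shape = shape_iou(px_a, px_b)
    colour = 1.0 if colour_a == colour_b else 0.0
    size = min(len(px_a), len(px_b)) / max(len(px_a), len(px_b))
    w_shape, w_colour, w_size = weights
    return w_shape * shape + w_colour * colour + w_size * size


grid = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 2],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 0, 2, 2, 2],
    [3, 0, 0, 0, 0, 0],
]
objects = extract_objects(grid)
score = first_order_similarity(objects[0], objects[1])
```

Here the 2×2 and 3×3 squares get a perfect shape score because both fill their canonical masks, while colour and size pull the blended score down. That is the sense in which a fuzzy score can group objects of the same kind even when their raw pixels disagree.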
02
A reasoning harness that combines (01)'s perception with memory and an explore-experiment loop.
It scaffolds the explore, hypothesize, experiment cycle. The harness wraps (01)'s perception and embedding layer as one of its modules, then adds memory and reasoning components so the agent can build a worldview across resets instead of starting from raw pixels each turn.
I’m building this layer to plug into Symbolica’s existing Arcgentica harness, combining (01)’s structured perception with a memory and reasoning loop that mirrors how humans explore unfamiliar environments. Candidly, it does not yet beat their baseline. The harness composes several modules. The perception and embedding layer from (01) runs object detection every frame and hands the agent a list of entities with shapes and positions. A memory layer tracks coverage and per-action information gain, detects frame shifts and oscillations, and explicitly snapshots state before a RESET so that what was witnessed survives. A reasoning loop interleaves hypothesis formation with experimentation, surfacing the exploit-versus-explore tension explicitly rather than leaving it to chance. Live-tested on bp35 with Opus 4.6: the agent reached level 1 and used the new tools throughout (53+ perception calls, 68 memory calls in one run). Next: pipe (01)’s discovered hierarchy into this harness and run controlled comparisons.
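Three of the memory-layer behaviours named above (action information gain, oscillation detection, and the pre-RESET snapshot) can be sketched in a few lines. This is a hypothetical illustration rather than the harness's implementation: the class and method names are invented, the action labels are placeholder strings, and "information gain" is approximated by the mean number of cells an action changes, which is an assumed proxy.

```python
from collections import Counter, deque
from dataclasses import dataclass, field

import numpy as np


@dataclass
class EpisodeMemory:
    window: int = 6  # how many recent frames the oscillation check looks at
    recent: deque = field(default_factory=deque)
    changed_cells: Counter = field(default_factory=Counter)
    tries: Counter = field(default_factory=Counter)
    snapshots: list = field(default_factory=list)

    def record(self, action, before, after):
        """Log one step: cells changed by the action, plus a hash of the new frame."""
        before, after = np.asarray(before), np.asarray(after)
        self.tries[action] += 1
        self.changed_cells[action] += int((before != after).sum())
        self.recent.append(hash(after.tobytes()))
        if len(self.recent) > self.window:
            self.recent.popleft()

    def information_gain(self, action):
        """Assumed proxy: mean cells changed per use; untried actions score inf."""
        n = self.tries[action]
        return float("inf") if n == 0 else self.changed_cells[action] / n

    def is_oscillating(self):
        """True when a full window revisits at most two distinct frames."""
        return len(self.recent) == self.window and len(set(self.recent)) <= 2

    def snapshot_before_reset(self, level, notes):
        """Persist what was learned so it survives the reset, then clear the window."""
        stats = {a: self.information_gain(a) for a in self.tries}
        self.snapshots.append({"level": level, "notes": notes, "action_stats": stats})
        self.recent.clear()


# Hypothetical run: one action that only toggles a single cell back and forth.
memory = EpisodeMemory(window=4)
frame_a = np.zeros((3, 3), dtype=int)
frame_b = frame_a.copy()
frame_b[0, 0] = 1
frames = [frame_a, frame_b, frame_a, frame_b, frame_a]
for prev, nxt in zip(frames, frames[1:]):
    memory.record("ACTION1", prev, nxt)

stuck = memory.is_oscillating()
memory.snapshot_before_reset(level=1, notes="ACTION1 only toggles one cell")
```

Scoring untried actions as infinite gain nudges the agent to try everything at least once, and the oscillation flag gives the reasoning loop a cheap signal that it is cycling between two states and should change strategy or reset.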
