Mathematical AI Reasoning Core · Research Brief v0.1 · Target: AAAI-27

M AR C

A model that derives mathematics instead of recalling it.

Instead of writing a chain of words, MARC represents a problem as a constraint graph and solves it by iteratively denoising that graph toward a checker-verified state — offloading arithmetic to an exact engine and reasoning about structure.

Denoising diffusion over constraint graphs CAS-guided · checker-rewarded Paper due Jul 27, 2026
Scroll
The premise
01 — CONCEPT

Why not just a chain of thought?

Chain-of-thought works, but three cracks motivate a different substrate. MARC keeps the reasoning and removes the cracks by moving computation into a structured, verifiable object.

Crack 01

It can recall

Much benchmark performance reflects solutions seen in training, not reasoning. A system that can recall an answer can't be trusted to have derived it.

Crack 02

It can lie to itself

The written chain often doesn't match the computation that produced the answer — a post-hoc story, unreliable to study or verify.

Crack 03

It fumbles arithmetic

Token models compute digits poorly and burn capacity tracking them. The mechanical part belongs in a calculator, not the model.

Mathematics is the right first domain because answers are verifiable. A computer-algebra engine or proof kernel supplies an objective signal with no human labels — so MARC can be built as the tightest possible loop between generating a solution and checking it.
The idea
02 — IN DEPTH

What MARC actually does

Four ideas do all the work — and none of them has the model write out its reasoning in words. The thinking happens inside a structured object that can be checked at every step.

01 · REPRESENT

A problem becomes a graph

Every quantity in the problem — knowns and unknowns alike — is a variable node. Every relation it must satisfy — an equation, an inequality, a definition — is a constraint node wired to the variables it touches.

There is no first step or last step. The whole graph is the problem, and a solution is simply an assignment where every constraint is satisfied at once.

02 · SOLVE

Solving is relaxation, not search

The model doesn't hunt for a path to the answer. It starts from a noisy guess and denoises the whole graph — nudging every value toward consistency, again and again, until nothing is violated.

The injected noise isn't a flaw: it's what lets the process escape states that look locally fine but are globally wrong — the trap that sinks ordinary constraint solvers.

03 · COMPUTE

The calculator does the arithmetic

A computer-algebra engine measures exactly how violated each constraint is. Those residuals are the signal that steers every denoising step.

So the model never grinds through digit arithmetic — the thing it's worst at. It spends capacity on which constraints to attack and how to move, and leaves exact computation to the engine.

04 · VERIFY

Trained only against a checker

The single reward is whether a checker accepts the result. There's no answer key to copy, so the model can't fall back on recalling a memorized solution — it has to produce something that actually checks out.

We confirm it's truly deriving by testing on unseen structure and far larger problems than it trained on. Memorization can't fake that; only a real procedure generalizes.

THE TWO VERSIONS

Denoise the numbers — or denoise the graph itself

▸ VALUE DIFFUSION · the MVP
The graph's wiring is fixed; only the values at the nodes are noisy. Denoising settles them onto the solution. Essentially a learned, stochastic equation solver — buildable, and the first real result.
▸ STRUCTURE DIFFUSION · the frontier
The graph itself is noisy — edges and nodes are added and removed during denoising. This is what lets the model invent intermediate objects (an auxiliary variable, a lemma) a fixed graph can't express. Hardest, most novel, highest risk.

How it differs from the usual approaches

Chain-of-thoughtSymbolic tree searchMARC
Represents a problem asa stream of text tokensdiscrete expression / proof statesa constraint graph of quantities & relations
Makes progress byappending the next token, left to rightbranching and searching over discrete movesdenoising the whole graph toward consistency
Computation lives inthe token stream (often parroted)the search policya learned relaxation + an exact calculator
Can it recall an answer?Yes — a real riskPartlyNo — it must produce a checkable derivation
Verified byusually just the final answereach move valid by constructionevery residual must vanish; the checker gates it
The mechanism
03 — HOW IT WORKS

Watch a graph relax into a solution

A tiny system of equations as a constraint graph. Variable nodes hold noisy values; constraint nodes show how violated each equation is. Each denoise step nudges every value at once — guided by the calculator's residuals — until the checker accepts.

x 0.00 y 0.00 z 0.00 x+y=3 r=0 y+z=4 r=0 x+z=5 r=0 VARIABLE NODES · CONSTRAINT (FACTOR) NODES
Global consistency‖r‖ = 0.00
awaiting consistency

The loop, in four moves

The solution here is x=2, y=1, z=3. Press Denoise step and watch the residuals fall from red to amber as the whole graph settles at once — no left-to-right token order.

  • 1Constraint graph. The problem becomes nodes (quantities) and factors (equations) — currently a noisy guess.
  • 2Denoise step. One round of message passing nudges all values toward consistency together.
  • 3Calculator. The CAS computes each residual exactly — that signal is what steers the step.
  • 4Checker. When every residual vanishes, the checker accepts. That acceptance is the only training reward.

Why noise? Injected noise lets the relaxation escape locally-consistent-but-wrong states — the failure mode of deterministic solvers. Try + Inject noise mid-run.

iterate until consistent constraint graph noisy state denoise step message passing checker accepts result calculator (CAS) residuals → guidance
The graph is refined by a learned denoiser; the calculator scores each constraint; the checker gates the result — repeat until consistent.
The plan
04 — TIMELINE

Seven weeks to AAAI-27

The full paper is due July 27, 2026, with the abstract and title locked a week earlier on July 20. The plan: build the verifiable core first, scope the hard math to preliminary results, and reserve the final stretch for writing.

days to the paper deadline
counting down to Mon, Jul 27, 2026
Abstract + title locked  Mon · Jul 20, 2026
Full paper due  Mon · Jul 27, 2026
AAAI-27 · Montréal · Feb 16–23, 2027
Wk 1
Jun 8–14
Wk 2
Jun 15–21
Wk 3
Jun 22–28
Wk 4
Jun 29–Jul 5
Wk 5
Jul 6–12
Wk 6
Jul 13–19
Wk 7
Jul 20–27 ✦
Infrastructureeveryone · parallel tracks
P0 · schema · CAS · diffusion · eval · model stub
Core modelingDavin · Quang · Akash · Sparsh
P1 · solve loop · denoiser · data loader · baseline eval
Training & evaluationeveryone · parallel tracks
P2 · GRPO · rewards · paper eval · CoT baseline
Structure diffusioneveryone · parallel tracks
P3 · structure proto · toys · H2 eval
Paper writingeveryone
draft → figures → polish → submit
✦ Week 7 is submission week — abstract registered Jul 20, full paper in by Jul 27.
P3 is deliberately scoped to early, promising results; the full structure-diffusion build and everything in P4 is future work, after submission. Dates follow the AAAI-27 CFP listing — deadlines are track-specific, so confirm them on the official call for papers.
The crew
05 — TASK ASSIGNMENTS

Who owns what

Filter the board by person. Each card is a step-by-step plan with exact file paths. All Python code goes in a new folder marc/ at the repo root — it does not exist yet; you create it and push to github.com/saidlaboratory/MARC. See TECHNICAL_GUIDE.md §13 for the full layout.

D Davin Q Quang A Akash S Sparsh
Parallel by design. Tasks within a phase run on separate tracks — use stubs and the shared marc/data/examples/two_equations.json contract so nobody blocks anyone. Integration happens at the end of each phase, not at the start. Davin & Quang = modeling/diffusion. Akash & Sparsh = data/eval/checker.