MARC — Mathematical AI Reasoning Core

The reframe

Two kinds of decision. Only one is worth learning.

Solving a constraint problem splits cleanly. We bet on the wrong half first, measured it honestly, and moved the learning to the half where classical methods have nothing.

Continuous · what values satisfy the equations

Owned by classical solvers

Smooth, gradient-rich, and covered by sixty years of numerical analysis. Levenberg–Marquardt with restarts saturates our hard nonlinear families. A learned proposal ties or loses to plain random restart once solutions are coupled.

→ delegate. never compete.

Discrete · what representation makes it tractable

Owned by nothing, until now

Which auxiliary quantity to introduce. Which substitution linearizes the system. No gradient exists over this space, enumeration grows combinatorially, and a solver cannot decide to invent d = x − y. A learned prior amortizes exactly this choice.

→ the bet. one forward pass vs. exponential search.

What we measured

The evidence chose this framing.

We ran the controls that could kill our own first idea, and they did. The negatives are reported with confidence intervals and kept in every paper we write: they are the motivation, measured.

Coupled systems · learned value proposals

Ties or loses to random restart at every dimension

The earlier high-dimensional win was a separability artifact. The "learned proposal beats classical search" route is closed.

Classical baseline · Levenberg–Marquardt

1.000 solve rate on the hard families

With restarts, LM saturates the nonlinear suites. Raw solve rate can never be the claim; cost and representation can.

Stochasticity in search · pre-registered RQ

Entrapment 1.000 → 0.475 with noise (−0.525 ± 0.086)

Deterministic descent always traps; Langevin noise escapes. Real and CI-backed, but it argues for noise in search, not for a learned denoiser.

Structure selection · trained policy

Beats its controls; clean numbers pending

The one learned component that beat random-slot and no-context controls. Regenerating at scale under a contamination-proof seed protocol before any number is cited.

House law: every rate ships with N and a Wilson interval, every comparison with a corrected significance test, and structure-selection numbers exist only if the run's own artifact proves its seeds were clean. We caught one contaminated result ourselves, withdrew it, and rebuilt the protocol so the eval refuses to repeat the mistake.

The system

Watch structure make a problem solvable.

A fixed graph with no consistent assignment. The policy proposes one auxiliary: a node flips from absent to active with its defining relation. A classical solver then fills the values and the checker accepts. That flip is the entire thesis.

variables violated factor proposed structure checker-accepted

The invention ladder

From selection to generation, one falsifiable rung at a time.

Every rung keeps the same end-to-end bar: a proposal counts only if the augmented graph actually solves and passes the checker. Until the last rung, the honest term is menu-based structure selection. "Invention" names the destination, not the current claim.

Menu selection

Pick the correct augmentation from K procedurally generated candidates: exactly one is solvable by certified construction, and hard negatives share the right structure with the wrong constant.

built · trained

Predicted defining value

The policy's value head supplies the defining constant itself. The candidate space becomes continuous; the menu only provides insertion structure.

built · clean numbers pending

Compositional & multi-aux

Choose the insertion set and defining relation independently; instantiate several auxiliaries at once. The slot schema already supports it.

designed

Free-form generation

Emit the defining expression itself: invention proper. The prize, out of scope until the rungs below it hold.

future

Results integrity

Rules we cannot break.

Classical solvers are always baselines. refine, lm, exact, random are labeled on every table and never presented as system results.
No number without its interval. Every rate carries N and a Wilson CI; every comparison carries a Holm-corrected significance test.
Seed hygiene is mechanical, not promised. Train, validation, and test seed ranges are disjoint by contract; the eval asserts it at checkpoint load and refuses to run otherwise.
Withdrawn means gone. Results that fail the protocol are withdrawn in the ledger and never cited, including our own.
Provenance or it didn't happen. Every published number maps to a command, seed, and commit in paper/PROVENANCE.md.