Why we publish our failed experiments
Every claim on this site is backed by a falsifiable test, and when the test says no, we report the no. That discipline is the product.
It is easy to publish wins. The harder, more valuable discipline is publishing the experiments that were supposed to win and did not, because a research programme that shows only its highlights is one you cannot trust. We try to run the other way.
Every headline on this site sits behind a falsifiable test with a threshold set before the run. The point of fixing the bar in advance is that it removes the temptation to move it afterwards.
Falsifiable by design
A continuous-time model only counts as a win if it wins where timing matters and honestly loses where it does not. A memory result only counts if it holds on a held-out, leakage-safe split, not a slice chosen after the fact. When a verifier cannot run, the reward is reported as undefined — never quietly replaced with a number that makes the chart look finished.
- Thresholds fixed before the run, not after.
- Held-out, leakage-safe splits for anything that claims to generalise.
- A win must come with the matching loss: where does this not work?
- No faked signals — an unrunnable check is reported as undefined.
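
As a rough illustration of the first and last rules above, here is a minimal Python sketch of a threshold that is frozen before the run and a verifier failure that is reported as undefined. The names `Preregistration` and `evaluate` are hypothetical, chosen for this example only; they do not describe our actual tooling.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Preregistration:
    """A claim and its pass bar, recorded before the run (hypothetical structure)."""
    claim: str
    threshold: float  # frozen=True makes later reassignment raise an error


def evaluate(prereg: Preregistration, verifier: Callable[[], float]) -> dict:
    """Run the verifier and compare its score against the pre-set bar.

    If the verifier cannot run, the score is None (undefined) and the
    verdict is "undefined"; no placeholder number is substituted.
    """
    try:
        score: Optional[float] = verifier()
    except Exception as exc:  # the check itself could not run
        return {"claim": prereg.claim, "score": None,
                "verdict": "undefined", "reason": repr(exc)}
    verdict = "pass" if score >= prereg.threshold else "fail"
    return {"claim": prereg.claim, "score": score, "verdict": verdict}
```

Under these assumptions, a failed run is recorded as a plain "fail" with its score, and a broken verifier yields `score=None` rather than a zero or a default. Trying to assign a new `threshold` after the fact raises `dataclasses.FrozenInstanceError`, which is the code-level analogue of not moving the bar.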
The negatives we stand behind
We have falsified two paradigm-sized claims of our own. The first was that a new architecture is what drives grounded learning. It is not: the rearing method is, and a conventional backbone reared the same way matches it. The second was that our small-model efficiency edge holds at frontier scale. It does not: beyond a point a Transformer pulls ahead, and we publish the crossover instead of hiding it.
The negatives are not a confession. They are the reason the positive numbers are worth anything.
ReasonLoom research principles
Why it is the product
For the people who will rely on these systems — clinicians, compliance teams, engineers shipping to constrained devices — the most useful thing we can offer is not a bigger number. It is a number you can act on, with its limits stated next to it. That is what falsifiable research buys, and it is the standard we hold every release to.