Science tools

Holobiont

Compressing the laws underneath protein dynamics into physically-invariant signals.

thesis: find the invariant, not the lookup table
audit: nested cross-validation · leakage budget reported on every feature
practice: retire what does not survive · publish the negative
lift attribution · one representative feature
  • 83%: leakage through pre-computed retrieval axes
  • 17%: residual mechanism after nested CV
The Holobiont thesis

Find the invariant, not the lookup table

Holobiont is a science program built around a specific idea: there are physically-invariant signals underneath protein dynamics that compress better than any amount of memorisation. We pursue those signals, audit them ruthlessly against retrieval-axis leakage, and publish the negatives when an attractive feature turns out to be a measurement artifact.

most pipelines

add features until the leaderboard moves and ship the leaderboard, even when the lift is a structural artifact.

holobiont

biases the pipeline toward features that look like physics — compact, invariant, auditable — and retires the rest before they reach production.

How Holobiont is structured

Three commitments

HB1

Compress, don't memorise

We aim for compact mechanisms that look like the laws of physics, not for embeddings that look like the data.

[Diagram: f(x) as a compact mechanism versus a lookup table]
rewards: compact mechanisms that look like the laws of physics
punishes: embeddings that look like the training set
HB2

Audit retrieval axes for leakage

Pre-computed retrieval axes are notoriously easy to leak test labels through. We default to nested cross-validation and report leakage budgets explicitly.

[Diagram: folds 1 to 5; the test PDB never touches the retrieval axes]
rewards: nested cross-validation as the default, leakage budgets reported
punishes: pre-computed retrieval axes used without audit
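The rule that a test PDB never touches the retrieval axes can be sketched as a grouped, nested split. This is a minimal illustration under stated assumptions, not Holobiont's pipeline: the names `group_folds` and `nested_splits`, the round-robin fold assignment, the fold counts, and the placeholder PDB IDs are all hypothetical.

```python
from collections import defaultdict


def group_folds(pdb_ids, n_folds):
    """Split sample indices into folds so that each PDB ID lands in exactly one fold.

    Assumption: round-robin assignment over the sorted unique IDs.
    """
    unique = sorted(set(pdb_ids))
    fold_of = {pdb: i % n_folds for i, pdb in enumerate(unique)}
    folds = defaultdict(list)
    for idx, pdb in enumerate(pdb_ids):
        folds[fold_of[pdb]].append(idx)
    return [folds[f] for f in range(n_folds)]


def nested_splits(pdb_ids, n_outer=5, n_inner=3):
    """Yield (inner_train, inner_val, outer_test) index lists.

    Retrieval axes (for example a k-NN index) would be built from
    inner_train only, so neither validation nor test PDBs can leak in.
    """
    outer = group_folds(pdb_ids, n_outer)
    for k, test_idx in enumerate(outer):
        # Everything outside the held-out outer fold is the development set.
        dev_idx = [i for j, fold in enumerate(outer) if j != k for i in fold]
        dev_ids = [pdb_ids[i] for i in dev_idx]
        for val_local in group_folds(dev_ids, n_inner):
            val_idx = [dev_idx[i] for i in val_local]
            val_set = set(val_idx)
            train_idx = [i for i in dev_idx if i not in val_set]
            yield train_idx, val_idx, test_idx


# Ten placeholder PDB IDs with three samples each (illustrative only).
pdb_ids = [f"P{n:02d}" for n in range(10) for _ in range(3)]

for train_idx, val_idx, test_idx in nested_splits(pdb_ids):
    index_ids = {pdb_ids[i] for i in train_idx}
    # The retrieval index shares no PDB with validation or test data.
    assert index_ids.isdisjoint(pdb_ids[i] for i in val_idx)
    assert index_ids.isdisjoint(pdb_ids[i] for i in test_idx)
```

Splitting by PDB ID rather than by sample is the design choice that matters here: a sample-level split would let two samples from the same structure sit on both sides of the retrieval index.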
HB3

Publish what does not work

Where a feature class hurts performance once leakage is removed, we say so. The product is the mechanism, not the leaderboard.

[Diagram: F1 kept · F2 kept · F3 retired · F4 kept · F5 retired · F6 kept]
rewards: retiring a feature class when it does not survive the audit
punishes: leaderboard lift kept after leakage is found
coverage cliff

Where a tempting feature class quietly turns negative

Column statistics (Shannon entropy and amino-acid frequency) look attractive at low coverage. Past a coverage budget of roughly 53%, the cliff marked on the chart, they begin to hurt the best estimator. We retired them for production use.

[Chart: Δ Spearman ρ on held-out PDBs (y-axis, +0.02 to −0.04) versus coverage budget (x-axis, 0% to 100%); cliff at ~53% coverage]
Each point is a feature variant under audit. Below the zero line, the feature is hurting the estimator. We do not ship below zero.
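One way to locate such a cliff from audited points is to look for the first zero crossing of Δρ along the coverage axis. The sketch below assumes linear interpolation between neighbouring points; `find_cliff` is a hypothetical helper, and the sample values are synthetic, chosen only to exercise the code, not Holobiont's measurements.

```python
def find_cliff(points):
    """Return the coverage at which delta_rho first drops below zero.

    points: iterable of (coverage, delta_rho) pairs.
    Uses linear interpolation between the last non-negative point and
    the first negative one. Returns None if delta_rho never goes negative
    after a non-negative point.
    """
    pts = sorted(points)
    for (c0, d0), (c1, d1) in zip(pts, pts[1:]):
        if d0 >= 0 and d1 < 0:
            # d0 - d1 is strictly positive here, so the division is safe.
            return c0 + (c1 - c0) * d0 / (d0 - d1)
    return None


# Synthetic audit points: (coverage budget, delta Spearman rho).
synthetic = [(0.0, 0.02), (0.4, 0.01), (0.6, -0.01), (1.0, -0.03)]
cliff = find_cliff(synthetic)
print(f"cliff at about {cliff:.0%} coverage")
```

A variant whose coverage budget lies past the returned value would fall below the zero line and, under the rule above, would not ship.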
Negatives and progress

What Holobiont has decided

Each decision is a measured statement, not a marketing one. Where a result hurt, the result and the retirement live on the page.

  1. D15.1

    Retrieval-axis leakage quantified

    A large fraction of an apparently strong feature's lift was attributable to leakage through pre-computed k-NN axes. The methodology was tightened accordingly.

    D15.1: 83% of apparent lift was retrieval-axis leakage
  2. D22

    Conservation features retired at coverage

    Shannon-entropy and amino-acid-frequency column statistics hurt our best estimator beyond a coverage budget of roughly 53%. We retired them for production use and kept exploring direct-coupling pairs.

    D22: Δρ −0.0382 on the best estimator above the coverage cliff
  3. Today

    Mechanism-first feature pipeline

    The current pipeline is biased toward features that look like physics. Each candidate has an explicit leakage audit and an explicit coverage budget before it ships.

    Today: physics-shaped features only, with explicit leakage audit
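The leakage figure in D15.1 reads as a share of lift, and one plausible way to compute such a share is to compare a feature's lift with pre-computed retrieval axes against its lift under nested cross-validation. The definition below is an assumption for illustration, not a confirmed account of how the D15.1 number was derived; `leakage_budget` is a hypothetical name and the input values are made up.

```python
def leakage_budget(apparent_lift, nested_lift):
    """Split a feature's apparent lift into leaked and residual shares.

    apparent_lift: lift measured with pre-computed retrieval axes.
    nested_lift:   lift that survives nested cross-validation.
    Assumed definition: leakage = (apparent - nested) / apparent.
    A negative nested_lift gives a leakage share above 1.
    """
    if apparent_lift <= 0:
        raise ValueError("apparent lift must be positive to attribute it")
    leaked = (apparent_lift - nested_lift) / apparent_lift
    return {"leakage": leaked, "residual": 1.0 - leaked}


# Made-up lifts, chosen only to show the arithmetic.
budget = leakage_budget(apparent_lift=0.100, nested_lift=0.017)
print({name: round(share, 2) for name, share in budget.items()})
```

Reporting both shares together keeps the residual mechanism visible alongside the leakage, which matches the practice of publishing a budget per feature.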

Why this is its own program

Protein dynamics is the obvious place to put pressure on the "compress, don't memorise" idea. Holobiont is where we run that pressure — and where we have already retired features that looked attractive but did not survive the audit.

cross-cuts

Holobiont is where the compress-vs-memorise idea gets stress-tested

The mechanism-first audit travels across our research programs — from evaluation discipline through to alignment posture. Where a feature does not survive here, it does not ship anywhere.