Research tools · deterministic replay · domain-agnostic

Emmy

A workspace orchestrator that travels with the research. It runs reproducible pipelines across scientific domains, and every run can be replayed and audited with identical results.

building blocks: source · transform · evaluation · report
replay: deterministic by construction
domains: biology · materials · cognition

What Emmy is

Orchestration that travels with the research

Emmy is the orchestration tool we use to keep scientific research reproducible across domains. It composes our research building blocks into pipelines that can be run, replayed, and audited, while keeping data versioning and dependency management visible to the scientists who use it. It is the layer that turns a result in a notebook into a defensible artifact.

How Emmy is organised

Three properties

E1 Composable building blocks

Pipelines are composed from versioned, declarative components that travel with their dependencies.

E2 Reproducible by construction

Replays are deterministic. Re-running a pipeline yields the same artifacts unless its inputs have changed, and any such change is made visible.

E3 Domain-agnostic

Emmy is used across biology, materials, and cognition without becoming a domain framework. It is workflow, not modelling.
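One common way to get reproducibility by construction (E2) is content addressing: a step's identity is a hash of what it is and what it consumes. The sketch below is a minimal illustration under assumptions, not Emmy's API: it assumes each step is identified by a component name and version, and that every input is already pinned by a `sha256:` hash. The helpers `run_key` and `changed_inputs`, and the placeholder hashes, are hypothetical.

```python
import hashlib
import json


def run_key(component: str, version: str,
            input_hashes: dict[str, str], params: dict) -> str:
    """Hypothetical helper: a deterministic key for one pipeline step.

    The key depends only on the step identity (name + version) and what it
    consumes (pinned input hashes + parameters).
    """
    payload = {
        "component": component,
        "version": version,
        "inputs": input_hashes,
        "params": params,
    }
    # sort_keys and fixed separators give a canonical encoding, so dict
    # insertion order cannot change the digest.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_inputs(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Hypothetical helper: names of inputs whose hash differs between runs."""
    return [name for name in sorted(old.keys() | new.keys())
            if old.get(name) != new.get(name)]


if __name__ == "__main__":
    # Placeholder hashes, for illustration only.
    base = {"reads": "sha256:aa", "reference": "sha256:bb"}
    k1 = run_key("align", "1.2.0", base, {"seed": 7})
    k2 = run_key("align", "1.2.0", dict(reversed(base.items())), {"seed": 7})
    edited = {**base, "reads": "sha256:cc"}
    k3 = run_key("align", "1.2.0", edited, {"seed": 7})
    print(k1 == k2, k1 == k3, changed_inputs(base, edited))
    # prints: True False ['reads']
```

Because the payload is serialised with sorted keys, listing the same inputs in a different order yields the same key, while swapping a single input hash changes the key and `changed_inputs` names the input responsible.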

Replay log

Five replays. One hash.

The same pipeline produces the same artifact every time the inputs match. When the inputs change, the change is visible at the row level.

run      timestamp          output hash         match
run #1   2026-04-12 09:14   sha256:5c0a…b71e    identical
run #2   2026-04-14 22:38   sha256:5c0a…b71e    identical
run #3   2026-04-22 11:02   sha256:5c0a…b71e    identical
run #4   2026-05-03 16:50   sha256:5c0a…b71e    identical
run #5   2026-05-19 07:22   sha256:5c0a…b71e    identical

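A replay log like this one can be produced by hashing each run's output and comparing it with the first run. This is a hypothetical sketch, assuming each run produces one artifact as raw bytes and that a row matches when its full SHA-256 digest equals run #1's; `artifact_hash`, `short`, and `replay_log` are illustrative names, not part of Emmy.

```python
import hashlib


def artifact_hash(data: bytes) -> str:
    """Full SHA-256 digest of an artifact's bytes, prefixed with the algorithm."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def short(full: str) -> str:
    """Display form 'sha256:xxxx…yyyy' (first and last four hex digits)."""
    algo, digest = full.split(":", 1)
    return f"{algo}:{digest[:4]}…{digest[-4:]}"


def replay_log(outputs: list[bytes]) -> list[tuple[str, str, str]]:
    """Rows of (run, short hash, match); every run is compared with run #1."""
    if not outputs:
        return []
    baseline = artifact_hash(outputs[0])
    rows = []
    for i, data in enumerate(outputs, start=1):
        full = artifact_hash(data)
        # Compare full digests; the short form is for display only.
        match = "identical" if full == baseline else "changed"
        rows.append((f"run #{i}", short(full), match))
    return rows


if __name__ == "__main__":
    # Stand-in artifacts: four identical runs, then one with changed bytes.
    runs = [b"artifact-v1"] * 4 + [b"artifact-v2"]
    for run, h, match in replay_log(runs):
        print(f"{run:<8} {h:<18} {match}")
```

The abbreviated hash is only a display convenience; because the comparison uses the full digest, two artifacts that happen to share the visible characters are still reported as changed.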
Building blocks

Four kinds, composed into any pipeline.

01
source

versioned input — data, model, or artifact pinned by hash.

02
transform

deterministic transform from inputs to outputs.

03
evaluation

graded against a fixed suite; the output includes the trace.

04
report

final artifact pinned to the run, replayable end-to-end.
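A toy composition of the four kinds, as a minimal Python sketch. The classes `Source`, `Transform`, `Evaluation` and the function `report` are hypothetical stand-ins, assuming artifacts are raw bytes and each transform is a pure function; Emmy's real components are declarative and versioned, which this sketch does not model.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


def sha256(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Source:
    """Versioned input pinned by hash; rejects bytes that do not match the pin."""
    name: str
    pinned: str

    def load(self, data: bytes) -> bytes:
        if sha256(data) != self.pinned:
            raise ValueError(f"{self.name}: content does not match {self.pinned}")
        return data


@dataclass(frozen=True)
class Transform:
    """Deterministic transform, modelled here as a pure bytes -> bytes function."""
    name: str
    fn: Callable[[bytes], bytes]


@dataclass
class Evaluation:
    """Grades an output against a fixed suite of named checks."""
    suite: dict[str, Callable[[bytes], bool]]

    def run(self, output: bytes) -> dict[str, bool]:
        # The per-check results are the trace that travels with the output.
        return {name: check(output) for name, check in sorted(self.suite.items())}


def report(source: Source, data: bytes, steps: list[Transform],
           evaluation: Evaluation) -> dict:
    """Run source -> transforms -> evaluation and pin the result to the run."""
    x = source.load(data)
    for step in steps:
        x = step.fn(x)
    trace = evaluation.run(x)
    return {
        "source": source.pinned,
        "steps": [s.name for s in steps],
        "output": sha256(x),
        "trace": trace,
        "passed": all(trace.values()),
    }


if __name__ == "__main__":
    raw = b"acgt"  # stand-in input bytes
    src = Source("reads", sha256(raw))
    steps = [Transform("upper", bytes.upper), Transform("reverse", lambda b: b[::-1])]
    ev = Evaluation({"non_empty": lambda b: len(b) > 0,
                     "upper_case": lambda b: b == b.upper()})
    print(report(src, raw, steps, ev) == report(src, raw, steps, ev))  # True
```

Calling `report` twice with the same pinned source and steps returns equal dictionaries, and passing source bytes that do not match the pin raises before any transform runs.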

Domains we use it in

One orchestrator, three sciences.

BIO

biology

sequence pipelines, structure prediction, multi-stage analyses with versioned references.

MAT

materials

silicon flow runs, analog block sweeps, layout-to-LVS pipelines kept reproducible across processes.

COG

cognition

training and evaluation pipelines for cognitive substrate research, including multi-seed runs.


Why we built it

Research velocity is bottlenecked by reproducibility infrastructure. Emmy is our answer: a workspace orchestrator that lets a scientist hand off a pipeline to a collaborator and get the same answer back.

Workflow that lets a pipeline travel without losing its result.