Pipelines are composed from versioned, declarative components that travel with their dependencies.
Emmy
A workspace orchestrator that travels with the research, running reproducible pipelines across scientific domains. Pipelines run, replay, and audit identically.
Orchestration that travels with the research
Emmy is the orchestration tool we use to keep scientific research reproducible across domains. It composes our research building blocks into pipelines that can be run, replayed, and audited, with data versioning and dependency management visible to the scientists using it. It is the layer that lets a result move from a notebook into a defensible artifact.
Three properties
Replays are deterministic. Re-running a pipeline yields the same artifacts unless the inputs have changed, and any such change is visible.
Emmy is used across biology, materials, and cognition without becoming a domain framework. It is workflow, not modelling.
Five replays. One hash.
The same pipeline produces the same artifact every time the inputs match. When the inputs change, the change is visible at the row level.
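The replay guarantee above can be illustrated with a minimal sketch. It is not Emmy's implementation: the function names (`row_hash`, `artifact_hash`, `run_pipeline`, `changed_rows`), the SHA-256 digest, and the canonical-JSON encoding are all assumptions chosen for illustration. The sketch replays a toy deterministic pipeline five times, checks that every replay yields one hash, and then reports which input rows differ after an edit.

```python
from __future__ import annotations

import hashlib
import json


def row_hash(row: dict) -> str:
    """Hash one row via a canonical JSON encoding (sorted keys, fixed separators)."""
    encoded = json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def artifact_hash(rows: list[dict]) -> str:
    """Hash an artifact as the digest of its row hashes, in order."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(row_hash(row).encode("ascii"))
    return digest.hexdigest()


def run_pipeline(inputs: list[dict]) -> list[dict]:
    """Hypothetical stand-in for a deterministic pipeline: doubles each value."""
    return [{"id": r["id"], "score": r["value"] * 2} for r in inputs]


def changed_rows(old: list[dict], new: list[dict]) -> list[int]:
    """Return the ids of rows that were added, removed, or modified."""
    old_hashes = {r["id"]: row_hash(r) for r in old}
    new_hashes = {r["id"]: row_hash(r) for r in new}
    all_ids = old_hashes.keys() | new_hashes.keys()
    return sorted(i for i in all_ids if old_hashes.get(i) != new_hashes.get(i))


inputs = [{"id": 1, "value": 3}, {"id": 2, "value": 5}]

# Five replays over identical inputs collapse to a single hash.
replay_hashes = {artifact_hash(run_pipeline(inputs)) for _ in range(5)}

# Editing one input row changes the artifact hash, and the diff names the row.
edited = [{"id": 1, "value": 3}, {"id": 2, "value": 6}]
edited_hash = artifact_hash(run_pipeline(edited))
diff = changed_rows(inputs, edited)
```

Hashing each row separately, rather than the artifact as one blob, is what lets a mismatch be traced back to specific rows instead of only signalling that something changed.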
Four kinds of building block, composed into anything.
A versioned input: data, model, or artifact pinned by hash.
A deterministic transform from inputs to outputs.
An evaluation graded against a fixed suite, whose output includes the trace.
A final artifact pinned to the run, replayable end to end.
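As a rough sketch of how the four kinds might compose, assuming nothing about Emmy's actual API: the class names (`PinnedInput`, `Transform`, `Evaluation`, `FinalArtifact`), the `run` function, and the `content_hash` helper below are all hypothetical, and the pipeline is reduced to a single transform.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable


def content_hash(value: Any) -> str:
    """Hypothetical helper: SHA-256 over a canonical JSON encoding."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class PinnedInput:
    """Versioned input: a value plus the hash it is expected to have."""
    value: Any
    expected_hash: str

    def load(self) -> Any:
        actual = content_hash(self.value)
        if actual != self.expected_hash:
            raise ValueError(f"input hash mismatch: {actual} != {self.expected_hash}")
        return self.value


@dataclass(frozen=True)
class Transform:
    """Deterministic function from inputs to outputs."""
    fn: Callable[[Any], Any]


@dataclass(frozen=True)
class Evaluation:
    """Fixed suite of (name, check) pairs; grading returns the full trace."""
    suite: tuple

    def grade(self, output: Any) -> dict:
        trace = [(name, bool(check(output))) for name, check in self.suite]
        return {"passed": all(ok for _, ok in trace), "trace": trace}


@dataclass(frozen=True)
class FinalArtifact:
    """Output pinned to a run, together with its hash and evaluation report."""
    run_id: str
    output: Any
    report: dict
    output_hash: str


def run(run_id: str, pinned: PinnedInput, transform: Transform,
        evaluation: Evaluation) -> FinalArtifact:
    data = pinned.load()                 # refuses to run on unpinned data
    output = transform.fn(data)
    report = evaluation.grade(output)
    return FinalArtifact(run_id, output, report, content_hash(output))


readings = [1.0, 2.0, 3.0]
pinned = PinnedInput(readings, content_hash(readings))
mean = Transform(lambda xs: sum(xs) / len(xs))
suite = Evaluation((("in_range", lambda m: 0 <= m <= 10),))

artifact = run("run-001", pinned, mean, suite)
replayed = run("run-001", pinned, mean, suite)
```

In this sketch a replay with the same pinned input produces an equal `FinalArtifact`, while an input whose contents no longer match its pinned hash fails at `load()` rather than silently producing a different result.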
One orchestrator, three sciences.
biology
sequence pipelines, structure prediction, multi-stage analyses with versioned references.
materials
silicon flow runs, analog block sweeps, layout-to-LVS pipelines kept reproducible across processes.
cognition
training and evaluation pipelines for cognitive substrate research, including multi-seed runs.
Why we built it
Research velocity is bottlenecked by reproducibility infrastructure. Emmy is our answer: a workspace orchestrator that lets a scientist hand off a pipeline to a collaborator and get the same answer back.