The Lab Memory Problem
A new postdoc joined a computational materials science lab I know. Smart, prepared, motivated. In her first week, she spent four hours reading a paper on nanocrystal defect characterization — taking careful notes, marking key figures, flagging methodological questions.
The previous postdoc had already done this. A year earlier, he'd worked through the same paper, annotated it in detail, and written a summary noting which figures were directly applicable to their specific synthesis system and which were misleading for their use case.
That postdoc graduated nine months ago. His notes were in a Google Drive folder no one uses anymore.
This is not a story about laziness or bad documentation practices. The previous postdoc documented carefully. The lab had good intentions. The problem is architectural: research institutions were not designed to accumulate working knowledge. They were designed to produce published knowledge. The two are not the same thing, and the gap between them is where most of a lab's actual intelligence lives.
What Gets Published vs. What Gets Used
A published paper is a curated artifact. By design, it presents the approach that worked, the figures that support the argument, and the methods section sufficient for nominal replication. It does not include the six failed approaches before the one that worked, the note about which batch of reagents was inconsistent, the afternoon someone realized the temperature calibration was off by twelve degrees, or the judgment call — made by a second-year grad student, never discussed again — about which parameter to hold fixed.
That omitted knowledge is not incidental. It is, in many cases, the most valuable knowledge the lab has. It is what distinguishes a researcher who has been in the lab for three years from one who arrived last week. It is what lets a PI say “we tried that in 2022, here's why it didn't work” rather than repeating two months of effort.
Published papers are the tip of the iceberg. The knowledge that actually drives decisions rarely gets written down in a form anyone can later find; when the person who held it leaves, it goes with them.
This working knowledge — the unstructured, informal, institutional intelligence that accumulates in Slack threads, meeting notes, annotated PDFs, failed experiment rationale, and HPC job configurations — is not captured by any current research infrastructure. Electronic lab notebooks capture structured experimental data. Literature tools surface published findings. Neither touches the actual decision layer of how science gets done.
The Turnover Equation
The average US PhD takes 5.5 years. The average postdoc appointment is 3 years, with most researchers doing 2–3 appointments before leaving academia or landing a faculty position. Lab turnover rates — the proportion of a lab's active members who leave in a given year — typically run between 20% and 35%.
At a 25% annual turnover rate, a lab of eight people loses the equivalent of its entire headcount in four years. In practice the transition happens unevenly (some people overlap, some don't), but the compound effect is the same: a lab that has been running for ten years may have almost no personnel overlap with the lab that ran five years ago.
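The compounding can be made concrete with a back-of-the-envelope model (illustrative arithmetic only, not lab data): if each member independently has a 25% chance of leaving in a given year, the expected share of the founding cohort still present after n years is 0.75 to the nth power.

```python
# Back-of-the-envelope model of institutional memory decay.
# Assumes each member independently leaves with probability
# `turnover_rate` each year; real labs churn in cohorts, so
# treat this as an illustration, not a prediction.

def original_fraction_remaining(turnover_rate: float, years: int) -> float:
    """Expected share of the founding cohort still in the lab."""
    return (1 - turnover_rate) ** years

for years in (2, 4, 6, 8):
    frac = original_fraction_remaining(0.25, years)
    print(f"after {years} years: {frac:.0%} of the original team remains")
```

At a 25% rate the expected founding-cohort share falls to roughly a third after four years and a tenth after eight — which is why a decade-old lab can be, in memory terms, a brand-new one.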
The lab's effective institutional memory at any given moment is approximately: what the current senior grad student can access, plus what the PI can recall, minus everything that was documented only in the systems of people who have left.
This is not a dysfunction. This is the default operating mode of almost every research lab in the country. It produces genuinely excellent science — papers get published, grants get renewed, PhDs get awarded. What it does not produce is compounding institutional intelligence. Each cohort largely rebuilds from scratch.
The Reproducibility Layer
The reproducibility crisis in science is usually framed as a methods problem: insufficient statistical rigor, underpowered studies, selective reporting, p-hacking. These are real and serious. But there is a parallel reproducibility crisis that gets less attention: the inability to reproduce one's own prior work.
A lab returns to a line of research after a two-year gap. The PI who ran it is still there; several of the original researchers are not. The published papers exist. The data exists, archived somewhere. The methods sections are technically complete. And yet: the lab cannot reproduce the conditions under which the original work was done, because the conditions were never fully documented. They lived in the heads of the people who did the work.
This is not hypothetical. It is a recognized and studied phenomenon in HPC-adjacent research — where long-running computational jobs accumulate configuration decisions, parameter choices, and error history that no one formally captures. When the person who ran the jobs leaves, their “knowledge” of the job — the kind of knowledge that would let you debug a failure at 3am — does not exist anywhere in a retrievable form.
Why Current Tools Don't Solve This
The research software market has addressed every version of this problem except the actual one.
Electronic lab notebooks (Benchling, LabArchives, Scispot) are excellent at capturing structured experimental data: protocol steps, sample identifiers, instrument parameters, results tables. They are not designed for — and do not capture — the unstructured reasoning layer: why a protocol was designed the way it was, what was tried before it, what the failure modes of the alternatives were.
Literature tools (Elicit, Consensus, ResearchRabbit) are excellent at surfacing what the world has published. They have no access to what your lab has done — the private, unpublished, often unwritten knowledge that is the actual basis for most lab decisions.
Team knowledge tools (Notion, Tettra, Confluence) capture knowledge that people explicitly choose to document. But the most valuable lab knowledge is exactly the knowledge that nobody thought to document explicitly: the hallway conversation that resolved an ambiguity, the Slack thread where someone figured out why the previous approach failed, the email where the PI gave informal guidance that quietly became lab orthodoxy.
The knowledge that Probe surfaces is mostly the stuff that nobody thought to document explicitly — which is also the stuff that nobody can find when they need it.
What a Memory Architecture Looks Like
The problem is not that labs fail to document. Most do, in some form. The problem is that the documentation is stored in ways that make it inaccessible to future researchers — scattered across systems, indexed by file name rather than meaning, locked in the accounts of people who have left.
A lab memory architecture has three requirements that current tools don't satisfy together:
Persistence across personnel changes. Knowledge captured in the account of a researcher who left is effectively lost. A memory system needs to outlast the person who created the memory — owned by the lab, not by the individual.
Retrieval by meaning, not by file name. The postdoc who needs to know “what did we try with catalyst X in 2023?” will not find the answer by searching for a file name. She needs a system that can reason across the lab's history and return a specific, cited answer from actual lab records.
Capture without friction. Any system that requires researchers to explicitly document will capture a fraction of what needs capturing. The Slack thread, the annotated PDF, the meeting summary, the HPC job config — these need to be indexed automatically, as a byproduct of normal work, not as a separate documentation task.
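"Retrieval by meaning" in practice usually means embedding-based search: score every lab record against a free-text query and return the closest matches with their sources. A minimal sketch of the idea (toy bag-of-words vectors stand in for learned embeddings, and the record names and query are invented for illustration):

```python
import math
from collections import Counter

# Toy semantic search over lab records. Production systems use learned
# text embeddings; bag-of-words cosine similarity is used here only so
# the sketch is self-contained and runnable.

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical lab records: a Slack thread, a meeting note, a job config.
records = {
    "2023-03 slack thread": "catalyst x failed at high temperature switched to y",
    "2022-11 meeting note": "grant deadline moved to december",
    "2023-07 hpc config": "tried catalyst x with lower anneal temperature",
}

query = "what did we try with catalyst x"
qv = vectorize(query)
ranked = sorted(records, key=lambda k: cosine(qv, vectorize(records[k])),
                reverse=True)
print(ranked[0])  # best match, returned with its source label as the citation
```

The point of the sketch is the shape of the answer: the system ranks records by semantic overlap with the question and hands back a specific, attributable source, not a folder of file names to dig through.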
This is what Probe is built to do. It indexes the unstructured knowledge layer of a research lab — Slack, Notion, annotated papers, experiment notes, HPC job histories — and makes it retrievable by meaning. When a new postdoc joins, the institutional knowledge doesn't reset. When the PI asks what approaches have been tried on a given problem, the answer comes from the lab's actual history, not from memory or Google Drive archaeology.
The working knowledge a lab accumulates over ten years is valuable. Right now, most of it evaporates every five.
Probe indexes the institutional knowledge inside your research lab and makes it retrievable across personnel changes, long-running projects, and expanding teams. Design partner pricing available for Q2 2026.
Learn more at probe.onstratum.com →