StratumJournal
March 6, 2026 · Probe / ResearchOS

Why Your Commit History Isn't Lab Memory

When a new researcher joins a computational lab, they usually get two things: a GitHub organization invite and a handshake. The assumption is that the code history contains the knowledge. It contains the changes. Those are not the same thing.

Version control is the most important tool computational science has for tracking what a codebase does at any given point in time. That's not in question. What is in question is whether the commit log — or any derivative of it — constitutes a lab memory. It doesn't. And the gap between “we have everything on GitHub” and “a new researcher can reconstruct why we do things the way we do” is where most computational labs lose a year of every PhD student's time.


The diff is not the decision

A commit records a state transition. The diff shows you what lines changed. The message, when it exists at all, tells you the proximate cause: “switch to PBE+U for transition metal oxides,” “update k-point grid,” “fix convergence issue on Mn structures.”

What a commit cannot record is the decision. Not the change — the decision. Why PBE+U and not HSE06 for this material class. What the convergence tests showed that made the k-point density increase necessary. Which three other approaches failed before this one worked. What the Mn failure mode was and how you know it's fixed rather than masked.

These things live in the researcher who made the commit. Sometimes in a lab meeting. Sometimes in a Slack thread that's now unindexed. Sometimes in a notebook that graduated with the PhD student in April.

$ git log --oneline src/vasp_workflow/functional_selector.py
7f2a3b1  switch perovskite calcs to PBE+U (U=3.5)
4d9e0c7  revert HSE06 for transition metals — too slow
2b1f8a9  try HSE06 for better gap accuracy
c73d4e2  initial functional: PBE

# What the log doesn't contain:
#
# - Why U=3.5 and not U=4.0 (benchmarked against exp. lattice params?)
# - Which perovskite classes still need HSE06 and why
# - What "too slow" meant in terms of allocation budget
# - Whether the HSE06 results were actually wrong or just expensive
# - The paper that justified this choice (it's not linked anywhere)
# - The postdoc who made commit 7f2a3b1 defended in May

The commit log is a change log, not a knowledge log. The distinction matters because knowledge is what you need when you're running a new material class and wondering whether to trust the current defaults — or when a reviewer asks you to justify a methodological choice from three years ago.
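One way to make the gap concrete is to ask what a record of the decision would need to hold, and then see how much of it the log can fill in. The sketch below is only an illustration: `DecisionRecord` and `missing_context` are hypothetical names, the fields are one possible choice, and the only values filled in are the ones that appear in the example log above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DecisionRecord:
    """Hypothetical record pairing a commit with the reasoning behind it."""
    commit: Optional[str]  # None for approaches that never reached the code
    choice: str            # what was decided
    rationale: str = ""    # why this choice answered the research question
    evidence: List[str] = field(default_factory=list)          # benchmarks, papers, tests
    alternatives: Dict[str, str] = field(default_factory=dict)  # alternative -> why rejected
    owner: str = ""        # who can still answer questions about it


def missing_context(record: DecisionRecord) -> List[str]:
    """Return the names of the reasoning fields that were left empty."""
    checks = {
        "rationale": record.rationale,
        "evidence": record.evidence,
        "alternatives": record.alternatives,
        "owner": record.owner,
    }
    return [name for name, value in checks.items() if not value]


# Everything the example log tells us about commit 7f2a3b1, and nothing more.
from_log_only = DecisionRecord(
    commit="7f2a3b1",
    choice="switch perovskite calcs to PBE+U (U=3.5)",
    alternatives={"HSE06": "too slow"},  # taken from the message of commit 4d9e0c7
)

print(missing_context(from_log_only))
```

Even with the full log in hand, the rationale, the evidence, and the person who could explain either are all empty. The one field the log does populate, the rejected alternative, carries a two-word reason.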

The archaeology problem

Every computational lab that has run for more than two or three years has experienced some version of this: a new researcher inherits a workflow and spends weeks doing what amounts to archaeological reconstruction. They read the commit history, which tells them the sequence of changes but not the reasoning. They find old lab meeting slides, which may contain fragments of the rationale, or may not. They track down the previous researcher, who remembers some things clearly and has forgotten others. They run the code on systems they understand and try to reverse-engineer the design choices from the output.

This is expensive in the obvious way — it takes time. But it's also expensive in a less obvious way: the reconstruction is usually incomplete. The new researcher forms a mental model of why the code works the way it does. That mental model will contain gaps and errors that they won't be able to identify, because they don't know what they're missing.

A commit log tells you what the lab changed. It doesn't tell you what the lab learned: what alternatives were tried, what constraints were binding, and what the decision would look like if those constraints changed.

This is particularly acute for the choices that feel obvious in hindsight: the “obviously we use PBE+U for Mn oxides” kind, where current lab members don't document the rationale because it feels too basic to explain. Those are exactly the choices that cost a new researcher the most time to reconstruct.

Why more documentation doesn't solve it

The natural response to this problem is documentation: write better commit messages, maintain a lab wiki, require researchers to document their methodological choices. Labs that have tried this know how it goes. The documentation is thorough for the first few weeks. Then a deadline arrives. Then another. The wiki stops being updated. The commit messages revert to one-liners. The methodological choices accumulate undocumented, and the problem compounds.

The issue is not that researchers don't value documentation. Most do. The issue is that good documentation requires stopping work to explain work you just did, in enough detail that a future researcher with different context will understand it, at a moment when you are thinking about the next problem, not the last one.

That's not a discipline failure. It's a structural mismatch between when knowledge is generated (during active research) and when documentation systems expect input (at designated documentation moments that interrupt active research). The knowledge evaporates in the gap.

What version control is actually good for

None of this is a critique of Git. Version control is the right tool for what it does: tracking state changes, enabling collaboration without conflicts, providing a reversible history of code modifications. These are genuinely important properties. They are just not the properties that constitute lab memory.

Lab memory requires capturing intent alongside state. Not what changed, but why the change was the right response to the research question at that moment. Not which parameters are set, but the convergence tests and literature that justify those parameters for those material classes. Not the sequence of commits, but the sequence of decisions — including the decisions that didn't make it into the code at all because the approach didn't work.

A researcher who can answer “why do we use this functional?” is not reading the commit history. They are drawing on accumulated context from lab meetings, conversations, debugging sessions, and their own experimental intuition. When that researcher graduates, that context graduates with them. The commit history remains.

The multi-agent coordination problem

Labs that build AI agents for their research workflows hit this problem in a particularly sharp form. An agent that runs VASP jobs on Alpine can tell you the INCAR settings it used. It cannot tell you why those settings were chosen for that material class — because that reasoning was never captured anywhere the agent can access.

The result is an agent with perfect short-term memory and no institutional memory. It repeats experiments that were already ruled out. It applies defaults that were appropriate for the original system and wrong for the new one. It cannot answer the question every new researcher eventually asks: “did we already try this?”
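The “did we already try this?” question can be sketched as a lookup against a record of past attempts. This is a minimal illustration under stated assumptions, not a description of how ResearchOS or any particular agent framework works: `Attempt`, `LabMemory`, and `already_tried` are hypothetical names, and the single entry is taken from the example commit log earlier in this essay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Attempt:
    """Hypothetical entry in a lab's record of approaches already explored."""
    material_class: str
    approach: str
    outcome: str  # e.g. "adopted" or "reverted"
    reason: str


class LabMemory:
    """Append-only list of attempts with a case-insensitive lookup."""

    def __init__(self) -> None:
        self._attempts: List[Attempt] = []

    def record(self, attempt: Attempt) -> None:
        self._attempts.append(attempt)

    def already_tried(self, material_class: str, approach: str) -> Optional[Attempt]:
        """Return the most recent matching attempt, or None if nothing is on record."""
        for attempt in reversed(self._attempts):
            if (attempt.material_class.lower() == material_class.lower()
                    and attempt.approach.lower() == approach.lower()):
                return attempt
        return None


memory = LabMemory()
# The one attempt the example log lets us reconstruct (commit 4d9e0c7).
memory.record(Attempt("transition metals", "HSE06", "reverted", "too slow"))

prior = memory.already_tried("transition metals", "hse06")
if prior is not None:
    print(f"Already tried: {prior.outcome} ({prior.reason})")
else:
    print("No record; the experiment may be new.")
```

The lookup itself is trivial. The hard part is that the record has to exist before the agent asks, and an exact-match query like this one only helps if attempts were written down in terms the next question will use.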

This is the problem ResearchOS is built to solve. Not version control — Git already does version control. The layer above it: the persistent record of why, what-else-was-tried, and what-the-lab-learned, surfaced when the next researcher needs it.


Your repository is intact. Every commit is preserved. The code runs exactly as it did when the postdoc who wrote it defended their thesis. What isn't preserved is the conversation they had with you in your office the week before they left, when they explained the three things that the code handles wrong and how to catch them. That conversation was your lab memory. The commit log is its shadow.

Probe / ResearchOS

ResearchOS maintains the reasoning layer above version control — capturing the decisions, constraints, and methodological context that accumulate in conversations rather than commits. Design partner trials are open for computational research labs.
