Stratum Journal
March 4, 2026 · Probe / ResearchOS

The Tool With No Manual

Every productive computational lab eventually builds a tool nobody else has. A custom extension to LAMMPS for a specialized sampling method. A Python wrapper that chains their DFT workflow together in exactly the way their research requires. A machine learning potential training pipeline assembled from three different frameworks, duct-taped together by a postdoc who understood every seam.

These tools are the lab's competitive advantage. They encode years of scientific judgment — about which approach converges, which shortcut introduces error, which edge cases to watch for. They are usually on GitHub. They usually have a README. And they are almost never documented in any way that captures why they work the way they do.

When the person who built one leaves, the tool keeps running. It's the understanding of it that stops.


What is not in the README

A README describes what a tool does and how to run it. It rarely describes why specific design choices were made — because those choices felt obvious to the person making them, and because explaining the obvious is tedious.

But the obvious is only obvious to the person who accumulated the background to make it obvious. When a new graduate student inherits a custom LAMMPS extension, the implementation is right there in the C++ files. What is not there:

Why this thermostat and not that one. What the first implementation got wrong before this version. Which molecular system classes the current implementation fails on quietly, and how an experienced user catches that failure. What the performance characteristics are on the cluster the lab actually uses, as opposed to the laptop the code was developed on. Why the parameter defaults were set to the values they were — convergence tests that took weeks, now invisible.

The code is the answer. The README is a guide to using the answer. Neither one records the problem that was being solved, the alternatives that were tried, or the judgment calls embedded in every non-trivial line.

That context is the manual. And it does not exist.
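One lightweight way to keep a slice of that context alive is to record provenance next to the defaults themselves, so the reasoning travels with the code. A minimal sketch in Python — every name, value, and rationale below is a hypothetical illustration, not any particular lab's tool:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Default:
    value: object
    why: str            # the judgment call, in a sentence or two
    evidence: str = ""  # pointer to the convergence test, issue, or commit

# Hypothetical defaults for a hypothetical MD wrapper.
DEFAULTS = {
    "timestep_fs": Default(
        0.5,
        why="1.0 fs drifted energy on hydrogen-rich systems; 0.5 fs was stable.",
        evidence="convergence sweep, lab notebook 2024-03",
    ),
    "thermostat": Default(
        "nose-hoover",
        why="Langevin damping distorted the transport properties being measured.",
    ),
}

def explain(name: str) -> str:
    """Return the recorded rationale for a default, if any was captured."""
    d = DEFAULTS[name]
    suffix = f" [{d.evidence}]" if d.evidence else ""
    return f"{name} = {d.value!r}: {d.why}{suffix}"

print(explain("timestep_fs"))
```

This does not replace the missing manual — it only gives the question "why 0.5 fs?" an answer that lives in the file instead of in someone's memory.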

The onboarding tax

When a new researcher joins a lab that has custom tooling, they face a choice: trust the tool, or understand it.

Trusting it means running it as given and hoping the parameters make sense for their system. For many researchers, especially early-stage graduate students, this is the only realistic option. They don't have the scientific background yet to evaluate the implementation choices, and there's no documentation to help them.

Understanding it requires what amounts to an archaeological project. Read the code. Trace through the commit history. Find old Slack messages. Track down the previous user and hope they remember. Reconstruct the reasoning from outputs — run the tool on a system you understand, see if it matches expectation, form hypotheses about what it's doing and why.

Most researchers do some version of the archaeological project, more or less consciously. It takes anywhere from weeks to months, depending on the complexity of the tool and the depth of their prior background. It is entirely invisible work — it produces no outputs, generates no papers, earns no credit.

# The new student's first month with an inherited workflow
$ git log --oneline tools/mlpotential_train.py
a3f7d2c  fix edge case for high-symmetry structures
b891e4f  update default hyperparams
c204a71  add gpu support
d115f90  first working version

# None of these commits answer:
# - Why these hyperparameters specifically?
# - What "edge case" was the fix addressing?
# - Which system classes should we test before trusting this?
# (And the postdoc who wrote this graduated in April.)

The scientific knowledge embedded in code

Custom scientific software is different from general software in a specific way: the design choices are not just engineering decisions. They are scientific decisions.

A database schema encodes data relationships. A custom LAMMPS extension encodes a theory about how a physical system behaves, which approximations are acceptable for the research questions the lab is asking, and which failure modes matter. Understanding why the code works the way it does requires understanding the science, not just the programming.

This is why standard software documentation practices — docstrings, READMEs, API references — are necessary but not sufficient. They document the interface. They do not document the scientific judgment embedded in the implementation.

The gap becomes acute when the tool is used outside its original context. A machine-learning potential trained on equilibrium configurations may behave poorly for a new researcher studying non-equilibrium processes. A sampling extension optimized for a specific energy landscape may converge slowly or incorrectly on a different system. The person who built the tool would recognize these failure modes immediately. The person inheriting it has no way to know they exist.
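That silent-extrapolation failure mode can at least be flagged mechanically, even without the original developer's intuition. A hedged sketch in pure Python, using hypothetical 2-D descriptors: compare a new configuration's descriptor to its nearest neighbor in the training set, and warn when the distance exceeds the spacing the training data itself exhibits. The threshold heuristic here is an assumption for illustration, not a standard method:

```python
import math

def nearest_distance(x, training_set):
    """Euclidean distance from descriptor x to its nearest training descriptor."""
    return min(math.dist(x, t) for t in training_set)

def extrapolation_warning(x, training_set, factor=2.0):
    """Flag x if it sits farther from the training set than the largest
    nearest-neighbor spacing *within* the training set, times `factor`.
    The factor-of-two cutoff is an illustrative assumption."""
    spacings = [
        min(math.dist(t, u) for u in training_set if u is not t)
        for t in training_set
    ]
    threshold = factor * max(spacings)
    return nearest_distance(x, training_set) > threshold

# Hypothetical descriptors: four training points on a unit square.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(extrapolation_warning((0.5, 0.5), train))    # inside the cloud -> False
print(extrapolation_warning((10.0, 10.0), train))  # far outside -> True
```

Real ML-potential codes use richer uncertainty measures, but even a check this crude converts "the person inheriting it has no way to know" into a warning in the output.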

When external adoption creates internal pressure

Some custom lab tools get released publicly and adopted by the broader community. Salmon for RNA-seq quantification. LAMMPS extensions for specialized molecular dynamics. DFT workflow tools used by dozens of groups.

External adoption creates a pressure the lab often doesn't anticipate: users file issues describing behavior the original developer would immediately understand, but which no current lab member can diagnose. Support questions arrive that reveal the gap between what the README says and what an experienced user knows. The tool's reputation becomes attached to the lab's name — and the institutional knowledge required to maintain that reputation is increasingly absent.

This is the tool-with-no-manual problem at its most visible: a piece of software with hundreds or thousands of external users, maintained by a lab whose collective understanding of why the tool does what it does diminishes every time a PhD student graduates.

What capturing this knowledge would actually require

The knowledge embedded in a custom lab tool is not a document that can be written once. It is a living record: the series of design decisions made over the tool's lifetime, the failure modes discovered in use, the extensions that were tried and discarded, the system classes where the tool breaks down and the workarounds that experienced users apply.

This kind of knowledge accumulates through conversations — code reviews, debugging sessions, the brief exchange when a postdoc explains to a new student why the default parameters are set the way they are. It is generated continuously, not at publication time.

Capturing it requires a layer that's present in those conversations and can synthesize them into something queryable. Not a wiki — wikis require deliberate documentation work that never happens consistently. Not commit messages — commit messages describe what changed, not why the change reflects a scientific judgment. Something that accumulates context as work happens, so that when a new researcher joins in three years and asks “why does this handle high-symmetry structures differently?” — there is an answer.
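To make the shape of that layer concrete — a sketch of the data model only, not ResearchOS's implementation — the minimum viable version is an append-only log of decisions that can be searched by topic three years later:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Decision:
    when: date
    topic: str   # e.g. "high-symmetry structures"
    what: str    # the decision that was made
    why: str     # the scientific judgment behind it

class DecisionLog:
    """Append-only record of design decisions, searchable by keyword.
    An illustrative sketch, not a real system."""

    def __init__(self):
        self._entries: list[Decision] = []

    def record(self, d: Decision) -> None:
        self._entries.append(d)

    def ask(self, keyword: str) -> list[Decision]:
        kw = keyword.lower()
        return [d for d in self._entries
                if kw in d.topic.lower() or kw in d.what.lower()
                or kw in d.why.lower()]

log = DecisionLog()
log.record(Decision(
    date(2024, 5, 2), "high-symmetry structures",
    "special-case the descriptor calculation",
    "degenerate neighbor shells made the default features collapse",
))

# Three years later: "why does this handle high-symmetry structures differently?"
for d in log.ask("high-symmetry"):
    print(d.when, "--", d.why)
```

The hard part, as the essay argues, is not the data structure — it is getting entries into it without asking anyone to do deliberate documentation work.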

ResearchOS is built for this. It maintains context across a lab's computational workflows — including the conversational context around code development and debugging — and surfaces it when a new researcher needs to understand not just how a tool works, but why it works that way.


The tool your postdoc built is still running. In two years, a new student will inherit it and spend three months reconstructing what the postdoc built in three weeks. The question is whether those three months have to be spent at all.


ResearchOS is in design partner trials with computational research labs that produce custom scientific software. If your group has built tools that encode significant scientific judgment — and you've felt what happens when the person who built them leaves — we'd like to hear from you.

Request early access →