The Onboarding Tax
When a new graduate student joins a computational materials lab, she does not start from zero. She starts from negative: before she can do anything useful, she has to reconstruct a context that already exists — scattered across people's heads, old Slack threads, the annotated PDFs of someone who graduated, and the half-documented config files on a shared HPC partition.
This reconstruction takes time. Depending on the lab, the project, and how well the previous generation documented their work, onboarding a new researcher in a computational lab typically absorbs four to eight weeks of productive time, drawn from the new researcher and the existing lab alike. The senior grad student repeats explanations given many times before. The PI has the same orientation conversation for the fourth time. The new person reads papers the lab has already internally characterized, runs preliminary tests on approaches the lab has already tried and abandoned, and asks questions whose answers live in a Notion page nobody told her about.
This is the onboarding tax. It is paid in full, by every new researcher, on every rotation.
The Compounding Structure of the Tax
What makes the onboarding tax expensive is not just its immediate cost — it is that the tax compounds across the lab's history.
A typical computational lab runs 7–12 people with 20–30% annual turnover. Over a five-year span, the entire graduate student and postdoc cohort replaces itself roughly once. Each departure takes knowledge out of the system. Each arrival requires rebuilding a partial version of it. The institutional intelligence doesn't accumulate — it oscillates around whatever the current PI can remember and the senior researcher in the room happens to know.
2021–2026: approximately 10 researcher transitions. Each one: 4–8 weeks of onboarding overhead, split between the new person and the existing lab members who provide orientation.
Optimistic estimate: 10 transitions × 4 weeks × 0.5 FTE equivalent = 20 researcher-weeks of compounded onboarding overhead. That is roughly five months of lost research productivity — before accounting for the knowledge that left with each departure and was never fully recovered.
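The arithmetic above can be sketched as a quick back-of-envelope calculation. Every figure here is the illustrative number from the text (10 transitions, 4–8 weeks per transition, a 0.5 FTE factor), not measured data:

```python
# Back-of-envelope estimate of the compounded onboarding tax.
# All inputs are the illustrative figures from the text, not measurements.

WEEKS_PER_MONTH = 52 / 12  # ~4.33

def onboarding_tax(transitions, weeks_per_transition, fte_factor):
    """Researcher-weeks of productivity lost to onboarding overhead."""
    return transitions * weeks_per_transition * fte_factor

# ~10 transitions over five years in a 7-12 person lab with 20-30% turnover.
optimistic = onboarding_tax(10, weeks_per_transition=4, fte_factor=0.5)
pessimistic = onboarding_tax(10, weeks_per_transition=8, fte_factor=0.5)

print(f"optimistic:  {optimistic:.0f} researcher-weeks "
      f"(~{optimistic / WEEKS_PER_MONTH:.1f} months)")
print(f"pessimistic: {pessimistic:.0f} researcher-weeks "
      f"(~{pessimistic / WEEKS_PER_MONTH:.1f} months)")
```

Even the optimistic case lands near five months of lost productivity; doubling the per-transition overhead to the high end of the range doubles the tax.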
This calculation understates the real cost because it only captures the explicit orientation time. It does not account for the decisions made on incomplete context — the experiment that was run because the new researcher didn't know it had already been tried, the approach that was adopted because the institutional reason for abandoning it was no longer in anyone's memory.
What the Tax Is Actually Buying
The onboarding tax purchases a partial, lossy reconstruction of something that should have been preserved. The knowledge being rebuilt is not exotic — it is the ordinary working knowledge of a functioning lab:
Which approaches have been tried on the central research question and why each was abandoned or deprioritized. Which parameter regimes are known to produce noise versus signal. Which HPC job configurations have been tested and what the failure modes of each are. Which papers are relevant to the current synthesis direction and which are misleading for the specific system being studied. What the PI's actual opinions are on the three open questions in the subfield — not the public positions, the working ones.
None of this knowledge is secret or difficult to express. It is simply not stored in a form accessible to someone who wasn't there when it was created. It lives in the memories and informal communication patterns of the people currently in the lab — which means it resets with every departure.
The onboarding tax is not a documentation failure. It is a retrieval failure. Most of the knowledge exists somewhere. The problem is that it cannot be found by someone who doesn't already know where to look.
Why Standard Documentation Doesn't Solve It
The instinctive response to this problem is documentation: lab wikis, shared Notion spaces, onboarding checklists, “knowledge transfer sessions” before a researcher departs. These efforts are sincere and often substantial. They do not solve the problem.
First: documentation captures what people think to document, not what's actually valuable. The hallway conversation where someone figured out why the VASP calculation kept diverging does not become a Notion page. The Slack thread where the lab debated which basis set to use before settling on a preference — documented in no formal system, but influential over three years of subsequent calculations — is not retrievable by someone who was not in the channel.
Second: documentation is indexed by its creator's mental model, not by its future reader's questions. A file called “HPC_notes_Fall2024.md” is findable only if you know it exists and what it contains. A new researcher asking “what parameters did we use for the surface relaxation runs?” will not find it unless someone tells her to look there.
Third: documentation requires bandwidth that researchers rarely have. A departing postdoc has a defense to prepare, a job to start, and a life to move. A 30-page knowledge transfer document is optimistic. A two-hour conversation is more realistic — and a two-hour conversation is not retrievable by the next person who needs the same information two years from now.
The Retrieval Architecture Problem
The underlying issue is architectural. Research labs have designed their knowledge systems for two outputs: published work (papers, datasets, code repositories) and in-person transmission (conversations, mentorship, informal training). The first is queryable but thin — it captures what was worth formalizing. The second is rich but volatile — it disappears with the person.
What is missing is a system that captures the working knowledge layer — the informal, decision-driving context that accumulates between publications — and makes it retrievable by meaning rather than by file name or personal familiarity with the lab's folder structure.
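The difference between filename lookup and retrieval by meaning can be illustrated with a toy example. This is a minimal sketch only: bag-of-words cosine similarity stands in for the dense-embedding retrieval a production system would use, the documents and query are invented, and none of this reflects how Probe is actually implemented:

```python
# Toy sketch of retrieval by meaning rather than by file name.
# Bag-of-words cosine similarity is a stand-in for real embedding
# retrieval; all documents and the query are invented examples.
import math
import re
from collections import Counter

corpus = {
    "HPC_notes_Fall2024.md":
        "surface relaxation runs converged with ENCUT 520 and a 4x4x1 k-mesh",
    "slack_2023_basis_sets.txt":
        "debated basis set choice, settled on def2-TZVP for the oxide system",
    "meeting_2022_ml_potentials.txt":
        "tried ML potentials for the central question, abandoned: too noisy",
}

def vectorize(text):
    """Lowercased word-count vector for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query):
    """Return the corpus document most similar to the query."""
    qv = vectorize(query)
    return max(corpus, key=lambda name: cosine(qv, vectorize(corpus[name])))

# The question a new researcher actually asks -- no file name required.
print(search("what parameters did we use for the surface relaxation runs?"))
# -> HPC_notes_Fall2024.md
```

The query never mentions the file name, yet the right note surfaces because the match is on content. A real system would replace the word-count vectors with learned embeddings so that paraphrases and synonyms also match, but the architectural point is the same: the index is organized around the reader's questions, not the writer's folder structure.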
That is what Probe does. It indexes the Slack channels, annotated papers, meeting notes, HPC job histories, and experiment rationale that constitute a lab's actual working knowledge — and makes them queryable by the questions new researchers actually ask. Not “what files exist about this topic?” but “what did we try with this approach, and why did we move away from it?”
The onboarding tax is not an unavoidable cost of running a research lab. It is the cost of not having a memory architecture that persists beyond the people who hold the knowledge. The knowledge exists. The problem is retrieval. That is an infrastructure problem, and infrastructure problems have infrastructure solutions.
Probe indexes your lab's institutional knowledge — Slack, meeting notes, annotated papers, HPC histories — and makes it retrievable across personnel changes. The onboarding tax compounds. The solution is architecture.
Learn more at probe.onstratum.com →