When Your Lab Is Also a Company
Some research groups have built tools that industry actually uses — licensed, deployed at scale, spun into startups. That creates a knowledge problem no one has designed a product for.
A certain class of computational research group has done something unusual. They ship software, not just papers: tools with thousands of external users, tools that run at pharmaceutical companies and national labs. The PIs who run these groups often understand both the academic and commercial sides of research better than anyone. And almost without exception, their labs have a knowledge management problem that nobody has addressed.
The problem is structural, and it emerges from the moment the first license is signed.
The Divergence
An academic lab and a commercial entity have fundamentally different knowledge needs. The lab needs exploration — it tries things that fail, pivots between hypotheses, pursues hunches, and generates enormous amounts of negative data that never makes it into any publication. The company needs reproducibility — it needs to know exactly what worked, why it worked for that specific compound class or material system, and whether the method can be transferred reliably to a new person who wasn't in the room when the decisions were made.
Both of these are legitimate. But they accumulate separately. And the gap between them widens every day the lab runs without a shared context layer.
The lab and the company use the same codebase. They don't share the same reasoning.
Consider a group that has built a machine-learned interatomic potential — trained on hundreds of thousands of DFT calculations, licensed to industrial partners. The paper that describes the training methodology exists. The model itself is publicly available. But the decisions behind the training data — which configurations were included in the training set and which were excluded, which system classes were underrepresented and why that was accepted, what the failed training runs before the published protocol looked like — those decisions live in the head of the graduate student who ran the curation campaign. When that student joins a pharma company in June, the decisions go with them.
The company now holds a license to a tool whose training provenance is partially undocumented. When they ask "can we extend this to compound class X," the people who could answer that question have moved on.
Why the Standard Solutions Miss This
Most knowledge management tools are designed for one context or the other — not the transition between them.
Version control captures what the code looked like at each point in time. It does not capture the reasoning behind the code — which design choices were made and why, what the alternative implementations that didn't make it into the commit looked like, what the failure mode was that the current version was designed to avoid.
Lab notebooks (Notion, Obsidian, paper) are personal. They capture what one person knows, organized the way one person thinks. Industrial partners cannot search them. New hires cannot query them. They do not survive the departure of the PI or the senior postdoc who maintained them.
Published papers are publication-time snapshots. They capture the endpoint of a research process — the method that worked — and omit the four previous methods that didn't, the parameter choices that were tested and abandoned, and the system-specific knowledge that was acquired along the way. The evolution between papers is gone.
Shared drives and wikis capture what someone chose to document at the moment they chose to document it — which is almost never during the busy phase of a research campaign, and often happens as a rushed retrospective at the end. The reasoning behind choices is not the same as a description of the choices.
The Commercial Cost of Undocumented Methods
For a purely academic lab, the cost of undocumented methods is measured in rediscovered knowledge — a new student who spends six months re-learning what the previous student figured out. Painful, but contained.
For a lab whose methods have commercial downstream users, the cost is different. When a partner at a pharmaceutical company asks whether the tool can be extended to a new compound class, they are implicitly asking: does anyone at the lab know why the training protocol was designed for the original compound classes, and can that reasoning be applied to the new case?
If the answer is "yes, our postdoc who designed the training pipeline knows," the tool is effectively undocumented. The IP is in the postdoc's head. When that postdoc leaves, the partner has a license to a black box.
The tools you've built are commercially valuable. The knowledge of why they work — for which systems, under which conditions, with which caveats — is more valuable still. And it's almost entirely undocumented.
What the Lab-Company Knowledge Bridge Looks Like
The specific problem is not documentation — it's capture at the moment of generation. Training decisions are made continuously, during the research process, not at publication time. The method for capturing them has to be present in the workflow.
What this looks like in practice: as DFT calculations are run for a training set, the rationale for which configurations were selected is captured alongside the calculation — not in a separate notebook, but linked to the actual data. When a model version is trained, the connection to the specific simulation runs that generated its training data is preserved. When a postdoc makes a parameter choice that differs from the previous version, the reasoning for the change is queryable later — not locked in a Slack message or a personal notebook.
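To make this concrete, here is a minimal sketch, in Python, of what a decision record linked to its underlying data could look like. The `DecisionRecord` schema, its field names, and the example values are hypothetical illustrations of the pattern, not ResearchOS's actual data model.

```python
import json
import hashlib
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """Links a research decision to the data it affected, at the moment it is made."""
    author: str
    decision: str             # what was chosen
    rationale: str            # why, including what was rejected and what failed
    affected_artifacts: list  # e.g. calculation IDs or config fingerprints
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def config_hash(config: dict) -> str:
    """Stable fingerprint for a calculation config, so the record stays
    tied to the exact inputs it refers to rather than a prose description."""
    return hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:12]


# Hypothetical example: excluding a class of configurations from a
# training set, recorded next to the run rather than in a personal notebook.
dft_config = {"functional": "PBE", "cutoff_eV": 520, "kpoint_density": 0.04}
record = DecisionRecord(
    author="postdoc-a",
    decision="exclude high-pressure polymorphs from training set v3",
    rationale="forces diverged above 40 GPa in v2 validation; accepted "
              "underrepresentation of that regime rather than noisy labels",
    affected_artifacts=[config_hash(dft_config)],
)

# Serialized alongside the calculation outputs, the record is queryable
# later by anyone who asks "why was this configuration left out?"
print(json.dumps(asdict(record), indent=2))
```

The design point is the `affected_artifacts` link: because the rationale is keyed to a fingerprint of the actual inputs, it can be found from the data itself, not only from whoever remembers writing it.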
For the industrial partner who asks "can we extend this to compound class X," this means the answer comes from the lab's actual documented history of what was tried, not from whoever happens to be in the building that week.
The Labs This Matters Most For
This problem is most acute for groups that have done one or more of the following: licensed tools to industry, co-founded a company from lab research, maintained open-source tools with a large external user base, or operated as a de facto consulting partner to pharma or materials companies.
These groups have created commercial obligations around their methods — obligations that require the methods to be transferable, reproducible, and understandable by people who weren't part of the original research. Meeting that obligation from a base of undocumented methods is possible, but it requires the original researchers to be available indefinitely. Which they are not.
ResearchOS captures the reasoning layer in computational research labs — not just what ran, but why — and makes it queryable by the group, by new hires, and by the collaborators and partners who depend on the lab's methods being more than a publication. If your methods have moved beyond the lab, the knowledge behind them should be able to follow.
ResearchOS is institutional memory for computational research labs. We work with computational chemistry, materials science, and physics groups whose methods are advancing science — and, in some cases, industry. If this resonates, probe.onstratum.com.