StratumJournal
March 18, 2026
Probe / ResearchOS

The Pharma Audit Problem

A machine-learning interatomic potential trained on twenty million DFT calculations gets licensed to a pharmaceutical company. The licensing agreement is straightforward. What nobody anticipated is what happens when the pharma partner's regulatory team asks a question the lab has no way to answer.

Not a technical question. A documentation question. Something like: which configurations were included in the training set, and why were others excluded? Or: what functional was used as the reference level for this molecular class, and what validation was performed on that choice? Or, more bluntly: if this method appears in a regulatory submission, can you defend it?

The answers exist. They are in a postdoc's memory, partially reconstructable from a Jupyter notebook, inferrable from a methods section that describes what was done without explaining the decisions behind it. For an academic publication, that is sufficient. For the downstream regulatory context the pharma partner operates in, it is not.


Two different documentation standards

Academic publication is designed to establish that a method produces valid results. The methods section answers: what did you do, at sufficient detail for a specialist to reproduce it? Peer review validates that the approach is sound. Citation establishes that others have used and extended it. These are the relevant standards for building scientific knowledge.

Pharmaceutical regulatory documentation is designed to establish that a specific result can be relied upon in a specific context for a specific decision. It answers a different set of questions: why was this method chosen over alternatives for this application? What are its failure modes, and how were they characterized? What training, validation, and testing was performed, and what were the results? If the method changes, how will changes be tracked and validated?

These are not the same questions. A methods section that satisfies peer review at a top journal may be entirely insufficient for regulatory purposes — not because the science is wrong, but because it was written to answer different questions.

The gap is not between good science and bad science. It is between documentation that establishes validity and documentation that establishes accountability. Academic labs produce the first. Pharmaceutical regulatory environments require the second.

What the gap actually looks like

For a computational chemistry tool licensed from an academic group to pharma, the documentation gap typically has three components:

What exists (academic) vs. what pharma needs:

  Academic:  Training set size (e.g., “trained on 20M DFT calculations”)
  Pharma:    Data curation criteria — what was included, what was excluded, and on what basis

  Academic:  Published benchmark accuracy on standard test sets
  Pharma:    Validation performance on the specific molecular classes in the partner's pipeline

  Academic:  Reference level (functional, basis set) in the methods section
  Pharma:    Rationale for the reference level choice per molecular class; known limitations for specific chemistries

  Academic:  Model architecture and hyperparameters (published)
  Pharma:    History of architecture decisions — what was tried, why it was changed, what tradeoffs were made

  Academic:  GitHub repository (current version)
  Pharma:    Version history with documentation of what changed between versions and why

  Academic:  Lab contact for questions
  Pharma:    A queryable record that doesn't depend on the availability of a specific person

The information in the right column was generated during the tool's development. It is not in the paper. It may be partially reconstructable from lab notebooks, Slack histories, and conversations with people who are still in the lab. It may not be, if enough time has passed and enough people have left.

When the gap becomes visible

The documentation gap is invisible until it is not. For most academic-to-pharma technology transfers, it becomes visible at one of a few specific moments:

The first is when the pharma partner needs to validate the tool for their specific application. Validation requires understanding failure modes for the chemistry they're working with — which means knowing which molecular classes the tool was tested on, which it was not, and what the expected performance boundary is. If the lab doesn't have this documented, the pharma partner has to generate it themselves, which is expensive and creates the awkward situation of the licensor not being able to answer basic technical questions about their own tool.
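The "performance boundary" the partner needs can be made concrete. A minimal sketch of what a documented boundary might look like, assuming the lab had recorded per-class validation results; the class names, error metrics, and threshold below are invented for illustration, not taken from any real tool:

```python
# Hypothetical record of per-class validation, as a lab *could* have
# documented it. Everything here is illustrative.
validated = {
    "neutral organics": {"test_mae_kcal_mol": 0.4, "n_test": 12000},
    "organometallics":  {"test_mae_kcal_mol": 1.9, "n_test": 800},
}

def in_scope(molecular_class, max_mae=1.0):
    """Is this class inside the documented performance boundary?

    Returns (bool, reason). A class that was never tested is out of
    scope by definition -- absence of validation is itself the answer.
    """
    record = validated.get(molecular_class)
    if record is None:
        return False, "never tested; outside documented scope"
    if record["test_mae_kcal_mol"] > max_mae:
        return False, f"tested, but MAE {record['test_mae_kcal_mol']} kcal/mol exceeds threshold"
    return True, "within validated boundary"

# An untested chemistry falls out of scope automatically:
ok, reason = in_scope("charged species")
```

The point of the sketch is the `None` branch: without a record of what was tested, the partner cannot distinguish "validated and fine" from "never examined," and has to regenerate the boundary themselves.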

The second is when the tool produces a result that informs a regulatory decision. Not a final IND or BLA decision — regulatory agencies don't accept ML methods for those directly yet. But increasingly, computational methods inform the decision of which candidates to synthesize, which experiments to run, which hypotheses to prioritize. When those decisions are downstream of a licensed academic tool, the pharma partner needs to be able to document their basis for trusting the tool's outputs. That documentation requires information the tool's developers may not have recorded.

The third is routine: due diligence. A licensing agreement is a contractual relationship, and at some point — audit, partner review, acquisition — someone will ask what the scientific basis is for this tool. “Published in Nature Chemistry” is not a complete answer. The institutional knowledge behind the publication is also part of the answer, and it is the part that walks out the door when postdocs graduate.

The structural problem

Academic labs are not designed to produce pharmaceutical-grade documentation. This is not a criticism — they are designed to produce scientific knowledge and train researchers. The incentive structure of academic science rewards publication velocity, not documentation depth. A postdoc who spends extra time writing internal documentation of training data curation decisions is spending time that does not appear on their CV.

# The documentation that exists vs. what's needed
# ------------------------------------------------
# TRAINING SET: what the paper says
training_set = {
    "size": "20M DFT calculations",
    "source": "high-throughput screening campaign",
    "reference_level": "ωB97X-D/def2-TZVP",
}

# TRAINING SET: what pharma's regulatory team needs
# (None marks fields that were never written down anywhere)
training_set_full = {
    "size": "20M DFT calculations",
    "curation_criteria": None,     # Why these configurations?
    "exclusion_criteria": None,    # What was filtered and why?
    "coverage_analysis": None,     # Which molecular classes are represented?
    "known_gaps": None,            # What's known to be out-of-distribution?
    "version_history": None,       # How did this evolve across model versions?
    # Stored in: Dr. Chen's notebook (Chen graduated April 2024)
}

The structural problem is that the documentation required for downstream pharma use needs to be captured during the research process — not at publication time, and not when the licensing conversation happens. By the time a pharma partner asks these questions, the researchers who could answer them may be at Genentech, Pfizer, or insitro, accessible only by email, their memories of specific decisions from three years ago imperfect.

What capturing this knowledge would require

The documentation pharma needs is not a document. It is a record of the decision process that produced the tool: what configurations were included and why, what was tried and discarded, what the performance boundaries are and how they were characterized, how the architecture evolved and what drove the changes.

This record is generated continuously during research — in the reasoning behind experimental choices, in debugging sessions, in the conversations where a postdoc explains to a collaborator why the current version handles a specific molecular class differently than the previous one. It cannot be written down after the fact, because after the fact the people who could write it are gone.

Capturing it requires a layer that is present during the research process and accumulates context as work happens. Not a wiki, which requires explicit documentation effort that never happens consistently under publication pressure. Something that synthesizes context from lab conversations, experiment records, and computational logs into a queryable institutional record — so that when a pharma partner asks why the training set excluded a specific molecular class, the answer exists in a system, not only in a person.
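One shape such a record could take, sketched minimally. This is not ResearchOS's actual data model; the record type, the query helper, and the example entry are all invented to show what "the answer exists in a system" means in practice:

```python
# Hypothetical decision log: one append-only record per research
# decision, captured when the decision is made. All names illustrative.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DecisionRecord:
    when: date
    topic: str                # e.g. "training-set curation"
    decision: str             # what was decided
    rationale: str            # why -- the part that leaves with the postdoc
    alternatives: list = field(default_factory=list)

log = [
    DecisionRecord(
        when=date(2023, 5, 2),
        topic="training-set curation",
        decision="exclude charged species from training set",
        rationale="reference level unvalidated for anions; "
                  "benchmark errors exceeded acceptable bounds",
        alternatives=["revalidate with a range-separated functional"],
    ),
]

def query(records, topic):
    """Return all recorded decisions touching a topic."""
    return [r for r in records if topic in r.topic]

# Three years later, a pharma partner asks why anions are out of scope:
hits = query(log, "curation")
```

The design choice that matters is not the schema but the timing: the `rationale` field can only be filled at decision time, which is why a system like this has to sit inside the research process rather than be assembled at licensing time.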

ResearchOS is built for this. For labs that produce tools with pharmaceutical applications, it maintains the decision trail behind those tools — the curation criteria, the validation experiments, the architecture choices — in a form that can be queried and documented at the moment the downstream context requires it.


The pharma company licensed your tool because it works. What they need now — and what will become more pressing as AI methods enter regulated workflows — is evidence that they can account for it. The science is in the paper. The accountability trail is in the institutional memory of the lab that built it. The question is whether that memory still exists.


ResearchOS is in design partner trials with computational chemistry groups whose tools have entered pharmaceutical or materials industry applications. If your lab has licensed tools and you have felt the documentation gap this essay describes, we'd like to hear from you.

Request early access →