AI × Biology Series: Where Value Accrues

part 4: where value accrues

the build map. where to invest, where to build, what happens next. the India angle.

biology is digitising, and the same reorganisation that reshaped other information industries is coming. the question is which layer captures the value and who owns it.

models commoditise. every capability that can be abstracted gets abstracted. GPT-Rosalind, Claude Life Sciences, AlphaGenome commercial, and Amazon Bio Discovery were among five platforms with similar claims launched within weeks of each other. that is a commodity market forming in real time.

when an information system digitises, value concentrates at the layer that controls ground truth.

layer one: human genetic target validation

pursuing genetically validated targets more than doubles the probability of clinical success and expands the frontier of drug feasibility by 35%. that is BridgeBio's published model, peer-reviewed in Drug Discovery Today, and their results back it up: three positive Phase 3 readouts in just over three months. their entire operating thesis rests on one insight, that targets with human genetic evidence win at disproportionate rates, and it is working in practice, not just in theory.

the tools to apply this filter at scale are arriving, but the data underneath them is skewed. despite comprising nearly 25% of the world's population, South Asians make up less than 2% of participants in global genome-wide association studies. a study of over 1,000 individuals across Indian subpopulations found significant variation in drug-metabolising genes such as CYP2C9 and CYP2C19: the variants did not just differ from European populations, they varied widely between South Asian subgroups. the drugs being designed today are being designed against the wrong population genetics for two billion people.

whoever builds the infrastructure that makes human causal evidence systematically accessible, across diseases, across populations, across the full regulatory genome, owns the highest-value data layer in biology.

layer two: where the investment filter actually fires

the correct frame for where to build in AI biology comes from Melissa Du: ML is presently useful for reducing combinatorially large search spaces, given closed-loop signal from wet-lab verification. the question is never "can AI solve this biology problem?" it is three questions. is the search space large enough that AI compression matters? is the functional assay clear and fast enough to close the loop? can the model and the experiment be made to talk to each other in near-real time?

where all three fire simultaneously, the evidence is already there. antibody design: Chai-2 reaches a 16% de novo hit rate versus less than 0.1% previously, a 160x improvement, because the binding assay is clear, the combinatorial space is enormous, and the loop between model and experiment is fast. mRNA design: sequence optimisation for expression, stability, and immunogenicity, where Moderna-scale validation has confirmed the approach and the functional assay is unambiguous. epigenetic reprogramming: NewLimit is betting on transcription factor perturbations to predict and reverse cellular aging. the Yamanaka factors are proof of concept that specific combinations drawn from roughly 1,600 human transcription factors can reprogram cell state, and the search space is bounded with a clear phenotypic assay.
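the 160x figure is a ratio of hit rates. writing p for a de novo hit rate (notation introduced here), and since the prior rate is only given as an upper bound (less than 0.1%), the ratio is a floor rather than a point estimate:

```latex
\frac{p_{\text{Chai-2}}}{p_{\text{prior}}} > \frac{0.16}{0.001} = 160
\qquad \text{since } p_{\text{prior}} < 0.001
```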

Schrödinger's zasocitinib, the first AI-designed drug to clear Phase 3, with FDA submission in 2026, is the proof that the filter works at the molecule design layer too. physics-based simulation reduced a 180,000 CPU-hour problem to 4.5 hours, and clinical validation then closed the loop. the upside is not just faster iteration. it is a qualitatively different kind of science.
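for scale, the compute claim works out as below. this assumes the two figures are like-for-like: the text gives the first in CPU-hours and the second in hours, so if the 4.5 is wall-clock time the ratio mixes compute with elapsed time.

```latex
\frac{180\,000}{4.5} = 40\,000
```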

wherever the search space is large but the assay is unclear or slow (the virtual cell, systems disease modelling, multi-organ interaction), the filter does not fire. the capital is going there anyway. that is the hype gap.
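a minimal sketch of the three-question filter as a checklist, in Python. the `Opportunity` record, the function names, the "build"/"pass" labels and the yes/no scoring are all hypothetical illustrations, not anyone's published method; in practice each question is a graded judgment rather than a boolean. what it encodes is that the filter is a conjunction, so one failing answer stops it from firing, and that a large search space with an unclear or slow assay lands in the hype gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    """Hypothetical record: one AI-biology problem scored on the three questions."""

    name: str
    search_space_large: bool    # is the space large enough that AI compression matters?
    assay_clear_and_fast: bool  # is the functional assay clear and fast enough to close the loop?
    loop_near_real_time: bool   # can model and experiment talk to each other in near-real time?


def filter_fires(opp: Opportunity) -> bool:
    """The filter fires only when all three questions are answered yes."""
    return opp.search_space_large and opp.assay_clear_and_fast and opp.loop_near_real_time


def in_hype_gap(opp: Opportunity) -> bool:
    """Large search space but an unclear or slow assay: the filter cannot fire."""
    return opp.search_space_large and not opp.assay_clear_and_fast


def classify(opp: Opportunity) -> str:
    """Map an opportunity to a hypothetical label: build, hype gap, or pass."""
    if filter_fires(opp):
        return "build"
    if in_hype_gap(opp):
        return "hype gap"
    return "pass"


if __name__ == "__main__":
    # Booleans are illustrative readings of the examples in the text, not measured scores.
    examples = [
        Opportunity("antibody design", True, True, True),
        Opportunity("mRNA design", True, True, True),
        Opportunity("virtual cell", True, False, False),  # loop value assumed; assay is the blocker
    ]
    for opp in examples:
        print(f"{opp.name}: {classify(opp)}")
```

running it prints one line per example: antibody design and mRNA design come out as build, the virtual cell as hype gap. making the filter a strict conjunction is the design choice that matters; a weighted score would let an enormous search space paper over a missing assay, which is exactly the failure mode the hype gap describes.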

layer three: the regulatory evidentiary framework

this is the most underappreciated build opportunity in the entire space.

the FDA published its first draft guidance on using AI to support regulatory decision-making in January 2025, with final guidance expected in Q2 2026. the FDA and EMA released joint guiding principles in January 2026. the regulatory infrastructure is moving fast, but the evidentiary standard (what counts as proof when the system generating it was itself learning throughout the process) is still being written.

the framework is moving toward radical transparency and rigorous validation. black box systems will not clear NDA review. the audit trail standards do not yet exist. nobody has built the full stack: the statistical theory for continuously adaptive trial systems and the credibility assessment methodology that regulators will actually accept. whoever builds it will not just build a company. they will set the terms on which every AI-native drug program gets approved for the next thirty years.

layer four: workflow integration depth

the real moat is not algorithmic access. it is proprietary biological data generation and wet-lab validation capacity, and the ability to invest in both at scale. organisations that depend on public data have no competitive differentiation.

Anthropic acquired two ex-Genentech ML founders for $400M. the tell is the acquisition, not the model release. Recursion merged with Exscientia specifically to integrate phenomic screening with automated precision chemistry into an end-to-end platform. the moat is depth. the companies doing it well are not publishing. they are embedding.

the layer nobody is building

the entire current wave of AI for biology is being built on Western biobank infrastructure. Western regulatory relationships. Western hospital data. the genomic foundation models, the virtual cell initiatives, the drug discovery platforms: all are trained on data from populations of majority European ancestry.

as of 2021, 86.3% of GWAS participants were of European descent. South Asian participation was 0.8%, and that share has stagnated or decreased even as more studies are conducted overall. India, home to the largest genetically distinct population on earth, is almost entirely absent from that foundation.
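putting the two figures in this piece side by side (nearly 25% of the world's population, 0.8% South Asian participation in GWAS as of 2021) gives a rough representation ratio. the 25% is approximate, so read the result as an order of magnitude: South Asians appear in GWAS at about one-thirtieth of their population share.

```latex
\frac{\text{share of GWAS participants}}{\text{share of world population}}
\approx \frac{0.8\%}{25\%} = 0.032 \approx \frac{1}{31}
```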

the South Asian biobank does not exist at scale. the India-specific regulatory genome map has not been built. the clinical trial infrastructure connected to India's patient population is not wired into the global drug discovery stack.

the data is geographically and biologically specific. it cannot be generated remotely. it requires local relationships, local clinical infrastructure, local regulatory engagement. it compounds with every patient enrolled, every genome sequenced, every trial run.

the autonomous lab is being built in San Francisco. the virtual cell initiative is American. the genomic data infrastructure for the world's largest genetically distinct population, and the one most underserved by every drug designed against European genetics, is still unbuilt.

the model companies will commoditise. the data infrastructure companies will compound. and the most valuable data infrastructure in biology is the one that does not exist yet, built on the population that has been systematically absent from every previous wave.
