Health Technology

New AI models possible game-changers within protein science and healthcare

Researchers have developed new AI models that can substantially improve accuracy and discovery in protein science. The models could help the medical sciences overcome current challenges in areas such as personalised medicine, drug discovery, and diagnostics.

New AI models improve therapeutic sequencing, discover novel peptides, detect unreported organisms, and significantly enhance proteomics searches. AI-generated illustration: InstaDeep.

Facts

InstaNovo is a transformer-based model designed for de novo peptide sequencing. Developed in collaboration between InstaDeep and DTU Bioengineering, it translates fragment ion peaks from mass spectrometry data into peptide sequences with unprecedented precision.
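
To make "fragment ion peaks" concrete, the following minimal sketch runs the relationship in the forward direction: given a known peptide, it computes where the singly charged b- and y-ion peaks would fall. De novo sequencing has to solve the reverse problem. This is not InstaNovo code; the function name `fragment_ions` and the example peptide are illustrative assumptions, and only a subset of standard monoisotopic residue masses is included.

```python
# Monoisotopic residue masses in daltons (standard values, subset only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056   # mass of H2O
PROTON = 1.00728   # mass of the charge-carrying proton


def fragment_ions(peptide):
    """Return singly charged b- and y-ion m/z values for a peptide.

    Hypothetical helper for illustration; ignores modifications,
    higher charge states, and neutral losses.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b-ion: N-terminal fragment containing the first i residues.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y-ion: C-terminal fragment containing the remaining residues.
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("PEPTAK")  # example peptide chosen for illustration
```

The gap between two neighbouring b-ion peaks equals the mass of one residue, which is the signal a sequencing model can exploit when reading a spectrum.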

Unlike traditional methods that rely on pre-existing databases, InstaNovo identifies peptides that have never been documented before—expanding the landscape of proteomic discovery.

A key innovation of the InstaNovo models is InstaNovo+, a diffusion-based iterative refinement model that enhances sequence accuracy by mimicking how researchers manually refine peptide predictions. InstaNovo+ begins with an initial sequence—either derived from InstaNovo or generated at random—and improves it, step by step.
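
The idea of starting from an initial guess and improving it step by step can be sketched with a toy example. Note the substitution: the code below uses a simple greedy local search with a peak-matching score, not a learned diffusion model, so it illustrates only the refinement loop and not how InstaNovo+ works internally. All names (`prefix_masses`, `score`, `refine`) and the five-letter alphabet are assumptions made for this sketch.

```python
import random

# Monoisotopic residue masses in daltons for a small illustrative alphabet.
MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841}


def prefix_masses(seq):
    """Cumulative residue masses, a simplified stand-in for b-ion peak positions."""
    total, out = 0.0, []
    for aa in seq:
        total += MASS[aa]
        out.append(total)
    return out


def score(seq, observed, tol=0.01):
    """Count predicted peaks that match some observed peak within tol."""
    return sum(any(abs(p - o) <= tol for o in observed) for p in prefix_masses(seq))


def refine(initial, observed, steps=500, seed=0):
    """Greedy refinement: propose single-residue edits and keep any
    edit that does not lower the score (toy substitute for a denoiser)."""
    rng = random.Random(seed)
    current = list(initial)
    best = score(current, observed)
    for _ in range(steps):
        candidate = current[:]
        candidate[rng.randrange(len(candidate))] = rng.choice(list(MASS))
        s = score(candidate, observed)
        if s >= best:
            current, best = candidate, s
    return "".join(current), best


observed = prefix_masses("GASPV")            # synthetic "spectrum" for a known peptide
refined, matched = refine("VVVVV", observed)  # start from a deliberately poor guess
```

In this toy the score can only stay level or rise from one step to the next; whether the search reaches the true sequence depends on the step budget and the random seed.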

When paired with InstaNovo, InstaNovo+ significantly reduces false discovery rates (FDR) and improves sequence accuracy, not just by refining predictions, but by exploring a broader range of potential peptide sequences.
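
For readers unfamiliar with the term, the false discovery rate is the share of accepted predictions that turn out to be wrong. A minimal sketch, assuming each prediction carries a confidence score and a known correct/incorrect label; the function name `fdr_at_threshold` and the data are made up for illustration and are not results from the paper.

```python
def fdr_at_threshold(predictions, threshold):
    """Fraction of accepted predictions that are incorrect.

    predictions: list of (confidence, is_correct) pairs.
    A prediction is accepted when its confidence is >= threshold.
    """
    accepted = [ok for conf, ok in predictions if conf >= threshold]
    if not accepted:
        return 0.0  # nothing accepted, so no false discoveries
    return sum(not ok for ok in accepted) / len(accepted)


# Hypothetical predictions: (confidence, is_correct)
preds = [(0.95, True), (0.90, True), (0.80, False), (0.60, True), (0.40, False)]

loose = fdr_at_threshold(preds, 0.5)    # four accepted, one of them wrong
strict = fdr_at_threshold(preds, 0.85)  # two accepted, both correct
```

Raising the threshold lowers the FDR here but also accepts fewer peptides, which is the trade-off between precision and exploration.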

Unlike autoregressive models such as InstaNovo, which predict peptide sequences one amino acid at a time, InstaNovo+ processes the entire sequence at once, enabling greater accuracy and higher detection rates.

Together, InstaNovo and InstaNovo+ enhance de novo peptide sequencing, striking a balance between precision and exploration to accelerate biological discovery.

Source: InstaDeep.

See also the scientific paper in Nature Machine Intelligence:

InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments