Monday, October 27, 2025

Nature Methods Paper Leverages PacBio Sequencing Technology to Develop the Platinum Pedigree Benchmark, a New Standard for Accurate Characterization of Variation in the Human Genome that Improves Training for AI Models

PacBio

The most comprehensive, family-based variant dataset ever published will improve variant classification using AI-based tools

MENLO PARK, Calif., Aug. 04, 2025 (GLOBE NEWSWIRE) -- PacBio (NASDAQ: PACB), a leading provider of high-quality, highly accurate sequencing platforms, today announced the results of a study published in Nature Methods describing a new, comprehensive truth set of genomic variation that characterizes both simple and complex variants. These improved benchmarks were used to retrain Google’s DeepVariant, a popular AI-based variant calling tool, reducing erroneously called variants by up to 34%. This resource, the Platinum Pedigree, was built by scientists from PacBio in collaboration with researchers at the University of Washington, the University of Utah, and several other institutions.

Combining inheritance-based validation with long-read sequencing, this benchmark accurately characterizes variants, even in difficult, repeat-rich regions of the genome, producing the most complete view of validated genetic variation to date.

“Comprehensive benchmarking datasets that include all variant types are foundational to progress in genomics methods development and the application of AI-driven tools, as well as to our understanding of genomic variation for both research and diagnostic purposes,” said Zev Kronenberg, lead author and Senior Manager at PacBio. “The Platinum Pedigree benchmark doesn’t just include simple variants in easy-to-sequence regions, it includes variants from across the entire genome, including regions that were previously excluded from benchmarks due to their complex nature.”

The Platinum Pedigree dataset was developed using deep sequencing from three sequencing platforms across a 28-member, multi-generational family (CEPH-1463). By tracking the inheritance of genetic variants from parents to multiple children, the study confidently catalogs over 37 Mb of genetic variation segregating within the family, from single nucleotide variants to large structural variants.
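The idea of inheritance-based validation can be illustrated with a minimal sketch. This is not the study's pipeline, which is not described in this release; it is a simplified, genotype-level Mendelian consistency check for a single site, assuming unphased diploid genotypes encoded as allele pairs (0 for reference, 1 for alternate). The function names `mendelian_consistent` and `variant_passes` are hypothetical and chosen for illustration only.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """Return True if the child's unphased genotype can be formed by
    taking exactly one allele from each parent.

    Genotypes are 2-tuples of allele codes, e.g. (0, 1) for heterozygous.
    Assumption: diploid, autosomal site with no de novo mutation.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f_allele, m_allele))) == child_sorted
        for f_allele, m_allele in product(father, mother)
    )


def variant_passes(father, mother, children):
    """A site passes only if every child's genotype is consistent
    with the parents' genotypes."""
    return all(mendelian_consistent(father, mother, c) for c in children)


if __name__ == "__main__":
    # Two heterozygous parents can produce any of the three genotypes.
    print(variant_passes((0, 1), (0, 1), [(0, 0), (0, 1), (1, 1)]))
    # Two homozygous-reference parents cannot produce a heterozygous child,
    # so this call is flagged as inconsistent (a likely error).
    print(variant_passes((0, 0), (0, 0), [(0, 1)]))
```

With more children in a family, an erroneous call is less likely to look consistent by chance, which is one reason a large pedigree is useful for this kind of validation.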

The dataset introduces the first large pedigree-validated tandem repeat and structural variant truth sets. It also adds more than 200 million bases, extending the benchmark regions to 2.77 Gb, including difficult-to-map areas such as segmental duplications and low-complexity regions.

A Benchmark Built for the Dark Genome

To demonstrate how better benchmarks improve AI and ML methods, the researchers retrained Google’s DeepVariant, a popular software tool that employs deep learning to identify genetic variants, using the Platinum Pedigree benchmark data. The updated DeepVariant model reduced errors by up to 34% genome-wide, with even larger gains in the most challenging regions of the genome.
