A genome is a set of DNA instructions that help all living things develop and function. The genome sequence varies slightly between individuals. In the case of humans, the genomes of two people are more than 99% identical. The small differences that remain contribute to the uniqueness of each person and offer, for example, information about health, helping to diagnose diseases and develop treatments.
To understand these genomic differences, scientists created a reference sequence of the human genome (called GRCh38) through digital splicing, to use as a “standard” when aligning, assembling and studying other human genomic sequences.
Despite its importance and continuous improvement, GRCh38 has limitations when it comes to representing the variation of the human species: it is built from the genomes of only about 20 people, and most of the reference sequence comes from just one of them.
The human pangenome offers a more complete and refined collection of genomic sequences that better reflect human diversity
Now, the Human Pangenome Reference Consortium (HPRC) is publishing in the journal Nature a new and sophisticated collection of sequences that improves on this “standard” genome and captures much greater diversity than was previously possible. This is the new reference pangenome.
So far, a first draft has been presented that includes the genomic sequences of 47 people from different parts of the world and of different ancestries (African, American, Asian and European), but the researchers plan to increase this number to 350 by mid-2024. Because humans carry chromosomes in pairs, the current reference includes 94 different genomic sequences, and the goal is to reach 700 when the project is complete.
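The sequence counts above follow from simple arithmetic: each person contributes two copies of the genome, one per chromosome set. A minimal sketch in Python, where the constant and function names are purely illustrative and do not come from the consortium:

```python
# Humans carry chromosomes in pairs, so each person contributes two sequences.
CHROMOSOME_SETS_PER_PERSON = 2


def genome_sequences(people: int) -> int:
    """Number of distinct genomic sequences contributed by a group of people."""
    return people * CHROMOSOME_SETS_PER_PERSON


draft_sequences = genome_sequences(47)    # first draft: 47 people
target_sequences = genome_sequences(350)  # planned: 350 people

print(draft_sequences, target_sequences)
```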
Over 100 million new “letters” in DNA
Compared with the reference human genome (GRCh38), the pangenome adds 119 million base pairs or “letters” of DNA and 1,115 gene duplications (mutations in which a region of DNA containing a gene is duplicated), and it increases the number of structural variants identified by 104%, which provides a more complete picture of the genetic diversity of the human genome.
“Since the first Human Genome Project was published, many projects have worked to improve its quality and complete this reference (for example, the recent Telomere-to-Telomere project, T2T). However, a single linear reference still does not correctly model the genomic diversity of our species, because many genomic variants are not common to all people,” Santiago Marco Sola of the Autonomous University of Barcelona, one of the authors of the work, explains to SINC.
Genomic sequences from 47 people of different ancestry have been used, but this number will increase to 350 by 2024.
“The proposed solution was to model a non-linear reference that contains the genomic variations in the population,” he explains, “taking into account the genomic diversity of our species. This is called a pangenome, and it uses a graph structure (or diagram) to model the genomic variation found in different individuals.”
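To make the idea of a graph-shaped reference concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not the consortium's software or data format: the class `SequenceGraph`, its methods, the node names and the short DNA segments are all hypothetical. Nodes hold stretches of DNA, and each genome is a path through the nodes, so shared stretches are stored once and differences appear as alternative branches.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy pangenome graph: nodes hold DNA segments, paths are individual genomes."""

    nodes: dict[str, str] = field(default_factory=dict)        # node id -> DNA segment
    edges: set[tuple[str, str]] = field(default_factory=set)   # allowed node-to-node steps
    paths: dict[str, list[str]] = field(default_factory=dict)  # genome name -> node ids

    def add_path(self, name: str, node_ids: list[str]) -> None:
        """Register one genome as a walk through the graph, adding edges as needed."""
        for node_id in node_ids:
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.edges.update(zip(node_ids, node_ids[1:]))
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        """Reconstruct the linear DNA sequence of one genome from its path."""
        return "".join(self.nodes[node_id] for node_id in self.paths[name])

    def variable_nodes(self) -> set[str]:
        """Nodes not shared by every genome, i.e. places where individuals differ."""
        walks = [set(path) for path in self.paths.values()]
        if not walks:
            return set()
        return set.union(*walks) - set.intersection(*walks)


# Hypothetical segments, invented for illustration only.
graph = SequenceGraph(
    nodes={"s1": "ACGT", "s2": "A", "s3": "G", "s4": "TTCA", "s5": "GATTACA"}
)
graph.add_path("linear_reference", ["s1", "s2", "s4"])
graph.add_path("genome_A", ["s1", "s3", "s4"])        # single-letter difference
graph.add_path("genome_B", ["s1", "s2", "s5", "s4"])  # extra inserted segment

for name in graph.paths:
    print(name, graph.spell(name))
print(sorted(graph.variable_nodes()))
```

In this sketch a single linear reference would correspond to keeping only the `linear_reference` path, whereas the graph keeps all three genomes and lets an analysis ask which nodes vary between them.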
Help from supercomputers
Marco Sola, who is also involved with the Barcelona Supercomputing Center (BSC-CNS), notes that this project would not be possible without supercomputers: “If building a linear genomic reference (such as GRCh38) requires hundreds of billions of DNA alignments and assemblies, a pangenomic reference requires several orders of magnitude more information to be processed.”
The pangenome uses a graph structure to model the genomic variation found in different individuals of our species.
Santiago Marco Sola
The methods were developed on the BSC-CNS MareNostrum 4 supercomputer and later incorporated into the Pangenome project, although the computation and processing of the final results presented were performed on another international supercomputing infrastructure.
Applications in biomedicine and health
“The human pangenome reference will allow us to identify tens of thousands of new genomic variants in previously inaccessible regions of the genome,” says co-author and researcher Wen-Wei Liao of Yale University (USA), “and with it, we can accelerate clinical research by improving understanding of the relationship between genes and disease characteristics.”
“Everyone has a unique genome, so using a single reference genomic sequence for every person can bias genomic analyses,” said co-author Adam Phillippy of the National Human Genome Research Institute (NHGRI, part of the US National Institutes of Health), which leads this project, “and, for example, genetic disease prediction may not work for people whose genomes are more divergent from the reference genome.”
Hence the importance of the new pangenome. “Basic researchers and clinicians using genomics need access to reference sequences that reflect the amazing diversity of the human population. This will help make the reference useful for all people, helping to reduce the chances of widespread health disparities,” said Eric Green, director of NHGRI.
Basic researchers and clinicians using genomics need access to reference sequences that reflect the diversity of the human population.
– Eric Green (Director of NHGRI)
“Creating and improving the human pangenome reference aligns with our institute’s goal of striving for global diversity in all aspects of genomics research, which is critical to advancing genomic knowledge and the effective and fair implementation of genomic medicine,” he adds.
Ethics in the project
In line with these efforts, the Human Pangenome Reference Consortium includes an ethics group that seeks to anticipate challenges that may arise, guide participants’ informed consent, prioritize the study of diverse samples, and examine regulatory issues around adoption, working with both international experts and local communities to include their genomic sequences.
The work of this international consortium has a budget of approximately $40 million over five years, which covers efforts to create a human pangenome reference, improve DNA sequencing technology, operate a coordinating center, carry out outreach activities, and generate resources so that the scientific community can use this new reference.
In fact, along with the main article in Nature, two more have been published with results from the Human Pangenome Project: one maps millions of previously unknown single nucleotide variants (SNVs, single-letter differences in DNA), and the other studies recombination patterns between the short arms of certain chromosomes.
The authors are confident that more research and advances in personalized medicine will soon emerge thanks to new access to the human pangenome.
Source: El Diario