Researchers from the University of Oxford’s Big Data Institute have taken a major step towards mapping the entirety of genetic relationships among humans: a single genealogy that traces the ancestry of all of us. The study has been published today in Science.
- New genealogical network of human genetic diversity reveals how individuals across the world are related to each other, in unprecedented detail
- The research predicts common ancestors, including approximately when and where they lived
- The analysis recovers key events in human evolutionary history, including the migration out of Africa
- The underlying method could have widespread applications in medical research, for instance identifying genetic predictors of disease risk
Human genetic research has advanced rapidly over the last two decades, producing genomic data for hundreds of thousands of people, including thousands of ancient people. This raises the possibility of tracing the origins of human genetic variation to build an accurate map of how people across the globe are related to one another.
Until now, the main obstacles to this goal have been finding a way to combine genomic sequences from many different databases and developing algorithms that can handle data on this scale. Researchers at the University of Oxford’s Big Data Institute have developed a new method that readily combines data from multiple sources and scales to millions of genomic sequences.
Dr. Yan Wong, an evolutionary geneticist at the Big Data Institute, and one of the principal authors, explained: “We have basically built a huge family tree, a genealogy for all of humanity that models as exactly as we can the history that generated all the genetic variation we find in humans today. This genealogy allows us to see how every person’s genetic sequence relates to every other, along all the points of the genome.”
Since individual genomic regions are only inherited from one parent, either the mother or the father, the ancestry of each point on the genome can be thought of as a tree. The set of trees, known as a “tree sequence” or “ancestral recombination graph,” links genetic regions back through time to ancestors where the genetic variation first appeared.
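To make the idea of a tree sequence concrete, here is a minimal Python sketch. It is an illustration only, not the data structure or software used in the study: the interval boundaries, the node names, and the helper functions `local_tree`, `lineage` and `mrca` are all hypothetical.

```python
# Toy "tree sequence": each genomic interval [left, right) has its own tree,
# stored as a child -> parent map. All names and coordinates are invented.
TREES = [
    # In the first 500 positions, A and B are each other's closest relatives.
    (0, 500, {"A": "n1", "B": "n1", "n1": "root", "C": "root"}),
    # Past a recombination breakpoint, B instead groups with C.
    (500, 1000, {"B": "n2", "C": "n2", "n2": "root", "A": "root"}),
]


def local_tree(position):
    """Return the child -> parent map covering a genomic position."""
    for left, right, parents in TREES:
        if left <= position < right:
            return parents
    raise ValueError(f"position {position} is outside the toy genome")


def lineage(parents, node):
    """List a node and all of its ancestors, from the node up to the root."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def mrca(position, a, b):
    """Most recent common ancestor of samples a and b at one position."""
    parents = local_tree(position)
    ancestors_of_a = set(lineage(parents, a))
    for node in lineage(parents, b):
        if node in ancestors_of_a:
            return node
    return None


print(mrca(100, "A", "B"))  # shared ancestor in the first tree
print(mrca(700, "A", "B"))  # a different shared ancestor in the second tree
```

The same pair of samples can have different closest ancestors at different positions, which is why a single family tree is not enough and a sequence of trees along the genome is needed.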
Lead author Dr. Anthony Wilder Wohns, who undertook the research as part of his PhD at the Big Data Institute and is now a postdoctoral researcher at the Broad Institute of MIT and Harvard, said: “Essentially, we are reconstructing the genomes of our ancestors and using them to form a vast network of relationships. We can then estimate when and where these ancestors lived. The power of our approach is that it makes very few assumptions about the underlying data and can also include both modern and ancient DNA samples.”
The study integrated data on modern and ancient human genomes from eight different databases and included a total of 3,609 individual genome sequences from 215 populations. The ancient genomes included samples found across the world, ranging in age from a few thousand years to more than 100,000 years. The algorithms predicted where common ancestors must be present in the evolutionary trees to explain the patterns of genetic variation. The resulting network contained almost 27 million ancestors.
After adding location data on these sample genomes, the authors used the network to estimate where the predicted common ancestors had lived. The results successfully recaptured key events in human evolutionary history, including the migration out of Africa.
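As a rough illustration of how sample locations might be propagated up a genealogy, the sketch below places each ancestor at the average position of its children. This is a simplifying assumption made for illustration, not necessarily the estimator used by the authors, and the node names, coordinates and the function `estimate_location` are invented. Averaging latitude and longitude directly also ignores the curvature of the Earth.

```python
# Hypothetical genealogy: parent -> list of children. Leaves are sampled genomes.
CHILDREN = {
    "anc1": ["s1", "s2"],
    "anc2": ["anc1", "s3"],
}

# Invented sample coordinates as (latitude, longitude) in degrees.
SAMPLE_LOCATIONS = {
    "s1": (10.0, 20.0),
    "s2": (30.0, 40.0),
    "s3": (0.0, 0.0),
}


def estimate_location(node):
    """Place a node at the mean coordinates of its children (assumed rule)."""
    if node in SAMPLE_LOCATIONS:
        return SAMPLE_LOCATIONS[node]
    child_locations = [estimate_location(child) for child in CHILDREN[node]]
    latitude = sum(loc[0] for loc in child_locations) / len(child_locations)
    longitude = sum(loc[1] for loc in child_locations) / len(child_locations)
    return (latitude, longitude)


for ancestor in ("anc1", "anc2"):
    print(ancestor, estimate_location(ancestor))
```

Because each estimate depends only on the nodes below it, the calculation works from the sampled genomes upwards through successively older ancestors.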
Although the genealogy is already a very comprehensive resource, the study team plans to keep adding genetic data as it becomes available. Because tree sequences store data very efficiently, the dataset could readily accommodate millions of additional genomes.
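One intuition for this efficiency is that a variant can be recorded once, on the branch of the tree where it arose, instead of once for every genome that carries it. The Python sketch below illustrates that idea with an invented tree and invented site names. It is not the study's file format, and `carries` and `genotypes` are hypothetical helpers.

```python
# Toy tree for one genomic region, stored as a child -> parent map.
PARENT = {"s1": "n1", "s2": "n1", "n1": "root", "s3": "root", "s4": "root"}
SAMPLES = ["s1", "s2", "s3", "s4"]

# Each variant site stores only the node above which the mutation arose.
MUTATION_NODE = {"site_a": "n1", "site_b": "s3"}


def carries(sample, mutated_node):
    """True if the mutated node lies on the path from the sample to the root."""
    node = sample
    while node is not None:
        if node == mutated_node:
            return True
        node = PARENT.get(node)
    return False


def genotypes(site):
    """Rebuild the per-sample genotypes (1 = carries the variant) for a site."""
    return [int(carries(sample, MUTATION_NODE[site])) for sample in SAMPLES]


for site in MUTATION_NODE:
    print(site, genotypes(site))
```

In this toy example each site needs one stored node however many samples descend from it, whereas a plain genotype table would need one entry per sample per site.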
Dr. Wong said: “This study is laying the groundwork for the next generation of DNA sequencing. As the quality of genome sequences from modern and ancient DNA samples improves, the trees will become even more accurate and we will eventually be able to generate a single, unified map that explains the descent of all the human genetic variation we see today.”
Dr. Wohns added: “While humans are the focus of this study, the method is valid for most living things; from orangutans to bacteria. It could be particularly beneficial in medical genetics, in separating out true associations between genetic regions and diseases from spurious connections arising from our shared ancestral history.”
Reference: “A unified genealogy of modern and ancient genomes” by Anthony Wilder Wohns, Yan Wong, Ben Jeffery, Ali Akbari, Swapan Mallick, Ron Pinhasi, Nick Patterson, David Reich, Jerome Kelleher and Gil McVean, 24 February 2022, Science.