How do you study a group of organisms with over 300,000 species, dispersed across all seven continents, and with up to 50 times as much DNA content as the human genome?
This is the question posed to biologists studying the evolutionary history of flowering plants, called angiosperms, whose rapid diversification was so convoluted a problem that Darwin referred to it as the ‘abominable mystery.’
This month, both the American Journal of Botany (AJB) and Applications in Plant Sciences (APPS) are devoting their July issues to what has recently become a turning point in the way scientists study the relationships among flowering plants. Dubbed Angiosperms353, the initiative combines new and innovative DNA sequencing techniques with genetic information from 1KP, a massive data resource with DNA from more than 1,000 species that took an international team over a decade to complete.
“Using these gene sequences as a common tool opens up new questions that could not have been looked at before,” said Dr. Matthew Johnson, assistant professor and herbarium director at Texas Tech University and one of the original architects of Angiosperms353.
The Greater Phylogenetic Good
Until now, geneticists have had to choose between two options when designing a study: either obtain small amounts of DNA for a large number of organisms or the reverse.
After DNA sequencing was first developed in the mid-1970s, scientists primarily went with the first option. They began stitching together the tree of life by comparing genetic sequences shared widely among species. Sanger sequencing, named after its inventor, Frederick Sanger, was used to assemble trees by examining just a small number of genes, somewhat like trying to understand a country by only visiting its capital.
With the advent of next-generation sequencing at the turn of the century, some researchers began specializing in the opposite approach, meticulously assembling a single organism’s entire genetic code. The first test case, the Human Genome Project, was completed in 2003, spurring the new age of genomics.
Today, next-generation sequencing has largely replaced older methods in most labs. However, costs remain prohibitively high for many researchers. And while knowing the genetic code of an organism’s entire genome comes in handy when trying to answer specific questions, such as how proteins and cells function at a molecular level, comparing genomes is an inefficient way of piecing together relationships.
To overcome these challenges, researchers have adopted a technique called target sequence capture, which leverages the advantages of next-generation sequencing while focusing on defined sets of hundreds of genes. This method of retrieving DNA has boomed in popularity in the past few years, allowing scientists who are filling in the branches and leaves of the tree of life to probe both deeply and widely within and between species.
But target sequence capture still has one major drawback in that, unlike its Sanger counterpart, there hasn’t yet been a widely standardized set of sequences with which to compare across multiple studies and to build upon their results. Every time a researcher wants to analyze evolutionary patterns in a group of organisms, they have to design new probes to extract genetic information.
“These increasingly popular genomic methods allow scientists to fish out hundreds of genes; however, the probes needed to do this are expensive and complex to design, and usually only work for a narrowly defined group,” said Dr. William Baker, a Senior Research Leader at the Royal Botanic Gardens, Kew, and a lead guest editor for the AJB special issue.
This limitation has hampered the development of large studies on the evolutionary history of plants, but scientists identified the issue early on and have worked diligently over the past decade to overcome it. Starting in 2019 with the release of two combined probe sets (Angiosperms353 for flowering plants and GoFlag for groups including ferns and mosses), they’re now starting to reap the rewards of their labor.
“Angiosperms353 targets a standardized set of genes, which means published data can be re-used and synthesized across studies for the ‘greater phylogenetic good,’” Baker said.
The Splash Zone
Plant biologists haven’t wasted any time in putting the Angiosperms353 probes to use. The 20 studies published in these special issues span the breadth of angiosperm diversity, encompassing over 500 genera and several times as many species. And because of the broad utility of the probes, each study also zooms in on a particular group at different magnifications.
Many of the genetic sequences the probes correspond to have been relatively stable throughout the 140-million-year history of flowering plants. These DNA strands accumulate mutations at a glacial pace and are thus useful in constructing the main branches of the angiosperm tree of life.
Other sequences mutate at a much faster clip, to the extent that no two are alike in any given species. And while most of the probes correspond to DNA actively used by cells to create proteins, they also adhere to small portions of DNA that flank either end of a protein-coding strand, regions informally referred to as ‘the splash zone.’
These flanking regions don’t actively code for proteins; in fact, scientists are still unsure exactly what they do. What they do know is that this non-coding DNA mutates quickly, similar to the types of genetic markers used for forensic testing in crime labs. In plants, these regions can be used to illuminate relationships among closely related species or to reveal patterns of genetic diversity among individuals, filling in the small twigs and leaves on the tree of life and providing an important roadmap for conservation efforts.
Past, Present, and Future
Sequence capture also has an important advantage over previous techniques in that it can be reliably used to retrieve old DNA. This feature is extremely important in a field where some estimates suggest the majority of the 70,000 or so plant species yet to be discovered have already been collected and stored in herbaria. Some species, such as Miconia abscondita, were only discovered through genetic analysis of herbarium tissue after they’d gone extinct in the wild. And analyses of plant communities from ages past have been used in multiple cases to study how plants are responding to climate change.
The studies in these issues offer a glimpse into the future of plant phylogenetics, one in which researchers can obtain immense quantities of data in a fraction of the time it would have taken them just 20 years ago. For Baker, who will be publishing Angiosperms353 data for over 7,000 flowering plant genera later this year, that future looks bright. In concert with the Royal Botanic Gardens, Kew, he and several colleagues have been using the new probe set to construct the plant tree of life through the PAFTOL project. He’s also helped launch a free repository called the Kew Tree of Life Explorer to store and distribute the growing amounts of genetic data from researchers around the world who are using the probes.
“The standardization of these targeted genes will pay dividends for decades to come, as we inch towards our collective goal of a complete tree of life for all species,” Baker said.
ARTICLES INCLUDED IN THESE ISSUES
Antonelli, A., J. J. Clarkson, K. Kainulainen, O. Maurin, G. E. Brewer, A. P. Davis, N. Epitawalage, et al. 2021. Settling a family feud: a high-level phylogenomic framework for the Gentianales based on 353 nuclear genes and partial plastomes. American Journal of Botany 108(7): 1142-1164.
Baker, W. J., S. Dodsworth, F. Forest, S. W. Graham, M. G. Johnson, A. McDonnell, L. Pokorny, et al. 2021b. Exploring Angiosperms353: an open, community toolkit for collaborative phylogenomic research on flowering plants. American Journal of Botany 108(7): 1058-1064.
Buerki, S., M. W. Callmander, P. Acevedo-Rodriguez, P. P. Lowry II, J. Munzinger, P. Bailey, O. Maurin, et al. 2021. An updated infra-familial classification of Sapindaceae based on targeted enrichment data. American Journal of Botany 108(7): 1233-1250.
Clarkson, J. J., A. R. Zuntini, O. Maurin, S. R. Downie, G. M. Plunkett, A. N. Nicolas, J. F. Smith, et al. 2021. A higher-level nuclear phylogenomic study of the carrot family (Apiaceae). American Journal of Botany 108(7): 1251-1268.
Eserman, L. A., S. K. Thomas, E. E. D. Coffey, and J. H. Leebens-Mack. 2021. Target sequence capture in orchids: developing a kit to sequence hundreds of single-copy loci. Applications in Plant Sciences 9(7): e11416.
Hendriks, K. P., T. Mandáková, N. M. Hay, E. Ly, A. Hooft van Huysduynen, R. Tamrakar, S. K. Thomas, et al. 2021. The best of both worlds: combining lineage-specific and universal bait sets in target-enrichment hybridization reactions. Applications in Plant Sciences 9(7): e11438.
Lee, A. K., I. S. Gilman, M. Srivastav, A. D. Lerner, M. J. Donoghue, and W. L. Clement. 2021. Reconstructing Dipsacales phylogeny using Angiosperms353: issues and insights. American Journal of Botany 108(7): 1121-1141.
Maurin, O., A. Anest, S. Bellot, E. Biffin, G. Brewer, T. Charles-Dominique, R. S. Cowan, et al. 2021. A nuclear phylogenomic study of the angiosperm order Myrtales, exploring the potential and limitations of the universal Angiosperms353 probe set. American Journal of Botany 108(7): 1086-1110.
McDonnell, A. J., W. J. Baker, S. Dodsworth, F. Forest, S. W. Graham, M. G. Johnson, L. Pokorny, J. Tate, S. Wicke, and N. J. Wickett. 2021. Exploring Angiosperms353: developing and applying a universal toolkit for flowering plant phylogenomics. Applications in Plant Sciences 9(7): e11443.
McLay, T. G. B., J. L. Birch, B. F. Gunn, W. Ning, J. A. Tate, L. Nauheimer, E. M. Joyce, et al. 2021. New targets acquired: improving locus recovery from the Angiosperms353 probe set. Applications in Plant Sciences 9(7): e11420.
Nauheimer, L., N. Weigner, E. Joyce, D. Crayn, C. Clarke, and K. Nargar. 2021. HybPhaser: a workflow for the detection and phasing of hybrids in target capture data sets. Applications in Plant Sciences 9(7): e11441.
Ottenlips, M. V., D. H. Mansfield, S. Buerki, M. A. E. Feist, S. R. Downie, S. Dodsworth, F. Forest, et al. 2021. Resolving species boundaries in a recent radiation with the Angiosperms353 probe set: The Lomatium packardiae/L. anomalum clade of the L. triternatum (Apiaceae) complex. American Journal of Botany 108(7): 1216-1232.
Pérez-Escobar, Ó. A., S. Dodsworth, D. Bogarín, S. Bellot, J. A. Balbuena, R. J. Schley, I. A. Kikuchi, et al. 2021. Hundreds of nuclear and plastid loci yield novel insights into orchid relationships. American Journal of Botany 108(7): 1165-1179.
Pillon, Y., H. C. F. Hopkins, O. Maurin, N. Epitawalage, J. Bradford, Z. S. Rogers, W. J. Baker, and F. Forest. 2021. Phylogenomics and biogeography of Cunoniaceae (Oxalidales) with complete generic sampling and taxonomic realignments. American Journal of Botany 108(7): 1180-1199.
Shah, T., J. V. Schneider, G. Zizka, O. Maurin, W. Baker, F. Forest, G. E. Brewer, V. Savolainen, et al. 2021. Joining forces in Ochnaceae phylogenomics: A tale of two targeted sequencing probe kits. American Journal of Botany 108(7): 1200-1215.
Siniscalchi, C. M., O. Hidalgo, L. Palazzesi, J. Pellicer, L. Pokorny, O. Maurin, I. J. Leitch, et al. 2021. Lineage-specific vs. universal: a comparison of the Compositae1061 and Angiosperms353 enrichment panels in the sunflower family. Applications in Plant Sciences 9(7): e11422.
Slimp, M., L. D. Williams, H. Hale, and M. G. Johnson. 2021. On the potential of Angiosperms353 for population genomic studies. Applications in Plant Sciences 9(7): e11419.
Thomas, A. E., J. Igea, H. M. Meudt, D. C. Albach, W. G. Lee, and A. J. Tanentzap. 2021. Using target sequence capture to improve the phylogenetic resolution of a rapid radiation in New Zealand Veronica. American Journal of Botany 108(7): 1288-1305.
Thomas, S. K., X. Liu, Z.-Y. Du, Y. Dong, A. Cummings, L. Pokorny, Q.-Y. Xiang, and J. H. Leebens-Mack. 2021. Comprehending Cornales: phylogenetic reconstruction of the order using the Angiosperms353 probe set. American Journal of Botany 108(7): 1111-1120.
Ufimov, R., V. Zeisek, S. Píšová, W. J. Baker, T. Fér, M. van Loo, C. Dobeš, and R. Schmickl. 2021. Relative performance of customized and universal probe sets in target enrichment: A case study in subtribe Malinae. Applications in Plant Sciences 9(7): e11442.
Wenzell, K. E., A. J. McDonnell, N. J. Wickett, J. B. Fant, and K. A. Skogen. 2021. Incomplete reproductive isolation and low genetic differentiation despite floral divergence across varying geographic scales in Castilleja. American Journal of Botany 108(7): 1269-1287.
Zuntini, A. R., L. P. Frankel, L. Pokorny, F. Forest, and W. J. Baker. 2021. A comprehensive phylogenomic study of the monocot order Commelinales, with a new classification of Commelinaceae. American Journal of Botany 108(7): 1065-1085.