Methods

The flowering plant tree of life presented in the Kew Tree of Life Explorer is based on data generated by target sequence capture using the universal Angiosperms353 probe set, augmented with data retrieved from public sources.

Our Approach

Target sequence capture uses RNA probes to enrich sequencing libraries for targeted orthologous genes (Dodsworth et al., 2019; Faircloth et al., 2012; Lemmon et al., 2012). In effect, the RNA probes “fish out” a specific set of genes of interest. Angiosperms353 (Johnson et al., 2019) is a universal probe set that targets 353 single-copy loci, derived from a broader set of orthologous loci identified by the One Thousand Plants (OneKP) project (Leebens-Mack et al., 2019). The Angiosperms353 probe set recovers a median of 161 kbp of coding region per sample across the diversity of angiosperms (Baker et al., 2022). The probe set works efficiently with poor-quality DNA, including DNA extracted from herbarium material (Brewer et al., 2019). The Angiosperms353 probe set and loci have been explored extensively in special issues of American Journal of Botany and Applications in Plant Sciences, and in other scientific papers.

The full workflow for the generation of data and computation of trees is described in detail by Baker et al. (2022), with updates documented in release notes available via our secure FTP (SFTP) site. A summary of these methods is given below.

Sampling

Our goal is to sample at least one species of each of the ca. 13,600 genera of flowering plants. To guide the selection of samples, we follow the global list of plant names in the World Checklist of Vascular Plants (Govaerts et al., 2021), available via Plants of the World Online.

DNA was obtained from a wide variety of sources, including PAFTOL’s network of collaborators and partners. In most cases, samples were sourced from the collections of the Royal Botanic Gardens, Kew (Herbarium, DNA Bank, Tissue Bank, Living Collections and Millennium Seed Bank). To be included, the sample must have been 1) legally sourced and available for use in phylogenomic studies, 2) verified by a relevant expert, and 3) ideally collected in the wild. The identity of samples was substantiated by voucher specimens deposited in herbaria, as far as was practically achievable. All voucher specimen information, including links to specimen images where available, is provided in the Tree of Life Explorer.

DNA Extraction

DNA was extracted from 20 mg of silica-dried material (Chase & Hills, 1991) or 40 mg of herbarium material using a modified CTAB extraction method (Doyle & Doyle, 1987), followed by magnetic bead clean-up.

Target Sequence Capture

Libraries were prepared using a standard library preparation kit. Depending on DNA quality, total DNA was sonicated before library preparation, aiming for fragment sizes between 350 and 450 bp. After pooling, libraries were hybridised with the Angiosperms353 probe kit following the manufacturer’s protocol. DNA sequencing was performed using Illumina MiSeq, HiSeq and NovaSeq platforms.

Gene Recovery

Up to 353 genes were recovered from each sample by searching for matching orthologous sequences from the Angiosperms353 target gene set (Johnson et al., 2019). An in-house pipeline was used to assemble Illumina reads from target sequence capture data and then identify gene orthologues corresponding to the Angiosperms353 target gene set. Illumina reads of the PAFTOL data are available from the European Nucleotide Archive (ENA) under project number PRJEB35285. Recovered gene sequences for each sample are presented gene-wise or species-wise and are available via this site and RBG Kew's secure FTP site, together with the gene alignments and trees. All samples were validated using both DNA barcode and phylogenetic tests to verify the accuracy of their family identifications.
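As an illustration of how the raw reads might be located, the sketch below builds a query URL for the ENA Portal API file report of project PRJEB35285. It is a minimal example and not part of the PAFTOL pipeline: the helper name build_filereport_url and the choice of returned fields are assumptions made here for illustration, and the code only constructs the URL without contacting ENA.

```python
from urllib.parse import urlencode

# Base endpoint of the ENA Portal API file report service.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(project, fields=("run_accession", "sample_accession", "fastq_ftp")):
    """Return a URL that lists the read runs registered under an ENA project.

    `project` is a study accession such as PRJEB35285. The default `fields`
    are an illustrative selection, not a list prescribed by the source.
    """
    params = {
        "accession": project,
        "result": "read_run",        # one row per sequencing run
        "fields": ",".join(fields),  # columns to include in the report
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


# Build (but do not fetch) the report URL for the PAFTOL project.
url = build_filereport_url("PRJEB35285")
print(url)
```

Opening the resulting URL in a browser, or passing it to a download tool, should return a tab-separated table with one row per run, from which the FASTQ locations can be read.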

Public data retrieval

To augment data generated by the PAFTOL project, we retrieved the Angiosperms353 loci from publicly available data and from data shared with us by our collaborators. The main sources of data were:

  • Genomics for Australian Plants (GAP)
  • The One Thousand Plant (OneKP) Transcriptomes Initiative (Leebens-Mack et al., 2019)
  • The Sequence Read Archive (SRA)
  • Published annotated and unannotated genomes.

Accession numbers for public data are provided.

Tree of Life Reconstruction

Sequences for the same gene from each sample were aligned with UPP (Nguyen et al., 2015) and gene trees were built with FastTree (Price et al., 2010). The navigable Tree of Life presented here was estimated from these gene trees with ASTRAL-III (Zhang et al., 2018) and rooted with a set of gymnosperm species. The tree is annotated with local posterior probabilities (LPP) as indicators of branch support.
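For readers who download the tree, the following sketch shows one way to read branch support values from a Newick string in which each internal node label is an LPP. It is a minimal illustration under stated assumptions: the toy tree, the function names and the 0.95 threshold are all hypothetical and are not taken from the PAFTOL workflow, and the regular expression only handles plain decimal labels.

```python
import re

# A numeric label directly after a closing parenthesis is an internal-node
# label in Newick; here it is assumed to hold the branch support (LPP).
SUPPORT_PATTERN = re.compile(r"\)(\d+(?:\.\d+)?)")


def branch_supports(newick):
    """Return the support values of all labelled internal nodes, in order."""
    return [float(value) for value in SUPPORT_PATTERN.findall(newick)]


def fraction_supported(newick, threshold=0.95):
    """Return the share of labelled branches with support >= threshold.

    The default threshold is an arbitrary illustrative cut-off.
    """
    supports = branch_supports(newick)
    if not supports:
        return 0.0
    return sum(s >= threshold for s in supports) / len(supports)


# Hypothetical four-clade toy tree with support labels and branch lengths.
toy_tree = "((A,B)0.98:1.2,((C,D)0.62:0.4,E)1:0.9,F);"
print(branch_supports(toy_tree))
print(fraction_supported(toy_tree))
```

For real analyses a dedicated tree library is a safer choice than a regular expression, since it copes with quoted names, comments and other Newick features that this sketch ignores.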

This dataset has been further analysed by Zuntini, Carruthers et al. (2024) using an alternative pipeline to explore the evolution and diversification of angiosperms. For full details, please refer to this paper.

References

Baker, W.J., Bailey, P., Barber, V., Barker, A., Bellot, S., Bishop, D., Botigué, L.R., Brewer, G., Carruthers, T., Clarkson, J.J., Cook, J., Cowan, R.S., Dodsworth, S., Epitawalage, N., Françoso, E., Gallego, B., Johnson, M., Kim, J.T., Leempoel, K., Maurin, O., McGinnie, C., Pokorny, L., Roy, S., Stone, M., Toledo, E., Wickett, N.J., Zuntini, A.R., Eiserhardt, W.L., Kersey, P.J., Leitch, I.J. & Forest, F. 2022. A Comprehensive Phylogenomic Platform for Exploring the Angiosperm Tree of Life. Systematic Biology 71: 301–319. https://doi.org/10.1093/sysbio/syab035

Brewer, G.E., Clarkson, J.J., Maurin, O., Zuntini, A.R., Barber, V., Bellot, S., Biggs, N., Cowan, R.S., Davies, N.M.J., Dodsworth, S., Edwards, S.L., Eiserhardt, W.L., Epitawalage, N., Frisby, S., Grall, A., Kersey, P.J., Pokorny, L., Leitch, I.J., Forest, F. & Baker, W.J. 2019. Factors Affecting Targeted Sequencing of 353 Nuclear Genes From Herbarium Specimens Spanning the Diversity of Angiosperms. Frontiers in Plant Science 10: 1102. https://doi.org/10.3389/fpls.2019.01102

Chase, M.W. & Hills, H.H. 1991. Silica gel: An ideal material for field preservation of leaf samples for DNA studies. Taxon 40: 215–220. https://doi.org/10.2307/1222975

Dodsworth, S., Pokorny, L., Johnson, M.G., Kim, J.T., Maurin, O., Wickett, N.J., Forest, F. & Baker, W.J. 2019. Hyb-Seq for Flowering Plant Systematics. Trends in Plant Science 24: 887–891. https://doi.org/10.1016/j.tplants.2019.07.011

Doyle, J.J. & Doyle, J.L. 1987. A Rapid DNA Isolation Procedure for Small Quantities of Fresh Leaf Tissue. Phytochemical Bulletin 19: 11–15.

Faircloth, B.C., McCormack, J.E., Crawford, N.G., Harvey, M.G., Brumfield, R.T. & Glenn, T.C. 2012. Ultraconserved Elements Anchor Thousands of Genetic Markers Spanning Multiple Evolutionary Timescales. Systematic Biology 61: 717–726. https://doi.org/10.1093/sysbio/sys004

Govaerts, R., Nic Lughadha, E., Black, N., Turner, R. & Paton, A. 2021. The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity. Scientific Data 8: 215. https://doi.org/10.1038/s41597-021-00997-6

Johnson, M.G., Pokorny, L., Dodsworth, S., Botigué, L.R., Cowan, R.S., Devault, A., Eiserhardt, W.L., Epitawalage, N., Forest, F., Kim, J.T., Leebens-Mack, J.H., Leitch, I.J., Maurin, O., Soltis, D.E., Soltis, P.S., Wong, G.K.-s., Baker, W.J. & Wickett, N.J. 2019. A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering. Systematic Biology 68: 594–606. https://doi.org/10.1093/sysbio/syy086

Leebens-Mack, J.H. et al. 2019. One Thousand Plant Transcriptomes and the Phylogenomics of Green Plants. Nature 574: 679–685. https://doi.org/10.1038/s41586-019-1693-2

Lemmon, A.R., Emme, S.A. & Lemmon, E.M. 2012. Anchored Hybrid Enrichment for Massively High-Throughput Phylogenomics. Systematic Biology 61: 727–744. https://doi.org/10.1093/sysbio/sys049

Nguyen, N.-P.D., Mirarab, S., Kumar, K. & Warnow, T. 2015. Ultra-large alignments using phylogeny-aware profiles. Genome Biology 16: 124. https://doi.org/10.1186/s13059-015-0688-z

Price, M.N., Dehal, P.S. & Arkin, A.P. 2010. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3): e9490.

Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. 2018. ASTRAL-III: Polynomial Time Species Tree Reconstruction from Partially Resolved Gene Trees. BMC Bioinformatics 19: 153. https://doi.org/10.1186/s12859-018-2129-y

Zuntini, A.R., Carruthers, T., Maurin, O., Bailey, P.C., Leempoel, K., Brewer, G.E., Epitawalage, N., Françoso, E., Gallego-Paramo, B., McGinnie, C., Negrão, R., Roy, S.R., Simpson, L., Toledo Romero, E., Barber, V.M.A., Botigué, L., Clarkson, J.J., Cowan, R.S., Dodsworth, S., Johnson, M.G., Kim, J.T., Pokorny, L., Wickett, N.J., Antar, G.M., DeBolt, L., Gutierrez, K., Hendriks, K.P., Hoewener, A., Hu, A.-Q., Joyce, E.M., Kikuchi, I.A.B.S., Larridon, I., Larson, D.A., de Lírio, E.J., Liu, J.-X., Malakasi, P., Przelomska, N.A.S., Shah, T., Viruel, J., Allnutt, T.R., Ameka, G.K., Andrew, R.L., Appelhans, M.S., Arista, M., Ariza, M.J., Arroyo, J., Arthan, W., Bachelier, J.B., Bailey, C.D., Barnes, H.F., Barrett, M.D., Barrett, R.L., Bayer, R.J., Bayly, M.J., Biffin, E., Biggs, N., Birch, J.L., Bogarín, D., Borosova, R., Bowles, A.M.C., Boyce, P.C., Bramley, G.L.C., Briggs, M., Broadhurst, L., Brown, G.K., Bruhl, J.J., Bruneau, A., Buerki, S., Burns, E., Byrne, M., Cable, S., Calladine, A., Callmander, M.W., Cano, Á., Cantrill, D.J., Cardinal-McTeague, W.M., Carlsen, M.M., Carruthers, A.J.A., de Castro Mateo, A., Chase, M.W., Chatrou, L.W., Cheek, M., Chen, S., Christenhusz, M.J.M., Christin, P.-A., Clements, M.A., Coffey, S.C., Conran, J.G., Cornejo, X., Couvreur, T.L.P., Cowie, I.D., Csiba, L., Darbyshire, I., Davidse, G., Davies, N.M.J., Davis, A.P., van Dijk, K.-j., Downie, S.R., Duretto, M.F., Duvall, M.R., Edwards, S.L., Eggli, U., Erkens, R.H.J., Escudero, M., de la Estrella, M., Fabriani, F., Fay, M.F., Ferreira, P.d.L., Ficinski, S.Z., Fowler, R.M., Frisby, S., Fu, L., Fulcher, T., Galbany-Casals, M., Gardner, E.M., German, D.A., Giaretta, A., Gibernau, M., Gillespie, L.J., González, C.C., Goyder, D.J., Graham, S.W., Grall, A., Green, L., Gunn, B.F., Gutiérrez, D.G., Hackel, J., Haevermans, T., Haigh, A., Hall, J.C., Hall, T., Harrison, M.J., Hatt, S.A., Hidalgo, O., Hodkinson, T.R., Holmes, G.D., Hopkins, H.C.F., Jackson, C.J., James, S.A., Jobson, R.W., Kadereit, G., Kahandawala, I.M., Kainulainen, K., Kato, M., Kellogg, E.A., King, G.J., Klejevskaja, B., Klitgaard, B.B., Klopper, R.R., Knapp, S., Koch, M.A., Leebens-Mack, J.H., Lens, F., Leon, C.J., Léveillé-Bourret, É., Lewis, G.P., Li, D.-Z., Li, L., Liede-Schumann, S., Livshultz, T., Lorence, D., Lu, M., Lu-Irving, P., Luber, J., Lucas, E.J., Luján, M., Lum, M., Macfarlane, T.D., Magdalena, C., Mansano, V.F., Masters, L.E., Mayo, S.J., McColl, K., McDonnell, A.J., McDougall, A.E., McLay, T.G.B., McPherson, H., Meneses, R.I., Merckx, V.S.F.T., Michelangeli, F.A., Mitchell, J.D., Monro, A.K., Moore, M.J., Mueller, T.L., Mummenhoff, K., Munzinger, J., Muriel, P., Murphy, D.J., Nargar, K., Nauheimer, L., Nge, F.J., Nyffeler, R., Orejuela, A., Ortiz, E.M., Palazzesi, L., Peixoto, A.L., Pell, S.K., Pellicer, J., Penneys, D.S., Perez-Escobar, O.A., Persson, C., Pignal, M., Pillon, Y., Pirani, J.R., Plunkett, G.M., Powell, R.F., Prance, G.T., Puglisi, C., Qin, M., Rabeler, R.K., Rees, P.E.J., Renner, M., Roalson, E.H., Rodda, M., Rogers, Z.S., Rokni, S., Rutishauser, R., de Salas, M.F., Schaefer, H., Schley, R.J., Schmidt-Lebuhn, A., Shapcott, A., Al-Shehbaz, I., Shepherd, K.A., Simmons, M.P., Simões, A.O., Simões, A.R.G., Siros, M., Smidt, E.C., Smith, J.F., Snow, N., Soltis, D.E., Soltis, P.S., Soreng, R.J., Sothers, C.A., Starr, J.R., Stevens, P.F., Straub, S.C.K., Struwe, L., Taylor, J.M., Telford, I.R.H., Thornhill, A.H., Tooth, I., Trias-Blasi, A., Udovicic, F., Utteridge, T.M.A., Del Valle, J.C., Verboom, G.A., Vonow, H.P., Vorontsova, M.S., de Vos, J.M., Al-Wattar, N., Waycott, M., Welker, C.A.D., White, A.J., Wieringa, J.J., Williamson, L.T., Wilson, T.C., Wong, S.Y., Woods, L.A., Woods, R., Worboys, S., Xanthos, M., Yang, Y., Zhang, Y.-X., Zhou, M.-Y., Zmarzty, S., Zuloaga, F.O., Antonelli, A., Bellot, S., Crayn, D.M., Grace, O.M., Kersey, P.J., Leitch, I.J., Sauquet, H., Smith, S.A., Eiserhardt, W.L., Forest, F. & Baker, W.J. 2024. Phylogenomics and the rise of the angiosperms. Nature 629: 843–850. https://doi.org/10.1038/s41586-024-07324-0