Methods

The flowering plant tree of life presented in the Kew Tree of Life Explorer is based on data generated by target sequence capture using the universal Angiosperms353 probe set.

Our Approach

Target sequence capture uses RNA probes to enrich sequencing libraries for targeted orthologous genes (Dodsworth et al., 2019; Faircloth et al., 2012; Lemmon et al., 2012). In effect, the RNA probes “fish out” a set of specific genes of interest. Angiosperms353 (Johnson et al., 2019) is a universal probe set that selects for a standardised set of 353 loci, derived from a broader set of orthologous loci identified by the One Thousand Plants (1KP) project (Leebens-Mack et al., 2019). The Angiosperms353 probe set recovers a median of 161 kbp of coding region per sample across the diversity of angiosperms (Baker et al., 2021). The probe set works efficiently with poor-quality DNA, including DNA extracted from herbarium material (Brewer et al., 2019).

The full workflow for the generation of data and computation of trees is described in detail by Baker et al. (2021), with updates documented in release notes available via our secure FTP (SFTP) site. A summary of these methods is given below.

Sampling

Our long-term goal is to sample at least one species of each of the ca. 13,900 genera of flowering plants. In the current phase, we aim to achieve at least 50% sampling of genera in all families, with genera selected to ensure appropriate phylogenetic representation of flowering plant lineages overall. We prioritised genera that have never been sequenced before. In most instances, we did not generate new data from genera for which public genomic or transcriptomic resources were already available. Plant names were standardised to the classification presented in the World Checklist of Vascular Plants (WCVP, 2020).
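The 50% per-family target can be checked with a few lines of code. The following is a minimal sketch, not the project's own tooling: the function name, the input layout and the family names and counts are all hypothetical and serve only to illustrate the arithmetic.

```python
import math
from typing import Dict, List, Tuple

# Target from the text: at least 50% of genera sampled in every family.
TARGET_FRACTION = 0.5


def families_below_target(
    genera_by_family: Dict[str, Tuple[int, int]],
    target: float = TARGET_FRACTION,
) -> List[Tuple[str, int]]:
    """Return (family, extra genera needed) for families under the target.

    genera_by_family maps a family name to (genera sampled, genera accepted).
    This layout is an assumption made for the example.
    """
    shortfalls = []
    for family, (sampled, total) in sorted(genera_by_family.items()):
        if total <= 0:
            continue  # nothing to sample in an empty family
        # Smallest whole number of genera that reaches the target fraction.
        needed = math.ceil(target * total)
        if sampled < needed:
            shortfalls.append((family, needed - sampled))
    return shortfalls


# Hypothetical counts, invented for illustration only.
example = {
    "Family A": (3, 10),
    "Family B": (5, 10),
    "Family C": (1, 1),
    "Family D": (0, 3),
}
print(families_below_target(example))
```

Rounding up means that a family with three genera needs two sampled genera to reach the target, and a family with a single genus needs that one genus.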

DNA was obtained from a wide variety of sources, including PAFTOL’s network of collaborators. Samples were sourced primarily from the Royal Botanic Gardens, Kew (DNA Bank, Tissue Bank, Living Collections, Millennium Seed Bank and Herbarium). To be included, a sample had to be 1) legally sourced and usable in phylogenomic studies, 2) verified to species level, preferably by a relevant expert, and 3) ideally collected in the wild. The identity of samples was substantiated by voucher specimens deposited in herbaria, as far as was practically achievable. All voucher specimen information, including links to specimen images where available, is provided in the Tree of Life Explorer.

DNA Extraction

DNA was extracted from 20 mg of silica-dried material (Chase & Hills, 1991) or 40 mg of herbarium material using a modified CTAB extraction method (Doyle & Doyle, 1987), followed by magnetic bead clean-up.

Target Sequence Capture

Libraries were prepared using a standard library preparation kit. Depending on DNA quality, total DNA was sonicated before library preparation, aiming for fragment sizes between 350 and 450 bp. After pooling, libraries were hybridised with the Angiosperms353 probe kit following the manufacturer’s protocol. DNA sequencing was performed using Illumina MiSeq and HiSeq platforms.

Gene Recovery

Up to 353 genes were recovered from each sample by searching for matching orthologous sequences from the Angiosperms353 target gene set (Johnson et al., 2019). In addition to PAFTOL data, samples were included from the following sources: the One Thousand Plant Transcriptomes (1KP) Initiative (Leebens-Mack et al., 2019) and published annotated genomes. An in-house pipeline was used to assemble Illumina reads from target sequence capture data and then identify gene orthologues corresponding to the Angiosperms353 target gene set. Illumina reads of the PAFTOL data are available from the European Nucleotide Archive (ENA) under project number PRJEB35285. Recovered gene sequences for each sample are presented gene-wise or species-wise and are available via this site and RBG Kew's secure FTP site, together with the gene alignments and trees. All samples were validated using both DNA barcode and phylogenetic tests to verify the accuracy of their family identifications (see Baker et al., 2021 for more details).
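To illustrate how recovery might be summarised for one sample, the sketch below counts recovered genes and total recovered length from a species-wise FASTA file. It is not the in-house pipeline mentioned above. It assumes, purely for the example, that each record is one gene and that the first word of the header is a gene identifier; the identifiers and sequences shown are invented.

```python
from typing import Dict, Iterable, List

# Size of the Angiosperms353 target gene set, as stated in the text.
N_TARGET_GENES = 353


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record name: sequence}.

    The record name is taken as the first word of the header line
    (an assumption about the file layout).
    """
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {gene: "".join(parts) for gene, parts in records.items()}


def recovery_summary(lines: Iterable[str]) -> Dict[str, float]:
    """Count genes with a non-empty sequence and sum their lengths."""
    genes = parse_fasta(lines)
    recovered = {gene: seq for gene, seq in genes.items() if seq}
    return {
        "genes_recovered": len(recovered),
        "fraction_of_targets": len(recovered) / N_TARGET_GENES,
        "total_bp": sum(len(seq) for seq in recovered.values()),
    }


# Invented records: gene2 has no sequence and so counts as not recovered.
fasta_lines = [">gene1 sample_x", "ATGC", "ATG", ">gene2", "", ">gene3", "NNNNAT"]
print(recovery_summary(fasta_lines))
```

To summarise a real file, the same function can be given an open file handle, for example `recovery_summary(open(path))`, where `path` points to a downloaded species-wise FASTA file.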

Tree of Life Reconstruction

Sequences for the same gene from each sample were aligned with UPP (Nguyen et al., 2015) and gene trees were built with IQ-TREE 2 (Minh et al., 2020). The navigable Tree of Life presented here was estimated with ASTRAL-III (Zhang et al., 2018) from these gene trees and rooted with a set of gymnosperm species. The tree is annotated with local posterior probabilities (LPP) as indicators of branch support. Full details are given by Baker et al. (2021).
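Readers who download the tree may want a quick overview of the LPP values. The sketch below assumes the tree is in Newick format with support values stored as labels on internal nodes. That is a common convention, but it should be checked against the released files. The 0.95 threshold and the example tree are illustrative choices, not values taken from the project.

```python
import re
from typing import List

# A number placed directly after a closing parenthesis is read as the
# support label of that internal node. Quoted labels are not handled.
SUPPORT_PATTERN = re.compile(r"\)(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def internal_supports(newick: str) -> List[float]:
    """Extract numeric internal-node labels from a Newick string."""
    return [float(value) for value in SUPPORT_PATTERN.findall(newick)]


def fraction_well_supported(newick: str, threshold: float = 0.95) -> float:
    """Fraction of labelled internal branches with support >= threshold.

    The default threshold is an assumption made for this example.
    """
    supports = internal_supports(newick)
    if not supports:
        return 0.0
    return sum(s >= threshold for s in supports) / len(supports)


# Invented five-taxon tree with three labelled internal branches.
toy_tree = "((A,B)0.98:0.5,((C,D)1.0:0.3,E)0.62:0.1);"
print(internal_supports(toy_tree))
print(fraction_well_supported(toy_tree))
```

For anything beyond a quick summary, a dedicated tree library is a safer choice than a regular expression, because it copes with quoted taxon names and comments in the Newick string.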

References

Baker, W.J., Bailey, P., Barber, V., Barker, A., Bellot, S., Bishop, D., Botigué, L.R., Brewer, G., Carruthers, T., Clarkson, J.J., Cook, J., Cowan, R.S., Dodsworth, S., Epitawalage, N., Francoso, E., Gallego, B., Johnson, M., Kim, J.T., Leempoel, K., Maurin, O., McGinnie, C., Pokorny, L., Roy, S., Stone, M., Toledo, E., Wickett, N.J., Zuntini, A.R., Eiserhardt, W.L., Kersey, P.J., Leitch, I.J. & Forest, F. 2021. A Comprehensive Phylogenomic Platform for Exploring the Angiosperm Tree of Life. bioRxiv 2021.02.22.431589. https://doi.org/10.1101/2021.02.22.431589

Brewer, G.E., Clarkson, J.J., Maurin, O., Zuntini, A.R., Barber, V., Bellot, S., Biggs, N., Cowan, R.S., Davies, N.M.J., Dodsworth, S., Edwards, S.L., Eiserhardt, W.L., Epitawalage, N., Frisby, S., Grall, A., Kersey, P.J., Pokorny, L., Leitch, I.J., Forest, F. & Baker, W.J. 2019. Factors Affecting Targeted Sequencing of 353 Nuclear Genes From Herbarium Specimens Spanning the Diversity of Angiosperms. Frontiers in Plant Science 10: 1102. https://doi.org/10.3389/fpls.2019.01102

Chase, M.W. & Hills, H.H. 1991. Silica gel: An ideal material for field preservation of leaf samples for DNA studies. Taxon 40: 215-220. https://doi.org/10.2307/1222975

Dodsworth, S., Pokorny, L., Johnson, M.G., Kim, J.T., Maurin, O., Wickett, N.J., Forest, F. & Baker, W.J. 2019. Hyb-Seq for Flowering Plant Systematics. Trends in Plant Science 24: 887-891. https://doi.org/10.1016/j.tplants.2019.07.011

Doyle, J.J. & Doyle, J.L. 1987. A Rapid DNA Isolation Procedure for Small Quantities of Fresh Leaf Tissue. Phytochemical Bulletin 19: 11-15.

Faircloth, B.C., McCormack, J.E., Crawford, N.G., Harvey, M.G., Brumfield, R.T. & Glenn, T.C. 2012. Ultraconserved Elements Anchor Thousands of Genetic Markers Spanning Multiple Evolutionary Timescales. Systematic Biology 61: 717-726. https://doi.org/10.1093/sysbio/sys004

Johnson, M.G., Pokorny, L., Dodsworth, S., Botigué, L.R., Cowan, R.S., Devault, A., Eiserhardt, W.L., Epitawalage, N., Forest, F., Kim, J.T., Leebens-Mack, J.H., Leitch, I.J., Maurin, O., Soltis, D.E., Soltis, P.S., Wong, G.K.-S., Baker, W.J. & Wickett, N.J. 2019. A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering. Systematic Biology 68: 594-606. https://doi.org/10.1093/sysbio/syy086

Leebens-Mack, J.H. et al. 2019. One Thousand Plant Transcriptomes and the Phylogenomics of Green Plants. Nature 574: 679-685. https://doi.org/10.1038/s41586-019-1693-2

Lemmon, A.R., Emme, S.A. & Lemmon, E.M. 2012. Anchored Hybrid Enrichment for Massively High-Throughput Phylogenomics. Systematic Biology 61: 727-744. https://doi.org/10.1093/sysbio/sys049

Minh, B.Q., Schmidt, H.A., Chernomor, O., Schrempf, D., Woodhams, M.D., Von Haeseler, A. & Lanfear, R. 2020. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution 37: 1530-1534. https://doi.org/10.1093/molbev/msaa015

Nguyen, N.-P.D., Mirarab, S., Kumar, K. & Warnow, T. 2015. Ultra-large alignments using phylogeny-aware profiles. Genome Biology 16: 124. https://doi.org/10.1186/s13059-015-0688-z

WCVP. 2020. World Checklist of Vascular Plants, version 2.0. Facilitated by the Royal Botanic Gardens, Kew. Published on the internet: http://wcvp.science.kew.org/, retrieved 18 November 2020.

Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. 2018. ASTRAL-III: Polynomial Time Species Tree Reconstruction from Partially Resolved Gene Trees. BMC Bioinformatics 19: 153. https://doi.org/10.1186/s12859-018-2129-y