NSF Proposal - 2. Introduction

The tree of life is inherently fractal. Look closely at one lineage of a phylogeny and it dissolves into many separate lineages, and so on down to a very fine scale. The nature of both OTUs ("operational taxonomic units", the "twigs" of the tree in any particular analysis) and characters (markers that serve as evidence for the past existence of a lineage) changes as one goes up and down this fractal scale. A robust reconstruction of the whole tree of life will require strategies that are powerful and flexible enough to encompass these phenomena. Although a great body of phylogenetic research has provided numerous tools applicable at particular (usually fairly constrained) scales, these tools have left many phylogenetic questions unanswered. We think they will remain unanswered until the problems associated with "scaling" have been addressed and the solutions applied to the management and analysis of large datasets.

Our goal is to develop and test tools for phylogenetic reconstruction that address "scaling" and other large-dataset issues. To do this, we need a suitable system. "Suitable" implies that the system, a lineage of organisms, has sufficient diversity and a sufficiently long evolutionary history to provide a variety of different phylogenetic scales for examination. The system should be adequately studied to provide a reasonable phylogenetic framework, should be based on studies at scales for which the existing tools are relevant, and should identify discrete, unresolved domains for which hypotheses can be tested using new approaches. The system should interest a body of informed and networked investigators who are competent to tackle the various tasks associated with generating and analyzing large datasets for addressing important phylogenetic questions.

We argue here that the green plant lineage is the most suitable system at present, and that the people who have gathered to study it within the framework of the Green Plant Phylogeny Research Coordination Group (GPPRCG, or "Deep Green") are best placed to develop and test the general new tools needed to resolve the Tree of Life:

  • This branch of the Tree is one of the most diverse in number of taxa (ca. 5 x 10^5 species), habitats, morphological types, reproductive strategies, and secondary chemistries;
  • At a minimum age of ca. 10^9 years, it is one of the oldest lineages of "crown" eukaryotes;
  • It contains good examples of the known phylogenetic problems, including deep and shallow branches, pulses of radiation/asymmetric extinction, heterogeneous evolutionary rates, and horizontal gene transfer;
  • It has a better fossil record than most other branches of comparable depth and diversity;
  • Its living representatives are of great importance to all aspects of human affairs;
  • It has already been the focus of much coordinated phylogenetic research - the GPPRCG is an interactive, cooperative community that can productively address the several outstanding phylogenetic and methodological questions.

In the pages that follow, we describe the classes of phylogenetic problems that require attention. We identify several unresolved "deep" nodes of green plant phylogeny that represent selected examples of these problems, and detail the hypotheses to be tested in relation to them. We describe the procedures by which exemplars will be selected for analysis, and by which large datasets of morphological/ultrastructural and molecular/genomic characters will be assembled, annotated, and archived. We set out which computational tools will be developed for analyzing these datasets, and how we will use them. We indicate how this work will link to other ongoing work on green plants at various scales, how our datasets will be concatenated with those of other groups, and how we will explore whether our scaling tools are adequate to generate robust phylogenetic reconstructions from the concatenated datasets. Finally, we propose training, education, and outreach strategies that will communicate the activities of our group and the progress and results of our research to the scientific community and the public.

Overall Objectives: To resolve the primary pattern of evolutionary diversification among green plants and to establish a model for doing so that will be applicable to other groups of organisms with long evolutionary histories. A solid backbone based on genomic and ultrastructural data for relatively few taxa will enable the integration of previous and ongoing studies of many more taxa into a comprehensive picture of green plant phylogeny. In the course of attaining this objective, we will achieve the following:

Genomic characterization.
We will complete a matrix of whole genome sequences for chloroplasts and mitochondria and develop Bacterial Artificial Chromosome (BAC) nuclear genome libraries (where feasible given genome size) for ca. 50 representatives of the critical deep-branching lineages of green plants.
Morphological characterization.
We will produce a comprehensive set of morphological data for these same taxa, with emphasis on global cellular and ultrastructural features.
Integration of existing phylogenetic research.
We will incorporate inferences from across the phylogenetic hierarchy in green plants using methods designed to permit scaling across studies.
