Original NSF Proposal
"From the genome to the tree of life"
1. Results from Prior Support
2. Background: Phylogenetics / Evolution
3. Background: Genomics
4. Theme: Research Coordination Group
5. Examples: Research Integrating Genomics / Phylogenetics
6. Proposed Coordination Activities
7. Management / Coordination Mechanisms
8. Significance

Section 5: Examples of Research Integrating Genomics and Phylogenetics

The roles of gene duplications and transposable elements in nuclear genome evolution.

It is already clear that gene duplications and transposable elements have played a large role in shaping the diversity of plant alleles. Transposable elements can modulate promoter activities, splicing decisions, and protein activity (Walbot, 1991). Transposable elements may actually facilitate the rapid restructuring of genomes following polyploidization. Matzke and Matzke (1998) suggest that polyploidy permits extensive gene modification by transposons because polyploid genomes, with their multiple copies of all genes, are buffered from the deleterious consequences of transposition (reviewed in Soltis and Soltis, 1999). When transposons excise from alleles, they create tremendous diversity: many of the resulting alleles are non-functional, but the functional alleles contain diverse additions, deletions, and point mutations at the original transposon site (Nordborg and Walbot, 1995). In maize, the high frequency of indels compared to point mutations in standard alleles points to the prior impact of transposons in shaping allelic diversity. In fact, transposons are so abundant in angiosperms that understanding how they are epigenetically controlled has become a research "hot topic" (Martienssen, 1998). Are there bursts of transposon activity that yield many new alleles, followed by a quiescent period in which natural selection sorts out the products? There are two possible reasons that plants can survive active transposons: first, gametophytic selection eliminates lethal alleles, and second, plants contain many duplicate genes.

Compared with the yeast, nematode, and even human genomes, plant genomes contain many more duplicate loci. In some cases, such duplications arose through polyploidy, a process that has occurred in the history of more than half of the angiosperms (Stebbins, 1971; Grant, 1981; Masterson, 1994). Maize is a good example of a species in which the polyploidization events have been dated and the divergence of the pairs of genes at homologous loci has been mapped; indeed, recognition of the polyploid nature of maize allows a more comprehensive syntenic map to be derived among the grasses (Moore et al., 1995). More importantly, detailed studies of the range of alleles present at pairs of duplicate loci indicate that in almost every case there are now distinct patterns of gene expression. That is, the duplicate genes have functionally diverged since genome doubling. It is inferred that gene duplication provides much of the "raw material" on which natural selection can act and that in plants these selective pressures result in alleles with ever more specialized domains of expression. These duplicate loci are likely to contribute to both morphological and chemical diversity.

Even in "true diploids" (i.e., those plants lacking evidence of recent polyploidization) gene duplication is an important factor. About 15% of all Arabidopsis genes have a nearby duplication, and in some gene families multiple copies are present. For example, glutathione S-transferase genes, which are crucial in stress responses and for the management of plant secondary metabolites, exist in two clusters with 3 genes and one cluster of 7 tandem copies of functional genes (Edwards et al., 2000). Similarly, plant defense genes are typically present in complex families arrayed within a chromosomal region (Noel et al., 1999; Meyers et al., 1999); intralocus recombination generates many new combinations. Thus, the very arrangement of duplicate loci within the chromosome is likely to be a selected feature.
Plants are remarkably more resistant to ionizing radiation than animals. It is possible that the lack of a germ line and continuous exposure to fluctuating UV are strong selective forces for compact genes in plants; not only are promoters short, but so are the introns and 5' and 3' untranslated regions. As a consequence of this, the introduction of short motifs, often by transposable elements, can profoundly alter the expression pattern of a plant gene (Walbot, 1996).

The new phylogenomics collaboration will stimulate examination of several important topics in this area. (1) Identifying members of gene families against a background of allelic variation at individual loci within and between species. It is possible that allelic variation within a locus will be higher than between-locus variation in comparisons of specific alleles. (2) Developing vocabulary and learning the emerging database tools for recognizing plant transposons within genes. (3) Examining whether rapid morphological and chemical evolution is preceded or accompanied by transposon activities and/or widespread duplication. Minor changes in allele coding regions or promoters may be the foundation for diversification.

Evolution of gene order, arrangement, and function in the nuclear genome.

The full potential of the complete Arabidopsis sequence will only be realized when its genome structure, gene content, and gene functions can be understood in relationship to its own evolutionary history and to that of other plant species. It is through comparative genomics that researchers will deduce the mechanisms and pathways by which plant genes and genomes have diverged to give the diversity of form, function, and adaptation that now characterize the world's flora. On the practical side, it is expected that the genomic sequence of Arabidopsis can be used to predict gene content and gene function in crop species, most of whose genomes are too large for genomic sequencing any time in the near future.

There are two underlying assumptions required for extrapolating genomic information from Arabidopsis to other plant species: (1) Arabidopsis and all other plants have inherited gene order and gene content, with modifications, through common ancestry. (2) The individual genes, now present in modern-day plant species, can be used to reconstruct ancestral gene order and content. These assumptions have already been tested and largely verified for species within plant families. For example, in the grass family (Poaceae), which contains such familiar crops as corn, wheat, rice and millet, gene order has been conserved in large blocks, often comprising entire chromosomes or chromosome arms (Gale & Devos, 1998). Comparative sequencing and cross hybridization of cDNA clones in the grasses have also demonstrated that gene content and often gene function are conserved (Van Deynze et al., 1998; Chen et al., 1997). Similar results have been obtained with studies in the nightshade family (potato, tomato, and pepper) (Tanksley et al., 1992; Livingstone et al., 1999). However, in the mustard family (Brassicaceae), which includes Arabidopsis, cabbage, and broccoli, genomes seem to have evolved differently from the grasses or nightshades. While gene content is conserved, the genomes of mustard species are often highly rearranged relative to one another, and gene duplications are common, at least some of which are due to polyploidy (Lagercrantz & Lydiate, 1996; Lagercrantz, 1998).

While comparisons among genomes have been quite common for species within plant families, comparisons between plant families have been rare and fraught with technical difficulties. Specifically, reduced gene similarities between plant families have made comparative mapping, via common probes and Southern hybridization, problematic if not impossible. Nonetheless, research by Paterson et al. (1996) suggests that blocks of linked genes are conserved across angiosperm families and even between highly divergent monocots and eudicots. Comparative sequencing data of Arabidopsis and rice have been used to both refute (Devos et al., 1999) and support (van Dodeweerd et al., 1999) this hypothesis.

Tomato and Arabidopsis belong to two clades that diverged relatively early in the radiation of eudicots. Based on fossil evidence the two families separated more than 90 million years ago (MYA) (Gandolfo et al., 1998). Mitochondrial DNA sequence comparisons place the divergence at 112-156 MYA (Yang et al., 1999). Because of their early divergence, a comparison of the tomato and Arabidopsis genomes should provide a glimpse of gene and genome evolution since the radiation of eudicots, and provide information relevant to the large number of species (including many crop plants) that fall within this clade.

As described above, Arabidopsis and most flowering plants are likely ancient polyploids and, as comparisons are made across greater and greater phylogenetic distances, the likelihood increases that polyploidy has occurred in the lineage of one or both species being compared. If this prediction is correct, comparisons across families of plants will not result in matches between single homologous segments, but rather in matches among sets of homologous genes and duplicated gene segments. If polyploidy was a factor in the evolution of the Arabidopsis genome, the gene number for the progenitor of Arabidopsis and tomato (and hence many other angiosperm families) could have been considerably less than the 20,000-25,000 genes estimated for Arabidopsis (Meinke et al., 1998; Somerville & Somerville, 1999). Ku et al. (in press) have recently estimated that the ancestral genome for Arabidopsis and tomato would have contained approximately one half the number of genes seen in the present-day Arabidopsis genome.

Syntenic relationships can be detected between the Arabidopsis genome and the genomes of other families of angiosperms based on gene homologies (Ku et al., in press), but matches tying Arabidopsis to other plant genomes will not likely be based on single ortholog pairs, but rather networks of homologous genes created by multiple rounds of polyploidy followed by gene divergence and gene loss. Establishing genome relationships among divergent plant families on a gene for gene basis may therefore be more complicated than originally expected. However, such analyses will eventually allow for an understanding of the events and mechanisms that have molded plant genome evolution and the exchange of sequence and functional information among species over very long periods of evolutionary time.

Organellar genomics and origin and early evolution of land plants.

The organellar genomes, in comparison to their nuclear counterpart, have two advantages for phylogeny reconstruction: (1) a much higher coding/non-coding sequence ratio, thus higher information content (noncoding sequences are subject to elimination during evolution and thus lack homologous regions across a wide variety of organisms), and (2) less complicated genome dynamics, e.g., they lack gene and genome duplication, DNA segment translocation, and mobile elements (hence it is easier to identify homologous regions for phylogenetic analysis).

We may reach a limit to the resolving power of simple nucleotide sequence comparisons for phylogenetic reconstruction at the deeper levels in the tree of life. Recent empirical and theoretical studies have shown that a length much longer than that of a typical single gene (1-2 kb) is required to reconstruct a complicated phylogeny with high resolution and confidence (Hillis, 1996; Qiu et al., 1999; Soltis et al., 1999). Thus, the prospect of adding significant numbers of new structural genomic characters to deep-level phylogenetic analyses is exciting. One major strength of organellar genomic structural changes in phylogenetics is that they occur much less frequently than point mutations. Furthermore, exactly parallel structural changes in two different lineages are much less likely than parallel point mutations. This is not to say that genomic structural characters are completely homoplasy-free; like any other characters (morphological, metabolic, and nucleotide substitution), they are subject to parallelism, reversal, and convergent evolution. However, theoretical considerations suggest that genomic rearrangements have good properties for use as phylogenetic markers at deep levels (slow rate of change, many possible character states; Mishler 1994), and empirical studies have shown them to be very important characters for marking deep branches.

For example, based on completely sequenced chloroplast genomes, it is known that the entire set of ~20 introns in land plants were acquired during the stage of charophyte evolution, because they are not present in either Mesostigma (an early divergent lineage either sister to the rest of the green plants, Lemieux et al., 2000, or to the streptophytes, Qiu & Lee, submitted), or the chlorophyte algae Nephroselmis (Turmel et al., 1999a) and Chlorella (Wakasugi et al., 1997), but are present in Marchantia (Ohyama et al., 1986), pine (Wakasugi et al., 1994), rice (Hiratsuka et al., 1989), maize (Maier et al., 1995), and tobacco (Shinozaki et al., 1986). Hence, determining the points of intron acquisition is likely to add information to resolve relationships among charophytes and to identify the algal sister lineage of land plants. Furthermore, while the chloroplast genome exhibits structural conservation across green plants, several operons were disrupted or formed during charophyte evolution. Identifying the points of operon disruption and formation will also help resolve some of the most contentious issues in green plant phylogenetics.
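The reasoning in the preceding paragraph, that an intron absent from the algal lineages but present in all sampled land plants was most likely gained once, can be illustrated with a minimal Fitch parsimony sketch. The tree topology below is a simplified assumption made only for illustration (it places Mesostigma as sister to the land-plant lineage, one of the two placements mentioned above), and the function name and 0/1 coding are hypothetical rather than taken from any cited study.

```python
# Minimal Fitch parsimony count for one presence/absence character.
# ASSUMPTION: the nested-tuple topology is a simplified illustration,
# not a published tree; 1 = intron present, 0 = intron absent.

def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes)."""
    if isinstance(tree, str):            # leaf: observed state only
        return {states[tree]}, 0
    left, right = tree                   # strictly binary internal node
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:                           # children agree: no extra change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # disagreement costs one change

TREE = (
    ("Nephroselmis", "Chlorella"),
    ("Mesostigma",
     ("Marchantia", ("pine", ("tobacco", ("rice", "maize"))))),
)

INTRON_PRESENT = {
    "Nephroselmis": 0, "Chlorella": 0, "Mesostigma": 0,
    "Marchantia": 1, "pine": 1, "tobacco": 1, "rice": 1, "maize": 1,
}

root_states, changes = fitch(TREE, INTRON_PRESENT)
print(changes)
```

Under these assumptions a single change accounts for the distribution, which is consistent with one acquisition on the branch leading to land plants. Sampling charophyte taxa along that branch would be what narrows down where the gain occurred.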

Comparative genomics of the mitochondria can also play a very important role in phylogeny reconstruction. By comparison to its chloroplast counterpart, the mitochondrial genome shows little structural conservation in land plants after the basal split with liverworts, yet there are a number of conserved gene clusters that were disrupted or formed between Marchantia and Arabidopsis, the only two land plants whose mitochondrial genomes have been sequenced (Oda et al., 1992; Unseld et al., 1997). Moreover, several group II introns evolved from cis- to trans-splicing during pteridophyte and angiosperm evolution (Malek & Knoop, 1998; Y. Qiu & J. D. Palmer, unpublished data). Angiosperm mitochondrial genomes contain approximately 25 introns (Unseld et al., 1997), whereas the Marchantia mitochondrial genome has its own set of 32 introns (Oda et al., 1992; Y. Qiu, unpublished data). Almost all of these introns must have been acquired since the streptophytes split from chlorophytes, because they are not found in Nephroselmis (Turmel et al., 1999b), Pedinomonas (Turmel et al., 1999b), or Prototheca (Wolff et al., 1994), based on complete genome sequencing. A previous study of three angiosperm-specific introns has shown that they extend their distribution to hornworts and mosses, but not to liverworts, thus providing evidence to support the view of liverworts being the sister group of the rest of the land plants (Qiu et al., 1998). A survey of more angiosperm-specific introns across land plants is in progress (Malek & Knoop, 1998; V. Knoop & Y. Qiu, unpublished data), and identifying points of acquisition or transposition of these introns or other structural changes is likely to answer some of the key questions in charophyte and early land plant phylogeny.

There are several ways in which green plant phylogenetics can in turn contribute to genomics. First, evolutionary interpretation of observations made based on model plants like Marchantia and Arabidopsis has to be based on a robust phylogeny built using total evidence (i.e., genomic, developmental, metabolic, morphological, and ecological characters). A recent study uncovering 26 independent transfers of the mitochondrial gene rps10 to the nucleus in 276 species of angiosperms surveyed vividly demonstrates the power of phylogeny in helping distinguish between two alternative hypotheses: many recent independent transfers versus one single ancient transfer (Adams et al., 2000). Second, to further investigate questions that arise from studies of model organisms, a phylogenetically oriented approach offers a predictable and efficient way of experimental design and execution. The investigation of mitochondrial genome evolution in eukaryotes by sampling well chosen organisms across the eukaryote phylogeny is an excellent example of this approach (Gray et al., 1999; Lang et al., 1999). Third, to understand the knowledge gap between model organisms, one has to look at non-model organisms, the selection of which should be guided by phylogeny.

Evolution of desiccation tolerance.

The most fundamental problem of being a land plant is how to maintain a well-hydrated protoplasm in an atmosphere that is nearly always drastically lower in water potential. A major selective force in the evolution of the larger land plants appears to be structural mechanisms to get and keep water (e.g., roots, water conducting tissues, cuticles, and stomates). Some land plants, however, use a different physiological strategy that requires no special means for gathering or holding water: the desiccation-tolerant plants dry out, cease normal metabolism, rehydrate, and resume metabolism.

Recent synthetic phylogenetic analyses support an important finding that vegetative desiccation tolerance was primitively present in the land plants (because of its widespread occurrence in the "bryophytes" -- the basal-most living clades of land plants), but was then lost in the evolution of tracheophytes (Oliver et al., in press). The initial evolution of vegetative desiccation tolerance was a crucial step required for the colonization of the land by primitive plants from fresh water, but that tolerance came at a cost, since metabolic rates are low in tolerant plants as compared to plants that do not maintain costly mechanisms for tolerance. Thus, the loss of tolerance might have been favored along with the internalization of water relationships that happened as the vascular plants became more complex. However, vegetative desiccation tolerance re-evolved independently at least once in Selaginella and in the ferns, and at least eight times within the angiosperms. Importantly from a human economic perspective, the same genes that had evolved for cellular protection and repair in earlier vegetative tissues appear to have been recruited for different but related processes in the evolution of seeds. Using these evolutionarily and mechanistically important plants and tissues, we can isolate genes involved in desiccation-related signaling pathways and begin to characterize their function.

The use of microarrays will contribute significantly to this and other areas of research. The microarray technology permits groups of coregulated genes to be identified and, in conjunction with sequencing data, pathways important in the establishment of desiccation tolerance to be uncovered. From such knowledge, additional choices of marker genes can be identified that may be useful in studying evolutionary phenomena such as the reacquisition of desiccation tolerance in angiosperms. In addition, the grouping of coregulated genes via the microarray technology facilitates identification of cis-acting elements involved in transcriptional regulation of the gene group (e.g., regulatory elements that have evolved to coordinate desiccation responses). From such cis-acting elements, regulatory transcription factors can be sought (i.e., as cis-element DNA binding proteins among other methods). A comparison of the relative importance of such regulatory sequences and regulatory proteins relative to structural genes/proteins for a trait in phylogenetic relationships may be informative in understanding how changes in major lineage defining traits, such as desiccation tolerance, have evolved.
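As a rough illustration of how coregulated genes might be grouped from microarray data, the sketch below links genes whose expression profiles across conditions are strongly correlated. Everything in it is an assumption made for illustration: the gene names are hypothetical, the expression values are synthetic toy numbers rather than measurements, and Pearson correlation with a fixed threshold is only one of many possible grouping criteria.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coregulated_groups(profiles, threshold=0.9):
    """Group genes connected by pairwise correlation >= threshold."""
    genes = sorted(profiles)
    parent = {g: g for g in genes}       # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                parent[find(g1)] = find(g2)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())

# SYNTHETIC toy profiles (hypothetical genes, four drying time points).
toy_profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.5],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
    "gene_d": [1.1, 2.9, 1.0, 3.0],
}

print(coregulated_groups(toy_profiles))
```

With these toy values only gene_a and gene_b end up in the same group. In practice, the upstream regions of genes in such a group would be the place to search for shared cis-acting elements.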

Evolution of reproductive biology.

Several major life history changes occurred during the evolution of land plants from their green algal ancestors (Graham, 1985; 1993). One of the most dramatic changes was the formation of a multicellular embryo, involving both the shift from zygotic to sporic meiosis and the establishment of cellular interactions between cells of the different generations (reviewed in Graham 1993). A number of important trends in reproductive biology occurred in the adaptation of land plants to increasingly drier environments, including the switch to the diploid-dominant life cycle and subsequent reduction of the gametophyte.

A series of investigations into the evolution of aspects of reproductive biology is now possible, encompassing ultrastructural studies of cell division, assessment of homology of reproductive structures and processes in seed plants, and a comparative exploration of the genes involved in reproduction from bryophytes to angiosperms. The gene system responsible for initiation of floral organs in Arabidopsis (the ABC model; e.g., Coen, 1991) is well characterized, and its extension to other groups of angiosperms, including basal angiosperms, has been/is being investigated (e.g., Kramer et al., 1998; Kramer and Irish, 1999; ongoing studies in labs of V. Irish and D. and P. Soltis). Evidence of expression of related homeotic genes in reproductive structures of conifers has also been reported. Of the genes identified in organ initiation, the B class genes are the best characterized (Samach et al., 1997). pistillata and apetala3 together form the B class genes of the ABC model of flower development as described in Arabidopsis. These two genes are critical in organ initiation of the petals and stamens (Jack et al., 1992; Riechmann et al., 1997; Bowman et al., 1991). We need to explore whether the role of these genes in plant reproduction extends to non-flowering, especially non-seed, plants by searching for homologous genes in other seed plants, pteridophytes, and bryophytes. If homologues are found, in situ expression studies can be performed to evaluate the potential role of these genes in plant reproduction.
