Section 5: Examples of Research Integrating Genomics and Phylogenetics
The roles of gene duplications and transposable elements in nuclear
It is already clear that gene duplications and transposable elements
have played a large role in shaping the diversity of plant alleles. Transposable
elements can modulate promoter activities, splicing decisions, and protein
activity (Walbot, 1991). Transposable elements may actually facilitate
the rapid restructuring of genomes following polyploidization. Matzke
and Matzke (1998) suggest that polyploidy permits extensive gene modification
by transposons because polyploid genomes, with their multiple copies of
all genes, are buffered from the deleterious consequences of transposition
(reviewed in Soltis and Soltis, 1999). When transposons excise from alleles,
they create tremendous diversity: many of the resulting alleles are non-functional,
but the functional alleles contain diverse additions, deletions, and point
mutations at the original transposon site (Nordborg and Walbot, 1995).
In maize the high frequency of indels compared to point mutations in standard
alleles points to the prior impact of transposons in shaping allelic diversity.
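The comparison of indel and point-mutation frequencies can be made concrete with a small count over a pairwise alignment. The following is a minimal sketch, assuming two pre-aligned allele sequences with "-" marking gaps; the function name and the toy sequences are hypothetical and are not taken from the maize data discussed here.

```python
def count_differences(seq_a, seq_b):
    """Count substitutions and indel events between two aligned sequences.

    A run of consecutive gapped columns is counted as one indel event.
    This is a simplification: adjacent gaps in different sequences are
    merged into a single event.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = 0
    indels = 0
    in_gap = False
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            if not in_gap:
                indels += 1  # a new gap run starts here
            in_gap = True
        else:
            in_gap = False
            if a != b:
                substitutions += 1
    return substitutions, indels


# Toy (hypothetical) aligned alleles, for illustration only.
allele_1 = "ATGCC--GTAAGT"
allele_2 = "ATGACTTGT--GT"
subs, gaps = count_differences(allele_1, allele_2)
print(f"substitutions={subs}, indel events={gaps}")
```

A high ratio of indel events to substitutions across many allele pairs would be the kind of pattern the text attributes to past transposon activity.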
In fact, there are so many transposons in angiosperms that it is a research
"hot topic" to understand how they are epigenetically controlled
(Martienssen, 1998). Are there bursts of transposon activity that yield
many new alleles, followed by a quiescent period in which natural selection
sorts out the products? Two possible reasons that plants can survive active
transposons are, first, that gametophytic selection eliminates lethal
alleles and, second, that plants contain many duplicate genes.
Compared with the yeast, nematode, and even human genomes, plant genomes contain
many more duplicate loci. In some cases, such duplications arose through
polyploidy, a process that has occurred in the history of more than half
of the angiosperms (Stebbins, 1971; Grant, 1981; Masterson, 1994). Maize
is a good example of a species in which the polyploidization events have
been dated and the divergence of the pairs of genes at homologous loci
has been mapped; indeed, recognition of the polyploid nature of maize
allows a more comprehensive syntenic map to be derived among the grasses
(Moore et al., 1995). More importantly, detailed studies of the range
of alleles present at pairs of duplicate loci indicate that in almost
every case there are now distinct patterns of gene expression. That is,
the duplicate genes have functionally diverged since genome doubling.
It is inferred that gene duplication provides much of the "raw material"
on which natural selection can act and that in plants these selective
pressures result in alleles with ever more specialized domains of expression.
These duplicate loci are likely to contribute to both morphological and
Even in "true diploids" (i.e., those plants lacking evidence
of recent polyploidization) gene duplication is an important factor. About
15% of all Arabidopsis genes have a nearby duplication, and in some gene
families multiple copies are present. For example, glutathione S-transferase
genes, which are crucial in stress responses and for the management of
plant secondary metabolites, exist in two clusters with 3 genes and one
cluster of 7 tandem copies of functional genes (Edwards et al., 2000).
Similarly, plant defense genes are typically present in complex families
arrayed within a chromosomal region (Noel et al., 1999; Meyers et al.,
1999); intralocus recombination generates many new combinations. Thus,
the very arrangement of duplicate loci within the chromosome is likely
to be a selected feature.
Plants are remarkably more resistant to ionizing radiation than animals.
It is possible that the lack of a germ line and continuous exposure to
fluctuating UV are strong selective forces for compact genes in plants;
not only are promoters short, but so are the introns and 5' and 3' untranslated
regions. As a consequence of this, the introduction of short motifs, often
by transposable elements, can profoundly alter the expression pattern
of a plant gene (Walbot, 1996).
The new phylogenomics collaboration will stimulate examination of several
important topics in this area. (1) Identifying members of gene families
against a background of allelic variation at individual loci within and
between species. It is possible that allelic variation within a locus
will be higher than between-locus variation in comparisons of specific
alleles. (2) Developing vocabulary and learning the emerging database
tools for recognizing plant transposons within genes. (3) Examining whether
rapid morphological and chemical evolution is preceded or accompanied
by transposon activities and/or widespread duplication. Minor changes
in allele coding regions or promoters may be the foundation for diversification.
Evolution of gene order, arrangement, and function in the nuclear genome.
The full potential of the complete Arabidopsis sequence will only be
realized when its genome structure, gene content, and gene functions can
be understood in relationship to its own evolutionary history and to that
of other plant species. It is through comparative genomics that researchers
will deduce the mechanisms and pathways by which plant genes and genomes
have diverged to give the diversity of form, function, and adaptation
that now characterize the world's flora. On the practical side, it is
expected that the genomic sequence of Arabidopsis can be used to predict
gene content and gene function in crop species, most of whose genomes
are too large for genomic sequencing any time in the near future.
There are two underlying assumptions required for extrapolating genomic
information from Arabidopsis to other plant species: (1) Arabidopsis and
all other plants have inherited gene order and gene content, with modifications,
through common ancestry. (2) The individual genes, now present in modern-day
plant species, can be used to reconstruct ancestral gene order and content.
These assumptions have already been tested and largely verified for species
within plant families. For example, in the grass family (Poaceae), which
contains such familiar crops as corn, wheat, rice and millet, gene order
has been conserved in large blocks, often comprising entire chromosomes
or chromosome arms (Gale & Devos, 1998). Comparative sequencing and
cross hybridization of cDNA clones in the grasses have also demonstrated
that gene content and often gene function are also conserved (Van Deynze
et al., 1998; Chen et al., 1997). Similar results have been obtained with
studies in the nightshade family (potato, tomato, and pepper) (Tanksley
et al., 1992; Livingstone et al., 1999). However, in the mustard family
(Brassicaceae), which includes Arabidopsis, cabbage, and broccoli, genomes
seem to have evolved differently from the grasses or nightshades. While
gene content is conserved, the genomes of mustard species are often highly
rearranged relative to one another, and gene duplications are common,
at least some of which are due to polyploidy (Lagercrantz & Lydiate,
1996; Lagercrantz, 1998).
While comparisons among genomes have been quite common for species within
plant families, comparisons between plant families have been rare and
fraught with technical difficulties. Specifically, reduced gene similarity
between plant families has made comparative mapping, via common probes
and Southern hybridization, problematic if not impossible. Nonetheless,
research by Paterson et al. (1996) suggests that blocks of linked genes
are conserved across angiosperm families and even between highly divergent
monocots and eudicots. Comparative sequencing data of Arabidopsis and
rice have been used to both refute (Devos et al., 1999) and support (van
Dodeweerd et al., 1999) this hypothesis.
Tomato and Arabidopsis belong to two clades that diverged relatively
early in the radiation of eudicots. Based on fossil evidence the two families
separated more than 90 million years ago (MYA) (Gandolfo et al., 1998).
Mitochondrial DNA sequence comparisons place the divergence at 112-156
MYA (Yang et al., 1999). Because of their early divergence, a comparison
of the tomato and Arabidopsis genomes should provide a glimpse of gene
and genome evolution since the radiation of eudicots, and provide information
relevant to the large number of species (including many crop plants) that
fall within this clade.
As described above, Arabidopsis and most flowering plants are likely ancient
polyploids and, as comparisons are made across greater and greater phylogenetic
distances, the likelihood increases that polyploidy has occurred in the
lineage of one or both species being compared. If this prediction is correct,
comparisons across families of plants will not result in matches between
single homologous segments, but rather in matches among sets of homologous
genes and duplicated gene segments. If polyploidy was a factor in the
evolution of the Arabidopsis genome, the gene number for the progenitor
of Arabidopsis and tomato (and hence many other angiosperm families) could
have been considerably less than the 20,000-25,000 genes estimated for
Arabidopsis (Meinke et al., 1998; Somerville & Somerville, 1999).
Ku et al. (in press) have recently estimated that the ancestral genome
for Arabidopsis and tomato would have contained approximately one half
the number of genes seen in the present-day Arabidopsis genome.
Syntenic relationships can be detected between the Arabidopsis genome
and the genomes of other families of angiosperms based on gene homologies
(Ku et al., in press), but matches tying Arabidopsis to other plant genomes
will not likely be based on single ortholog pairs, but rather networks
of homologous genes created by multiple rounds of polyploidy followed
by gene divergence and gene loss. Establishing genome relationships among
divergent plant families on a gene for gene basis may therefore be more
complicated than originally expected. However, such analyses will eventually
allow for an understanding of the events and mechanisms that have molded
plant genome evolution and the exchange of sequence and functional information
among species over very long periods of evolutionary time.
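The distinction between single ortholog pairs and networks of homologous genes can be sketched computationally. The example below is a minimal illustration, assuming that pairwise homology calls are already available as a list of gene pairs; the gene identifiers (with "At_" and "Sl_" prefixes) and the function name are hypothetical, and grouping by connected components is only one possible way to define a network.

```python
from collections import defaultdict


def homology_networks(pairs):
    """Group genes into networks (connected components) of homologs."""
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical homology calls between two genomes (toy data).
pairs = [
    ("At_g1", "Sl_g1"),   # cross-genome match
    ("At_g1b", "Sl_g1"),  # a second copy matches the same gene
    ("At_g1", "At_g1b"),  # the two copies are duplicates of each other
    ("At_g2", "Sl_g2"),   # a simple one-to-one match
]
networks = homology_networks(pairs)
# Two-member networks are the candidates for simple ortholog pairs.
two_member = [n for n in networks if len(n) == 2]
print(networks)
print(two_member)
```

Under repeated polyploidy followed by gene loss, most networks would be expected to contain more than two members, which is why gene-for-gene comparison becomes complicated.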
Organellar genomics and origin and early evolution of land plants.
The organellar genomes, in comparison to their nuclear counterpart, have
two advantages for phylogeny reconstruction: (1) a much higher coding/non-coding
sequence ratio, thus higher information content (noncoding sequences are
subject to elimination during evolution and thus lack homologous regions
across a wide variety of organisms), and (2) less complicated genome dynamics,
e.g., they lack gene and genome duplication, DNA segment translocation,
and mobile elements (hence it is easier to identify homologous regions
for phylogenetic analysis).
We may reach a limit to the resolving power of simple nucleotide sequence
comparisons for phylogenetic reconstruction at the deeper levels in the
tree of life. Recent empirical and theoretical studies have shown that
a length much longer than that of a typical single gene (1-2 kb) is required
to reconstruct a complicated phylogeny with high resolution and confidence
(Hillis, 1996; Qiu et al., 1999; Soltis et al., 1999). Thus, the prospect
of adding significant numbers of new structural genomic characters to
deep-level phylogenetic analyses is exciting. One major strength of organellar
genomic structural changes in phylogenetics is that they occur much less
frequently than point mutations. Furthermore, exactly parallel
structural changes in two different lineages are much less likely than
parallel point mutations. This is not to say that genomic structural characters
are completely homoplasy-free; they are like any other characters (morphological,
metabolic, and nucleotide substitution) subject to parallelism, reversal,
and convergent evolution. However, theoretical considerations suggest
that genomic rearrangements have good properties for use as phylogenetic
markers at deep levels (slow rate of change, many possible character states;
Mishler, 1994), and empirical studies have shown them to be very important
characters for marking deep branches.
For example, based on completely sequenced chloroplast genomes, it is
known that the entire set of ~20 introns in land plants was acquired
during the stage of charophyte evolution, because they are not present
in either Mesostigma (an early divergent lineage either sister to the
rest of the green plants, Lemieux et al., 2000, or to the streptophytes,
Qiu & Lee, submitted), or the chlorophyte algae Nephroselmis (Turmel
et al., 1999a) and Chlorella (Wakasugi et al., 1997), but are present
in Marchantia (Ohyama et al., 1986), pine (Wakasugi et al., 1994), rice
(Hiratsuka et al., 1989), maize (Maier et al., 1995), and tobacco (Shinozaki
et al., 1986). Hence, determining the points of intron acquisition is
likely to add information to resolve relationships among charophytes and
to identify the algal sister lineage of land plants. Furthermore, while
the chloroplast genome exhibits structural conservation across green plants,
several operons were disrupted or formed during charophyte evolution.
Identifying the points of operon disruption and formation will also help
resolve some of the most contentious issues in green plant phylogenetics.
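The reasoning used above for introns (present in every land plant sampled, absent from the algae sampled) amounts to screening a presence/absence matrix for characters restricted to one group. The sketch below is a minimal illustration of that screen. The taxon names follow the text, but the character labels "intron_x" and "intron_y" are placeholders, and the pattern assigned to "intron_y" is invented purely to show a character that fails the screen.

```python
def shared_derived_characters(matrix, ingroup):
    """Return characters present in every ingroup taxon and absent elsewhere.

    matrix maps each taxon name to the set of characters it carries.
    """
    outgroup = set(matrix) - set(ingroup)
    characters = sorted({c for present in matrix.values() for c in present})
    return [
        c for c in characters
        if all(c in matrix[t] for t in ingroup)
        and not any(c in matrix[t] for t in outgroup)
    ]


land_plants = ["Marchantia", "pine", "rice", "maize", "tobacco"]

# Toy matrix: placeholder characters, not real intron data.
matrix = {
    "Mesostigma": set(),
    "Nephroselmis": set(),
    "Chlorella": set(),
    "Marchantia": {"intron_x"},
    "pine": {"intron_x", "intron_y"},
    "rice": {"intron_x", "intron_y"},
    "maize": {"intron_x", "intron_y"},
    "tobacco": {"intron_x", "intron_y"},
}

candidates = shared_derived_characters(matrix, land_plants)
print(candidates)
```

Adding charophyte taxa to such a matrix is what would narrow down the point of acquisition, since a character shared with only some charophytes would bracket the branch on which it arose.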
Comparative genomics of the mitochondria can also play a very important
role in phylogeny reconstruction. By comparison to its chloroplast counterpart,
the mitochondrial genome shows little structural conservation in land
plants after the basal split with liverworts, yet there are a number of
conserved gene clusters that were disrupted or formed between Marchantia
and Arabidopsis, the only two land plants whose mitochondrial genomes
have been sequenced (Oda et al., 1992; Unseld et al., 1997). Moreover,
several group II introns evolved from cis- to trans-splicing during pteridophyte
and angiosperm evolution (Malek & Knoop, 1998; Y. Qiu & J. D.
Palmer, unpublished data). Angiosperm mitochondrial genomes contain approximately
25 introns (Unseld et al., 1997), whereas the Marchantia mitochondrial
genome has its own set of 32 introns (Oda et al., 1992; Y. Qiu, unpublished
data). Almost all of these introns must have been acquired since the streptophytes
split from chlorophytes, because they are not found in Nephroselmis (Turmel
et al., 1999b), Pedinomonas (Turmel et al., 1999b), or Prototheca (Wolff
et al., 1994), based on complete genome sequencing. A previous study of
three angiosperm-specific introns has shown that they extend their distribution
to hornworts and mosses, but not to liverworts, thus providing evidence
to support the view of liverworts being the sister group of the rest of
the land plants (Qiu et al., 1998). A survey of more angiosperm-specific
introns across land plants is in progress (Malek & Knoop, 1998; V.
Knoop & Y. Qiu, unpublished data), and identifying points of acquisition
or transposition of these introns or other structural changes is likely
to answer some of the key questions in charophyte and early land plant
There are several ways in which green plant phylogenetics can in turn
contribute to genomics. First, evolutionary interpretation of observations
made based on model plants like Marchantia and Arabidopsis has to be based
on a robust phylogeny built using total evidence (i.e., genomic, developmental,
metabolic, morphological, and ecological characters). A recent study uncovering
26 independent transfers of the mitochondrial gene rps10 to the nucleus
in 276 species of angiosperms surveyed vividly demonstrates the power
of phylogeny in helping distinguish between two alternative hypotheses:
many recent independent transfers versus one single ancient transfer (Adams
et al., 2000). Second, to further investigate questions that arise from
studies of model organisms, a phylogenetically oriented approach offers
a predictable and efficient basis for experimental design and execution.
The investigation of mitochondrial genome evolution in eukaryotes by sampling
well chosen organisms across the eukaryote phylogeny is an excellent example
of this approach (Gray et al., 1999; Lang et al., 1999). Third, to understand
the knowledge gap between model organisms, one has to look at non-model
organisms, the selection of which should be guided by phylogeny.
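The way a phylogeny separates many independent transfers from a single ancient transfer can be illustrated with a small parsimony count. The sketch below is a minimal illustration, assuming an unordered (Fitch) parsimony criterion, a toy eight-taxon rooted binary tree, and placeholder species labels. None of these come from Adams et al. (2000), whose actual analysis is not described in this section.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes on a rooted binary tree (Fitch).

    tree is a nested 2-tuple with leaf names as strings;
    states maps each leaf name to its observed character state.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        common = a & b
        if common:
            return common
        changes += 1  # the children disagree, so one change is required
        return a | b

    visit(tree)
    return changes


# Toy tree with placeholder species labels.
tree = ((("sp1", "sp2"), ("sp3", "sp4")),
        (("sp5", "sp6"), ("sp7", "sp8")))

# State 1 = gene found in the nucleus, 0 = gene still in the mitochondrion.
scattered = {"sp1": 0, "sp2": 1, "sp3": 0, "sp4": 0,
             "sp5": 1, "sp6": 0, "sp7": 0, "sp8": 1}
clustered = {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 1,
             "sp5": 0, "sp6": 0, "sp7": 0, "sp8": 0}

n_scattered = fitch_changes(tree, scattered)
n_clustered = fitch_changes(tree, clustered)
print(n_scattered, n_clustered)
```

When the species carrying a nuclear copy are scattered across the tree, several changes are required, which favors multiple independent transfers; when they form one clade, a single change suffices. Note that Fitch parsimony weights gains and losses equally, which may not be appropriate for gene transfer.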
Evolution of desiccation tolerance.
The most fundamental problem of being a land plant is how to maintain
a well-hydrated protoplasm in an atmosphere that is nearly always drastically
lower in water potential. A major selective force in the evolution of
the larger land plants appears to be structural mechanisms to get and
keep water (e.g., roots, water conducting tissues, cuticles, and stomates).
Some land plants, however, use a different physiological strategy that
requires no special means for gathering or holding water: the desiccation-tolerant
plants dry out, cease normal metabolism, rehydrate, and resume metabolism.
Recent synthetic phylogenetic analyses support an important finding that
vegetative desiccation tolerance was primitively present in the land plants
(because of its widespread occurrence in the "bryophytes" --
the basal-most living clades of land plants), but was then lost in the
evolution of tracheophytes (Oliver et al., in press). The initial evolution
of vegetative desiccation-tolerance was a crucial step required for the
colonization of the land by primitive plants from fresh water, but that
tolerance came at a cost, since metabolic rates are low in tolerant plants
as compared to plants that do not maintain costly mechanisms for tolerance.
Thus, the loss of tolerance might have been favored along with the internalization
of water relationships that happened as the vascular plants became more
complex. However, vegetative desiccation tolerance re-evolved independently
at least once in Selaginella and in the ferns, and at least eight
times within the angiosperms. Importantly from
a human economic perspective, the same genes that had evolved for cellular
protection and repair in earlier vegetative tissues appear to have been
recruited for different but related processes in the evolution of seeds.
Using these evolutionarily and mechanistically important plants and tissues,
we can isolate genes involved in desiccation-related signaling pathways
and begin to characterize their function.
The use of microarrays will contribute significantly to this and other
areas of research. The microarray technology permits groups of coregulated
genes to be identified and, in conjunction with sequencing data, pathways
important in the establishment of desiccation tolerance to be uncovered.
From such knowledge, additional choices of marker genes can be identified
that may be useful in studying evolutionary phenomena such as the reacquisition
of desiccation tolerance in angiosperms. In addition, the grouping of
coregulated genes via the microarray technology facilitates identification
of cis-acting elements involved in transcriptional regulation of the gene
group (e.g., regulatory elements that have evolved to coordinate desiccation
responses). From such cis-acting elements, regulatory transcription factors
can be sought (e.g., as cis-element DNA-binding proteins, among other methods).
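The two steps described here, grouping coregulated genes and then looking for shared cis-acting elements, can be sketched as follows. This is a minimal illustration under stated assumptions: genes are grouped when their expression profiles have a Pearson correlation above an arbitrary threshold, and candidate elements are simply k-mers present in every promoter of a group. The gene names, expression values, promoter sequences, and the threshold are all hypothetical, and real analyses would use more robust clustering and motif statistics.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coregulated_groups(profiles, threshold=0.9):
    """Single-linkage groups of genes whose profiles correlate strongly."""
    neighbors = {g: {g} for g in profiles}
    for g, h in combinations(profiles, 2):
        if pearson(profiles[g], profiles[h]) >= threshold:
            neighbors[g].add(h)
            neighbors[h].add(g)
    groups, seen = [], set()
    for g in profiles:
        if g in seen:
            continue
        stack, group = [g], set()
        while stack:
            cur = stack.pop()
            if cur not in group:
                group.add(cur)
                stack.extend(neighbors[cur] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def shared_kmers(promoters, genes, k=6):
    """k-mers found in the promoter of every gene in the group."""
    kmer_sets = [
        {promoters[g][i:i + k] for i in range(len(promoters[g]) - k + 1)}
        for g in genes
    ]
    return sorted(set.intersection(*kmer_sets))


# Toy expression across four conditions (hydrated, drying, dry, rehydrated).
profiles = {
    "geneA": [1.0, 2.0, 4.0, 1.5],
    "geneB": [0.5, 1.1, 2.0, 0.8],
    "geneC": [3.0, 2.0, 1.0, 2.9],
}
# Toy promoter sequences containing a placeholder shared motif.
promoters = {"geneA": "TTACGTGGCA", "geneB": "GGACGTGGTT", "geneC": "CCCCTTTTAA"}

groups = coregulated_groups(profiles)
motifs = shared_kmers(promoters, groups[0])
print(groups, motifs)
```

A k-mer shared by every promoter in a coregulated group is only a candidate element; it becomes a useful probe for DNA-binding proteins once its enrichment relative to unrelated promoters has been checked.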
A comparison of the relative importance of such regulatory sequences and
regulatory proteins relative to structural genes/proteins for a trait
in phylogenetic relationships may be informative in understanding how
changes in major lineage defining traits, such as desiccation tolerance,
Evolution of reproductive biology.
Several major life history changes occurred during the evolution of land
plants from their green algal ancestors (Graham, 1985; 1993). One of the
most dramatic changes was the formation of a multicellular embryo, involving
both the shift from zygotic to sporic meiosis, and establishment of cellular
interactions between cells of the different generations (reviewed in Graham
1993). A number of important trends in reproductive biology occurred in
the adaptation of land plants to increasingly dry environments, including
the switch to the diploid-dominant life cycle and subsequent reduction
of the gametophyte.
A series of investigations into the evolution of aspects of reproductive
biology are now possible, encompassing ultrastructural studies of cell
division, assessment of homology of reproductive structures and processes
in seed plants, and a comparative exploration of the genes involved in
reproduction from bryophytes to angiosperms. The gene system responsible
for initiation of floral organs in Arabidopsis (the ABC model; e.g., Coen,
1991) is well characterized, and its extension to other groups of angiosperms,
including basal angiosperms, has been and is being investigated (e.g., Kramer
et al., 1998; Kramer and Irish, 1999; ongoing studies in labs of V. Irish
and D. and P. Soltis). Evidence of expression of related homeotic genes
in reproductive structures of conifers has also been reported. Of the
genes identified in organ initiation, the B class genes are the best characterized
(Samach et al., 1997). PISTILLATA and APETALA3 together form the B class
genes of the ABC model of flower development as described in Arabidopsis.
These two genes are critical in organ initiation of the petals and stamens
(Jack et al., 1992; Riechmann et al., 1997; Bowman et al., 1991). We need
to explore whether the role of these genes in plant reproduction extends
to non-flowering, especially non-seed, plants by searching for homologous
genes in other seed plants, pteridophytes, and bryophytes. If homologues
are found, in situ expression studies can be performed to evaluate the
potential role of these genes in plant reproduction.