Green Plant Phylogeny Research Coordination Group

Summary report of Workshop #1:
Current status of the phylogeny of the
charophyte green algae and the embryophytes

University and Jepson Herbaria, University of California, Berkeley,
June 24-28, 1995

This workshop was the first of a series planned by the Green Plant Phylogeny Research Coordination Group (GPPRCG), sponsored by a grant from the DOE/NSF/USDA Joint Program on Collaborative Research in Plant Biology (USDA grant no. 94-37105-0713). The GPPRCG was set up in September, 1994 with the aim of initiating and facilitating interaction world-wide among independent research groups interested in major patterns of relationship in green plants. This initiative is based on the insight that further progress in green plant phylogeny requires substantial, coordinated data collection from selected key taxa (exemplars). Specific objectives of the group include coordination of data gathering; establishment of phylogenetic databases for use by researchers, teachers or students; stimulation of creative new approaches to investigating green plant phylogeny and related macroevolutionary issues; and encouragement of collaboration among research groups.

Ten workshops are scheduled over a five-year period, and these will alternate between major North American scientific conferences and other independent venues. Coordination of research activity will be achieved principally through publication, via the World Wide Web, of data availability tables for exemplar taxa. These tables will provide a readily accessible and up-to-date summary of the state of current knowledge on phylogenetically important plants. They are also intended to highlight shortcomings in current knowledge and thereby provide a guide to researchers in the field.

Whereas this initiative encourages coordination of research activity, the GPPRCG will not allocate discrete tasks to individuals or labs. All researchers interested in plant phylogeny are encouraged to participate freely and equally and to contribute results to the data availability tables. Investigators will publish and justify their own data in the normal way, so everybody will get credit for their own data (only independently published data will end up in the final data matrix). Contributors will be invited to participate in an eventual publication that will attempt a complete and well-supported high-level phylogenetic analysis of green plants. This publication will take the form of a book, including multiauthored chapters along with the data matrix in electronic form (with all contributors acknowledged), which is scheduled for completion by the International Botanical Congress in St Louis, 1999. This initiative is seen as complementary to other World Wide Web based projects such as the Tree of Life and Tree Base (WWW address not yet specified).

The focus of this first workshop in Berkeley (hosted at the University and Jepson Herbaria by B.D. Mishler) was the Streptophyte clade, which comprises embryophytes (land plants) and closely related charophycean green algae. A second workshop on the Chlorophyte clade (i.e., the remainder of the green algae) was held in conjunction with the Phycological Society of America meeting at Breckenridge, CO (August 6-12, 1995). Financial considerations limited attendance at both workshops to invited participants who were chosen to represent a broad spectrum of expertise in systematics, data collection and data analysis. Specific goals of both meetings included defining a list of key exemplar taxa for further data gathering, the establishment of an outline data matrix for these taxa, and the coordination of efforts at improving and analysing these data. For streptophytes, these main goals were successfully achieved at the Berkeley workshop. Also discussed at that meeting were the objectives and strategy of the GPPRCG, accessibility and dissemination of data on green plant phylogeny, potential group products (e.g., publications), contacts with other major coordinated research efforts in the physics and biological communities, and potential additional sources of funding. Following is a summary of the main findings from the Berkeley meeting (attended by 35 participants from 7 countries; see list below).

Choice of exemplar taxa: The use of exemplar taxa in phylogenetic analysis is a necessary compromise between sampling total biological diversity and setting achievable goals in terms of data collection and analysis. A figure of 200-300 taxa was deemed sufficient for a coarse-grained analysis that would resolve broad-level relationships in streptophytes. Exemplar choice was based on current phylogenetic hypotheses with the following additional requirements: i) even sampling of diversity, ii) priority to taxa with predominantly plesiomorphic states (i.e., exclusion of highly autapomorphic taxa), iii) inclusion of phylogenetically controversial taxa, iv) inclusion of taxa used as model systems in other branches of biology (e.g., crop plants such as Oryza, Zea), v) inclusion of nomenclatural types, and vi) current availability of material and data. Four groups of systematists formulated a provisional list of exemplars (living and fossil) totalling some 320 taxa from charophycean algae (c. 20 taxa), liverworts and hornworts (c. 41 taxa), mosses (c. 60 taxa), and tracheophytes (c. 200 taxa). These 320 exemplars (to be listed on this WWW site) form a nucleus of taxa for coordinated data gathering activities in streptophytes. In the context of the general analysis of streptophyte phylogeny, the exemplar list errs on the side of generosity (i.e., it probably over-samples group diversity). The relative importance of some of the exemplars was discussed and some were identified as being of secondary importance in the overall analysis.

Outline data matrix: Renewed interest in phylogenetic relationships among green plants, stimulated by recent advances in systematic theory, data analysis and molecular biology, has led to a large, diverse and rapidly expanding phylogenetic database. The problem of coordinating data gathering was addressed through the establishment of a data availability matrix (DAM) that will be made freely available through the Internet (it is not yet online). Initially, the green plant DAM will take the form of a simple table that indicates the existence of data (published and unpublished) for all exemplar taxa (living and extinct). The scope of this database will be extended in the future to include fields for comments and pointers to other entities such as literature citations and individual specialists or labs. Data domains are based on the interests of particular specialist groups (e.g., molecular, morphological). These domains are further subdivided by speciality (e.g., rbcL (chloroplast), 18S rRNA (nuclear), spermatozoid ultrastructure, spore morphology, wood anatomy, floral morphology, etc.). The green plant DAM summarises known data for exemplar taxa and highlights shortcomings in current knowledge and areas in need of further investigation.
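The simple table described above could be sketched roughly as follows. This is a minimal illustration only, not the group's actual database: the class name, method names and status strings are assumptions, and the two entries recorded are placeholders that do not reflect real data availability.

```python
from dataclasses import dataclass, field


@dataclass
class DataAvailabilityMatrix:
    """Taxon-by-domain table recording whether data exist, not the data themselves."""
    taxa: list[str]
    domains: list[str]
    # Maps (taxon, domain) to a status string; an absent key means no known data.
    cells: dict[tuple[str, str], str] = field(default_factory=dict)

    def record(self, taxon: str, domain: str, status: str) -> None:
        """Mark a cell as 'published' or 'unpublished' (hypothetical status codes)."""
        if taxon not in self.taxa or domain not in self.domains:
            raise KeyError(f"unknown taxon or domain: {taxon!r}, {domain!r}")
        if status not in ("published", "unpublished"):
            raise ValueError(f"unknown status: {status!r}")
        self.cells[(taxon, domain)] = status

    def gaps(self) -> list[tuple[str, str]]:
        """List every (taxon, domain) cell for which no data are recorded."""
        return [(t, d) for t in self.taxa for d in self.domains
                if (t, d) not in self.cells]


dam = DataAvailabilityMatrix(
    taxa=["Coleochaete", "Spirogyra", "Oryza"],
    domains=["rbcL", "18S rRNA", "spore morphology"],
)
# Placeholder entries for illustration only; not real availability information.
dam.record("Oryza", "rbcL", "published")
dam.record("Coleochaete", "18S rRNA", "unpublished")

for taxon, domain in dam.gaps():
    print(f"no data recorded: {taxon} / {domain}")
```

Keeping only a status per cell mirrors the stated purpose of the table: it points to where data exist and where they are lacking, while the data themselves remain with the investigators who publish them.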

Additional databases: General discussion at the Berkeley workshop highlighted the need for a number of additional databases. These will also be included eventually on the World Wide Web page. Researchers interested in green plant phylogeny are invited to contribute data to the following:

1. A DNA availability matrix will list published and unpublished sequences as well as information on planned sequencing projects, including names and addresses of investigators. This source of information is intended as a guide to researchers planning new sequencing projects.

2. A primer list will document the availability of primers and list publications where they were first used. This database is intended to help researchers obtain the necessary primers for sequencing studies. The primer list is being compiled by Bill Hahn of the Smithsonian Institution and Chuck Delwiche of Indiana University.

3. A culture availability matrix will list the location of cultures of green algae. This database is intended to aid researchers in obtaining difficult taxa.

Analytical and theoretical considerations: The main goal of the GPPRCG is to develop a complete and well-supported high-level phylogenetic hypothesis for green plants based on the total available evidence. Workshop participants recognized that this goal presents a significant analytical challenge. One estimate placed the size of the 1999 St Louis data matrix for green plants at 300-400 taxa with as many as 4,000-10,000 characters per taxon. Parsimony-based tree-searching algorithms are currently unable to guarantee a most parsimonious solution with data matrices of this size. Furthermore, it is unclear how best to approach the analysis using heuristic procedures. The ability to generate data is fast outstripping the capacity for analysis.
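One way to appreciate the scale of the search problem is to count the candidate trees. The sketch below relies on a standard combinatorial result that is not stated in the report: the number of distinct unrooted, fully bifurcating trees for n labelled taxa is (2n-5)!!, the product of the odd numbers from 3 to 2n-5. The function name is an assumption chosen for illustration.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees for n labelled taxa: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


# Matrix sizes mentioned in the estimate above.
for n in (300, 400):
    digits = len(str(unrooted_tree_count(n)))
    print(f"{n} taxa: tree count has {digits} decimal digits")
```

Because the count grows faster than exponentially with the number of taxa, evaluating every tree is not an option at this scale, which is why the analysis must fall back on heuristic searches that cannot guarantee an optimal result.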

Other areas of difficulty with potential consequences for analysis include missing data and inapplicable character states. Missing data are likely to be a significant problem when dealing with fossils, primarily because molecular sequences are absent for these taxa. The large number of molecular characters in the green plant DAM will lead to large blocks of missing data for fossils. Several participants noted that the problem of inapplicable morphological characters is difficult to avoid because they are a natural consequence of evolution in green plants. One clear example of this problem relates to multicellular characters of more derived plants which do not apply in many basal lineages. How are seeds to be scored in Spirogyra, and carpels in Coleochaete? A number of ideas were considered, including treating these as missing character states, treating them as large multistate characters, or ignoring these data altogether in favour of "global characters" that are scorable in all taxa (e.g., molecular sequences, subcellular morphology, physiological and biochemical pathways). Each of these three alternatives is problematic. Treating inapplicable characters as missing data allows states to be assigned to non-existent characters, which may lead to spurious relationships. Reformulating characters to avoid inapplicability leads to the apparently absurd notion of scoring the entire multicellular morphology of an angiosperm as a multitude of states within one character. Ignoring these problematic characters was an option that was strongly rejected by most participants because this would exclude relevant data, including the entire fossil record, and would remove most of the biologically interesting information from the analysis.
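The first of these difficulties can be illustrated with a small sketch. It assumes the Fitch down-pass for unordered parsimony, in which a cell coded "?" is treated as compatible with every state; this is a common convention but not one the report itself specifies. The binary "seed trait" character and the four-taxon topology are hypothetical, chosen for illustration only, and are not a phylogenetic claim.

```python
def fitch_downpass(tree, states, alphabet):
    """Return (state_set, cost) for a subtree under Fitch parsimony.

    `tree` is a leaf name (str) or a 2-tuple of subtrees.
    A leaf coded "?" is treated as missing data: any state is allowed.
    """
    if isinstance(tree, str):
        s = states[tree]
        return (set(alphabet) if s == "?" else {s}), 0
    left, right = tree
    left_set, left_cost = fitch_downpass(left, states, alphabet)
    right_set, right_cost = fitch_downpass(right, states, alphabet)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one state change.
    return left_set | right_set, left_cost + right_cost + 1


# Toy topology and hypothetical scores, for illustration only.
tree = (("Spirogyra", "Coleochaete"), ("Oryza", "Zea"))
seed_trait = {"Spirogyra": "?", "Coleochaete": "?", "Oryza": "1", "Zea": "1"}

root_set, cost = fitch_downpass(tree, seed_trait, alphabet="01")
print(root_set, cost)
```

With the algae scored as "?", the down-pass leaves the root with the single state observed in the seed plants, so the common ancestor of all four taxa, algae included, is reconstructed as possessing a state of a seed character. This is the sense in which missing-data coding assigns states to non-existent characters.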

The group recommended that a future workshop should be devoted to addressing these and other analytical/theoretical problems and that every effort should be made to encourage research in this area.

Forthcoming meetings: Four future meetings of the GPPRCG will focus in greater detail on streptophyte subgroups. These meetings will be organised as workshops alongside symposia at major N. American conferences. Additional workshops will be held on a number of related topics including the fossil record, data analysis and research co-ordination.

All interested parties are urged to examine the GPPRCG WWW Page, contribute data, and contact one of the PIs for further information.

Green Plant Phylogeny Research Coordination Group: Buchheim, M. A. (Principal Investigator, University of Tulsa); Chapman, R. L. (Co-Principal Investigator, Louisiana State University); Mishler, B. D. (Co-Principal Investigator, University of California, Berkeley)

World Wide Web Page (University of California, Berkeley): Mishler, B. D., Speer, B. R.

Contributions to the various World Wide Web based databases should be addressed to the following co-ordinators:

Participants at Berkeley workshop: Baldwin, B., Buchheim, M. A., Chapman, R. L., Crandall-Stotler, B., Crane, P. R., De Luna, E., Delwiche, C., Doyle, J. A., Gensel, P., Goffinet, B., Graham, L., Hahn, W., Hedderson, T. A. J., Huss, V., Hyvonen, J., Kenrick, P., Lemmon, B., Maddison, D., Manhart, J., McCourt, R. M., Mishler, B. D., Newton, A., Olmstead, R., Pryer, K., Renzaglia, K., Sanderson, M., Smith, A., Soltis, D., Speer, B. R., Stotler, R., Vitt, D., Waters, D. A., Withey, A., Zander, R. H., Zimmer, E. A.

[report prepared by Paul Kenrick]