Green Plant Phylogeny Research Coordination Group

Summary report of Workshop #3:
Theory and practice in
analysis of large data sets

Louisiana State University, Baton Rouge,
February 15-16, 1996

A fuller account of the workshop is given in the Minutes of this meeting.


[revised 7 January 1997 -- Summary of third meeting of the Green Plant Phylogeny Research Coordination Group, February 15-16, 1996, hosted at Louisiana State University in Baton Rouge by Russ Chapman.]


This workshop focused on the theory and practice of analysis of large data sets. While we didn't solve any of the vexing problems outright, we certainly managed to lay out the problems in clear detail and to discuss some promising approaches. We hope that individuals and small groups will continue to address these issues in anticipation of later stages of the group effort, when the large-scale green plant data set is ready for analysis.

We examined both major phases of phylogenetic analysis: the initial phase of defining the scope of the analysis, the OTUs, and the characters (basically setting up the data matrix), and the second phase of inferring a branching diagram (phylogenetic tree) from the data matrix. In discussing the first phase, we talked about such issues as defining OTUs (should they be exemplars or composites? If the former, how should they be sampled? If the latter, how should their states be assigned?), choosing appropriate characters for a given level (considering important criteria such as a character's rate of change, number of detectable character states, ease of homologizing across studies, and amount of missing data and polymorphism), deciding when to combine characters in one analysis, and deciding whether it is feasible or desirable to break a large-scale global analysis into parts (compartments) for improved local analyses. In discussing the second phase, we talked about such issues as justifying the underlying assumptions of parsimony versus maximum likelihood, search strategies for finding optimal and near-optimal trees, possible shortcuts to finding the best-supported major groups in a large-scale analysis, and the application of parallel processing, among others.

We also discussed supporting database needs, including GenBank, Tree of Life, and TreeBASE, among others.


Participants at Baton Rouge workshop: Brent Mishler--Co-PI (BDM), Russell Chapman--Co-PI (RLC), Mark Buchheim--PI (MAB), Debra Waters (DW), David Swofford (DS), John Huelsenbeck (JPH), Gary Olsen (GO), Chuck Delwiche (CD), Rick McCourt (RM), Tandy Warnow (TW), Michael Donoghue (MD), Michael Sanderson (MS), Victor Albert (VA), Junhyong Kim (JK), Paul Lewis (POL), Richard Olmstead (DO), Kenneth Rice (KR), Pamela Soltis (PS), Chris Henze (CH), Ken Karol (KK), Paul Kores (PK), Detlef Leipe (DL), Lena Struve (LS), Elizabeth Sweedyk (ES), Sean Turner (ST), Xuhua Xia (XX), Juan M. López-Bautista


[report prepared by Brent Mishler]