The hows and whys of generating random trees

PART I.  Generating random data using MacClade

1) What about random data? Why would we care about such randomness? You might use this feature of MacClade to determine whether the observed value of some statistic differs from the values obtained under a particular random model. (These random tools use a random number generator translated from C source code (Swofford, 1991).)

2) How to specify the model?  The model you choose for generating characters depends on whether you assign states randomly to taxa without reference to a tree, or evolve states on a given tree using some stochastic model of change.

3) Assigning states randomly to a group of taxa.

 a) create a matrix with 15 taxa and 300 characters (specify these characters as DNA using the "Format" window in the data editor). Name your taxa (Individs 1-15).
 b) now highlight your matrix (select all rows and columns using your mouse).
 c) under Utilities choose "Fill Random"; a dialogue box should appear.
 d) you may play with the frequencies of each nucleotide base, but remember that the frequencies must sum to one. (Ask yourself about the relative randomness when conducting this exercise.  Are we creating truly random data?)
 e) save your work to the Desktop as "The RANDOM LAB".
 f) since MacClade does not analyze data, create a default ladder tree by selecting "Tree Window" under "Display".
 g) create the following topology:

 h) save the tree file under whatever name you choose.
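
 For those who want to see what a "Fill Random" step amounts to, here is a minimal Python sketch. It is not MacClade's own routine: the function name fill_random, the equal base frequencies, and the seed are assumptions made for illustration. Each cell of the matrix is drawn independently from the base frequencies you supply, with no reference to a tree.

```python
import random

def fill_random(n_taxa, n_chars, freqs, seed=None):
    """Return n_taxa sequences of n_chars bases, each base drawn
    independently according to freqs (a dict mapping base -> frequency)."""
    # The frequencies must sum to one, as in step d.
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to one")
    rng = random.Random(seed)  # pseudo-random: the same seed gives the same matrix
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return ["".join(rng.choices(bases, weights=weights, k=n_chars))
            for _ in range(n_taxa)]

# 15 taxa by 300 characters, as in step a; equal frequencies are an assumption.
matrix = fill_random(15, 300, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}, seed=1)
for number, sequence in enumerate(matrix, start=1):
    print(f"Individ {number:2d}  {sequence[:40]}...")
```

 Note that a seeded generator reproduces the same "random" matrix every time, which bears on the question asked in step d.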

 Now let's enter the real (?) world...

1) Assume now that the data you have generated randomly are actual cpDNA sequences from a number of plant individuals occupying different, but neighboring habitats in a bottomland hardwood forest (with a bayou giving way to a forest fore-ridge, leading to a forest back-ridge and finally into fresh-water marsh).  You are interested in testing whether or not there is some restriction to gene flow among these populations.

2) The number of migration events required by a particular tree may be a good statistic to measure gene flow (as suggested by Slatkin and Maddison, 1989; Maddison and Slatkin, 1991).  We will use this measure today with our own data samples.

3) Create a MacClade data file listing all of the specimens and treat locality as a four-state unordered character.  Assign the following habitats to your taxa: taxa 1, 2, 4, and 5 inhabit the bayou; 6 and 8 inhabit the fore-ridge; 7, 9, 10, and 15 inhabit the back-ridge; and 3 and 11-14 inhabit the marsh.

4) Trace this character to determine the minimum number of migrations among your localities.

5) Next create a chart for 1,000 random joining trees (that is, a chart showing how many of the random trees require each number of steps for the character traced).  This chart should look something like this:

N.B.: There are a couple of ways in which you can generate the above chart. One is to go to the Chart option under the Chart menu and ask MacClade to generate and chart 1,000 random joining trees; the other is to go to the Tree window and select "Random Trees." Try both, but in any event create 1,000 random joining trees. (See the attached page from MacClade for definitions.)

6) Slatkin and Maddison have proposed that if the members of our four localities were part of a single panmictic population, the minimum number of migrations among localities should follow the same probability distribution as the number of steps on random joining trees.

How unlikely are your migratory events?  Do you have samples from a panmictic population, or have some barriers to gene flow been established?

7) Now just for fun, and if there's time, open your randomly generated cpDNA file in PAUP and do a simple heuristic search (REMEMBER TO CHECK THE DATE BEFORE OPENING PAUP! SET YOUR DATES BACK TO FEB 1 OR EARLIER. THE LATEST VERSION HAS NOT YET ARRIVED.).

8) Now compute a consensus and save it to a tree file (or select one of your most parsimonious trees, assuming that you have multiple trees).

9) Open that tree file in MacClade and carry through steps 1-6 once again.  What has changed?
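
The logic of steps 3-6 can also be sketched outside MacClade. The Python below is an illustration under stated assumptions, not MacClade's implementation: it assumes a "random joining tree" is built by repeatedly joining two randomly chosen lineages until one remains, it counts steps for the unordered locality character with Fitch parsimony, and it uses a ladder tree as a stand-in for your own tree (the topology from step g is not reproduced here). All function and variable names are hypothetical.

```python
import random

# Locality of each individual, from step 3.
HABITAT = {}
for members, locality in [((1, 2, 4, 5), "bayou"), ((6, 8), "fore-ridge"),
                          ((7, 9, 10, 15), "back-ridge"),
                          ((3, 11, 12, 13, 14), "marsh")]:
    for taxon in members:
        HABITAT[taxon] = locality

def fitch_steps(tree, states):
    """Return (state set, steps) for a tree written as nested 2-tuples.
    Fitch parsimony: an empty intersection at a node costs one step."""
    if not isinstance(tree, tuple):           # a leaf is a taxon number
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def random_joining_tree(taxa, rng):
    """Join two randomly chosen lineages until a single tree remains
    (an assumed reading of 'random joining trees')."""
    lineages = list(taxa)
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        joined = (lineages[i], lineages[j])
        lineages = [x for k, x in enumerate(lineages) if k not in (i, j)]
        lineages.append(joined)
    return lineages[0]

rng = random.Random(1)                        # fixed seed so the run repeats
null_steps = [fitch_steps(random_joining_tree(sorted(HABITAT), rng), HABITAT)[1]
              for _ in range(1000)]

# Stand-in "observed" tree: a ladder (((1,2),3),4)... Replace with your own.
ladder = 1
for taxon in range(2, 16):
    ladder = (ladder, taxon)
observed = fitch_steps(ladder, HABITAT)[1]

# Fraction of random trees needing as few or fewer migrations than observed.
p_value = sum(s <= observed for s in null_steps) / len(null_steps)
print("observed steps:", observed)
print("fraction of random trees with <= observed steps:", p_value)
```

A small fraction would mean that your tree requires fewer migrations than most random joining trees do, which is the pattern expected when gene flow among localities is restricted.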

Now, here's that recurring question: What are the assumptions associated with random tree (or data) generation?

Slatkin and Maddison. 1989. Genetics 123:603-613
Maddison and Slatkin. 1991. Evolution 45:1184-1197