The sooner EndNote dies, the better
The reference manager EndNote is the single crappiest piece of software I have ever used. The sooner it dies a nasty, horrible, painful death, the better.
R-phylo.org launched
R (yes, just the letter) is a seriously powerful statistical programming language. A recently announced project, R-Phylo, aims to help people use R for phylogenetic work:
All organisms are linked together by the tree of life. We can use this tree along with trait data, to understand many aspects of biology: does specialization lead to increased speciation? do body size and brain size coevolve? how have genome sizes changed over time? and more. R has many functions to address such questions. This website has tutorials on how to do these analyses in R and an overview of what is available in R. There is also a mailing list for asking questions about using and developing comparative methods in R
There has already been some interest in R for phylogenetics (e.g. APE and a book), so it's great to see that continuing. A short list of tutorials and how-to guides is already up on the site, and they look good so far. (via).
A General Comparison of Relaxed Molecular Clock Models
The December issue of Molecular Biology and Evolution is out, and my pick is the new paper by Thomas Lepage, Dave Bryant, Hervé Philippe and Nicolas Lartillot, which reviews and compares recent work on relaxed clock models for phylogenetic dating. Relaxed clock models are rapidly becoming the weapon of choice for dating divergences in phylogenetics. So far I've only seen them implemented in BEAST, but this paper looks like a good update on progress in the area.
Abstract:
Several models have been proposed to relax the molecular clock in order to estimate divergence times. However, it is unclear which model has the best fit to real data and should therefore be used to perform molecular dating. In particular, we do not know whether rate autocorrelation should be considered or which prior on divergence times should be used. In this work, we propose a general benchmark of alternative relaxed clock models.
We have reimplemented most of the already existing models, including the popular lognormal model, as well as various prior choices for divergence times (birth–death, Dirichlet, uniform), in a common Bayesian statistical framework. We also propose a new autocorrelated model, called the "CIR" process, with well-defined stationary properties. We assess the relative fitness of these models and priors, when applied to 3 different protein data sets from eukaryotes, vertebrates, and mammals, by computing Bayes factors using a numerical method called thermodynamic integration.
We find that the 2 autocorrelated models, CIR and lognormal, have a similar fit and clearly outperform uncorrelated models on all 3 data sets. In contrast, the optimal choice for the divergence time prior is more dependent on the data investigated. Altogether, our results provide useful guidelines for model choice in the field of molecular dating while opening the way to more extensive model comparisons.
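The abstract mentions computing Bayes factors by thermodynamic integration. The idea works like this. The log marginal likelihood of a model equals the integral, over β from 0 to 1, of the expected log-likelihood under a "power posterior" proportional to L(θ)^β times the prior. A log Bayes factor is then just the difference between two such integrals.

Below is a minimal Python sketch on a hypothetical toy model (normal data with a normal prior). In this model the power posterior has a closed form, so the estimate can be checked against the exact answer. This is not the authors' implementation. In a real phylogenetic analysis, each expectation would be estimated from an MCMC run at that value of β.

```python
import math
import random

def expected_log_likelihood(beta, xs):
    """E[log L(theta)] under the power posterior p_beta(theta) ∝ L(theta)^beta * prior(theta).

    Hypothetical toy model: x_i ~ N(theta, 1), prior theta ~ N(0, 1).
    The power posterior is N(m, v) with v = 1 / (1 + beta * n) and m = beta * sum(x) * v,
    so this expectation is exact here. In practice it would be an MCMC average.
    """
    n = len(xs)
    v = 1.0 / (1.0 + beta * n)
    m = beta * sum(xs) * v
    # E[sum (x_i - theta)^2] = sum (x_i - m)^2 + n * v
    sq = sum((x - m) ** 2 for x in xs) + n * v
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sq

def thermodynamic_integration(xs, k=100, power=4):
    """Estimate log Z = integral_0^1 E_beta[log L] d beta with the trapezoid rule."""
    # Temperatures beta_j = (j / k)^power cluster near 0, where the integrand changes fastest.
    betas = [(j / k) ** power for j in range(k + 1)]
    values = [expected_log_likelihood(b, xs) for b in betas]
    total = 0.0
    for j in range(k):
        total += 0.5 * (values[j] + values[j + 1]) * (betas[j + 1] - betas[j])
    return total

def exact_log_marginal(xs):
    """Exact log Z for the toy model: marginally x ~ N(0, I + 11^T)."""
    n = len(xs)
    s = sum(xs)
    # det(I + 11^T) = 1 + n and (I + 11^T)^-1 = I - 11^T / (1 + n)
    quad = sum(x * x for x in xs) - s * s / (1 + n)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n) - 0.5 * quad

random.seed(1)
data = [random.gauss(0.7, 1.0) for _ in range(20)]
print("TI estimate:", thermodynamic_integration(data))
print("exact:      ", exact_log_marginal(data))
```

The (j/k)^4 temperature schedule is just one reasonable choice. Spacing the β values unevenly matters because most of the action happens close to β = 0.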
The Drummond Lab Blog: Computational Biology and Evolution
I'd like to be the first to welcome Alexei Drummond's research group to the blogging world. Their new blog, Computational Biology and Evolution, promises to be "a heady mix of computational science, evolutionary biology and other things that matter".
Alexei's interests are:
- Statistical models and algorithms for understanding biomolecular sequence evolution, structure and function
- Genomic sequence analysis
- Coalescent-based population genetics
- Virus evolution
- Evolutionary theory, complexity theory and their intersection
- Bioinformatics software
He has also been behind some of the shiniest new bioinformatics software out there (e.g. BEAST and Geneious) and has a good line-up of students working on interesting problems, so this is definitely a blog to watch.
Phylogenetics round-up: Taxon-adding, R, Triangles and branch-lengths
New issues of Systematic Biology, Molecular Biology and Evolution and Bioinformatics mean that it's about time I did a phylogenetics round-up.
- The first study, "Experimental Design Criteria in Phylogenetics: Where to Add Taxa" by Geuten et al., investigates where best to add taxa to improve phylogenetic accuracy. This is fast becoming a non-trivial practical question, as the amount of sequence data available far outstrips the computing iron we have to throw at it. Using a Fisher information approach, the authors conclude that:
(our) criteria show a general preference for augmenting the tree at deep internal nodes connected to long branches, increasing information about the more uncertain regions of the tree. For more extreme phylogenetic trees, with combinations of branch lengths that make them difficult to reconstruct accurately (...), the information criteria do not necessarily choose nodes as their optimal location in the tree for targeted taxon addition. In these cases, targeting a long branch for subdivision can be the most optimal strategy.
- This is followed by Barbara Holland's review of Emmanuel Paradis' new book Analysis of Phylogenetics and Evolution with R. Paradis has written the APE (Analysis of Phylogenetics and Evolution) package for the R statistical environment, and the book appears to be an expanded tutorial covering both R and APE. Barbara argues that the book, APE and R offer a really good opportunity to step away from the black boxes of PAUP* and PHYLIP and get real hands-on experience doing phylogenetics. She also sees them as a good framework for implementing new algorithms. Sounds fun.
- Just to cherry-pick a few more interesting articles from Syst. Biol. this month: Brown and Lemmon explore "The Importance of Data Partitioning and the Utility of Bayes Factors in Bayesian Phylogenetics". Nicolas Galtier's "A Model of Horizontal Gene Transfer and the Bacterial Phylogeny Problem" shows that supertree methods are actually quite robust to horizontal gene transfer. I'm adding both of these to my "to cite" pile.
- OK, on to MBE, where some of the Allan Wilson Centre folk (including the aforementioned Barbara Holland) introduce Treeness Triangles, a method for visualizing the loss of phylogenetic signal. The software is here.
- ...and another paper touching on an important issue I've been running into recently: how accurately Bayesian phylogenetic methods estimate posterior probabilities and branch lengths. Bryan Kolaczkowski and Joseph W. Thornton investigate the "Effects of Branch Length Uncertainty on Bayesian Posterior Probabilities for Phylogenetic Hypotheses" and find that:
The pattern of branch lengths on the true tree determines whether integrating over uncertainty pushes posterior probabilities upward or downward. The magnitude of the effect depends on the specific prior distributions used and the length of the sequences analyzed. Under realistic conditions, however, even extraordinarily long sequences are not enough to prevent frequent inference of incorrect clades with strong support.
- Finally, AWTY (Are We There Yet?) is a piece of software that several people have recommended to me recently as a way of assessing convergence in Bayesian MCMC runs:
AWTY is a system for graphical exploration of Markov chain Monte Carlo (MCMC) convergence in Bayesian phylogenetic inference. The graphics produced by AWTY are designed to help assess whether an MCMC analysis has run long enough, such that tree topologies are being sampled in proportion to their true posterior probability distribution. In other words, "Are We There Yet?" or AWTY for short. Admittedly, the results generated by AWTY will never be able to answer this question with a definitive yes; however, in some cases results will point confidently to the answer no.
The full write-up is available here: AWTY (Are We There Yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics, and the online web-app is available at AWTY online.
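AWTY's plots are built from how often each split (bipartition) appears among the sampled trees. Below is a rough Python sketch of the quantities behind two of them, as I understand them:

- the running frequency of a split as the chain gets longer (the "cumulative" idea);
- the paired split frequencies from two independent runs (the "compare" idea).

The sketch assumes trees have already been reduced to sets of canonical splits. Each split is stored as the frozenset of taxa on the side containing a fixed reference taxon, here "A". That representation and the function names are hypothetical, not AWTY's own code.

```python
from collections import Counter

def split_frequencies(trees):
    """Fraction of sampled trees that contain each split.
    Each tree is a set of canonical splits (frozensets of taxon names)."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: c / len(trees) for split, c in counts.items()}

def cumulative_frequencies(trees, split, step=1):
    """Running estimate of one split's posterior probability as more samples accumulate."""
    out, hits = [], 0
    for i, tree in enumerate(trees, start=1):
        hits += split in tree  # True counts as 1, False as 0
        if i % step == 0:
            out.append(hits / i)
    return out

def compare_runs(run_a, run_b):
    """Pair each split's frequency in two independent runs.
    Points far from the diagonal of an (a, b) scatter suggest non-convergence."""
    fa, fb = split_frequencies(run_a), split_frequencies(run_b)
    return {s: (fa.get(s, 0.0), fb.get(s, 0.0)) for s in set(fa) | set(fb)}

# Usage with four taxa (A, B, C, D): each unrooted tree has one non-trivial split.
AB = frozenset({"A", "B"})  # the split AB|CD
AC = frozenset({"A", "C"})  # the split AC|BD
run_a = [{AB}, {AB}, {AC}, {AB}]
run_b = [{AB}, {AC}, {AB}, {AB}]
print(split_frequencies(run_a))
print(cumulative_frequencies(run_a, AB))
print(compare_runs(run_a, run_b))
```

Two warning signs are a split whose cumulative frequency is still drifting and a split whose frequency differs a lot between runs. Either one suggests the answer to "are we there yet?" is no.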
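Going back to the Geuten et al. taxon-addition paper at the top of this round-up, a toy calculation helps show why long branches are the uncertain parts of a tree. This is my own illustration, not the authors' criteria, which work over the whole tree.

Take two sequences under the Jukes-Cantor model, with t measured in expected substitutions per site. The number of mismatched sites is sufficient for t, and each site mismatches with probability p(t) = 3/4 (1 - e^(-4t/3)). The per-site Fisher information about t is therefore p'(t)^2 / (p(1 - p)). As the sketch shows, this falls steadily as t grows:

```python
import math

def jc69_mismatch_prob(t):
    """Probability that two sequences differ at a site under Jukes-Cantor (JC69),
    with t in expected substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def jc69_fisher_information(t):
    """Per-site Fisher information about t for one pairwise comparison.
    Each site is a Bernoulli match/mismatch observation with probability p(t), so
    I(t) = p'(t)^2 / (p(t) * (1 - p(t))), where p'(t) = exp(-4t/3)."""
    p = jc69_mismatch_prob(t)
    dp = math.exp(-4.0 * t / 3.0)
    return dp * dp / (p * (1.0 - p))

for t in (0.05, 0.2, 0.5, 1.0, 2.0):
    info = jc69_fisher_information(t)
    # Rough standard error of the estimated t from 1000 sites: 1 / sqrt(n * I(t))
    se = 1.0 / math.sqrt(1000 * info)
    print(f"t={t:4.2f}  I(t)={info:8.3f}  approx SE with 1000 sites={se:.4f}")
```

So a long branch carries far less information per site than a short one. That fits the intuition that subdividing it with an added taxon can pay off.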
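As for the Treeness Triangles paper, the triangle itself combines several measures of signal, and I won't attempt to reproduce it here. A simpler quantity in the same spirit is the treeness ratio. It is the sum of internal branch lengths divided by total tree length, so values near 0 indicate a star-like tree dominated by terminal branches. Below is a minimal Python sketch, assuming a hypothetical dictionary-based tree representation:

```python
def treeness(children, lengths):
    """Treeness = (sum of internal branch lengths) / (total tree length).

    children: maps each internal node to its list of child nodes (leaves may be absent).
    lengths:  maps each non-root node to the length of the branch leading to it.
    Both structures are hypothetical inputs chosen for this sketch.
    """
    total = sum(lengths.values())
    # A branch is internal if the node it leads to has children of its own.
    internal = sum(length for node, length in lengths.items() if children.get(node))
    return internal / total

# Usage: ((A:0.1, B:0.2)X:0.3, (C:0.1, D:0.1)Y:0.3)root
children = {"root": ["X", "Y"], "X": ["A", "B"], "Y": ["C", "D"]}
lengths = {"A": 0.1, "B": 0.2, "C": 0.1, "D": 0.1, "X": 0.3, "Y": 0.3}
print(f"treeness = {treeness(children, lengths):.3f}")
```

For an unrooted tree, the two branches at the root would be merged into a single internal branch of length 0.6, which leaves the ratio unchanged.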