Book Review by Hamid Pezeshk

 

(Newsletter of the International Society for Clinical Biostatistics – ISCB No. 44, December 2007)

 

 

Statistical Methods in Molecular Evolution

 

Rasmus Nielsen, Editor

 

Springer (2005), ISBN 0-387-22333-9

 

 

 

 

This book is the product of considerable scholarly effort. It is a reference work on statistical methods in molecular evolution and presents a comprehensive review of many important statistical problems arising in the field. Each chapter is written by researchers active in molecular evolution, and the contributors cover a wide range of expertise. Statistical procedures based on stochastic modeling and computational optimization are discussed, assuming no mathematical skills beyond basic calculus, although some knowledge of probability theory is needed. The book is therefore well suited both to statisticians interested in genomics and bioinformatics and to molecular evolutionary biologists seeking deeper insight into the statistical analyses applied in the field.

 

 

 

The book has 17 chapters in four parts. Part I contains four introductory chapters. The four chapters of Part II cover practical approaches to data analysis. The four chapters of Part III, an important part of the book, deal with models of molecular evolution, and the five chapters of Part IV with inference in molecular evolution. Each chapter ends with a list of references for readers interested in further reading.

 

 

Chapter 1 looks at Markov models in molecular evolution. The necessary mathematical background for DNA sequence evolution, including continuous-time transition rates, stationary distributions, trees and likelihood, is described. Frequently used Markov models of sequence evolution, such as the Jukes-Cantor model, amino acid models and nonhomogeneous models, are briefly reviewed. Markov models for phylogenetic analysis are discussed, with an emphasis on simulating data for comparing tree-building methods and on hypothesis testing. There are 46 references at the end of chapter 1.

 

 

 

Applications of the likelihood function (LF) in molecular evolution are presented at an introductory level in chapter 2. Maximum likelihood estimates (MLEs) and the generalized likelihood ratio test (GLRT), together with their properties, are briefly explained. The focus is on reconstructing the phylogenetic history of evolution. Some applications of the LF, such as testing Hardy-Weinberg equilibrium, maximizing the likelihood for a given tree and computing the LF in phylogenetics, are reviewed. There are 23 references at the end of chapter 2.

 

 

 

Chapter 3 gives a good account of Markov chain Monte Carlo (MCMC) methods in molecular evolution. The Metropolis-Hastings algorithm and the Gibbs sampler are mentioned. Convergence of the Markov chain to its limiting distribution, the burn-in period, trace plots and the idea of running several simultaneous chains on a given state space are briefly reviewed. There are 27 references at the end of the chapter.

 

 

As stated by the author of chapter 4, the aim of the chapter is to provide an introduction to the parts of population genetics theory that are relevant to current research in molecular evolution. Some of the major predictions of neutral and nearly neutral models are discussed. The classical Wright-Fisher model of population genetics and its relationship to the neutral theory of molecular evolution are mentioned. Ancestral polymorphism and neutral molecular evolution, including average pairwise distance and lineage sorting, are reviewed. Natural selection is discussed, and it is demonstrated how comparing the substitution rate of a putatively selected class of mutations with that of a neutrally evolving class can be used to infer the signature of natural selection from sequence data. The effects of linkage and selection on rates of molecular evolution are also discussed. There is an extensive list of 126 references at the end of the chapter.

 

 

 

Chapter 5 is the first chapter of Part II of the book. It discusses maximum likelihood (ML) methods for detecting adaptive protein evolution. Phylogenetic Analysis by Maximum Likelihood (PAML), a package of programs for analyzing DNA or protein sequences by ML methods, is mentioned. ML estimation of selective pressure for pairs of sequences is discussed, focusing on a Markov model of codon evolution and on the MLE of ω (the ratio of nonsynonymous to synonymous substitution rates). Phylogenetic estimation of selective pressure is then considered, including likelihood calculation for multiple sequences on a phylogeny and modeling of selective pressure that varies among lineages and among sites. Testing statistical hypotheses about the nature of selective pressure by means of the likelihood ratio test (LRT) is also reviewed. There are 43 references at the end of the chapter for further reading.

 

 

 

The authors of chapter 6 provide a detailed overview of the basic features and use of the HyPhy system, a high-level programming language designed for implementing statistical methods used in molecular evolution. The focus of the chapter is on describing various features of HyPhy. However, as the authors note, some features of the package are not covered, such as the model editor for describing new stochastic models to be used in analyses and the graphical user interface that provides a mechanism for defining arbitrary constraints among parameters when constructing LRTs. The chapter ends with 14 references.

 

 

 

Chapter 7, which has four sections and three appendices, concentrates on Bayesian estimation of evolutionary parameters. The program MrBayes is used to indicate how one might investigate important questions in a Bayesian framework. Three uses of Bayesian methods in molecular evolution are discussed, namely phylogeny estimation, analysis of complex data and estimation of divergence times. The efficiency of Bayesian MCMC methodology in handling complex models is also mentioned. There are 76 references at the end of this chapter.

 

 

 

Chapter 8 is on the estimation of divergence times from molecular sequence data. The authors present a good review of divergence-time estimation methods, including Bayesian ones that they developed themselves. Branch lengths as products of rates and times, the classical molecular clock, uncertainty in the estimated divergence times, multigene analyses and stochasticity in rates of evolution over time are discussed. There are 53 references for further reading.

 

 

Part III begins with chapter 9, which is on Markov models of protein sequence evolution. The focus is on models that can be used for inferring the evolutionary history of related proteins (the phylogenetic tree) and for determining the physicochemical factors that have been important to the function and evolution of a protein family. The models treat evolution as a Markov chain with transitions between amino acid states. Estimation of model parameters based on MLEs and on counts, modeling of heterogeneity across sites and over time, and modeling of correlated evolution between sites are discussed. The chapter ends with a list of 79 references.

 

 

Small repeated patterns of DNA, known as microsatellites, are used in different areas of genetic studies. Models of microsatellite evolution and statistical inference based on these models are discussed in chapter 10. The Akaike information criterion (AIC), a statistical tool for computing a score for a model, is briefly mentioned. There are 58 references for this rather short chapter.

 

 

 

Chapter 11 is on genome rearrangement. It discusses the inversion distance between chromosomes (the smallest number of inversions needed to transform one chromosome into another), reciprocal translocations between chromosomes, and chromosomal fissions and fusions. Models for analyzing whole-genome evolution by means of these rearrangements are described, and some examples are presented. The chapter ends with a list of 22 references.

 

 

 

Phylogenetic hidden Markov models (phylo-HMMs), together with some examples, are presented in chapter 12. The authors discuss how hidden Markov models (HMMs), phylogenetic models and phylo-HMMs can all be considered special cases of general "graphical models". A formal definition of the phylo-HMM, higher-order Markov models for emissions and a brief introduction to graphical models are given. There are 53 references at the end of the chapter.

 

 

Further methods of inference in molecular evolution are presented in Part IV, which begins with chapter 13 on the evolutionary causes and consequences of base composition variation. This chapter relates Markov models of molecular evolution to population genetics in the context of variation in nucleotide composition among species. Empirical patterns of base composition variation, both within and among genomes, are presented. The impact of ignoring base composition variation when estimating evolutionary divergence between DNA sequences is considered. There are 57 references for further reading at the end of the chapter.

 

 

 

Chapter 14 is on statistical alignment, and its title refers to recent progress, new applications and challenges. Pairwise and multiple statistical alignment, together with related algorithms and their relationships to evolutionary models and trees, are discussed. Multiple hidden Markov models (HMMs) and the multiple forward-backward and multiple Viterbi algorithms are briefly reviewed. A continuous-time evolutionary model for sequence insertions, deletions and substitutions proposed by Thorne, Kishino and Felsenstein (1991) is also reviewed. There are 47 references at the end of the chapter.

 

 

 

Chapter 15 discusses the estimation of substitution matrices for use in alignment problems. Markov substitution processes, PAM (Point Accepted Mutation) matrices, BLOSUM (BLOcks SUbstitution Matrix) matrices and DNA substitution matrices are explained. Comparisons of methods for estimating substitution matrices are presented. There are 38 references at the end of chapter 15.

 

 

 

Chapter 16 is on posterior mapping and posterior predictive distributions. This chapter discusses issues related to the Bayesian approach to statistical inference. Recent methods for estimating the history of mutations from DNA sequence data are reviewed. Posterior mapping is mentioned as a powerful method for addressing questions that require detailed data. To address questions of hypothesis testing, the author presents a good amount of detail on predictive distributions. There are 51 references at the end of the chapter for further reading.

 

 

The last chapter of the book, chapter 17, discusses methods for assessing uncertainty in phylogenetic inference. Classical (frequentist) methods are reviewed together with Bayesian ones. Model selection, that is, estimating the tree topology and the substitution process of molecular sequences, is discussed. Several measures of confidence used in phylogenetics, including Bayesian posterior probabilities, bootstrap probabilities and the approximately unbiased test, are mentioned. Testing methods, including the parametric bootstrap, the Kishino-Hasegawa test and multiple-comparisons tests, are reviewed. There is a list of 57 references at the end of the chapter.

 

 

 

In conclusion, this book is a valuable source of comprehensive reviews of current developments in statistical methods in molecular evolution. It is very readable and provides a wealth of references and pointers for readers who wish to go further. This very well presented book achieves its aim of being a reference book written at an introductory level. I recommend it for library purchase, for any statistician intending to work in the area of molecular evolution and for any biologist who seeks to learn more about the statistical methods useful in the field.

 

 

 

 

Hamid Pezeshk (D.Phil)

 

Associate Professor of Statistics

 

Center of Excellence in Biomathematics and

 

School of Mathematics, Statistics, and Computer Science

 

University College of Science

 

University of Tehran

 

and

 

Bioinformatics Group

 

School of Computer Science

 

Institute for Studies in Theoretical Physics and Mathematics (IPM)

 
