next up previous contents
Next: Nucleotide substitution models implemented Up: Nucleotide substitution models Previous: A Markov model of   Contents


Transition matrices

The mathematical expression of a DNA Markov model uses a matrix $Q$ of substitution rates in which each element $r_{ij}$ represents the rate of substitution from nucleotide $i$ to nucleotide $j$. The diagonal elements of the instantaneous rate matrix must satisfy the equation
\begin{displaymath}
r_{ii} = - \sum_{j \neq i} r_{ij}
\end{displaymath} (1)

so that each row of $Q$ sums to zero. The process must be homogeneous and stationary; if $\pi_A$, $\pi_C$, $\pi_G$ and $\pi_T$ are the four equilibrium bases frequencies then the rates must obey the following constraint:
\begin{displaymath}
\pi_i r_{ij} = \pi_j r_{ji} \quad \forall i,j
\end{displaymath} (2)

also known as the time-reversibility constraint. To enforce this constraint we define ${\alpha_{ij}}$ so that
\begin{displaymath}
Q_{ij} = r_{ij} = m_r \times \pi_j \alpha_{ij} \quad \forall i, \forall j\neq i
\end{displaymath} (3)

where $m_r$ is a constant factor described later. The time-reversibility condition is satisfied with a symmetric choice of $\alpha_{ij}$. In practice, PHASE uses one of these $\alpha_{ij}$ parameters as a reference and sets its value to $1.0$. Depending on the model, other parameters (we call them rate ratios) are fixed or inferred during an analysis.

With $Q$ we can compute the transition probability matrix over time[*] $t$.

\begin{eqnarray*}
\frac{dP(t)}{dt} & = & P(t) \times Q\\
P(t) & = & \exp(Qt)\\
& = & \exp(\{\pi_j \alpha_{ij}\} \times m_r \times t)
\end{eqnarray*}

The transition probability matrix $P(t)=\{p_{ij}(t)\}$ is used to compute the probability that nucleotide $i$ will be nucleotide $j$ after time $t$ ($i$ can be equal to $j$). The ``rate ratios'' matrix in PHASE refers to the matrix ${\alpha_{ij}}$ and the ``transition rates'' matrix refers to $Q$.

Inference methods used do not permit the separation of $mr$, a factor proportional to the average substitution rate of the model, and $t$, branch lengths of the evolutionary tree which reflect an amount of change. The longer the branch, the bigger the evolutionary distance between its two incident nodes. We have to impose a scaling on the branch length. In practice, we fix the average rate of substitutions of our model to be one per ``unit of time''. This is done by adding a constraint for the factor $m_r$.

\begin{displaymath}
m_r \times \sum_{i=1}^{nb_{states}} \sum_{j \neq i} \pi_i r_{ij} = 1.0
\end{displaymath} (4)

This last constraint does not hold when multiple substitution models are used simultaneously in the MIXED model. The average substitution rate of the first model is still fixed equal to 1.0 but the average substitution rate of other models is now a free parameter.


next up previous contents
Next: Nucleotide substitution models implemented Up: Nucleotide substitution models Previous: A Markov model of   Contents
Gowri-Shankar Vivek 2003-04-24