Variable-order Markov model

Example

Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet $\{a, b, c\}$. Specifically, consider the string constructed from infinite concatenations of the sub-string $aaabc$: $aaabcaaabcaaabcaaabc\dots aaabc$.

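The VOM model of maximal order 2 can approximate this string using only five conditional probability components: $Pr(a | aa) = 0.5$, $Pr(b | aa) = 0.5$, $Pr(c | b) = 1.0$, $Pr(a | c) = 1.0$ and $Pr(a | ca) = 1.0$. In this example, $Pr(c | ab) = Pr(c | b) = 1.0$; therefore, the shorter context $b$ is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, all of which are equal to 1.0.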
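
To check the example numerically, the following Python sketch estimates the conditional probabilities from a finite prefix of the string (the prefix length of 1,000 symbols is an arbitrary assumption) and then regenerates the string from five order-3 components. Those five components (contexts $caa$, $aaa$, $b$, $c$ and $ca$) are one set consistent with the text, derived here rather than listed in the source; `conditional_probs` and `generate` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def conditional_probs(seq, order):
    """Empirical P(next | context) for every context of exactly `order` symbols."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return {
        ctx: {sym: n / sum(c.values()) for sym, n in c.items()}
        for ctx, c in counts.items()
    }


# A finite prefix of the infinite string aaabcaaabc... (length chosen arbitrarily).
seq = "aaabc" * 200

p1 = conditional_probs(seq, 1)
p2 = conditional_probs(seq, 2)
print(p2["aa"])            # {'a': 0.5, 'b': 0.5}
print(p2["ab"], p1["b"])   # {'c': 1.0} {'c': 1.0}

# Five order-3 VOM components (derived, not from the source): context -> next symbol,
# each with probability 1.0.
vom3 = {"caa": "a", "aaa": "b", "b": "c", "c": "a", "ca": "a"}


def generate(model, history, length, max_order=3):
    """Extend `history` using the longest suffix that is a context of the model."""
    while len(history) < length:
        for k in range(min(max_order, len(history)), 0, -1):
            ctx = history[-k:]
            if ctx in model:
                history += model[ctx]
                break
        else:
            raise ValueError("no context of the model matches the history")
    return history


print(generate(vom3, "aaabc", 25) == "aaabc" * 5)  # True
```

Preferring the longest matching suffix matters here: after $caa$ the next symbol is $a$, but after $aaa$ it is $b$, so the order-3 contexts resolve exactly the ambiguity that leaves $Pr(a | aa) = Pr(b | aa) = 0.5$ in the order-2 model.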

In practical settings there is seldom sufficient data to accurately estimate the conditional probability components of a high-order Markov chain, because their number grows exponentially with the order of the chain.
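
To make the growth concrete: a fixed-order chain of order $D$ over an alphabet $A$ has one component $Pr(x \mid s)$ for each of the $|A|^{D}$ contexts $s$ and each of the $|A|$ next symbols $x$, i.e. $|A|^{D+1}$ components when every entry is counted (normalization leaves $|A|^{D}(|A|-1)$ free parameters). A minimal Python sketch for the ternary alphabet of the example, with `fixed_order_components` as a hypothetical helper name:

```python
def fixed_order_components(alphabet_size: int, order: int) -> int:
    """Number of entries Pr(x | s) in a fixed-order Markov chain.

    There are alphabet_size**order contexts s of length `order`, each with
    one probability per next symbol x.
    """
    return alphabet_size ** (order + 1)


for d in (1, 2, 3):
    print(f"order {d}: {fixed_order_components(3, d)} components")
# order 1: 9 components
# order 2: 27 components
# order 3: 81 components
```

By comparison, the VOM models of maximal order 2 and 3 in the example each use only five components.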

The variable-order Markov model assumes that, in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent of the future states; accordingly, "a great reduction in the number of model parameters can be achieved."^[1]

Definition

Let $A$ be a state space (finite alphabet) of size $|A|$.

Consider a sequence $x_{1}^{n}=x_{1}x_{2}\dots x_{n}$ of $n$ realizations of random variables with the Markov property, where $x_{i}\in A$ is the state (symbol) at position $i$ $(1\leq i\leq n)$, and the concatenation of states $x_{i}$ and $x_{i+1}$ is denoted by $x_{i}x_{i+1}$.

Given a training set of observed states, $x_{1}^{n}$, a VOM construction algorithm^[3]^[4]^[5] learns a model $P$ that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states.

Specifically, the learner generates a conditional probability distribution $P(x_{i}\mid s)$ for a symbol $x_{i}\in A$ given a context $s\in A^{*}$, where $A^{*}$ denotes the set of sequences of states of any finite length, including the empty context.
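
When the contexts are drawn from the past, the probability such a model assigns to a whole sequence follows from the chain rule. The identity below is a standard result added for illustration; $s_{i}$ denotes the context used at position $i$, a suffix of the history $x_{1}^{i-1}$ (with $x_{1}^{0}$ the empty sequence):

$$P(x_{1}^{n})=\prod_{i=1}^{n}P\left(x_{i}\mid x_{1}^{i-1}\right)\approx\prod_{i=1}^{n}P\left(x_{i}\mid s_{i}\right)$$

The approximation is exact whenever $x_{i}$ is conditionally independent of the earlier symbols given $s_{i}$; in the example above, $s_{i}=b$ suffices whenever the preceding symbol is $b$.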

VOM models attempt to estimate conditional distributions of the form $P(x_{i}\mid s)$, where the context length $|s|\leq D$ varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length $|s|=D$ and can therefore be considered special cases of VOM models.
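
A minimal Python sketch of the idea of a context length that "varies depending on the available statistics", under loudly stated assumptions: the class `SimpleVOM` and its `min_count` threshold are hypothetical and do not reproduce any of the construction algorithms cited above. It uses unsmoothed maximum-likelihood estimates and picks the longest context $s$ with $|s|\leq D$ observed at least `min_count` times, falling back to shorter contexts otherwise.

```python
from collections import Counter, defaultdict


class SimpleVOM:
    """Hypothetical variable-order model with maximal order D (`max_order`).

    Counts next-symbol occurrences after every context of length 0..D and
    predicts with the longest context seen at least `min_count` times.
    """

    def __init__(self, max_order, min_count=2):
        self.max_order = max_order
        self.min_count = min_count
        self.counts = defaultdict(Counter)  # context -> Counter of next symbols

    def fit(self, seq):
        for i, sym in enumerate(seq):
            # Record sym under each preceding context of length 0..min(D, i).
            for k in range(min(self.max_order, i) + 1):
                self.counts[seq[i - k:i]][sym] += 1
        return self

    def context(self, history):
        # Try the longest suffix first (|s| <= D), then shorter ones.
        for k in range(min(self.max_order, len(history)), -1, -1):
            s = history[len(history) - k:]
            if sum(self.counts.get(s, Counter()).values()) >= self.min_count:
                return s
        return ""  # the empty context: unigram frequencies

    def predict(self, history):
        s = self.context(history)
        c = self.counts.get(s, Counter())
        total = sum(c.values())
        if total == 0:
            raise ValueError("model has no statistics; call fit() first")
        return s, {sym: n / total for sym, n in c.items()}


model = SimpleVOM(max_order=3, min_count=2).fit("aaabc" * 3)
print(model.predict("ca"))  # ('ca', {'a': 1.0})

strict = SimpleVOM(max_order=3, min_count=3).fit("aaabc" * 3)
print(strict.predict("ca"))  # falls back to the shorter context "a"
```

With only three copies of $aaabc$ in the training data, the context $ca$ has been observed twice, so raising `min_count` to 3 makes the sketch fall back to the context $a$, whose estimated distribution is $2/3$ for $a$ and $1/3$ for $b$. Always returning the length-$D$ suffix (once the history is long enough) would instead give the fixed-order model described above as a special case. Unsmoothed estimates assign zero probability to unseen symbols, which practical VOM algorithms typically address with smoothing or escape mechanisms.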

In practice, for a given training sequence, VOM models have been found to achieve a better model parameterization than fixed-order Markov models, which leads to a better bias-variance tradeoff of the learned models.^[3]^[4]^[5]

Application areas

Various efficient algorithms have been devised for estimating the parameters of the VOM model.^[4]

VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression,^[1] document compression,^[4] classification and identification of DNA and protein sequences,^[6]^[3] statistical process control,^[5] spam filtering,^[7] haplotyping,^[8] speech recognition,^[9] sequence analysis in the social sciences,^[2] and others.