Top Qs
Timeline
Chat
Perspective
Maximal entropy random walk
From Wikipedia, the free encyclopedia
Remove ads
A maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen accordingly to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While a standard random walk samples for every vertex a uniform probability distribution of outgoing edges, locally maximizing entropy rate, MERW maximizes it globally (average entropy production) by sampling a uniform probability distribution among all paths in a given graph.
| This article needs additional citations for verification.  (May 2025) | 
MERW is used in various fields of science. A direct application is choosing probabilities to maximize transmission rate through a constrained channel, analogously to Fibonacci coding. Its properties also made it useful for example in analysis of complex networks,[1] like link prediction,[2] community detection,[3] robust transport over networks[4] and centrality measures.[5] It is also used in image analysis, for example for detecting visual saliency regions,[6] object localization,[7] tampering detection[8] or tractography problem.[9]
Additionally, it recreates some properties of quantum mechanics, suggesting a way to repair the discrepancy between diffusion models and quantum predictions, like Anderson localization.[10]
Remove ads
Basic model
Summarize
Perspective

Right: example of their evolution on the same inhomogeneous 2D lattice with cyclic boundary conditions – probability density after 10, 100 and 1000 steps while starting from the same vertex. The small boxes represent defects: all vertices but the marked ones have additional self-loop (edge to itself). For regular lattices (no defects), GRW and MERW are identical. While defects do not strongly affect the local behavior, they lead to a completely different global stationary probability here. While GRW (and based on it standard diffusion) leads to nearly uniform stationary density, MERW has strong localization property, imprisoning the walkers in entropic wells in analogy to electrons in defected lattice of semi-conductor.
Consider a graph with vertices, defined by an adjacency matrix : if there is an edge from vertex to , 0 otherwise. For simplicity, assume it is an undirected graph, which corresponds to a symmetric ; however, MERWs can also be generalized for directed and weighted graphs (for example Boltzmann distribution among paths instead of uniform).
We would like to choose a random walk as a Markov process on this graph: for every vertex and its outgoing edge to , choose probability of the walker randomly using this edge after visiting . Formally, find a stochastic matrix (containing the transition probabilities of a Markov chain) such that
- for all and
- for all .
Assuming this graph is connected and not periodic, ergodic theory says that evolution of this stochastic process leads to some stationary probability distribution such that .
Using Shannon entropy for every vertex and averaging over probability of visiting this vertex (to be able to use its entropy), we get the following formula for average entropy production (entropy rate) of the stochastic process:
This definition turns out to be equivalent to the asymptotic average entropy (per length) of the probability distribution in the space of paths for this stochastic process.
In the standard random walk, referred to here as generic random walk (GRW), we naturally choose that each outgoing edge is equally probable: For a symmetric it leads to a stationary probability distribution with It locally maximizes entropy production (uncertainty) for every vertex, but usually leads to a suboptimal averaged global entropy rate .
MERW chooses the stochastic matrix which maximizes , or equivalently assumes uniform probability distribution among all paths in a given graph. Its formula is obtained by first calculating the dominant eigenvalue and corresponding eigenvector of the adjacency matrix, i.e. the largest with corresponding such that . Then the stochastic matrix and stationary probability distribution are given by for which every possible path of length from the -th to -th vertex has probability Its entropy rate is and the stationary probability distribution is
In contrast to GRW, the MERW transition probabilities generally depend on the structure of the entire graph, making it nonlocal. Hence, they should not be imagined as directly applied by the walker – if random-looking decisions are made based on the local situation, like a person would make, the GRW approach is more appropriate. MERW is based on the principle of maximum entropy, making it the safest assumption when we do not have any additional knowledge about the system. For example, it would be appropriate for modelling our knowledge about an object performing some complex dynamics – not necessarily random, like a particle.
Sketch of derivation
Assume for simplicity that the considered graph is undirected, connected and aperiodic, allowing to conclude from the Perron–Frobenius theorem that the dominant eigenvector is unique. Hence can be asymptotically () approximated by (or in bra–ket notation).
MERW requires a uniform distribution along paths. The number of paths with length and vertex in the center is hence for all ,
Analogously calculating probability distribution for two succeeding vertices, one obtains that the probability of being at the -th vertex and next at the -th vertex is Dividing by the probability of being at the -th vertex, i.e. , gives for the conditional probability of the -th vertex being next after the -th vertex
Weighted MERW: Boltzmann path ensemble
We have assumed that , yielding a MERW corresponding to the uniform ensemble among paths. However, the above derivation works for any real nonnegative for which the Perron-Frobenius theorem applies. Given , the probability of a particular length- path is as follows: which is the same as the Boltzmann distribution of paths with energy defined as the sum of over the edges of the path. For example, this can be used with the transfer matrix to calculate the probability distribution of patterns in the Ising model.
Remove ads
Examples
Summarize
Perspective

Let us first look at a simple nontrivial situation: Fibonacci coding, where we want to transmit a message as a sequence of 0s and 1s, but not using two successive 1s: after a 1 there has to be a 0. To maximize the amount of information transmitted in such sequence, we should assume a uniform probability distribution in the space of all possible sequences fulfilling this constraint.
To practically use such long sequences, after 1 we have to use 0, but there remains the freedom of choosing the probability of 0 after 0. Let us denote this probability . Entropy coding allows encoding a message using this chosen probability distribution. The stationary probability distribution of symbols for a given turns out to be . Hence, entropy produced is , which is maximized for , known as the golden ratio. In contrast, a standard random walk would choose the suboptimal . While choosing a larger reduces the amount of information produced after 0, it also reduces the frequency of 1, after which we cannot write any information.
A more complex case is the defected one-dimensional cyclic lattice, for example, a ring with 1000 connected nodes, for which all nodes but the defects have a self-loop (edge to itself). In a standard random walk (GRW), the stationary probability distribution would have the defect probability be 2/3 of probability of the non-defect vertices – there is nearly no localization, also analogously for standard diffusion, which is the infinitesimal limit of a GRW. For a MERW, we have to first find the dominant eigenvector of the adjacency matrix – maximizing in:
for all positions , where for defects, 0 otherwise. Substituting and multiplying the equation by −1 we get:
where is minimized now, becoming the analog of energy. The formula inside the bracket is discrete Laplace operator, making this equation a discrete analogue of the stationary Schrödinger equation. As in quantum mechanics, MERWs predict that the probability distribution is that of the quantum ground state: with its strongly localized density (in contrast to standard diffusion). Taking the infinitesimal limit, we can get the standard continuous stationary (time-independent) Schrödinger equation ( for ) here.[11]
Remove ads
See also
References
External links
Wikiwand - on
Seamless Wikipedia browsing. On steroids.
Remove ads
