Markov chains appear everywhere: they are used as a computational tool within Bayesian statistics, and a theoretical tool in other areas such as optimal control and reinforcement learning. Conditions under which a general Markov chain eventually converges to a stationary distribution are well-studied, and can largely be considered classical results. These are, informally, as follows.
- -irreducibility: the chain can transition from any state into any other state.
- Aperiodicity: the chain’s transition behavior is the same at time as time .
- Harris recurrence: the chain returns to some set infinitely often.
Henceforth, let be the chain, and its stationary distribution of interest. Some care is needed when defining these conditions formally1 for general Markov chains: for example, in a chain defined over an uncountable state space any given point will have probability zero. To formulate irreducibility precisely we need to talk about sets of states by, say, introducing a topology.
Convergence to stationarity in finite time
Proving convergence of a Markov chain is often the first step taken as part of a larger analysis. In fact, this can be done for broad classes of chains—for instance, for Metropolis-Hastings chains with appropriately well-behaved transition proposals.2 This makes stationarity analysis of MCMC algorithms used in Bayesian statistics close to a non-issue.
Unfortunately, from convergence of a Markov chain alone, we can say little about the chain’s distribution after a given number of steps. This makes it interesting to study chains which are geometrically ergodic—such chains converge at a prescribed rate given by
where is the initial distribution, is the chain’s Markov operator, is the current iteration, is the stationary distribution, and are constants, and is the total variation norm over the Banach space of signed measures.
Convergent chains that are not geometrically ergodic are not well-behaved. Such chains converge infinitely slowly, and functionals of their output don’t necessarily satisfy a Central Limit Theorem. In particular, the distribution of can be heavy-tailed, which can cause to be infinite even if is finite. This can be problematic.
Minorizing measures and regeneration
How does one know whether or not a chain is geometrically ergodic? For simplicity, let’s consider a simpler case, where we set be constant with respect to the initial distribution . Such a chain is called uniformly ergodic. This will occur for a chain defined over a state space with compact support, and we comment on the general case once the issues in this setting are clear.
We begin by considering two chains, and with same stationary distribution . We think of as the chain of interest with initial distribution , and as an auxiliary chain. Now, we make two assumptions.
- and share the same random number generator.
- starts from the stationary distribution .
This setup can be visualized via the diagram
where vertical bars indicate shared random numbers, and we’ve used the identically distributed symbol in opposite of its usual order.
Given this setup, if both chains make a draw from the same distribution during their one-step-ahead transitions, we can conclude that and so the chain has converged.
How is this condition ever possible? Suppose that we can write the distribution of  as a mixture of a distribution  with mixture probability  and some other distribution with probability . Then at each time point, with probability , both chains draw from the same distribution. Let’s draw a picture.
Here, both one-step-ahead transition distributions possess an overlapping shaded region. The probability of each chain landing in this region is , and the distribution within that region is . We say that is a minorizing measure.3 It can be shown4 that to prove uniform ergodicity, it suffices to exhibit such a minorizing measure.
For a non-compact state space and arbitrary current states, such a measure can be impossible to find. However, by virtue of convergence, our Markov chain should eventually spend most of its time in a compact state space. One can make this intuition precise and develop techniques for proving geometric ergodicity, by introducing a suitable Lyapunov drift condition1 which, once satisfied, largely reduces the analysis to the preceding case.
Once one has the existence of a minorizing measure, there will be a set of random times at which the chain will draw from the minorizing measure and forget its current location. This allows the analysis of the averages to be reduced, in some sense, to the IID case. Following this line of thought, one can eventually prove a Central Limit Theorem for the ergodic averages, even though successive are not independent. The study of techniques originating from this observation is called regeneration theory.
Concluding remarks
Here, we examined one technique by which it’s possible to prove that a Markov chain converges quickly. The analysis is general and provides insights of practical interest in Bayesian models. For instance, if using a Gibbs sampler for a hierarchical model, one can examine the full conditionals to see whether or not a minorization condition is present. Even if the minorization constant is not calculated, this gives some idea as to the chain’s numerical performance before ever running it. In a practical application, this can be useful.
Other techniques for analyzing the mixing rate of a Markov chain are also available. In particular, reversible chains possess Markov operators which are self-adjoint: this allows one to study their spectral properties, and relate them to mixing times.5 I hope this article has provided a useful overview as to the need to consider finite-time mixing properties, and illustrated the key idea behind minorization techniques.
References
S. Meyn and R. Tweedie. Markov Chains and Stochastic Stability. 1993.
G. Roberts and J. Rosenthal. General state space Markov chains and MCMC algorithms. Probability Surveys, 2004.
Rather than working with a pair where and is a probability measure, most technical papers equivalently work with a single finite measure. We use here because it is intuitive and lends well to visualization.
J. Rosenthal. Minorization conditions and convergence rates for Markov chain Monte Carlo. JASA 90(430). 1995.
G. Roberts and J. Rosenthal. Geometric ergodicity and hybrid Markov chains. ECP 2(2). 1997.