Probability distribution From Wikipedia, the free encyclopedia
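In probability theory and statistics, the geometric distribution is either of two discrete probability distributions: the probability distribution of the number $X$ of Bernoulli trials needed to get one success, supported on $\mathbb{N} = \{1, 2, 3, \dots\}$, or the probability distribution of the number $Y = X - 1$ of failures before the first success, supported on $\mathbb{N}_0 = \{0, 1, 2, \dots\}$.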
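[Figures: probability mass function; cumulative distribution function]
Parameters: $0 < p \le 1$, success probability (real), for both forms
Support: $k$ trials where $k \in \mathbb{N} = \{1, 2, 3, \dots\}$ (trials); $k$ failures where $k \in \mathbb{N}_0 = \{0, 1, 2, \dots\}$ (failures)
PMF: $(1-p)^{k-1}p$ (trials); $(1-p)^{k}p$ (failures)
CDF: $1 - (1-p)^{\lfloor x \rfloor}$ for $x \ge 1$, $0$ for $x < 1$ (trials); $1 - (1-p)^{\lfloor x \rfloor + 1}$ for $x \ge 0$, $0$ for $x < 0$ (failures)
Mean: $\frac{1}{p}$ (trials); $\frac{1-p}{p}$ (failures)
Median: $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil$ (trials); $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1$ (failures); in both cases not unique if $-1/\log_2(1-p)$ is an integer
Mode: $1$ (trials); $0$ (failures)
Variance: $\frac{1-p}{p^2}$ (both)
Skewness: $\frac{2-p}{\sqrt{1-p}}$ (both)
Excess kurtosis: $6 + \frac{p^2}{1-p}$ (both)
Entropy: $\frac{-(1-p)\ln(1-p) - p\ln p}{p}$ (both)
MGF: $\frac{pe^t}{1-(1-p)e^t}$ for $t < -\ln(1-p)$ (trials); $\frac{p}{1-(1-p)e^t}$ for $t < -\ln(1-p)$ (failures)
CF: $\frac{pe^{it}}{1-(1-p)e^{it}}$ (trials); $\frac{p}{1-(1-p)e^{it}}$ (failures)
PGF: $\frac{pz}{1-(1-p)z}$ (trials); $\frac{p}{1-(1-p)z}$ (failures)
Fisher information: $\frac{1}{p^2(1-p)}$ (both)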
These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former one (the distribution of $X$); however, to avoid ambiguity, it is considered wise to indicate which is intended by mentioning the support explicitly.
The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$. If the probability of success on each trial is $p$, then the probability that the $k$th trial is the first success is
$$\Pr(X = k) = (1-p)^{k-1}p$$
for $k = 1, 2, 3, 4, \dots$
The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:
$$\Pr(Y = k) = \Pr(X = k + 1) = (1-p)^{k}p$$
for $k = 0, 1, 2, 3, \dots$
The geometric distribution gets its name because its probabilities follow a geometric sequence. It is sometimes called the Furry distribution after Wendell H. Furry.^{[1]}^{: 210 }
The geometric distribution is the discrete probability distribution that describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. Its probability mass function depends on its parameterization and support. When supported on $\mathbb{N}$, the probability mass function is
$$P(X = k) = (1-p)^{k-1}p,$$
where $k = 1, 2, 3, \dots$ is the number of trials and $p$ is the probability of success in each trial.^{[2]}^{: 260–261 }
The support may also be $\mathbb{N}_0$, defining $Y = X - 1$. This alters the probability mass function into
$$P(Y = k) = (1-p)^{k}p,$$
where $k = 0, 1, 2, \dots$ is the number of failures before the first success.^{[3]}^{: 66 }
An alternative parameterization of the distribution gives the probability mass function
$$P(Y = k) = \left(\frac{P}{Q}\right)^{k}\left(1 - \frac{P}{Q}\right),$$
where $P = \frac{1-p}{p}$ and $Q = \frac{1}{p}$.^{[1]}^{: 208–209 }
An example of a geometric distribution arises from rolling a six-sided die until a "1" appears. Each roll is independent with a $1/6$ chance of success. The number of rolls needed follows a geometric distribution with $p = 1/6$.
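As a minimal illustration, the two probability mass functions can be evaluated exactly with Python's `fractions` module. The function names `pmf_trials` and `pmf_failures` are hypothetical helpers chosen for this sketch, and the die example uses $p = 1/6$.

```python
from fractions import Fraction

def pmf_trials(k: int, p: Fraction) -> Fraction:
    """P(X = k): the k-th trial is the first success, k = 1, 2, 3, ..."""
    if k < 1:
        return Fraction(0)
    return (1 - p) ** (k - 1) * p

def pmf_failures(k: int, p: Fraction) -> Fraction:
    """P(Y = k): k failures before the first success, k = 0, 1, 2, ..."""
    if k < 0:
        return Fraction(0)
    return (1 - p) ** k * p

p = Fraction(1, 6)          # chance of rolling a "1" on a fair six-sided die
print(pmf_trials(3, p))     # first "1" on the third roll -> 25/216
print(pmf_failures(2, p))   # two other faces before the first "1" -> 25/216
```

Both calls return the same value because $\Pr(Y = 2) = \Pr(X = 3)$.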
The geometric distribution is the only memoryless discrete probability distribution.^{[4]} It is the discrete counterpart of the exponential distribution, which has the same property.^{[1]}^{: 228 } The property asserts that the number of previously failed trials does not affect the number of future trials needed for a success.
Because there are two definitions of the geometric distribution, there are also two definitions of memorylessness for discrete random variables.^{[5]} Expressed in terms of conditional probability, the two definitions are
$$\Pr(X > m + n \mid X > n) = \Pr(X > m)$$
and
$$\Pr(Y > m + n \mid Y \ge n) = \Pr(Y > m),$$
where $m$ and $n$ are natural numbers, $X$ is a geometrically distributed random variable defined over $\mathbb{N}$, and $Y$ is a geometrically distributed random variable defined over $\mathbb{N}_0$. Note that these definitions are not equivalent for discrete random variables; $Y$ does not satisfy the first equation and $X$ does not satisfy the second.
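The two conditions can be checked with exact rational arithmetic for a few small $m$ and $n$. This is a hypothetical sketch, not a proof: it uses the survival functions $\Pr(X > n) = (1-p)^n$ and $\Pr(Y > n) = (1-p)^{n+1}$, which follow from the probability mass functions above, and computes $\Pr(Z \ge n)$ as $\Pr(Z > n - 1)$ for an integer-valued $Z$.

```python
from fractions import Fraction

def survival_trials(n: int, p: Fraction) -> Fraction:
    """P(X > n) for X on {1, 2, ...}: the first n trials all fail."""
    return (1 - p) ** max(n, 0)

def survival_failures(n: int, p: Fraction) -> Fraction:
    """P(Y > n) for Y on {0, 1, ...}: the first n + 1 trials all fail."""
    return (1 - p) ** max(n + 1, 0)

pairs = [(m, n) for m in range(1, 5) for n in range(1, 5)]

def satisfies_first(surv, p):
    """P(Z > m + n | Z > n) == P(Z > m) for every tested (m, n)."""
    return all(surv(m + n, p) / surv(n, p) == surv(m, p) for m, n in pairs)

def satisfies_second(surv, p):
    """P(Z > m + n | Z >= n) == P(Z > m), with P(Z >= n) = P(Z > n - 1)."""
    return all(surv(m + n, p) / surv(n - 1, p) == surv(m, p) for m, n in pairs)

p = Fraction(1, 6)
print(satisfies_first(survival_trials, p), satisfies_second(survival_trials, p))      # True False
print(satisfies_first(survival_failures, p), satisfies_second(survival_failures, p))  # False True
```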
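The expected value and variance of a geometrically distributed random variable $X$ defined over $\mathbb{N}$ are^{[2]}^{: 261 }
$$\operatorname{E}(X) = \frac{1}{p}, \qquad \operatorname{var}(X) = \frac{1-p}{p^2}.$$
With a geometrically distributed random variable $Y$ defined over $\mathbb{N}_0$, the expected value changes into
$$\operatorname{E}(Y) = \frac{1-p}{p},$$
while the variance stays the same.^{[6]}^{: 114–115 }
For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is $\frac{1}{1/6} = 6$ and the average number of failures is $\frac{1 - 1/6}{1/6} = 5$.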
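A quick Monte Carlo sketch of the die example, assuming a fair die simulated with Python's `random` module (the seed and the 100,000 repetitions are arbitrary choices), should give averages close to 6 rolls and 5 failures, and a variance close to $\frac{1-p}{p^2} = 30$.

```python
import random
import statistics

def rolls_until_one(rng: random.Random) -> int:
    """Roll a fair six-sided die until a 1 appears; return the number of rolls."""
    rolls = 1
    while rng.randint(1, 6) != 1:
        rolls += 1
    return rolls

rng = random.Random(12345)                   # fixed seed for reproducibility
samples = [rolls_until_one(rng) for _ in range(100_000)]

mean_rolls = statistics.fmean(samples)       # estimates E(X) = 1/p = 6
mean_failures = mean_rolls - 1               # Y = X - 1, so this estimates E(Y) = 5
var_rolls = statistics.fmean((x - mean_rolls) ** 2 for x in samples)  # estimates 30

print(round(mean_rolls, 2), round(mean_failures, 2), round(var_rolls, 1))
```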
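The moment generating function of the geometric distribution when defined over $\mathbb{N}$ and $\mathbb{N}_0$ respectively is^{[7]}^{[6]}^{: 114 }
$$M_X(t) = \frac{pe^t}{1-(1-p)e^t}, \qquad M_Y(t) = \frac{p}{1-(1-p)e^t}, \qquad t < -\ln(1-p).$$
The moments for the number of failures before the first success are given by
$$\operatorname{E}(Y^n) = \sum_{k=0}^{\infty} (1-p)^k p \cdot k^n = p \operatorname{Li}_{-n}(1-p) \qquad (n \ne 0),$$
where $\operatorname{Li}_{-n}(1-p)$ is the polylogarithm function.^{[8]}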
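The moment series can be approximated by truncating the sum, which avoids needing a polylogarithm implementation. This sketch assumes 10,000 terms is enough (the omitted tail is negligible for moderate $p$) and compares the first two moments with $\operatorname{E}(Y) = \frac{1-p}{p}$ and $\operatorname{E}(Y^2) = \operatorname{var}(Y) + \operatorname{E}(Y)^2 = \frac{(1-p)(2-p)}{p^2}$. The helper name is hypothetical.

```python
def raw_moment_failures(n: int, p: float, terms: int = 10_000) -> float:
    """Approximate E[Y^n] = sum_k (1 - p)^k * p * k^n by truncating the series."""
    q = 1.0 - p
    return sum(q**k * p * k**n for k in range(terms))

p = 0.3
m1 = raw_moment_failures(1, p)
m2 = raw_moment_failures(2, p)
print(m1, (1 - p) / p)                 # E[Y]
print(m2, (1 - p) * (2 - p) / p**2)    # E[Y^2] = var(Y) + E[Y]^2
```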
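The cumulant generating function of the geometric distribution defined over $\mathbb{N}_0$ is^{[1]}^{: 216 }
$$K(t) = \ln p - \ln\left(1 - (1-p)e^t\right).$$
The cumulants $\kappa_r$ satisfy the recursion
$$\kappa_{r+1} = q \frac{d\kappa_r}{dq}, \qquad r = 1, 2, \dots,$$
where $q = 1 - p$, when defined over $\mathbb{N}_0$.^{[1]}^{: 216 }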
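The recursion can be illustrated numerically by starting from $\kappa_1 = \frac{q}{1-q}$ (the mean of $Y$ written in terms of $q$) and replacing the derivative with a central finite difference. The step size `h` and the helper names are assumptions of this sketch, and the finite difference is only a stand-in for symbolic differentiation; the results should match the variance $\frac{1-p}{p^2}$ and the skewness $\frac{2-p}{\sqrt{1-p}}$.

```python
def kappa1(q: float) -> float:
    """First cumulant (the mean) of Y on {0, 1, ...}, written in terms of q = 1 - p."""
    return q / (1.0 - q)

def next_cumulant(kappa, h: float = 1e-4):
    """Return the function q -> q * d(kappa)/dq, using a central difference."""
    return lambda q: q * (kappa(q + h) - kappa(q - h)) / (2.0 * h)

kappa2 = next_cumulant(kappa1)   # should equal the variance
kappa3 = next_cumulant(kappa2)   # third cumulant = third central moment

p = 0.5
q = 1.0 - p
print(kappa2(q), (1 - p) / p**2)                               # variance
print(kappa3(q) / kappa2(q) ** 1.5, (2 - p) / (1 - p) ** 0.5)  # skewness
```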
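Consider the expected value $\operatorname{E}(X)$ of $X$ as above, i.e. the average number of trials until a success. On the first trial, we either succeed with probability $p$, or we fail with probability $1 - p$. If we fail, the remaining mean number of trials until a success is identical to the original mean; this follows from the fact that all trials are independent. From this we get the formula
$$\operatorname{E}(X) = p \cdot 1 + (1-p)\left(1 + \operatorname{E}(X)\right),$$
which simplifies to $p \operatorname{E}(X) = 1$ and, if solved for $\operatorname{E}(X)$, gives^{[citation needed]}
$$\operatorname{E}(X) = \frac{1}{p}.$$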
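The expected number of failures $Y$ can be found from the linearity of expectation, $\operatorname{E}(Y) = \operatorname{E}(X - 1) = \operatorname{E}(X) - 1 = \frac{1}{p} - 1 = \frac{1-p}{p}$. It can also be shown in the following way:^{[citation needed]}
$$\begin{aligned}
\operatorname{E}(Y) &= \sum_{k=0}^{\infty} (1-p)^k p \cdot k \\
&= p(1-p) \sum_{k=0}^{\infty} k(1-p)^{k-1} \\
&= p(1-p) \left[ \frac{d}{dp}\left( -\sum_{k=0}^{\infty} (1-p)^k \right) \right] \\
&= p(1-p) \frac{d}{dp}\left( -\frac{1}{p} \right) \\
&= \frac{1-p}{p}.
\end{aligned}$$
The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.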
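The mean of the geometric distribution is its expected value, which is, as previously discussed in § Moments and cumulants, $\frac{1}{p}$ or $\frac{1-p}{p}$ when defined over $\mathbb{N}$ or $\mathbb{N}_0$ respectively.
The median of the geometric distribution is $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil$ when defined over $\mathbb{N}$^{[9]} and $\left\lceil \frac{-1}{\log_2(1-p)} \right\rceil - 1$ when defined over $\mathbb{N}_0$.^{[3]}^{: 69 }
The mode of the geometric distribution is the first value in the support set. This is 1 when defined over $\mathbb{N}$ and 0 when defined over $\mathbb{N}_0$.^{[3]}^{: 69 }
The skewness of the geometric distribution is $\frac{2-p}{\sqrt{1-p}}$.^{[6]}^{: 115 }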
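A small Python check of the median formula, assuming $0 < p < 1$: the sketch computes $\lceil -1/\log_2(1-p) \rceil$ and compares it with the CDF $1 - (1-p)^k$, which should reach at least $1/2$ at the median and stay below $1/2$ one step earlier. The helper names are hypothetical.

```python
import math

def cdf_trials(k: int, p: float) -> float:
    """P(X <= k) = 1 - (1 - p)^k for X on {1, 2, ...}."""
    return 1.0 - (1.0 - p) ** k if k >= 1 else 0.0

def median_trials(p: float) -> int:
    """ceil(-1 / log2(1 - p)), the median of X on {1, 2, ...}; needs 0 < p < 1."""
    return math.ceil(-1.0 / math.log2(1.0 - p))

p = 1 / 6
m = median_trials(p)
print(m, m - 1)   # medians for trials (X) and failures (Y): 4 3
print(cdf_trials(m - 1, p), cdf_trials(m, p))   # straddles 1/2
```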
The kurtosis of the geometric distribution is $9 + \frac{p^2}{1-p}$.^{[6]}^{: 115 } The excess kurtosis of a distribution is the difference between its kurtosis and the kurtosis of a normal distribution, $3$.^{[10]}^{: 217 } Therefore, the excess kurtosis of the geometric distribution is $6 + \frac{p^2}{1-p}$. Since $\frac{p^2}{1-p} \ge 0$, the excess kurtosis is always positive, so the distribution is leptokurtic.^{[3]}^{: 69 } In other words, the tail of a geometric distribution decays more slowly than that of a Gaussian.^{[10]}^{: 217 }
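The skewness and excess kurtosis can be cross-checked from a truncated probability mass function. This sketch uses the failures form $Y$; shifting by one to $X$ does not change either quantity. The truncation length, the value $p = 0.25$, and the helper name are arbitrary choices for illustration.

```python
def central_moments_failures(p: float, terms: int = 5_000):
    """Mean and 2nd, 3rd, 4th central moments of Y on {0, 1, ...} from a truncated PMF."""
    q = 1.0 - p
    probs = [q**k * p for k in range(terms)]
    mean = sum(k * w for k, w in enumerate(probs))
    mu2, mu3, mu4 = (sum((k - mean) ** r * w for k, w in enumerate(probs)) for r in (2, 3, 4))
    return mean, mu2, mu3, mu4

p = 0.25
_, var, mu3, mu4 = central_moments_failures(p)
skew = mu3 / var**1.5
excess_kurt = mu4 / var**2 - 3
print(skew, (2 - p) / (1 - p) ** 0.5)
print(excess_kurt, 6 + p**2 / (1 - p))
```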
Entropy is a measure of uncertainty in a probability distribution. For the geometric distribution that models the number of failures before the first success, the probability mass function is
$$P(Y = k) = (1-p)^{k}p, \qquad k = 0, 1, 2, \dots$$
The entropy for this distribution is
$$H(Y) = -\sum_{k=0}^{\infty} P(Y = k)\ln P(Y = k) = -\frac{(1-p)\ln(1-p) + p\ln p}{p}.$$
The entropy increases as the probability $p$ decreases, reflecting greater uncertainty as success becomes rarer.
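The closed form can be compared with a direct, truncated evaluation of the defining sum. The sketch works in nats (natural logarithms); the function names and the 10,000-term cutoff are assumptions made for illustration.

```python
import math

def entropy_series(p: float, terms: int = 10_000) -> float:
    """-sum_k P(Y = k) ln P(Y = k) for Y on {0, 1, ...}, truncated after `terms` terms."""
    q = 1.0 - p
    total = 0.0
    for k in range(terms):
        w = q**k * p
        if w > 0.0:              # skip terms that have underflowed to zero
            total -= w * math.log(w)
    return total

def entropy_closed_form(p: float) -> float:
    """-((1 - p) ln(1 - p) + p ln p) / p, in nats; requires 0 < p < 1."""
    return -((1 - p) * math.log(1 - p) + p * math.log(p)) / p

for p in (0.1, 0.5, 0.9):
    print(p, entropy_series(p), entropy_closed_form(p))   # larger for smaller p
```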
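Fisher information measures the amount of information that an observable random variable $Y$ carries about an unknown parameter $p$. For the geometric distribution (failures before the first success), the Fisher information with respect to $p$ is given by
$$I(p) = \frac{1}{p^2(1-p)}.$$
Proof:
The likelihood of a single observation $k$ is $L(p; k) = (1-p)^{k}p$, so the log-likelihood is $\ln L(p; k) = k\ln(1-p) + \ln p$. Its first and second derivatives with respect to $p$ are
$$\frac{\partial}{\partial p}\ln L(p; k) = \frac{1}{p} - \frac{k}{1-p}, \qquad \frac{\partial^2}{\partial p^2}\ln L(p; k) = -\frac{1}{p^2} - \frac{k}{(1-p)^2}.$$
Taking the negative expectation and using $\operatorname{E}(Y) = \frac{1-p}{p}$ gives
$$I(p) = \frac{1}{p^2} + \frac{\operatorname{E}(Y)}{(1-p)^2} = \frac{1}{p^2} + \frac{1}{p(1-p)} = \frac{1}{p^2(1-p)}.$$
Fisher information grows without bound as $p \to 0$ (and also as $p \to 1$) and is smallest at $p = 2/3$, where $p^2(1-p)$ is largest. For $p < 2/3$ it increases as $p$ decreases, indicating that rarer successes provide more information about the parameter $p$.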
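The expectation in the Fisher information can be evaluated as a truncated series and compared with $\frac{1}{p^2(1-p)}$. A grid search over $p$ also shows that this closed form is smallest near $p = 2/3$, where $p^2(1-p)$ peaks, so it decreases in $p$ only on $0 < p < 2/3$. The helper names, the grid, and the 10,000-term cutoff are hypothetical choices for this sketch.

```python
def fisher_series(p: float, terms: int = 10_000) -> float:
    """Approximate -E[d^2/dp^2 ln P(Y = k)] by truncating the sum over k.

    For ln P(Y = k) = k ln(1 - p) + ln p the second derivative is
    -k / (1 - p)**2 - 1 / p**2, so we average k / (1 - p)**2 + 1 / p**2.
    """
    q = 1.0 - p
    return sum(q**k * p * (k / q**2 + 1.0 / p**2) for k in range(terms))

def fisher_closed_form(p: float) -> float:
    """I(p) = 1 / (p^2 (1 - p))."""
    return 1.0 / (p**2 * (1.0 - p))

for p in (0.2, 0.5, 0.8):
    print(p, fisher_series(p), fisher_closed_form(p))

grid = [i / 1000 for i in range(1, 1000)]
p_min = min(grid, key=fisher_closed_form)
print(p_min)   # the grid point nearest 2/3
```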
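For the geometric distribution modeling the number of trials until the first success, the probability mass function is
$$P(X = k) = (1-p)^{k-1}p, \qquad k = 1, 2, 3, \dots$$
The entropy for this distribution is given by
$$H(X) = -\sum_{k=1}^{\infty} P(X = k)\ln P(X = k) = -\frac{(1-p)\ln(1-p) + p\ln p}{p},$$
the same as for the number of failures, since shifting a discrete random variable does not change its entropy.
Entropy increases as $p$ decreases, reflecting greater uncertainty as the probability of success in each trial becomes smaller.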
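Fisher information for the geometric distribution modeling the number of trials until the first success is given by
$$I(p) = \frac{1}{p^2(1-p)}.$$
Proof:
The log-likelihood of a single observation $k$ is $\ln L(p; k) = (k-1)\ln(1-p) + \ln p$, so
$$\frac{\partial^2}{\partial p^2}\ln L(p; k) = -\frac{1}{p^2} - \frac{k-1}{(1-p)^2}.$$
Since $\operatorname{E}(X - 1) = \frac{1-p}{p}$, taking the negative expectation gives
$$I(p) = \frac{1}{p^2} + \frac{1}{p(1-p)} = \frac{1}{p^2(1-p)}.$$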
The true parameter $p$ of an unknown geometric distribution can be inferred through estimators and conjugate distributions.
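Provided they exist, the first $l$ moments of a probability distribution can be estimated from a sample $x_1, \dots, x_n$ using the formula
$$m_i = \frac{1}{n}\sum_{j=1}^{n} x_j^i,$$
where $m_i$ is the $i$th sample moment and $1 \le i \le l$.^{[16]}^{: 349–350 } Estimating $\operatorname{E}(X)$ with $m_1$ gives the sample mean, denoted $\bar{x}$. Substituting this estimate in the formula for the expected value of a geometric distribution and solving for $p$ gives the estimators $\hat{p} = \frac{1}{\bar{x}}$ and $\hat{p} = \frac{1}{\bar{x} + 1}$ when supported on $\mathbb{N}$ and $\mathbb{N}_0$ respectively. These estimators are biased since $\operatorname{E}\left(\frac{1}{\bar{x}}\right) > \frac{1}{\operatorname{E}(\bar{x})} = p$ as a result of Jensen's inequality.^{[17]}^{: 53–54 }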
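A minimal method-of-moments sketch on simulated data, assuming samples generated by counting Bernoulli trials with Python's `random` module (the true value $p = 0.2$, the seed, the sample size, and the helper name are arbitrary). The same data expressed as failure counts gives the same estimate, since $\frac{1}{\bar{y} + 1} = \frac{1}{\bar{x}}$ when $y_i = x_i - 1$.

```python
import random
import statistics

def sample_trials(p: float, rng: random.Random) -> int:
    """Count Bernoulli(p) trials up to and including the first success."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

rng = random.Random(2024)
true_p = 0.2
xs = [sample_trials(true_p, rng) for _ in range(50_000)]   # sample on {1, 2, ...}
ys = [x - 1 for x in xs]                                   # same data on {0, 1, ...}

p_hat_trials = 1 / statistics.fmean(xs)            # 1 / x-bar
p_hat_failures = 1 / (statistics.fmean(ys) + 1)    # 1 / (y-bar + 1)
print(p_hat_trials, p_hat_failures)                # both close to 0.2
```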
The maximum likelihood estimator of $p$ is the value that maximizes the likelihood function given a sample.^{[16]}^{: 308 } By finding the zero of the derivative of the log-likelihood function when the distribution is defined over $\mathbb{N}$, the maximum likelihood estimator can be found to be $\hat{p} = \frac{1}{\bar{x}}$, where $\bar{x}$ is the sample mean.^{[18]} If the domain is $\mathbb{N}_0$, then the estimator shifts to $\hat{p} = \frac{1}{\bar{x} + 1}$. As previously discussed in § Method of moments, these estimators are biased.
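For a sample $x_1, \dots, x_n$ on $\mathbb{N}$, the log-likelihood is $n \ln p + \left(\sum_i x_i - n\right)\ln(1-p)$. The sketch below maximizes it over a fine grid and compares the result with the closed form $1/\bar{x}$; the sample values and the helper name are made up for illustration.

```python
import math
from typing import Sequence

def log_likelihood_trials(p: float, xs: Sequence[int]) -> float:
    """Sum of ln[(1 - p)^(x - 1) * p] over a sample on {1, 2, ...}."""
    n = len(xs)
    return (sum(xs) - n) * math.log(1 - p) + n * math.log(p)

xs = [3, 1, 7, 2, 5, 1, 4, 9, 2, 6]            # hypothetical sample, mean 4
grid = [i / 10_000 for i in range(1, 10_000)]  # candidate values of p in (0, 1)
p_grid = max(grid, key=lambda p: log_likelihood_trials(p, xs))
p_closed = len(xs) / sum(xs)                   # 1 / sample mean
print(p_grid, p_closed)                        # 0.25 0.25
```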
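Regardless of the domain, the bias is, to first order in $1/n$,
$$b \equiv \operatorname{E}\left[\hat{p}_\text{mle} - p\right] \approx \frac{p(1-p)}{n},$$
which yields the bias-corrected maximum likelihood estimator,^{[citation needed]}
$$\hat{p}^{*}_\text{mle} = \hat{p}_\text{mle} - \hat{b},$$
where $\hat{b}$ is $b$ evaluated at $p = \hat{p}_\text{mle}$.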
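A Monte Carlo sketch of the bias and its correction, assuming $\hat{b}$ is the plug-in estimate $\hat{p}_\text{mle}(1 - \hat{p}_\text{mle})/n$. With $p = 0.3$ and samples of size $n = 10$ (arbitrary choices, as are the seed and the number of repetitions), the raw estimator's average should overshoot $p$ by roughly $p(1-p)/n \approx 0.02$, while the corrected estimator's average should land much closer to $p$. Because $p(1-p)/n$ is a first-order approximation, the correction need not remove the bias exactly.

```python
import random
import statistics

def sample_trials(p: float, rng: random.Random) -> int:
    """Count Bernoulli(p) trials up to and including the first success."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

rng = random.Random(7)
true_p, n, reps = 0.3, 10, 20_000
raw, corrected = [], []
for _ in range(reps):
    xs = [sample_trials(true_p, rng) for _ in range(n)]
    p_mle = n / sum(xs)                    # 1 / sample mean
    b_hat = p_mle * (1 - p_mle) / n        # plug-in estimate of p(1 - p)/n
    raw.append(p_mle)
    corrected.append(p_mle - b_hat)

raw_bias = statistics.fmean(raw) - true_p
corrected_bias = statistics.fmean(corrected) - true_p
print(raw_bias, corrected_bias)
```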
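In Bayesian inference, the parameter $p$ is a random variable from a prior distribution with a posterior distribution calculated using Bayes' theorem after observing samples.^{[17]}^{: 167 } If a beta distribution is chosen as the prior distribution, then the posterior will also be a beta distribution, and the beta distribution is called the conjugate distribution. In particular, if a $\mathrm{Beta}(\alpha, \beta)$ prior is selected, then the posterior, after observing samples $k_1, \dots, k_n \in \mathbb{N}$, is^{[19]}
$$p \sim \mathrm{Beta}\left(\alpha + n,\ \beta + \sum_{i=1}^{n}(k_i - 1)\right).$$
Alternatively, if the samples are in $\mathbb{N}_0$, the posterior distribution is^{[20]}
$$p \sim \mathrm{Beta}\left(\alpha + n,\ \beta + \sum_{i=1}^{n} k_i\right).$$
Since the expected value of a $\mathrm{Beta}(\alpha, \beta)$ distribution is $\frac{\alpha}{\alpha + \beta}$,^{[11]}^{: 145 } as $\alpha$ and $\beta$ approach zero, the posterior mean approaches its maximum likelihood estimate.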
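The conjugate update amounts to adding counts to the beta parameters. This sketch uses hypothetical helper names and a made-up sample whose mean is 4, so the maximum likelihood estimate is $1/4$; shrinking $\alpha$ and $\beta$ toward zero moves the posterior mean toward that value.

```python
from typing import Sequence, Tuple

def posterior_trials(alpha: float, beta: float, ks: Sequence[int]) -> Tuple[float, float]:
    """Beta(alpha, beta) prior plus samples on {1, 2, ...} -> posterior Beta parameters."""
    return alpha + len(ks), beta + sum(k - 1 for k in ks)

def posterior_failures(alpha: float, beta: float, ks: Sequence[int]) -> Tuple[float, float]:
    """Same update for samples on {0, 1, ...} (failure counts)."""
    return alpha + len(ks), beta + sum(ks)

def beta_mean(a: float, b: float) -> float:
    """Expected value of a Beta(a, b) distribution."""
    return a / (a + b)

ks = [3, 1, 7, 2, 5, 1, 4, 9, 2, 6]   # hypothetical trial counts, mean 4
for prior in [(1.0, 1.0), (0.01, 0.01), (1e-6, 1e-6)]:
    a, b = posterior_trials(*prior, ks)
    print(prior, beta_mean(a, b))     # approaches 1/4 as the prior parameters shrink
```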
The geometric distribution can be generated experimentally from i.i.d. standard uniform random variables by finding the first such random variable to be less than or equal to $p$. However, the number of random variables needed is also geometrically distributed, and the algorithm slows as $p$ decreases.^{[21]}^{: 498 }
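The experimental method translates directly into a loop. This sketch uses Python's `random.random()` as the standard uniform source and a hypothetical function name; the number of uniforms drawn equals the returned value, so the expected cost is $1/p$ draws per sample.

```python
import random

def geometric_by_trials(p: float, rng: random.Random) -> int:
    """Draw standard uniforms until one is <= p; return how many were drawn."""
    count = 1
    while rng.random() > p:   # this draw was not <= p, so draw again
        count += 1
    return count

rng = random.Random(1)
draws = [geometric_by_trials(0.25, rng) for _ in range(50_000)]
mean_draws = sum(draws) / len(draws)
print(mean_draws)   # near 1/p = 4 uniforms per sample
```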
Random generation can be done in constant time by truncating exponential random numbers. A standard exponential random variable $E$ can become geometrically distributed with parameter $p$ through $\lceil -E / \ln(1-p) \rceil$. In turn, $E$ can be generated from a standard uniform random variable $U$, altering the formula into $\lceil \ln(U) / \ln(1-p) \rceil$.^{[21]}^{: 499–500 }^{[22]}
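A sketch of the constant-time method, using a single uniform draw $U$ from `random.random()` and $\lceil \ln U / \ln(1-p) \rceil$. Two guards are assumptions of this sketch rather than part of the formula: $U = 0$ is redrawn because $\ln 0$ is undefined, and $p = 1$ is special-cased because $\ln(1-p)$ would be $-\infty$. The function name is hypothetical; for array work, NumPy's `Generator.geometric` also samples the trials form.

```python
import math
import random

def geometric_by_inversion(p: float, rng: random.Random) -> int:
    """ceil(ln U / ln(1 - p)) with U uniform on (0, 1); support {1, 2, ...}."""
    if p == 1.0:
        return 1                      # every trial succeeds
    u = rng.random()
    while u == 0.0:                   # random() can return 0.0, and ln(0) is undefined
        u = rng.random()
    return math.ceil(math.log(u) / math.log1p(-p))

rng = random.Random(3)
p = 0.25
draws = [geometric_by_inversion(p, rng) for _ in range(100_000)]
mean_draws = sum(draws) / len(draws)
share_of_ones = draws.count(1) / len(draws)
print(mean_draws, share_of_ones)      # near 1/p = 4 and P(X = 1) = p = 0.25
```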
The geometric distribution is used in many disciplines. In queueing theory, the number of customers in an M/M/1 queue in steady state follows a geometric distribution.^{[23]} In stochastic processes, the population size of a Yule–Furry process started from a single individual is geometrically distributed at any fixed time.^{[24]} The distribution also arises when modeling the lifetime of a device in discrete contexts.^{[25]} It has also been used to fit data, including modeling patients spreading COVID-19.^{[26]}