Quantile regression

Quantile regression is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares estimates the conditional mean of the response variable across values of the predictor variables, quantile regression estimates the conditional median (or other quantiles) of the response variable. (There is also a method for predicting the conditional geometric mean of the response variable.^[1]) Quantile regression is an extension of linear regression used when the conditions of linear regression are not met.

Advantages and applications

One advantage of quantile regression relative to ordinary least squares regression is that the quantile regression estimates are more robust against outliers in the response measurements. However, the main attraction of quantile regression goes beyond robustness: it is useful whenever conditional quantile functions are themselves of interest. Different measures of central tendency and statistical dispersion can be used to analyze the relationship between variables more comprehensively.^[2]

In ecology, quantile regression has been proposed and used as a way to discover more useful predictive relationships between variables in cases where there is no relationship or only a weak relationship between the means of such variables. The need for and success of quantile regression in ecology has been attributed to the complexity of interactions between different factors leading to data with unequal variation of one variable for different ranges of another variable.^[3]

Another application of quantile regression is in growth charts, where percentile curves are commonly used to screen for abnormal growth.^[4]^[5]

History

The idea of estimating a median regression slope, a major theorem about minimizing the sum of absolute deviations, and a geometrical algorithm for constructing median regression were proposed in 1760 by Ruđer Josip Bošković, a Jesuit Catholic priest from Dubrovnik.^[2]^: 4^[6] He was interested in the ellipticity of the earth, building on Isaac Newton's suggestion that its rotation could cause it to bulge at the equator with a corresponding flattening at the poles.^[7] He finally produced the first geometric procedure for determining the equator of a rotating planet from three observations of a surface feature. More importantly for quantile regression, he developed the first evidence of the least absolute criterion, preceding the least squares method introduced by Legendre in 1805 by roughly fifty years.^[8]

Other thinkers began building upon Bošković's idea, such as Pierre-Simon Laplace, who developed the so-called "méthode de situation". This led to Francis Edgeworth's plural median,^[9] a geometric approach to median regression that is recognized as the precursor of the simplex method.^[8] The works of Bošković, Laplace, and Edgeworth were recognized as a prelude to Roger Koenker's contributions to quantile regression.

Median regression computations for larger data sets are quite tedious compared to the least squares method, which is why median regression was historically unpopular among statisticians until the widespread adoption of computers in the latter part of the 20th century.

Background: quantiles

Quantile regression expresses the conditional quantiles of a dependent variable as a linear function of the explanatory variables. Crucial to the practicality of quantile regression is that the quantiles can be expressed as the solution of a minimization problem, as we will show in this section before discussing conditional quantiles in the next section.

Quantile of a random variable

Let $Y$ be a real-valued random variable with cumulative distribution function $F_{Y}(y)=P(Y\leq y)$ . The $\tau$ th quantile of Y is given by

q_{Y}(\tau )=F_{Y}^{-1}(\tau )=\inf \left\{y:F_{Y}(y)\geq \tau \right\}

where $0<\tau <1$ .

Define the loss function as

\rho _{\tau }(u)=u(\tau -\mathbb {I} _{(u<0)})={\begin{cases}(\tau -1)u,&{\text{if }}u<0,\\\tau u,&{\text{if }}u\geq 0,\end{cases}}

where $0<\tau <1$ and $\mathbb {I} (\cdot )$ is the indicator function. Note that $\tau >(1-\tau )$ for $\tau >0.5$ , and $\tau <(1-\tau )$ for $\tau <0.5$ . The intuition is that for higher quantiles ( $\tau >0.5$ ) positive residuals are penalized more heavily than negative residuals, and vice versa for lower quantiles, so the loss is asymmetric. For $\tau =0.5$ the penalization is symmetric, which yields a median estimator. A specific quantile can be found by minimizing the expected loss of $Y-u$ with respect to $u$ :^[2]^: 5–6

{\begin{array}{lll}q_{Y}(\tau )&=&{\underset {u}{\mbox{arg min}}}\mathbb {E} (\rho _{\tau }(Y-u))\\&=&{\underset {u}{\mbox{arg min}}}\int _{-\infty }^{+\infty }\rho _{\tau }(y-u)dF_{Y}(y)\\&=&{\underset {u}{\mbox{arg min}}}{\biggl \{}(\tau -1)\int _{-\infty }^{u}(y-u)dF_{Y}(y)+\tau \int _{u}^{\infty }(y-u)dF_{Y}(y){\biggr \}}.\end{array}}

This can be shown by computing the derivative of the expected loss with respect to $u$ via an application of the Leibniz integral rule, setting it to 0, and letting $q_{\tau }$ be the solution of

0=(1-\tau )\int _{-\infty }^{q_{\tau }}dF_{Y}(y)-\tau \int _{q_{\tau }}^{\infty }dF_{Y}(y).

This equation reduces to

0=F_{Y}(q_{\tau })-\tau ,

and then to

F_{Y}(q_{\tau })=\tau .

If the solution $q_{\tau }$ is not unique, then we have to take the smallest such solution to obtain the $\tau$ th quantile of the random variable Y.

Example

Let $Y$ be a discrete random variable that takes values $y_{i}=i$ with $i=1,2,\dots ,9$ with equal probabilities. The task is to find the median of Y, and hence the value $\tau =0.5$ is chosen. Then the expected loss of $Y-u$ is

{\begin{array}{lll}L(u)&=&E(\rho _{\tau }(Y-u))={\frac {(\tau -1)}{9}}\sum _{y_{i}<u}(y_{i}-u)+{\frac {\tau }{9}}\sum _{y_{i}\geq u}(y_{i}-u)\\&=&{\frac {0.5}{9}}{\Bigl (}-\sum _{y_{i}<u}(y_{i}-u)+\sum _{y_{i}\geq u}(y_{i}-u){\Bigr )}.\end{array}}

Since ${0.5/9}$ is a constant, it can be taken out of the expected loss function (this is only true if $\tau =0.5$ ). Then, at u=3,

L(3)\propto \sum _{i=1}^{2}-(i-3)+\sum _{i=3}^{9}(i-3)=[(2+1)+(0+1+2+\dots +6)]=24.

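Suppose that u is increased by 1 unit, from 3 to 4. The three values 1, 2 and 3 each end up one unit farther from u, while the six values 4 to 9 each end up one unit closer, so the expected loss changes by $(3)-(6)=-3$ . If u=5, the expected loss is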

L(5)\propto \sum _{i=1}^{4}i+\sum _{i=0}^{4}i=20,

and any change in u will increase the expected loss. Thus u=5 is the median. The table below shows the expected loss (divided by ${0.5/9}$ ) for different values of u.

u	1	2	3	4	5	6	7	8	9
Expected loss	36	29	24	21	20	21	24	29	36
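
The table can be reproduced numerically. The following is a minimal Python sketch, not part of the cited derivation; the names `pinball_loss`, `scaled_loss` and `best_u` are chosen here purely for illustration.

```python
def pinball_loss(u, tau):
    """Tilted absolute value rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

values = list(range(1, 10))  # Y takes the values 1..9 with equal probability
tau = 0.5

# Expected loss divided by the constant 0.5/9, as in the table above.
scaled_loss = {
    u: round(sum(pinball_loss(y - u, tau) for y in values) / tau)
    for u in values
}
best_u = min(scaled_loss, key=scaled_loss.get)

print(scaled_loss)
print("minimizer:", best_u)
```

The minimizer found this way coincides with the median obtained from the definition $\inf\{y:F_{Y}(y)\geq 0.5\}$ .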

Intuition

Consider $\tau =0.5$ and let q be an initial guess for $q_{\tau }$ . The expected loss evaluated at q is

L(q)=-0.5\int _{-\infty }^{q}(y-q)dF_{Y}(y)+0.5\int _{q}^{\infty }(y-q)dF_{Y}(y).

In order to minimize the expected loss, we move the value of q a little bit to see whether the expected loss will rise or fall. Suppose we increase q by a small amount. Then, to first order and up to the constant factor 0.5, the change in expected loss per unit increase in q is

\int _{-\infty }^{q}1dF_{Y}(y)-\int _{q}^{\infty }1dF_{Y}(y).

The first term of the equation is $F_{Y}(q)$ and second term of the equation is $1-F_{Y}(q)$ . Therefore, the change of expected loss function is negative if and only if $F_{Y}(q)<0.5$ , that is if and only if q is smaller than the median. Similarly, if we reduce q by 1 unit, the change of expected loss function is negative if and only if q is larger than the median.

In order to minimize the expected loss function, we would increase (decrease) q if q is smaller (larger) than the median, until q reaches the median. The idea behind the minimization is to count the number of points (weighted with the density) that are larger or smaller than q and then move q to a point where q is larger than $100\tau$ % of the points.
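
The same argument carries over to a general $\tau$ . As a sketch, assuming that $F_{Y}$ is continuous at q so that the derivative exists, the rate of change of the expected loss is

```latex
\frac{d}{dq}\,\mathbb{E}\,\rho_{\tau}(Y-q)
  = (1-\tau)\int_{-\infty}^{q} dF_{Y}(y) - \tau\int_{q}^{\infty} dF_{Y}(y)
  = (1-\tau)\,F_{Y}(q) - \tau\bigl(1-F_{Y}(q)\bigr)
  = F_{Y}(q) - \tau .
```

The expected loss therefore falls as q increases exactly when $F_{Y}(q)<\tau$ , which agrees with the first-order condition $F_{Y}(q_{\tau })=\tau$ derived earlier.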

Sample quantile

The $\tau$ th sample quantile can be obtained by replacing the expectation with its sample average and solving the following minimization problem

{\begin{array}{lll}{\hat {q}}_{\tau }&=&{\underset {q\in \mathbb {R} }{\mbox{arg min}}}\sum _{i=1}^{n}\rho _{\tau }(y_{i}-q)\\&=&{\underset {q\in \mathbb {R} }{\mbox{arg min}}}\left[(\tau -1)\sum _{y_{i}<q}(y_{i}-q)+\tau \sum _{y_{i}\geq q}(y_{i}-q)\right],\end{array}}

where the function $\rho _{\tau }$ is the tilted absolute value function. The intuition is the same as for the population quantile.
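
As an illustration, the sketch below compares the minimizer of the summed loss with the order-statistic form of the sample quantile. It relies on the fact that the objective is convex and piecewise linear in q, so a minimizer can always be found among the data points. The data and the function names are hypothetical.

```python
import math

def pinball_loss(u, tau):
    """Tilted absolute value rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile_by_loss(ys, tau):
    """Smallest data point that minimizes the summed pinball loss."""
    def total(q):
        return sum(pinball_loss(y - q, tau) for y in ys)
    # min() returns the first minimizer, and the candidates are sorted.
    return min(sorted(ys), key=total)

def sample_quantile_by_rank(ys, tau):
    """inf{y : F_n(y) >= tau}, i.e. the ceil(n * tau)-th order statistic."""
    ordered = sorted(ys)
    return ordered[math.ceil(len(ys) * tau) - 1]

# Hypothetical data, chosen only for illustration.
data = [2.3, 7.1, 0.4, 5.5, 9.8, 3.3, 6.0, 1.2, 8.4, 4.9, 7.7]
for tau in (0.1, 0.25, 0.5, 0.9):
    print(tau, sample_quantile_by_loss(data, tau), sample_quantile_by_rank(data, tau))
```

For these data the two functions agree at every listed $\tau$ .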

Conditional quantile and quantile regression

The $\tau$ th conditional quantile of $Y$ given $X$ is the $\tau$ th quantile of the conditional probability distribution of $Y$ given $X$ ,

Q_{Y|X}(\tau )=\inf \left\{y:F_{Y|X}(y)\geq \tau \right\}

We use a capital $Q$ to denote the conditional quantile to indicate that it is a random variable.

In quantile regression for the $\tau$ th quantile we make the assumption that the $\tau$ th conditional quantile is given as a linear function of the explanatory variables:

Q_{Y|X}(\tau )=X\beta _{\tau }

Given the distribution function of $Y$ , $\beta _{\tau }$ can be obtained by solving

\beta _{\tau }={\underset {\beta \in \mathbb {R} ^{k}}{\mbox{arg min}}}E(\rho _{\tau }(Y-X\beta )).

Solving the sample analog gives the estimator of $\beta _{\tau }$ :

{\hat {\beta _{\tau }}}={\underset {\beta \in \mathbb {R} ^{k}}{\mbox{arg min}}}\sum _{i=1}^{n}(\rho _{\tau }(Y_{i}-X_{i}\beta )).

Note that when $\tau =0.5$ , the loss function $\rho _{\tau }$ is proportional to the absolute value function, and thus median regression is the same as linear regression by least absolute deviations.

Computation of estimates for regression parameters

The mathematical forms arising from quantile regression are distinct from those arising in the method of least squares. The method of least squares leads to a consideration of problems in an inner product space, involving projection onto subspaces, and thus the problem of minimizing the squared errors can be reduced to a problem in numerical linear algebra. Quantile regression does not have this structure, and instead the minimization problem can be reformulated as a linear programming problem

{\underset {\beta ,u^{+},u^{-}\in \mathbb {R} ^{k}\times \mathbb {R} _{+}^{2n}}{\min }}\left\{\tau 1_{n}^{'}u^{+}+(1-\tau )1_{n}^{'}u^{-}|X\beta +u^{+}-u^{-}=Y\right\},

where, writing $u_{j}=Y_{j}-X_{j}\beta$ for the $j$ th residual,

u_{j}^{+}=\max(u_{j},0)

u_{j}^{-}=-\min(u_{j},0).

Simplex methods^[2]^: 181 or interior point methods^[2]^: 190 can be applied to solve the linear programming problem.
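
To make the optimization concrete, the following Python sketch fits a line $y=a+bx$ by brute force. It is neither the simplex nor an interior point method. Instead it uses the linear programming fact that, when an optimum exists, one can be found at a vertex of the feasible region, and for this problem a vertex corresponds to a fit passing exactly through k observations (here k = 2). Enumerating all pairs is only practical for very small data sets; the data and the function name `fit_line_quantile` are hypothetical.

```python
from itertools import combinations

def pinball_loss(u, tau):
    """Tilted absolute value rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def fit_line_quantile(xs, ys, tau):
    """Brute-force quantile regression for y = a + b*x (k = 2)."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # two points with the same x do not define a line
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(pinball_loss(y - (a + b * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best  # (objective value, intercept, slope)

# Hypothetical data: y = 1 + 2x with one gross outlier in the response.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 9, 11, 100]
loss, a, b = fit_line_quantile(xs, ys, tau=0.5)
print(a, b, loss)
```

For these data the median regression line is the one through the six collinear points, so the single outlier does not pull the fit toward itself, which illustrates the robustness mentioned earlier.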

Asymptotic properties

For $\tau \in (0,1)$ , under some regularity conditions, ${\hat {\beta }}_{\tau }$ is asymptotically normal:

{\sqrt {n}}({\hat {\beta }}_{\tau }-\beta _{\tau }){\overset {d}{\rightarrow }}N(0,\tau (1-\tau )D^{-1}\Omega _{x}D^{-1}),

where

D=E(f_{Y|X}(X\beta _{\tau })X^{\prime }X)

and

\Omega _{x}=E(X^{\prime }X).

Direct estimation of the asymptotic variance-covariance matrix is not always satisfactory. Inference for quantile regression parameters can be made with the regression rank-score tests or with the bootstrap methods.^[10]
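
As a rough numerical check of the limiting variance, consider the intercept-only case, where the regressor is the constant 1 and the sandwich formula reduces to $\tau (1-\tau )/f_{Y}(q_{\tau })^{2}$ . The simulation below is a sketch under the assumption that Y is standard normal and $\tau =0.5$ ; the sample size, number of replications and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)
tau, n, reps = 0.5, 401, 2000

# Intercept-only model with Y ~ N(0, 1): the true median is 0 and the
# density at the median is f(0) = 1 / sqrt(2 * pi).
density_at_median = 1.0 / math.sqrt(2.0 * math.pi)
asymptotic_var = tau * (1.0 - tau) / density_at_median ** 2  # equals pi / 2

scaled_errors = []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    # sqrt(n) * (estimate - true value), with true value 0
    scaled_errors.append(math.sqrt(n) * statistics.median(sample))

simulated_var = statistics.pvariance(scaled_errors)
print(asymptotic_var, simulated_var)
```

The simulated variance should be close to, but not exactly equal to, the asymptotic value, since n is finite and the number of replications is limited.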

Equivariance

See invariant estimator for background on invariance or see equivariance.

Scale equivariance

For any $a>0$ and $\tau \in [0,1]$

{\hat {\beta }}(\tau ;aY,X)=a{\hat {\beta }}(\tau ;Y,X),

{\hat {\beta }}(\tau ;-aY,X)=-a{\hat {\beta }}(1-\tau ;Y,X).

Shift equivariance

For any $\gamma \in R^{k}$ and $\tau \in [0,1]$

{\hat {\beta }}(\tau ;Y+X\gamma ,X)={\hat {\beta }}(\tau ;Y,X)+\gamma .

Equivariance to reparameterization of design

Let $A$ be any $k\times k$ nonsingular matrix and $\tau \in [0,1]$

{\hat {\beta }}(\tau ;Y,XA)=A^{-1}{\hat {\beta }}(\tau ;Y,X).
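
These properties can be checked on a small example. The sketch below uses exact rational arithmetic and a brute-force fit of $y=b_{0}+b_{1}x$ over all lines through two observations, which is an illustrative device and not how production software computes the estimate. It checks scale equivariance for one positive a, shift equivariance for one $\gamma$ , and reparameterization for the diagonal matrix that rescales the regressor by c; the data and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def pinball_loss(u, tau):
    """Tilted absolute value rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1 if u < 0 else 0))

def fit(xs, ys, tau):
    """Exact brute-force fit of y = b0 + b1*x; returns (b0, b1)."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # two points with the same x do not define a line
        b1 = Fraction(ys[j] - ys[i], xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        loss = sum(pinball_loss(y - (b0 + b1 * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]

# Hypothetical integer data so that all arithmetic stays exact.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 9, 4, 12, 7, 15, 11, 20]
tau = Fraction(3, 10)
b0, b1 = fit(xs, ys, tau)

# Scale equivariance: multiply the response by a > 0.
a = Fraction(5, 2)
scale_ok = fit(xs, [a * y for y in ys], tau) == (a * b0, a * b1)

# Shift equivariance: add X*gamma with gamma = (g0, g1).
g0, g1 = 3, -2
shifted = [y + g0 + g1 * x for x, y in zip(xs, ys)]
shift_ok = fit(xs, shifted, tau) == (b0 + g0, b1 + g1)

# Reparameterization with A = diag(1, c): the regressor is rescaled by c.
c = 3
reparam_ok = fit([c * x for x in xs], ys, tau) == (b0, b1 / c)

print(scale_ok, shift_ok, reparam_ok)
```

Exact fractions are used so that the comparisons are not affected by floating-point rounding.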

Invariance to monotone transformations

If $h$ is a nondecreasing function on $\mathbb {R}$ , the following invariance property applies:

h(Q_{Y|X}(\tau ))\equiv Q_{h(Y)|X}(\tau ).

Example (1):

If $W=\exp(Y)$ and $Q_{Y|X}(\tau )=X\beta _{\tau }$ , then $Q_{W|X}(\tau )=\exp(X\beta _{\tau })$ . The mean regression does not have the same property since $\operatorname {E} (\ln(Y))\neq \ln(\operatorname {E} (Y)).$
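
The same contrast between quantiles and means can be seen at the sample level. The following sketch uses the order-statistic sample quantile and hypothetical data; it is an illustration of the invariance property, not a regression fit.

```python
import math

def sample_quantile(ys, tau):
    """inf{y : F_n(y) >= tau}: the ceil(n * tau)-th order statistic."""
    ordered = sorted(ys)
    return ordered[math.ceil(len(ys) * tau) - 1]

def mean(values):
    return sum(values) / len(values)

# Hypothetical data, chosen only for illustration.
ys = [-1.2, 0.3, 0.8, 1.5, 2.1, 2.2, 3.7]
tau = 0.75
ws = [math.exp(y) for y in ys]  # W = exp(Y), a monotone transformation

# The quantile commutes with the monotone transformation ...
quantile_commutes = sample_quantile(ws, tau) == math.exp(sample_quantile(ys, tau))
# ... whereas the mean does not.
mean_commutes = math.isclose(mean(ws), math.exp(mean(ys)))

print(quantile_commutes, mean_commutes)
```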

Inference

Interpretation of the slope parameters

The linear model $Q_{Y|X}(\tau )=X\beta _{\tau }$ mis-specifies the true systematic relation $Q_{Y|X}(\tau )=f(X,\tau )$ when $f(\cdot ,\tau )$ is nonlinear. However, $Q_{Y|X}(\tau )=X\beta _{\tau }$ minimizes a weighted distance to $f(X,\tau )$ among linear models.^[11] Furthermore, the slope parameters $\beta _{\tau }$ of the linear model can be interpreted as weighted averages of the derivatives $\nabla f(X,\tau )$ so that $\beta _{\tau }$ can be used for causal inference.^[12] Specifically, the hypothesis $H_{0}:\nabla f(x,\tau )=0$ for all $x$ implies the hypothesis $H_{0}:\beta _{\tau }=0$ , which can be tested using the estimator ${\hat {\beta _{\tau }}}$ and its limit distribution.

Goodness of fit

The goodness of fit for quantile regression for the $\tau$ th quantile can be defined as:^[13] $R^{1}(\tau )=1-{\frac {{\hat {V}}_{\tau }}{{\tilde {V}}_{\tau }}},$ where ${\hat {V}}_{\tau }$ is the minimized expected loss under the full model, while ${\tilde {V}}_{\tau }$ is the minimized expected loss under the intercept-only model.
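
A sample version of this measure can be sketched as follows, under the assumption that both quantities are evaluated as sums of the tilted absolute value loss over the observations and that the intercept-only fit is the sample quantile of the response. The function name `r1`, the data and the fitted values are hypothetical.

```python
import math

def pinball_loss(u, tau):
    """Tilted absolute value rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def r1(ys, fitted, tau):
    """1 - V_hat / V_tilde, computed from sample losses."""
    ordered = sorted(ys)
    q = ordered[math.ceil(len(ys) * tau) - 1]  # intercept-only fit
    v_hat = sum(pinball_loss(y - f, tau) for y, f in zip(ys, fitted))
    v_tilde = sum(pinball_loss(y - q, tau) for y in ys)
    return 1.0 - v_hat / v_tilde

ys = [1.0, 2.0, 4.0, 7.0, 11.0]
fitted = [1.5, 2.5, 4.0, 6.0, 10.0]  # hypothetical fitted median values
print(r1(ys, fitted, tau=0.5))
```

A perfect fit gives a value of 1, while fitted values equal to the unconditional sample quantile give 0.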

Variants

Bayesian methods for quantile regression

Because quantile regression does not normally assume a parametric likelihood for the conditional distributions of Y|X, the Bayesian methods work with a working likelihood. A convenient choice is the asymmetric Laplacian likelihood,^[14] because the mode of the resulting posterior under a flat prior is the usual quantile regression estimate. The posterior inference, however, must be interpreted with care. Yang, Wang and He^[15] provided a posterior variance adjustment for valid inference. In addition, Yang and He^[16] showed that one can have asymptotically valid posterior inference if the working likelihood is chosen to be the empirical likelihood.

Machine learning methods for quantile regression

Beyond simple linear regression, several machine learning methods can be extended to quantile regression. Switching from the squared error to the tilted absolute value loss function (also known as the pinball loss^[17]) allows gradient descent-based learning algorithms to learn a specified quantile instead of the mean. This means that neural network and deep learning algorithms can be applied to quantile regression,^[18]^[19] which is then referred to as nonparametric quantile regression.^[20] Tree-based learning algorithms are also available for quantile regression (see, e.g., Quantile Regression Forests,^[21] a simple generalization of Random Forests).
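
The effect of swapping the loss can be seen with the simplest possible model, a single constant prediction q trained by subgradient descent on the pinball loss. The sketch below is a toy illustration, not a neural network; the data, learning rate and iteration count are arbitrary assumptions.

```python
def pinball_subgradient(y, q, tau):
    """Subgradient of rho_tau(y - q) with respect to the prediction q."""
    if y > q:
        return -tau        # under-prediction: push q upward
    if y < q:
        return 1.0 - tau   # over-prediction: push q downward
    return 0.0

ys = [float(v) for v in range(1, 101)]  # hypothetical data: 1, 2, ..., 100
tau, learning_rate = 0.9, 1.0
q = 0.0  # constant predictor

for _ in range(2000):
    grad = sum(pinball_subgradient(y, q, tau) for y in ys) / len(ys)
    q -= learning_rate * grad

print(q)
```

With these settings q settles between the 90th and 91st ordered values, every point of which minimizes the summed loss for $\tau =0.9$ . Using the squared error in the same loop would instead drive q toward the mean.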

Censored quantile regression

If the response variable is subject to censoring, the conditional mean is not identifiable without additional distributional assumptions, but the conditional quantile is often identifiable. For recent work on censored quantile regression, see Portnoy^[22] and Wang and Wang.^[23]

Example (2):

Let $Y^{c}=\max(0,Y)$ and $Q_{Y|X}(\tau )=X\beta _{\tau }$ . Then $Q_{Y^{c}|X}(\tau )=\max(0,X\beta _{\tau })$ . This is the censored quantile regression model: estimated values can be obtained without making any distributional assumptions, but at the cost of computational difficulty,^[24] some of which can be avoided by using a simple three-step censored quantile regression procedure as an approximation.^[25]

For random censoring on the response variables, the censored quantile regression of Portnoy (2003)^[22] provides consistent estimates of all identifiable quantile functions based on reweighting each censored point appropriately.

Censored quantile regression has close links to survival analysis.

Heteroscedastic errors

The quantile regression loss needs to be adapted in the presence of heteroscedastic errors in order to be efficient.^[26]

Implementations

Numerous statistical software packages include implementations of quantile regression:

Matlab function quantreg^[27]
gretl has the quantreg command.^[28]
R offers several packages that implement quantile regression, most notably quantreg by Roger Koenker,^[29] but also gbm,^[30] quantregForest,^[31] qrnn^[32] and qgam^[33]
Python, via Scikit-garden^[34] and statsmodels^[35]
SAS through proc quantreg (ver. 9.2)^[36] and proc quantselect (ver. 9.3).^[37]
Stata, via the qreg command.^[38]^[39]
Vowpal Wabbit, via --loss_function quantile.^[40]
Mathematica package QuantileRegression.m^[41] hosted at the MathematicaForPrediction project at GitHub.
Wolfram Language function QuantileRegression^[42] hosted at Wolfram Function Repository.

Literature

The Wikibook R Programming has a page on the topic of: Quantile Regression

Angrist, Joshua D.; Pischke, Jörn-Steffen (2009). "Quantile Regression". Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press. pp. 269–291. ISBN 978-0-691-12034-8.
Koenker, Roger (2005). Quantile Regression. Cambridge University Press. ISBN 978-0-521-60827-5.
