# Principal component analysis

The principal components of a collection of points in a real coordinate space are a sequence of ${\displaystyle p}$ unit vectors, where the ${\displaystyle i}$-th vector is the direction of a line that best fits the data while being orthogonal to the first ${\displaystyle i-1}$ vectors. Here, a best-fitting line is defined as one that minimizes the average squared perpendicular distance from the points to the line. These directions constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Principal component analysis is the process of computing the principal components and using them to perform a change of basis on the data, sometimes using only the first few principal components and ignoring the rest.
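The computation described above can be sketched in a few lines of NumPy: the principal directions are obtained here from the singular value decomposition of the centered data matrix, and the change of basis onto them yields linearly uncorrelated coordinates. This is a minimal illustration, not a reference implementation, and all variable names are illustrative.

```python
import numpy as np

# Minimal sketch: principal components via SVD of the centered data matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.3]])  # correlated data

Xc = X - X.mean(axis=0)               # center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                       # rows are the orthonormal principal directions

scores = Xc @ components.T            # change of basis onto the components
cov = np.cov(scores, rowvar=False)
# In the new basis the covariance matrix is diagonal: the individual
# dimensions of the transformed data are linearly uncorrelated.
print(np.allclose(cov, np.diag(np.diag(cov)), atol=1e-8))
```

The rows of `Vt` form an orthonormal basis, so projecting onto them is an orthogonal change of basis rather than a general linear map.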

In data analysis, the first principal component of a set of ${\displaystyle p}$ variables is the derived variable formed as a linear combination of the original variables that explains the most variance; no distributional assumption on the variables is required. The second principal component explains the most variance in what is left once the effect of the first component is removed, and we may proceed through ${\displaystyle p}$ iterations until all the variance is explained. PCA is most commonly used when many of the variables are highly correlated with each other and it is desirable to reduce their number to a smaller set of uncorrelated variables.
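The successive-variance view can be made concrete with the eigendecomposition of the sample covariance matrix: each eigenvalue is the variance explained by the corresponding component, and after ${\displaystyle p}$ components the eigenvalues sum to the total variance. A hedged sketch, with illustrative data:

```python
import numpy as np

# Sketch: components from the eigendecomposition of the sample covariance.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
X[:, 1] += 0.8 * X[:, 0]              # introduce correlation between variables
X[:, 3] += 0.5 * X[:, 2]

C = np.cov(X, rowvar=False)           # p x p sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]     # sort by variance explained, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# After p iterations all the variance is accounted for: the eigenvalues
# sum to the total variance, i.e. the trace of the covariance matrix.
print(np.isclose(eigvals.sum(), np.trace(C)))
```

Sorting the eigenvalues in descending order reproduces the iteration in the text: the first component carries the largest share of variance, the second the largest share of what remains, and so on.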

PCA is used in exploratory data analysis and for making predictive models. It is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components to obtain lower-dimensional data while preserving as much of the data's variation as possible. The first principal component can equivalently be defined as a direction that maximizes the variance of the projected data. The ${\displaystyle i}$-th principal component can be taken as a direction orthogonal to the first ${\displaystyle i-1}$ principal components that maximizes the variance of the projected data.
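Dimensionality reduction as described above amounts to keeping only the first few principal directions. The sketch below, with illustrative names throughout, builds five-dimensional data from two underlying factors and projects it onto the first two components, preserving nearly all of the variation:

```python
import numpy as np

# Sketch: project onto the first k principal components.
rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 2))            # 2 underlying factors
X = latent @ rng.normal(size=(2, 5))          # embedded in 5 dimensions
X += 0.01 * rng.normal(size=X.shape)          # plus a little noise

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
Z = Xc @ Vt[:k].T                 # 300 x 2 lower-dimensional representation
X_hat = Z @ Vt[:k]                # reconstruction from the k components

# Fraction of total variance retained by the first k components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(explained)
```

Because the data are nearly rank two, the two retained components capture almost all of the variance, while the discarded components carry mostly noise.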