To extend LDA to non-linear mappings, the data, given as the
points
can be mapped to a new feature space,
via some function
In this new feature space, the function that needs to be maximized is[1]

where

and

Further, note that
. Explicitly computing the mappings
and then performing LDA can be computationally expensive, and in many cases intractable. For example,
may be infinite dimensional. Thus, rather than explicitly mapping the data to
, the data can be implicitly embedded by rewriting the algorithm in terms of dot products and using kernel functions in which the dot product in the new feature space is replaced by a kernel function,
.
LDA can be reformulated in terms of dot products by first noting that
will have an expansion of
the form[5]

Then note that

where

The numerator of
can then be written as:

Similarly, the denominator can be written as

with the
component of
defined as
is the identity matrix, and
the matrix with all entries equal to
. This identity can be derived by starting out with the expression for
and using the expansion of
and the definitions of
and 
![{\displaystyle {\begin{aligned}\mathbf {w} ^{\text{T}}\mathbf {S} _{W}^{\phi }\mathbf {w} &=\left(\sum _{i=1}^{l}\alpha _{i}\phi ^{\text{T}}(\mathbf {x} _{i})\right)\left(\sum _{j=1,2}\sum _{n=1}^{l_{j}}\left(\phi (\mathbf {x} _{n}^{j})-\mathbf {m} _{j}^{\phi }\right)\left(\phi (\mathbf {x} _{n}^{j})-\mathbf {m} _{j}^{\phi }\right)^{\text{T}}\right)\left(\sum _{k=1}^{l}\alpha _{k}\phi (\mathbf {x} _{k})\right)\\&=\sum _{j=1,2}\sum _{i=1}^{l}\sum _{n=1}^{l_{j}}\sum _{k=1}^{l}\left(\alpha _{i}\phi ^{\text{T}}(\mathbf {x} _{i})\left(\phi (\mathbf {x} _{n}^{j})-\mathbf {m} _{j}^{\phi }\right)\left(\phi (\mathbf {x} _{n}^{j})-\mathbf {m} _{j}^{\phi }\right)^{\text{T}}\alpha _{k}\phi (\mathbf {x} _{k})\right)\\&=\sum _{j=1,2}\sum _{i=1}^{l}\sum _{n=1}^{l_{j}}\sum _{k=1}^{l}\left(\alpha _{i}k(\mathbf {x} _{i},\mathbf {x} _{n}^{j})-{\frac {1}{l_{j}}}\sum _{p=1}^{l_{j}}\alpha _{i}k(\mathbf {x} _{i},\mathbf {x} _{p}^{j})\right)\left(\alpha _{k}k(\mathbf {x} _{k},\mathbf {x} _{n}^{j})-{\frac {1}{l_{j}}}\sum _{q=1}^{l_{j}}\alpha _{k}k(\mathbf {x} _{k},\mathbf {x} _{q}^{j})\right)\\&=\sum _{j=1,2}\left(\sum _{i=1}^{l}\sum _{n=1}^{l_{j}}\sum _{k=1}^{l}\left(\alpha _{i}\alpha _{k}k(\mathbf {x} _{i},\mathbf {x} _{n}^{j})k(\mathbf {x} _{k},\mathbf {x} _{n}^{j})-{\frac {2\alpha _{i}\alpha _{k}}{l_{j}}}\sum _{p=1}^{l_{j}}k(\mathbf {x} _{i},\mathbf {x} _{n}^{j})k(\mathbf {x} _{k},\mathbf {x} _{p}^{j})+{\frac {\alpha _{i}\alpha _{k}}{l_{j}^{2}}}\sum _{p=1}^{l_{j}}\sum _{q=1}^{l_{j}}k(\mathbf {x} _{i},\mathbf {x} _{p}^{j})k(\mathbf {x} _{k},\mathbf {x} _{q}^{j})\right)\right)\\&=\sum _{j=1,2}\left(\sum _{i=1}^{l}\sum _{n=1}^{l_{j}}\sum _{k=1}^{l}\left(\alpha _{i}\alpha _{k}k(\mathbf {x} _{i},\mathbf {x} _{n}^{j})k(\mathbf {x} _{k},\mathbf {x} _{n}^{j})-{\frac {\alpha _{i}\alpha _{k}}{l_{j}}}\sum _{p=1}^{l_{j}}k(\mathbf {x} _{i},\mathbf {x} _{n}^{j})k(\mathbf {x} _{k},\mathbf {x} _{p}^{j})\right)\right)\\[6pt]&=\sum _{j=1,2}\mathbf {\alpha } ^{\text{T}}\mathbf {K} _{j}\mathbf {K} _{j}^{\text{T}}\mathbf {\alpha } -\mathbf {\alpha } ^{\text{T}}\mathbf {K} _{j}\mathbf {1} _{l_{j}}\mathbf {K} _{j}^{\text{T}}\mathbf {\alpha } \\[4pt]&=\mathbf {\alpha } ^{\text{T}}\mathbf {N} \mathbf {\alpha } .\end{aligned}}}](//wikimedia.org/api/rest_v1/media/math/render/svg/45ff8395147305e90a7f29a1f711d8ebe9e98b23)
With these equations for the numerator and denominator of
, the equation for
can be rewritten as

Then, differentiating and setting equal to zero gives

Since only the direction of
, and hence the direction of
matters, the above can be solved for
as

Note that in practice,
is usually singular and so a multiple of the identity is added to it [1]

Given the solution for
, the projection of a new data point is given by[1]
