# Probabilistic latent semantic analysis

## From Wikipedia, the free encyclopedia

**Probabilistic latent semantic analysis** (**PLSA**), also known as **probabilistic latent semantic indexing** (**PLSI**, especially in information retrieval circles) is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis, from which PLSA evolved.

Compared to standard latent semantic analysis which stems from linear algebra and downsizes the occurrence tables (usually via a singular value decomposition), probabilistic latent semantic analysis is based on a mixture decomposition derived from a latent class model.