
Learned sparse retrieval

Document search algorithm. From Wikipedia, the free encyclopedia.


Learned sparse retrieval or sparse neural search is an approach to information retrieval that uses sparse vector representations of queries and documents.[1] It borrows techniques from both lexical bag-of-words and vector embedding algorithms, and is claimed to perform better than either alone. The best-known sparse neural search systems are SPLADE[2] and its successor SPLADE v2.[3] Others include DeepCT,[4] uniCOIL,[5] EPIC,[6] DeepImpact,[7] TILDE and TILDEv2,[8] Sparta,[9] SPLADE-max, and DistilSPLADE-max.[3]
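The core idea can be sketched in a few lines: each query and document becomes a sparse mapping from vocabulary terms to learned non-negative weights, and relevance is the dot product over shared terms. The following is an illustrative sketch, not code from any of the cited systems; the terms and weights are invented for demonstration.

```python
# Illustrative sketch: queries and documents as sparse term-weight mappings,
# scored by a dot product over the terms they share. In practice these
# weights are produced by a trained neural model, not set by hand.

def sparse_dot(query, doc):
    """Score a document against a query via sparse dot product."""
    # Iterate over the smaller vector for efficiency.
    if len(doc) < len(query):
        query, doc = doc, query
    return sum(w * doc[t] for t, w in query.items() if t in doc)

# Hypothetical learned weights: the document carries the expansion term
# "physician", which a purely lexical bag-of-words match would miss if
# the query said only "doctor".
query = {"doctor": 1.4, "physician": 0.6}
doc = {"physician": 1.1, "hospital": 0.9, "patient": 0.7}

print(sparse_dot(query, doc))  # only "physician" overlaps: 0.6 * 1.1
```

Because the representations are sparse over the vocabulary, scoring like this can be served from a standard inverted index rather than a dense vector store.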

There are also extensions of sparse retrieval approaches to the vision-language domain, where these methods are applied to multimodal data, such as combining text with images.[10] This expansion enables the retrieval of relevant content across different modalities, such as finding images based on text queries or vice versa.

Some implementations of SPLADE have similar latency to Okapi BM25 lexical search while giving as good results as state-of-the-art neural rankers on in-domain data.[11]

The official SPLADE model weights and training code are released under a Creative Commons NonCommercial license.[12] However, independent implementations of SPLADE++ (a SPLADE variant) are available under permissive licenses.

SPRINT is a toolkit for evaluating neural sparse retrieval systems.[13]


SPLADE

SPLADE (Sparse Lexical and Expansion Model) is a neural retrieval model that learns sparse vector representations for queries and documents, combining elements of traditional lexical matching with semantic representations derived from transformer-based architectures.[14] Unlike dense retrieval models that rely on continuous vector spaces, SPLADE produces sparse outputs that are compatible with inverted index structures commonly used in information retrieval systems.[14]
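A minimal sketch of how such sparse outputs arise: SPLADE passes the input through a transformer masked-language-model head, obtaining a matrix of per-token logits over the whole vocabulary, applies a log-saturated ReLU, and pools over the input tokens to get one weight per vocabulary term. The sketch below assumes the logits are already computed (here they are random stand-ins) and omits the sparsity regularization used in training.

```python
import numpy as np

# Illustrative sketch of SPLADE-style term weighting, assuming the MLM-head
# logits are already available as a (num_input_tokens x vocab_size) matrix.
# Random values stand in for real model outputs here.

def splade_weights(logits, pooling="max"):
    """Collapse per-token vocabulary logits into one sparse vocab vector.

    Applies log(1 + ReLU(x)) saturation, then pools over input tokens:
    sum pooling as in the original SPLADE, max pooling as in SPLADE v2.
    """
    sat = np.log1p(np.maximum(logits, 0.0))  # log-saturated ReLU, zeroes negatives
    if pooling == "max":
        return sat.max(axis=0)
    return sat.sum(axis=0)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))  # 4 input tokens, toy 10-word vocabulary
w = splade_weights(logits)
print(np.count_nonzero(w), "of", w.size, "vocabulary terms active")
```

The ReLU zeroes out most vocabulary entries, so the pooled vector is sparse and each nonzero dimension corresponds to a real vocabulary term, which is what makes the output indexable by a conventional inverted index.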

The original SPLADE model was introduced at the 44th International ACM SIGIR Conference in 2021.[14] An updated version, SPLADE v2, incorporated modifications to its pooling mechanisms, document expansion strategies, and training objectives using knowledge distillation. Empirical evaluations have shown improvements on benchmarks such as the TREC Deep Learning 2019 dataset and the BEIR benchmark suite.[15]

These models aim to maintain retrieval efficiency comparable to traditional sparse methods while enhancing semantic matching capabilities, offering a balance between effectiveness and computational cost.[16]


