Top Qs
Timeline
Chat
Perspective

byte pair encoding

From Wiktionary, the free dictionary

Remove ads

English

Alternative forms

Noun

byte pair encoding (countable and uncountable, plural byte pair encodings)

  1. (computing) A lossless data compression algorithm that iteratively replaces the most frequent pair of adjacent bytes in a sequence with a new byte not already present in the data.
  2. (natural language processing) A subword tokenization method that iteratively merges the most frequent pairs of adjacent characters in a corpus to form longer and more meaningful tokens, typically until a predefined vocabulary size is reached.
Remove ads

Wikiwand - on

Seamless Wikipedia browsing. On steroids.

Remove ads