Smallest grammar problem

In data compression and the theory of formal languages, the smallest grammar problem is the problem of finding the smallest context-free grammar that generates a given string of characters (but no other string). The size of a grammar is defined by some authors as the number of symbols on the right side of the production rules.^[1] Others also add the number of rules to that.^[2] A grammar that generates only a single string, as required for the solution to this problem, is called a straight-line grammar.^[3]

Every binary string of length $n$ has a grammar of length $O(n/\log n)$ , as expressed using big O notation.^[3] For binary de Bruijn sequences, no better length is possible.^[4]

The (decision version of the) smallest grammar problem is NP-complete.^[1] It can be approximated in polynomial time to within a logarithmic approximation ratio; more precisely, the ratio is $O(\log {\tfrac {n}{g}})$ where $n$ is the length of the given string and $g$ is the size of its smallest grammar. It is hard to approximate to within a constant approximation ratio. An improvement of the approximation ratio to $o(\log n/\log \log n)$ would also improve certain algorithms for approximate addition chains.^[5]

[1]

[2]

[3]

[4]

[5]

Smallest grammar problem

See also

References

External links

Wikiwand - on