Generalized suffix tree

From Wikipedia, the free encyclopedia

In computer science, a generalized suffix tree is a suffix tree for a set of strings. Given a set of strings $D=\{S_{1},S_{2},\dots ,S_{d}\}$ of total length $n$, it is a Patricia tree containing all $n$ suffixes of the strings. It is mostly used in bioinformatics.^[1]

Figure: Suffix tree for the strings ABAB and BABA. Suffix links not shown.

Functionality

A generalized suffix tree can be built in $\Theta (n)$ time and space, and can be used to find all $z$ occurrences of a string $P$ of length $m$ in $O(m+z)$ time, which is asymptotically optimal (assuming the size of the alphabet is constant^[2]^: 119).

When constructing such a tree, each string should be padded with a unique out-of-alphabet marker symbol (or string) so that no suffix is a prefix of another suffix, which guarantees that each suffix is represented by a unique leaf node.

Algorithms for constructing a generalized suffix tree include Ukkonen's algorithm (1995) and McCreight's algorithm (1976).
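
To make the role of the terminators and the search procedure concrete, the following is a minimal sketch in Python. It is not Ukkonen's or McCreight's algorithm: it builds an uncompressed suffix trie by inserting every suffix one at a time, which takes quadratic time and space, and collecting the hits walks the whole subtree rather than achieving the $O(m+z)$ bound. The names `Node`, `build_generalized_suffix_trie` and `find_occurrences` are hypothetical and chosen only for illustration.

```python
class Node:
    """One trie node; `leaf` holds (string number, start) if a suffix ends here."""

    def __init__(self):
        self.children = {}
        self.leaf = None


def build_generalized_suffix_trie(strings):
    """Naive construction: insert every suffix of every padded string.

    Assumption: this is a quadratic-time illustration, not a linear-time
    suffix tree construction.
    """
    root = Node()
    for doc_id, s in enumerate(strings):
        # The terminator is a tuple, so it cannot collide with any character
        # of the alphabet and is unique for each string.
        padded = list(s) + [("$", doc_id)]
        for start in range(len(s)):
            node = root
            for symbol in padded[start:]:
                node = node.children.setdefault(symbol, Node())
            node.leaf = (doc_id, start)
    return root


def find_occurrences(root, pattern):
    """Return sorted (string number, start) pairs for every occurrence."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    # Every leaf below the matched node is a suffix that starts with pattern.
    hits, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.leaf is not None:
            hits.append(current.leaf)
        stack.extend(current.children.values())
    return sorted(hits)


trie = build_generalized_suffix_trie(["ABAB", "BABA"])
print(find_occurrences(trie, "AB"))  # [(0, 0), (0, 2), (1, 1)]
```

Because each padded string ends in its own terminator, every inserted suffix ends at a distinct leaf, so each occurrence is reported exactly once.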

Example

The figure above shows a suffix tree for the strings ABAB and BABA, padded with the unique terminator strings $0 and $1. The numbers in the leaf nodes are the string number and the starting position. A left-to-right traversal of the leaf nodes corresponds to the sorted order of the suffixes. The terminators may be strings or unique single symbols. Edges on $ from the root are left out in this example.
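
The sorted order of the suffixes can be reproduced with a short Python sketch. It assumes the terminator character sorts before the letters (as `$` does in ASCII), that the strings themselves do not contain `$`, and that there are fewer than ten strings so each terminator string stays unambiguous. The function name `sorted_suffixes` is hypothetical. Suffixes consisting only of a terminator are skipped, matching the omitted `$` edges.

```python
def sorted_suffixes(strings):
    """Return (suffix, string number, start) triples in lexicographic order."""
    entries = []
    for doc_id, s in enumerate(strings):
        padded = f"{s}${doc_id}"  # e.g. "ABAB" becomes "ABAB$0"
        for start in range(len(s)):
            entries.append((padded[start:], doc_id, start))
    return sorted(entries)


for suffix, doc_id, start in sorted_suffixes(["ABAB", "BABA"]):
    print(f"{suffix:8} string {doc_id}, position {start}")
# A$1      string 1, position 3
# AB$0     string 0, position 2
# ABA$1    string 1, position 1
# ABAB$0   string 0, position 0
# B$0      string 0, position 3
# BA$1     string 1, position 2
# BAB$0    string 0, position 1
# BABA$1   string 1, position 0
```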

Alternatives

An alternative to building a generalized suffix tree is to concatenate the strings and build a regular suffix tree or suffix array for the resulting string. When hits are evaluated after a search, global positions are mapped to documents and local positions with some algorithm and/or data structure, such as a binary search over the starting or ending positions of the documents.
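
The position mapping can be sketched in Python with the standard `bisect` module. In this illustration, `str.find` stands in for the search in a suffix tree or suffix array, and the separator `"\x00"` is an assumption: any symbol that occurs in neither the documents nor the pattern would do. The function names `concatenate`, `locate` and `find_all` are hypothetical.

```python
from bisect import bisect_right


def concatenate(strings, sep="\x00"):
    """Join the documents and record where each one starts in the result."""
    starts, parts, offset = [], [], 0
    for s in strings:
        starts.append(offset)
        parts.append(s + sep)
        offset += len(s) + len(sep)
    return "".join(parts), starts


def locate(starts, global_pos):
    """Map a global position to (document number, local position)."""
    # Binary search for the last document that starts at or before global_pos.
    doc_id = bisect_right(starts, global_pos) - 1
    return doc_id, global_pos - starts[doc_id]


def find_all(text, starts, pattern):
    """Find every occurrence; str.find is a stand-in for an index lookup."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(locate(starts, pos))
        pos = text.find(pattern, pos + 1)
    return hits


text, starts = concatenate(["ABAB", "BABA"])
print(find_all(text, starts, "AB"))  # [(0, 0), (0, 2), (1, 1)]
```

Because the separator does not occur in the pattern, no reported hit can span two documents, and each lookup costs one binary search over the document start positions.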
