Top Qs
Timeline
Chat
Perspective
Raita algorithm
From Wikipedia, the free encyclopedia
Remove ads
In computer science, the Raita algorithm is a string searching algorithm which improves the performance of Boyer–Moore–Horspool algorithm. This algorithm preprocesses the string being searched for the pattern, which is similar to Boyer–Moore string-search algorithm. The searching pattern of particular sub-string in a given string is different from Boyer–Moore–Horspool algorithm. This algorithm was published by Timo Raita in 1991.[1]
|  | This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these messages)
 
 
 | 
Remove ads
Description
Raita algorithm searches for a pattern "P" in a given text "T" by comparing each character of pattern in the given text. Searching will be done as follows. Window for a text "T" is defined as the length of "P".
- First, last character of the pattern is compared with the rightmost character of the window.
- If there is a match, first character of the pattern is compared with the leftmost character of the window.
- If they match again, it compares the middle character of the pattern with middle character of the window.
If everything in the pre-check is successful, then the original comparison starts from the second character to last but one. If there is a mismatch at any stage in the algorithm, it performs the bad character shift function which was computed in pre-processing phase. Bad character shift function is identical to the one proposed in Boyer–Moore–Horspool algorithm.[1]
A modern formulation of a similar pre-check is found in std::string::find, a linear/quadratic string-matcher, in libc++ and libstdc++. Assuming a well-optimized version of memcmp, not skipping characters in the "original comparison" tends to be more efficient as the pattern is likely to be aligned.[2]
Remove ads
C Code for Raita algorithm
#include <limits.h>
#include <stddef.h>
#define ALPHABET_SIZE (1 << CHAR_BITS) /* typically 256 */
/* Preprocessing: the BMH bad-match table. */
static inline void preBmBc(char *pat, size_t lpat, ptrdiff_t bmBc[]) {
  size_t i;
  for (i = 0; i < ALPHABET_SIZE; ++i)
    bmBc[i] = lpat;
  for (i = 0; i < lpat - 1; ++i)
    bmBc[pat[i]] = lpat - i - 1;
}
void RAITA(char *pat, size_t lpat, char *s, size_t n) {
  ptrdiff_t bmBc[ALPHABET_SIZE];
  /* Quick edge cases. */
  if (lpat == 0 || lpat > n)
    return;
  if (lpat == 1) {
    char *match_ptr = s;
    while (match_ptr < s + n) {
      match_ptr = memchr(match_ptr, pat[0], n - (match_ptr - s));
      if (match_ptr != NULL) {
        OUTPUT(match_ptr - s);
        match_ptr++;
      } else
        return;
    }
  }
  preBmBc(pat, lpat, bmBc);
  /* The prematch-window. */
  char firstCh = pat[0];
  char middleCh = pat[lpat / 2];
  char lastCh = pat[lpat - 1];
  /* Searching */
  ptrdiff_t j = 0;
  while (j <= n - m) {
    char c = s[j + lpat - 1];
    /* This could harm data locality on long patterns. For these consider reducing
     * the number of pre-tests, or using more clustered indices. */
    if (lastCh == c && middleCh == s[j + lpat / 2] && firstCh == s[j] &&
        memcmp(&pat[1], &s[j+1], lpat - 2) == 0)
      OUTPUT(j);
    j += bmBc[c];
  }
}
Remove ads
Example
Pattern: abddb
Text:abbaabaabddbabadbb
Pre- Processing stage:
a b d 4 3 1
 Attempt 1:
 abbaabaabddbabadbb
 ....b
 Shift by 4 (bmBc[a])
Comparison of last character of pattern to rightmost character in the window. It's a mismatch and shifted by 4 according to the value in pre-processing stage.
 Attempt 2:
 abbaabaabddbabadbb
     A.d.B
 Shift by 3 (bmBc[b])
Here last and first character of the pattern are matched but middle character is a mismatch. So the pattern is shifted according to the pre-processing stage.
 Attempt 3:
 abbaabaabddbabadbb
        ABDDB
 Shift by 3 (bmBc[b])
We found exact match here but the algorithm continues until it can't move further.
 Attempt 4:
 abbaabaABDDBabadbb
           ....b
 Shift by 4 (bmBc[a])
At this stage, we need to shift by 4 and we can't move the pattern by 4. So, the algorithm terminates. Letters in capital letter are exact match of the pattern in the text.
Complexity
- Pre-processing stage takes O(m) time where "m" is the length of pattern "P".
- Searching stage takes O(mn) time complexity where "n" is the length of text "T".
See also
References
External links
Wikiwand - on
Seamless Wikipedia browsing. On steroids.
Remove ads
