Sunday algorithm
2013-12-15 23:36
Idea
http://www.iti.fh-flensburg.de/lang/algorithmen/pattern/sundayen.htm
The Boyer-Moore algorithm bases its bad-character heuristics on the text symbol that caused a mismatch. The Horspool algorithm uses the rightmost symbol of the current text window. Sunday [Sun 90] observed that it may be even better to use the symbol directly to the right of the text window, since this symbol is involved in the next possible match of the pattern in any case.
Example:
[Figure: panels (a) Boyer-Moore, (b) Horspool, (c) Sunday]
c a b is the current text window that is compared with the pattern. Its suffix a b has matched, but the comparison c-a causes a mismatch. The bad-character heuristics of the Boyer-Moore algorithm (a) uses the "bad" text character c to determine the shift distance.
The Horspool algorithm (b) uses the rightmost character b of the current text window. The Sunday algorithm (c) uses the character directly right of the text window, namely d in this example. Since d does not occur in the pattern at all, the pattern can be
shifted past this position.
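The shift rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function name sunday_shift and its parameters are hypothetical, and the rightmost occurrence is found by a plain scan rather than a precomputed table.

```c
/* Hypothetical helper: shift distance for a pattern p of length m, given the
   text symbol c that stands directly to the right of the current window. */
int sunday_shift(const char *p, int m, char c)
{
    int j, last = -1;

    /* rightmost position of c in p, or -1 if c does not occur in p */
    for (j = 0; j < m; j++)
        if (p[j] == c)
            last = j;

    /* align that occurrence with c; if there is none, move past c entirely */
    return m - last;
}
```

For the pattern bcaab (m = 5), a symbol d to the right of the window gives a shift of 6, so the pattern is moved completely past d, whereas a symbol b gives a shift of only 1.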
Like the Boyer-Moore and Horspool algorithms, the Sunday algorithm attains its best case when the first comparison in every window hits a text symbol that does not occur in the pattern at all. The algorithm then performs only O(n/m) comparisons, where n is the length of the text and m the length of the pattern.
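To make the best case concrete, the following sketch counts symbol comparisons. It assumes 8-bit symbols and a left-to-right comparison inside the window; the name sunday_comparisons is hypothetical and not part of the original text.

```c
/* Hypothetical helper: number of symbol comparisons made by a Sunday search
   for p (length m) in t (length n), comparing each window left to right. */
long sunday_comparisons(const char *t, int n, const char *p, int m)
{
    int occ[256];               /* assumption: 8-bit alphabet */
    int a, j, i = 0;
    long count = 0;

    for (a = 0; a < 256; a++)
        occ[a] = -1;
    for (j = 0; j < m; j++)
        occ[(unsigned char)p[j]] = j;

    while (i <= n - m) {
        for (j = 0; j < m; j++) {
            count++;                        /* one symbol comparison */
            if (t[i + j] != p[j])
                break;                      /* stop at the first mismatch */
        }
        i += m;                             /* symbol right of the window */
        if (i < n)
            i -= occ[(unsigned char)t[i]];
    }
    return count;
}
```

With a text of twelve x symbols and the pattern abc, every window fails on its first comparison and is shifted by m+1 = 4 positions, so only 3 comparisons are made.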
In contrast to the Boyer-Moore and Horspool algorithms, the pattern symbols need not be compared from right to left; they can be compared in an arbitrary order. For instance, this order can depend on the symbol probabilities, provided they are known. The least probable symbol in the pattern is then compared first, in the hope that it does not match, so that the pattern can be shifted.
The following example shows the comparisons performed if symbol c of the pattern is compared first.
Example:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... |
---|---|---|---|---|---|---|---|---|---|---|
a | b | c | a | b | d | a | a | c | b | a |
b | c | a | a | b |   |   |   |   |   |   |
  |   |   |   |   |   | b | c | a | a | b |
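One possible way to compare the window in an arbitrary order is to pass the order as an array of pattern positions. The sketch below is an assumption for illustration only: the name matches_at_ordered, the order array and the comparison counter cmp do not appear in the original text.

```c
/* Hypothetical helper: returns 1 if p (length m) matches t at position i.
   Pattern positions are compared in the sequence order[0..m-1];
   *cmp receives the number of symbol comparisons performed. */
int matches_at_ordered(const char *t, int i, const char *p, int m,
                       const int *order, int *cmp)
{
    int k;

    *cmp = 0;
    for (k = 0; k < m; k++) {
        int j = order[k];       /* next pattern position to check */
        (*cmp)++;
        if (t[i + j] != p[j])
            return 0;           /* mismatch: give up on this window */
    }
    return 1;
}
```

With the text abcabdaacba, the pattern bcaab and the order {1, 0, 2, 3, 4} (position 1 holds the symbol c), the windows at positions 0 and 6 are each rejected after a single comparison.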
Preprocessing
The occurrence function occ required for the bad-character heuristics is computed in the same way as in the Boyer-Moore algorithm. Given a pattern p, the following function sundayInitocc computes the occurrence function; it is identical to the function bmInitocc.
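void sundayInitocc()
{
    int a, j;   /* int rather than char, so that a<alphabetsize cannot overflow */

    /* -1 marks symbols that do not occur in the pattern */
    for (a=0; a<alphabetsize; a++)
        occ[a]=-1;

    /* later positions overwrite earlier ones, so occ[a] ends up as
       the rightmost position of symbol a in p */
    for (j=0; j<m; j++)
    {
        a=(unsigned char)p[j];
        occ[a]=j;
    }
}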
Searching algorithm
Using a function matchesAt that compares the pattern with the text window in a certain manner depending on the implementation, the searching algorithm looks as follows:
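void sundaySearch()
{
    int i=0;
    while (i<=n-m)
    {
        if (matchesAt(i)) report(i);
        i+=m;   /* i now indexes the symbol directly right of the window */
        /* shift so that the rightmost occurrence of t[i] in p is aligned with it;
           the cast keeps the index non-negative where char is signed */
        if (i<n) i-=occ[(unsigned char)t[i]];
    }
}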
After the statement i+=m, it is necessary to check whether the value of i is at most n-1, since t[i] is accessed subsequently.
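Putting preprocessing and search together, a self-contained variant might look as follows. This is a minimal sketch, not the original implementation: it assumes 8-bit symbols and NUL-terminated strings, uses memcmp in place of matchesAt, stores match positions in a caller-supplied array instead of calling report, and treats an empty pattern as having no matches. The name sunday_search and its parameters are hypothetical.

```c
#include <string.h>

#define ALPHABET_SIZE 256   /* assumption: 8-bit symbols */

/* Hypothetical wrapper: finds all occurrences of p in t. Up to max_out start
   positions are stored in out; the total number of matches is returned. */
int sunday_search(const char *t, const char *p, int *out, int max_out)
{
    int n = (int)strlen(t), m = (int)strlen(p);
    int occ[ALPHABET_SIZE];
    int a, j, i = 0, found = 0;

    if (m == 0)
        return 0;           /* assumption: empty pattern yields no matches */

    /* occurrence function, as in sundayInitocc */
    for (a = 0; a < ALPHABET_SIZE; a++)
        occ[a] = -1;
    for (j = 0; j < m; j++)
        occ[(unsigned char)p[j]] = j;

    while (i <= n - m) {
        if (memcmp(t + i, p, (size_t)m) == 0) {     /* stands in for matchesAt */
            if (found < max_out)
                out[found] = i;
            found++;
        }
        i += m;                                     /* symbol right of window */
        if (i < n)
            i -= occ[(unsigned char)t[i]];
    }
    return found;
}
```

Searching for abra in abracadabra with this sketch yields the two positions 0 and 7. Overlapping matches are found as well: aa occurs three times in aaaa.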
References
[Sun 90] D.M. Sunday: A Very Fast Substring Search Algorithm. Communications of the ACM, 33, 8, 132-142 (1990)
[1] http://www-igm.univ-mlv.fr/~lecroq/string/