POJ3693 Maximum repetition substring (suffix array)
2012-10-19 23:35
Maximum repetition substring
Time Limit: 1000MS Memory Limit: 65536K
Total Submissions: 4671 Accepted: 1381
Description
The repetition number of a string is defined as the maximum number R such that the string can be partitioned into R same consecutive substrings. For example, the repetition number of "ababab" is 3 and "ababa" is 1.
Given a string containing lowercase letters, you are to find a substring of it with maximum repetition number.
Input
The input consists of multiple test cases. Each test case contains exactly one line, which gives a non-empty string consisting of lowercase letters. The length of the string will not be greater than 100,000.
The last test case is followed by a line containing a '#'.
Output
For each test case, print a line containing the test case number (beginning with 1) followed by the substring of maximum repetition number. If there are multiple substrings of maximum repetition number, print the lexicographically smallest one.
Sample Input
ccabababc
daabbccaa
#
Sample Output
Case 1: ababab
Case 2: aa
Source
2008 Asia Hefei Regional Contest Online by USTC
--------------------
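I've been learning suffix arrays lately, and this problem took me a long time to write...
At first I glanced at someone else's code and it looked like plain brute force: enumerate the start position and the length of the repeating unit. I wrote that and promptly got TLE...
Later I noticed that when enumerating start positions, they only tried multiples of the unit length. Working it out, that makes the enumeration O(n log n), which felt like magic...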
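The O(n log n) count can be made explicit. For a fixed unit length k, the boundaries p = 0, k, 2k, ... with p + k < n number ⌈(n - k)/k⌉, which is at most n/k. Summing over k then gives a harmonic series. This counts only boundary visits and assumes each lcp query costs O(1), for example through an RMQ over height:

```latex
\sum_{k=1}^{\lfloor n/2 \rfloor} \left\lceil \frac{n-k}{k} \right\rceil
\;\le\; \sum_{k=1}^{n} \frac{n}{k} \;=\; n H_n \;=\; O(n \log n)
```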
Also, the test data for this problem is weak, so even an incorrect algorithm gets accepted, for example:
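```cpp
#include <cstdio>

// Incorrect algorithm, accepted only because the test data is weak.
// `s` (the suffix-array object) and `str` (the input buffer) are defined elsewhere in the original program.
int main() {
    int i, j, k, t, ans, pos, len, cas = 0;
    while (scanf("%s", str) != EOF && str[0] != '#') {
        // map 'a'..'z' to 1..26 and terminate with 0
        for (i = 0; str[i]; i++) s.r[i] = str[i] - 'a' + 1;
        s.r[i] = 0;
        s.n = i;
        s.getsa(30);
        s.getheight();
        s.initRMQ();
        ans = len = 1;
        pos = 0;
        for (i = 1; i < s.n; i++) {
            // while nothing repeats, the answer is the smallest single character
            if (ans == 1 && s.r[i] < s.r[pos]) pos = i;
            // only pairs (sa[i-1], sa[j]) whose lcp is exactly t = height[i] are examined
            t = s.height[i];
            for (j = i; j < s.n && t && s.height[j] >= t; j++) {
                k = s.sa[i - 1] - s.sa[j];  // distance between the two suffixes = unit length
                if (k < 0) k = -k;
                if ((t + k) / k > ans) {
                    ans = (t + k) / k;
                    pos = s.sa[i - 1] < s.sa[j] ? s.sa[i - 1] : s.sa[j];
                    len = k;
                }
            }
        }
        printf("Case %d: ", ++cas);
        for (i = 0; i < ans * len; i++) printf("%c", s.r[i + pos] + 'a' - 1);
        printf("\n");
    }
    return 0;
}
```
The rough idea behind it: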
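Suppose the repeated substring starts at i with unit length k, and let j = i + k. Then the repetition count is lcp(i, j)/k + 1 (integer division), i.e. (lcp(i, j) + k)/k, which is the (t + k) / k in the code.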
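Here is a minimal sketch of that count. It assumes the lcp is computed by direct character comparison, and the helper names `naiveLcp` and `repetitionsAt` are hypothetical, not from the post. If lcp(i, i + k) = t, then s[i .. i + t + k) has period k, so the length-k unit fits t/k + 1 whole times:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: length of the longest common prefix of the suffixes
// starting at a and b (a < b), by direct comparison (O(n) per call).
int naiveLcp(const std::string& s, int a, int b) {
    int t = 0;
    while (b + t < (int)s.size() && s[a + t] == s[b + t]) ++t;
    return t;
}

// Number of consecutive copies of the unit s[i .. i + k) starting at i:
// lcp(i, i + k) / k + 1, written as (t + k) / k.
int repetitionsAt(const std::string& s, int i, int k) {
    assert(k >= 1 && i + k <= (int)s.size());
    int t = naiveLcp(s, i, i + k);
    return (t + k) / k;
}
```

For "ababab" with i = 0 and k = 2 the lcp is 4, which gives 3 copies. Dividing by k alone would give 2.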
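Then suffix[i] and suffix[j] must be very similar,
because lcp(i, j) = min{height[rank[i]+1], ..., height[rank[j]]} (assuming rank[i] < rank[j]),
so lcp(i, j) is at most height[rank[i]+1], the LCP of suffix i with the suffix right after it in sorted order.
The code relies on the claim that lcp(i, j) always equals height[rank[i]+1].
Of course, that claim is wrong!
Counterexample: abcabcpabcabd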
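The counterexample can be checked with a deliberately naive suffix array. This sketch sorts whole suffix strings, so it is only for illustration, and `NaiveSA`, `buildNaive` and `lcpByHeight` are hypothetical names. Here height[r] = lcp(sa[r-1], sa[r]), and lcp(i, j) is the minimum of height over ranks rank[i]+1 .. rank[j]. For i = 0 and j = 3 (unit "abc", answer "abcabc"), the rank neighbour of suffix 0 is suffix 7 ("abcabd"). So height[rank[0]+1] = 5 while the true lcp(0, 3) is only 3. A method that assumes lcp(i, j) = height[rank[i]+1] therefore looks for a partner with lcp 5 and misses this pair.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct NaiveSA {
    std::vector<int> sa, rank, height;  // height[r] = lcp(sa[r-1], sa[r]); height[0] = 0
};

// Illustration only: sorts whole suffix strings, far too slow for n = 100,000.
NaiveSA buildNaive(const std::string& s) {
    int n = s.size();
    NaiveSA r;
    r.sa.resize(n);
    r.rank.resize(n);
    r.height.assign(n, 0);
    for (int i = 0; i < n; ++i) r.sa[i] = i;
    std::sort(r.sa.begin(), r.sa.end(),
              [&](int a, int b) { return s.substr(a) < s.substr(b); });
    for (int k = 0; k < n; ++k) r.rank[r.sa[k]] = k;
    for (int k = 1; k < n; ++k) {  // LCP of each pair of rank neighbours
        int a = r.sa[k - 1], b = r.sa[k], h = 0;
        while (a + h < n && b + h < n && s[a + h] == s[b + h]) ++h;
        r.height[k] = h;
    }
    return r;
}

// lcp of suffixes i and j (i != j): min{height[rank[i]+1], ..., height[rank[j]]}
// after ordering the two ranks.
int lcpByHeight(const NaiveSA& t, int i, int j) {
    assert(i != j);
    int a = t.rank[i], b = t.rank[j];
    if (a > b) std::swap(a, b);
    int m = t.height[a + 1];
    for (int k = a + 2; k <= b; ++k) m = std::min(m, t.height[k]);
    return m;
}
```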
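The reliable solution: first enumerate the unit length k. Imagine cutting the original string into n/k blocks of length k. If the substring we want has length L and repeats L/k times, it contains exactly L/k block boundaries (positions that are multiples of k); equivalently, the blocks it covers completely must all be equal. So for each k only the boundaries i = p*k, p = 0, 1, 2, ... are enumerated. If lcp(i, i+k) > 0, search backward from i for the part cut off by the boundary, taking care to keep the lexicographically smallest candidate; fortunately rank already encodes the lexicographic order of the suffixes.
For each k only n/k boundaries are visited, so the total is O(n log n). There is also the backward search, but in practice it costs little. If you are really worried, you can build a suffix array of the reversed string plus an RMQ over it, and then the backward extension is fast too. Quite a hassle... Hats off to "Xuejie" (my senior schoolmate)...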
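Below is a small sketch of the boundary enumeration. It computes lcp by direct comparison instead of through the suffix array and an RMQ, so it only shows the control flow and is not fast enough for n = 100,000. The name `maxRepetitionByBoundaries` is hypothetical. Ties are broken by comparing candidate substrings directly rather than suffix ranks. One deliberate difference from the post: the backward walk stops before reaching p - k, since any start at or before p - k is reached again from an earlier boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Direct-comparison lcp of the suffixes at a and b (a < b); stands in for SA + RMQ.
static int lcpDirect(const std::string& s, int a, int b) {
    int t = 0;
    while (b + t < (int)s.size() && s[a + t] == s[b + t]) ++t;
    return t;
}

std::string maxRepetitionByBoundaries(const std::string& s) {
    assert(!s.empty());
    int n = s.size();
    int bestR = 1;  // repetition number 1: the smallest single character
    std::string best(1, *std::min_element(s.begin(), s.end()));
    for (int k = 1; k <= n / 2; ++k) {          // unit length
        for (int p = 0; p + k < n; p += k) {    // only block boundaries 0, k, 2k, ...
            if (s[p] != s[p + k]) continue;
            int t = lcpDirect(s, p, p + k);
            // Walk left from the boundary; t stays equal to lcp(i, i + k).
            for (int i = p; i >= 0 && i > p - k && s[i] == s[i + k]; --i, ++t) {
                if (t < k) continue;            // fewer than two full copies
                int r = t / k + 1;
                std::string cand = s.substr(i, r * k);
                if (r > bestR || (r == bestR && cand < best)) {
                    bestR = r;
                    best = cand;
                }
            }
        }
    }
    return best;
}
```

With SA + RMQ in place of `lcpDirect`, each boundary costs O(1) plus the bounded walk.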
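```cpp
#include <cstdio>

// Correct approach: enumerate the unit length k and only the boundaries p = 0, k, 2k, ...
// `s` (suffix array with rank[] and an RMQ-backed lcp(i, j)) and `str` are defined elsewhere in the original program.
int main() {
    int i, j, k, p, t, ans, pos, len, cas = 0;
    while (scanf("%s", str) != EOF && str[0] != '#') {
        ans = len = 1;
        pos = 0;
        for (i = 0; str[i]; i++) {
            s.r[i] = str[i] - 'a' + 1;
            if (s.r[pos] > s.r[i]) pos = i;  // fallback answer: the smallest single character
        }
        s.r[i] = 0;
        s.n = i;
        s.getsa(30);
        s.getheight();
        s.initRMQ();
        for (k = 1; k <= s.n / 2; k++) {
            for (p = 0; p + k < s.n; p += k) {
                i = p;
                j = i + k;
                t = s.lcp(i, j);
                // Walk left from the boundary while s[i] == s[i + k]; t stays equal to lcp(i, j).
                // The walk is unbounded, so an input like "aaa...a" makes this loop quadratic.
                for (; i >= 0 && s.r[i] == s.r[j]; i--, j--, t++) {
                    // t >= k means at least two copies; ties go to the smaller suffix rank
                    if (t >= k && ((t + k) / k > ans || ((t + k) / k == ans && s.rank[i] < s.rank[pos]))) {
                        ans = (t + k) / k;
                        pos = i;
                        len = k;
                    }
                }
            }
        }
        printf("Case %d: ", ++cas);
        for (i = 0; i < ans * len; i++) printf("%c", s.r[i + pos] + 'a' - 1);
        printf("\n");
    }
    return 0;
}
```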
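The post never shows the suffix-array object `s`. Below is a hypothetical stand-in that only mirrors the members used by the code above (`r`, `n`, `sa`, `rank`, `height`, `getsa`, `getheight`, `initRMQ`, `lcp`). It is a sketch, not the author's template. It builds the suffix array by prefix doubling with `std::sort` (O(n log² n)), so the alphabet-size argument of `getsa` is ignored. It computes height with Kasai's algorithm and answers lcp(i, j) with a sparse table over height. The arrays take several megabytes, so the object should be global or `static`.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

const int MAXN = 100005;  // assumed bound: n <= 100,000 plus the terminator
const int LOGN = 17;      // 2^17 > MAXN

// Hypothetical stand-in for the suffix-array object `s`.
struct SuffixArray {
    int r[MAXN];                   // input: letters mapped to 1..26, r[n] = 0
    int n;                         // length without the terminator
    int sa[MAXN], rank[MAXN], tmp[MAXN];
    int height[MAXN];              // height[k] = lcp(sa[k-1], sa[k]), height[0] = 0
    int st[LOGN][MAXN];            // st[j][k] = min(height[k .. k + 2^j - 1])

    // Prefix doubling with std::sort; the alphabet size m is unused here.
    void getsa(int /*m*/) {
        for (int i = 0; i < n; ++i) { sa[i] = i; rank[i] = r[i]; }
        for (int len = 1;; len <<= 1) {
            auto key = [&](int i) { return i + len < n ? rank[i + len] : -1; };
            auto cmp = [&](int a, int b) {
                return rank[a] != rank[b] ? rank[a] < rank[b] : key(a) < key(b);
            };
            std::sort(sa, sa + n, cmp);
            tmp[sa[0]] = 0;
            for (int k = 1; k < n; ++k)  // new rank grows only when the pair differs
                tmp[sa[k]] = tmp[sa[k - 1]] + (cmp(sa[k - 1], sa[k]) ? 1 : 0);
            for (int i = 0; i < n; ++i) rank[i] = tmp[i];
            if (rank[sa[n - 1]] == n - 1) break;  // all ranks distinct
        }
    }

    // Kasai's algorithm over the real suffixes.
    void getheight() {
        height[0] = 0;
        for (int i = 0, h = 0; i < n; ++i) {
            if (rank[i] == 0) { h = 0; continue; }
            int j = sa[rank[i] - 1];
            while (i + h < n && j + h < n && r[i + h] == r[j + h]) ++h;
            height[rank[i]] = h;
            if (h > 0) --h;
        }
    }

    // Sparse table for range-minimum queries over height.
    void initRMQ() {
        for (int k = 0; k < n; ++k) st[0][k] = height[k];
        for (int j = 1; (1 << j) <= n; ++j)
            for (int k = 0; k + (1 << j) <= n; ++k)
                st[j][k] = std::min(st[j - 1][k], st[j - 1][k + (1 << (j - 1))]);
    }

    // lcp of the suffixes at positions i and j (i != j):
    // min(height[rank[i]+1 .. rank[j]]) when rank[i] < rank[j].
    int lcp(int i, int j) const {
        assert(i != j && i < n && j < n);
        int a = rank[i], b = rank[j];
        if (a > b) std::swap(a, b);
        ++a;
        int lg = 0;
        while ((2 << lg) <= b - a + 1) ++lg;
        return std::min(st[lg][a], st[lg][b - (1 << lg) + 1]);
    }
};
```

Declared as `static SuffixArray s;` (or at file scope), it is filled exactly as in `main`: set `s.r[i]`, the terminator `s.r[n] = 0` and `s.n`, then call `getsa`, `getheight` and `initRMQ`.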
A few test cases:
abababjklpabababjklq
bbabba
abcabcpabcabd
babbabb
ba
ab
zzbaba
aabzbz
As for the answers, work them out yourself... the strings are short.
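To check answers for the test cases above, a brute-force reference can be handy. This is a hypothetical checker (`bruteForce` is not from the post). It tries every start i and unit length k and takes r = lcp(i, i + k)/k + 1 copies. It keeps the largest r and breaks ties by comparing the substrings themselves. With direct-comparison lcp it is roughly cubic, which is fine for short strings but not for the real limits.

```cpp
#include <cassert>
#include <string>

std::string bruteForce(const std::string& s) {
    assert(!s.empty());
    int n = s.size();
    int bestR = 0;
    std::string best;
    for (int k = 1; k <= n; ++k) {
        for (int i = 0; i + k <= n; ++i) {
            int t = 0;  // lcp(i, i + k) by direct comparison
            while (i + k + t < n && s[i + t] == s[i + k + t]) ++t;
            int r = t / k + 1;                    // copies of s[i .. i + k)
            std::string cand = s.substr(i, r * k);
            if (r > bestR || (r == bestR && cand < best)) {
                bestR = r;
                best = cand;
            }
        }
    }
    return best;
}
```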