Suffix Array (Longest Common Substring of Multiple Strings): POJ 3294
2015-01-26 09:51
Life Forms
Time Limit: 6666MS  Memory Limit: 0KB  64bit IO Format: %lld & %llu
Description
Problem C: Life Forms
You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust.
The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that the vast majority of the quadrant's life forms ended up with a large fragment of common DNA.
Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.
Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.
For each test case, output the longest string or strings shared by more than half of the life forms. If there are many, output all of them in alphabetical order. If there is no solution with at least one letter, output "?". Leave an empty line between test cases.
Sample Input
3
abcdefg
bcdefgh
cdefghi
3
xxx
yyy
zzz
0
Output for Sample Input
bcdefg
cdefgh

?
Gordon V. Cormack
Problem summary: given a number n followed by n strings, find the longest substring(s) shared by more than n/2 of the strings.
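To pin down the "more than half" threshold, here is a brute-force sketch that can serve as a reference on small inputs. It is not the method used in this post: the function name `bruteLifeForms` is hypothetical, and it simply tries every length from the longest string downwards, counting for each substring how many distinct strings contain it.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): brute-force reference.
// Returns every longest substring that occurs in more than half of the
// strings, in alphabetical order; returns an empty vector if there is none.
std::vector<std::string> bruteLifeForms(const std::vector<std::string>& strs) {
    assert(!strs.empty());
    const int n = static_cast<int>(strs.size());
    size_t maxLen = 0;
    for (const std::string& s : strs) maxLen = std::max(maxLen, s.size());

    // Try lengths from longest to shortest; the first length that works wins.
    for (size_t len = maxLen; len >= 1; --len) {
        // substring -> number of distinct strings that contain it
        std::map<std::string, int> owners;
        for (const std::string& s : strs) {
            if (s.size() < len) continue;
            std::set<std::string> seen;  // count each string at most once
            for (size_t i = 0; i + len <= s.size(); ++i)
                seen.insert(s.substr(i, len));
            for (const std::string& sub : seen) ++owners[sub];
        }
        std::vector<std::string> result;
        for (const auto& kv : owners)  // std::map iterates in sorted order
            if (2 * kv.second > n) result.push_back(kv.first);  // strictly more than half
        if (!result.empty()) return result;
    }
    return {};
}
```

The candidate length starts from the longest string, not the shortest, because the answer only needs to appear in more than half of the strings. This is far too slow for the real limits, but it is handy for cross-checking a suffix array solution on small random tests.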
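Approach: build a suffix array over all the strings and find the longest common substring by grouping on the height array and binary searching on the answer length. One detail: once a group with a common prefix has been found, you have to check that the prefix does not contain a separator. At first I simply looped over the prefix looking for a separator; later I realized it is enough to check whether its first and last characters come from the same string, which is a bit more efficient.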
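The separator check can be sketched in isolation as follows. The names `Joined`, `joinWithSeparators` and `insideOneString` are hypothetical and exist only for illustration; the layout (a leading 0, letters mapped to 1..26, a 0 after every string, owner -1 for separators) is assumed to mirror the `r` and `block` arrays in the code of this post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical struct: the concatenated text plus the owner of each position.
struct Joined {
    std::vector<int> text;   // 0 = separator, 1..26 = 'a'..'z'
    std::vector<int> owner;  // index of the source string, -1 for separators
};

// Join all strings into one sequence, with a separator before the first
// string and after every string.
Joined joinWithSeparators(const std::vector<std::string>& strs) {
    Joined j;
    j.text.push_back(0);
    j.owner.push_back(-1);
    for (size_t i = 0; i < strs.size(); ++i) {
        for (char c : strs[i]) {
            j.text.push_back(c - 'a' + 1);
            j.owner.push_back(static_cast<int>(i));
        }
        j.text.push_back(0);
        j.owner.push_back(-1);
    }
    return j;
}

// True if the window [start, start + len) lies entirely inside one source
// string. Each string occupies a contiguous range and separators sit between
// the ranges, so comparing the owners of the first and last position is enough.
bool insideOneString(const Joined& j, int start, int len) {
    assert(start >= 0 && len >= 1);
    const int last = start + len - 1;
    if (last >= static_cast<int>(j.owner.size())) return false;
    return j.owner[start] >= 0 && j.owner[start] == j.owner[last];
}
```

For the strings "ab" and "cd" the joined sequence is 0 1 2 0 3 4 0, so the window of length 2 starting at position 1 is accepted, while the window of length 2 starting at position 2 ends on a separator and is rejected. The check costs O(1) per candidate, compared with O(mid) for scanning the whole prefix.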
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MS(x, y) memset(x, y, sizeof(x))

const int MAXN = 100000 + 2000;

int wa[MAXN], wb[MAXN], wv[MAXN], ws[MAXN];
int rank[MAXN], r[MAXN], sa[MAXN], height[MAXN];
char str[1005];
int vis[1005];
int ID[MAXN];     // start positions of the answers; sized MAXN so it cannot overflow however many groups qualify
int block[MAXN];  // block[j] = index of the string that position j belongs to, -1 for separators

int cmp(int *r, int a, int b, int l)
{
    return r[a] == r[b] && r[a+l] == r[b+l];
}

// Build the suffix array of r[0..n-1] by prefix doubling; values of r lie in [0, m)
void da(int *r, int *sa, int n, int m)
{
    int i, j, p, *x = wa, *y = wb, *t;
    for(i=0; i<m; i++) ws[i] = 0;
    for(i=0; i<n; i++) ws[x[i] = r[i]]++;
    for(i=1; i<m; i++) ws[i] += ws[i-1];
    for(i=n-1; i>=0; i--) sa[--ws[x[i]]] = i;
    for(j=1,p=1; p<n; j<<=1, m=p){
        for(p=0,i=n-j; i<n; i++) y[p++] = i;
        for(i=0; i<n; i++) if(sa[i] >= j) y[p++] = sa[i] - j;
        for(i=0; i<n; i++) wv[i] = x[y[i]];
        for(i=0; i<m; i++) ws[i] = 0;
        for(i=0; i<n; i++) ws[wv[i]]++;
        for(i=1; i<m; i++) ws[i] += ws[i-1];
        for(i=n-1; i>=0; i--) sa[--ws[wv[i]]] = y[i];
        for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1; i<n; i++)
            x[sa[i]] = cmp(y, sa[i-1], sa[i], j) ? p-1 : p++;
    }
    return;
}

// height[i] = length of the longest common prefix of suffixes sa[i-1] and sa[i]
void calheight(int *r, int *sa, int n)
{
    int i, j, k = 0;
    for(i=1; i<n; i++) rank[sa[i]] = i;
    for(i=0; i<n-1; height[rank[i++]] = k)
        for(k ? k-- : 0,j=sa[rank[i]-1]; r[i+k] == r[j+k]; k++);
    return;
}

int main()
{
    //freopen("in.txt", "r", stdin);
    int n;
    scanf("%d", &n);
    while(n)
    {
        if(1 == n){  // a single string is its own answer
            scanf("%s", str);
            printf("%s\n", str);
            scanf("%d", &n);
            if(n) printf("\n");
            continue;
        }
        int i, j, k;
        MS(rank, 0);
        MS(sa, 0);
        MS(wa, 0);
        MS(wb, 0);
        MS(ws, 0);
        MS(wv, 0);
        MS(r, 0);
        MS(height, 0);
        MS(block, -1);
        MS(ID, 0);
        int len = 1, tmp_l, maxn = 0;
        int left = 1, right = 0;
        for(i=0; i<n; i++){  // join all strings into one string, separated by separators
            scanf("%s", str);
            tmp_l = strlen(str);
            // Right bound of the binary search: the length of the longest string.
            // (The original used the shortest string, but the answer only has to
            // occur in more than half of the strings, so it can be longer than that.)
            if(tmp_l > right) right = tmp_l;
            for(j=len, k=0; k<tmp_l; j++, k++){
                block[j] = i;  // the character at index j belongs to string i
                r[j] = str[k] - 'a' + 1;
                if(r[j] > maxn) maxn = r[j];
            }
            len += tmp_l;
            r[len++] = 0;  // append the minimum value as a separator
        }
        da(r, sa, len, maxn+1);
        calheight(r, sa, len);
        int beg = 0, end = 0, ok, u = 0, ul = 0, LEN = 0;
        while(left <= right)
        {
            ok = u = 0;
            int mid = left + (right - left)/2;  // binary search on the answer
            for(i=n+1; i<len; i++){
                if(height[i] >= mid){  // determine where a group starts and ends
                    //for(k=sa[i]; k < sa[i] + mid; k++)
                    //    if(0 == r[k]) break;  // the common prefix contains a separator
                    if(block[sa[i]] == block[sa[i] + mid - 1]){  // do the first and last characters come from the same string?
                        if(!beg) beg = i;
                        end = i;
                    }
                }
                if((beg && end) && (i == len - 1 || height[i] < mid)){
                    int count = 0;
                    MS(vis, 0);
                    for(j=beg-1; j<=end; j++){  // count how many different strings the suffixes of this group come from
                        int num = block[sa[j]];
                        if(!vis[num]) {
                            vis[num] = 1;
                            count++;
                        }
                    }
                    if(count > n/2){  // a valid solution
                        ID[u++] = sa[j-1];  // save its start index
                        LEN = mid;
                        ok = 1;
                    }
                    beg = end = 0;
                }
            }
            if(ok) ul = u;  // u is reset to 0 in every iteration, so keep the count of the last valid solution separately
            if(ok) left = mid + 1;  // found a solution, so a longer one may still exist
            else right = mid - 1;
        }
        if(ul){
            for(i=0; i<ul; i++){
                for(j=ID[i]; j<ID[i] + LEN; j++)
                    printf("%c", char(r[j] - 1 + 'a'));
                printf("\n");
            }
        }
        else printf("?\n");
        scanf("%d", &n);
        if(n) printf("\n");
    }
    return 0;
}