
[leetcode] 30. Substring with Concatenation of All Words

2017-09-03 11:35
Problem link: https://leetcode.com/problems/substring-with-concatenation-of-all-words/discuss/

Problem description: You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.

For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).


My code:

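#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> findSubstring(string S, vector<string>& L) {
        vector<int> re;
        int n = S.size(), nl = L.size();
        if (n == 0 || nl == 0) return re;
        unordered_map<string, int> words;   // required count of each word in L
        int m = L[0].size();                // all words share the same length m
        if (n < m) return re;
        for (auto& s : L) words[s]++;
        // Try each of the m possible alignments of the word grid.
        for (int i = 0; i < m; i++) {
            int start = i, end = i;         // window [start, end) made of whole words
            int k = 0;                      // number of words currently in the window
            unordered_map<string, int> p;   // word counts inside the window
            while (end <= n - m) {
                string word = S.substr(end, m);
                end += m;
                // A word that is not in L cannot be part of a match: restart after it.
                if (words.find(word) == words.end()) {
                    p.clear();
                    k = 0;
                    start = end;
                    continue;
                }
                p[word]++;
                k++;
                // Too many copies of this word: shrink the window from the left.
                while (p[word] > words[word]) {
                    p[S.substr(start, m)]--;
                    k--;
                    start += m;
                }
                // The window holds exactly the words of L: record it, then slide by one word.
                if (k == nl) {
                    re.push_back(start);
                    p[S.substr(start, m)]--;
                    start += m;
                    k--;
                }
            }
        }
        return re;
    }
};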


Someone else's code:

#include <cmath>
#include <random>
#include <string>
#include <vector>
using namespace std;

class Solution {
    // The general idea:
    // Construct a hash function f for L, f: vector<string> -> int,
    // then use the return value of f to check whether a substring is a concatenation
    // of all words in L.
    // f has two levels. The first level is a hash function f1 for every single word in L:
    //     f1 : string -> double
    // With f1, L is converted into a vector of doubles.
    // Then another hash function f2 is defined to convert a vector of doubles into a single int.
    // Finally f(L) := f2(f1(L)).
    // To obtain lower complexity, we require that f1 and f2 can be computed through a moving window.
    // The following corner case also needs to be considered:
    //     f2(f1(["ab", "cd"])) != f2(f1(["ac", "bd"]))
    // There are many possible options for f2 and f1.
    // The following code only shows one possibility (probably not the best):
    // f1 is the function "hash" in the class, and
    //     f2([a1, a2, ... , an]) := int( decimal_part(log(a1) + log(a2) + ... + log(an)) * 1000000000 )
public:
    // f1: polynomial hash of a single word. The complexity of this function is O(nW).
    double hash(double f, double code[], string &word) {
        double result = 0.;
        for (auto &c : word) result = result * f + code[c];
        return result;
    }

    vector<int> findSubstring(string S, vector<string> &L) {
        vector<int> result;
        if (L.empty() || L[0].empty()) return result;  // guard added: L[0] is read below

        uniform_real_distribution<double> unif(0., 1.);
        default_random_engine seed;
        double code[128];                      // random code for each ASCII character
        for (auto &d : code) d = unif(seed);
        double f = unif(seed) / 5. + 0.8;      // base of the polynomial hash, in [0.8, 1.0)
        double value = 0;

        // The complexity of the following for loop is O(L.size() * nW).
        for (auto &str : L) value += log(hash(f, code, str));

        int unit = 1e9;
        int key = (value - floor(value)) * unit;   // f2 applied to the word list
        int nS = S.size(), nL = L.size(), nW = L[0].size();
        double fn = pow(f, nW - 1.);
        if (nS < nW) return result;
        vector<double> values(nS - nW + 1);
        string word(S.begin(), S.begin() + nW);
        values[0] = hash(f, code, word);

        // Use a moving window to hash every word of length nW in S to a double,
        // which is stored in the vector values.
        // The complexity of this step is O(nS).
        for (int i = 1; i <= nS - nW; ++i)
            values[i] = (values[i-1] - code[S[i-1]] * fn) * f + code[S[i+nW-1]];

        // This for loop runs nW times and each iteration has complexity O(nS / nW),
        // so the overall complexity is O(nW * (nS / nW)) = O(nS).
        for (int i = 0; i < nW; ++i) {
            int j0 = i, j1 = i, k = 0;
            double sum = 0.;

            // Use a moving window to hash every L.size() consecutive words of length nW in S.
            // This while loop terminates within nS / nW iterations since j1 advances by nW,
            // so the complexity of this while loop is O(nS / nW).
            while (j1 <= nS - nW) {
                sum += log(values[j1]);
                ++k;
                j1 += nW;
                if (k == nL) {
                    int key1 = (sum - floor(sum)) * unit;
                    if (key1 == key) result.push_back(j0);
                    sum -= log(values[j0]);
                    --k;
                    j0 += nW;
                }
            }
        }
        return result;
    }
};


Analysis and takeaways:

First, the problem itself. The task is simple: find every substring of S that is the concatenation of some permutation of all the words in the list L.

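The straightforward idea is to record the words of L in a dictionary (word -> count), record the words of a candidate substring of S in a second dictionary, and compare the counts. Running that check from scratch at every starting position of S is clearly too expensive. Notice, however, that for each offset i (0 <= i < m, where m is the length of the words in L) we can cut S into consecutive pieces of length m starting at index i. Each i yields (n-i)/m words of length m (integer division), and we only need to find nl consecutive words among them whose dictionary equals the dictionary of L (nl is the number of words in L).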
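
To make the partitioning concrete, here is a minimal sketch. splitAtOffset is a hypothetical helper name of my own and is not used by either solution above; it only cuts s into words of length m starting at a given offset.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of either solution): cut s into consecutive
// words of length m, starting at index `offset`. Trailing characters that do
// not fill a whole word are dropped.
std::vector<std::string> splitAtOffset(const std::string& s, std::size_t offset, std::size_t m) {
    assert(m > 0);  // a zero word length would never advance
    std::vector<std::string> words;
    for (std::size_t pos = offset; pos + m <= s.size(); pos += m) {
        words.push_back(s.substr(pos, m));
    }
    return words;
}
```

With s = "barfoothefoobarman" and m = 3, offset 0 gives bar, foo, the, foo, bar, man, while offsets 1 and 2 give five words each, which agrees with (n-i)/m. The first solution never materializes these lists; it walks them in place with the start and end indices.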

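The second solution instead re-encodes the whole word list with a randomized polynomial hash. code[i] is the new value assigned to the i-th ASCII character, and f serves as the weight. Within a word, each position gets its own exponent k, so the weight of that position is f to the power k. The new value of a word is the logarithm of the sum, over its letters, of the letter's code times the weight of its position.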
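
The per-word hash and its moving-window update can be sketched as follows. This is only an illustration under my own assumptions: codeOf uses fixed values for lowercase letters instead of random codes, and the names codeOf, polyHash and rollingHashes are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Illustrative per-character codes (an assumption of this sketch; the
// solution above draws them at random): 'a' -> 0.1, 'b' -> 0.2, ...
// Only meant for lowercase ASCII letters.
double codeOf(char c) { return (c - 'a' + 1) / 10.0; }

// Direct polynomial hash of a word w of length m:
// codeOf(w[0]) * f^(m-1) + ... + codeOf(w[m-1]) * f^0
double polyHash(const std::string& w, double f) {
    double h = 0.0;
    for (char c : w) h = h * f + codeOf(c);
    return h;
}

// Hash of every length-m window of s. Each window is derived from the
// previous one by removing the leading character and appending a new one.
std::vector<double> rollingHashes(const std::string& s, std::size_t m, double f) {
    assert(m > 0);
    std::vector<double> out;
    if (s.size() < m) return out;
    const double top = std::pow(f, static_cast<double>(m) - 1.0);  // weight of the leading character
    out.push_back(polyHash(s.substr(0, m), f));
    for (std::size_t i = 1; i + m <= s.size(); ++i) {
        out.push_back((out.back() - codeOf(s[i - 1]) * top) * f + codeOf(s[i + m - 1]));
    }
    return out;
}
```

Each update costs O(1), so hashing all windows of S takes O(n) instead of O(n*m). The rolled values agree with the direct hash only up to floating-point rounding, so any comparison between them should allow a small tolerance.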

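For the word list as a whole, its value is the fractional part of the sum of the new values of all its words (the code scales this by 1e9 and truncates it to an int). Since it is a sum, the value does not depend on the order of the words, up to floating-point rounding.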

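After these two hash steps, we only need to partition S the same way as in the first solution and compare the value of each window against the value of the word list.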

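The uniqueness of this hash remains unproven, however. What is clear is that, with random codes, the probability that f2(f1("ab","cd")) == f2(f1("ac","bd")) is extremely low.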
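
A small sketch of the list-level key, to look at this corner case with concrete numbers. Again these are my own assumptions: fixed codes 'a' -> 0.1, 'b' -> 0.2, ... and f = 0.9 rather than random values, and the names codeOf, polyHash and listKey are hypothetical. Writing a, b, c, d for the four codes, the product of the two word hashes is (a*f + b)(c*f + d) for ["ab","cd"] and (a*f + c)(b*f + d) for ["ac","bd"]; their difference works out to (c - b)(a*f^2 - d), which is nonzero for these values.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Illustrative fixed codes (assumption; the solution above uses random ones).
double codeOf(char c) { return (c - 'a' + 1) / 10.0; }

// Polynomial hash of one word (the role of f1).
double polyHash(const std::string& w, double f) {
    double h = 0.0;
    for (char c : w) h = h * f + codeOf(c);
    return h;
}

// Key of a whole word list (the role of f2 applied to f1): fractional part
// of the sum of log(hash(word)), scaled by 1e9 and truncated to an int.
int listKey(const std::vector<std::string>& words, double f) {
    double sum = 0.0;
    for (const auto& w : words) {
        const double h = polyHash(w, f);
        assert(h > 0.0);  // log is only defined for positive hashes
        sum += std::log(h);
    }
    return static_cast<int>((sum - std::floor(sum)) * 1e9);
}
```

With these values, listKey({"foo", "bar"}, 0.9) equals listKey({"bar", "foo"}, 0.9), while listKey({"ab", "cd"}, 0.9) and listKey({"ac", "bd"}, 0.9) differ. This is a check for one choice of codes, not a proof: two different lists could still collide if their log sums differed by an exact integer or if rounding happened to line up.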