Reportedly a Baidu written-test question

2008-01-25 16:28
I saw someone ask about these questions on a forum, found one of them interesting, and replied. I am posting my reply here as well.

The question is as follows:

///////////////////////////////////////////////////////////////////// answer beginning ///////////////////////////////////////////////////////////////////////////

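Consider an online friend system. The system maintains a friend list for each user; a list may contain at most 500 friends, and every friend must be another user of the same system. Friendship is one-way: user B being a friend of user A does not imply that A is a friend of B.

Users are represented by IDs. The friend-list data is given in text form as follows: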
1 3,5,7,67,78,3332
2 567,890
31 1,66
14 567
78 10000

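Each row has two columns: the first is the user ID and the second is that user's friend IDs, separated by "," and sorted in ascending order. The columns are separated by "\t" (a tab).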
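
As an illustration of this row format, here is a minimal sketch that splits one row into the user ID and the friend IDs. The name parse_line is a hypothetical helper, not something the question defines, and the sketch assumes well-formed input that fits in 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parses one record of the form "<user>\t<f1>,<f2>,...".
// parse_line is a hypothetical helper name used only for this sketch.
std::pair<std::uint32_t, std::vector<std::uint32_t>> parse_line(const std::string& line) {
    const std::size_t tab = line.find('\t');
    assert(tab != std::string::npos && "expected a tab between the two columns");
    const std::uint32_t user =
        static_cast<std::uint32_t>(std::stoul(line.substr(0, tab)));

    // Split the second column on commas; empty tokens are skipped.
    std::vector<std::uint32_t> friends;
    std::istringstream ids(line.substr(tab + 1));
    std::string token;
    while (std::getline(ids, token, ',')) {
        if (!token.empty())
            friends.push_back(static_cast<std::uint32_t>(std::stoul(token)));
    }
    return {user, friends};
}
```

Calling parse_line on the first sample row gives user 1 and the six friend IDs in their original ascending order.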

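Requirements:
Design a suitable index data structure to answer the following query:
Given users A and B, determine whether B is a second-degree friend (a friend of a friend) of A.
In the example above, 10000 is a second-degree friend of 1, because 78 is a friend of 1 and 10000 is a friend of 78.

Explain your approach in detail and describe the key points of your implementation. Give pseudocode for both building the index and answering the query, and state the space and time complexity.

Constraints:
At most 10 million users, with 50 friends each on average.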

///////////////////////////////////////////////////////////////////// answer ending ///////////////////////////////////////////////////////////////////////////

///////////////////////////////////////////////////////////////////// my reply beginning ///////////////////////////////////////////////////////////////////////////

Construct a list of 10,000,000 nodes, where each node is defined as:

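#include <vector>

struct node {
    int cur_node;                       // user ID, 1..10,000,000
    std::vector<node*> p_friend_vec;    // users this user lists as friends
    std::vector<node*> p_indirect_vec;  // users who list this user as a friend
};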

cur_node is the user's ID, from 1 to 10,000,000. p_friend_vec holds pointers to the nodes of this user's friends, and p_indirect_vec holds pointers to the nodes of the users who have this user as a friend; e.g. node 10000 keeps a pointer to node 78 in p_indirect_vec to record that 10000 is a friend of 78.

To build this list, we first initialize 10,000,000 nodes with cur_node running from 1 to 10,000,000, then read the text file row by row and fill in both vectors for every friendship; I think this should be simple. With N users and 50 friends each on average, construction takes O(50N) = O(N) time, and since every friendship is stored twice (once in each direction), O(100N) = O(N) space.
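
The construction step can be sketched roughly as follows. This is not the node/pointer layout above: it stores IDs in two hash maps instead, purely to keep the example short. FriendIndex, friends_of, listed_by, add_row and finalize are all hypothetical names, with friends_of playing the role of p_friend_vec and listed_by the role of p_indirect_vec.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using UserId = std::uint32_t;

// Hypothetical index: forward and reverse friend lists keyed by user ID.
struct FriendIndex {
    std::unordered_map<UserId, std::vector<UserId>> friends_of;  // role of p_friend_vec
    std::unordered_map<UserId, std::vector<UserId>> listed_by;   // role of p_indirect_vec

    // Records one input row: `user` lists every ID in `friends`.
    void add_row(UserId user, const std::vector<UserId>& friends) {
        assert(friends.size() <= 500);  // limit stated in the question
        friends_of[user] = friends;     // already ascending in the input
        for (UserId f : friends) listed_by[f].push_back(user);
    }

    // Input rows need not arrive in ID order (the sample has 31 before 14),
    // so sort the reverse lists once after loading.
    void finalize() {
        for (auto& entry : listed_by)
            std::sort(entry.second.begin(), entry.second.end());
    }
};
```

Each friendship is touched once on insert, so loading stays linear in the number of friendships; the one-off sort in finalize adds a logarithmic factor per reverse list.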

To query the relationship between A and B, we walk A's p_friend_vec and B's p_indirect_vec and check whether some node appears in both, i.e. some user who is a friend of A and who has B as a friend. Typically this is a double loop (a and b are the node* for A and B):
for (std::vector<node*>::iterator i = a->p_friend_vec.begin(); i != a->p_friend_vec.end(); ++i)
{
    for (std::vector<node*>::iterator j = b->p_indirect_vec.begin(); j != b->p_indirect_vec.end(); ++j)
    {
        if ((*i)->cur_node == (*j)->cur_node)
            return true;  // bingo! *i is a friend of A and has B as a friend
    }
}
return false;

This search is O(50^2), or O(M^2), where M is the average number of friends.
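
Since the question states that friend IDs are listed in ascending order, the double loop can probably be replaced by a merge-style scan, provided B's reverse list is kept sorted as well. The sketch below works on plain ID vectors rather than node pointers; share_a_user and its parameter names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the two ascending ID lists have at least one ID in common.
// friends_of_a: IDs that A lists as friends.
// listers_of_b: IDs of users who list B as a friend.
bool share_a_user(const std::vector<std::uint32_t>& friends_of_a,
                  const std::vector<std::uint32_t>& listers_of_b) {
    // Assumption: both lists are sorted ascending.
    assert(std::is_sorted(friends_of_a.begin(), friends_of_a.end()));
    assert(std::is_sorted(listers_of_b.begin(), listers_of_b.end()));

    std::size_t i = 0, j = 0;
    while (i < friends_of_a.size() && j < listers_of_b.size()) {
        if (friends_of_a[i] == listers_of_b[j]) return true;
        // Advance whichever side currently holds the smaller ID.
        if (friends_of_a[i] < listers_of_b[j]) ++i; else ++j;
    }
    return false;
}
```

The scan visits each element at most once, so it costs the sum of the two list lengths instead of their product. Note that only A's list is capped at 500; the list of users who have B as a friend has no such cap, so a very popular B could still make the scan long.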

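The method above is the typical one, but there is another way to do this. It is somewhat similar to idf, and it requires a 10,000,000 x 10,000,000 matrix of bits. The matrix records the friend information: e.g. 1 has friends 3, 5, 7, ..., so the first row of the matrix is the vector [1, 0, 1, 0, 1, 0, 1, ...] (notice that every user is always marked as a friend of itself). Column B of the matrix then records the users who have B as a friend. To see whether A and B have the required relationship, we 'AND' A's row with B's column and check whether any 1 is left in the result vector; if there is at least one 1, A and B have the required relationship. Because the diagonal is set, a direct friend of A passes this test too.

Finding the answer this way is a single pass of bitwise ANDs with no comparisons of individual IDs, but the pass covers the whole row, so it costs about N/64 word operations on a 64-bit machine (roughly 156,000 for N = 10,000,000) rather than O(M); it beats the O(M^2) loop only when N is small. The matrix holds only 0s and 1s, so each entry needs just one bit, and for 10,000 users that gives 100M/8, about 13 MB. For the full 10,000,000 users, however, the dense matrix has 10^14 bits, about 12.5 TB, so the rows would have to be stored sparsely.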
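
Finding a shared 1 in A's row and B's column can be sketched as a word-wise bitwise AND over two bit vectors. BitRow, make_row, set_bit and share_a_bit are hypothetical names, and the sketch assumes that A's row and B's column have already been extracted into separate vectors of the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One matrix row (or column) packed 64 users per word.
using BitRow = std::vector<std::uint64_t>;

// Allocates an all-zero row with room for bit positions 0..n_bits-1.
BitRow make_row(std::size_t n_bits) {
    return BitRow((n_bits + 63) / 64, 0);
}

// Marks user `id` in the row.
void set_bit(BitRow& row, std::size_t id) {
    assert(id / 64 < row.size());
    row[id / 64] |= std::uint64_t{1} << (id % 64);
}

// True if some bit is set in both vectors, i.e. the AND is non-zero.
bool share_a_bit(const BitRow& row_a, const BitRow& col_b) {
    assert(row_a.size() == col_b.size());
    for (std::size_t w = 0; w < row_a.size(); ++w) {
        if (row_a[w] & col_b[w]) return true;  // stop at the first common bit
    }
    return false;
}
```

The loop runs over every word of the row in the worst case, which is why the cost depends on the number of users rather than on the number of friends.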
///////////////////////////////////////////////////////////////////// my reply ending ///////////////////////////////////////////////////////////////////////////