您的位置：首页 > 编程语言

《编程之美》学习笔记——2.3寻找发贴“水王”

2015-01-12 11:21 211 查看

一、问题

Tango是微软亚洲研究院的一个试验项目。研究院的员工和实习生们都很喜欢在Tango上面交流灌水。传说，Tango有一大“水王”，他不但喜欢发贴，还会回复其他ID发的每个帖子。坊间风闻该“水王”发帖数目超过了帖子总数的一半。如果你有一个当前论坛上所有帖子（包括回帖）的列表，其中帖子作者的ID也在表中，你能快速找出这个传说中的Tango水王吗？

问题分析：

输入：给入一个含有重复数据的ID数组

输出：重复次数最高的ID

约束：重复次数最高的ID个数超过总数的一半。

二、解法

版本一：排序查找

对于数据规模为n的ID序列，首先对序列根据ID进行排序，最快的排序算法时间复杂度为O(nlgn)，然后来我们根据排好序的序列对查找重复次数最多的ID，这个时间复杂度是O(n)，因此总的时间复杂度是O(nlgn+n)。

实现时使用快速排序算法（见排序专题博文）对问题的ID序列进行排序，并设计算法寻找重复次数最多的ID（包括但不仅限于重复次数大于总数1/2的ID元素）

算法C实现：

/**
* @file find_id_by_sort_search.c
* @brief find the id which repeats most in the sequence.
* @author chenxilinsidney
* @version 1.0
* @date 2015-01-07
*/

#include <stdlib.h>
#include <stdio.h>
// #define NDEBUG
#include <assert.h>

// #define NDBG_PRINT
#include "debug_print.h"

typedef int TYPE;

#define MAX_COUNT      10000000
TYPE array[MAX_COUNT] = {0};

/**
* @brief select the last element as a pivoit.
* Reorder the array so that all elements with values less than the pivot
* come before the pivot, while all elements with values greater than the pivot
* come after it (equal values can go either way). After this partitioning, the
* pivot is in its final position.
*
* @param[in,out]  array          input and output array
* @param[in]      index_begin    the begin index of the array(included)
* @param[in]      index_end      the end index of the array(included)
*
* @return the position of the pivot(index from the array)
*/
TYPE partition(TYPE* array, TYPE index_begin, TYPE index_end)
{
/// pick last element of the array as the pivot
TYPE pivot = array[index_end];
/// index of the elments that not greater than pivot
TYPE i = index_begin - 1;
TYPE j, temp;
/// check array's elment one by one
for (j = index_begin; j < index_end; j++) {
if (array[j] <= pivot) {
/// save the elements not greater than pivot to left index of i.
i++;
temp = array[j];
array[j] = array[i];
array[i] = temp;
}
}
/// set the pivot to the right position
array[index_end] = array[++i];
array[i] = pivot;
/// return the position of the pivot
return i;
}

/**
* @brief sort the array by Quicksort method.
*
* @param[in,out]  array       sequence for sort
* @param[in]      index_begin begin index of sequence for sort(included)
* @param[in]      index_end   end index of sequence for sort(included)
*/
void quick_sort(TYPE* array, TYPE index_begin, TYPE index_end)
{
/// sort only under the index_begin < index_end condition
if ( index_begin < index_end) {
/// exchange elements to the pivot position by partition function
TYPE index_pivot = partition(array, index_begin, index_end);
/// sort the array before the pivot position
quick_sort(array, index_begin, index_pivot - 1);
/// sort the array after the pivot position
quick_sort(array, index_pivot + 1, index_end);
}
}

/**
* @brief find the max repeated element from a sorted sequence
*
* @param[in]      array  sorted sequence
* @param[in]      count  sequence length
*
* @return the max element that repeats max time
*/
TYPE search_max_repeated_element(TYPE* array, TYPE count)
{
TYPE count_temp, value_temp;
TYPE max_value = array[0];
TYPE max_count = 0;
TYPE i = 0;
while (i < count) {
value_temp = array[i];
count_temp = 1;
while (array[++i] == value_temp)
count_temp++;
if (count_temp >= max_count) {
max_count = count_temp;
max_value = value_temp;
}
}
return max_value;
}

/**
* @brief find the element that repeats max time in the unsorted array
*
* @param[in,out]  array  unsorted array
* @param[in]      count  array length
*
* @return the element value that repeats max time
*/
TYPE find_id(TYPE* array, TYPE count)
{
/// sort array first
quick_sort(array, 0, count - 1);
/// search the wanted element
return search_max_repeated_element(array, count);
}

int main(void) {
/// read data to array
TYPE count = 0;
while(count < MAX_COUNT && scanf("%u\n", array + count) == 1) {
++count;
}
/// find id
TYPE id = find_id(array, count);
/// output result
printf("max repeat count element is %d.\n", id);
return EXIT_SUCCESS;
}

分析：

在查找方式上，这里通过扫描一遍排好序的ID列表来统计各个ID出现次数，输出最大次数的ID，时间复杂度为O(n)。

更优的方式，可以考虑重复次数最高的ID个数超过总数的一半这个约束条件。如果这个约束条件成立，那么排好序的ID列表中第n/2项一定会是这个ID，这样就不用扫描ID列表，时间复杂度为O(1)，大大减低查找所花费的时间。（这里恰到好处的使用了题目中的约束条件）

但是以上两种方法都需要对ID进行排序，因此总时间复杂度至少为O(nlgn)。

版本二：采用“已知的水王发帖数超过一半”这个约束处理本题。

思路：如果每次删除两个不同的ID（不管是否包含“水王”的ID），那么，剩下的ID列表中，“水王”ID出现的次数仍然超过总数的一半。看到这一点之后，就可以通过不断重复这个过程，把ID列表中的ID总数降低（转化为更小的问题），从而得到问题的答案。（《编程之美》原话）

下面算法思想：（参考博客： http://blog.csdn.net/rein07/article/details/6741661）
假设每个ID都有可能是水王，那么在遍历时这个水王就要遇到一种挑战，可能自己的帖子数是会增加的，也可能是遇到挑战的，帖子数要减少的。这样遍历下来，只有水王的

帖子增加的减去遇到挑战的帖子数会是大于0的。其他任何帖子假设为水王时都是禁不起挑战的。

步骤：

1. 可以假设帖子的第一个ID是次数最大的，用candidate记录，次数用nTimes记录。

2. 遍历下一个ID，如果跟candidate一样，nTimes++，否则，遇到一个挑战，则nTimes--，如果nTimes == 0，下一步就要重复第一步了。

3.遍历结束，nTimes>0的那个candidate就是水王ID，他是获胜者。

算法C实现：

TYPE find_id(TYPE* array, TYPE count)
{
TYPE i;
TYPE candidate, nTimes = 0;
for (i = 0; i < count; i++) {
if (nTimes == 0) {
candidate = array[i];
nTimes = 1;
} else {
if (candidate == array[i])
nTimes++;
else
nTimes--;
}
}
return candidate;
}

分析：这个算法只需要遍历一遍ID数组便可以找出“水王”，时间复杂度O(n)，并且需要的空间复杂度也很低，空间复杂度O(1)。

在这个题目中，有一个计算机科学中很普遍的思想，就是如何把一个问题转化为规模较小的若干个问题。分治、递推和贪心等都是基于这样的思路。在转化过程中，小的问题跟原问题本质上一致。这样，我们可以通过同样的方式将小问题转化为更小的问题。因此，转化过程是很重要的。像上面这个题目，我们保证了问题的解在小问题中仍然具有与原问题相同的性质：水王的ID在ID列表中的次数超过一半。转化本身计算的效率越高，转化之后问题规模缩小得越快，则整体算法的时间复杂度越低。(摘自《编程之美》)

三、拓展问题

随着Tango的发展，管理员发现，“超级水王”没有了。统计结果表明，有3个发帖很多的ID，他们的发帖数目都超过了帖子总数目N的1/4。你能从发帖ID列表中快速找出他们的ID吗？

思路：可以与前一个问题进行类比。前一问题通过每次删除两个不同的ID，剩下的ID列表中““水王”ID出现的次数仍然超过总数的一半，使问题规模降低，类比的：拓展问题我们每次删除4个两两不同的ID，剩下的ID列表中这3个超过总数1/4的ID出现的次数也将仍然超过总数的1/4，也使问题规模降低。接下来就剩下算法实现的问题了，我们需要理清的一问题是这3个ID相互之间并不做比较，他们是一伙的，为此我们需要修改上前一个问题解中if else语句的顺序。

算法C实现：

/**
* @brief find the element that repeats max times in the unsorted array
*
* @param[in,out]  array  unsorted array
* @param[in]      count  array length
* @param[out]     count  three elements value that repeats max times
* (times > 1 / 4 total)
*/
void find_id(TYPE* array, TYPE count, TYPE* candidate)
{
TYPE i;
#define ELEMENT_COUNT 3
/// iniitialize
TYPE nTimes[ELEMENT_COUNT] = {0};
candidate[0] = 0;
candidate[1] = 0;
candidate[2] = 0;
for (i = 0; i < count; i++) {
if (candidate[0] == array[i]) {
nTimes[0]++;
} else if (candidate[1] == array[i]) {
nTimes[1]++;
} else if (candidate[2] == array[i]) {
nTimes[2]++;
} else if (nTimes[0] == 0) {
nTimes[0] = 1;
candidate[0] = array[i];
} else if (nTimes[1] == 0) {
nTimes[1] = 1;
candidate[1] = array[i];
} else if (nTimes[2] == 0) {
nTimes[2] = 1;
candidate[2] = array[i];
} else {
nTimes[0]--;
nTimes[1]--;
nTimes[2]--;
}
}
#undef ELEMENT_COUNT
return;
}

备注：

1.转载请注明出处：http://blog.csdn.net/chensilly8888/

2.全文源码均开源（在UBUNTU + GCC4.8.2下编译并测试通过），可下载或查看：https://github.com/chenxilinsidney/funnycprogram/tree/master/beauty_of_programming/find_id

3.有写错或写漏之处请指正，谢谢！

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航