
[Data Structures: Trie] Implementing Word Lookup and Word-Frequency Counting with a Trie

2012-09-07 10:48

References:

1. This blogger's article: http://blog.csdn.net/zhulei632/article/details/6704496
2. Yan Weimin, Data Structures

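1. Definition of a key tree

A key tree is also called a "digital search tree". Its depth is at least 2. A node generally does not hold a whole key; it holds one of the symbols that make up a key. Leaf nodes are the exception: a leaf may hold the whole word together with its frequency, and a non-leaf node may hold a word and its frequency as well. Depending on the storage layout, key trees are divided into doubly-linked trees and multi-way linked-list trees. The latter is what is commonly called a "trie", a name taken from the four letters in the middle of the word "retrieval", which is why it is also known as a retrieval tree. Each node of a trie contains d pointer fields, where d is the radix of the keys: 26 for lowercase letters (a-z), 10 for digits (0-9).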
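To make the role of the radix d concrete, here is a minimal sketch of a node for keys made of decimal digits, so d is 10. The names `DigitNode`, `DIGIT_RADIX`, `digit_index`, `digit_node_new` and `digit_child` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names, for illustration only: a node for digit keys,
 * so the radix d is 10 and each node carries 10 pointer fields. */
#define DIGIT_RADIX 10

typedef struct DigitNode {
    int count;                            /* times the key ending here was inserted */
    struct DigitNode *next[DIGIT_RADIX];  /* one child slot per symbol '0'..'9' */
} DigitNode;

/* Map a symbol to its child slot; returns -1 for anything that is not a digit. */
static int digit_index(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Allocate an empty node; returns NULL if malloc fails. */
static DigitNode *digit_node_new(void) {
    DigitNode *node = malloc(sizeof *node);
    if (node == NULL) {
        return NULL;
    }
    node->count = 0;
    for (int i = 0; i < DIGIT_RADIX; i++) {
        node->next[i] = NULL;
    }
    return node;
}

/* Return the child of node for symbol c, or NULL if there is none
 * or c is not a digit. */
static DigitNode *digit_child(const DigitNode *node, char c) {
    assert(node != NULL);
    int idx = digit_index(c);
    return idx >= 0 ? node->next[idx] : NULL;
}
```

Switching to lowercase letters would only change the radix to 26 and the mapping to `c - 'a'`, which is the layout the rest of this article uses.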

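The structure of a trie is shown in the figure below:

If each node of the trie gets an extra field, count, that records how many times the word ending at that node has been inserted, then the trie can report word frequencies in addition to answering word lookups.

For example, we define the trie node as follows: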

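typedef struct Trie_node{
    int  count;                  // number of times the word ending at this node was inserted
    struct Trie_node *next[26];  // children, indexed by letter: 'a' -> 0 ... 'z' -> 25
}TrieNode, *Trie;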

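The next array points to the nodes at the next level.

With this structure in place, we have a foundation for implementing the trie.

2. Building a trie

Building a trie is simply a matter of adding words one at a time. Scan down from the root, following the letters of the word: if the node for the current letter does not exist, create it; then move down to that node and continue with the next letter until the whole word has been added.

Following this algorithm, the code is straightforward: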

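// Create a new node with count 0 and no children.
TrieNode* createTrieNode(){
    TrieNode* root = (TrieNode*)malloc(sizeof(TrieNode));
    root->count = 0;
    memset(root->next, 0, sizeof(root->next));
    return root;
}
// Insert a word. The word must contain only lowercase letters 'a'..'z',
// because each letter is used directly as an index into next[].
void trie_insert(Trie root, char* word){
    TrieNode* node = root;
    char *p = word;
    while(*p)
    {
        // Create the child for this letter if it does not exist yet.
        if(NULL == node->next[*p-'a'])
        {
            node->next[*p-'a'] = createTrieNode();
        }
        node = node->next[*p-'a'];
        p++;
    }
    // node is now the node where the word ends; record one more occurrence.
    node->count += 1;
}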
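The insertion code above indexes next[] with `*p-'a'` directly, so a character outside a-z would read or write outside the array, and a failed malloc is not detected. The following is one possible defensive variant, not the author's code: `create_node_checked` and `trie_insert_checked` are hypothetical names, and the convention of returning 0 on success and -1 on failure is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Trie_node {
    int count;
    struct Trie_node *next[26];
} TrieNode, *Trie;

/* Hypothetical helper: like createTrieNode, but reports malloc failure
 * by returning NULL instead of dereferencing a null pointer. */
static TrieNode *create_node_checked(void) {
    TrieNode *node = malloc(sizeof *node);
    if (node == NULL) {
        return NULL;
    }
    node->count = 0;
    for (int i = 0; i < 26; i++) {
        node->next[i] = NULL;
    }
    return node;
}

/* Hypothetical variant of trie_insert.
 * Returns 0 on success, -1 if the word contains a character outside
 * 'a'..'z' or if a node could not be allocated. */
static int trie_insert_checked(Trie root, const char *word) {
    assert(root != NULL && word != NULL);

    /* First pass: validate the whole word before touching the trie,
     * so a rejected word never leaves partial paths behind. */
    for (const char *p = word; *p; p++) {
        if (*p < 'a' || *p > 'z') {
            return -1;
        }
    }

    /* Second pass: walk down, creating missing nodes on the way. */
    TrieNode *node = root;
    for (const char *p = word; *p; p++) {
        int idx = *p - 'a';
        if (node->next[idx] == NULL) {
            node->next[idx] = create_node_checked();
            if (node->next[idx] == NULL) {
                return -1;
            }
        }
        node = node->next[idx];
    }
    node->count += 1;
    return 0;
}
```

Validating before inserting costs a second pass over the word, but it keeps the trie unchanged when the input is rejected.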

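3. Searching a trie

A search walks a path that starts at the root and heads toward a leaf. Whether it has to reach a leaf depends on the implementation: if every word is required to end with a terminator such as $, a successful search always ends at a leaf; otherwise it may end at an internal node. The figure below illustrates this, with the search path for the word bat marked in red.

The search code: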

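// Return 1 if word was inserted at least once, 0 otherwise.
// The word must contain only lowercase letters 'a'..'z'.
int trie_search(Trie root, char* word){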
    TrieNode* node = root;
    char *p = word;
    while(*p && node!=NULL)
    {
        node = node->next[*p-'a'];
        p++;
    }
    return (node != NULL && node->count > 0);
}
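Because this implementation has no terminator symbol, the walk for a key can stop at an internal node, and only the count field tells a stored word apart from a mere prefix. The sketch below separates the two questions. The helpers `trie_walk`, `trie_contains_word` and `trie_has_prefix` are hypothetical names introduced here, and rejecting characters outside a-z is an added assumption.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Trie_node {
    int count;
    struct Trie_node *next[26];
} TrieNode, *Trie;

/* Hypothetical helper: follow key from the root and return the node
 * reached, or NULL if the path does not exist or key contains a
 * character outside 'a'..'z'. */
static TrieNode *trie_walk(Trie root, const char *key) {
    assert(key != NULL);
    TrieNode *node = root;
    for (const char *p = key; *p && node != NULL; p++) {
        if (*p < 'a' || *p > 'z') {
            return NULL;
        }
        node = node->next[*p - 'a'];
    }
    return node;
}

/* Whole-word lookup: the path exists and a word ends at its last node. */
static int trie_contains_word(Trie root, const char *word) {
    TrieNode *node = trie_walk(root, word);
    return node != NULL && node->count > 0;
}

/* Prefix lookup: the path exists; it may stop at an internal node
 * whose count is 0. */
static int trie_has_prefix(Trie root, const char *prefix) {
    return trie_walk(root, prefix) != NULL;
}
```

With only bat stored, looking up ba as a word fails because the node for a has count 0, while looking it up as a prefix succeeds.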

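4. Counting word frequencies with a trie

As described above, once the count field is added to each node, the trie can count how many times each word occurs. The count field starts at 0 and is incremented by 1, at the node where the word ends, every time the word is inserted. See the insertion code above.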
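As an illustration of the counting idea on raw text rather than a prepared word list, here is a minimal sketch that splits a string into runs of ASCII letters, folds them to lowercase, and increments count at the end of each run. The names `letter_index`, `node_new`, `trie_count_text` and `trie_count_of` are hypothetical, and treating every non-letter as a separator is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Trie_node {
    int count;
    struct Trie_node *next[26];
} TrieNode, *Trie;

/* Hypothetical helper: slot for an ASCII letter (case-insensitive),
 * or -1 for any other character, including the terminating '\0'. */
static int letter_index(char c) {
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return c - 'A';
    return -1;
}

/* Allocate an empty node; returns NULL if malloc fails. */
static TrieNode *node_new(void) {
    TrieNode *node = malloc(sizeof *node);
    if (node == NULL) return NULL;
    node->count = 0;
    for (int i = 0; i < 26; i++) node->next[i] = NULL;
    return node;
}

/* Count every run of letters in text as one word.
 * Returns the number of words counted, or -1 on allocation failure. */
static int trie_count_text(Trie root, const char *text) {
    assert(root != NULL && text != NULL);
    int words = 0;
    const char *p = text;
    while (*p) {
        if (letter_index(*p) < 0) {   /* skip separators */
            p++;
            continue;
        }
        TrieNode *node = root;
        int idx;
        while ((idx = letter_index(*p)) >= 0) {   /* '\0' also ends the word */
            if (node->next[idx] == NULL) {
                node->next[idx] = node_new();
                if (node->next[idx] == NULL) return -1;
            }
            node = node->next[idx];
            p++;
        }
        node->count += 1;   /* one more occurrence of this word */
        words++;
    }
    return words;
}

/* Return how many times word was counted, or 0 if it never occurred. */
static int trie_count_of(Trie root, const char *word) {
    assert(word != NULL);
    TrieNode *node = root;
    for (const char *p = word; *p && node != NULL; p++) {
        int idx = letter_index(*p);
        if (idx < 0) return 0;
        node = node->next[idx];
    }
    return node != NULL ? node->count : 0;
}
```

For the text "Tea, test; TEST tea test!" this counts five words, and the stored frequencies are 3 for test and 2 for tea.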

The complete test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Trie_node{
    int  count;
    struct Trie_node *next[26];

}TrieNode, *Trie;

TrieNode* createTrieNode(){
    TrieNode* root = (TrieNode*)malloc(sizeof(TrieNode));
    root->count = 0;
    memset(root->next, 0, sizeof(root->next));
    return root;
}

void trie_insert(Trie root, char* word){
    TrieNode* node = root;
    char *p = word;
    while(*p){
        if(NULL == node->next[*p-'a']){
            node->next[*p-'a'] = createTrieNode();
        }
        node = node->next[*p-'a'];
        p++;
    }
    node->count += 1;
}

int trie_search(Trie root, char* word){
    TrieNode* node = root;
    char *p = word;
    while(*p && node!=NULL){
        node = node->next[*p-'a'];
        p++;
    }
    return (node != NULL && node->count > 0);
}

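// Return how many times word was inserted, or 0 if it is not in the trie.
int trie_word_count(Trie root, char* word){
    TrieNode * node = root;
    char *p = word;
    while(*p && node != NULL){
        node = node->next[*p-'a'];
        p++;
    }
    // node is NULL when the path for word does not exist.
    return (node != NULL) ? node->count : 0;
}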

int main(){
    Trie t = createTrieNode();
    char word[][10] = {"test","study","open","show","shit","work","work","test","tea","word","area","word","test","test","test"};
    int n = (int)(sizeof(word) / sizeof(word[0]));  // number of words in the list (15)
    for(int i = 0;i < n;i++ ){
        trie_insert(t,word[i]);
    }
    for(int i = 0;i < n;i++ ){
        printf("the word %s appears %d times in the trie-tree\n",word[i],trie_word_count(t,word[i]));
    }
    char s[10] = "testit";
    printf("the word %s exist? %d \n",s,trie_search(t,s));
    return 0;
}
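The test program never releases the nodes it allocates, and it prints one line per entry of the word list, so repeated words show up several times. As a possible complement, the sketch below lists each stored word once with its count and then frees the whole trie. `trie_dump` and `trie_free` are hypothetical helpers, not part of the original program, and the caller-supplied buffer of `cap` bytes is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Trie_node {
    int count;
    struct Trie_node *next[26];
} TrieNode, *Trie;

/* Hypothetical helper: depth-first walk that prints every stored word
 * with its count, in alphabetical order. buf holds the letters on the
 * path from the root; words that would not fit in cap bytes (including
 * the terminator) are skipped. Returns the number of words printed. */
static int trie_dump(const TrieNode *node, char *buf, int depth, int cap) {
    assert(node != NULL && buf != NULL && depth < cap);
    int printed = 0;
    if (node->count > 0) {
        buf[depth] = '\0';
        printf("%s %d\n", buf, node->count);
        printed = 1;
    }
    for (int i = 0; i < 26; i++) {
        if (node->next[i] != NULL && depth + 1 < cap) {
            buf[depth] = (char)('a' + i);
            printed += trie_dump(node->next[i], buf, depth + 1, cap);
        }
    }
    return printed;
}

/* Hypothetical helper: free children before their parent (post-order).
 * Returns the number of nodes freed; a NULL trie frees nothing. */
static int trie_free(Trie root) {
    if (root == NULL) {
        return 0;
    }
    int freed = 1;
    for (int i = 0; i < 26; i++) {
        freed += trie_free(root->next[i]);
    }
    free(root);
    return freed;
}
```

In the test program this could be used just before `return 0;`, for example with a local `char buf[10];` followed by `trie_dump(t, buf, 0, (int)sizeof buf);` and `trie_free(t);`.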

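Running the test program (the listing that contains main) prints:

the word test appears 5 times in the trie-tree
the word study appears 1 times in the trie-tree
the word open appears 1 times in the trie-tree
the word show appears 1 times in the trie-tree
the word shit appears 1 times in the trie-tree
the word work appears 2 times in the trie-tree
the word work appears 2 times in the trie-tree
the word test appears 5 times in the trie-tree
the word tea appears 1 times in the trie-tree
the word word appears 2 times in the trie-tree
the word area appears 1 times in the trie-tree
the word word appears 2 times in the trie-tree
the word test appears 5 times in the trie-tree
the word test appears 5 times in the trie-tree
the word test appears 5 times in the trie-tree
the word testit exist? 0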
