
Real-Time Full-Text Search with Sphinx

2011-03-16 12:23
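Sphinx 0.9.9 and earlier have no native support for real-time indexing. The usual workaround is a main index plus a delta (incremental) index, which gives only "near real-time" indexing. The latest version, 1.10.1 (in trunk, not yet released), finally supports real-time indexes. With the documentation in SVN as a guide, it is easy to build an on-demand full-text search system on top of Sphinx.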

Reference: http://filiptepper.com/2010/05/27/real-time-indexing-and-searching-with-sphinx-1-10-1-dev.html

First, check out the latest code from the sphinxsearch SVN repository, then build and install it:

svn checkout http://sphinxsearch.googlecode.com/svn/trunk sphinx
cd sphinx/
./configure --prefix=/path/to/sphinx
make
make install


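If the build succeeds, create a configuration file named sphinx.conf in the etc directory under the Sphinx installation path. Be sure to include the character-set settings needed for Chinese, otherwise Chinese text will not be indexed and searched correctly: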

index rt {
# Set the index type to real-time index
type = rt
# Use UTF-8 encoding
charset_type  = utf-8
# Character table for UTF-8 text
charset_table  = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
# Unigram tokenization (one character per token)
ngram_len = 1
# Characters to be split into n-grams
ngram_chars   = U+3000..U+2FA1F
# Where the index files are stored
path = /path/to/sphinx/data/rt
# Full-text field
rt_field = message
# Attribute
rt_attr_uint = message_id
}

searchd {
log = /path/to/sphinx/log/searchd.log
query_log = /path/to/sphinx/log/query.log
pid_file = /path/to/sphinx/log/searchd.pid
workers = threads
# searchd emulates the MySQL interface, so no real MySQL server is needed; mysql41 means the MySQL 4.1 through 5.1 protocol is supported
listen = 127.0.0.1:9527:mysql41
}


Start the Sphinx service:

/path/to/sphinx/bin/searchd --config /path/to/sphinx/etc/sphinx.conf
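
Before connecting a client, it can help to confirm that searchd is actually accepting connections on the configured listener. The following is a minimal sketch in Python using only the standard library; the helper name wait_for_port and its timeout values are illustrative assumptions, not part of Sphinx.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening;
            # it does not prove the listener speaks the MySQL protocol.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # 127.0.0.1:9527 matches the listen line in the searchd section above.
    print(wait_for_port("127.0.0.1", 9527))
```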


Insert a few rows to try it out:

ubuntu:chaoqun ~:mysql -h127.0.0.1 -P9527
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 1.10.1-dev (r2351)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> INSERT INTO rt VALUES (1, 'this message has a body', 1);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO rt VALUES (2, '测试中文OK', 2);
Query OK, 1 row affected (0.00 sec)

mysql>
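
The same inserts can be issued from application code. The sketch below, in Python, builds the INSERT statement as a string and sends it through a DB-API style connection. It assumes that a MySQL client driver of your choice can connect to 127.0.0.1:9527 and that escaping backslashes and single quotes is sufficient for string literals; the function names are hypothetical and the driver itself is not shown.

```python
def quote_sphinxql(value):
    """Quote a string literal, escaping backslashes first and then single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_insert(index, doc_id, message, message_id):
    """Build an INSERT in the column order used above: id, message, message_id."""
    # The index name is inserted as-is, so it must come from trusted code.
    return "INSERT INTO {} VALUES ({:d}, {}, {:d})".format(
        index, int(doc_id), quote_sphinxql(message), int(message_id)
    )


def insert_message(conn, doc_id, message, message_id, index="rt"):
    """Send the statement through any DB-API style connection (an assumption)."""
    cursor = conn.cursor()
    try:
        cursor.execute(build_insert(index, doc_id, message, message_id))
    finally:
        cursor.close()


if __name__ == "__main__":
    print(build_insert("rt", 1, "this message has a body", 1))
    # INSERT INTO rt VALUES (1, 'this message has a body', 1)
```

Keeping the statement builder separate from the connection makes it possible to check the generated SQL without a running searchd.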


Test full-text search:

mysql> SELECT * FROM rt WHERE MATCH('message');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    1 |   1643 |          1 |
+------+--------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM rt WHERE MATCH('OK');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    2 |   1643 |          2 |
+------+--------+------------+
1 row in set (0.01 sec)

mysql> SELECT * FROM rt WHERE MATCH('中');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    2 |   1643 |          2 |
+------+--------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM rt WHERE MATCH('我');
Empty set (0.00 sec)

mysql>
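
The results above follow from the unigram settings: with ngram_len = 1, each character in the ngram_chars range becomes its own token, while Latin text is indexed as whole lowercased words. The following Python sketch is a simplified model of that behavior, not Sphinx's actual tokenizer; it ignores the Cyrillic part of charset_table, and the function names are illustrative.

```python
def is_ngram_char(ch):
    # Mirrors ngram_chars = U+3000..U+2FA1F from the configuration.
    return 0x3000 <= ord(ch) <= 0x2FA1F


def is_word_char(ch):
    # Simplified stand-in for charset_table: ASCII digits, letters, underscore.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def tokenize(text):
    """Split text into lowercased words plus one token per n-gram character."""
    tokens, word = [], []

    def flush():
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if is_ngram_char(ch):
            flush()
            tokens.append(ch)        # each CJK character is a separate token
        elif is_word_char(ch):
            word.append(ch.lower())  # A..Z->a..z case folding
        else:
            flush()                  # any other character acts as a separator
    flush()
    return tokens


if __name__ == "__main__":
    tokens = tokenize("测试中文OK")
    print(tokens)           # ['测', '试', '中', '文', 'ok']
    print("中" in tokens)   # True: MATCH('中') finds document 2
    print("我" in tokens)   # False: MATCH('我') returns an empty set
```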


Simple and convenient. That's all there is to it.

Further reading on search engine systems:

http://www.kuqin.com/searchengine/