
Search engines: building an index and searching with pyes, the Python client for elasticsearch

2013-10-02 00:34

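Host environment: Ubuntu 13.04
Python version: 2.7.4
When reposting, please credit: http://blog.geekcome.com/archives/118

Official site: http://www.elasticsearch.com/
Chinese site: http://es-cn.medcl.net/

The following introduction is quoted from the Chinese site:

"So, suppose you have built a web site or an application. You will probably need to add search to it (it is simply too essential), and in practice getting search running is hard. We want search that is fast and easy to install (ideally painless), with a very flexible schema (schema free), that can index data over HTTP as JSON, whose servers are always available (HA, this one is non-negotiable), that scales from one machine to thousands, that is real-time, simple to use, and multi-tenant. We need a complete solution, and one built for the cloud.

"Make search simpler" is our motto, "and cool, like a bonsai."

elasticsearch aims to solve all of the problems above and more. It is an open source (Apache 2 license), distributed, RESTful search engine built on top of Apache Lucene."

1. Installing the distributed server

Download elasticsearch from http://www.elasticsearch.org/download/ and pick a suitable version. Here I downloaded the DEB package for Ubuntu and installed it directly with dpkg. Once it is installed, start the service with sudo service elasticsearch start.

2. Installing the pyes client

Run the following command: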

    pip install pyes

This installs the Python client library for elasticsearch.

3. Installing the Chinese analysis plugin

Download the IK Chinese analysis plugin from https://github.com/medcl/elasticsearch-rtf/blob/master/elasticsearch/plugins/analysis-ik/elasticsearch-analysis-ik-1.2.2.jar and move it to /usr/share/elasticsearch/analysis-ik/ in the elasticsearch installation directory. Then edit the configuration file /etc/elasticsearch/elasticsearch.yml. Set the plugin path:

    path.plugins: /usr/share/elasticsearch/plugins

and add the analyzer configuration:

    index:
      analysis:
        analyzer:
          ik:
            alias: [ik_analyzer]
            type: org.elasticsearch.index.analysis.IkAnalyzerProvider
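The indentation of the analyzer block matters, because each nested level is part of the setting's name. In elasticsearch's YAML configuration a nested block is another way of writing a dotted setting name, which is the form path.plugins uses. A minimal sketch of that correspondence, assuming a hypothetical helper flatten_settings (it is not part of elasticsearch or pyes) and the analyzer block copied by hand into a Python dict:

```python
def flatten_settings(nested, prefix=""):
    """Flatten a nested settings dict into dotted setting names."""
    flat = {}
    for key, value in nested.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            # Descend one level and extend the dotted prefix.
            flat.update(flatten_settings(value, name))
        else:
            flat[name] = value
    return flat


# The analyzer block from elasticsearch.yml, rewritten by hand as a dict.
ik_settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "ik": {
                    "alias": ["ik_analyzer"],
                    "type": "org.elasticsearch.index.analysis.IkAnalyzerProvider",
                }
            }
        }
    }
}

if __name__ == "__main__":
    for name, value in sorted(flatten_settings(ik_settings).items()):
        print("%s: %s" % (name, value))
```

The two resulting names, index.analysis.analyzer.ik.alias and index.analysis.analyzer.ik.type, are what a mis-indented YAML block would silently change.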

Finally, download the dictionary used by the IK analyzer:

    cd /etc/elasticsearch
    wget http://github.com/downloads/medcl/elasticsearch-analysis-ik/ik.zip --no-check-certificate
    unzip ik.zip
    rm ik.zip

Then restart the elasticsearch service.
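After the restart it can be useful to confirm that the service answers over HTTP before running any indexing code. The following is only a sketch under some assumptions: it is written for the Python 3 standard library (the rest of this article uses Python 2.7), the helper name es_is_up is made up for illustration, and it treats an HTTP 200 reply on the root URL as "up" without looking at the response body.

```python
import http.client
from urllib.request import ProxyHandler, build_opener


def es_is_up(host="127.0.0.1", port=9200, timeout=3.5):
    """Return True if GET http://host:port/ answers with HTTP status 200."""
    url = "http://%s:%d/" % (host, port)
    # Bypass any configured HTTP proxy, since this is a local check.
    opener = build_opener(ProxyHandler({}))
    try:
        response = opener.open(url, timeout=timeout)
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False
    try:
        return response.getcode() == 200
    finally:
        response.close()
```

Calling es_is_up() with its defaults checks 127.0.0.1:9200 with a 3.5 second timeout, the same address and timeout the scripts in this article pass to pyes.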

4. Building the index

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import os
    import sys
    from pyes import *

    INDEX_NAME = 'txtfiles'


    class IndexFiles(object):
        def __init__(self, root):
            conn = ES('127.0.0.1:9200', timeout=3.5)  # connect to ES
            try:
                conn.delete_index(INDEX_NAME)  # drop any existing index
            except Exception:
                pass  # most likely the index did not exist yet
            conn.create_index(INDEX_NAME)  # create a new index

            # Define how each field is stored and analyzed
            mapping = {u'content': {'boost': 1.0,
                                    'index': 'analyzed',
                                    'store': 'yes',
                                    'type': u'string',
                                    "indexAnalyzer": "ik",
                                    "searchAnalyzer": "ik",
                                    "term_vector": "with_positions_offsets"},
                       u'name': {'boost': 1.0,
                                 'index': 'analyzed',
                                 'store': 'yes',
                                 'type': u'string',
                                 "indexAnalyzer": "ik",
                                 "searchAnalyzer": "ik",
                                 "term_vector": "with_positions_offsets"},
                       u'dirpath': {'boost': 1.0,
                                    'index': 'analyzed',
                                    'store': 'yes',
                                    'type': u'string',
                                    "indexAnalyzer": "ik",
                                    "searchAnalyzer": "ik",
                                    "term_vector": "with_positions_offsets"}}

            conn.put_mapping("test-type", {'properties': mapping}, [INDEX_NAME])  # define test-type

            self.addIndex(conn, root)

            conn.default_indices = [INDEX_NAME]  # set the default index
            conn.refresh()  # refresh so the newly inserted documents are visible

        def addIndex(self, conn, root):
            print root
            for root, dirnames, filenames in os.walk(root):
                for filename in filenames:
                    if not filename.endswith('.txt'):
                        continue
                    print "Indexing file", filename
                    try:
                        path = os.path.join(root, filename)
                        f = open(path)
                        contents = unicode(f.read(), 'utf-8')
                        f.close()
                        if len(contents) > 0:
                            conn.index({'name': filename, 'dirpath': root, 'content': contents},
                                       INDEX_NAME, 'test-type')
                        else:
                            print 'no contents in file %s' % path
                    except Exception as e:
                        print "Failed to index:", e


    if __name__ == '__main__':
        IndexFiles('./txtfiles')
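Two parts of the script above do not need a running server: building the three identical field definitions, and collecting the .txt files to index. As a rough sketch, they could be factored out like this so they can be tried on their own. The helper names build_mapping and iter_documents are made up for illustration and are not part of pyes; the field settings are copied from the script, and unreadable files are skipped rather than aborting the run.

```python
import io
import os


def build_mapping(fields, analyzer="ik"):
    """Return one identical analyzed-string field definition per field name."""
    return dict(
        (field, {
            "boost": 1.0,
            "index": "analyzed",
            "store": "yes",
            "type": "string",
            "indexAnalyzer": analyzer,
            "searchAnalyzer": analyzer,
            "term_vector": "with_positions_offsets",
        })
        for field in fields
    )


def iter_documents(root, suffix=".txt"):
    """Yield one document dict per non-empty UTF-8 text file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(suffix):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with io.open(path, encoding="utf-8") as handle:
                    contents = handle.read()
            except (IOError, UnicodeDecodeError) as error:
                # Skip files that cannot be read or are not valid UTF-8.
                print("skipping %s: %s" % (path, error))
                continue
            if contents:
                yield {"name": filename, "dirpath": dirpath, "content": contents}
```

With these in place, the mapping passed to put_mapping would be build_mapping(['content', 'name', 'dirpath']), and the indexing loop would reduce to calling conn.index(doc, INDEX_NAME, 'test-type') for each doc from iter_documents('./txtfiles').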

5. Searching with highlighted results

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import os
    import sys
    from pyes import *

    conn = ES('127.0.0.1:9200', timeout=3.5)  # connect to ES
    sq = StringQuery(u'世界末日', 'content')  # search the content field for "end of the world"
    h = HighLighter(['<b>'], ['</b>'], fragment_size=20)

    s = Search(sq, highlight=h)
    s.add_highlight("content")
    results = conn.search(s, indices='txtfiles', doc_types='test-type')

    hits = []
    for r in results:
        # Replace the full text with the first highlighted fragment, if there is one
        if "content" in r._meta.highlight:
            r['content'] = r._meta.highlight[u"content"][0]
        hits.append(r)
        print r['content']
    print len(hits)
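To see what the search script is doing without pyes, it helps to look at the request and response as plain JSON-style dicts. The sketch below is an approximation, not the exact JSON that pyes sends: build_search_body assembles a query_string query plus highlight settings in elasticsearch's query DSL, and apply_highlights repeats the loop above on raw hits, assuming each hit carries a "_source" dict and an optional "highlight" dict. Both function names are invented for this example.

```python
def build_search_body(text, field="content", pre_tag="<b>", post_tag="</b>",
                      fragment_size=20):
    """Build a query_string search body with highlighting on one field."""
    return {
        "query": {"query_string": {"query": text, "default_field": field}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fragment_size": fragment_size,
            "fields": {field: {}},
        },
    }


def apply_highlights(hits, field="content"):
    """Return each hit's source, with field replaced by its first highlight fragment."""
    documents = []
    for hit in hits:
        document = dict(hit.get("_source", {}))
        fragments = hit.get("highlight", {}).get(field)
        if fragments:
            # Prefer the short highlighted fragment over the full field text.
            document[field] = fragments[0]
        documents.append(document)
    return documents


if __name__ == "__main__":
    # Invented sample hits for illustration only, not real server output.
    sample_hits = [
        {"_source": {"name": "a.txt", "content": "long text"},
         "highlight": {"content": ["a <b>match</b> here"]}},
        {"_source": {"name": "b.txt", "content": "no fragment"}},
    ]
    for document in apply_highlights(sample_hits):
        print("%s: %s" % (document["name"], document["content"]))
```

A hit without a highlight entry keeps its original field text, which matches the behaviour of the loop in the script.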
