Elasticsearch 5.4.3: configuring the ik and pinyin analyzers

2017-07-31 00:00
Summary: a configuration in which ik Chinese word segmentation and pinyin analysis work together

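Installing the ik analyzer

Download the ik analyzer from https://github.com/medcl/elasticsearch-analysis-ik (the ik version must match the Elasticsearch version).

Unzip elasticsearch-analysis-ik-5.4.3.zip and copy the extracted files into elasticsearch-5.4.3/plugins/.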

mkdir /opt/ik
unzip elasticsearch-analysis-ik-5.4.3.zip -d /opt/ik
mv /opt/ik $ES_HOME/plugins/

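Restart ES; the ik analyzer is now installed.

Installing the pinyin analyzer

Installing the pinyin analyzer is a little more involved: you have to compile and package it from source yourself.

Download and build the source: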

git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
cd elasticsearch-analysis-pinyin
mvn clean install -Dmaven.test.skip

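Install the pinyin analyzer (the zip file name carries the version declared in the project's pom.xml; an Elasticsearch 5.x plugin only loads on the exact Elasticsearch version it was built for, so make sure the two match):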

cd target/releases
unzip elasticsearch-analysis-pinyin-5.5.1.zip
mv elasticsearch elasticsearch-analysis-pinyin
mv elasticsearch-analysis-pinyin $ES_HOME/plugins/

Restart ES; the pinyin analyzer is now installed.
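
After the restart it can be useful to confirm that both plugins were actually loaded. The following is a minimal Python sketch, assuming the node listens on http://localhost:9200 and that the plugins report themselves as `analysis-ik` and `analysis-pinyin` (these names are an assumption; check them against your own `_cat/plugins` output). The helper names `missing_plugins` and `fetch_plugins` are made up for this illustration.

```python
import json
from urllib.request import urlopen

# Assumed plugin names; verify against the output of your own node.
EXPECTED_PLUGINS = {"analysis-ik", "analysis-pinyin"}


def missing_plugins(cat_plugins_rows, expected=EXPECTED_PLUGINS):
    """Return the expected plugin names that no node reports.

    cat_plugins_rows is the parsed JSON of GET /_cat/plugins?format=json:
    a list of objects with "name" (node), "component" (plugin) and "version".
    """
    installed = {row["component"] for row in cat_plugins_rows}
    return sorted(set(expected) - installed)


def fetch_plugins(base_url="http://localhost:9200"):
    """Ask a running node for its plugin list (needs a live cluster)."""
    with urlopen(base_url + "/_cat/plugins?format=json") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running node, `missing_plugins(fetch_plugins())` should return an empty list once both plugins are in place.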

Creating the index

Create the index and configure its analysis settings:

curl -XPUT "http://localhost:9200/medcl/" -d'
{
"index": {
"analysis": {
"analyzer": {
"ik_pinyin_analyzer": {
"type": "custom",
"tokenizer": "ik_smart",
"filter": ["my_pinyin", "word_delimiter"]
}
},
"filter": {
"my_pinyin": {
"type": "pinyin",
"first_letter": "prefix",
"padding_char": " "
}
}
}
}
}'
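
Before indexing anything, the custom analyzer can be checked through the index-level `_analyze` API, which returns the tokens an analyzer emits for a given text. Below is a small Python sketch under the assumption that the `medcl` index created above lives on http://localhost:9200; the function names are hypothetical helpers, not part of any client library.

```python
import json
from urllib.request import Request, urlopen


def build_analyze_request(text, analyzer="ik_pinyin_analyzer",
                          index="medcl", base_url="http://localhost:9200"):
    """Build (but do not send) a POST to the index-level _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return Request(
        url="{}/{}/_analyze".format(base_url, index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(analyze_response):
    """Extract the token texts from a parsed _analyze response."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def analyze(text, **kwargs):
    """Send the request to a running node and return the emitted tokens."""
    with urlopen(build_analyze_request(text, **kwargs)) as resp:
        return token_strings(json.loads(resp.read().decode("utf-8")))
```

Calling `analyze("刘德华")` on a live node shows exactly which terms end up in the index for that name, which helps explain why a given query does or does not match.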


Creating the type (mapping)

Create a type and set its mapping:

curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
{
"folks": {
"properties": {
"name": {
"type": "keyword",
"fields": {
"pinyin": {
"type": "text",
"store": "no",
"term_vector": "with_positions_offsets",
"analyzer": "ik_pinyin_analyzer",
"boost": 10
}
}
}
}
}
}'


Creating documents

Create two documents:

curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"刘德华"}'
curl -XPOST http://localhost:9200/medcl/folks/tina -d'{"name":"中华人民共和国国歌"}'
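
With more than a handful of documents, the same data can be sent in one request through the `_bulk` API. The sketch below only builds the newline-delimited request body for the two documents above; `bulk_index_body` is a hypothetical helper name, and the index and type names are the ones used in this post.

```python
import json


def bulk_index_body(docs, index="medcl", doc_type="folks"):
    """Serialize {doc_id: source} pairs into a _bulk request body (NDJSON)."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_index_body({
    "andy": {"name": "刘德华"},
    "tina": {"name": "中华人民共和国国歌"},
})
```

The resulting body can be written to a file and posted to `/_bulk`; with curl, use `--data-binary` rather than `-d` so the newlines are preserved.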


Testing the pinyin analyzer

All four of the following queries return “刘德华”:

curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"

Sample query result:

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.85669875,
"hits": [
{
"_index": "medcl",
"_type": "folks",
"_id": "andy",
"_score": 0.85669875,
"_source": {
"name": "刘德华"
}
}
]
}
}
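
The four query-string searches shown earlier differ only in the search term, so they can be generated rather than typed out. This is a minimal sketch assuming the same host, index and type as above; `pinyin_search_url` is a made-up helper name. Note that `urlencode` percent-encodes the colon in `field:term`, which Elasticsearch decodes again on receipt.

```python
from urllib.parse import urlencode


def pinyin_search_url(term, field="name.pinyin", index="medcl",
                      doc_type="folks", base_url="http://localhost:9200"):
    """Build a URI-search URL of the form .../_search?q=field:term."""
    query = urlencode({"q": "{}:{}".format(field, term)})
    return "{}/{}/{}/_search?{}".format(base_url, index, doc_type, query)


# Full pinyin of each character, plus the first-letter abbreviation.
urls = [pinyin_search_url(t) for t in ("liu", "de", "hua", "ldh")]
```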


Testing the ik analyzer

Send the request:

curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
"query": {
"match": {
"name.pinyin": "国歌"
}
},
"highlight": {
"fields": {
"name.pinyin": {}
}
}
}'

Response:

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 9.507006,
"hits" : [
{
"_index" : "medcl",
"_type" : "folks",
"_id" : "tina",
"_score" : 9.507006,
"_source" : {
"name" : "中华人民共和国国歌"
},
"highlight" : {
"name.pinyin" : [
"<em>中华人民共和国</em><em>国歌</em>"
]
}
}
]
}
}
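
The `highlight` section shows which parts of the stored text matched, wrapped in `<em>` tags (the default highlighter tags). To inspect those matches programmatically, something like the following sketch could be used; `highlighted_terms` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response above.

```python
import re

EM_SPAN = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(search_response, field="name.pinyin"):
    """Map each hit's _id to the text wrapped in <em> tags for `field`."""
    result = {}
    for hit in search_response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get(field, [])
        result[hit["_id"]] = [m for frag in fragments
                              for m in EM_SPAN.findall(frag)]
    return result


sample = {"hits": {"hits": [{
    "_id": "tina",
    "highlight": {"name.pinyin": ["<em>中华人民共和国</em><em>国歌</em>"]},
}]}}
print(highlighted_terms(sample))  # {'tina': ['中华人民共和国', '国歌']}
```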


Testing ik + pinyin together

Send the request:

curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
"query": {
"match": {
"name.pinyin": "zhonghua"
}
},
"highlight": {
"fields": {
"name.pinyin": {}
}
}
}'

Response:

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 6.188843,
"hits" : [
{
"_index" : "medcl",
"_type" : "folks",
"_id" : "tina",
"_score" : 6.188843,
"_source" : {
"name" : "中华人民共和国国歌"
},
"highlight" : {
"name.pinyin" : [
"<em>中华人民共和国</em>国歌"
]
}
},
{
"_index" : "medcl",
"_type" : "folks",
"_id" : "3",
"_score" : 3.0490103,
"_source" : {
"@timestamp" : "2017-07-13T06:42:00.203Z",
"last_modify_time" : "2017-07-13T02:52:53.000Z",
"name" : "可能猜到可以使用iterator来删除循环中的元素",
"@version" : "1",
"id" : 3,
"type" : "jdbc"
},
"highlight" : {
"name.pinyin" : [
"可能猜到可以使用iterator来删除循<em>环中</em>的元素"
]
}
},
{
"_index" : "medcl",
"_type" : "folks",
"_id" : "andy",
"_score" : 0.22534128,
"_source" : {
"name" : "刘德华"
},
"highlight" : {
"name.pinyin" : [
"<em>刘德华</em>"
]
}
}
]
}
}

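PS: the test index contained a few extra documents, so the second hit in the response can be ignored; that document is not created anywhere in this post.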
Tags: Elasticsearch es ik pinyin