Introduction to elasticsearch-dsl 2.0.0

2016-03-02 20:16

elasticsearch-dsl 2.0.0 by Honza Král, translated by AbnerGong

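Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py).

It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure. It exposes the whole range of the DSL from Python, either directly through defined classes or through queryset-like expressions.

It also provides an optional wrapper for working with documents as Python objects: defining mappings, retrieving and saving documents, and wrapping the document data in user-defined classes.

To use the other Elasticsearch APIs (e.g. cluster health), just use the underlying client.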

Compatibility

Search Example

Let's take a typical search request written directly as a dict:

(Translator's note: the filtered query used below has been replaced by bool since Elasticsearch 2.0.)

from elasticsearch import Elasticsearch
client = Elasticsearch()

response = client.search(
    index="my-index",
    body={
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "must": [{"match": {"title": "python"}}],
                        "must_not": [{"match": {"description": "beta"}}]
                    }
                },
                "filter": {"term": {"category": "search"}}
            }
        },
        "aggs" : {
            "per_tag": {
                "terms": {"field": "tags"},
                "aggs": {
                    "max_lines": {"max": {"field": "lines"}}
                }
            }
        }
    }
)

for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])

for tag in response['aggregations']['per_tag']['buckets']:
    print(tag['key'], tag['max_lines']['value'])
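The translator's note says `filtered` was replaced by `bool`. Here is a minimal sketch of the same request body written for Elasticsearch 2.0 and later, with the term filter moved into the `filter` clause of the `bool` query. Only the dict is built here and no cluster is contacted. It is an assumption that it would be passed as `body=` to `client.search` in the same way as above.

```python
# Same request body as above, rewritten for Elasticsearch 2.0+:
# the "filtered" query is gone and the filter lives inside "bool".
body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "python"}}],
            "must_not": [{"match": {"description": "beta"}}],
            # Filter clauses must match but do not contribute to the score
            "filter": [{"term": {"category": "search"}}],
        }
    },
    "aggs": {
        "per_tag": {
            "terms": {"field": "tags"},
            "aggs": {"max_lines": {"max": {"field": "lines"}}},
        }
    },
}
```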


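The problem with this approach is that it is very verbose, prone to syntax mistakes such as incorrect nesting, hard to modify (e.g. adding another filter) and definitely not fun to write.

Let's rewrite the example using the Python DSL: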

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

client = Elasticsearch()

s = Search(using=client, index="my-index") \
    .filter("term", category="search") \
    .query("match", title="python")   \
    .query(~Q("match", description="beta"))

s.aggs.bucket('per_tag', 'terms', field='tags') \
    .metric('max_lines', 'max', field='lines')

response = s.execute()

for hit in response:
    print(hit.meta.score, hit.title)

for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)


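As you see, the library took care of:

- creating appropriate Query objects by name (e.g. "match")

- composing queries into a compound bool query

- creating a filtered query because .filter() was used

- providing convenient access to the response data

- no curly or square brackets everywhere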
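Here is a deliberately tiny, hypothetical builder that makes the first three points concrete. `ToySearch` is an invented name; it is not part of elasticsearch-dsl and not how the library is implemented. It builds clauses by name and collects them into one `bool` query. It also returns a new object on every call, so chaining never changes the original. Following the translator's note, it emits the Elasticsearch 2.0+ `bool`/`filter` form rather than `filtered`.

```python
from copy import deepcopy


class ToySearch:
    """Hypothetical, simplified query builder; NOT elasticsearch_dsl's implementation."""

    def __init__(self):
        # Clause lists of the single bool query this toy builder produces
        self._bool = {"must": [], "must_not": [], "filter": []}

    def _clone(self):
        # Every call works on a copy, so chaining never mutates the original
        new = ToySearch()
        new._bool = deepcopy(self._bool)
        return new

    def query(self, name, negate=False, **params):
        # Build the clause by name: ("match", title="python") -> {"match": {"title": "python"}}
        s = self._clone()
        s._bool["must_not" if negate else "must"].append({name: params})
        return s

    def filter(self, name, **params):
        # Non-scoring clause (Elasticsearch 2.0+ bool "filter" context)
        s = self._clone()
        s._bool["filter"].append({name: params})
        return s

    def to_dict(self):
        # Omit empty clause lists to keep the serialized query minimal
        return {"query": {"bool": {k: v for k, v in self._bool.items() if v}}}


base = ToySearch()
s = (base.filter("term", category="search")
         .query("match", title="python")
         .query("match", negate=True, description="beta"))
print(s.to_dict())
```

Because each call returns a copy, `base` still serializes to an empty `bool` query afterwards. That is what makes it safe to branch several searches off a shared base object.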

Persistence Example

from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

# Define a default Elasticsearch client
connections.create_connection(hosts=['localhost'])

class Article(DocType):
    title = String(analyzer='snowball', fields={'raw': String(index='not_analyzed')})
    body = String(analyzer='snowball')
    tags = String(index='not_analyzed')
    published_from = Date()
    lines = Integer()

    class Meta:
        index = 'blog'

    def save(self, **kwargs):
        self.lines = len(self.body.split())
        return super(Article, self).save(**kwargs)

    def is_published(self):
        return datetime.now() > self.published_from

# create the mappings in elasticsearch
Article.init()

# create and save an article
article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
article.body = ''' looong text '''
article.published_from = datetime.now()
article.save()

article = Article.get(id=42)
print(article.is_published())

# Display cluster health
print(connections.get_connection().cluster.health())


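In this example you can see:

- providing a default connection

- defining fields with mapping configuration

- setting the index name

- defining custom methods

- overriding the built-in .save() method to hook into the persistence life cycle

- retrieving and saving the object into Elasticsearch

- accessing the underlying client for other APIs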
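The `.save()` override above is a plain Python pattern. The subclass adjusts its own state, then delegates to the base class. The sketch below shows it with no Elasticsearch involved. It assumes a hypothetical in-memory base class `InMemoryDoc` in place of `DocType`; its store, constructor and `get` signature are invented for illustration.

```python
from datetime import datetime, timedelta


class InMemoryDoc:
    """Hypothetical stand-in for a persistence base class (not elasticsearch_dsl)."""

    _store = {}  # shared in-memory "index": document id -> stored fields

    def __init__(self, doc_id, **fields):
        self.doc_id = doc_id
        self.__dict__.update(fields)

    def save(self):
        # Persist a copy of every field except the id itself
        self._store[self.doc_id] = {k: v for k, v in vars(self).items() if k != "doc_id"}
        return True

    @classmethod
    def get(cls, doc_id):
        # Rebuild an instance of the calling class from the stored fields
        return cls(doc_id, **cls._store[doc_id])


class Article(InMemoryDoc):
    def save(self):
        # Hook into the life cycle: derive `lines` right before the base class stores it
        self.lines = len(self.body.split())
        return super().save()

    def is_published(self):
        return datetime.now() > self.published_from


article = Article(42, title="Hello world!", body=" looong text ",
                  published_from=datetime.now() - timedelta(seconds=1))
article.save()

stored = Article.get(42)
print(stored.lines, stored.is_published())  # prints: 2 True
```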

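You can see more in the persistence chapter of the documentation.

Migration from elasticsearch-py

You don't have to port your entire application to get the benefits of the Python DSL. You can start gradually by creating a Search object from your existing dict, modifying it using the API and serializing it back to a dict: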

from elasticsearch_dsl import Search

body = {...} # insert complicated query here

# Convert to Search object
s = Search.from_dict(body)

# Add some filters, aggregations, queries, ...
# (Search methods return a modified copy, so reassign the result)
s = s.filter("term", tags="python")

# Convert back to dict to plug back into existing code
body = s.to_dict()
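For comparison, here is roughly the nested-dict surgery that the round trip spares you when you add a single filter by hand. `add_term_filter` is a hypothetical helper written for this illustration. It assumes the Elasticsearch 2.0+ `bool`/`filter` syntax, and it is not how `Search.from_dict` works internally.

```python
from copy import deepcopy


def add_term_filter(body, field, value):
    """Hypothetical helper: return a copy of `body` with one more term filter."""
    new = deepcopy(body)  # never mutate the caller's dict
    term = {"term": {field: value}}
    query = new.get("query", {"match_all": {}})
    if set(query) == {"bool"}:
        # Already a bool query: append to its filter clauses
        filters = query["bool"].get("filter", [])
        if isinstance(filters, dict):
            filters = [filters]  # a single clause may be given as a dict
        query["bool"]["filter"] = filters + [term]
    else:
        # Any other query keeps scoring under "must"; the filter only restricts hits
        query = {"bool": {"must": [query], "filter": [term]}}
    new["query"] = query
    return new


body = {"query": {"match": {"title": "python"}}, "size": 10}
print(add_term_filter(body, "tags", "python"))
```

Even this toy version has to special-case existing `bool` queries and single-dict filters. That kind of branching is what makes hand-edited query dicts hard to maintain.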


Documentation

https://elasticsearch-dsl.readthedocs.org/