Elasticsearch Paginated Queries: From & Size vs. Scroll
2018-03-30 19:27
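In Elasticsearch (ES), a typical search request proceeds as follows:
1 The client sends the request to one of the nodes.
2 That node forwards the request to every shard, and each shard finds its own top 10 hits.
3 The shards return their results to the node, which merges them and keeps the overall top 10.
4 The node returns those 10 hits to the client.
The response therefore reports the total number of matching documents but carries only the default 10 hits, so retrieving the rest requires paginated queries.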
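The four steps above can be sketched in plain Java, without an Elasticsearch cluster. This is only an illustrative simulation: the class name `ScatterGatherSketch`, the method names, and the relevance scores are all made up for the example, and real shards return document IDs and sort values rather than bare scores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScatterGatherSketch {

    // Step 2: one shard sorts its own scores (highest first) and returns at most `size` of them.
    static List<Double> queryShard(List<Double> shardScores, int size) {
        List<Double> sorted = new ArrayList<>(shardScores);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(size, sorted.size())));
    }

    // Steps 2-3: the receiving node queries every shard, merges the partial
    // results, and keeps only the overall top `size`.
    static List<Double> coordinate(List<List<Double>> shards, int size) {
        List<Double> merged = new ArrayList<>();
        for (List<Double> shard : shards) {
            merged.addAll(queryShard(shard, size));
        }
        merged.sort(Comparator.reverseOrder());
        return new ArrayList<>(merged.subList(0, Math.min(size, merged.size())));
    }

    public static void main(String[] args) {
        // Hypothetical scores held by three shards.
        List<List<Double>> shards = Arrays.asList(
                Arrays.asList(0.9, 0.1, 0.5),
                Arrays.asList(0.8, 0.7),
                Arrays.asList(0.3, 0.95, 0.2, 0.6));
        // Step 4: the merged top 3 goes back to the client.
        System.out.println(coordinate(shards, 3)); // prints [0.95, 0.9, 0.8]
    }
}
```

Each shard does its own sorting, so the node that received the request only has to merge short, already-sorted lists.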
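As for data volume: From & Size gave me no trouble at 8 million documents. One of my jobs, however, had to read roughly 170 million documents, and From & Size started failing at around the 20-million mark. I later found that From & Size performance degrades sharply at large offsets, since every shard has to collect and sort from + size entries for each page, and that degradation was causing the errors. My recommendation: use scroll whenever you can.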
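A rough back-of-the-envelope calculation shows why large offsets hurt. The sketch below only counts entries: for one From & Size page, each shard collects from + size sorted entries, and the node that merges them receives that many from every shard. The class and method names are invented for the example, and the shard count of 5 is an assumption, not a figure from this post.

```java
public class DeepPagingCost {

    // Entries a single shard must collect and sort to serve one page.
    static long entriesPerShard(long from, int size) {
        return from + size;
    }

    // Entries the merging node receives across all shards for that page.
    static long entriesAtCoordinator(int shards, long from, int size) {
        return shards * entriesPerShard(from, size);
    }

    public static void main(String[] args) {
        int shards = 5;     // assumed shard count
        int size = 50_000;  // page size used in the From & Size code in this post
        long[] offsets = {0L, 8_000_000L, 20_000_000L};
        for (long from : offsets) {
            System.out.println("from=" + from
                    + " perShard=" + entriesPerShard(from, size)
                    + " atCoordinator=" + entriesAtCoordinator(shards, from, size));
        }
    }
}
```

The work per page grows with the offset, so the last pages of a large result set cost far more than the first ones, while the number of useful hits per page stays the same.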
Below is Java code for both approaches.
First, add the Elasticsearch jar to your Java project, for example with Maven:
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.3.2</version>
</dependency>
Then initialize a client object:
private static TransportClient client;
private static String INDEX = "index_name";
private static String TYPE = "type_name";

// Note: ImmutableSettings and new TransportClient(settings) belong to the Elasticsearch 1.x
// Java API; 2.x clients are built with Settings.settingsBuilder() and TransportClient.builder().
public static TransportClient init() {
    Settings settings = ImmutableSettings.settingsBuilder()
            .put("client.transport.sniff", true)
            .put("cluster.name", "cluster_name")
            .build();
    client = new TransportClient(settings)
            .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
    return client;
}

public static void main(String[] args) {
    TransportClient client = init();
    // client can now be used to run queries
}
Next come the two query routines. Here is the From & Size pagination code:
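System.out.println("From & Size mode started!");
Date begin = new Date();
long count = client.prepareCount(INDEX).setTypes(TYPE).execute().actionGet().getCount();
SearchRequestBuilder requestBuilder = client.prepareSearch(INDEX).setTypes(TYPE).setQuery(QueryBuilders.matchAllQuery());
for (int sum = 0; sum < count; ) {
    // from is a document offset, not a page index, so each page starts where the previous one ended
    SearchResponse response = requestBuilder.setFrom(sum).setSize(50000).execute().actionGet();
    int fetched = response.getHits().hits().length;
    if (fetched == 0) {
        break; // nothing more came back; avoid looping forever
    }
    sum += fetched;
    System.out.println("Total " + count + ", fetched so far " + sum);
}
Date end = new Date();
System.out.println("Elapsed ms: " + (end.getTime() - begin.getTime()));
Below is the scroll pagination code. Note that with the SCAN search type used here, size applies to each shard, so the number of hits actually returned per request is:
number of shards * size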
System.out.println("Scroll mode started!");
begin = new Date();
SearchResponse scrollResponse = client.prepareSearch(INDEX)
        .setSearchType(SearchType.SCAN).setSize(10000).setScroll(TimeValue.timeValueMinutes(1))
        .execute().actionGet();
count = scrollResponse.getHits().getTotalHits(); // the initial SCAN request returns no hits, only the total and a scroll id
for (int i = 0, sum = 0; sum < count; i++) {
    scrollResponse = client.prepareSearchScroll(scrollResponse.getScrollId())
            .setScroll(TimeValue.timeValueMinutes(8))
            .execute().actionGet();
    sum += scrollResponse.getHits().hits().length;
    System.out.println("Total " + count + ", fetched so far " + sum);
}
end = new Date();
System.out.println("Elapsed ms: " + (end.getTime() - begin.getTime()));
One more point worth mentioning: running ES CRUD operations one document at a time over a large data set is generally slow, so use bulk operations instead, as in the following example:
public static void updateHourByScroll(String type) throws IOException {
    System.out.println("Scroll mode started!");
    Date begin = new Date();
    SearchResponse scrollResponse = client.prepareSearch(INDEX).setTypes(TYPE)
            .setSearchType(SearchType.SCAN).setSize(5000).setScroll(TimeValue.timeValueMinutes(1))
            .execute().actionGet();
    long count = scrollResponse.getHits().getTotalHits(); // the initial SCAN request returns no hits
    for (int i = 0, sum = 0; sum < count; i++) {
        scrollResponse = client.prepareSearchScroll(scrollResponse.getScrollId())
                .setScroll(TimeValue.timeValueMinutes(8))
                .execute().actionGet();
        sum += scrollResponse.getHits().hits().length;
        SearchHits searchHits = scrollResponse.getHits();
        List<UpdateRequest> list = new ArrayList<UpdateRequest>();
        for (SearchHit hit : searchHits) {
            String id = hit.getId();
            Map<String, Object> source = hit.getSource();
            Integer year = Integer.valueOf(source.get("Year").toString());
            Integer month = Integer.valueOf(source.get("Mon").toString());
            Integer day = Integer.valueOf(source.get("Day").toString());
            Integer hour = Integer.valueOf(source.get("Hour").toString());
            // getyear_month_day_hour is a helper of mine that is not shown in this post
            String time = getyear_month_day_hour(year, month, day, hour);
            System.out.println(time);
            UpdateRequest uRequest = new UpdateRequest()
                    .index(INDEX)
                    .type(type)
                    .id(id)
                    .doc(jsonBuilder().startObject().field("TimeFormat", time).endObject());
            list.add(uRequest);
        }
        // Send the whole page as one bulk request; skip empty pages,
        // since a bulk request with no actions fails validation
        if (!list.isEmpty()) {
            BulkRequestBuilder bulkRequest = client.prepareBulk();
            for (UpdateRequest uprequest : list) {
                bulkRequest.add(uprequest);
            }
            BulkResponse bulkResponse = bulkRequest.execute().actionGet();
            if (bulkResponse.hasFailures()) {
                System.out.println("Bulk update reported failures!");
            }
        }
        System.out.println("Total " + count + ", fetched so far " + sum);
    }
    Date end = new Date();
    System.out.println("Elapsed ms: " + (end.getTime() - begin.getTime()));
}
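The saving from bulk operations comes from sending many updates per round trip instead of one. As a minimal sketch of that idea, independent of the Elasticsearch client, the snippet below splits a list of pending items into fixed-size batches; the class name `BatchingSketch`, the `partition` helper, and the sample IDs are hypothetical and only stand in for the per-page update lists built above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchingSketch {

    // Split items into consecutive batches holding at most batchSize elements each.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Twelve made-up document IDs waiting to be updated.
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
        List<List<Integer>> batches = partition(ids, 5);
        // One request per batch instead of one request per document.
        System.out.println(ids.size() + " single requests vs " + batches.size() + " batched requests");
        for (List<Integer> batch : batches) {
            System.out.println("batch: " + batch);
        }
    }
}
```

In the scroll-based update above, each scroll page already plays the role of one batch, so no extra partitioning is needed there; a helper like this is only useful when the updates come from somewhere other than a scroll.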