HBase Scan Query Code Analysis
2014-01-15 10:05
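Scan query process
Step 1. HTable.getScanner()
Close any scanner previously opened on the server, so that it does not keep holding server-side resources
Client side: ScannerCallable.call() -> close(scannerId)
Server side: HRegionServer.close(scannerId)
Open a scanner on the target region, starting from localStartKey
Client side: ScannerCallable.call() -> openScanner(regionName, scan)
Server side:
Create a RegionScanner
Add the scanner to the server's map of open scanners
Create a Lease for the newly created scanner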
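The server-side bookkeeping in step 1 (register the scanner in a map, attach a lease, drop it on close or expiry) can be sketched roughly as below. This is a simplified, single-threaded model and not the HRegionServer code: `ScannerRegistrySketch`, its methods, and the rule that a lease expires once `nowMs` reaches the expiry time are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of the server-side bookkeeping for open scanners.
final class ScannerRegistrySketch {
    // scannerId -> region name (a stand-in for the real RegionScanner object)
    private final Map<Long, String> scanners = new HashMap<>();
    // scannerId -> time at which the lease expires
    private final Map<Long, Long> leaseExpiry = new HashMap<>();
    private final long leasePeriodMs;
    private long nextId = 1;

    ScannerRegistrySketch(long leasePeriodMs) {
        this.leasePeriodMs = leasePeriodMs;
    }

    // Registers a scanner, starts its lease, and returns the new scanner id.
    long open(String regionName, long nowMs) {
        long id = nextId++;
        scanners.put(id, regionName);
        leaseExpiry.put(id, nowMs + leasePeriodMs);
        return id;
    }

    // Explicit close: drop the scanner and cancel its lease.
    void close(long scannerId) {
        scanners.remove(scannerId);
        leaseExpiry.remove(scannerId);
    }

    // Drops every scanner whose lease has run out; returns how many were dropped.
    int expire(long nowMs) {
        int dropped = 0;
        Iterator<Map.Entry<Long, Long>> it = leaseExpiry.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            if (e.getValue() <= nowMs) {
                scanners.remove(e.getKey());
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }

    boolean isOpen(long scannerId) {
        return scanners.containsKey(scannerId);
    }
}
```

The lease is what lets the server reclaim a scanner that a client opened and then abandoned. The explicit close at the start of step 1 covers the normal case.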
Step 2. ResultScanner.next()
Fetch KeyValues from the client-side cache or, when the cache is empty, from the server
Client side: cache.poll() or next(scannerId, caching)
Server side: HRegionServer.next(scannerId, nbRows)
RegionScannerImpl.nextRaw(List outResults, int limit, String metric)
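The client-side half of step 2 is essentially "serve from the cache, refill in batches of `caching` rows". A minimal sketch of that pattern follows. It is a self-contained model, not the HBase client: `RowSource`, `CachingScannerSketch`, and `ListRowSource` are hypothetical names, and `fetch(nbRows)` merely stands in for the `next(scannerId, caching)` call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the server: returns up to nbRows rows per call.
interface RowSource {
    List<String> fetch(int nbRows);
}

// Simplified client scanner: rows come from a local cache, and the
// server is contacted only when that cache is empty.
final class CachingScannerSketch {
    private final Queue<String> cache = new ArrayDeque<>();
    private final RowSource server;
    private final int caching;      // rows requested per round trip
    private int roundTrips = 0;

    CachingScannerSketch(RowSource server, int caching) {
        this.server = server;
        this.caching = caching;
    }

    // Returns the next row, or null once the scan is exhausted.
    String next() {
        if (cache.isEmpty()) {
            roundTrips++;
            cache.addAll(server.fetch(caching));
        }
        return cache.poll();
    }

    int roundTrips() {
        return roundTrips;
    }
}

// In-memory RowSource used only for demonstration.
final class ListRowSource implements RowSource {
    private final List<String> rows;
    private int pos = 0;

    ListRowSource(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public List<String> fetch(int nbRows) {
        int end = Math.min(pos + nbRows, rows.size());
        List<String> batch = new ArrayList<>(rows.subList(pos, end));
        pos = end;
        return batch;
    }
}
```

In this sketch, scanning five rows with caching set to 2 takes four round trips: three that return rows and a final one that returns an empty batch and ends the scan. A larger caching value trades client memory for fewer round trips.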
Types of scanners
Server side: InternalScanner & KeyValueScanner
Client side: ResultScanner
Others (HFileScanner, MetaScanner)
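1. InternalScanner
A higher-level scanner abstraction used internally on the server. Implementations:
RegionScannerImpl
StoreScanner
KeyValueHeap
Its interface includes:
next(): returns a List of KeyValues
close(): closes the scanner and releases its server-side resources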
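2. KeyValueScanner
The low-level scanner used to fetch individual KeyValues. Implementations:
StoreScanner
StoreFileScanner
KeyValueHeap
NonLazyKeyValueScanner: always performs a real seek, where doRealSeek() is forward ? reseek(kv) : seek(kv). Its subclasses:
MemStoreScanner
StoreScanner
KeyValueHeap
Commonly used methods:
peek()
next()
seek(): positions the scanner at the specified KeyValue
reseek(): positions the scanner at the specified KeyValue, searching only forward from the current position
requestSeek()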
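KeyValueHeap
At the Region level it combines access to multiple stores; at the Store level it combines access to the memstore and the store files
Scanners are kept in a PriorityQueue ordered by KVScannerComparator, which compares the peeked KeyValues first and then the scanners' sequence IDs:
MemStoreScanner = Long.MAX_VALUE
StoreFileScanner = SequenceID
StoreScanner = 0
pollRealKV() searches the PriorityQueue for a scanner on which a real seek can be performed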
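The ordering rule described above (peeked key first, then sequence ID) is a k-way merge over sorted scanners. The sketch below shows the idea with plain strings as keys. `SketchScanner` and `HeapMergeSketch` are hypothetical names, and the tie-break assumes that the scanner with the larger sequence ID (the newer data) is returned first. Lazy seeking and pollRealKV() are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified scanner over an already-sorted list of keys.
final class SketchScanner {
    final String name;
    final long sequenceId;
    private final Iterator<String> it;
    private String current;

    SketchScanner(String name, long sequenceId, List<String> sortedKeys) {
        this.name = name;
        this.sequenceId = sequenceId;
        this.it = sortedKeys.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    // The key this scanner would return next, or null if exhausted.
    String peek() {
        return current;
    }

    // Returns the current key and advances.
    String next() {
        String out = current;
        current = it.hasNext() ? it.next() : null;
        return out;
    }
}

final class HeapMergeSketch {
    // Compare peeked keys first; on a tie the larger sequence ID sorts first.
    static final Comparator<SketchScanner> ORDER = (a, b) -> {
        int byKey = a.peek().compareTo(b.peek());
        if (byKey != 0) {
            return byKey;
        }
        return Long.compare(b.sequenceId, a.sequenceId);
    };

    // Merges all scanners into one sorted stream of "key@scannerName" entries.
    static List<String> merge(List<SketchScanner> scanners) {
        PriorityQueue<SketchScanner> heap = new PriorityQueue<>(ORDER);
        for (SketchScanner s : scanners) {
            if (s.peek() != null) {
                heap.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            SketchScanner top = heap.poll();
            out.add(top.next() + "@" + top.name);
            if (top.peek() != null) {
                heap.add(top); // re-insert so it is ordered by its new head
            }
        }
        return out;
    }
}
```

Because the memstore scanner reports Long.MAX_VALUE, it wins every tie against a store file. That matches the memstore holding the most recent writes.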
ScanQueryMatcher
While KeyValues are being looked up, decides whether the current KeyValue should be included and what to do next
StoreScanner.getScanners(matcher) -> StoreFileScanner
The ten MatchCode values
INCLUDE
INCLUDE_AND_SEEK_NEXT_ROW : moreRowsMayExistAfter(), getKeyForNextRow()
INCLUDE_AND_SEEK_NEXT_COL : getKeyForNextColumn()
DONE
DONE_SCAN
SEEK_NEXT_ROW : moreRowsMayExistAfter()
SEEK_NEXT_COL : getKeyForNextColumn()
SKIP : heap.next()
SEEK_NEXT_USING_HINT : getNextKeyHint()
NEXT (unused): Do not include, jump to next StoreFile or memstore (in time order)
public MatchCode match(KeyValue kv)
Check whether the KeyValue belongs to the same row
Check whether the version has expired
Check whether it has been deleted
Check whether it falls within the time range
Apply the Filters
Check with the ColumnTracker
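The order of checks in match() can be illustrated with a much reduced model. The sketch below is not the ScanQueryMatcher implementation. `CellSketch`, `MatcherSketch`, and `MatchCodeSketch` are hypothetical, and a boolean `deleted` flag stands in for the DeleteTracker. Which code each check returns is an assumption, chosen on the premise that versions within a column arrive newest first.

```java
import java.util.function.Predicate;

// The ten codes listed above, under a hypothetical enum name.
enum MatchCodeSketch {
    INCLUDE, INCLUDE_AND_SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL,
    DONE, DONE_SCAN, SEEK_NEXT_ROW, SEEK_NEXT_COL, SKIP,
    SEEK_NEXT_USING_HINT, NEXT
}

// Hypothetical, flattened stand-in for a KeyValue.
final class CellSketch {
    final String row;
    final String column;
    final long timestamp;
    final boolean deleted; // stands in for a DeleteTracker lookup

    CellSketch(String row, String column, long timestamp, boolean deleted) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.deleted = deleted;
    }
}

final class MatcherSketch {
    private final String currentRow;
    private final long oldestUnexpiredTs; // older cells count as expired
    private final long minTs;             // time range is [minTs, maxTs)
    private final long maxTs;
    private final int maxVersions;
    private final Predicate<CellSketch> filter;
    private String lastColumn = null;
    private int versionsSeen = 0;

    MatcherSketch(String currentRow, long oldestUnexpiredTs, long minTs, long maxTs,
                  int maxVersions, Predicate<CellSketch> filter) {
        this.currentRow = currentRow;
        this.oldestUnexpiredTs = oldestUnexpiredTs;
        this.minTs = minTs;
        this.maxTs = maxTs;
        this.maxVersions = maxVersions;
        this.filter = filter;
    }

    MatchCodeSketch match(CellSketch kv) {
        // 1. Same row?
        if (!kv.row.equals(currentRow)) return MatchCodeSketch.DONE;
        // 2. Expired version: everything older in this column is expired too.
        if (kv.timestamp < oldestUnexpiredTs) return MatchCodeSketch.SEEK_NEXT_COL;
        // 3. Deleted?
        if (kv.deleted) return MatchCodeSketch.SKIP;
        // 4. Time range: too new -> try the next version; too old -> next column.
        if (kv.timestamp >= maxTs) return MatchCodeSketch.SKIP;
        if (kv.timestamp < minTs) return MatchCodeSketch.SEEK_NEXT_COL;
        // 5. Filters.
        if (!filter.test(kv)) return MatchCodeSketch.SKIP;
        // 6. Column tracking: count the versions returned per column.
        if (!kv.column.equals(lastColumn)) {
            lastColumn = kv.column;
            versionsSeen = 0;
        }
        versionsSeen++;
        if (versionsSeen > maxVersions) return MatchCodeSketch.SEEK_NEXT_COL;
        return versionsSeen == maxVersions
                ? MatchCodeSketch.INCLUDE_AND_SEEK_NEXT_COL
                : MatchCodeSketch.INCLUDE;
    }
}
```

The cheaper structural checks (row, expiry, deletes, time range) run before the filters and the per-column version count, so a cell rejected early never reaches the later stages.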
ColumnTracker
ScanWildcardColumnTracker
ExplicitColumnTracker
DeleteTracker
ScanDeleteTracker
Query strategies for deletes
retainDeletesInOutput
keepDeletedCells=true: no further delete checks are performed
seePastDeleteMarkers