
Studying how PostgreSQL's hash join works

2017-08-09 15:33

Start

An article by momjian, a well-known PostgreSQL figure, gives the pseudo code for a hash join:

for (j = 0; j < length(inner); j++)
    hash_key = hash(inner[j]);
    append(hash_store[hash_key], inner[j]);
for (i = 0; i < length(outer); i++)
    hash_key = hash(outer[i]);
    for (j = 0; j < length(hash_store[hash_key]); j++)
        if (outer[i] == hash_store[hash_key][j])
            output(outer[i], inner[j]);

To make it easier to follow, here it is again with my own comments added:

// Build the hash table (held in memory) from the inner table
for (j = 0; j < length(inner); j++)
{
    hash_key = hash(inner[j]);
    append(hash_store[hash_key], inner[j]);
}

// Scan every element of the outer table
for (i = 0; i < length(outer); i++)
{
    // Hash the current outer element to get its hash_key
    hash_key = hash(outer[i]);

    // Probe the hash table with the hash_key just computed (assuming the table has this key).
    // The loop bound is length(hash_store[hash_key]) because, once the hash table is built,
    // a single key may hold several elements.
    // For example: hash_key 100 holds a, c, e; hash_key 200 holds d; hash_key 300 holds f.
    // So the loop below walks only the elements that share the hash_key computed above.

    for (j = 0; j < length(hash_store[hash_key]); j++)
    {
        // If a match is found, output one result row
        if (outer[i] == hash_store[hash_key][j])
            // j indexes the bucket, not inner, so emit the bucket entry
            // (the quoted pseudo code writes inner[j] here)
            output(outer[i], hash_store[hash_key][j]);
    }
}
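The build and probe phases above can be sketched as runnable Python. This is only an illustration of the pseudo code, not PostgreSQL's implementation: the function name hash_join, the n_buckets parameter and the sample rows are all assumptions made up for this example. A small bucket count is used on purpose so that different keys land in the same bucket, which is why the equality check inside the probe loop is needed.

```python
def hash_join(outer, inner, outer_key, inner_key, n_buckets=8):
    """Equi-join two lists of dict rows using an in-memory hash table (illustrative only)."""
    # Build phase: place every inner row in the bucket chosen by its join key.
    hash_store = [[] for _ in range(n_buckets)]
    for inner_row in inner:
        hash_key = hash(inner_row[inner_key]) % n_buckets
        hash_store[hash_key].append(inner_row)

    # Probe phase: hash each outer row, then compare it with every row in that bucket.
    result = []
    for outer_row in outer:
        hash_key = hash(outer_row[outer_key]) % n_buckets
        for inner_row in hash_store[hash_key]:
            # Different keys can share a bucket, so the join keys must be compared.
            if outer_row[outer_key] == inner_row[inner_key]:
                result.append((outer_row, inner_row))
    return result


# Hypothetical sample rows; deptno 1 and 9 share a bucket when n_buckets is 8.
departments = [
    {"deptno": 1, "deptname": "sales"},
    {"deptno": 9, "deptname": "audit"},
    {"deptno": 2, "deptname": "dev"},
]
employees = [
    {"name": "ann", "deptno": 1},
    {"name": "bob", "deptno": 9},
    {"name": "cat", "deptno": 3},
]

pairs = [(e["name"], d["deptname"])
         for e, d in hash_join(employees, departments, "deptno", "deptno")]
print(pairs)
```

Here "ann" and "bob" probe the same bucket but each is paired only with its own department, and "cat" finds an empty bucket and produces no row.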

[Author: 技术者高健 @ 博客园, mail: luckyjackgao@gmail.com]

Trying it out:

postgres=# \d employee
          Table "public.employee"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | integer               |
 name   | character varying(20) |
 deptno | integer               |
 age    | integer               |
Indexes:
    "idx_id_dept" btree (id, deptno)

postgres=# \d deptment
          Table "public.deptment"
  Column  |         Type          | Modifiers
----------+-----------------------+-----------
 deptno   | integer               |
 deptname | character varying(20) |

postgres=#

postgres=# select count(*) from employee;
 count
-------
  1000
(1 row)

postgres=# select count(*) from deptment;
 count
-------
   102
(1 row)

postgres=#

Execution plan:

postgres=# explain select a.name, b.deptname from employee a, deptment b where a.deptno=b.deptno;
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=3.29..34.05 rows=1000 width=14)
   Hash Cond: (a.deptno = b.deptno)
   ->  Seq Scan on employee a  (cost=0.00..17.00 rows=1000 width=10)
   ->  Hash  (cost=2.02..2.02 rows=102 width=12)
         ->  Seq Scan on deptment b  (cost=0.00..2.02 rows=102 width=12)
(5 rows)

postgres=#
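Reading the plan against the pseudo code: the Hash node sits over the scan of deptment (102 rows), so deptment plays the role of inner and is loaded into the hash table, while employee (1000 rows) is the outer side that probes it. The sketch below compares that choice with the opposite one. It is a rough illustration only: hash_join_stats is a hypothetical helper, the deptno values are synthetic stand-ins for the real table contents, and a plain Python dict replaces the hash table.

```python
def hash_join_stats(outer, inner):
    """Join two lists of join-key values; return (matches, rows held in the hash table)."""
    # Build phase: every inner key is stored in the hash table.
    hash_store = {}
    for key in inner:
        hash_store.setdefault(key, []).append(key)

    # Probe phase: each outer key matches every entry stored under the same key.
    matches = 0
    for key in outer:
        matches += len(hash_store.get(key, []))

    rows_in_memory = sum(len(bucket) for bucket in hash_store.values())
    return matches, rows_in_memory


# Synthetic deptno columns (assumed values, not the real table data).
deptment_deptno = list(range(102))                 # 102 distinct departments
employee_deptno = [i % 102 for i in range(1000)]   # 1000 employees spread over them

as_planned = hash_join_stats(outer=employee_deptno, inner=deptment_deptno)
swapped = hash_join_stats(outer=deptment_deptno, inner=employee_deptno)
print("hash deptment:", as_planned)
print("hash employee:", swapped)
```

Both orders produce the same number of joined rows, but hashing deptment keeps only 102 rows in memory instead of 1000, which is consistent with the planner putting the smaller table under the Hash node.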
