CitusDB, PostgreSQLs Use Hadoop Distribute Query - 4 : Query Trace 2
2015-10-14 22:45
197 查看
![](http://postgres.cn/images/pg2015_conf.png)
Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:
Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
Postgres-XL的项目发起人Mason Sharp
pgpool的作者石井达夫(Tatsuo Ishii)
PG-Strom的作者海外浩平(Kaigai Kohei)
Greenplum研发总监姚延栋
周正中(德哥), PostgreSQL中国用户会创始人之一
汪洋,平安科技数据库技术部经理
……
2015年度PG大象会报名地址:http://postgres2015.eventdove.com/ PostgreSQL中国社区: http://postgres.cn/ PostgreSQL专业1群: 3336901(已满) PostgreSQL专业2群: 100910388 PostgreSQL专业3群: 150657323 |
![]() |
http://blog.163.com/digoal@126/blog/static/163877040201322205436514/
4. # 大表和小表的关联查询
大表指的是shard数目大于等于large_table_shard_count的表
CitusDB 2.0还不允许大表和大表的关联查询.
小表指的是shard数目小于large_table_shard_count的表, 小表和大表JOIN时, 小表的数据会先通过worker_fetch_regular_table拷贝到各个相关的worker节点.
然后在worker本地进行JOIN, 完了把结果COPY到master节点, master节点再对COPY过来的临时表进行结果合并.
postgres=# show large_table_shard_count; large_table_shard_count ------------------------- 4(1 row)postgres=# select * from pg_dist_shard; logicalrelid | shardid | shardstorage | shardalias | shardminvalue | shardmaxvalue --------------+----------------------+--------------+------------+---------------+--------------- 16403 | -1221512698612183530 | f | | 1999-01-01 | 1999-05-13 16403 | 2698407172708592541 | f | | 1999-05-13 | 1999-09-11 16403 | 631600631725577683 | f | | 1998-09-06 | 1998-12-31 16403 | -2048097437225231483 | f | | 1999-09-11 | 1999-12-31 16403 | 3598423008504059948 | f | | 1970-12-30 | 1998-09-06(5 rows)
# 新建一个小表
postgres=# CREATE TABLE customer_reviews_small ( customer_id TEXT not null, review_date DATE not null, review_rating INTEGER not null, review_votes INTEGER, review_helpful_votes INTEGER, product_id CHAR(10) not null, product_title TEXT not null, product_sales_rank BIGINT, product_group TEXT, product_category TEXT, product_subcategory TEXT, similar_product_ids CHAR(10)[])DISTRIBUTE BY APPEND (review_date);postgres=# CREATE INDEX customer_id_index ON customer_reviews_small (customer_id);CREATE INDEX
# 从任一worker节点连到master节点, 导入数据. (注意文件要放在worker节点, 这个和COPY是不一样的, COPY要求文件是和数据库服务器在一起的, 以为它调用的是服务端的file接口)
[root@db-172-16-3-150 Documentation]# scp /data06/customer_reviews_1998.csv 172.16.3.33:/home/citusdb/ citusdb@db-172-16-3-33-> psql -h 172.16.3.150 -p 9900 -U digoal postgrespsql (9.2.1)Type "help" for help.postgres=# \STAGE customer_reviews_small FROM '/home/citusdb/customer_reviews_1998.csv' (FORMAT CSV)postgres=# select * from pg_dist_shard; logicalrelid | shardid | shardstorage | shardalias | shardminvalue | shardmaxvalue --------------+----------------------+--------------+------------+---------------+--------------- 16403 | -1221512698612183530 | f | | 1999-01-01 | 1999-05-13 16403 | 2698407172708592541 | f | | 1999-05-13 | 1999-09-11 16403 | 631600631725577683 | f | | 1998-09-06 | 1998-12-31 16403 | -2048097437225231483 | f | | 1999-09-11 | 1999-12-31 16403 | 3598423008504059948 | f | | 1970-12-30 | 1998-09-06 16604 | 102017 | t | | 1970-12-30 | 1998-12-31(6 rows)
现在有两个distributed表了, 一个有5个shard, 另一个只有1个shard.
postgres=# select * from customer_reviews t1, customer_reviews_small t2 where t1.customer_id=t2.customer_id limit 1; customer_id | review_date | review_rating | review_votes | review_helpful_votes | product_id | product_title | product_sales_rank | product_group | product_category | product_subcategory | similar_product_ids | customer_id | review_date | review_rating | review_votes | review_helpful_votes | product_id | product_title | product_sales_rank | product_group | product_category | product_subcategory | similar_product_ids ---------------+-------------+---------------+--------------+----------------------+------------+-------------------------+--------------------+---------------+------------------+---------------------+----------------------------------------------------------+---------------+-------------+---------------+--------------+----------------------+------------+-------------------------+--------------------+---------------+------------------+---------------------+---------------------------------------------------------- AQV87I9Y4CIQF | 1998-09-06 | 5 | 56 | 55 | 1561580368 | Building for a Lifetime | 215662 | Book | Home & Garden | Home Design | {1589230612,1881955656,0140258094,1931498113,0070171513} | AQV87I9Y4CIQF | 1998-09-06 | 5 | 56 | 55 | 1561580368 | Building for a Lifetime | 215662 | Book | Home & Garden | Home Design | {1589230612,1881955656,0140258094,1931498113,0070171513}(1 row)
主节点日志
2013-03-22 15:09:19.538 CST,"digoal","postgres",18621,"172.16.3.33:41011",514c0239.48bd,24,"idle",2013-03-22 15:03:21 CST,3/401,0,LOG,00000,"statement: select * from customer_reviews t1, customer_reviews_small t2 where t1.customer_id=t2.customer_id limit 1;",,,,,,,,"exec_simple_query, postgres.c:969","psql"
# 33节点日志, 其中3个hdfs中的外部表通过worker_fetch_foreign_file取到数据库中, 小表customer_reviews_small_102017这通过worker_fetch_regular_table获取到数据库中, 注意日志中记录了3个会话的信息, hdfs是三个表, 获取三次是可以理解的. 但是小表其实是在worker节点里面的, 从pg_dist_shard_placement是能够验证的, 为什么也fetch了3次呢? 原因是3个会话, 并行操作. 从执行时间来看分别消耗了300毫秒.
不过函数的源码看不到, 不知道有没有判断不需要COPY的情况. 否则小表的处理开销还是非常大的.
2013-03-22 15:13:42.079 CST,"postgres","postgres",18245,"172.16.3.150:62163",514c04a5.4745,3,"idle",2013-03-22 15:13:41 CST,2/701,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_631600631725577683', 34190254, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.080 CST,"postgres","postgres",18246,"172.16.3.150:62165",514c04a5.4746,3,"idle",2013-03-22 15:13:41 CST,3/162,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_3598423008504059948', 67108864, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.080 CST,"postgres","postgres",18247,"172.16.3.150:62167",514c04a5.4747,3,"idle",2013-03-22 15:13:41 CST,4/74,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_17225231375097368086', 67108864, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.379 CST,"postgres","postgres",18245,"172.16.3.150:62163",514c04a5.4745,4,"idle",2013-03-22 15:13:41 CST,2/702,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.379 CST,"postgres","postgres",18246,"172.16.3.150:62165",514c04a5.4746,4,"idle",2013-03-22 15:13:41 CST,3/163,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.379 CST,"postgres","postgres",18247,"172.16.3.150:62167",514c04a5.4747,4,"idle",2013-03-22 15:13:41 CST,4/75,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.679 CST,"postgres","postgres",18245,"172.16.3.150:62163",514c04a5.4745,5,"idle",2013-03-22 15:13:41 CST,2/703,0,LOG,00000,"statement: COPY (SELECT t1.customer_id, t1.review_date, t1.review_rating, t1.review_votes, t1.review_helpful_votes, t1.product_id, t1.product_title, t1.product_sales_rank, t1.product_group, t1.product_category, t1.product_subcategory, t1.similar_product_ids, t2.customer_id, t2.review_date, t2.review_rating, t2.review_votes, t2.review_helpful_votes, t2.product_id, t2.product_title, t2.product_sales_rank, t2.product_group, t2.product_category, t2.product_subcategory, t2.similar_product_ids FROM customer_reviews_631600631725577683 t1, customer_reviews_small_102017 t2 WHERE (t1.customer_id = t2.customer_id) LIMIT 1::bigint) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.679 CST,"postgres","postgres",18247,"172.16.3.150:62167",514c04a5.4747,5,"idle",2013-03-22 15:13:41 CST,4/76,0,LOG,00000,"statement: COPY (SELECT t1.customer_id, t1.review_date, t1.review_rating, t1.review_votes, t1.review_helpful_votes, t1.product_id, t1.product_title, t1.product_sales_rank, t1.product_group, t1.product_category, t1.product_subcategory, t1.similar_product_ids, t2.customer_id, t2.review_date, t2.review_rating, t2.review_votes, t2.review_helpful_votes, t2.product_id, t2.product_title, t2.product_sales_rank, t2.product_group, t2.product_category, t2.product_subcategory, t2.similar_product_ids FROM customer_reviews_17225231375097368086 t1, customer_reviews_small_102017 t2 WHERE (t1.customer_id = t2.customer_id) LIMIT 1::bigint) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.680 CST,"postgres","postgres",18246,"172.16.3.150:62165",514c04a5.4746,5,"idle",2013-03-22 15:13:41 CST,3/164,0,LOG,00000,"statement: COPY (SELECT t1.customer_id, t1.review_date, t1.review_rating, t1.review_votes, t1.review_helpful_votes, t1.product_id, t1.product_title, t1.product_sales_rank, t1.product_group, t1.product_category, t1.product_subcategory, t1.similar_product_ids, t2.customer_id, t2.review_date, t2.review_rating, t2.review_votes, t2.review_helpful_votes, t2.product_id, t2.product_title, t2.product_sales_rank, t2.product_group, t2.product_category, t2.product_subcategory, t2.similar_product_ids FROM customer_reviews_3598423008504059948 t1, customer_reviews_small_102017 t2 WHERE (t1.customer_id = t2.customer_id) LIMIT 1::bigint) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969",""
# 39节点日志
2013-03-22 15:13:42.175 CST,"postgres","postgres",9582,"172.16.3.150:61134",514c04a5.256e,3,"idle",2013-03-22 15:13:41 CST,2/751,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_16398646636484320133', 64029428, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.175 CST,"postgres","postgres",9581,"172.16.3.150:61132",514c04a5.256d,3,"idle",2013-03-22 15:13:41 CST,3/92,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_2698407172708592541', 67108864, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.475 CST,"postgres","postgres",9581,"172.16.3.150:61132",514c04a5.256d,4,"idle",2013-03-22 15:13:41 CST,3/93,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.475 CST,"postgres","postgres",9582,"172.16.3.150:61134",514c04a5.256e,4,"idle",2013-03-22 15:13:41 CST,2/752,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.775 CST,"postgres","postgres",9581,"172.16.3.150:61132",514c04a5.256d,5,"idle",2013-03-22 15:13:41 CST,3/94,0,LOG,00000,"statement: COPY (SELECT t1.customer_id, t1.review_date, t1.review_rating, t1.review_votes, t1.review_helpful_votes, t1.product_id, t1.product_title, t1.product_sales_rank, t1.product_group, t1.product_category, t1.product_subcategory, t1.similar_product_ids, t2.customer_id, t2.review_date, t2.review_rating, t2.review_votes, t2.review_helpful_votes, t2.product_id, t2.product_title, t2.product_sales_rank, t2.product_group, t2.product_category, t2.product_subcategory, t2.similar_product_ids FROM customer_reviews_2698407172708592541 t1, customer_reviews_small_102017 t2 WHERE (t1.customer_id = t2.customer_id) LIMIT 1::bigint) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969","" 2013-03-22 15:13:42.775 CST,"postgres","postgres",9582,"172.16.3.150:61134",514c04a5.256e,5,"idle",2013-03-22 15:13:41 CST,2/753,0,LOG,00000,"statement: COPY (SELECT t1.customer_id, t1.review_date, t1.review_rating, t1.review_votes, t1.review_helpful_votes, t1.product_id, t1.product_title, t1.product_sales_rank, t1.product_group, t1.product_category, t1.product_subcategory, t1.similar_product_ids, t2.customer_id, t2.review_date, t2.review_rating, t2.review_votes, t2.review_helpful_votes, t2.product_id, t2.product_title, t2.product_sales_rank, t2.product_group, t2.product_category, t2.product_subcategory, t2.similar_product_ids FROM customer_reviews_16398646636484320133 t1, customer_reviews_small_102017 t2 WHERE (t1.customer_id = t2.customer_id) LIMIT 1::bigint) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969",""
# 40节点日志
无
# 拷贝数据的函数
postgres=# \df+ worker_fetch_regular_table List of functions Schema | Name | Result data type | Argument data types | Type | Volatility | Owner | Language | Source code | Description ------------+----------------------------+------------------+---------------------------------+--------+------------+----------+----------+----------------------------+----------------------------------------- pg_catalog | worker_fetch_regular_table | void | text, bigint, text[], integer[] | normal | volatile | postgres | internal | worker_fetch_regular_table | fetch PostgreSQL table from remote node(1 row)postgres=# \df+ worker_fetch_foreign_file List of functions Schema | Name | Result data type | Argument data types | Type | Volatility | Owner | Lang uage | Source code | Description ------------+---------------------------+------------------+---------------------------------+--------+------------+----------+----- -----+---------------------------+---------------------------------------------------- pg_catalog | worker_fetch_foreign_file | void | text, bigint, text[], integer[] | normal | volatile | postgres | inte rnal | worker_fetch_foreign_file | fetch foreign file from remote node and apply file (1 row)
5. # 小表和小表的关联查询
每次查询时将表的信息通过fetch拷贝到worker本地的临时表, 然后JOIN. 注意是每次查询, 单事物多次查询也会多次调用fetch.
postgres=# set large_table_shard_count=6;SETpostgres=# select * from customer_reviews t1, customer_reviews_small t2 where t1.customer_id=t2.customer_id limit 1; customer_id | review_date | review_rating | review_votes | review_helpful_votes | product_id | product_title | product_sales_rank | product_group | product_category | product_subcategory | similar_product_ids | customer_id | review_date | review_rating | review_votes | review_helpful_votes | product_id | product_title | product_sales_rank | product_group | product_category | product_subcategory | similar_product_ids ---------------+-------------+---------------+--------------+----------------------+------------+-------------------------+--------------------+---------------+------------------+---------------------+----------------------------------------------------------+---------------+-------------+---------------+--------------+----------------------+------------+-------------------------+--------------------+---------------+------------------+---------------------+---------------------------------------------------------- AQV87I9Y4CIQF | 1998-09-06 | 5 | 56 | 55 | 1561580368 | Building for a Lifetime | 215662 | Book | Home & Garden | Home Design | {1589230612,1881955656,0140258094,1931498113,0070171513} | AQV87I9Y4CIQF | 1998-09-06 | 5 | 56 | 55 | 1561580368 | Building for a Lifetime | 215662 | Book | Home & Garden | Home Design | {1589230612,1881955656,0140258094,1931498113,0070171513}(1 row)
33 日志
2013-03-22 15:18:36.843 CST,"postgres","postgres",18271,"172.16.3.150:33076",514c05cc.475f,3,"idle",2013-03-22 15:18:36 CST,2/719,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_631600631725577683', 34190254, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 15:18:36.843 CST,"postgres","postgres",18272,"172.16.3.150:33078",514c05cc.4760,3,"idle",2013-03-22 15:18:36 CST,3/170,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_3598423008504059948', 67108864, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 15:18:36.843 CST,"postgres","postgres",18273,"172.16.3.150:33080",514c05cc.4761,3,"idle",2013-03-22 15:18:36 CST,4/82,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_17225231375097368086', 67108864, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 15:18:37.142 CST,"postgres","postgres",18271,"172.16.3.150:33076",514c05cc.475f,4,"idle",2013-03-22 15:18:36 CST,2/720,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 15:18:37.142 CST,"postgres","postgres",18273,"172.16.3.150:33080",514c05cc.4761,4,"idle",2013-03-22 15:18:36 CST,4/83,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 15:18:37.143 CST,"postgres","postgres",18272,"172.16.3.150:33078",514c05cc.4760,4,"idle",2013-03-22 15:18:36 CST,3/171,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""
40 日志
2013-03-22 15:18:36.926 CST,"postgres","postgres",9601,"172.16.3.150:59324",514c05cc.2581,3,"idle",2013-03-22 15:18:36 CST,2/769,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_2698407172708592541', 67108864, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 15:18:36.926 CST,"postgres","postgres",9602,"172.16.3.150:59326",514c05cc.2582,3,"idle",2013-03-22 15:18:36 CST,3/100,0,LOG,00000,"statement: SELECT worker_fetch_foreign_file('customer_reviews_16398646636484320133', 64029428, '{db-172-16-3-40.sky-mobi.com,db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 15:18:37.226 CST,"postgres","postgres",9601,"172.16.3.150:59324",514c05cc.2581,4,"idle",2013-03-22 15:18:36 CST,2/770,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 15:18:37.226 CST,"postgres","postgres",9602,"172.16.3.150:59326",514c05cc.2582,4,"idle",2013-03-22 15:18:36 CST,3/101,0,LOG,00000,"statement: SELECT worker_fetch_regular_table('customer_reviews_small_102017', 145932288, '{db-172-16-3-39.sky-mobi.com,db-172-16-3-33.sky-mobi.com}', '{9900,9900}')",,,,,,,,"exec_simple_query, postgres.c:969",""
6. # 查询中使用自定义函数
postgres=# create or replace function my_lower(i text) returns text as $$declarebegin return lower(i);end;$$ language plpgsql;CREATE FUNCTION postgres=# select avg(review_rating),my_lower(product_group) from customer_reviews group by my_lower(product_group); avg | my_lower --------------------+---------- 2.0000000000000000 | ce 4.4846521282116104 | music 4.3278330075790599 | video 5.0000000000000000 | toy 4.0000000000000000 | software 4.2944774472673496 | book 4.2773476749740566 | dvd(7 rows)
worker节点日志截取 :
2013-03-22 14:43:30.102 CST,"postgres","postgres",18082,"172.16.3.150:28810",514bfd91.46a2,3,"idle",2013-03-22 14:43:29 CST,2/625,0,LOG,00000,"statement: COPY (SELECT sum(review_rating) AS avg, count(*) AS avg, my_lower(product_group) AS my_lower FROM customer_reviews_631600631725577683 customer_reviews WHERE true GROUP BY my_lower(product_group)) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 14:43:30.104 CST,"postgres","postgres",18084,"172.16.3.150:28814",514bfd91.46a4,3,"idle",2013-03-22 14:43:29 CST,4/64,0,LOG,00000,"statement: COPY (SELECT sum(review_rating) AS avg, count(*) AS avg, my_lower(product_group) AS my_lower FROM customer_reviews_17225231375097368086 customer_reviews WHERE true GROUP BY my_lower(product_group)) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 14:43:30.104 CST,"postgres","postgres",18083,"172.16.3.150:28812",514bfd91.46a3,3,"idle",2013-03-22 14:43:29 CST,3/152,0,LOG,00000,"statement: COPY (SELECT sum(review_rating) AS avg, count(*) AS avg, my_lower(product_group) AS my_lower FROM customer_reviews_3598423008504059948 customer_reviews WHERE true GROUP BY my_lower(product_group)) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 14:43:30.084 CST,"postgres","postgres",9463,"172.16.3.150:55253",514bfd91.24f7,3,"idle",2013-03-22 14:43:29 CST,2/677,0,LOG,00000,"statement: COPY (SELECT sum(review_rating) AS avg, count(*) AS avg, my_lower(product_group) AS my_lower FROM customer_reviews_2698407172708592541 customer_reviews WHERE true GROUP BY my_lower(product_group)) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969",""2013-03-22 14:43:30.084 CST,"postgres","postgres",9464,"172.16.3.150:55255",514bfd91.24f8,3,"idle",2013-03-22 14:43:29 CST,3/82,0,LOG,00000,"statement: COPY (SELECT sum(review_rating) AS avg, count(*) AS avg, my_lower(product_group) AS my_lower FROM customer_reviews_16398646636484320133 customer_reviews WHERE true GROUP BY my_lower(product_group)) TO STDOUT",,,,,,,,"exec_simple_query, postgres.c:969",""
7. # 统计信息
# 外部表的统计信息收集需要在worker节点进行, 主节点执行会报错.
postgres=# analyze verbose customer_reviews;ERROR: 58P01: could not stat file "": No such file or directoryLOCATION: fileAnalyzeForeignTable, file_fdw.c:781
# worker节点
postgres=# analyze verbose customer_reviews_16398646636484320133;INFO: analyzing "public.customer_reviews_16398646636484320133"INFO: "customer_reviews_16398646636484320133": file contains 377161 rows; 30000 rows in sampleANALYZE
[参考]
1. http://blog.163.com/digoal@126/blog/static/1638770402013219840831/
2. http://blog.163.com/digoal@126/blog/static/16387704020132192011747/
3. http://blog.163.com/digoal@126/blog/static/16387704020132189835346/
4. http://blog.163.com/digoal@126/blog/static/1638770402013221475935/
5. DISTRIBUTED参数详细介绍 :
postgres=# select * from pg_settings where name ='task_assignment_policy';-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------name | task_assignment_policysetting | greedyunit | category | Distributed Databaseshort_desc | Sets the policy to use when assigning tasks to worker nodes.extra_desc | The master node assigns tasks to worker nodes based on shard locations. This configuration value specifies the policy to use when making these assignments. The greedy policy aims to evenly distribute tasks across worker nodes; and the round-robin policy assigns tasks to worker nodes in a round-robin fashion.context | uservartype | enumsource | defaultmin_val | max_val | enumvals | {greedy,round-robin}boot_val | greedyreset_val | greedysourcefile | sourceline | postgres=# select * from pg_settings where category = 'Distributed Database';-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | large_table_shard_count setting | 4 unit | category | Distributed Database short_desc | The shard count threshold over which a table is considered large. extra_desc | A distributed table is considered to be large if it has more shards than the value specified here. This largeness crite ria is then used in picking a table join order during distributed query planning. context | user vartype | integer source | default min_val | 1 max_val | 10000 enumvals | boot_val | 4 reset_val | 4 sourcefile | sourceline | -[ RECORD 2 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | limit_clause_row_fetch_count setting | -1 unit | category | Distributed Database short_desc | Number of rows to fetch per task for limit clause optimization extra_desc | Select queries get partitioned and executed as smaller tasks. In some cases, select queries with limit clauses may need to fetch all rows from each task to generate results. In those cases, and where an approximation would produce meaningful results, this configuration value sets the number of rows to fetch from each task. context | user vartype | integer source | default min_val | -1 max_val | 2147483647 enumvals | boot_val | -1 reset_val | -1 sourcefile | sourceline | -[ RECORD 3 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | max_running_tasks_per_node setting | 8 unit | category | Distributed Database short_desc | Sets the maximum number of tasks to run concurrently per node. extra_desc | The task tracker process schedules and executes the tasks assigned to it as appropriate. This configuration value sets the maximum number of tasks to execute concurrently on one node at any given time. context | sighup vartype | integer source | default min_val | 1 max_val | 2147483647 enumvals | boot_val | 8 reset_val | 8 sourcefile | sourceline | -[ RECORD 4 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | max_tracked_tasks_per_node setting | 256 unit | category | Distributed Database short_desc | Sets the maximum number of tracked tasks per node. extra_desc | The task tracker processes keeps all assigned tasks in a shared hash table, and schedules and executes these tasks as a ppropriate. This configuration value limits the size of the hash table, and therefore the maximum number of tasks that can be tracke d at any given time. context | postmaster vartype | integer source | default min_val | 8 max_val | 2147483647 enumvals | boot_val | 256 reset_val | 256 sourcefile | sourceline | -[ RECORD 5 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | max_worker_nodes_tracked setting | 2048 unit | category | Distributed Database short_desc | Sets the maximum number of worker nodes that are tracked. extra_desc | Worker nodes' network locations, their membership and health status are tracked in a shared hash table on the master no de. This configuration value limits the size of the hash table, and consequently the maximum number of worker nodes that can be trac ked. context | postmaster vartype | integer source | default min_val | 8 max_val | 2147483647 enumvals | boot_val | 2048 reset_val | 2048 sourcefile | sourceline | -[ RECORD 6 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | partition_buffer_size setting | 8192 unit | kB category | Distributed Database short_desc | Sets the buffer size to use for partition operations. extra_desc | Worker nodes allow for table data to be repartitioned into multiple text files, much like Hadoop's Map command. This co nfiguration value sets the buffer size to use per partition operation. After the buffer fills up, we flush the repartitioned data in to text files. context | user vartype | integer source | default min_val | 0 max_val | 2147483647 enumvals | boot_val | 8192 reset_val | 8192 sourcefile | sourceline | -[ RECORD 7 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | remote_task_check_interval setting | 100 unit | ms category | Distributed Database short_desc | Sets the frequency at which we check job statuses. extra_desc | The master node assigns tasks to workers nodes, and then regularly checks with them about each task's progress. This co nfiguration value sets the time interval between two consequent checks. context | sighup vartype | integer source | default min_val | 10 max_val | 100000 enumvals | boot_val | 100 reset_val | 100 sourcefile | sourceline | -[ RECORD 8 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | shard_max_size setting | 2097152 unit | kB category | Distributed Database short_desc | Sets the maximum size a shard will grow before it gets split. extra_desc | Shards store table and file data. When the source file's size for one shard exceeds this configuration value, the datab ase ensures that either a new shard gets created, or the current one gets split. Note that shards read this configuration value at s harded table creation time, and later reuse the initially read value. context | user vartype | integer source | default min_val | 256 max_val | 2147483647 enumvals | boot_val | 2097152 reset_val | 2097152 sourcefile | sourceline | -[ RECORD 9 ]----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | shard_replication_factor setting | 2 unit | category | Distributed Database short_desc | Sets the replication factor for shards. extra_desc | Shards are replicated across nodes according to this replication factor. Note that shards read this configuration value at sharded table creation time, and later reuse the initially read value. context | user vartype | integer source | default min_val | 1 max_val | 100 enumvals | boot_val | 2 reset_val | 2 sourcefile | sourceline | -[ RECORD 10 ]---------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | task_assignment_policy setting | greedy unit | category | Distributed Database short_desc | Sets the policy to use when assigning tasks to worker nodes. extra_desc | The master node assigns tasks to worker nodes based on shard locations. This configuration value specifies the policy t o use when making these assignments. The greedy policy aims to evenly distribute tasks across worker nodes; and the round-robin poli cy assigns tasks to worker nodes in a round-robin fashion. context | user vartype | enum source | default min_val | max_val | enumvals | {greedy,round-robin} boot_val | greedy reset_val | greedy sourcefile | sourceline | -[ RECORD 11 ]---------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | task_tracker_active setting | off unit | category | Distributed Database short_desc | Starts up the task tracker process. extra_desc | The task tracker background process runs on every worker node, and manages the execution of tasks assigned to it. This configuration entry activates the task tracker. context | postmaster vartype | bool source | default min_val | max_val | enumvals | boot_val | off reset_val | off sourcefile | sourceline | -[ RECORD 12 ]---------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------- name | task_tracker_delay setting | 200 unit | ms category | Distributed Database short_desc | Task tracker sleep time between task management rounds. extra_desc | The task tracker process wakes up regularly, walks over all tasks assigned to it, and schedules and executes these task s. Then, the task tracker sleeps for a time period before walking over these tasks again. This configuration value determines the le ngth of that sleeping period. context | sighup vartype | integer source | default min_val | 10 max_val | 100000 enumvals | boot_val | 200 reset_val | 200 sourcefile | sourceline |
相关文章推荐
- Android之获取手机上的图片和视频缩略图thumbnails
- 详解HDFS Short Circuit Local Reads
- 数据库链接字符串查询网站
- Hadoop_2.1.0 MapReduce序列图
- 使用Hadoop搭建现代电信企业架构
- 分布式版本管理git入门指南使用资料汇总及文章推荐
- 单机版搭建Hadoop环境图文教程详解
- DB2实例管理
- DB2实例管理
- 保障MySQL数据安全的14个最佳方法
- mysql问答汇集
- 创建一个空的IBM DB2 ECO数据库的方法
- Access 2000 数据库 80 万记录通用快速分页类
- 开通一个数据库失败的原因的和解决办法
- 一个简单的asp数据库操作类
- CentOS下DB2数据库安装过程详解
- PostgreSQL新手入门教程
- EasyASP v1.5发布(包含数据库操作类,原clsDbCtrl.asp)第1/2页