Query Performance Optimization in Vertica
2013-07-23 15:51
Don't fetch data you don't need, whether extra rows or extra columns. Retrieving more rows or columns than necessary increases network, I/O, memory, and CPU overhead on the server; in a columnar database such as Vertica, every extra column in the SELECT list is one more column that has to be read from disk. For example, if you need only a few columns, use:
AT EPOCH LATEST
SELECT fi.name, fi.InvestmentKey, id.VendorId, id.CUSIP, id.ISIN, id.DomicileCountryId, id.CurrencyId
FROM dbo.FixedIncome fi
INNER JOIN dbo.InvestmentIdDimension id ON id.InvestmentKey = fi.InvestmentKey
WHERE id.InvestmentId = 'B000023K1X';
But do not use:
AT EPOCH LATEST
SELECT fi.*, id.*
FROM dbo.FixedIncome fi
INNER JOIN dbo.InvestmentIdDimension id ON id.InvestmentKey = fi.InvestmentKey
WHERE id.InvestmentId = 'B000023K1X';
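To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Vertica. The tables are tiny hypothetical versions of FixedIncome and InvestmentIdDimension (without the dbo schema, and with made-up extra columns Notes and Description), and AT EPOCH LATEST is omitted because it is Vertica-specific syntax. The sketch only counts how many columns each query ships back.

```python
import sqlite3

# Hypothetical miniature tables; SQLite stands in for Vertica here,
# so the dbo schema and AT EPOCH LATEST are left out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FixedIncome (
    InvestmentKey INTEGER PRIMARY KEY,
    name TEXT, CompanyId TEXT, Notes TEXT
);
CREATE TABLE InvestmentIdDimension (
    InvestmentKey INTEGER, InvestmentId TEXT, VendorId INTEGER,
    CUSIP TEXT, ISIN TEXT, DomicileCountryId TEXT, CurrencyId TEXT,
    Description TEXT
);
INSERT INTO FixedIncome VALUES (1, 'Bond A', '0C00000BDL', 'long free-text column');
INSERT INTO InvestmentIdDimension VALUES
    (1, 'B000023K1X', 101, 'CUSIP1', 'ISIN1', 'USA', 'USD', 'another wide column');
""")

narrow_sql = """
SELECT fi.name, fi.InvestmentKey, id.VendorId, id.CUSIP, id.ISIN,
       id.DomicileCountryId, id.CurrencyId
FROM FixedIncome fi
INNER JOIN InvestmentIdDimension id ON id.InvestmentKey = fi.InvestmentKey
WHERE id.InvestmentId = ?
"""
wide_sql = """
SELECT fi.*, id.*
FROM FixedIncome fi
INNER JOIN InvestmentIdDimension id ON id.InvestmentKey = fi.InvestmentKey
WHERE id.InvestmentId = ?
"""

def fetch(sql):
    # Run the query and return (column names, rows) as the client receives them.
    cur = conn.execute(sql, ("B000023K1X",))
    rows = cur.fetchall()
    return [d[0] for d in cur.description], rows

narrow_cols, narrow_rows = fetch(narrow_sql)
wide_cols, wide_rows = fetch(wide_sql)
print(len(narrow_cols), len(wide_cols))  # 7 12
```

Both queries find the same row, but SELECT * also drags along every column nobody asked for, including the wide Notes and Description text.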
To avoid blocking Vertica's write processes, we always add "AT EPOCH LATEST" to queries, which turns them into snapshot reads. For example, use:
AT EPOCH LATEST SELECT ... FROM ...;
But do not use:
SELECT ... FROM ...;
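If every read should be a snapshot read, the prefix can be enforced in application code instead of by hand. The helper below is a hypothetical sketch (the name as_snapshot_read is not part of any Vertica driver); it only prefixes statements that start with SELECT and leaves queries that already begin with AT EPOCH or AT TIME unchanged.

```python
import re

# Plain SELECT statements (leading whitespace allowed, case-insensitive).
_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
# Queries that already carry a historical clause such as AT EPOCH LATEST.
_HISTORICAL = re.compile(r"^\s*AT\s+(EPOCH|TIME)\b", re.IGNORECASE)

def as_snapshot_read(sql: str) -> str:
    """Hypothetical helper: prefix plain SELECTs with AT EPOCH LATEST."""
    if _HISTORICAL.match(sql):
        return sql  # already a snapshot/historical query
    if _SELECT.match(sql):
        return "AT EPOCH LATEST " + sql.lstrip()
    return sql  # writes and other statements pass through unchanged

print(as_snapshot_read("SELECT name FROM dbo.FixedIncome"))
# AT EPOCH LATEST SELECT name FROM dbo.FixedIncome
```

Statements that start with something else (for example a WITH clause) are passed through untouched, so they still need the prefix added by hand.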
Chop up a complex query into several simpler queries.
Use join decomposition where possible: sometimes an IN clause or a subquery can replace a complex JOIN. For example, use:
AT EPOCH LATEST
SELECT s1.CompanyId, id.InvestmentId, s1.InvestmentKey, id.VendorId, id.CUSIP, id.ISIN, id.DomicileCountryId, id.CurrencyId
FROM (SELECT CompanyId, InvestmentKey FROM dbo.FixedIncome WHERE CompanyId = '0C00000BDL') s1
INNER JOIN dbo.InvestmentIdDimension id ON id.InvestmentKey = s1.InvestmentKey
WHERE id.VendorId = 101 OR id.VendorId = 102;
But do not use:
AT EPOCH LATEST
SELECT fi.CompanyId, id.InvestmentId, fi.InvestmentKey, id.VendorId, id.CUSIP, id.ISIN, id.DomicileCountryId, id.CurrencyId
FROM dbo.FixedIncome fi
INNER JOIN dbo.InvestmentIdDimension id ON id.InvestmentKey = fi.InvestmentKey
WHERE fi.CompanyId = '0C00000BDL' AND (id.VendorId = 101 OR id.VendorId = 102);
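The following sketch checks, on hypothetical sample data in SQLite (standing in for Vertica, without AT EPOCH LATEST or the dbo schema), that the single JOIN, the filtered-subquery version, and a fully decomposed version (one simple query for the keys, then a second query with an IN list) return the same rows. Which form is fastest on Vertica depends on the data and the plan, so compare them with EXPLAIN on your own tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FixedIncome (InvestmentKey INTEGER, CompanyId TEXT);
CREATE TABLE InvestmentIdDimension (
    InvestmentKey INTEGER, InvestmentId TEXT, VendorId INTEGER
);
INSERT INTO FixedIncome VALUES (1, '0C00000BDL'), (2, '0C00000BDL'), (3, 'OTHER');
INSERT INTO InvestmentIdDimension VALUES
    (1, 'ID1', 101), (1, 'ID1b', 103), (2, 'ID2', 102), (3, 'ID3', 101);
""")

# One JOIN with all predicates in a single WHERE clause.
join_sql = """
SELECT fi.CompanyId, id.InvestmentId, fi.InvestmentKey, id.VendorId
FROM FixedIncome fi
INNER JOIN InvestmentIdDimension id ON id.InvestmentKey = fi.InvestmentKey
WHERE fi.CompanyId = ? AND (id.VendorId = 101 OR id.VendorId = 102)
"""

# Filter FixedIncome first in a derived table, then join the smaller result.
subquery_sql = """
SELECT s1.CompanyId, id.InvestmentId, s1.InvestmentKey, id.VendorId
FROM (SELECT CompanyId, InvestmentKey FROM FixedIncome WHERE CompanyId = ?) s1
INNER JOIN InvestmentIdDimension id ON id.InvestmentKey = s1.InvestmentKey
WHERE id.VendorId = 101 OR id.VendorId = 102
"""

def decomposed(company_id):
    # Step 1: a simple single-table query for the matching keys.
    keys = [k for (k,) in conn.execute(
        "SELECT InvestmentKey FROM FixedIncome WHERE CompanyId = ?", (company_id,))]
    if not keys:
        return []
    # Step 2: a second simple query that uses an IN list instead of a JOIN.
    placeholders = ", ".join("?" for _ in keys)
    rows = conn.execute(
        "SELECT InvestmentId, InvestmentKey, VendorId FROM InvestmentIdDimension "
        f"WHERE InvestmentKey IN ({placeholders}) AND VendorId IN (101, 102)",
        keys)
    return [(company_id, inv_id, key, vendor) for inv_id, key, vendor in rows]

company = "0C00000BDL"
a = sorted(conn.execute(join_sql, (company,)).fetchall())
b = sorted(conn.execute(subquery_sql, (company,)).fetchall())
c = sorted(decomposed(company))
print(a == b == c)  # True
```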
Use temporary tables to cache intermediate data, so that the same physical table does not have to be scanned several times.
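A minimal sketch of the caching idea, again using SQLite as a stand-in: the physical table is scanned once into a temporary table, and later queries read the cached subset. The Amount column and the table name company_bonds are hypothetical. In Vertica, also check the temporary table's ON COMMIT setting (PRESERVE ROWS or DELETE ROWS) so the cached rows are still there when the later queries run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FixedIncome (InvestmentKey INTEGER, CompanyId TEXT, Amount REAL);
INSERT INTO FixedIncome VALUES
    (1, '0C00000BDL', 100.0), (2, '0C00000BDL', 250.0), (3, 'OTHER', 75.0);
""")

# Scan the physical table once and cache only the subset we need.
conn.execute("""
CREATE TEMP TABLE company_bonds AS
SELECT InvestmentKey, Amount FROM FixedIncome WHERE CompanyId = '0C00000BDL'
""")

# Later queries read the small cached table instead of rescanning FixedIncome.
total = conn.execute("SELECT SUM(Amount) FROM company_bonds").fetchone()[0]
largest = conn.execute(
    "SELECT InvestmentKey FROM company_bonds ORDER BY Amount DESC LIMIT 1"
).fetchone()[0]
print(total, largest)  # 350.0 2
```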
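Push outer predicates down into the inner subquery, so that they are evaluated before the analytic (window) function is computed. This is safe when the predicate references only the PARTITION BY columns of the analytic function; a filter on any other column changes which rows the function sees, and therefore its results.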
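The sketch below (SQLite 3.25 or later, which added window functions; the Amount column and data are hypothetical) shows both sides of the rule: pushing a CompanyId filter into the subquery gives the same ranks because CompanyId is the PARTITION BY column, while pushing an Amount filter changes the ranks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FixedIncome (InvestmentKey INTEGER, CompanyId TEXT, Amount REAL);
INSERT INTO FixedIncome VALUES
    (1, 'A', 100.0), (2, 'A', 250.0), (3, 'B', 75.0), (4, 'B', 300.0);
""")

# Predicate outside: every company is ranked, then most rows are discarded.
outer_sql = """
SELECT * FROM (
    SELECT CompanyId, InvestmentKey,
           RANK() OVER (PARTITION BY CompanyId ORDER BY Amount DESC) AS rnk
    FROM FixedIncome
) t
WHERE CompanyId = 'A'
"""

# Predicate pushed into the subquery: only company A's rows are ranked.
pushed_sql = """
SELECT * FROM (
    SELECT CompanyId, InvestmentKey,
           RANK() OVER (PARTITION BY CompanyId ORDER BY Amount DESC) AS rnk
    FROM FixedIncome
    WHERE CompanyId = 'A'
) t
"""

outer = sorted(conn.execute(outer_sql).fetchall())
pushed = sorted(conn.execute(pushed_sql).fetchall())
print(outer == pushed)  # True: CompanyId is the partition key

# Counter-example: Amount is not a PARTITION BY column, so pushing it down
# changes which rows are ranked and therefore the ranks that come back.
unsafe_outer = conn.execute("""
SELECT InvestmentKey, rnk FROM (
    SELECT InvestmentKey, Amount,
           RANK() OVER (PARTITION BY CompanyId ORDER BY Amount DESC) AS rnk
    FROM FixedIncome
) t WHERE Amount < 200 ORDER BY InvestmentKey
""").fetchall()
unsafe_pushed = conn.execute("""
SELECT InvestmentKey, RANK() OVER (PARTITION BY CompanyId ORDER BY Amount DESC)
FROM FixedIncome WHERE Amount < 200 ORDER BY InvestmentKey
""").fetchall()
print(unsafe_outer, unsafe_pushed)  # [(1, 2), (3, 2)] [(1, 1), (3, 1)]
```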
For Top-K queries, omit the ORDER BY clause if any K rows will do; otherwise, add a filter condition so that fewer rows have to be sorted.
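A small SQLite sketch of both options, with a hypothetical Amount column and a threshold chosen for illustration. The filtered variant returns the same Top-K only if at least K rows pass the filter, so the threshold has to come from knowledge of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FixedIncome (InvestmentKey INTEGER, Amount REAL)")
# 1000 rows with distinct amounts 0..999 (37 and 1000 are coprime).
conn.executemany("INSERT INTO FixedIncome VALUES (?, ?)",
                 [(k, float(k * 37 % 1000)) for k in range(1, 1001)])

K = 5

# Any K rows will do: no ORDER BY, so nothing has to be sorted.
any_k = conn.execute("SELECT InvestmentKey FROM FixedIncome LIMIT ?", (K,)).fetchall()

# Full Top-K: every row is a sort candidate.
top_k = conn.execute(
    "SELECT InvestmentKey, Amount FROM FixedIncome ORDER BY Amount DESC LIMIT ?", (K,)
).fetchall()

# Same Top-K with a filter: only rows above the (assumed) threshold reach the sort.
# Correct only because at least K rows have Amount > threshold.
threshold = 900.0
filtered_top_k = conn.execute(
    "SELECT InvestmentKey, Amount FROM FixedIncome WHERE Amount > ? "
    "ORDER BY Amount DESC LIMIT ?", (threshold, K)
).fetchall()

print(filtered_top_k == top_k)  # True
```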
For sort-dependent operations, create pre-sorted projections: when a projection is sorted on the GROUP BY columns, Vertica can choose the faster GROUPBY PIPELINED algorithm over GROUPBY HASH.
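To show why sort order matters, here is a plain-Python illustration of the two aggregation strategies; it is not Vertica's implementation, just the underlying idea. Hash aggregation accepts input in any order but keeps an entry per group until the end, while pipelined aggregation needs input sorted on the grouping key (what a matching pre-sorted projection supplies) and holds only the current group. In Vertica, EXPLAIN shows which of GROUPBY PIPELINED or GROUPBY HASH the optimizer picked.

```python
from itertools import groupby
from operator import itemgetter

def group_by_hash(rows):
    # Hash aggregation: input may arrive in any order, but an entry for
    # every distinct key stays in memory until all input has been read.
    sums = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0) + value
    return sorted(sums.items())

def group_by_pipelined(sorted_rows):
    # Pipelined aggregation: input must already be sorted on the key, so
    # only the current group is held and each result is emitted as soon
    # as the key changes.
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)

rows = [("USD", 10), ("EUR", 5), ("USD", 7), ("JPY", 1), ("EUR", 2)]
presorted = sorted(rows, key=itemgetter(0))  # stands in for a pre-sorted projection

print(group_by_hash(rows))                   # [('EUR', 7), ('JPY', 1), ('USD', 17)]
print(list(group_by_pipelined(presorted)))   # [('EUR', 7), ('JPY', 1), ('USD', 17)]
```

Feeding the unsorted rows to group_by_pipelined would emit USD and EUR twice each, which is why the pipelined strategy is only available when the input is already sorted on the grouping columns.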
For more details, refer to the "Optimizing Query Performance" chapter of the Vertica documentation ("Vertica Community Edition 6.0"):
[https://my.vertica.com/docs/CE/6.0.1/HTML/index.htm#12525.htm]