Want to archive tables? Use Percona Toolkit’s pt-archiver (repost)
2015-11-02 17:35
Original article: https://www.percona.com/blog/2013/08/12/want-to-archive-tables-use-pt-archiver/
Percona Toolkit’s pt-archiver is one of the best utilities for archiving records from large tables into other tables or files. One thing to keep in mind is that pt-archiver is a read-write tool: it deletes data from the source by default, so after archiving you don’t need to delete it separately.
Because deletion is the default, take care before running it on a production server. You can test your archiving jobs with --dry-run, or use the --no-delete option if you’re not sure about the result. The main purpose of the tool is to archive old data from a table without impacting OLTP queries, either inserting the data into another table on the same or a different server, or writing it to a file in a format suitable for LOAD DATA INFILE.
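A minimal sketch of those two safety nets, reusing the example DSN and file pattern from the commands in this post. The `RUN` variable and the `archive_test_table` helper are illustrative names, not part of pt-archiver; by default the script only prints the commands, so nothing is touched until you opt in.

```shell
# RUN defaults to "echo", so commands are printed instead of executed.
# Set RUN to an empty string (RUN= sh script.sh) to run them for real.
RUN=${RUN-echo}

# Hypothetical helper: the archiving job from this post, with room for extra flags.
archive_test_table() {
  $RUN pt-archiver \
    --source h=localhost,D=nil,t=test \
    --file '/home/nilnandan/%Y-%m-%d-tabname' \
    --where "name='nil'" --limit 1000 "$@"
}

archive_test_table --dry-run     # print the queries and exit without doing anything
archive_test_table --no-delete   # write rows to the file but leave them in the source table
```

Starting with --dry-run, then --no-delete, and only then the plain command is one cautious way to roll an archiving job out.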
How does pt-archiver select records to archive?
pt-archiver uses an index to select records from the table. The index optimizes repeated accesses to the table: pt-archiver remembers the last row it retrieved with each SELECT statement and uses it to construct a WHERE clause from the columns of the chosen index. This should allow MySQL to start the next SELECT where the last one ended, rather than potentially scanning from the beginning of the table with each successive SELECT.
If you want to run pt-archiver with a specific index, use the “i” key in the --source DSN. It tells pt-archiver which index to scan, and the index appears in a FORCE INDEX or USE INDEX hint in the SELECT statements that fetch the rows to archive. If you don’t specify anything, pt-archiver auto-discovers a good index, preferring the PRIMARY KEY if one exists. Most of the time pt-archiver works well without the “i” key.
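To make the idea of resuming where the last SELECT ended concrete, here is a rough sketch of the shape of the statements involved. It assumes the table has a PRIMARY KEY on a column called `id` (an assumption; the post does not show the schema), and `chunk_query` is a hypothetical helper that only prints SQL. The statements pt-archiver actually generates are more elaborate, and --dry-run prints them.

```shell
# Prints a simplified version of the chunked SELECTs for a table with a
# PRIMARY KEY on `id`. Illustrative only: this is not pt-archiver's literal output.
chunk_query() {
  last_id=$1   # empty for the first chunk
  if [ -z "$last_id" ]; then
    bound=""
  else
    bound=" AND (id > $last_id)"   # resume after the last row already fetched
  fi
  echo "SELECT id, name FROM nil.test FORCE INDEX (PRIMARY) WHERE (name='nil')$bound ORDER BY id LIMIT 1000"
}

chunk_query ""      # first SELECT starts at the beginning of the index
chunk_query 1000    # next SELECT starts after the last id seen (1000 here)
```

With an index named in the --source DSN, the hint would name that index instead of PRIMARY, and the bound would be built from that index's columns.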
How to run pt-archiver?
To archive records into a plain file, you can run something like

```shell
pt-archiver --source h=localhost,D=nil,t=test --file '/home/nilnandan/%Y-%m-%d-tabname' --where "name='nil'" --limit 1000
```
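Because the file is written in a format suitable for LOAD DATA INFILE, it can be loaded back later. A minimal sketch, assuming an archive table `nil.test_archive` with the same columns already exists and that the file was written on 2013-08-08 (both are assumptions for illustration). `load_archive_file` and `RUN` are hypothetical names, and by default the script only echoes the command.

```shell
# RUN defaults to "echo"; set RUN to an empty string to execute for real.
RUN=${RUN-echo}

# Hypothetical helper: load one pt-archiver output file into a table.
load_archive_file() {
  file=$1    # path written by --file
  table=$2   # destination table, which must already exist
  $RUN mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE '$file' INTO TABLE $table"
}

load_archive_file /home/nilnandan/2013-08-08-tabname nil.test_archive
```

LOAD DATA LOCAL also needs local_infile enabled on the server, so check that setting before relying on this.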
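To archive records from one table to another, on the same server or a different one, you can run something like

```shell
pt-archiver --source h=localhost,D=nil,t=test --dest h=fedora.vm --where "name='nil'" --limit 1000
```

Before you use the default file option (F) in the --source DSN, please check https://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html#cmdoption-pt-archiver--dest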
Archiving in a replication environment:
In a replication environment it’s important that the slave does not lag for a long time. Two options control slave lag while archiving:
--check-slave-lag : Pause archiving until the specified DSN’s slave lag is less than --max-lag. This option takes the connection details of the slave to check (e.g. --check-slave-lag h=localhost,S=/tmp/mysql_sandbox29784.sock).
--max-lag : Pause archiving if the slave given by --check-slave-lag lags.
The --max-lag option causes pt-archiver to look at the slave every time it’s about to fetch another row. If the slave’s lag is greater than the option’s value, or if the slave isn’t running (so its lag is NULL), pt-archiver sleeps for --check-interval seconds and then looks at the lag again. It repeats until the slave has caught up, then proceeds to fetch and archive the row.
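The wait loop described above can be sketched in a few lines of shell. This is only an illustration of the logic, not pt-archiver’s code: `next_lag` is a hypothetical stand-in that replays canned lag samples, where a real check would read the slave’s lag from the server, and CHECK_INTERVAL is set to 0 so the demo finishes instantly.

```shell
MAX_LAG=1          # seconds, plays the role of --max-lag
CHECK_INTERVAL=0   # seconds between checks, plays the role of --check-interval

# Canned lag readings for the demo; "NULL" means the slave is not running.
LAG_SAMPLES="5 NULL 0"

# Hypothetical stand-in for querying the slave: pops the next sample into $LAG.
next_lag() {
  LAG=${LAG_SAMPLES%% *}
  case "$LAG_SAMPLES" in
    *" "*) LAG_SAMPLES=${LAG_SAMPLES#* } ;;   # drop the sample just used
    *)     ;;                                 # last sample: keep repeating it
  esac
}

# Block until the lag is a number that is not greater than MAX_LAG.
wait_for_slave() {
  WAITS=0
  next_lag
  while [ "$LAG" = "NULL" ] || [ "$LAG" -gt "$MAX_LAG" ]; do
    WAITS=$((WAITS + 1))
    sleep "$CHECK_INTERVAL"
    next_lag
  done
}

wait_for_slave
echo "slave caught up after $WAITS wait(s); lag=$LAG"
# prints: slave caught up after 2 wait(s); lag=0
```

Only after the loop exits would the next row be fetched and archived.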
Some useful options for pt-archiver:
--for-update / --share-lock : Add the FOR UPDATE / LOCK IN SHARE MODE modifier to SELECT statements.
--no-delete : Do not delete archived rows.
--plugin : Perl module name to use as a generic plugin.
--progress : Print progress information every X rows.
--statistics : Collect and print timing statistics.
--where : WHERE clause to limit which rows to archive (required).
```shell
nilnandan@nil:~$ pt-archiver --source h=localhost,D=nil,t=test,S=/tmp/mysql_sandbox29783.sock --file '/home/nilnandan/%Y-%m-%d-tabname' --where "name='nilnandan'" --limit=50000 --progress=50000 --txn-size=50000 --statistics --bulk-delete --max-lag=1 --check-interval=15 --check-slave-lag h=localhost,S=/tmp/mysql_sandbox29784.sock
TIME                ELAPSED   COUNT
2013-08-08T10:08:39       0       0
2013-08-08T10:09:25      46   50000
2013-08-08T10:10:32     113  100000
2013-08-08T10:11:41     182  148576
Started at 2013-08-08T10:08:39, ended at 2013-08-08T10:11:59
Source: D=nil,S=/tmp/mysql_sandbox29783.sock,h=localhost,t=test
SELECT 148576
INSERT 0
DELETE 148576
Action          Count       Time        Pct
print_file     148576    18.2674       9.12
bulk_deleting       3     8.9535       4.47
select              4     2.9204       1.46
commit              3     0.0005       0.00
other               0   170.0719      84.95
nilnandan@nil:~$
```
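Percona Toolkit’s pt-archiver works with Percona XtraDB Cluster (PXC) 5.5.28-23.7 and newer, but there are three limitations you should consider before archiving on a cluster; the pt-archiver documentation describes them.
pt-archiver is extensible via a plugin mechanism. You can inject your own code to add advanced archiving logic that could be useful for archiving dependent data, applying complex business rules, or building a data warehouse during the archiving process. The pt-archiver documentation has more information on that.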
Bugs related to pt-archiver: https://bugs.launchpad.net/percona-toolkit/+bugs?field.tag=pt-archiver
More details about pt-archiver: https://www.percona.com/doc/percona-toolkit/2.2/pt-archiver.html