Script - Cleaning up / maintaining historical gpperfmon data
2016-05-05 17:17
Original author: Faisal Ali - 2015-2-26 21:13
Translated by: Goopand (goopand {AT} gmail.com) - 2016-5-5 17:05
Original link: https://support.pivotal.io/hc/en-us/articles/203309923-Script-Cleanup-Maintenance-script-for-historical-gpperfmon-data
Updated link: https://discuss.zendesk.com/hc/en-us/articles/203309923-Script-Cleanup-Maintenance-script-for-historical-gpperfmon-data
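【Objective】
This article provides a simple script that helps administrators schedule a maintenance window for purging historical partitions in the gpperfmon database that are older than the retention period (note: the retention period is expressed in months). The script can be used to keep the size of the gpperfmon database under control, or to delete historical data that is no longer needed.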
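To make the month-based retention concrete, here is a small bash sketch. The helpers cutoff_month and is_expired are hypothetical and not part of the script: one computes the first month that is still kept for a retention of N months, the other tests whether a partition's start month falls before that cutoff. The sketch works at month granularity, so it is only an approximation. The script's SQL subtracts interval 'N months' from the current timestamp, which usually expires the cutoff month's own partition as well, so this sketch may keep one month more than the script does.

```bash
#!/bin/bash
# Hypothetical helpers illustrating a retention period measured in months.
# Months are handled as YYYY-MM strings; this is coarser than the script's
# SQL, which compares against current_timestamp - interval 'N months'.

# cutoff_month YYYY-MM N -> print the month that lies N months before YYYY-MM
cutoff_month() {
    local ym=$1 n=$2
    local year=${ym%-*} month=${ym#*-}
    # 10# forces base 10, so "08" and "09" are not rejected as invalid octal
    local total=$(( 10#$year * 12 + 10#$month - 1 - n ))
    printf '%04d-%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# is_expired PARTITION_YYYY-MM CURRENT_YYYY-MM N
# exit status 0 if the partition starts before the cutoff month
is_expired() {
    local cutoff
    cutoff=$(cutoff_month "$2" "$3")
    # zero-padded YYYY-MM strings sort the same way as the dates they encode
    [[ "$1" < "$cutoff" ]]
}
```

For example, with the 3-month retention used in this article and a run in August 2014, `cutoff_month 2014-08 3` gives 2014-05, and `is_expired 2010-01 2014-08 3 && echo "outside retention"` reports the January 2010 partitions as outside the retention window.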
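【Disclaimer】
1. This script is provided for demonstration purposes only; no technical support is available for any problems you run into.
2. Before running the script on machines that host business-critical workloads, test and validate it on a test cluster first.
【Execution】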
▪ Copy the two files environment_parameters.env and gpperfmon_maintenance.sh (the attachments of this document) into any directory of your choice (both files must be in the same directory):
gpadmin:Fullrack@mdw $ pwd
/data1/gpadmin
gpadmin:Fullrack@mdw $ ls -ltr
total 56
-rw------- 1 gpadmin gpadmin 7485 Aug 21 02:18 gpperfmon_maintenance.sh
-rw------- 1 gpadmin gpadmin  127 Aug 21 02:29 environment_parameters.env
▪ Edit the parameters in environment_parameters.env to match your actual database environment
▪ Run the script as follows:
/bin/sh gpperfmon_maintenance.sh
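The script reads each setting with `grep -i KEY | cut -d: -f2`, so a typo or a missing key in environment_parameters.env only shows up later, as a psql or path error in the log. The bash sketch below can validate the file before a run. The helpers get_param and check_params are hypothetical and not part of the attachment. The sketch assumes one key:value pair per line, which is what the script's grep-based parsing implies. Unlike the grep/cut pipeline, get_param matches the key exactly and keeps any ':' inside the value.

```bash
#!/bin/bash
# Hypothetical pre-flight helpers for environment_parameters.env
# (one "key:value" pair per line); not part of the original script.

REQUIRED_KEYS="gphome pgdatabase pgport master_data_directory retention"

# get_param FILE KEY -> print the value of KEY; exit status 1 if KEY is absent.
# The key must equal the text before the first ':' (case-insensitively).
get_param() {
    local file=$1 want line key
    want=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%$'\r'}                      # tolerate Windows line endings
        [[ $line == *:* ]] || continue          # skip lines without a separator
        key=$(printf '%s' "${line%%:*}" | tr '[:upper:]' '[:lower:]')
        if [ "$key" = "$want" ]; then
            printf '%s\n' "${line#*:}"          # everything after the first ':'
            return 0
        fi
    done < "$file"
    return 1
}

# check_params FILE -> report every missing or empty key; exit status 1 if any
check_params() {
    local file=$1 key value status=0
    for key in $REQUIRED_KEYS; do
        value=$(get_param "$file" "$key")
        if [ -z "$value" ]; then
            echo "ERROR - '$key' is missing or empty in $file" >&2
            status=1
        fi
    done
    # retention ends up inside "interval 'N months'", so it must be an integer
    value=$(get_param "$file" retention)
    case $value in
        ''|*[!0-9]*)
            echo "ERROR - retention must be a whole number of months" >&2
            status=1 ;;
    esac
    return $status
}
```

A possible use is `check_params /data1/gpadmin/environment_parameters.env && /bin/sh /data1/gpadmin/gpperfmon_maintenance.sh`. The script changes to its own directory at startup, so it can be launched from anywhere.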
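【Script】
The script consists of two parts (the complete shell script can be downloaded from the attachments at the end of this article):
▪ Environment parameter file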
gphome:/usr/local/greenplum-db-4.2.6.3
pgdatabase:gpperfmon
pgport:5432
master_data_directory:/data/master/gpseg-1
retention:3
▪ Shell script (note: the original author's script has been slightly modified by the translator, Goopand, as shown below)
#!/bin/bash
#
# gpperfmon_maintenance.sh
# pivotal - 2014
#

# Function: list the history partitions that are older than the retention period.
extract_partition_information()
{
    echo "INFO - Extracting information of partition older than retention period: ${Retention} Months"
    echo
    # partitionrangestart is text such as '2010-01-01 00:00:00'::timestamp without time zone,
    # so it is cast to a timestamp (as in the SELECT list) before being compared.
    psql -d "$PGDATABASE" -p "$PGPORT" -c "SELECT schemaname||'.'||tablename as Parent_Table,
               partitionschemaname||'.'||partitiontablename as Partition_Name,
               age(substring(partitionrangestart from 2 for 19)::timestamp) Partition_Age,
               substring(partitionrangestart from 2 for 19)::timestamp as Partition_Start,
               substring(partitionrangeend from 2 for 19)::timestamp as Partition_End,
               partitionrank as Partition_Rank,
               (select pg_size_pretty(pg_total_relation_size(b.partitiontablename))
                  from pg_partitions b
                 where p.partitiontablename = b.partitiontablename) as Partition_Size
          FROM pg_partitions p
         WHERE substring(partitionrangestart from 2 for 19)::timestamp < current_timestamp::timestamp without time zone - interval '${Retention} months'
           and tablename like '%history'
           and partitionrank <> 1
         ORDER BY 3 desc;"
}

# Function: generate the SQL that drops those older partitions, skipping tables
# that have only one partition left.
generate_sql_to_drop()
{
    echo "INFO - Generating SQL to drop partition older than retention period: ${Retention} Months"
    echo
    psql -d "$PGDATABASE" -p "$PGPORT" -Atc "SELECT 'ALTER TABLE '||schemaname||'.'||tablename||' DROP PARTITION FOR (RANK('||partitionrank||'));'
          FROM pg_partitions
         WHERE substring(partitionrangestart from 2 for 19)::timestamp < current_timestamp::timestamp without time zone - interval '${Retention} months'
           and tablename in (select a.tablename
                               from pg_partitions a
                              where a.tablename like '%history'
                              group by a.tablename
                             having count(*) > 1)
           and partitionrank <> 1
         ORDER BY partitionrank desc;" > "$sql_file"
}

# Function: drop the partitions by running the generated SQL file.
execute_drop_sql()
{
    echo "INFO - Executing the sql file generated to drop the partition with retention older than: ${Retention} Months"
    echo
    psql -d "$PGDATABASE" -p "$PGPORT" -ef "$sql_file" > "$drop_output"
}

# Function: list the history partitions that are still older than the retention period after the drop.
extract_partition_info_after_drop()
{
    echo "INFO - Extracting information of partition after dropping the partition more than the retention period: ${Retention} Months"
    echo
    echo "MESG - If any partition left after drop, the partition could be the last partition of the table"
    echo "MESG - Drop script ignore the last partition , to avoid the below error \"cannot drop partition for rank 1 of relation \"<table-name>\" -- only one remains\" "
    echo
    psql -d "$PGDATABASE" -p "$PGPORT" -c "SELECT schemaname||'.'||tablename as Parent_Table,
               partitionschemaname||'.'||partitiontablename as Partition_Name,
               age(substring(partitionrangestart from 2 for 19)::timestamp) Partition_Age,
               substring(partitionrangestart from 2 for 19)::timestamp as Partition_Start,
               substring(partitionrangeend from 2 for 19)::timestamp as Partition_End
          FROM pg_partitions p
         WHERE substring(partitionrangestart from 2 for 19)::timestamp < current_timestamp::timestamp without time zone - interval '${Retention} months'
           and tablename like '%history'
           and partitionrank <> 1
         ORDER BY 3 desc;"
}

# Main program starts here

# Script and log directories
echo "INFO - Generating the directories name / location where the output logs will saved / stored"
echo
export script=$0
export script_basename=`basename $script`
export script_dir=`dirname $script`
cd "$script_dir" || exit 1
export script_dir=`pwd`
export install_dir=`dirname $script_dir`
export logdir=$script_dir/log
export tmpdir=$script_dir/tmp
export fixdir=$script_dir/fix

# Creating the log / tmp / fix directories
echo "INFO - Creating the directories which will be used for storing logs / temp files ( if not available ) "
echo
mkdir -p $script_dir/log
mkdir -p $script_dir/tmp
mkdir -p $script_dir/fix

# Reading the parameter file to set the environment
echo "INFO - Reading the parameter file to set the environment"
echo
export paramfile=$script_dir/environment_parameters.env
export GPHOME=`grep -i gphome $paramfile | cut -d: -f2`
# "." is the POSIX spelling of "source", so this also works when run via /bin/sh
. $GPHOME/greenplum_path.sh
export PGDATABASE=`grep -i pgdatabase $paramfile | cut -d: -f2`
export PGPORT=`grep -i pgport $paramfile | cut -d: -f2`
export MASTER_DATA_DIRECTORY=`grep -i master_data_directory $paramfile | cut -d: -f2`
export Retention=`grep -i retention $paramfile | cut -d: -f2`

# Script and log filenames
echo "INFO - Generating filenames needed for output logs"
echo
export logfile=${logdir}/${script_basename}.${PGDATABASE}.${PGPORT}.log
export oldlog1=${logdir}/${script_basename}.${PGDATABASE}.${PGPORT}.log.1
export oldlog2=${logdir}/${script_basename}.${PGDATABASE}.${PGPORT}.log.2
export junkfile=${tmpdir}/${script_basename}.${PGDATABASE}.${PGPORT}.junk
export sql_file=${fixdir}/${script_basename}.${PGDATABASE}.${PGPORT}.dropping_older_partition.sql
export drop_output=${tmpdir}/${script_basename}.${PGDATABASE}.${PGPORT}.drop_output.tmp

# Save old log files (log.1 -> log.2, log -> log.1)
echo "INFO - Checking / archiving the old log files from previous run"
echo
if test -f "$oldlog1"
then
    mv -f "$oldlog1" "$oldlog2" > "$junkfile" 2>&1
fi
if test -f "$logfile"
then
    mv -f "$logfile" "$oldlog1" > "$junkfile" 2>&1
fi

# Remove old temporary files.
echo "INFO - Removing the old / temporary files from previous run, if any"
echo
if test -f "$sql_file"
then
    rm -f "$sql_file" > "$junkfile" 2>&1
fi

# Direct all further messages (stdout and stderr) to the logfile
echo "INFO - All the log / output messages are being moved to logfile: " $logfile
echo "INFO - Please use a different session to view the progress / logfile: " $logfile
echo "INFO - Do not press ctrl + c or kill the session unless its needed , allow the program to complete"
echo
exec > "$logfile" 2>&1

# Printing the environment that will be used by this script
echo "INFO - Program successfully started"
echo "INFO - Program started at" `date`
echo
echo "--------------------------------------------------------------------------------------------------------------------------------------------------------------------"
echo
echo "MESG - GreenPlum Database Cluster Environment: "
echo
echo " INFO - Software Location:" $GPHOME
echo " INFO - Database:" $PGDATABASE
echo " INFO - Port:" $PGPORT
echo " INFO - Master Data Directory:" $MASTER_DATA_DIRECTORY
echo " INFO - Retention: ${Retention} Months"
echo
echo "MESG - The script logs name / location"
echo
echo " INFO - Logfile Destination:" $logdir
echo " INFO - Logfile Name:" $logfile
echo
echo "--------------------------------------------------------------------------------------------------------------------------------------------------------------------"
echo

# Run the steps in order: list expired partitions, generate the DROP SQL, execute it, list what remains
extract_partition_information
generate_sql_to_drop
execute_drop_sql
extract_partition_info_after_drop

# Program ending messages.
echo "INFO - Program successfully completed"
echo "INFO - Program ended at" `date`
echo
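generate_sql_to_drop writes one `ALTER TABLE ... DROP PARTITION FOR (RANK(n));` statement per expired partition, skips rank 1, and orders the statements by rank descending. The original does not explain the ordering. A likely reason is that a rank is a position in the table's partition order, so dropping a lower rank first could shift the numbers of the partitions behind it. The bash sketch below needs no database. It reproduces that statement format and per-table ordering from "schema.table rank" pairs; gen_drop_sql and its input format are hypothetical. It can help when reviewing the SQL file left in fix/.

```bash
#!/bin/bash
# Offline sketch of the statements generate_sql_to_drop writes to fix/.
# Input (hypothetical format): "schema.table rank" pairs on stdin, one per line.
# Output: one ALTER TABLE statement per partition, highest rank first within
# each table; rank 1 is skipped, mirroring the script's "partitionrank <> 1".

gen_drop_sql() {
    local table rank
    sort -k1,1 -k2,2nr |
    while read -r table rank; do
        # rank 1 (the first partition in range order) is always kept
        [ "$rank" -eq 1 ] && continue
        printf 'ALTER TABLE %s DROP PARTITION FOR (RANK(%d));\n' "$table" "$rank"
    done
}
```

For example, `printf 'public.queries_history 2\npublic.queries_history 3\npublic.queries_history 1\n' | gen_drop_sql` prints the RANK(3) statement before the RANK(2) statement and nothing for rank 1. The script's SQL sorts by rank across all tables at once, which gives the same order within each table.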
【Output】
If the script runs successfully, it creates three directories:
-rw------- 1 gpadmin gpadmin 7485 Aug 21 02:18 gpperfmon_maintenance.sh
-rw------- 1 gpadmin gpadmin  127 Aug 21 02:29 environment_parameters.env
drwx------ 2 gpadmin gpadmin  111 Aug 21 02:18 fix
drwx------ 2 gpadmin gpadmin  153 Aug 21 02:18 tmp
drwx------ 2 gpadmin gpadmin  125 Aug 21 02:18 log
The three directories are used as follows:
fix : contains the SQL that drops the historical partitions
tmp : junk files and the output of the DROP statements
log : log files of the shell script
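The log directory does not grow without bound. Before each run, the script's "Save old log files" step moves .log.1 to .log.2 and the current .log to .log.1. As a result, log/ holds at most the current run and the two before it. The same rotation, written as a reusable bash function, looks like this; rotate_logs is a hypothetical name, since the script inlines the logic.

```bash
#!/bin/bash
# Sketch of the script's log rotation: LOG.1 -> LOG.2, then LOG -> LOG.1.
# rotate_logs is a hypothetical helper; the script performs these two moves inline.

rotate_logs() {
    local log=$1
    if [ -f "$log.1" ]; then
        mv -f "$log.1" "$log.2"     # the oldest kept copy is overwritten
    fi
    if [ -f "$log" ]; then
        mv -f "$log" "$log.1"
    fi
}
```

Called as `rotate_logs /data1/gpadmin/log/gpperfmon_maintenance.sh.gpperfmon.5432.log`, it leaves no current log in place, so the next run starts a fresh file.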
The script output looks similar to the following:
[Script execution]
gpadmin:Fullrack@mdw $ /bin/sh gpperfmon_maintenance.sh
INFO - Generating the directories name / location where the output logs will saved / stored
INFO - Creating the directories which will be used for storing logs / temp files ( if not available )
INFO - Reading the parameter file to set the environment
INFO - Generating filenames needed for output logs
INFO - Checking / archiving the old log files from previous run
INFO - Removing the old / temporary files from previous run, if any
INFO - All the log / output messages are being moved to logfile: /data1/gpadmin/log/gpperfmon_maintenance.sh.gpperfmon.5432.log
INFO - Please use a different session to view the progress / logfile: /data1/gpadmin/log/gpperfmon_maintenance.sh.gpperfmon.5432.log
INFO - Do not press ctrl + c or kill the session unless its needed , allow the program to complete
[Log output]
gpadmin:Fullrack@mdw $ cat gpperfmon_maintenance.sh.gpperfmon.5432.log
INFO - Program succesfully started
INFO - Program started at Thu Aug 21 02:07:50 PDT 2014

--------------------------------------------------------------------------------------------------------------------------------------------------------------------

MESG - GreenPlum Database Cluster Environment:

 INFO - Software Location: /usr/local/greenplum-db-4.2.6.3
 INFO - Database: gpperfmon
 INFO - Port: 5432
 INFO - Master Data Directory: /data/master/gpseg-1
 INFO - Retention: 3 Months

MESG - The script logs name / location

 INFO - Logfile Destination: /data1/gpadmin/log
 INFO - Logfile Name: /data1/gpadmin/log/gpperfmon_maintenance.sh.gpperfmon.5432.log

--------------------------------------------------------------------------------------------------------------------------------------------------------------------

INFO - Extracting information of partition older than retention period: 3 Months

           Parent Table           |              Partition Name              |          Partition Age          |   Partition Start   |    Partition End    | Parition Rank | Partition Size
----------------------------------+------------------------------------------+---------------------------------+---------------------+---------------------+---------------+----------------
 public.log_alert_history         | public.log_alert_history_1_prt_1         | 4 years 7 mons 20 days 03:00:00 | 2009-12-31 21:00:00 | 2010-01-31 21:00:00 |             1 | 288 kB
 public.iterators_history         | public.iterators_history_1_prt_1         | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 288 kB
 public.database_history          | public.database_history_1_prt_1          | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 0 bytes
 public.segment_history           | public.segment_history_1_prt_1           | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 0 bytes
 public.emcconnect_history        | public.emcconnect_history_1_prt_1        | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 288 kB
 public.health_history            | public.health_history_1_prt_1            | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 288 kB
 public.filerep_history           | public.filerep_history_1_prt_1           | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 0 bytes
 public.diskspace_history         | public.diskspace_history_1_prt_1         | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 288 kB
 public.network_interface_history | public.network_interface_history_1_prt_1 | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 0 bytes
 public.socket_history            | public.socket_history_1_prt_1            | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 0 bytes
 public.udp_history               | public.udp_history_1_prt_1               | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 0 bytes
 public.tcp_history               | public.tcp_history_1_prt_1               | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 0 bytes
 public.tcp_extended_history      | public.tcp_extended_history_1_prt_1      | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 0 bytes
 public.queries_history           | public.queries_history_1_prt_1           | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00 |             1 | 288 kB
(14 rows)

INFO - Generating SQL to drop partiton older than retention period: 3 Months
INFO - Excecuting the sql file generated to drop the partition with retention older than: 3 Months
INFO - Extracting information of partition after dropping the partition more than the retention period: 3 Months

MESG - If any partition left after drop, the partition could be the last partition of the table
MESG - Drop script ignore the last partition , to avoid the below error "cannot drop partition for rank 1 of relation "<table-name>" -- only one remains"

           Parent Table           |              Partition Name              |          Partition Age          |   Partition Start   |    Partition End
----------------------------------+------------------------------------------+---------------------------------+---------------------+---------------------
 public.log_alert_history         | public.log_alert_history_1_prt_1         | 4 years 7 mons 20 days 03:00:00 | 2009-12-31 21:00:00 | 2010-01-31 21:00:00
 public.socket_history            | public.socket_history_1_prt_1            | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00
 public.udp_history               | public.udp_history_1_prt_1               | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00
 public.tcp_history               | public.tcp_history_1_prt_1               | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00
 public.tcp_extended_history      | public.tcp_extended_history_1_prt_1      | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00
 public.network_interface_history | public.network_interface_history_1_prt_1 | 4 years 7 mons 20 days          | 2010-01-01 00:00:00 | 2010-02-01 00:00:00
(6 rows)

INFO - Progam succesfully completed
INFO - Program ended at Thu Aug 21 02:07:51 PDT 2014
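The log lists the partitions that remain, but it does not directly confirm that every generated DROP succeeded. psql's errors are written to the log because stderr is redirected there. The statement results themselves go to the drop-output file in tmp/. The bash sketch below is a hypothetical post-run check, and verify_drops is an illustrative name. It assumes psql ran without -q. Under that assumption, `-e` echoes each statement, and every successful ALTER TABLE is followed by a line that reads exactly "ALTER TABLE", the command tag. The sketch compares that count with the number of statements in the fix/ SQL file.

```bash
#!/bin/bash
# Hypothetical post-run check: statements planned in the fix/ SQL file versus
# "ALTER TABLE" command tags psql printed into the tmp/ drop-output file.
# Assumes default (non-quiet) psql output, one command tag line per success.

verify_drops() {
    local sql_file=$1 drop_output=$2 planned succeeded
    planned=$(grep -c '^ALTER TABLE .* DROP PARTITION' "$sql_file")
    # -x: the whole line must be the tag, so echoed statements are not counted
    succeeded=$(grep -cx 'ALTER TABLE' "$drop_output")
    echo "INFO - planned: $planned, succeeded: $succeeded"
    [ "$planned" -eq "$succeeded" ]
}
```

With the file names used in this article, the call would be `verify_drops fix/gpperfmon_maintenance.sh.gpperfmon.5432.dropping_older_partition.sql tmp/gpperfmon_maintenance.sh.gpperfmon.5432.drop_output.tmp`. Run it from the script directory; a non-zero exit status means the log should be checked for errors.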
[Attachments]
environment_parameters.env (126 Bytes)
gpperfmon_maintenance.sh (8 KB)
Comments:
[Kushal Choubay]
Don't forget these as well: VACUUM, VACUUM FULL and REINDEX also help reduce the size of the gpperfmon database (or any other database).
gpperfmon=# REINDEX DATABASE gpperfmon;
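gpperfmon=# VACUUM [FULL] [TABLENAME] ; ------ It is recommended to use a script or a set of commands to run VACUUM on each table (both user tables and system tables).
VACUUM is an expensive and time-consuming operation, so run it while the database is idle. Never kill a VACUUM process, and contact technical support if you suspect the database is hung.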
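The comment recommends scripting VACUUM over each table. The minimal bash sketch below turns table names on stdin into VACUUM (or VACUUM FULL) statements that can be piped into psql; gen_vacuum_sql is a hypothetical helper. The query in the usage line is an assumption. It builds the table list from the standard pg_tables view and uses quote_ident to quote names safely.

```bash
#!/bin/bash
# Hypothetical helper: read schema-qualified table names on stdin and print one
# VACUUM statement per table. Pass FULL as the first argument for VACUUM FULL.

gen_vacuum_sql() {
    local full=${1:-} table
    while read -r table || [ -n "$table" ]; do
        [ -n "$table" ] || continue             # ignore blank lines
        if [ "$full" = "FULL" ]; then
            printf 'VACUUM FULL %s;\n' "$table"
        else
            printf 'VACUUM %s;\n' "$table"
        fi
    done
}
```

A possible invocation, to be run during an idle period as the comment advises:
`psql -d gpperfmon -Atc "SELECT quote_ident(schemaname)||'.'||quote_ident(tablename) FROM pg_tables WHERE schemaname IN ('public', 'pg_catalog')" | gen_vacuum_sql | psql -d gpperfmon -e`
Generating the statements first keeps the list reviewable before anything runs, and a single psql session executes them one table at a time.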