oracle占用系统资源很高 sqlplus 无响应
2011-07-19 17:49
351 查看
昨晚,客服电话我们。。TSP平台无法登录。。后来我们重启tomcat,但还是不能登录。经检查,是数据库出问题。进入oracle服务器top一下,oracle进程占用资源很大,id剩余0.0%,而且负载迅速涨到156..试图进入oracle sys账户查询当前是否有循环语句连接,但sqlplus输完回车等半天没反应,而且资源持续上涨,负载很高!
后来,google了半天终于找到问题所在——sqlplus 无响应挂起处理,弄了半天原来oracle的bug引起——我的oracle版本是10.2.0.1。后来注意了一下系统运行时间是198天。汗!和网络文档说的一样。。
以下摘自网络:
事实上只要Linux x86主机运行天数是是24.8的倍数都有可能引发该bug,因为time()函数值为null,造成无限死循环,从而耗尽cpu。
解决办法三种:
1) 重启主机;
2) 打patch set,如升级到10.2.0.4;
3) 对该bug单独打临时patch 4612267。
第一种方法没有彻底解决问题,以后照旧;第二种方法,升级时间长,且要求停库很久,当前生产环境暂不适合;
参考文档:
Doc ID: 338461.1 SQL*Plus 10.2.0.1 Hangs, When System Uptime Is Long Period of Time
Doc ID: 4612267.8 Bug 4612267 - OCI client spins when machine uptime >= 249 days
我采取的是第三种方法,打补丁包的方式。而且据oracle官方文档说明,oracle11已经修复该问题。
下面是pache 4612267补丁包的安装及验证方法:
先停监听、dbconsole和数据库
$ lsnrctl stop
$ emctl stop dbconsole
$ sqlplus / as sysdba
SQL> shutdown immediate
注意:dbconsole是在已经装了Oracle EM的情况下要停止,如果未安装则无需干涉。
安装patch
$ mkdir $ORACLE_BASE/patches
$ cd $ORACLE_BASE/patches
$ rz (SecureCRT里上传 p4612267_10201_LINUX.zip 文件, 其它上传方式也可以)
$ unzip p4612267_10201_LINUX.zip
$ cd 4612267/
$ $ORACLE_HOME/OPatch/opatch apply
Invoking OPatch 10.2.0.1.0
...
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/10.2.0/db_1')
Is the local system ready for patching?
Do you want to proceed? [y|n]
y (此处输入y)
User Responded with: Y
...
ApplySession adding interim patch '4612267' to inventory
The local system has been patched and can be restarted.
OPatch succeeded.
验证patch
$ $ORACLE_HOME/OPatch/opatch lsinventory
Invoking OPatch 10.2.0.1.0
Oracle interim Patch Installer version 10.2.0.1.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..
Oracle Home : /u01/app/oracle/product/10.2.0/db_1
Central Inventory : /u01/app/oracle/oraInventory
from : /u01/app/oracle/product/10.2.0/db_1/oraInst.loc
OPatch version : 10.2.0.1.0
OUI version : 10.2.0.1.0
OUI location : /u01/app/oracle/product/10.2.0/db_1/oui
Log file location : /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch-2009_Jan_13_11-06-27-HKT_Tue.log
Lsinventory Output file location : /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/lsinv/lsinventory-2009_Jan_13_11-06-27-HKT_Tue.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2):
Oracle Database 10g 10.2.0.1.0
Oracle Database 10g Products 10.2.0.1.0
There are 2 products installed in this Oracle Home.
Interim patches (1) :
Patch 4612267 : applied on Tue Jan 13 11:05:10 HKT 2009
Created on 5 Oct 2005, 13:48:00 hrs US/Pacific
Bugs fixed:
4612267
--------------------------------------------------------------------------------
OPatch succeeded.
启动数据库、监听和dbconsole
$ sqlplus / as sysdba
SQL> startup
$ lsnrctl start
$ emctl start dbconsole
如果有需要,还可以删除patch,删除前先停库
$ cd $ORACLE_BASE/patches/4612267
$ $ORACLE_HOME/OPatch/opatch rollback -id 4612267
Invoking OPatch 10.2.0.1.0
...
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/10.2.0/db_1')
Is the local system ready for patching?
Do you want to proceed? [y|n]
y (此处输入y)
User Responded with: Y
...
RollbackSession removing interim patch '4612267' from inventory
The local system has been patched and can be restarted.
OPatch succeeded.
此时再执行上面的验证patch命令就会发现该patch已经删除了。
注:
Running STRACE tool shows:
$ strace /oracle/home/bin/sqlplus -V 2>&1 |less
......
old_mmap(NULL, 385024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x41794000
gettimeofday({1122996561, 411035}, NULL) = 0
access("/usr/local/UD/conf/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/lib/oracle/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/conf/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/lib/oracle/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
fcntl64(-1218313656, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
It is looping on the times() function.--死循环中
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
系统资源过高的真正原因在于此!!
后来,google了半天终于找到问题所在——sqlplus 无响应挂起处理,弄了半天原来oracle的bug引起——我的oracle版本是10.2.0.1。后来注意了一下系统运行时间是198天。汗!和网络文档说的一样。。
以下摘自网络:
事实上只要Linux x86主机运行天数是是24.8的倍数都有可能引发该bug,因为time()函数值为null,造成无限死循环,从而耗尽cpu。
解决办法三种:
1) 重启主机;
2) 打patch set,如升级到10.2.0.4;
3) 对该bug单独打临时patch 4612267。
第一种方法没有彻底解决问题,以后照旧;第二种方法,升级时间长,且要求停库很久,当前生产环境暂不适合;
参考文档:
Doc ID: 338461.1 SQL*Plus 10.2.0.1 Hangs, When System Uptime Is Long Period of Time
Doc ID: 4612267.8 Bug 4612267 - OCI client spins when machine uptime >= 249 days
我采取的是第三种方法,打补丁包的方式。而且据oracle官方文档说明,oracle11已经修复该问题。
下面是pache 4612267补丁包的安装及验证方法:
先停监听、dbconsole和数据库
$ lsnrctl stop
$ emctl stop dbconsole
$ sqlplus / as sysdba
SQL> shutdown immediate
注意:dbconsole是在已经装了Oracle EM的情况下要停止,如果未安装则无需干涉。
安装patch
$ mkdir $ORACLE_BASE/patches
$ cd $ORACLE_BASE/patches
$ rz (SecureCRT里上传 p4612267_10201_LINUX.zip 文件, 其它上传方式也可以)
$ unzip p4612267_10201_LINUX.zip
$ cd 4612267/
$ $ORACLE_HOME/OPatch/opatch apply
Invoking OPatch 10.2.0.1.0
...
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/10.2.0/db_1')
Is the local system ready for patching?
Do you want to proceed? [y|n]
y (此处输入y)
User Responded with: Y
...
ApplySession adding interim patch '4612267' to inventory
The local system has been patched and can be restarted.
OPatch succeeded.
验证patch
$ $ORACLE_HOME/OPatch/opatch lsinventory
Invoking OPatch 10.2.0.1.0
Oracle interim Patch Installer version 10.2.0.1.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..
Oracle Home : /u01/app/oracle/product/10.2.0/db_1
Central Inventory : /u01/app/oracle/oraInventory
from : /u01/app/oracle/product/10.2.0/db_1/oraInst.loc
OPatch version : 10.2.0.1.0
OUI version : 10.2.0.1.0
OUI location : /u01/app/oracle/product/10.2.0/db_1/oui
Log file location : /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch-2009_Jan_13_11-06-27-HKT_Tue.log
Lsinventory Output file location : /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/lsinv/lsinventory-2009_Jan_13_11-06-27-HKT_Tue.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2):
Oracle Database 10g 10.2.0.1.0
Oracle Database 10g Products 10.2.0.1.0
There are 2 products installed in this Oracle Home.
Interim patches (1) :
Patch 4612267 : applied on Tue Jan 13 11:05:10 HKT 2009
Created on 5 Oct 2005, 13:48:00 hrs US/Pacific
Bugs fixed:
4612267
--------------------------------------------------------------------------------
OPatch succeeded.
启动数据库、监听和dbconsole
$ sqlplus / as sysdba
SQL> startup
$ lsnrctl start
$ emctl start dbconsole
如果有需要,还可以删除patch,删除前先停库
$ cd $ORACLE_BASE/patches/4612267
$ $ORACLE_HOME/OPatch/opatch rollback -id 4612267
Invoking OPatch 10.2.0.1.0
...
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/10.2.0/db_1')
Is the local system ready for patching?
Do you want to proceed? [y|n]
y (此处输入y)
User Responded with: Y
...
RollbackSession removing interim patch '4612267' from inventory
The local system has been patched and can be restarted.
OPatch succeeded.
此时再执行上面的验证patch命令就会发现该patch已经删除了。
注:
Running STRACE tool shows:
$ strace /oracle/home/bin/sqlplus -V 2>&1 |less
......
old_mmap(NULL, 385024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x41794000
gettimeofday({1122996561, 411035}, NULL) = 0
access("/usr/local/UD/conf/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/lib/oracle/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/conf/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/lib/oracle/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
fcntl64(-1218313656, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
It is looping on the times() function.--死循环中
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
系统资源过高的真正原因在于此!!
相关文章推荐
- Oracle进程占用资源过大导致系统缓慢的问题
- oracle查看系统资源占用情况
- oracle查看系统资源占用情况
- 服务器CPU占用资源高达90%-99%,内存占用7.2GB,系统运行慢。oracle
- oracle查看系统资源占用情况
- oracle查看系统资源占用情况
- oracle查看系统资源占用情况
- ubuntu查看系统资源占用
- Linux中查看系统资源占用情况的命令
- 查看linux服务器配置和TOP命令是Linux下常用的性能分析工具,能够实时显示系统中各个进程的资源占用状况。
- Linux作网络系统-资源占用优化配置 参照网络系统cisco IOS
- Linux中查看系统资源占用情况的命令
- PHP-php-fpm占用系统资源分析
- 记录一次php占用系统资源过高的问题
- Linux中查看系统资源占用情况的命令
- ubuntu查看系统资源占用(内存,cpu和进程)
- ubuntu查看系统资源占用(内存,cpu和进程)
- CentOS中查看系统资源占用情况的命令
- 关于php-fpm占用系统资源分析
- [linux]top命令详解-实时显示系统中各个进程的资源占用状况