
LTOM - The On-Board Monitor User Guide (Doc ID 352363.1)


LTOM

The On-Board Monitor User's Guide

Embedded Real-Time Data Collection and Diagnostics Platform

Carl Davis

Center of Expertise

December 9, 2010

Best Practices

Pro-Active Problem Avoidance and Diagnostic Collection

Although some problems are unforeseeable, many can be avoided if the warning signs are detected early enough. And when an issue does occur, collecting information after the event is of little use. LTOM is one of the tools that Support recommends for collecting such diagnostics. For information on suggested uses, other proactive preparations, and diagnostics, see:

Document 1482811.1 Best Practices: Proactively Avoiding Database and Query Performance Issues
Document 1477599.1 Best Practices Around Data Collection For Performance Issues

LTOM now provides a graphing utility to graph the data collected. This greatly reduces the need to inspect all the output files manually. See the "LTOMg System Profiler" section below. Click here to see an example of the new html system profile output.

Contents

Introduction

Overview

New Features

Support for RAC

Automatic Hang Detection

System Profiler

Automatic Session Tracing

Supported Platforms

Download LTOM

Installing LTOM

Uninstalling LTOM

Running LTOM

LTOMg System Profiler (New)

Reporting Feedback

Appendix A: LTOM Directory Structure

Appendix B: LTOM Rules of Engagement/FAQ

System Profiler

Automatic Hang Detection

Automatic Session Tracing

Appendix C: Automatic Session Tracing Example

Appendix D: Sample System Profiler Trace File

Introduction
The Lite Onboard Monitor (LTOM) is a Java program designed as a real-time diagnostic platform for deployment to a customer site. LTOM differs from other support tools in that it is proactive rather than reactive. LTOM provides real-time automatic problem detection and data collection. It runs on the customer's UNIX server, is tightly integrated with the host operating system, and provides an integrated solution for detecting system performance issues and collecting the related trace files. Detecting problems and collecting data in real time is intended to reduce both the time it takes to solve problems and customer downtime.

Overview
Historically, one of the major problems in diagnosing database/system performance problems has been collecting the necessary diagnostic data while the problem is actually occurring. The data is seldom collected because of the time it takes to recognize that there is a problem, determine what kind of data to collect, and work out how to collect it. Frequently, the problem has passed or the database has to be shut down to correct the problem. This forces the customer to wait until the next occurrence and hope the data can be collected fast enough. LTOM performs automatic problem detection and collects the necessary diagnostic traces in real time while the database/system performance problem is occurring. LTOM provides services for:

Automatic Hang Detection

System Profiler

Automatic Session Tracing

New Features
Version 4.3.1 of LTOM contains the following new features:

System Profiler has been enhanced to collect additional metrics for parallel query slaves and blocking sessions.

LTOMg has new functionality to format the System Profiler trace for easier readability.

Support for RAC
LTOM can be configured for use in a RAC environment. See the instructions in the $TOM_HOME/init/hangDetect.properties file for details. To use automatic hang detection, LTOM needs to be installed on only 1 node of the RAC cluster.
To use the other features of LTOM, such as the System Profiler or Session Recorder, LTOM must be installed on each node of the RAC cluster. For shared disk environments, install LTOM to a unique location for each node of the cluster. It is also recommended that OSWatcher (see Note:301137.1) be installed on each node of a RAC cluster.

Automatic Hang Detection

This feature should only be used at the direction of Oracle Support or by experienced DBAs. The automated collection of heavy tracing on a production system can have a significant performance impact on that system. The user needs to be aware of the consequences of generating this level of tracing and should proceed with caution.
Automatic Hang Detection uses a rule-based hang detection algorithm. LTOM has a default built-in set of rules that should be sufficient in most situations, but rules can be modified or added as needed. These rules are based on database wait events. LTOM considers only non-idle wait events in its hang detection algorithm. To provide more granularity, a set of rules can be configured to match specific kinds of hangs. For example, suppose hangs are caused by latch free waits that hang the system for only a short duration (several minutes), and the default trigger value for latch free is set too high to catch them. A rule can then be defined for latch free that triggers at 15 seconds. Any session waiting on latch free for more than 15 seconds would then trigger the collection of diagnostic hang traces.
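The rule evaluation described above can be sketched as follows. This is a minimal illustration and not LTOM's actual implementation: the `IDLE_EVENTS` set, the rule dictionary, the 300-second default, and the session tuples are all hypothetical stand-ins for what LTOM reads from hangDetect.properties and from the database.

```python
# Illustrative only: names, thresholds, and the idle-event list are assumptions,
# not values taken from LTOM or from hangDetect.properties.
IDLE_EVENTS = {"SQL*Net message from client", "class slave wait"}


def find_hung_sessions(sessions, rules, default_secs):
    """Return SIDs whose non-idle wait exceeds the threshold for its event.

    sessions: iterable of (sid, event, seconds_in_wait) tuples
    rules:    dict mapping event name -> trigger threshold in seconds
    """
    hung = []
    for sid, event, secs in sessions:
        if event in IDLE_EVENTS:
            continue  # idle waits never count toward a hang
        threshold = rules.get(event, default_secs)
        if secs > threshold:
            hung.append(sid)
    return hung


sessions = [
    (124, "class slave wait", 21846),       # idle, ignored
    (130, "latch free", 20),                # exceeds the 15-second rule
    (131, "latch free", 10),                # below the rule
    (133, "db file sequential read", 30),   # below the assumed default
]
rules = {"latch free": 15}
hung = find_hung_sessions(sessions, rules, default_secs=300)
print(hung)
```

Only session 130 is reported, because it is the only non-idle wait longer than its threshold.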
When this mode is enabled, automatic hang detection runs silently in the background, periodically checking for hangs. Once any session has been identified as hung, diagnostic traces are automatically generated. The type and number of diagnostic traces collected are determined by the rules file, $TOM_HOME/init/hangDetect.properties. The default collection is as follows:

HangAnalyze Level 3

Systemstate Level 266

Wait 60 seconds

HangAnalyze Level 3

Systemstate Level 266

To modify this collection edit the $TOM_HOME/init/hangDetect.properties file.
The advantage of using automatic hang detection is that if the database hangs at 2:00 in the morning and no one is around, the necessary diagnostic traces will be collected and a hang report will be generated. Email notification can be configured to alert the user to the hang. To set up email notification, edit the $TOM_HOME/src/ltommail.sh file or allow the auto installer to do this for you on installation. To prevent traces from being generated constantly once a hang is detected, only one set of diagnostic traces is collected, and no further hangs will be detected until the mode has been turned off and re-enabled. LTOM can also automatically determine the level of tracing based on the impact that collecting additional diagnostic traces would have on the system.
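The "collect once, then wait for a reset" behavior can be pictured as a small state machine. The sketch below is an assumption about the general shape of such logic, not LTOM code; the class and method names are hypothetical.

```python
class OneShotHangDetector:
    """Hypothetical sketch: fire on the first hang, then stay quiet until re-enabled."""

    def __init__(self):
        self.enabled = False
        self.fired = False

    def enable(self):
        # Turning the mode on (again) re-arms the detector.
        self.enabled = True
        self.fired = False

    def disable(self):
        self.enabled = False

    def on_check(self, hang_seen):
        """Return True only for the first hang seen since the last enable()."""
        if not self.enabled or self.fired or not hang_seen:
            return False
        self.fired = True
        return True


detector = OneShotHangDetector()
detector.enable()
results = [detector.on_check(h) for h in (False, True, True)]
detector.disable()
detector.enable()
results.append(detector.on_check(True))
print(results)
```

The second consecutive hang is ignored, and a new collection happens only after the mode has been turned off and on again.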
For more information see the Automatic Hang Detection FAQ below and also the rule definition file $TOM_HOME/init/hangDetect.properties.

System Profiler
One of the problems with relying solely on statspack is the inability to look at performance from a holistic point of view. Information about non-Oracle processes and the health of the operating system in terms of memory, CPU and I/O, for example, is not collected. Further, all static data collectors are problematic in that single snapshots, or multiple snapshots taken at 15 or 30 minute intervals, can miss problems that occur briefly during a snapshot interval and are averaged out over the duration of the snapshot. The System Profiler continually collects data from both the operating system and Oracle and provides an integrated snapshot of the overall health of the operating system together with the database. This data collection contains the output from operating system utilities (top, vmstat and iostat) along with Oracle session data (v$session, v$process, v$session_wait, v$system_event and v$sysstat). The recording frequency and subsets of available data can also be configured when running the tool.
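To see why coarse snapshots hide short problems, consider a worked example with assumed numbers: a system idling at 5% CPU that spikes to 100% for 10 seconds inside a 15-minute snapshot interval. The function below is only arithmetic for illustration; none of the figures come from LTOM.

```python
def interval_average(baseline_pct, spike_pct, spike_secs, interval_secs):
    """Average CPU busy % over an interval that contains one spike."""
    quiet_secs = interval_secs - spike_secs
    return (baseline_pct * quiet_secs + spike_pct * spike_secs) / interval_secs


# Assumed numbers: 5% baseline, one 10-second spike at 100%, 15-minute snapshot.
avg = interval_average(5.0, 100.0, 10, 15 * 60)
print(round(avg, 2))  # → 6.06
```

A 10-second saturation of the CPU moves the 15-minute average by about one percentage point, which is easy to overlook; sampling every few seconds would show the spike directly.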
Once the data is collected, it can be parsed and analyzed through LTOMg. This tool provides a graphical interface and can quickly drill down into any performance problem. Click here to see an example of the new html system profile output.

The following parameters can be configured to control the frequency and selectivity of data to be collected.

Update Freq - latency between snapshots

Display Top - select to record OS top processes

Display Vmstat - select to record vmstat information

Display Iostat - select to record iostat information

Display Sessions - select to record Oracle processes

Display CPU Stats - select to record CPU statistics from OS

Display Current SQL Executing - select to record current SQL executing

For more information see the System Profiler FAQ below. Appendix D also contains an example of the raw System Profiler data collection.
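The raw file shown in Appendix D marks each snapshot with a line such as `---------------SNAPSHOT# 1` and each section with a line such as `---------------VMSTAT:---`. For readers who want to post-process the file themselves rather than use LTOMg, a minimal sketch of splitting it into snapshots and sections might look like this. The function name and the "HEADER" key are hypothetical, and the marker patterns are inferred from the sample only.

```python
import re

# Marker patterns inferred from the Appendix D sample; treat them as assumptions.
SNAPSHOT_RE = re.compile(r"^-{5,}SNAPSHOT# (\d+)\s*$")
SECTION_RE = re.compile(r"^-{5,}(.+?):---\s*$")


def split_snapshots(lines):
    """Group lines into {snapshot_number: {section_name: [lines]}}."""
    snapshots = {}
    current = None
    section = None
    for line in lines:
        line = line.rstrip("\n")
        m = SNAPSHOT_RE.match(line)
        if m:
            current = snapshots.setdefault(int(m.group(1)), {})
            section = "HEADER"  # the timestamp line lands here
            continue
        if current is None:
            continue  # still in the descriptive preamble of the file
        m = SECTION_RE.match(line)
        if m:
            section = m.group(1)
            continue
        if line.strip():
            current.setdefault(section, []).append(line)
    return snapshots


sample = [
    "---------------SNAPSHOT# N",
    "---------------VMSTAT:---",
    "current vmstat snapshot from unix vmstat utility",
    "---------------SNAPSHOT# 1",
    "Tue Sept 25 16:00:00 EDT 2007",
    "---------------VMSTAT:---",
    "r b w swap free",
    "1 0 0 665112 219424",
]
parsed = split_snapshots(sample)
print(sorted(parsed[1]))
```

The legend at the top of the file uses `SNAPSHOT# N` with a literal N, so it is skipped and only numbered snapshots are returned.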

Automatic Session Tracing

This feature should only be used at the direction of Oracle Support or by experienced DBAs. The automated collection of heavy tracing on a production system can have a significant performance impact on that system. The user needs to be aware of the consequences of generating this level of tracing and should proceed with caution.
One of the most important diagnostic traces is the Oracle extended SQL trace, commonly known as SQL trace. Obtaining a SQL trace file from Oracle database sessions can be problematic, especially if you do not know which session you need to trace. Likewise, turning on SQL trace for the entire database just to capture the trace of a few problematic sessions can be prohibitively expensive for some customers. Automatic Session Tracing uses a set of rules to determine when to turn on SQL trace for individual Oracle sessions, using event 10046 level 12 trace. Rules can be defined for database wait events, CPU and specific users. For rules based on wait events, the automatic session recorder monitors certain V$ views at specified intervals and computes the average wait time between intervals for each event. If a rule has been defined for an event and the computed average wait time exceeds the rule threshold for that event, LTOM turns on tracing for that session. For rules based on CPU, the automatic session recorder computes the amount of CPU used by the session between intervals and compares it to the rule. For rules based on specific users, the automatic session recorder traces any session owned by that user. Sessions can be traced in a circular memory buffer or to a file.
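The wait-event rule check can be sketched as a comparison between two consecutive samples. The guide does not say which V$ views or which formula LTOM uses, so this sketch assumes cumulative per-session counters shaped like the total_waits and time_waited columns of v$session_event (time in centiseconds). The function name and data layout are hypothetical.

```python
def sessions_to_trace(prev, curr, rules):
    """Return (sid, event) pairs whose average wait in the interval exceeds the rule.

    prev, curr: dict mapping (sid, event) -> (total_waits, time_waited_cs),
                cumulative counters from two consecutive samples
    rules:      dict mapping event -> threshold in centiseconds
    """
    flagged = []
    for key, (waits, time_cs) in curr.items():
        sid, event = key
        if event not in rules:
            continue  # no rule defined for this event
        prev_waits, prev_time = prev.get(key, (0, 0))
        delta_waits = waits - prev_waits
        if delta_waits <= 0:
            continue  # no new waits during this interval
        avg_cs = (time_cs - prev_time) / delta_waits
        if avg_cs > rules[event]:
            flagged.append(key)
    return flagged


prev = {(130, "latch free"): (100, 200), (131, "latch free"): (50, 100)}
curr = {(130, "latch free"): (110, 400), (131, "latch free"): (60, 120)}
flagged = sessions_to_trace(prev, curr, {"latch free": 5})
print(flagged)
```

Session 130 averaged 20 centiseconds per wait during the interval and is flagged; session 131 averaged 2 centiseconds and is left alone.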
The advantage of tracing in a memory buffer is that the process is not constantly writing to disk; I/O is one of the most expensive operations a computer performs. Memory tracing is also advantageous in that only the last few seconds of tracing leading up to the performance problem are kept, which avoids collecting gigabytes of trace data just to obtain those last few seconds. The rule definition for in-memory tracing uses 2 thresholds. The minimum threshold turns on tracing for the session in memory, and the maximum threshold forces the memory buffer to be written to disk, to that session's respective trace file. This allows the session to be traced continuously and dumped to disk only when something significant occurs. The user can also manually force the memory buffer to be written to disk at any time. When starting LTOM, the user specifies the amount of memory to dedicate to each session and can optionally limit the number of sessions LTOM can trace.
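The two-threshold idea can be illustrated with a bounded buffer. This is a conceptual sketch only: in LTOM the buffer lives inside the Oracle session being traced, whereas here a `collections.deque` and a plain list stand in for the circular buffer and the trace file. All names and numbers are hypothetical.

```python
from collections import deque


class InMemoryTrace:
    """Hypothetical two-threshold tracer backed by a bounded deque."""

    def __init__(self, min_cs, max_cs, capacity):
        self.min_cs = min_cs
        self.max_cs = max_cs
        self.buffer = deque(maxlen=capacity)  # oldest entries fall off the front
        self.tracing = False
        self.dumped = []  # stands in for the session's trace file

    def observe(self, wait_cs, trace_line):
        if not self.tracing and wait_cs > self.min_cs:
            self.tracing = True  # minimum threshold: start tracing in memory
        if self.tracing:
            self.buffer.append(trace_line)
            if wait_cs > self.max_cs:
                self.dump()  # maximum threshold: flush the buffer to "disk"

    def dump(self):
        self.dumped.extend(self.buffer)
        self.buffer.clear()


trace = InMemoryTrace(min_cs=5, max_cs=100, capacity=3)
for wait, line in [(1, "a"), (6, "b"), (7, "c"), (8, "d"), (9, "e"), (150, "f")]:
    trace.observe(wait, line)
print(trace.dumped)
```

With a capacity of three entries, only the most recent lines ("d", "e", "f") survive to be written out when the maximum threshold is crossed, which mirrors the point about keeping just the last few seconds of trace.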
When tracing directly to a file, automatic session tracing simply turns on tracing for any session that violates the rule definitions. When the user exits automatic session tracing, tracing is turned off for all sessions currently being traced. As a fail-safe, LTOM creates a SQL script file, $TOM_HOME/recordings/session/stopsessions.sql, which will turn off any tracing turned on by LTOM. The user would run this file manually if required.
For more information see the Automatic Session Recorder FAQ below and also the rule definition file $TOM_HOME/init/sessionRecorder.properties. Appendix C also contains an example of how to set up Automatic Session Tracing.

Supported Platforms

Solaris

Linux

HP-UX

AIX

Tru64

Download LTOM
Current LTOM Version: 4.3.1 December, 2010
Click here to download the file.
If a file download dialog box does not appear when clicking on the above link, you may need to clear your web browser's cache and/or restart your web browser. If you are still unable to download
the file, you may request that we email you a copy: Carl.Davis@oracle.com

Installing LTOM
LTOM is available for download through MetaLink as a compressed tar file. Copy the file to the directory where LTOM is to be installed and issue the following commands.

uncompress ltom.tar.Z

tar xvfp ltom.tar

A directory named tom_base is created which houses all the files associated with LTOM. A README file is located in this directory with full instructions on how to install the tool.

Uninstalling LTOM
To uninstall LTOM, issue the following command from the directory that contains tom_base:

rm -rf tom_base

Running LTOM
The user of LTOM must be a member of the UNIX dba group, as some components of LTOM use OS authentication. In addition, LTOM will prompt for a database username and password. This is required for LTOM to make JDBC connections to the database. This user must be a database user with full DBA privileges.
Before running LTOM, certain environment variables need to be defined. Please see the README for further details.
To run LTOM standalone, go to the directory tom_base/tom ($TOM_HOME) and issue the following command:

./startltom.sh

This will bring up the command line version of LTOM. Users are first prompted to enter a database username and password. The username must be a db user with dba privileges. Once logged in, users
can then manually enter commands to turn on and off automatic hang detection and data recording functions.

kernaltom:/u02/home/TOM>./startltom.sh

Enter 1 to Start Auto Hang Detection

Enter 2 to Stop Auto Hang Detection

Enter 3 to Start System Profiling

Enter 4 to Stop System Profiling

Enter 7 to Start Session Tracing

Enter 71 to Display Sessions Traced

Enter 72 to Dump All Trace Buffers

Enter 73 to Dump Specific Trace Buffer

Enter 8 to Stop Session Tracing

Enter S to Update status

Enter Q to End Program

CURRENT STATUS: HangDetection=OFF ManRec=OFF SessionRec=OFF

Please Select an Option:

LTOM can also be started as a background task. See the instructions in the README. There can be no interaction with LTOM once it is started in this mode. To terminate LTOM, follow the instructions in the README.

kernaltom:/u02/home/TOM>nohup ./startltom.sh -s &

LTOMg System Profiler
A new utility, LTOMg, has been added to LTOM. This utility provides the ability to graph the data collected by LTOM. See the LTOMg User Guide for more information. To see a sample of the LTOMg System Profiler output, click here.

Sample Graph

Reporting Feedback
If you encounter problems running LTOM or would like to provide feedback, please send email to Carl.Davis@oracle.com.

Appendix A: LTOM Directory Structure

The tom_base directory is the root directory created when downloading and untarring the ltom.tar file. The tom_base directory contains 2 subdirectories

TOM_HOME - root directory for all LTOM subdirectories.

Install - directory containing the installer

The TOM_HOME directory contains the following 6 subdirectories:

hanglog directory contains the logs created from running Automatic Hang Detection.

The init directory contains the following initialization files...

tom_deploy.properties - this file contains initialization parameters for LTOM. This file is mandatory for startup of these tools.

hangDetect.properties - this file contains the rule definitions for Automatic Hang Detection. This file should not be edited unless directed by a support analyst.

sessionRecorder.properties - this file contains the rule definitions for Automatic Session Tracing. This file should not be edited unless directed by a support analyst.

The ltomg directory contains 3 subdirectories

src - directory for ltomg source files.

gif - default directory for ltomg gif files.

profile - default directory for ltomg html profiles.

The recordings directory contains 4 subdirectories

event - directory containing the event rule violations for Automatic Data Recorder (desupported)

profile - directory containing the files from the System Profiler

smart - directory containing the files created from running the default event toolkits for the Automatic Data Recorder (desupported)

session - directory containing the log from Automatic Session Tracing

The src directory contains LTOM external source files. The directory also contains the LTOM executable.

The tmp directory contains temporary files used by LTOM.
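As a quick sanity check after installation, the six subdirectories listed above can be verified with a few lines of code. This is a hedged sketch: the function name is hypothetical, and the demonstration runs against a throwaway directory rather than a real $TOM_HOME.

```python
from pathlib import Path
import tempfile

# Subdirectory names as listed in this appendix.
EXPECTED_SUBDIRS = ("hanglog", "init", "ltomg", "recordings", "src", "tmp")


def missing_subdirs(tom_home):
    """Return the expected subdirectory names that do not exist under tom_home."""
    root = Path(tom_home)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Demonstration against a temporary directory, not a real installation.
with tempfile.TemporaryDirectory() as demo_home:
    for name in ("hanglog", "init", "recordings"):
        (Path(demo_home) / name).mkdir()
    missing = missing_subdirs(demo_home)
print(missing)
```

An empty list would indicate that all six directories are present.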

Appendix B: LTOM Rules of Engagement/FAQ

System Profiler
When to use?
The system profiler should be used whenever a comprehensive view of a performance problem is required. The system profiler should be considered for any performance problems that require analysis
down to the seconds level. This option should be considered whenever statspack snapshots do not provide the granularity necessary to resolve the issue. The system profiler is useful to frame performance issues where a bottleneck may be outside Oracle.
Benefits?

Collect data up to just seconds prior to hang or crash

Collect OS data in addition to Oracle performance data

Collect statistical data down to 1 second increments

Displays SQL currently executing

RCA timeline

How to use?

Install LTOM

cd $TOM_HOME

./startltom.sh

Select option 3. Then follow prompts

Where is the output?
The system profiler produces a single log file for each recording. An additional io file may be created if profiling with iostat. These files are located in the $TOM_HOME/recordings/profile directory.
Gotchas?
The system profiler produces a single file each time it is turned on. If left on for days this file could become quite large. It is recommended that the recorder be reset on a daily basis if extended
recording is required.
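One way to keep an eye on profile file growth is to list files above a chosen size. The sketch below is an illustration under assumptions: the function name, the size limit, and the file names are hypothetical, and the demonstration uses a temporary directory instead of $TOM_HOME/recordings/profile.

```python
from pathlib import Path
import tempfile


def oversized_files(profile_dir, limit_bytes):
    """Return the names of files in profile_dir larger than limit_bytes, sorted."""
    return sorted(
        p.name
        for p in Path(profile_dir).iterdir()
        if p.is_file() and p.stat().st_size > limit_bytes
    )


# Demonstration with made-up file names and a tiny limit.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "small.log").write_bytes(b"x" * 10)
    (Path(demo_dir) / "big.log").write_bytes(b"x" * 2000)
    big = oversized_files(demo_dir, limit_bytes=1000)
print(big)
```

A check like this could be run on a schedule to signal when the recorder is due for a reset.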

Automatic Hang Detection
When to use?

This feature should only be used at the direction of Oracle Support or by experienced DBAs. The automated collection of heavy tracing on a production system can have a significant performance impact on that system. The user needs to be aware of the consequences of generating this level of tracing and should proceed with caution.
Automatic hang detection should be considered for any TARs involving hangs or slowdowns where the information collected at the initial outage is insufficient to diagnose the problem. If a hang occurs at 2:00 in the morning and no one is around, LTOM will collect the required trace files.
Benefits?

Collect systemstates and hanganalyze files during the actual hang without operator intervention

Hang data collection 24x7

Hangs automatically detected

Email notification of hang

How to use?

Install LTOM

Edit the file $TOM_HOME/init/hangDetect.properties if you want to customize

cd $TOM_HOME

./startltom.sh

Select option 1. Then follow prompts

Where is the output?
Automatic hang detection produces several files for each hang. These files are as follows:

$TOM_HOME/hanglog/hang*.log file containing the systemstate analyzer output and hang analyze summary output.

$TOM_HOME/hanglog/hang*.report file gives details about what caused the hang and records the actions LTOM has taken once the hang was detected.

Systemstate dumps and hang analyze files produced are in the udump directory.

Email notification if this was configured (see README).

Gotchas?
Once a hang is detected, automatic hang detection produces one set of files. The program needs to be reset before the next set of files is collected. This is to prevent the continuous, indefinite
collection of systemstate/hanganalyze files.

Automatic Session Tracing
When to use?

This feature should only be used at the direction of Oracle Support or by experienced DBAs. The automated collection of heavy tracing on a production system can have a significant performance impact on that system. The user needs to be aware of the consequences of generating this level of tracing and should proceed with caution.
Automatic session tracing should be considered for situations where specific sessions experience performance problems. Data collection can be tied to specific wait events or CPU. An example may be
latch contention that happens occasionally but is not detected by statspack snapshots.
Benefits?

Collect 10046 trace only when a performance problem occurs

Collect SQL associated with a session's performance problem

Tie data collection to a specific oracle wait event or CPU utilization

Easily identify the users and SQL associated with a particular performance problem

Session tracing for only problematic sessions

Trace sessions owned by a specific user

How to use?

Install LTOM

Edit the $TOM_HOME/init/sessionRecorder.properties file

Define a rule based on db wait event (See README for full details)

cd $TOM_HOME

./startltom.sh

Select option 7. Then follow prompts

Where is the output?
Automatic Session Tracing produces oracle trace files and a log file. These files are in the following directories:

$TOM_HOME/recordings/session directory contains a file logging any significant performance events that occur during the recording.

Oracle session trace files located in bdump and udump

Gotchas?
It can take up to 3 times the polling frequency for values to be displayed back to the user properly. This is because the program sleeps between samples for the length of the polling frequency, and because multiple samples must be collected and computations performed on them.
The user must exit automatic session tracing to turn off tracing once it has been started through LTOM. On exit, the program turns off tracing for all sessions for which it enabled tracing. Failure to exit automatic session tracing will result in these sessions continuing to be traced.

Appendix C: Automatic Session Tracing Example
The problem:
A business has an SLA (Service Level Agreement) with its customers that requires all customer transactions to complete in under one second. Occasionally, some transactions exceed this requirement, forcing the business to incur a significant financial penalty. By deploying the system profiler and taking snapshots of the system every few seconds, it was discovered that a particular wait event was responsible for the excessive time that kept the transaction from completing in under 1 second. Knowing just the wait event did not provide enough information to determine why this was happening. What was needed was a 10046 trace of the session(s) involved so the underlying SQL could be examined. The business did not know which of the 1000 concurrent sessions to trace, nor could it afford to trace all 1000 sessions, as the significant overhead of tracing every session with the 10046 event would push most of the other transactions beyond the 1 second SLA.
The solution:
By deploying the automatic session recorder, it is possible to trace only those sessions that are affected by the particular wait event, with insignificant performance impact on the database. The performance impact of taking these diagnostic traces can be reduced even further by tracing these sessions in memory rather than having them write to a trace file. Sessions can be monitored in memory through LTOM and only written out to a file when that session's wait exceeds some threshold value.
Step 1. Configure the recorder
The session recorder can be configured to either trace to a file or to a memory buffer. To configure the recorder, edit the $TOM_HOME/init/sessionRecorder.properties file. A new rule needs to be
added and defined that will trace any session waiting on the particular wait event. A rule can be defined that will either trace directly to a file or trace to a circular memory buffer once a minimum threshold value has been exceeded. The memory contents
will be dumped to a file once a second threshold value has been exceeded. In this example the following line will be added to the properties file to define a rule for tracing sessions waiting on "global cache cr request" to memory...
EVENT=global cache cr request, VALUE=5, 100
Note that the event name defined in the rule must exactly match the name column from v$event_name. Also note that two values have been specified. The first value (5) specifies a minimum threshold, in centiseconds, to turn on the 10046 trace. In this case, any session that waits on "global cache cr request" for 0.05 seconds during the sampling interval will have its session traced in memory. The second value (100) specifies a maximum threshold, in centiseconds, to dump the contents of the memory buffer to a file. This means that tracing will be started in memory once any session waits on "global cache cr request" for 0.05 seconds, and the session will continue to be traced until tracing is turned off manually or the session terminates. Once the maximum threshold, in this case 1.0 seconds, is exceeded, the entire contents of the trace buffer are dumped from memory to that session's respective trace file in the udump/bdump.
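To make the units concrete, the rule line from this step can be pulled apart and its centisecond values converted to seconds. The parser below is a hypothetical illustration that handles only the two-value form shown in this example; it is not the parser LTOM uses, and the full syntax of sessionRecorder.properties is described in the file itself.

```python
def parse_event_rule(line):
    """Parse 'EVENT=<name>, VALUE=<min>, <max>' into (name, min_secs, max_secs)."""
    event_part, value_part = line.split(", VALUE=", 1)
    if not event_part.startswith("EVENT="):
        raise ValueError("rule must start with EVENT=")
    name = event_part[len("EVENT="):].strip()
    min_cs, max_cs = (int(v) for v in value_part.split(","))
    # Thresholds are given in centiseconds; convert to seconds.
    return name, min_cs / 100, max_cs / 100


rule = parse_event_rule("EVENT=global cache cr request, VALUE=5, 100")
print(rule)
```

The result shows the event name with thresholds of 0.05 and 1.0 seconds, matching the values discussed in this step.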
Step 2. Turn on session tracing through LTOM
Start LTOM and login.
Select option 7 to begin session recording. You will then be asked to respond to the following prompts...
Enter a polling frequency in seconds. This is the sampling interval LTOM uses to check if the threshold values specified in the
rules get violated. A recommended sampling frequency would be 5 seconds.
Trace sessions to memory or file. Although a rule has already been defined that will trace to memory you can always override it
here. Specify M to trace to memory.
Enter amount of memory for each trace buffer in bytes. Whatever value you enter here will be multiplied by the number of sessions
that are actually traced. You should consider how much free memory you have on your system. A recommended amount would be 50000 bytes.
Enter max processes to trace. This serves as a safety valve. In this example we have 1000 concurrent sessions. Unless we limit the number of sessions being traced, in theory all 1000 sessions could be traced, each consuming, in this example, 50,000 bytes. It is recommended that the number of sessions traced be limited to a reasonable value to prevent something unexpected from happening. A recommended amount would be 5-10 sessions.
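The memory arithmetic behind the last two prompts is simple multiplication, shown here with the figures from this example (50,000 bytes per buffer, 1000 concurrent sessions, a limit of 10). The helper name is hypothetical.

```python
def trace_memory_bytes(buffer_bytes, sessions_traced):
    """Total memory consumed by per-session trace buffers."""
    return buffer_bytes * sessions_traced


unlimited = trace_memory_bytes(50_000, 1000)  # every concurrent session traced
limited = trace_memory_bytes(50_000, 10)      # capped at 10 sessions
print(unlimited, limited)  # → 50000000 500000
```

Without a limit the buffers could grow to 50,000,000 bytes in the worst case; capping at 10 sessions bounds them at 500,000 bytes.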
Step 3. Monitor/Control session tracing through LTOM
Please Note: It can take up to 3 times the polling frequency for the following values to be displayed back to the user properly. This is
due to the polling frequency which basically causes the program to sleep between sampling and also because we are collecting multiple samples and performing computations.
Select option 71 to display current sessions being traced.
Select option 72 to manually force all sessions traced in memory to their respective trace files in the udump/bdump.
Select option 73 to manually force a particular session's trace in memory to its respective trace file in the udump/bdump.
Select option 74 to stop a specific session from being traced. It is important to note if you disable a session from being traced that session can no longer have tracing enabled and to re-enable
that session tracing you would need to stop all session tracing with option 8 and then restart the session recorder with option 7.
Step 4. Stop the Session Recorder
Select option 8 to stop the session recorder. This option disables any tracing turned on by LTOM.
Step 5. Review the 10046 trace
Each session that was traced through LTOM has produced a trace file in the udump/bdump.
Step 6. Send 10046 trace files to support

Appendix D: Sample System Profiler File

LTOM Version=4.1.2

HOSTNAME=coehq2

HOSTOS=SunOS

DB_VERSION=9.2.0.1.0

CPU_COUNT=2

PHYSICAL_MEMORY=1024000000

######################################################################

# Copyright (c) 2008 by Oracle Corporation

# LTOM REPORT V4.1.1

#

# This report is generated by running the System Profiler option of

# LTOM. As this report is configurable at runtime some sections of this

# report may be missing if the option was not selected by the user.

# This report looks best if viewed in 132 column mode.

# The following sections repeat for each snapshot interval N...

#

######################################################################

---------------SNAPSHOT# N

system timestamp

---------------VMSTAT:---

current vmstat snapshot from unix vmstat utility

---------------OS TOP CPU PROCESSES:---

current top os processes from unix top utility

---------------ORACLE SESSIONS:---

current oracle session and process information

SID           V$session.sid

PID           v$process.pid

SPID          v$process.spid

%CPU          %cpu from os

TCPU          total cpu in seconds from os

MCPU          v$sesstat.CPU used by this session

             (in 10s of milliseconds. value is the delta value between snapshot)

PROGRAM       v$process.program

USERNAME      v$session.username

EVENT         v$session_wait.event

SEQ           v$session_wait.seq#

SECS          v$session_wait.seconds_in_wait

WAIT_TIME     v$session_wait.wait_time

P1            v$session_wait.p1

P2            v$session_wait.p2

P3            v$session_wait.p3

P1RAW         v$session_wait.p1raw

P2RAW         v$session_wait.p2raw

P3RAW         v$session_wait.p3raw

HASH_VALUE    v$session.sql_hash_value

SQL_ADDRESS   v$session.sql_address

ET            v$session.last_call_et

LOGICAL_READS v$sesstat.session logical reads

USER_COMMITS v$sesstat.user commits

PGA           v$sesstat.session pga memory

CALLS         v$sesstat.session user calls

RSIZE         memory resident size from os

VSIZE         memory virtual size from os

PGA_ALLOC_MEM v$process.pga_alloc_mem

DB_TIME_TOTAL v$sesstat.db_time (V10 only)

CPU_TOTAL     v$sesstat.CPU used by this session

DB_TIME       v$sesstat.db_time (V10 only)

             (in 10s of milliseconds. value is the delta value between snapshot)

MODULE        v$session.module

---------------CURRENT SQL EXECUTING:---

SID v$session.sid

HASH VALUE v$session.sql_hash_value

SQL_ADDRESS v$session.sql_address

LAST_CALL v$session.last_call_et

SQL_TEXT v$sqltext.sql_text

---------------SYSTEM STATISTICS:---

Values are delta values calculated between snapshots from

v$sysstat. Only non zero values are reported.

---------------AVERAGE SYSTEM WAITS IN HUNDREDTHS OF SECONDS:---

Values are delta values calculated between snapshots from

v$system_event. Only non zero values are reported.

---------------SYSTEM WAITS:---

Values are delta values calculated between snapshots from

v$system_event. Only non zero values are reported.

---------------SQL EXECUTED DURING THIS REPORT DURING SNAPSHOT:---

HASH VALUE v$session.sql_hash_value

SQL_ADDRESS v$session.sql_address

SQL_TEXT v$sqltext.sql_text

######################################################################

# REPORT BEGINS BELOW THIS LINE

######################################################################

---------------SNAPSHOT# 1

Tue Sept 25 16:00:00 EDT 2007

---------------VMSTAT:---

r b w   swap   free  re   mf pi po fr de sr dd dd f0 s0  in   sy   cs us sy id wa zy
1 0 0 665112 219424 201 1684  0  0  0  0  0  0  0  0  3 324 6229 1614 12 26 61 50 zz

---------------OS TOP CPU PROCESSES:---

load averages: 1.72, 2.04, 2.07 13:05:02

184 processes: 183 sleeping, 1 on cpu

Memory: 2048M real, 211M free, 1477M swap in use, 645M swap free

PID USERNAME THR PRI NICE  SIZE   RES STATE  TIME   CPU COMMAND
26184 cedavis   16  18   10   51M   21M sleep  0:06 7.73% java
22050 cedavis   22  49    0  336M  264M sleep 92:11 3.62% java
26815 oracle     1  59    0    0K    0K sleep  0:00 1.73% oracle
524 root       1  59    0   45M   75M sleep 19:56 0.94% Xsun
25549 oracle     1  48    0    0K    0K sleep  0:00 0.67% oracle
26816 cedavis    1   0   10 1632K 1072K   cpu  0:00 0.44% top
13212 cedavis   17  18   10   95M   50M sleep 17:56 0.28% java
409 root       1  12    0 1128K  824K sleep 43:38 0.25% init.cssd

---------------ORACLE SESSIONS:---

SID PID SPID %CPU TCPU MCPU PROGRAM USERNAME EVENT SEQ SECS WAIT_TIME P1 P2 P3 HASH VALUE SQL_ADDRESS ET

124 37 13517 0.0 0:00 0 O000 null class slave wait 1 21846 0 0 00 0 00 0 00 0 00 21846

126 35 26815 1.8 0:00 0 UNKNOWN TOM SQL*Net message from client 32 0 2 1952673792 0000000074637000 1 0000000000000001 0 00 3796581998 000000039ACC0F20 0

129 29 11308 0.0 0:07 0 TNS SYS SQL*Net message from client 23 609345 0 1650815232 0000000062657100 1 0000000000000001 0 00 0 00 609345

130 33 18609 0.0 0:05 0 TNS SYS SQL*Net message from client 11837 3 0 1650815232 0000000062657100 1 0000000000000001 0 00 3364942409 00000003975D6AB8 3

131 32 18607 0.0 3:54 0 TNS SYS Streams AQ: waiting for messages in the queue 26216 7 0 9732 0000000000002604 15643062048 00000003A4662F20 10 000000000000000A 2346103937 000000039775BEB8 7

133 31 18652 0.0 0:00 0 TNS SYS SQL*Net message from client 16 851210 0 1650815232 0000000062657100 1 0000000000000001 0 00 0 00 851210

134 30 18567 0.0 1:09 0 TNS SYS SQL*Net message from client 32547 340 0 1650815232 0000000062657100 1 0000000000000001 0 00 0 00 340

135 28 22029 0.0 0:01 0 TNS CARL SQL*Net message from client 94 772200 0 1650815232 0000000062657100 1 0000000000000001 0 00 0 00 772200

137 26 21352 0.0 0:07 0 TNS SYS SQL*Net message from client 20 607909 0 1650815232 0000000062657100 1 0000000000000001 0 00 0 00 607909

138 34 11767 0.0 0:02 0 TNS SYS SQL*Net message from client 2033 1223 0 1650815232 0000000062657100 1 0000000000000001 0 00 0 00 1223

140 27 18519 0.0 0:00 0 q001 null Streams AQ: waiting for time management or cleanup tasks 1 851228 0 0 00 0 00 0 00 3393152264 00000003A4122A90 851228

144 25 18388 0.0 0:03 0 QMNC null Streams AQ: qmn coordinator idle wait 54536 16 0 0 00 0 00 0 00 0 00 851240

147 24 24006 0.0 0:12 0 TNS SYS enq: TM - contention 53 609402 0 1414332422 00000000544D0006 51578 000000000000C97A 0 00 3630001660 00000003975CD560 609402

148 23 25549 0.6 0:01 0 UNKNOWN TOM SQL*Net message from client 180 0 0 1952673792 0000000074637000 1 0000000000000001 0 00 0 00 0

149 22 18066 0.0 0:03 0 RBAL null rdbms ipc message 11 502140 0 300 000000000000012C 0 00 0 00 0 00 851264

150 21 18056 0.0 0:45 0 ASMB null ASM background timer 3 851261 0 0 00 0 00 0 00 0 00 851264

151 20 18513 0.0 0:01 0 q000 null Streams AQ: qmn slave idle wait 1 851228 0 0 00 0 00 0 00 0 00 851228

154 19 18043 0.0 2:20 0 LCK0 null rdbms ipc message 16517 3 0 300 000000000000012C 0 00 0 00 0 00 851267

155 16 18008 0.1 10:06 0 MMNL null rdbms ipc message 21 332994 0 100 0000000000000064 0 00 0 00 0 00 851275

156 15 18006 0.0 3:38 0 MMON null rdbms ipc message 46461 19 0 300 000000000000012C 0 00 0 00 3393152264 00000003A4122A90 851275

157 14 17998 0.0 5:19 0 CJQ0 null rdbms ipc message 25290 0 0 175 00000000000000AF 0 00 0 00 0 00 851275

158 13 17996 0.0 0:00 0 RECO null rdbms ipc message 15 66401 0 180000 000000000002BF20 0 00 0 00 0 00 851275

159 12 17993 0.0 5:00 0 SMON null smon timer 4934 27848 0 300 000000000000012C 0 00 0 00 0 00 851275

161 11 17987 0.1 7:24 0 CKPT null rdbms ipc message 43428 3 0 300 000000000000012C 0 00 0 00 0 00 851275

162 10 17985 0.0 0:33 0 LGWR null rdbms ipc message 62412 34 0 300 000000000000012C 0 00 0 00 0 00 851275

163 9 17980 0.0 2:01 0 DBW0 null rdbms ipc message 17491 34 0 300 000000000000012C 0 00 0 00 0 00 851275

164 8 17977 0.0 0:06 0 MMAN null rdbms ipc message 16 831704 0 300 000000000000012C 0 00 0 00 0 00 851275

165 7 17969 0.0 0:45 0 LMS0 null gcs remote message 6 851267 0 24 0000000000000018 0 00 0 00 0 00 851275

166 6 17964 0.0 0:43 0 LMD0 null ges remote message 4 851267 0 64 0000000000000040 0 00 0 00 0 00 851275

167 5 17961 0.0 1:49 0 LMON null rdbms ipc message 2771 0 0 10 000000000000000A 0 00 0 00 0 00 851275

168 4 17953 0.0 0:08 0 PSP0 null rdbms ipc message 1992 145 0 300 000000000000012C 0 00 0 00 0 00 851275

169 3 17951 0.0 0:23 0 DIAG null DIAG idle wait 1 851275 0 1 0000000000000001 1 0000000000000001 200 00000000000000C8 0 00 851275

170 2 17948 0.0 2:06 0 PMON null pmon timer 7 851246 0 300 000000000000012C 0 00 0 00 0 00 851275

Session Wait query's elapsed time was= 226 msec

---------------SYSTEM STATISTICS:---

CPU used by this session= + 19

CPU used when call started= + 18

DB time= + 1019

DBWR checkpoint buffers written= + 242

DBWR transaction table writes= + 8

DBWR undo block writes= + 50

SQL*Net roundtrips to/from client= + 51

application wait time= + 1364

background timeouts= + 64

buffer is not pinned count= + 28

buffer is pinned count= + 1

bytes received via SQL*Net from client= + 11045

bytes sent via SQL*Net to client= + 17881

calls to get snapshot scn: kcmgss= + 38

cluster key scan block gets= + 8

cluster key scans= + 8

consistent gets= + 71

consistent gets - examination= + 33

consistent gets from cache= + 71

enqueue conversions= + 11

enqueue releases= + 437

enqueue requests= + 437

execute count= + 29

gc CPU used by this session= + 1

global enqueue CPU used by this session= + 1

global enqueue gets sync= + 31

global enqueue releases= + 31

index fetch by key= + 12

index scans kdiixs1= + 13

messages received= + 177

messages sent= + 177

no work - consistent read gets= + 19

opened cursors cumulative= + 18

opened cursors current= + 1

parse count (hard)= + 1

parse count (total)= + 20

parse time cpu= + 5

parse time elapsed= + 5

physical read total IO requests= + 13

physical read total bytes= + 212992

physical write IO requests= + 176

physical write bytes= + 1982464

physical write total IO requests= + 185

physical write total bytes= + 2079232

physical write total multi block requests= + 28

physical writes= + 242

physical writes from cache= + 242

physical writes non checkpoint= + 33

recursive calls= + 272

recursive cpu usage= + 5

redo blocks written= + 29

redo entries= + 176

redo size= + 13200

redo synch writes= + 14

redo wastage= + 1104

redo write time= + 1

redo writes= + 4

rows fetched via callback= + 3

session cursor cache count= + 3

session cursor cache hits= + 8

session logical reads= + 71

session pga memory max= + 131072

session uga memory= + 65408

session uga memory max= + 65408

shared hash latch upgrades - no wait= + 13

sorts (memory)= + 22

sorts (rows)= + 376

table fetch by rowid= + 10

table scan blocks gotten= + 3

table scan rows gotten= + 3

table scans (short tables)= + 3

user calls= + 51

user rollbacks= + 2

workarea executions - optimal= + 14

System Statistics query's elapsed time was= 16 msec

---------------AVERAGE SYSTEM WAITS IN HUNDREDTHS OF SECONDS:---

ASM background timer Average Wait = 489.0

DIAG idle wait Average Wait = 20.0

KJC: Wait for msg sends to complete Average Wait = 20.0

PX Deq: Execute Reply Average Wait = 8.0

PX Deq: Execution Msg Average Wait = 170.0

PX Idle Wait Average Wait = 244.0

SQL*Net message from client Average Wait = 69.0

Streams AQ: qmn coordinator idle wait Average Wait = 1392.0

Streams AQ: qmn slave idle wait Average Wait = 2786.0

Streams AQ: waiting for messages in the queue Average Wait = 977.0

class slave wait Average Wait = 11718.0

control file parallel write Average Wait = 2.0

control file sequential read Average Wait = 1.0

direct path read temp Average Wait = 5.0

direct path write temp Average Wait = 5.0

dispatcher timer Average Wait = 5860.0

gcs remote message Average Wait = 3.0

ges remote message Average Wait = 7.0

lms flush message acks Average Wait = 9.0

pmon timer Average Wait = 259.0

rdbms ipc message Average Wait = 73.0

reliable message Average Wait = 5.0

smon timer Average Wait = 4379.0

virtual circuit status Average Wait = 2931.0

Average System Waits query's elapsed time was= 58 msec

---------------SYSTEM WAITS:---

ASM background timer= + 2

CGS wait for IPC msg= + 125

DIAG idle wait= + 64

SQL*Net break/reset to client= + 2

SQL*Net message from client= + 48

SQL*Net message to client= + 48

SQL*Net more data from client= + 3

Streams AQ: RAC qmn coordinator idle wait= + 2

Streams AQ: qmn coordinator idle wait= + 2

Streams AQ: qmn slave idle wait= + 1

Streams AQ: waiting for messages in the queue= + 1

control file parallel write= + 5

control file sequential read= + 13

db file parallel write= + 176

enq: TM - contention= + 27

gcs remote message= + 360

ges remote message= + 151

ksxr poll remote instances= + 12

log file parallel write= + 4

pmon timer= + 5

rdbms ipc message= + 191

rdbms ipc reply= + 1

reliable message= + 2

virtual circuit status= + 1

System Waits query's elapsed time was= 23 msec

---------------SQL EXECUTED DURING THIS REPORT DETECTED DURING SNAPSHOTS:---

HASH VALUE  SQL_ADDRESS              SQL_TEXT
3630001660  00000003975CD560  lock table carl.junk in exclusive mode
3364942409  00000003975D6AB8      DECLARE      reason_id    dbms_server_alert.REASON_ID_T := N
3364942409  00000003975D6AB8  ULL;      resource_id  NUMBER;      db_name      recent_resource
3364942409  00000003975D6AB8  _incarnations$.db_unique_name%TYPE :=                   :db_uniq
3364942409  00000003975D6AB8  ue_name;      inst_name    recent_resource_incarnations$.instanc
3364942409  00000003975D6AB8  e_name%TYPE :=                   :instance_name;      event_id
3364942409  00000003975D6AB8     NUMBER := :event_id;      event_time   TIMESTAMP WITH TIME ZO
3364942409  00000003975D6AB8  NE      :=                   TO_TIMESTAMP_TZ(:event_time,
3364942409  00000003975D6AB8                              'YYYY-MM-DD HH24:MI:SS.FF TZH:TZM',
3364942409  00000003975D6AB8                                    'NLS_CALENDAR=''Gregorian''');
3364942409  00000003975D6AB8      BEGIN      CASE :reason_name        WHEN 'DATABASE_UP' THEN
3364942409  00000003975D6AB8           reason_id := dbms_server_alert.RSN_FAN_DATABASE_UP;
3364942409  00000003975D6AB8      WHEN 'DATABASE_DOWN' THEN          reason_id := dbms_server_
3364942409  00000003975D6AB8  alert.RSN_FAN_DATABASE_DOWN;        WHEN 'INSTANCE_UP'  THEN
3364942409  00000003975D6AB8        reason_id := dbms_server_alert.RSN_FAN_INSTANCE_UP;
3364942409  00000003975D6AB8    WHEN 'INSTANCE_DOWN' THEN          reason_id := dbms_server_al
3364942409  00000003975D6AB8  ert.RSN_FAN_INSTANCE_DOWN;        WHEN 'SERVICE_UP' THEN
3364942409  00000003975D6AB8    reason_id := dbms_server_alert.RSN_FAN_SERVICE_UP;        WHEN
3364942409  00000003975D6AB8   'SERVICE_DOWN' THEN          reason_id := dbms_server_alert.RSN
3364942409  00000003975D6AB8  _FAN_SERVICE_DOWN;        WHEN 'SERVICE_MEMBER_UP' THEN
3364942409  00000003975D6AB8   reason_id := dbms_server_alert.RSN_FAN_SERVICE_MEMBER_UP;
3364942409  00000003975D6AB8    WHEN 'SERVICE_MEMBER_DOWN' THEN          reason_id := dbms_ser
3364942409  00000003975D6AB8  ver_alert.RSN_FAN_SERVICE_MEMBER_DOWN;        WHEN 'SVC_PRECONNE
3364942409  00000003975D6AB8  CT_UP' THEN          reason_id := dbms_server_alert.RSN_FAN_SVC_
3364942409  00000003975D6AB8  PRECONNECT_UP;        WHEN 'SVC_PRECONNECT_DOWN' THEN          r
3364942409  00000003975D6AB8  eason_id := dbms_server_alert.RSN_FAN_SVC_PRECONNECT_DOWN;
3364942409  00000003975D6AB8    WHEN 'NODE_DOWN' THEN          reason_id := dbms_server_alert.
3364942409  00000003975D6AB8  RSN_FAN_NODE_DOWN;        WHEN 'ASM_INSTANCE_UP'  THEN
3364942409  00000003975D6AB8  reason_id := dbms_server_alert.RSN_FAN_ASM_INSTANCE_UP;
3364942409  00000003975D6AB8  WHEN 'ASM_INSTANCE_DOWN' THEN          reason_id := dbms_server_
3364942409  00000003975D6AB8  alert.RSN_FAN_ASM_INSTANCE_DOWN;      END CASE;      IF :use_res
3364942409  00000003975D6AB8  ource_id = 'Y' THEN        BEGIN          SELECT resource_id
3364942409  00000003975D6AB8          INTO resource_id            FROM recent_resource_incarna
3364942409  00000003975D6AB8  tions$           WHERE resource_type = 'INSTANCE'             AN
3364942409  00000003975D6AB8  D db_unique_name = db_name             AND db_domain=NVL(SYS_CON
3364942409  00000003975D6AB8  TEXT('USERENV','DB_DOMAIN'),'==N/A==')             AND instance_
3364942409  00000003975D6AB8  name = inst_name             AND startup_time = (SELECT MAX(star
3364942409  00000003975D6AB8  tup_time)                                   FROM recent_resource
3364942409  00000003975D6AB8  _incarnations$                                  WHERE resource_t
3364942409  00000003975D6AB8  ype = 'INSTANCE'                                    AND db_uniqu
3364942409  00000003975D6AB8  e_name = db_name                                    AND db_domai
3364942409  00000003975D6AB8  n =                                        NVL(SYS_CONTEXT('USER
3364942409  00000003975D6AB8  ENV',                                                         'D
3364942409  00000003975D6AB8  B_DOMAIN'),                                            '==N/A=='
3364942409  00000003975D6AB8  )                                    AND instance_name = inst_na
3364942409  00000003975D6AB8  me                                    AND from_tz(startup_time,
3364942409  00000003975D6AB8  '+00:00') <                                        event_time);
3364942409  00000003975D6AB8         EXCEPTION          WHEN NO_DATA_FOUND THEN RETURN;
3364942409  00000003975D6AB8     WHEN OTHERS THEN RAISE;        END;        event_id := 214748
3364942409  00000003975D6AB8  3648 + BITAND(event_id * 128, 2147483648-1)
3364942409  00000003975D6AB8            + resource_id;      END IF;      dbms_ha_alerts_prvt.p
3364942409  00000003975D6AB8  ost_ha_alert(        reason_id            => reason_id,        s
3364942409  00000003975D6AB8  ame_transaction     => FALSE,        clear_old_alert      => FAL
3364942409  00000003975D6AB8  SE,        database_unique_name => db_name,        instance_name
3364942409  00000003975D6AB8          => inst_name,        service_name         => :service_na
3364942409  00000003975D6AB8  me,        host_name            => :host_name,        incarnatio
3364942409  00000003975D6AB8  n          => :incarnation,        event_reason         => :even
3364942409  00000003975D6AB8  t_reason,        event_time           => event_time,        card
3364942409  00000003975D6AB8  inality          => :cardinality,        event_id             =>
3364942409  00000003975D6AB8   event_id,        timeout_seconds      => :alert_timeout_seconds
3364942409  00000003975D6AB8  ,        immediate_timeout    => :immed_timeout = 'Y',        du
3364942409  00000003975D6AB8  plicates_ok        => TRUE);    END;
3796581998  000000039ACC0F20  select s.sid, s.type, pid, spid, p.program, s.username, sw.event
3796581998  000000039ACC0F20  , sw.seq# seq,             sw.seconds_in_wait, sw.wait_time, sw.
3796581998  000000039ACC0F20  p1, sw.p2, sw.p3, sw.p1raw, sw.p2raw, sw.p3raw, ss.value,
3796581998  000000039ACC0F20       s.sql_hash_value, s.sql_address, s.last_call_et from
3796581998  000000039ACC0F20       v$session s, v$session_wait sw, v$process p, v$sesstat ss w
3796581998  000000039ACC0F20  here            (p.addr = s.paddr) and (s.sid = ss.sid)
3796581998  000000039ACC0F20     and (s.sid = sw.sid) and (ss.statistic# = 12)             ord
3796581998  000000039ACC0F20  er by s.sid
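
The trace file above is plain text, so it can be post-processed with a short script. The following Python sketch is not part of LTOM; it is a minimal illustration that assumes the line layouts shown in this sample: delta lines of the form "name= + value" in the SYSTEM STATISTICS and SYSTEM WAITS sections, and SQL lines of the form "HASH VALUE, two spaces, SQL_ADDRESS, two spaces, text piece". The function names parse_deltas and reassemble_sql are hypothetical. Because trailing blanks in a text piece may be lost when the file is copied, the reassembled statement is an approximation of the original SQL text.

```python
import re

# "name= + value" lines, as printed in the SYSTEM STATISTICS and
# SYSTEM WAITS sections of the sample (layout assumed from the sample).
DELTA_RE = re.compile(r"^(?P<name>.+?)= \+ (?P<value>\d+)$")

# "HASH VALUE  SQL_ADDRESS  piece" lines from the SQL section. The address is
# assumed to be 16 hex digits, followed by exactly two separator spaces.
SQL_RE = re.compile(
    r"^(?P<hash>\d+) +(?P<addr>[0-9A-F]{16})(?: {2}(?P<piece>.*))?$"
)


def parse_deltas(lines):
    """Return {statistic name: delta} for every 'name= + value' line.

    Pass the lines of a single section of a single snapshot; names repeated
    across snapshots would otherwise overwrite each other.
    """
    deltas = {}
    for line in lines:
        match = DELTA_RE.match(line.strip())
        if match:
            deltas[match.group("name")] = int(match.group("value"))
    return deltas


def reassemble_sql(lines):
    """Concatenate the text pieces per (hash value, address), in file order."""
    statements = {}
    for line in lines:
        match = SQL_RE.match(line.rstrip("\n"))
        if match:
            key = (int(match.group("hash")), match.group("addr"))
            piece = match.group("piece") or ""
            statements[key] = statements.get(key, "") + piece
    return statements


# A few lines copied from the sample trace above.
sample = [
    "redo size= + 13200",
    "user calls= + 51",
    "System Statistics query's elapsed time was= 16 msec",
    "3630001660  00000003975CD560  lock table carl.junk in exclusive mode",
    "3796581998  000000039ACC0F20       v$session s, v$session_wait sw, v$process p, v$sesstat ss w",
    "3796581998  000000039ACC0F20  here            (p.addr = s.paddr) and (s.sid = ss.sid)",
]

deltas = parse_deltas(sample)
sql = reassemble_sql(sample)
for name, value in deltas.items():
    print(name, value)
for (hash_value, address), text in sql.items():
    print(hash_value, address, " ".join(text.split()))
```

Lines that do not match either layout, such as the "elapsed time" footers, are skipped. Keying the statements on both hash value and address mirrors the two identifying columns printed in the report.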
