Enabling User-Controlled Collection of Application Crash Data With DTrace

By Greg Nakhimovsky and Morgan Herrington, May 2005

Abstract: This article introduces AppCrash, a tool for the automatic collection of diagnostic and debugging information when an application crashes under the Solaris Operating System.
Contents:
Introduction

Previous Solutions

AppCrash: A DTrace-Based Solution

Non-root Running of DTrace and Related Security Issues

Implementation Details

Example

Solutions on Other Systems

Conclusion

References

Introduction

AppCrash is a tool for the automatic collection of diagnostic and debugging information when any application crashes under a Solaris system. The tool does not require any changes to the applications or to the operating system and is based on DTrace (Dynamic Tracing), a new facility introduced in the Solaris 10 OS.
AppCrash can help significantly reduce the cost of software defects by shortening the time needed to gather the data required by an application's customer support, quality assurance (QA), and development staff, as well as by Sun technical support engineers. This tool can be especially useful for tracking down the specific details of sporadic, hard-to-reproduce application failures.
AppCrash enables the creation of postprocessing tools that can sort the failure reports and statistically analyze where the application software crashes most frequently. It also enables tools that can provide advice on how each type of problem can be debugged efficiently.
An important feature that distinguishes the AppCrash method from others is that the users (including the application developers and/or system administrators at user sites) can control precisely what data is collected and how it is processed. This allows collection of the necessary data without violating any of the users' security or privacy requirements.
Note that by application we mean any software other than the Solaris kernel. This includes middleware such as Mozilla, StarOffice software, various components of GNOME, OpenGL graphics, application servers, and others.
The article is intended for software engineers working at independent software vendors (ISVs), as well as for system administrators and end users working with Solaris applications.
The Problem

As an example, consider a large facility with many users running several multi-process and/or multi-tier applications. What steps are taken if one of the application processes crashes?
In the best of situations, the end user will recognize the specific inputs and conditions needed to reproduce the problem, and the application development team will be able to quickly reproduce and fix the problem.
Unfortunately, in some cases the user will only be able to correctly specify some of the input and environment details, will incorrectly describe others, and won't be able to specify some of them at all. First, the software developers will need to discover the basic details needed to start the debugging process. These details may include the traceback (also known as backtrace or stack trace) of the failed application, memory map, environment variables active in the process, the OS version, patches installed, available swap space, and more. Manually gathering this information for every failure is costly, time consuming and error prone, and may be extremely difficult for sporadic cases.
Automatic and semiautomatic solutions exist for some systems (see the section Solutions on Other Systems below), but they may create security and privacy problems because the user has little or no control of the information gathered and forwarded to the application and system vendors. For example, see reference [1].
Previous Solutions

Core Files: Historically, failure analysis on UNIX systems has been based on collecting and inspecting the core file created when an application crashes. (Note: Application "core dump", "coredump", "core file", and "corefile" all mean the same thing.) However, the ways applications are created and used have changed dramatically since UNIX was created in the early 1970s. Typical application size and memory requirements are orders of magnitude larger than they were then. The users are not the same people as the application programmers; many end users do not know what a debugger is or how to use one. Due to the ubiquity of the Internet, security and privacy are much more serious concerns now than they were then.
Currently, the Core File method suffers from several types of problems:
Security and privacy concerns, which are obviously very important, now more than ever. Core files often contain data that must not be sent outside of the user site (for details, see references [1], [2], [3]), such as:

Classified data

Proprietary data

Private (such as financial) data

Size problems. Core files can be huge (multi-gigabyte range), leading to:

Long time to dump, negatively affecting system performance (core files are dumped at kernel priority, making everything else stop)

Possibly overfilling disk with bad to disastrous (if on root partition) consequences

Size too large to send to vendor (cost, time, inconvenience)

Technical difficulties

An alternative to sending the core file to the vendor is for the end user to perform the analysis in place. However, this places an extra burden (time and effort) on the end user, and slows the debugging process down. Also, some users simply can't or don't want to perform such an analysis.

Local analysis may require a debugger and the ability to use it (cost, time, inconvenience).

An application-level debugger (such as dbx) used with a core file has severely limited functionality compared to debugging a running program.

Since application binaries at user sites are most often optimized, non-debuggable, and stripped, dbx functionality with those core files is even more limited. As a result, in most cases the ISVs cannot get much more from an application core dump than a traceback, a memory map, and so on, all of which can be obtained without a core dump.

Most application developers can't and don't want to do hardware-specific assembly-level debugging of such core files. Therefore, even when a core file contains additional debugging information, the ISVs can't use it in practice in most cases.

Many special problems are associated with core files produced on one (end-user) machine and then copied to another (ISV) machine for debugging. For additional information, issue a help core mismatch command in dbx; also see reference [4]. Due to these problems, the application core dumps often can't be used on a different computer at all, even if the users deliver them to the ISV. These problems are gradually being alleviated, but most of them are still present.

No core file available (largely as a result of all of the above)

Many applications inhibit core dumps altogether (by installing a signal handler or by calling setrlimit(RLIMIT_CORE, &zero)), so no core file is available for analysis. Also, some users and user sites either disable generation of core files completely (with user limit set to zero or with the coreadm(1) command) or have their user limit set so restrictively that an entire core can't be saved.
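As an illustration of the user-limit case, the following minimal sketch reports whether the shell's current core-file size limit would permit a core dump at all. It assumes a shell whose ulimit built-in accepts the -c option; the function name core_limit_status is ours, and the check does not cover settings made with coreadm(1) or limits an application sets for itself with setrlimit().

```shell
#!/bin/sh
# Report whether the current core-file size limit permits a core dump.
# Prints "disabled", "unlimited", "limited:<value>" (units are shell-specific),
# or "unknown" if this shell's ulimit does not support -c.
core_limit_status() {
    cl_limit=$(ulimit -c 2>/dev/null) || { echo "unknown"; return 0; }
    case "$cl_limit" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited:$cl_limit" ;;
    esac
}
```

Calling core_limit_status in the shell that launches an application shows the limit that the application will inherit.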

Interpose Libraries: Another technical approach to dealing with application crashes is to implement an interpose library that would install its own signal handlers for SIGBUS and SIGSEGV (which would then capture state information for an error report). For example, see reference [5]. However, this is difficult to implement in production because:
The library must not interfere with the normal signal handling operations of the application.

Each application invocation must have its environment changed in order to interpose this library. This could either be done by having every user change their environment or by adding a wrapper script around the invocation of every application to be watched. Neither solution is convenient for a large number of users or a large number of applications.

Creating library interposers requires a compiler and a certain level of system programming skills. Many users need an easier solution.

Generally, library interposition is a debugging and performance-tuning technique not recommended for production use. We need a method that can be used all the time in production.

Application Signal Handlers: Of course, it is also possible to install signal handlers for SIGBUS and SIGSEGV in each application and do whatever is deemed necessary there. However, this approach requires the ISVs to change every application. Even if the ISVs do it, their handling of the crashes won't be uniform. Also note that programming signal handlers can be tricky and platform-dependent. For example, a signal handler should never call any routine that is not Async-Signal-Safe. (For the definition of Async-Signal-Safe, see the attributes(5) man page for the Solaris OS.)
Some ISVs have handled these signals for years and done in their signal handlers whatever works for them. However, this approach does not work for most applications. We need a way to handle application crashes system-wide, without changing any application in any way.
truss(1): One more interesting solution is to use truss to monitor an application, watching for a signal. Aside from its more common role of tracing system and library calls, truss has the ability to silently watch a process and wait for a specific set of signals.
For example, the following sequence invokes an application in the background, with truss then watching for SIGSEGV and SIGBUS. If either signal happens, truss leaves the application in the STOPPED state. If the kill command detects that the process still exists, then the process memory map and traceback are captured using the /proc utilities pmap(1) and pstack(1). Finally, the process is allowed to exit by being restarted with the prun(1) command.

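application_invocation &
pid=$!
# Trace nothing; just stop the process if SIGSEGV or SIGBUS arrives.
truss -t \!all -m \!all -s \!all -S segv,bus -p $pid
# If the process still exists, it was stopped by truss rather than exiting.
if kill -0 $pid ; then
    pmap   $pid    # memory map
    pstack $pid    # traceback
    prun   $pid    # let the process continue and exit
fi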

The main problem with the truss(1) solution is that it is limited to handling a single process per invocation (whereas many applications create hierarchies of processes). Also, like the library interposition technique, this requires each application to have an invocation script which must be edited to add this extra processing.
AppCrash: A DTrace-Based Solution

The solution we are proposing in this article uses the new Solaris facility, DTrace, to watch for application crashes and to process each with a user-supplied reporting script.
DTrace is a powerful facility introduced in the Solaris 10 OS for kernel and application performance tuning and debugging (see reference [6] and the dtrace(1M) man page for the Solaris OS). It works system-wide at the Solaris kernel level by dynamically instrumenting both the kernel and the application code. The dynamically inserted probes impose little or no overhead when disabled and can be used safely on production systems.
Our implementation of this solution consists of the DTrace script app_crash.d. (Note: Please save the file without a .txt suffix.)
This is combined with a user-defined shell script such as the template runme_on_app_crash. (Note: Please save the file without a .txt suffix.)

While individual users can use AppCrash to monitor their own applications, it can also be used by a system administrator to watch applications being run by all users. To accomplish this, the administrator could start it at boot time as a daemon using the historical RC script mechanism (for example by creating /etc/rc2.d/S97app_crash) or using the new Solaris Service Management Facility (SMF); see reference [7]. Note: To produce a well-behaved Solaris daemon, starting app_crash.d with nohup(1) would be a good idea. Also, a Perl script could be used for creating a daemon as described in reference [8].
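The RC script approach could look roughly like the sketch below. This is not the authors' script: the function name app_crash_rc, the installation path in APP_CRASH_D, and the PID file location in APP_CRASH_PIDFILE are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical body for /etc/rc2.d/S97app_crash: start or stop app_crash.d.
# APP_CRASH_D and APP_CRASH_PIDFILE are assumed locations; adjust for your site.
APP_CRASH_D=${APP_CRASH_D:-/opt/appcrash/app_crash.d}
APP_CRASH_PIDFILE=${APP_CRASH_PIDFILE:-/var/run/app_crash.pid}

app_crash_rc() {
    case "$1" in
    start)
        [ -x "$APP_CRASH_D" ] || { echo "cannot execute $APP_CRASH_D" >&2; return 1; }
        # nohup keeps the daemon alive after the starting shell goes away.
        nohup "$APP_CRASH_D" >/dev/null 2>&1 &
        echo $! > "$APP_CRASH_PIDFILE"
        ;;
    stop)
        [ -f "$APP_CRASH_PIDFILE" ] || return 0
        # Ignore the error if the daemon has already exited.
        kill "$(cat "$APP_CRASH_PIDFILE")" 2>/dev/null || :
        rm -f "$APP_CRASH_PIDFILE"
        ;;
    *)
        echo "Usage: app_crash_rc { start | stop }" >&2
        return 1
        ;;
    esac
}
```

Installed as an RC script, the file would end with a call such as app_crash_rc "$1", so that the boot sequence can pass start or stop.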
Once the app_crash.d daemon is running, DTrace will react to any application generating a SIGSEGV or SIGBUS signal. If the process has the environment variable $ON_APP_CRASH_INVOKE defined as a path to a user-controlled shell script (such as runme_on_app_crash), then the DTrace script will do the following:
Stop the process when the SIGSEGV or SIGBUS signal is generated.

Run the user-defined shell script (such as runme_on_app_crash) to collect all the necessary debugging data.

Resume normal processing. In particular, if the settings are to produce a core dump, then that is what will happen.

The described design using the environment variable $ON_APP_CRASH_INVOKE to specify a user-controlled shell script gives the users (either end users or the ISVs) complete control over the actions to be taken. The usage could be tailored to be application-specific (by setting it in the application wrapper script) or user-specific (by setting it in the user's personal startup script). Of course, it also lets the users or their system administrators do anything they want in the invoked script to fully control the debugging information collected by that script. The runme_on_app_crash provided above is only a template.
For example, a particular ISV may wish to install the DTrace script app_crash.d to be run as a daemon, and then set the $ON_APP_CRASH_INVOKE environment variable in the startup script of its application. This way, AppCrash will only be used for that application and nothing else.
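An application startup script of this kind might be sketched as follows. The function name launch_with_appcrash and the default path in APPCRASH_SCRIPT are hypothetical; only the ON_APP_CRASH_INVOKE variable itself comes from the AppCrash design.

```shell
#!/bin/sh
# Hypothetical application wrapper: enable AppCrash for this application only.
# The default path below is illustrative, not part of the original article.
APPCRASH_SCRIPT=${APPCRASH_SCRIPT:-/opt/myapp/support/runme_on_app_crash}

launch_with_appcrash() {
    # Set the variable only in the environment of the launched program,
    # so other programs started from the same shell are unaffected.
    env ON_APP_CRASH_INVOKE="$APPCRASH_SCRIPT" "$@"
}
```

A wrapper would then start the real binary with something like launch_with_appcrash /opt/myapp/bin/myapp.bin "$@", where the binary path is again only a placeholder.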
Once the users have collected all the necessary information, they can review it, make sure that it contains no sensitive information, and then send it to their application vendor. If desired, they can even automate this emailing as a part of the script (see commented lines at the end of shell script runme_on_app_crash).
Note that by itself the collected information may not be adequate for the ISV or Sun engineers to resolve the problem. No one can guarantee that the problem will be solved based on this information only. It is always best for the users to come up with a reproducible test case and provide that to their application software vendor. Nevertheless, the information collected with a script like runme_on_app_crash will definitely help the debugging process, and in many cases may be enough to resolve the problem.
If the users see question marks instead of the function names in the traceback, they can send the information to the ISV anyway. The application owners should be able to restore the actual function names, for example using a tool called unstrip_traceback described in reference [5], Generating and Handling Application Traceback on Crash. An updated version of that Perl script is available as unstrip_traceback. (Note: Please save the file without a .txt suffix.)

More possibilities are enabled by AppCrash:
ISVs may be able to collect the automatically generated crash reports and develop an automatic or semiautomatic system to statistically analyze where their applications crash most frequently. This could seriously help the QA efforts, as well as enable creation of metrics that could be used to provide efficient incentives to the development and QA engineers (for example, the fewer the crashes in your software, the larger the bonus you get).

A rule-based system can also be created to analyze the crash reports and provide advice on how each problem could be debugged most efficiently. For example, if the crash is in malloc(3C) or free(3C), chances are that memory has been corrupted and the best way to debug it is with tools such as watchmalloc(3MALLOC) or libumem(3LIB) built into the Solaris OS. Such a rule-based system would reduce the time and the cost of analyzing each crash, thus leading to faster problem report turnaround.

Implementing either or both of the above suggestions would help further reduce the costs of software defects.
Note that any automatic system to analyze multiple crash reports will require an agreed-upon standard defining what each report should contain and in what format. This standard can vary for different applications and user sites. The users and ISVs will still have full control over the gathered information. The involved parties will just need to coordinate it.
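To make the two suggestions above concrete, here is a minimal sketch of such postprocessing. It assumes reports laid out like the sample report in the Example section, where the traceback follows a "> /bin/pstack PID" line; the function names top_frame, crash_histogram, and debug_advice are hypothetical, and the single rule shown is the malloc/free rule mentioned above.

```shell
#!/bin/sh
# Print the function name of the innermost frame in one crash report.
top_frame() {
    awk '
        /^> \/bin\/pstack / { in_stack = 1; next }  # start of the traceback section
        in_stack && $1 ~ /^[0-9]+:$/ { next }       # skip the "PID: program" header
        in_stack && /^-/ { next }                   # skip per-thread separator lines
        in_stack && NF >= 2 { print $2; exit }      # first frame: address, function
    ' "$1"
}

# Count how often each function is the crash site across many reports,
# most frequent first.
crash_histogram() {
    for ch_report in "$@"; do
        top_frame "$ch_report"
    done | sort | uniq -c | sort -rn
}

# A one-rule advisor: a crash inside the allocator suggests heap corruption.
debug_advice() {
    case "$1" in
        malloc|free)
            echo "$1: possible heap corruption; try watchmalloc(3MALLOC) or libumem(3LIB)" ;;
        *)
            echo "$1: no specific advice; start from the traceback" ;;
    esac
}
```

For example, crash_histogram /var/tmp/appcrash.* would rank the crash sites found in the locally collected reports. A production system would need the agreed-upon report format discussed above rather than this ad hoc parsing.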
Non-root Running of DTrace and Related Security Issues

Running the app_crash.d script as a system-wide daemon by root is one way of using this method, but strictly speaking it is not the only way. The end users themselves can run this script provided they have been granted DTrace-related permissions, as described in this section.
DTrace scripts like app_crash.d can be run either by root or by users who have permissions like the following in the /etc/user_attr file (username stands for the user's login name):

username::::defaultpriv=basic,dtrace_proc,dtrace_kernel

File /etc/user_attr is owned by root, so only a system administrator with root access can modify it. Once /etc/user_attr has been modified, the user will need to log off and log on again to activate the new setting. This is a part of the Least Privilege facility providing fine-grained control over the actions of processes. For more information, see privileges(5). The system administrator can also provide such privileges temporarily using the ppriv(1) command like this:

# ppriv -s A+dtrace_proc,dtrace_kernel PID

where PID is the process ID of the user's shell.
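Before starting app_crash.d as an ordinary user, it may be convenient to verify that the privileges are actually in effect. The sketch below is an assumption-laden illustration: it expects the output layout we believe ppriv(1) uses on Solaris 10, in which the effective set appears on a line of the form "E: basic,dtrace_proc,...", and the function name has_dtrace_privs is ours.

```shell
#!/bin/sh
# Read `ppriv PID` output on stdin; succeed if the effective set (the "E:" line)
# is "all" or contains both dtrace_proc and dtrace_kernel.
# The "E: name,name,..." layout is an assumption about ppriv(1) output.
has_dtrace_privs() {
    awk '
        $1 == "E:" {
            found = 1
            n = split($2, privs, ",")
            for (i = 1; i <= n; i++) have[privs[i]] = 1
        }
        END {
            ok = found && (have["all"] || (have["dtrace_proc"] && have["dtrace_kernel"]))
            exit !ok
        }
    '
}
```

On a Solaris system this could be used as ppriv $$ | has_dtrace_privs || echo "DTrace privileges missing".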
A word of caution is in order. The DTrace privileges described above will allow the use of all facilities of DTrace (including the kernel facilities). Please use these privileges responsibly and be aware that they could permit Denial of Service (DoS) attacks on your systems.
Using these privileges for running our DTrace script app_crash.d described in this article is very safe. However, DTrace scripts can be easily created and used for many other actions, some of which can be destructive. If you do not want to introduce any such risk by allowing ordinary users to have DTrace privileges, you can always run app_crash.d as a daemon owned by root.
Implementation Details

The DTrace script app_crash.d shown in the previous section is quite simple but its operation may not be obvious if you have not been previously exposed to DTrace scripting. Therefore, let us consider what app_crash.d does line-by-line.

#!/usr/sbin/dtrace -qws

This means this script can be run directly (assuming that it has appropriate execute permissions) and that it will run the /usr/sbin/dtrace binary. "-q" means quiet (without generating extra messages). "-w" means we allow what are called destructive actions such as the system() action that is used to invoke a system command or a shell script. "-s" means that what follows is a DTrace script.

#pragma D option strsize=500

This option instructs DTrace to allow strings up to 500 characters long. The default size of 256 is not big enough for our purposes.

proc:::signal-send
/(args[2] == SIGBUS || args[2] == SIGSEGV) &&
pid == args[1]->pr_pid/

In DTrace terminology, these lines specify the use of the signal-send probe from the proc provider, whenever the predicate condition is true. It means the DTrace code following these lines will be executed when any process on the system generates (sends) signal SIGBUS or SIGSEGV, where the receiving process (whose process ID is stored in args[1]->pr_pid) is the same as the sending process (pid).

stop();

This means the process that generated such a signal is stopped until later notice.

system(
    "%s=%d; %s=%d; %s=%d; %s=%s; %s %s %s %s %s %s %s %s %s",
    "CRASH_PID",  pid,
    "CRASH_UID",  uid,
    "DTRACE_UID", $uid,
    "PROG",       execname,
    "SCRIPT=`/bin/pargs -e $CRASH_PID | ",
    "  /bin/grep ON_APP_CRASH_INVOKE | /bin/cut -d= -f2`;",
    "[ -z \"$SCRIPT\" -o ! -x \"$SCRIPT\" ] && exit 0;",
    "if [ $DTRACE_UID -eq 0 -a $CRASH_UID -ne 0 ] ; then",
    "  USER_NAME=`/bin/getent passwd $CRASH_UID|/bin/cut -d: -f1`;",
    "  /bin/su $USER_NAME -c \"$SCRIPT $CRASH_PID $PROG\";",
    "else ",
    "  $SCRIPT $CRASH_PID $PROG; ",
    "fi"
);

This long line executes the specified sequence of Bourne shell commands. We could have instead introduced a helper shell script that would be easier to read, but that would complicate installation somewhat, so we chose to use a one-line command.
Note that the system() action in DTrace scripts allows argument processing like that of printf().
The above script performs the following steps:
Runs the pargs(1) command to extract the value of the environment variable ON_APP_CRASH_INVOKE from the crashing process.

Tests if ON_APP_CRASH_INVOKE is not defined ($SCRIPT is empty) or if the user script it is pointing to ($SCRIPT) is not marked executable, in which case the script exits.

Checks if the owner of the app_crash.d script is root ($uid is equal to zero) and the crashing process doesn't belong to root. If root is running app_crash.d, then the script extracts the user name of the owner of the crashing process:

/bin/getent passwd $CRASH_UID|/bin/cut -d: -f1

and runs the user-defined script as that user using the su(1) command:

/bin/su $USER_NAME -c "$SCRIPT $CRASH_PID $PROG"

Alternate ways of determining the user ID, for example by looking at $USER, could introduce a security problem (hackers could set their $USER environment variable to root, set ON_APP_CRASH_INVOKE to a script starting a terminal emulator, then crash any application and thus gain access to a root shell).

If the owner of app_crash.d is not root, su is not used and the user-defined script is run directly. This way, any user can run app_crash.d, but it will work only for the processes owned by that user.

As the last step, app_crash.d resumes the crashing process with the prun(1) command:

system("/bin/prun %d", pid);
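For readers who find the one-line command hard to follow, the same steps can be written as a standalone helper script of the kind mentioned above. This is only a sketch of that alternative, not part of app_crash.d: the function names extract_crash_script, run_crash_script, and app_crash_helper are hypothetical, and the tools are found through PATH rather than by the absolute /bin paths used in the original.

```shell
#!/bin/sh
# Hypothetical helper mirroring the one-line command passed to system().
# Intended call: app_crash_helper CRASH_PID CRASH_UID DTRACE_UID PROG

# Read `pargs -e PID` output on stdin and print the value of ON_APP_CRASH_INVOKE.
extract_crash_script() {
    grep ON_APP_CRASH_INVOKE | cut -d= -f2
}

# Run the user script as the crashing user when the daemon runs as root;
# otherwise run it directly.
# Arguments: SCRIPT CRASH_PID CRASH_UID DTRACE_UID PROG
run_crash_script() {
    rc_script=$1 rc_pid=$2 rc_uid=$3 rc_dtrace_uid=$4 rc_prog=$5

    # Nothing to do if the variable was unset or the script is not executable.
    [ -n "$rc_script" ] && [ -x "$rc_script" ] || return 0

    if [ "$rc_dtrace_uid" -eq 0 ] && [ "$rc_uid" -ne 0 ]; then
        # Resolve the user name from the UID, never from $USER.
        rc_user=$(getent passwd "$rc_uid" | cut -d: -f1)
        su "$rc_user" -c "$rc_script $rc_pid $rc_prog"
    else
        "$rc_script" "$rc_pid" "$rc_prog"
    fi
}

# Entry point: look up the script in the crashing process, then run it.
app_crash_helper() {
    ach_script=$(pargs -e "$1" | extract_crash_script)
    run_crash_script "$ach_script" "$1" "$2" "$3" "$4"
}
```

With such a helper installed, the system() action would only need to pass pid, uid, $uid, and execname to it. The trade-off is the one already noted: one more file to install alongside app_crash.d.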

The example user-defined script runme_on_app_crash does the following:
Obtains the process ID ($PID) and program name from the input arguments.

Sends a message to system console (if the permissions allow it).

Runs the following Solaris commands for the crashing process (note that the pfiles(1) command prints the relevant pathnames starting with the Solaris 10 OS):

/bin/pstack $PID
/bin/pmap -x $PID
/bin/pldd $PID
/bin/ptree $PID
/bin/pargs -ace $PID
/bin/plimit -m $PID
/bin/pwdx $PID
/bin/pfiles $PID

Extracts system configuration data (see the script for details).

The commands specific to the crashing process are based on the Solaris proc(4) facility. They collect potentially useful information about the crashing process: traceback, memory map, library dependencies, process tree, process arguments and environment strings, process limits, working directory, and information about all open files. This information, while brief, in ASCII-text form and easily accessible by the user, can be very useful in the debugging process. The output of all these commands is redirected to file /var/tmp/appcrash.$PROG.$PID. Note that files in /var/tmp will survive a reboot, while those in /tmp normally won't.
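Since the template itself is not reproduced in this text, the following sketch shows what a script in the spirit of runme_on_app_crash might look like, based only on the steps listed above and on the sample report in the Example section. The function names run_logged and collect_crash_report and the APPCRASH_DIR override are assumptions; the console message step is omitted, and only part of the system configuration section is shown.

```shell
#!/bin/sh
# Sketch of a report script in the spirit of runme_on_app_crash.
# Usage: collect_crash_report PID PROG
# APPCRASH_DIR is an assumed override for the /var/tmp output directory.

# Run one command under a "> command" header; a missing or failing
# command is noted in the report instead of aborting the collection.
run_logged() {
    echo
    echo "> $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command failed)"
    else
        echo "(not available on this system)"
    fi
}

collect_crash_report() {
    ccr_pid=$1 ccr_prog=$2
    ccr_out=${APPCRASH_DIR:-/var/tmp}/appcrash.$ccr_prog.$ccr_pid
    {
        echo "Output from runme_on_app_crash"
        echo "Program: $ccr_prog"
        echo "Process ID: $ccr_pid"
        echo
        echo "Application Debugging Data"
        echo "--------------------------"
        run_logged /bin/pstack "$ccr_pid"
        run_logged /bin/pmap -x "$ccr_pid"
        run_logged /bin/pldd "$ccr_pid"
        run_logged /bin/ptree "$ccr_pid"
        run_logged /bin/pargs -ace "$ccr_pid"
        run_logged /bin/plimit -m "$ccr_pid"
        run_logged /bin/pwdx "$ccr_pid"
        run_logged /bin/pfiles "$ccr_pid"
        echo
        echo "System Configuration Data"
        echo "-------------------------"
        run_logged /bin/uname -a
        run_logged /bin/cat /etc/release
    } > "$ccr_out"
    # Print the report location so the caller can review or forward it.
    echo "$ccr_out"
}
```

A complete script would end with collect_crash_report "$1" "$2", matching the two arguments that app_crash.d passes. Sites with privacy concerns could drop the pargs line, which records the environment of the crashing process.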
A possibility exists that some applications use the signals SIGSEGV and SIGBUS for some special purposes unrelated to crashing. We have not encountered such programs so far, but they may exist. For such programs, AppCrash may create a lot of files in /var/tmp and degrade system performance, given enough of those SIGSEGV/SIGBUS signals. If this happens, the AppCrash scripts may have to be adjusted to account for the unusual situation, for example to exclude certain applications by name. It could be done using the predicate in the DTrace probe or in the runme_on_app_crash script.
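On the shell-script side, excluding applications by name could be as simple as the following sketch. The variable APPCRASH_EXCLUDE, its example contents, and the function name is_excluded are hypothetical.

```shell
#!/bin/sh
# APPCRASH_EXCLUDE: hypothetical space-separated list of program names whose
# SIGSEGV/SIGBUS signals should not produce a report.
APPCRASH_EXCLUDE=${APPCRASH_EXCLUDE:-"legacy_app signal_heavy_tool"}

# Succeed if the given program name is on the exclusion list.
is_excluded() {
    for ie_name in $APPCRASH_EXCLUDE; do
        [ "$ie_name" = "$1" ] && return 0
    done
    return 1
}
```

Placed near the top of runme_on_app_crash, a line such as is_excluded "$2" && exit 0 would skip the report for those programs. Filtering in the DTrace predicate instead would avoid stopping the process and starting a shell at all.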
Also note that one of the advantages of AppCrash is that all of its components are scripts that are easy to customize.
Example

Consider the following simple test program which contains a bug. It dereferences a null pointer in subroutine sub2():

% cat test1.c
#include <stdio.h>
#include <stdlib.h>

static void sub2(int *p)
{
    int i;

    i = *p;
}

static void sub(int *p)
{
    sub2(p);
}

int main()
{
    int *p=NULL;

    sub(p);
    return 0;
}
% cc -o test1 test1.c

Step 1: Let us start the app_crash.d daemon, assuming the current user has the necessary permissions to run DTrace as described above.

% ./app_crash.d &
[1] 5707

Step 2: Now define the necessary environment variable in a different terminal window:

% setenv ON_APP_CRASH_INVOKE $HOME/tests/runme_on_app_crash

Step 3: Execute the test1 program in the terminal window where ON_APP_CRASH_INVOKE has been defined:

% test1
Segmentation Fault

At the time of the crash, the information was collected in the /var/tmp directory as specified in the runme_on_app_crash shell script (note that some output lines below have been reformatted for readability):

% ls -lt /var/tmp/ | head -2
total 42
-rw-r--r--   1 gregns   staff
4037 Apr 20 11:30 /var/tmp/appcrash.test1.5174
% cat /var/tmp/appcrash.test1.5174

Output from runme_on_app_crash
Program: test1
Process ID: 5174

Application Debugging Data
--------------------------

> /bin/pstack 5174
5174: test1
08050652 sub2     (0) + 12
08050688 sub      (0) + 18
080506bf main     (1, 8047cec, 8047cf4) + 1f
080505aa ???????? (1, 8047db0, 0, 8047db6, 8047dc8, 8047e49)

> /bin/pmap -x 5174
5174: test1
Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
08047000       4       4       4       - rwx--    [ stack ]
08050000       4       4       -       - r-x--  test1
08060000       4       4       4       - rwx--  test1
FEEE0000       4       4       4       - rwx--    [ anon ]
FEEF0000      24      12      12       - rwx--    [ anon ]
FEF00000     724     724       -       - r-x--  libc.so.1
FEFC5000      24      24      24       - rw---  libc.so.1
FEFCB000       8       8       8       - rw---  libc.so.1
FEFDA000     128     128       -       - r-x--  ld.so.1
FEFFA000       4       4       4       - rwx--  ld.so.1
FEFFB000       8       8       8       - rwx--  ld.so.1
-------- ------- ------- ------- -------
total Kb     936     924      68       -

> /bin/pldd 5174
5174: test1
/lib/libc.so.1

> /bin/ptree 5174
225   /usr/lib/inet/inetd start
5139  /usr/sbin/in.rlogind
5141  -csh
5174  test1

> /bin/pargs -ace 5174
5174: test1
argv[0]: test1

envp[0]: HOME=/home/gregns
... [removed more environment variable settings] ...
envp[19]: ON_APP_CRASH_INVOKE=
/home/gregns/tests/runme_on_app_crash

> /bin/plimit -m 5174
5174: test1
resource             current         maximum
time(seconds)         unlimited       unlimited
file(mbytes)          unlimited       unlimited
data(mbytes)          unlimited       unlimited
stack(mbytes)         10              unlimited
coredump(mbytes)      0               unlimited
nofiles(descriptors)  256             65536
vmemory(mbytes)       unlimited       unlimited

> /bin/pwdx 5174
5174: /home/gregns/tests

> /bin/pfiles 5174
5174: test1
Current rlimit: 256 file descriptors
0: S_IFCHR mode:0620 dev:270,0 ino:12582924 uid:28715
gid:7 rdev:24,4
O_RDWR
/devices/pseudo/pts@0:4
1: S_IFCHR mode:0620 dev:270,0 ino:12582924 uid:28715
gid:7 rdev:24,4
O_RDWR
/devices/pseudo/pts@0:4
2: S_IFCHR mode:0620 dev:270,0 ino:12582924 uid:28715
gid:7 rdev:24,4
O_RDWR
/devices/pseudo/pts@0:4

System Configuration Data
-------------------------

> /bin/uname -a
SunOS rahova 5.10 Generic i86pc i386 i86pc

> /bin/cat /etc/release
Solaris 10 3/05 s10_74L2a X86
Copyright 2005 Sun Microsystems, Inc.
All Rights Reserved.
Use is subject to license terms.
Assembled 22 January 2005

> /usr/sbin/psrinfo -v
Status of virtual processor 0 as of: 04/20/2005 11:30:49
on-line since 03/30/2005 14:43:48.
The i386 processor operates at 2393 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 04/20/2005 11:30:49
on-line since 03/30/2005 14:43:53.
The i386 processor operates at 2393 MHz,
and has an i387 compatible floating point processor.

> /usr/sbin/swap -s
total: 62464k bytes allocated + 12248k reserved =
74712k used, 6891300k available

> /usr/sbin/swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c1t0d0s1   28,65      8 8389432 8389432

> /usr/sbin/prtconf|/bin/head -2
System Configuration:  Sun Microsystems  i86pc
Memory size: 3327 Megabytes

> /bin/showrev -p|/bin/cut -d' ' -f2|/bin/sort
116299-08
116303-02

The above file is ready to be sent to the owner of the faulty application for debugging. Note that the traceback produced by pstack(1) clearly points to the routine containing the bug, sub2() in this case. It also contains a chain of function calls leading to the faulty routine.
Solutions on Other Systems

Microsoft Windows has an interesting functionality in this area: see reference [9], Windows Error Reporting for Developers.
Not only does Microsoft provide the infrastructure for the ISVs to automatically collect the crash data (which is what AppCrash is all about), but it actually collects those error reports from the users and allows the ISVs to access those reports from the above Microsoft site.
Microsoft encrypts the collected data such that only the intended ISV or Microsoft employees can decrypt it. This is not a bad idea, but that method still doesn't allow the users to inspect the data before allowing it to leave their sites. Nor does it let the users control what information to collect.
For more information on how Microsoft collects the data on crashes, see reference [10], Microsoft Online Crash Analysis Data Collection Policy.
For further information about Microsoft minidumps, see reference [11], Post-Mortem Debugging Your Application with Minidumps and Visual Studio .NET.
We think the AppCrash method described in this article is more flexible and provides more freedom and power to the users and to the ISVs. Of course the users will decide which approach they prefer.

Apple Mac OS X also appears to have impressive capabilities in this area, although we haven't tested them. For details, see reference [12], Mac OS X CrashReporter.
Specialized commercial products and services are available to perform automated crash monitoring and analysis for applications. For one example, see reference [13].
Related discussions are also available in reference [1] and reference [3].
Conclusion

This article describes a DTrace-based solution allowing ISVs and users of the Solaris OS to safely collect debugging information when any application crashes, and thus help improve the quality of the applications and reduce the costs of software defects. The users can fully automate such diagnostic data collection and transmission if they want, while having full control over which information is collected and sent to the application developer and/or system vendor for analysis and remediation.
For AppCrash updates and related discussions, see reference [14].
References

[1] US Department of Energy: Office XP Error Reporting May Send Sensitive Documents to Microsoft

[2] Protecting sensitive data in memory, by John Viega

[3] Scrash: A System for Generating Secure Crash Information (pdf), by Pete Broadwell, et al.

[4] dbx and System Libraries: Why Can't dbx Read My Process or Core File?, by Chris Quenelle and Ann Rice

[5] Generating and Handling Application Traceback on Crash, by Greg Nakhimovsky

[6] Solaris Dynamic Tracing Guide

[7] Solaris Service Management Facility - Quickstart Guide

[8] Unix Daemons in Perl

[9] Windows Error Reporting for Developers

[10] Microsoft Online Crash Analysis Data Collection Policy

[11] Post-Mortem Debugging Your Application with Minidumps and Visual Studio .NET, by Andy Pennell

[12] Mac OS X CrashReporter

[13] BugSplat Launches Automated Crash Monitoring and Analysis Service For Applications Deployed at a Software Vendor's Customer Sites

[14] Solaris AppCrash Updates
