
Analyzing Dump Files with WinDbg

2015-10-16 19:44
Chapter 1: Common WinDbg Commands

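①!analyze -v                                      Analyzes the dump and reports where and why the program crashed

②kP                                               Shows the call stack together with each function's input parameters

③!for_each_frame dv /t                            Shows the local variables (with their types) of every frame on the stack

④dc , db                                          Display the contents of memory. A variable name can be given directly, but you may first need to switch to the right stack frame

⑤!threads                                         Lists all threads (an SOS extension command for managed code; in a native process, ~ lists the threads)

⑥~0s , ~1s                                        Switch to the given thread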

⑦.frame <frame number>                            Switches to the given stack frame (frame numbers are shown by kn). Sometimes needed before a variable can be inspected

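⑧!uniqstack                                       Extension that displays the call stacks of all threads in the current process, omitting duplicates

⑨!teb                                             Extension that displays a formatted view of the thread environment block (TEB)

⑩s-sa and s-su                                    Search for any ASCII or Unicode strings. Useful for checking whether a region of memory contains printable characters

⑪dds, dps, dqs                                    Dump a range of memory, treating each element as an address and resolving it against the symbol table; the matching symbol is shown next to each value. dds uses 4-byte elements, dqs uses 8-byte elements, and dps uses the pointer size of the current processor architecture

⑫.kframes                                         Sets the default length of a stack backtrace. The default is 20

⑬k, kb, kd, kp, kP, kv (Display Stack Backtrace)  The k* commands display the call stack of the given thread along with related information. Usually worth combining with ⑫, otherwise very little is shown

⑭.reload /i xxx.dll                               Ignores .pdb file version mismatches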
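Chapter 2: Setting Up Symbols
2.1 Download the PDB files for system functions from the remote symbol server into the local directory "D:\mysymbol" by setting the symbol path to: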

    SRV*D:\mysymbol*http://msdl.microsoft.com/download/symbols

2.2 Load the symbol files from the configured path

    .reload

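    Use Debug -> Modules in the menu to check whether the symbols were loaded.
Chapter 3: Examples
Example 1: Investigating heap corruption.

    Error code: 0xc0000374

    Meaning: ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun

Step 1: Run "!analyze -v" to find where the failure occurred and what caused the program to crash and dump.

       The cause is usually one of a few things, such as a buffer overrun or an access to an invalid address.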

0:009> !analyze -v

*******************************************************************************

*                                                                             *

*                        Exception Analysis                                   *

*                                                                             *

*******************************************************************************

GetPageUrlData failed, server returned HTTP status 404

URL requested: http://watson.microsoft.com/StageOne/ProcessA_exe/1_0_0_1/5134aefd/ntdll_dll/6_1_7601_18229/51fb164a/c0000374/000c4102.htm?Retriage=1
FAULTING_IP: 

ntdll!RtlReportCriticalFailure+62

00000000`777b4102 eb00            jmp     ntdll!RtlReportCriticalFailure+0x64 (00000000`777b4104)

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)

ExceptionAddress: 00000000777b4102 (ntdll!RtlReportCriticalFailure+0x0000000000000062)

   ExceptionCode: c0000374

  ExceptionFlags: 00000001

NumberParameters: 1

   Parameter[0]: 000000007782b4b0

PROCESS_NAME:  ProcessA.exe

ERROR_CODE: (NTSTATUS) 0xc0000374 - <Unable to get error code text>

EXCEPTION_CODE: (NTSTATUS) 0xc0000374 - <Unable to get error code text>

EXCEPTION_PARAMETER1:  000000007782b4b0

MOD_LIST: <ANALYSIS/>

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

FAULTING_THREAD:  0000000000002f8c

DEFAULT_BUCKET_ID:  ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun

PRIMARY_PROBLEM_CLASS:  ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun

BUGCHECK_STR:  APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun

LAST_CONTROL_TRANSFER:  from 00000000777b4746 to 00000000777b4102

STACK_TEXT:  

00000000`0548e170 00000000`777b4746 : 00000000`00000002 00000000`00000023 00000000`00000000 00000000`00000003 : ntdll!RtlReportCriticalFailure+0x62

00000000`0548e240 00000000`777b5952 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`1c01001d : ntdll!RtlpReportHeapFailure+0x26

00000000`0548e270 00000000`777b7604 : 00000000`00c50000 00000000`00c50000 00000000`0000000a 00000000`00000000 : ntdll!RtlpHeapHandleError+0x12

00000000`0548e2a0 00000000`777b79e8 : 00000000`00c50000 00000000`00000000 00000000`00100000 00000000`00000000 : ntdll!RtlpLogHeapFailure+0xa4

00000000`0548e2d0 00000000`7774fad6 : 00000000`00c50000 00000000`00c59e50 00000000`00c50000 00000000`00000000 : ntdll!RtlpAnalyzeHeapFailure+0x3a8

00000000`0548e330 00000000`777434d8 : 00000000`00c50000 00000000`00000003 00000000`000006cc 00000000`000006e0 : ntdll!RtlpAllocateHeap+0x1d2a

00000000`0548e8d0 00000000`777247ea : 00000000`00000003 00000000`00c5ee80 00000000`00c50278 00000000`000006cc : ntdll!RtlAllocateHeap+0x16c

00000000`0548e9e0 00000000`77723ff2 : 00000000`00c50000 00000000`00000003 00000000`00c5ee90 00000000`000006cc : ntdll!RtlpReAllocateHeap+0x648

00000000`0548eca0 00000000`750c712f : 00000000`0548fbe8 00000000`00c5ee90 00000000`00000000 00000000`000005ac : ntdll!RtlReAllocateHeap+0xa2

00000000`0548edb0 00000001`40010f6f : 00000000`00000000 00000000`0548fbe8 00000000`00000000 00000000`00000661 : msvcr80!realloc+0x6f [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\realloc.c @ 332]

00000000`0548ede0 00000001`4000f63c : ffffffff`ffffffff 00000000`0548ff10 00000000`00c97fd0 00000000`0548fe48 : ProcessA!FunctionA_AnalyzeEventData+0xfcf [e:\ProcessA\FunctionA_sockserv.cpp @ 1666]

00000000`0548f8a0 00000000`774e652d : 00000000`000002a0 00000000`00000000 00000000`00000000 00000000`00000000 : ProcessA!FunctionA_SockWork+0xe1c [e:\ProcessA\FunctionA_sockserv.cpp @ 1102]

00000000`0548ff60 00000000`7771c541 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd

00000000`0548ff90 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

STACK_COMMAND:  !heap ; ~9s; .ecxr ; kb

FOLLOWUP_IP: 

msvcr80!realloc+6f [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\realloc.c @ 332]

00000000`750c712f 4885c0          test    rax,rax

SYMBOL_STACK_INDEX:  9

SYMBOL_NAME:  msvcr80!realloc+6f

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: msvcr80

IMAGE_NAME:  msvcr80.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  4ec3407e

FAILURE_BUCKET_ID:  ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun_c0000374_msvcr80.dll!realloc

BUCKET_ID:  X64_APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun_msvcr80!realloc+6f

WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/ProcessA_exe/1_0_0_1/5134aefd/ntdll_dll/6_1_7601_18229/51fb164a/c0000374/000c4102.htm?Retriage=1

Followup: MachineOwner

---------

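Step 2: Use "!heap" to find the corrupted heap block and work out the cause.

       0000000000c59c80

       0000000000c59e50  ← address of the corrupted heap block.

       0000000000c59fd0

As is well known, every block returned by malloc() or realloc() is preceded by

management information (a block header) that records the size of the block and what the heap needs when the block is freed.

From this it is natural to guess that a write into the block at 0000000000c59c80 ran past its end

and into 0000000000c59e50, overwriting that block's header. That is why the program crashed and dumped.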

0:009> !heap

**************************************************************

*                                                            *

*                  HEAP ERROR DETECTED                       *

*                                                            *

**************************************************************

Details:

Error address: 0000000000c59e50

Heap handle: 0000000000c50000

Error type heap_failure_buffer_overrun (6)

Parameter 1: 000000000000000a

Last known valid blocks: before - 0000000000c59c80, after -0000000000c59fd0

Stack trace:

                00000000777b79e8: ntdll!RtlpAnalyzeHeapFailure+0x00000000000003a8

                000000007774fad6: ntdll!RtlpAllocateHeap+0x0000000000001d2a

                00000000777434d8: ntdll!RtlAllocateHeap+0x000000000000016c

                00000000777247ea: ntdll!RtlpReAllocateHeap+0x0000000000000648

                0000000077723ff2: ntdll!RtlReAllocateHeap+0x00000000000000a2

                00000000750c712f: msvcr80!realloc+0x000000000000006f

                0000000140010f6f: ProcessA!FunctionA_AnalyzeEventData+0x0000000000000fcf

                000000014000f63c: ProcessA!FunctionA_SockWork+0x0000000000000e1c

                00000000774e652d: kernel32!BaseThreadInitThunk+0x000000000000000d

                000000007771c541: ntdll!RtlUserThreadStart+0x000000000000001d

Index   Address  Name      Debugging options enabled

  1:   001f0000                

  2:   00010000                

  3:   00020000                

  4:   00670000                

  5:   00950000                

  6:   00c50000                

  7:   00910000                

  8:   00bc0000                

  9:   010e0000                

 10:   01220000                

 11:   01420000                

 12:   00c30000                

 13:   03660000                

 14:   00ba0000                

 15:   037b0000                

 16:   01340000                

 17:   039a0000                
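The reasoning in Step 2 can be illustrated with a toy model. The sketch below is not how the Windows heap is implemented; it assumes a made-up `BlockHeader` layout and a small static `arena` (both hypothetical) just to show how writing one byte past the end of a block damages the header of the block that follows it. All writes stay inside the arena, so the program itself is well defined.

```c
#include <stdint.h>
#include <string.h>

#define HEADER_MAGIC 0xA110C8EDu  /* arbitrary marker chosen for this sketch */

/* Simplified stand-in for the bookkeeping a heap keeps in front of each block. */
typedef struct {
    uint32_t magic;  /* integrity marker, checked before the block is reused */
    uint32_t size;   /* number of user bytes that follow the header */
} BlockHeader;

/* Toy "heap": header A, data A, header B, data B, laid out back to back. */
static unsigned char arena[64];

/* Places a header at `offset` and returns the offset of the user data behind it. */
static size_t place_block(size_t offset, uint32_t size)
{
    BlockHeader h = { HEADER_MAGIC, size };
    memcpy(arena + offset, &h, sizeof h);
    return offset + sizeof h;
}

/* Returns 1 if the header at `offset` is intact, 0 if it was overwritten. */
static int header_ok(size_t offset)
{
    BlockHeader h;
    memcpy(&h, arena + offset, sizeof h);
    return h.magic == HEADER_MAGIC;
}

/* Writes n bytes into block A (16 user bytes), whether or not they fit,
   and reports whether block B's header survived. */
int demo_overrun(size_t n)
{
    memset(arena, 0, sizeof arena);
    size_t a_data = place_block(0, 16);   /* block A user data */
    size_t b_hdr = a_data + 16;           /* block B starts right after A */
    place_block(b_hdr, 16);
    if (a_data + n > sizeof arena)        /* never leave the toy arena */
        n = sizeof arena - a_data;
    memset(arena + a_data, 'x', n);
    return header_ok(b_hdr);
}
```

With this layout, `demo_overrun(16)` leaves the neighbouring header intact, while `demo_overrun(17)` overwrites its first byte. A real heap only notices the damage the next time it walks that header, which is why the crash in this example surfaces inside a later realloc() rather than at the faulty write.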

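Step 3: Use "!for_each_frame dv /t" to print the local variables of the failing function and find the culprit.

       Among the variables below, look for the one whose value is closest to the address 0000000000c59c80. Here it is:

       char * pData_n = 0x00000000`00c59c90 "SE:Security: ???"

       Note: if the variable is a pointer to a pointer, first use dc to see the address that the pointer refers to.

       Reading the code then shows that, while processing the data in pData_n, the program adds a 0x0d (CR) byte each time it

       encounters 0x0a (LF), turning a one-byte line break into a two-byte one. The data therefore grows and runs past the end of the pData_n buffer.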
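The addresses involved in this step can be cross-checked with a little arithmetic. The sketch below only subtracts values copied from the debugger output of this example; the function names are made up for illustration, and the reading of `pt` as a write cursor is an assumption, since the source code is not shown.

```c
#include <stdint.h>
#include <stdio.h>

/* Addresses copied from the debugger output in this example. */
#define BLOCK_BEFORE 0x00c59c80ULL  /* "before" block reported by !heap */
#define ERROR_ADDR   0x00c59e50ULL  /* error address reported by !heap */
#define PDATA_N      0x00c59c90ULL  /* value of the local variable pData_n */
#define PT           0x00c59e55ULL  /* value of the local variable pt */

/* Distance from the start of the heap block to the pointer the program holds. */
uint64_t header_gap(void)
{
    return PDATA_N - BLOCK_BEFORE;
}

/* How far pt has advanced from the start of the pData_n buffer. */
uint64_t cursor_offset(void)
{
    return PT - PDATA_N;
}

/* Nonzero if pt already points beyond the address that !heap flagged. */
int cursor_past_error_addr(void)
{
    return PT > ERROR_ADDR;
}

void print_summary(void)
{
    printf("header gap: %llu bytes\n", (unsigned long long)header_gap());
    printf("pt offset into pData_n: %llu bytes\n",
           (unsigned long long)cursor_offset());
    printf("pt past error address: %s\n",
           cursor_past_error_addr() ? "yes" : "no");
}
```

The gap of 16 bytes is consistent with pData_n being the user pointer of the block at 0000000000c59c80, and the offset of 453 matches the "[453]bytes" figure in szTrcBuff. Both observations support the guess from Step 2, although neither proves it on its own.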

0:009> !for_each_frame dv /t

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

12 00000000`0548edb0 00000001`40010f6f msvcr80!realloc+0x6f [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\realloc.c @ 332]

void * pBlock = 0x00000000`00000000

unsigned int64 newsize = 0x548fbe8

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

13 00000000`0548ede0 00000001`4000f63c ProcessA!FunctionA_AnalyzeEventData+0xfcf [e:\ProcessA\FunctionA_sockserv.cpp @ 1666]

void * cd = 0xffffffff`ffffffff

struct _MpEvsHead * Head = 0x00000000`0548ff10

char * pEventData = 0x00000000`00c97fd0 "???"

char ** pNewData = 0x00000000`0548fe48

char * SiteName = 0x00000000`0548fe18 ""

int oval_check = 0n0

char * pszHostIp = 0x00000000`0548fbf0 "192.168.1.1"

int j = 0n469

int NodeName_check = 0n0

char [2068] eventtext = char [2068] "SE:Security: ???"

unsigned long err = 0

int NL_henkan = 0n1

int Evttxt_check = 0n1

char [129] nameWork = char [129] "`_???"

int ret = 0n0

struct NameObject_t * pNameObj_n = 0x00000000`00c5eee8

char * pData_n = 0x00000000`00c59c90 "SE:Security: ???"

long lWork = 0n9

char [257] szTrcBuff = char [257] "safely divided text.([453]bytes --> [469]bytes)"

long nNameNum = 0n44

long nNewLen = 0n1740

struct NameObject_t * pNameObj_o = 0x00000000`00c98028

char * pData_o = 0x00000000`00c984c6 "SE:Security: ???"

char * pt = 0x00000000`00c59e55 "[???"

long i = 0n20

int IpAddr_check = 0n0

int res = 0n1

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

14 00000000`0548f8a0 00000000`774e652d ProcessA!FunctionA_SockWork+0xe1c [e:\ProcessA\FunctionA_sockserv.cpp @ 1102]

void * ns = 0x00000000`000002a0

char * pRead_str = 0x00000000`00c562f0 ","

int bTableRegisterd = 0n0

unsigned long err = 0

char [3] traceflg = char [3] ""

int ret = 0n0

short sWork = 0n2

int oval_check = 0n0

char * pNewData = 0x00000000`00c5ee90 "???"

char * wk = 0x00000000`0548f930 "192.168.1.1"

char [33] SiteName = char [33] ""

long lWork = 0n2032

char [257] szTrcBuff = char [257] "recv event OK"

int iLastSerchedIndex = 0n0

char [256] HostIp = char [256] "192.168.1.1"

int ret2 = 0n0

struct _MpEvsHead Head = struct _MpEvsHead

long nDataLen = 0n3

char [257] szTrcBuff2 = char [257] ""

char [20] szSendData = char [20] "OK"

struct addrinfo hinst = struct addrinfo

int conv_disc_set = 0n1

long lRc = 0n0

void * conv_disc = 0xffffffff`ffffffff

int res = 0n1

char * pData = 0x00000000`00c97fd0 "???"

long nRead = 0n3726

char [16] evttype = char [16] "Alarm.sys"

char * lpszEventid = 0x00000000`00c5f180 ""

long nSend = 0n12

char [256] ipTmp = char [256] "192.168.1.1"

char [20] szToCode = char [20] "sjis"

char [20] szFromCode = char [20] "sjis"

int bWriteEvent = 0n1

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
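The root cause found in Example 1, a line-break conversion that makes the data longer than the buffer allocated for it, is usually avoided by computing the expanded length before allocating. The following is a minimal sketch under stated assumptions: `expand_newlines` is a hypothetical helper, not the function from ProcessA, and it assumes the intended result is the conventional CR LF (0x0d 0x0a) sequence.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if src[i] is an LF that is not already preceded by a CR. */
static int is_lone_lf(const char *src, size_t i)
{
    return src[i] == '\n' && (i == 0 || src[i - 1] != '\r');
}

/* Hypothetical helper: returns a newly allocated, NUL-terminated copy of
   src[0..len) in which every lone LF becomes CR LF. The expanded length is
   stored in *out_len when out_len is not NULL. The caller frees the result. */
char *expand_newlines(const char *src, size_t len, size_t *out_len)
{
    size_t extra = 0;

    /* First pass: count how many bytes the conversion will add. */
    for (size_t i = 0; i < len; i++) {
        if (is_lone_lf(src, i))
            extra++;
    }

    /* Allocate for the expanded size, not the original size. */
    char *dst = malloc(len + extra + 1);
    if (dst == NULL)
        return NULL;

    /* Second pass: copy, inserting CR in front of each lone LF. */
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_lone_lf(src, i))
            dst[j++] = '\r';
        dst[j++] = src[i];
    }
    dst[j] = '\0';

    if (out_len != NULL)
        *out_len = j;
    return dst;
}
```

Counting first costs an extra pass over the data, but it means the allocation can never be smaller than what the second pass writes.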
Example 2: Invalid parameter (STATUS_INVALID_PARAMETER).

    Error code: 0xc000000d

    Meaning: STATUS_INVALID_PARAMETER

Step 1: Run "!analyze -v" to find where the failure occurred and what caused the program to crash and dump.

 

0:000> !analyze -v

*******************************************************************************

*                                                                             *

*                        Exception Analysis                                   *

*                                                                             *

*******************************************************************************

*** ERROR: Symbol file could not be found.  Defaulted to export symbols for user32.dll - 

Unable to load image C:\Windows\Odsv.dll, Win32 error 0n2

*** WARNING: Unable to verify timestamp for Odsv.dll

*** ERROR: Module load completed but symbols could not be loaded for Odsv.dll

GetPageUrlData failed, server returned HTTP status 404

URL requested: http://watson.microsoft.com/StageOne/ProcessB_exe/1_0_0_1/4e362265/msvcr80_dll/8_0_50727_6195/4dcdd833/c000000d/0001d5fa.htm?Retriage=1
FAULTING_IP: 

msvcr80!strncpy_s+10a [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\tcsncpy_s.inl @ 62]

00000000`74e6d5fa b822000000      mov     eax,22h

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)

ExceptionAddress: 0000000074e6d5fa (msvcr80!strncpy_s+0x000000000000010a)

   ExceptionCode: c000000d

  ExceptionFlags: 00000000

NumberParameters: 0

PROCESS_NAME:  ProcessB.exe

ERROR_CODE: (NTSTATUS) 0xc000000d - <Unable to get error code text>

EXCEPTION_CODE: (NTSTATUS) 0xc000000d - <Unable to get error code text>

MOD_LIST: <ANALYSIS/>

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

LAST_CONTROL_TRANSFER:  from 0000000000124250 to 0000000074e5b0ec

FAULTING_THREAD:  ffffffffffffffff

DEFAULT_BUCKET_ID:  STATUS_INVALID_PARAMETER

PRIMARY_PROBLEM_CLASS:  STATUS_INVALID_PARAMETER

BUGCHECK_STR:  APPLICATION_FAULT_STATUS_INVALID_PARAMETER

IP_ON_STACK: 

+2e32faf01dedf58

00000000`00124250 60              ???

FRAME_ONE_INVALID: 1

STACK_TEXT:  

00000000`00124220 00000000`00124250 : 00000000`00000006 00000000`00000000 00000000`00000001 00000000`00000000 : msvcr80!_invalid_parameter+0x6c [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\invarg.c @ 88]

00000000`00124228 00000000`00000006 : 00000000`00000000 00000000`00000001 00000000`00000000 00000000`00000000 : 0x124250

00000000`00124230 00000000`00000000 : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`00124260 : 0x6

STACK_COMMAND:  ~0s; .ecxr ; kb

FOLLOWUP_IP: 

msvcr80!strncpy_s+10a [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\tcsncpy_s.inl @ 62]

00000000`74e6d5fa b822000000      mov     eax,22h

FAULTING_SOURCE_CODE:  

No source found for 'f:\dd\vctools\crt_bld\self_64_amd64\crt\src\tcsncpy_s.inl'

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  msvcr80!strncpy_s+10a

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: msvcr80

IMAGE_NAME:  msvcr80.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  4dcdd833

FAILURE_BUCKET_ID:  STATUS_INVALID_PARAMETER_c000000d_msvcr80.dll!strncpy_s

BUCKET_ID:  X64_APPLICATION_FAULT_STATUS_INVALID_PARAMETER_msvcr80!strncpy_s+10a

WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/ProcessB_exe/1_0_0_1/4e362265/msvcr80_dll/8_0_50727_6195/4dcdd833/c000000d/0001d5fa.htm?Retriage=1

Followup: MachineOwner

---------

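This time we are out of luck: the "!analyze -v" output tells us little beyond the fact that

the crash happened inside a call to strncpy_s. There are many possible reasons why the calling function cannot be identified. For example,

the customer may not have collected a full dump, or the stack recorded in the dump may be corrupted.

This was a real headache. It took me three weeks of reading the stack contents line by line to solve it.

Step 2: Use "!teb" to find where this program's stack begins and ends.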

0:000>!teb

TEB at 000007ffffeee000

    ExceptionList:        0000000000000000

    StackBase:            0000000008d50000

    StackLimit:           0000000008d4d000

    SubSystemTib:         0000000000000000

    FiberData:            0000000000001e00

    ArbitraryUserPointer: 0000000000000000

    Self:                 000007ffffeee000

    EnvironmentPointer:   0000000000000000

    ClientId:             0000000000001bdc . 0000000000001868

    RpcHandle:            0000000000000000

    Tls Storage:          000007ffffeee058

    PEB Address:          000007fffffd6000

    LastErrorValue:       87

    LastStatusValue:      c000000d

    Count Owned Locks:    0

    HardErrorMode:        0

Step 3: Use "dps" to look at the memory contents of the stack. The most important part is excerpted below.

-------------------------------------------------------------------------------------------------------------------------------

00000000`001247d8  00000000`74e6d5fa msvcr80!strncpy_s+0x10a [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\tcsncpy_s.inl @ 62]

00000000`001247e0  00000000`009c01e0

00000000`001247e8  00000000`030f5810

00000000`001247f0  00000000`0057e310 ProcessB2!work   

★ "ProcessB2!work" should contain data of the form "DNxxxxxxxx_150_109",

but it actually contains "VIP_rtcrx00184-004a/b-y3b-d".

00000000`001247f8  00000000`005782c0 ProcessB2!trcData 

▲ "ProcessB2!trcData" contains "Function:testB call".

 It comes from this call in List::testB: trace("testB", __FILE__, __LINE__, TRCLV_3);

00000000`00124800  00000000`00000000

00000000`00124808  00000000`00000000

00000000`00124810  00000000`004a3150 ProcessB2!`string' 

▲ "ProcessB2!`string'" contains "e:\ProcessB\FunctionB.cpp", the value of __FILE__.

00000000`00124818  00000000`00455b65 ProcessB2!List::testB+0x55 [e:\ProcessB\Listset.cpp @ 719]

00000000`00124820  00000000`009c01e0

00000000`00124828  00000000`030f5810

00000000`00124830  00000000`0057e310 ProcessB2!work

00000000`00124838  00000000`001249e0

00000000`00124840  32322e35`322e3000

00000000`00124848  30614031`33312e34

00000000`00124850  7097fb8e`bc923730

00000000`00124858  5049565f`5753334c

00000000`00124860  00000000`0000125f

00000000`00124868  000082bd`b1200d5e

00000000`00124870  00000000`009c01e0

00000000`00124878  00000000`00467bda ProcessB2!FunctionB+0x73a [e:\ProcessB\FunctionB.cpp @ 181]   

-------------------------------------------------------------------------------------------------------------------------------
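Several of the quadwords in the excerpt above have no symbol because they are string bytes rather than addresses. As a sketch of how to read them by hand, the hypothetical helper below unpacks a 64-bit value in little-endian memory order and replaces non-printable bytes with '.', similar to the text column that dc and db show.

```c
#include <stdint.h>

/* Writes the 8 bytes of `word`, in little-endian memory order, to out[0..7]
   and terminates the string. Bytes outside printable ASCII become '.'. */
void quad_to_ascii(uint64_t word, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(word >> (8 * i));
        out[i] = (b >= 0x20 && b <= 0x7e) ? (char)b : '.';
    }
    out[8] = '\0';
}
```

Applied to the value at 00000000`00124858 (5049565f`5753334c), this gives "L3SW_VIP", and the value at 00000000`00124848 (30614031`33312e34) gives "4.131@a0". The "VIP" fragment resembles the unexpected content found in ProcessB2!work, which suggests that this part of the stack holds the data being copied.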

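At this point we have finally identified which function went wrong. After working out what these functions do and printing everything that could be printed, it turned out that

the function had been passed invalid data. It is worth explaining why invalid data makes the program crash and dump.

First, when the secure template overload macro is defined for the build, a call to strncpy is compiled as a call to the secure function strncpy_s.

For details, see the Microsoft documentation: http://msdn.microsoft.com/zh-cn/LIBRARY/ms175759(v=vs.80)
Second, why the program crashes. When strncpy is converted to strncpy_s, the call takes the following form.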

char dst[5];

strncpy(dst, "a long string", 5);    ---->  strncpy_s(dst, 5, "a long string", 5);

A call like this raises STATUS_INVALID_PARAMETER, which is by design in strncpy_s. For usage details, see the following documentation: http://msdn.microsoft.com/zh-cn/library/5dae5d43(v=vs.90).aspx
Excerpt:

char dst[5];

strncpy_s(dst, 5, "a long string", 5);

means that we are asking strncpy_s to copy five characters into a buffer five bytes long; this would leave no space for the null terminator, hence strncpy_s zeroes out the string and calls the invalid parameter handler.

If truncation behavior is needed, use _TRUNCATE or (size – 1):

strncpy_s(dst, 5, "a long string", _TRUNCATE);

strncpy_s(dst, 5, "a long string", 4);
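Because strncpy_s is not available in every C library, the rule quoted above can also be expressed in standard C. The sketch below uses two hypothetical helpers: `fits_with_terminator` restates the condition that the copied characters plus the terminator must fit in the destination, and `copy_truncate` imitates the truncating behaviour described for _TRUNCATE. Neither is the CRT implementation.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if copying up to `count` characters of src plus a terminating
   NUL fits into a destination of dst_size bytes, otherwise 0. */
int fits_with_terminator(size_t dst_size, const char *src, size_t count)
{
    size_t len = strlen(src);
    size_t n = len < count ? len : count;
    return n + 1 <= dst_size;
}

/* Copies as much of src as fits into dst and always terminates it.
   Returns 0 if the whole string was copied, 1 if it was truncated,
   and -1 if the arguments are unusable. */
int copy_truncate(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    size_t len = strlen(src);
    size_t n = len < dst_size - 1 ? len : dst_size - 1;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return len > n ? 1 : 0;
}
```

For a five-byte destination and the source "a long string", `fits_with_terminator` reports failure for a count of 5 and success for a count of 4, and `copy_truncate` stores "a lo" in the destination.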
For a more detailed way of investigating ACTIONABLE_HEAP_CORRUPTION_heap_failure_buffer_overrun, see also the following example:
http://blogs.msdn.com/b/jiangyue/archive/2010/03/16/windows-heap-overrun-monitoring.aspx