
Debugging Deadlocks

2016-06-03 12:26
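To begin, a borrowed definition: a deadlock is a situation in which two or more processes, while executing, end up waiting on each other because they compete for resources; without intervention from outside, none of them can make further progress. I don't think "process" is quite accurate here; "thread" would be more precise. Below I discuss how to debug deadlocks with WinDbg, first in user mode and then in kernel mode.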

--------------------------------------User mode---------------------------------------------------------------

When a process appears to be hung, use ~*kb to display the call stacks of all threads in the process. Suppose you find a thread calling ntdll!RtlEnterCriticalSection, such as:

2  Id: 4cb4.f10 Suspend: 1 Teb: 7efd7000 Unfrozen
ChildEBP RetAddr  Args to Child
0232fe30 773de70a 000000dc 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
0232fe94 773de5ee 00000000 00000000 00000000 ntdll!RtlpWaitOnCriticalSection+0x13e
0232febc 00cb1054 00cb33b4 0232fed4 748c338a ntdll!RtlEnterCriticalSection+0x150
0232fec8 748c338a 00000000 0232ff14 773c9902 DeadLock!trd4Fun+0x24
0232fed4 773c9902 00000000 61ce2210 00000000 kernel32!BaseThreadInitThunk+0xe
0232ff14 773c98d5 00cb1030 00000000 00000000 ntdll!__RtlUserThreadStart+0x70
0232ff2c 00000000 00cb1030 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b


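This confirms that the thread is waiting to enter a critical section. Its address, 00cb33b4, is the first argument passed to ntdll!RtlEnterCriticalSection. Next, inspect that critical section with the !cs command.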
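The address comes from the "Args to Child" column: on these 32-bit x86 stacks, the first value after RetAddr in the ntdll!RtlEnterCriticalSection frame is the critical section pointer passed by the caller. (On x64 the first arguments travel in registers, so kb's argument columns cannot be read this way.) If you are scanning a long saved ~*kb log, a small helper can pull the address out of each frame line. CritSecFromKbLine is a hypothetical name for illustration, not part of WinDbg.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: given one frame line of x86 `kb` output
// ("ChildEBP RetAddr Arg1 Arg2 Arg3 Symbol"), return Arg1 when the frame is
// ntdll!RtlEnterCriticalSection; Arg1 is then the critical section address.
std::optional<std::string> CritSecFromKbLine(const std::string& line) {
    std::istringstream in(line);
    std::string childEbp, retAddr, arg1, arg2, arg3, symbol;
    if (!(in >> childEbp >> retAddr >> arg1 >> arg2 >> arg3 >> symbol)) {
        return std::nullopt;  // header line or otherwise not a frame line
    }
    const std::string target = "ntdll!RtlEnterCriticalSection";
    // Symbols look like "ntdll!RtlEnterCriticalSection+0x150": match the prefix...
    if (symbol.compare(0, target.size(), target) != 0) {
        return std::nullopt;
    }
    // ...and require it to be followed by the "+offset" part or nothing at all.
    if (symbol.size() > target.size() && symbol[target.size()] != '+') {
        return std::nullopt;
    }
    return arg1;
}
```

For the RtlEnterCriticalSection frame of thread 2 above it returns "00cb33b4"; for the ZwWaitForSingleObject frame it returns nothing.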

0:003> !cs 00cb33b4
-----------------------------------------
Critical section   = 0x00cb33b4 (DeadLock!cs1+0x0)
DebugInfo          = 0x00510c70
LOCKED
LockCount          = 0x1
WaiterWoken        = No
OwningThread       = 0x00003600
RecursionCount     = 0x1
LockSemaphore      = 0xDC
SpinCount          = 0x00000000
The critical section is already owned by another thread, whose ID is 0x3600. Look at that thread's call stack:

1  Id: 4cb4.3600 Suspend: 1 Teb: 7efda000 Unfrozen
ChildEBP RetAddr  Args to Child
0222fb50 773de70a 000000e0 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
0222fbb4 773de5ee 00000000 00000000 00000000 ntdll!RtlpWaitOnCriticalSection+0x13e
0222fbdc 00cb1024 00cb339c 0222fbf4 748c338a ntdll!RtlEnterCriticalSection+0x150
0222fbe8 748c338a 00000000 0222fc34 773c9902 DeadLock!trd3Fun+0x24 [e:\work\9tenl-firewall\9tenl-firewall\passthru5x\deadlock\deadlock.cpp @ 53]
0222fbf4 773c9902 00000000 61de2130 00000000 kernel32!BaseThreadInitThunk+0xe
0222fc34 773c98d5 00cb1000 00000000 00000000 ntdll!__RtlUserThreadStart+0x70
0222fc4c 00000000 00cb1000 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b
Thread 0x3600 is also waiting to enter a critical section. Inspect the critical section it is waiting on:

0:003> !cs 00cb339c
-----------------------------------------
Critical section   = 0x00cb339c (DeadLock!cs2+0x0)
DebugInfo          = 0x00510c98
LOCKED
LockCount          = 0x1
WaiterWoken        = No
OwningThread       = 0x00000f10
RecursionCount     = 0x1
LockSemaphore      = 0xE0  // actually an event handle: threads waiting to enter this critical section wait on it (compare handle 000000e0 in thread 0x3600's stack)
SpinCount          = 0x00000000
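The critical section it is waiting on is owned by thread 0xf10, and thread 0xf10 is in turn waiting for the critical section owned by thread 0x3600, so the two threads are deadlocked.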
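The reasoning above is a walk around a wait-for graph: each blocked thread points at the lock it is waiting on (read from ~*kb), and each lock points at its OwningThread (read from !cs). A deadlock is a cycle in that graph. The sketch below assumes you have transcribed those two relationships by hand; WaitSnapshot and FindDeadlockCycle are hypothetical names, not a debugger API.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// "Who waits for what", with thread IDs and lock addresses kept as the hex
// strings WinDbg prints.
struct WaitSnapshot {
    std::map<std::string, std::string> waitsOn;  // thread ID -> lock it is blocked on (~*kb)
    std::map<std::string, std::string> ownedBy;  // lock -> owning thread ID (OwningThread in !cs)
};

// Walk thread -> lock -> owner -> lock -> ... starting at `start`.
// Returns the threads forming a cycle (a deadlock), or an empty vector if the
// chain ends at a thread that is not blocked or at a lock with no known owner.
std::vector<std::string> FindDeadlockCycle(const WaitSnapshot& s,
                                           const std::string& start) {
    std::vector<std::string> chain;
    std::set<std::string> seen;
    std::string t = start;
    while (seen.insert(t).second) {  // stop as soon as a thread repeats
        chain.push_back(t);
        auto w = s.waitsOn.find(t);
        if (w == s.waitsOn.end()) return {};  // thread is not blocked
        auto o = s.ownedBy.find(w->second);
        if (o == s.ownedBy.end()) return {};  // lock is free or owner unknown
        t = o->second;
    }
    // `t` was visited before; the cycle runs from its first occurrence onward.
    auto first = std::find(chain.begin(), chain.end(), t);
    return std::vector<std::string>(first, chain.end());
}
```

With the values from this example (thread f10 waiting on 00cb33b4, owned by 3600; thread 3600 waiting on 00cb339c, owned by f10), FindDeadlockCycle(s, "f10") returns the two threads f10 and 3600.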

To see the call stack of the thread that owns a critical section in one step, use !cs -o:

0:003> !cs  -o 00cb33b4
-----------------------------------------
Critical section   = 0x00cb33b4 (DeadLock!cs1+0x0)
DebugInfo          = 0x00510c70
LOCKED
LockCount          = 0x1
WaiterWoken        = No
OwningThread       = 0x00003600
RecursionCount     = 0x1
LockSemaphore      = 0xDC
SpinCount          = 0x00000000
OwningThread DbgId = ~1s
OwningThread Stack =
ChildEBP RetAddr  Args to Child
0222fb50 773de70a 000000e0 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15 (FPO: [3,0,0]) // handle 0xe0 waited on here is the LockSemaphore of cs2 (00cb339c), the critical section this owner is itself trying to enter
0222fbb4 773de5ee 00000000 00000000 00000000 ntdll!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
0222fbdc 00cb1024 00cb339c 0222fbf4 748c338a ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
0222fbe8 748c338a 00000000 0222fc34 773c9902 DeadLock!trd3Fun+0x24 (FPO: [Non-Fpo]) (CONV: stdcall)
0222fbf4 773c9902 00000000 61de2130 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
0222fc34 773c98d5 00cb1000 00000000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
0222fc4c 00000000 00cb1000 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
This saves a step when debugging.

Alternatively, !critsec can be used instead of !cs; it produces a slightly different dump:

0:003> !critsec  00cb33b4

CritSec DeadLock!cs1+0 at 00cb33b4
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       3600
EntryCount         0
ContentionCount    1
*** Locked
0:003> !cs  00cb33b4
-----------------------------------------
Critical section   = 0x00cb33b4 (DeadLock!cs1+0x0)
DebugInfo          = 0x00510c70
LOCKED
LockCount          = 0x1
WaiterWoken        = No
OwningThread       = 0x00003600
RecursionCount     = 0x1
LockSemaphore      = 0xDC
SpinCount          = 0x00000000
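For the meaning of each field, see the WinDbg help topic "Displaying a Critical Section". The !cs command also has some advanced uses, for example !cs -t, which prints a critical section tree but requires Application Verifier support; see the documentation for details.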
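The !cs output above already shows decoded values (LockCount, WaiterWoken). If you read the raw LockCount field yourself, for example with dt ntdll!_RTL_CRITICAL_SECTION, my understanding of the encoding that help topic describes for Windows Server 2003 SP1 and later is: bit 0 clear means locked, bit 1 clear means a waiter has been woken, and the remaining bits hold the ones-complement of the number of waiting threads. DecodeLockCount is a hypothetical helper sketching that reading; check the bit layout against the documentation for your Windows version.

```cpp
#include <cassert>
#include <cstdint>

// Decoded view of a raw CRITICAL_SECTION LockCount value.
struct LockCountInfo {
    bool locked;
    bool waiterWoken;
    std::uint32_t waiters;
};

// Assumed encoding (Windows Server 2003 SP1 and later, per the WinDbg docs):
//   bit 0     : 0 = locked, 1 = not locked
//   bit 1     : 0 = a waiting thread has been woken, 1 = none woken
//   bits 2..31: ones-complement of the number of waiting threads
LockCountInfo DecodeLockCount(std::int32_t raw) {
    const auto bits = static_cast<std::uint32_t>(raw);
    LockCountInfo info;
    info.locked = (bits & 0x1u) == 0;
    info.waiterWoken = (bits & 0x2u) == 0;
    info.waiters = (~bits) >> 2;  // complement on unsigned, then drop the two flag bits
    return info;
}
```

A raw value of -1 (0xFFFFFFFF) decodes to an unlocked section with no waiters, and -6 (0xFFFFFFFA) to a locked section with one waiter and no waiter woken, the same state !cs reports for cs1 above (LOCKED, LockCount = 0x1, WaiterWoken = No).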

The user-mode debugger also supports the !locks command, which by default prints the critical sections that are currently owned:

0:003> !locks

CritSec MSVCR100!__app_type+38 at 715f5460
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       1a80
EntryCount         0
ContentionCount    0
*** Locked

CritSec DeadLock!cs1+0 at 00cb33b4
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       3600
EntryCount         0
ContentionCount    1
*** Locked

CritSec DeadLock!cs2+0 at 00cb339c
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       f10
EntryCount         0
ContentionCount    1
*** Locked

Scanned 166 critical sections
To print all critical sections, use !locks -v; see the documentation for other options.

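Another thought: in a server application that uses critical sections for synchronization and hits a performance bottleneck suspected to come from lock contention, you can log the critical section's state (owning thread, number of waiting threads, and so on) before entering it, to help pinpoint where the bottleneck is.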
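A minimal sketch of that idea in portable C++ rather than the Win32 critical section API: a hypothetical InstrumentedMutex wrapper records how often lock() found the lock already held and how many threads are currently queued behind it, roughly the kind of numbers !critsec shows as ContentionCount and LockCount. A real server would log these values (and the owner) at the marked point instead of only counting them.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical wrapper around std::mutex that keeps contention statistics.
class InstrumentedMutex {
public:
    void lock() {
        if (!m_.try_lock()) {
            // Held by someone else: a server could log owner_ and waiting_ here.
            // std::mutex::try_lock may fail spuriously, so counts are approximate.
            contentions_.fetch_add(1, std::memory_order_relaxed);
            waiting_.fetch_add(1, std::memory_order_relaxed);
            m_.lock();
            waiting_.fetch_sub(1, std::memory_order_relaxed);
        }
        owner_.store(std::this_thread::get_id());
    }

    void unlock() {
        owner_.store(std::thread::id{});  // "no owner", like OwningThread = 0
        m_.unlock();
    }

    std::uint64_t contentions() const { return contentions_.load(std::memory_order_relaxed); }
    int waiting() const { return waiting_.load(std::memory_order_relaxed); }
    std::thread::id owner() const { return owner_.load(); }

private:
    std::mutex m_;
    std::atomic<std::uint64_t> contentions_{0};
    std::atomic<int> waiting_{0};
    std::atomic<std::thread::id> owner_{std::thread::id{}};
};
```

Because it provides lock() and unlock(), it can be used with std::lock_guard<InstrumentedMutex> in place of a plain std::mutex.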

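That is enough on critical sections. If a thread is blocked in WaitForSingleObject, you can first use !handle [Handle] f to print detailed information about the object being waited on, for example: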

1  Id: 4cbc.337c Suspend: 1 Teb: 7efda000 Unfrozen
ChildEBP RetAddr  Args to Child
00c5f860 76ae15ce 000000d8 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
00c5f8cc 748c1194 000000d8 ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x98
00c5f8e4 748c1148 000000d8 ffffffff 00000000 kernel32!WaitForSingleObjectExImplementation+0x75
00c5f8f8 0125102b 000000d8 ffffffff 00c5f914 kernel32!WaitForSingleObject+0x12
00c5f908 748c338a 00000000 00c5f954 773c9902 DeadLock!trd5Fun+0x2b
00c5f914 773c9902 00000000 637cc04a 00000000 kernel32!BaseThreadInitThunk+0xe
00c5f954 773c98d5 01251000 00000000 00000000 ntdll!__RtlUserThreadStart+0x70
00c5f96c 00000000 01251000 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b

0:003> !handle d8 f
Handle d8
Type         	Mutant
Attributes   	0
GrantedAccess	0x1f0001:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState
HandleCount  	2
PointerCount 	5
Name         	\Sessions\1\BaseNamedObjects\DL2
Object Specific Information
Mutex is Owned
Mutant Owner 4cbc.4474
Thread 4cbc.337c is waiting to acquire the mutex whose handle value is d8, and that mutex is currently owned by 4cbc.4474 (process ID.thread ID).
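Both parts of the owner field are hexadecimal, so 4cbc.4474 is thread 0x4474 in process 0x4cbc, the same process as the waiting thread 4cbc.337c. When there are many !handle dumps to go through, a small parser can turn that line into numbers. ParseMutantOwner is a hypothetical helper, and the line format is taken from the output above, so it may differ between debugger versions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Client ID as printed by WinDbg: "<process id>.<thread id>", both in hex.
struct ClientId {
    std::uint32_t pid;
    std::uint32_t tid;
};

// Extract the owner from a "Mutant Owner 4cbc.4474" line of `!handle <h> f`
// output. Returns nullopt for any other line.
std::optional<ClientId> ParseMutantOwner(const std::string& line) {
    const std::string prefix = "Mutant Owner ";
    const auto pos = line.find(prefix);
    if (pos == std::string::npos) return std::nullopt;
    const std::string cid = line.substr(pos + prefix.size());
    const auto dot = cid.find('.');
    if (dot == std::string::npos) return std::nullopt;
    try {
        const unsigned long pid = std::stoul(cid.substr(0, dot), nullptr, 16);
        const unsigned long tid = std::stoul(cid.substr(dot + 1), nullptr, 16);
        return ClientId{static_cast<std::uint32_t>(pid), static_cast<std::uint32_t>(tid)};
    } catch (const std::logic_error&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```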

Now look at the call stack of 4cbc.4474:

2  Id: 4cbc.4474 Suspend: 1 Teb: 7efd7000 Unfrozen
ChildEBP RetAddr  Args to Child
00dcf92c 76ae15ce 000000d4 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
00dcf998 748c1194 000000d4 ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x98
00dcf9b0 748c1148 000000d4 ffffffff 00000000 kernel32!WaitForSingleObjectExImplementation+0x75
00dcf9c4 0125106b 000000d4 ffffffff 00dcf9e0 kernel32!WaitForSingleObject+0x12
00dcf9d4 748c338a 00000000 00dcfa20 773c9902 DeadLock!trd6Fun+0x2b
00dcf9e0 773c9902 00000000 6365c33e 00000000 kernel32!BaseThreadInitThunk+0xe
00dcfa20 773c98d5 01251040 00000000 00000000 ntdll!__RtlUserThreadStart+0x70
00dcfa38 00000000 01251040 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b

0:003> !handle d4 f
Handle d4
Type         	Mutant
Attributes   	0
GrantedAccess	0x1f0001:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState
HandleCount  	2
PointerCount 	5
Name         	\Sessions\1\BaseNamedObjects\DL1
Object Specific Information
Mutex is Owned
Mutant Owner 4cbc.337c
Thread 4cbc.4474 is waiting for the mutex owned by 4cbc.337c, so the two threads are deadlocked.
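The mutex case has the same shape as the critical-section case: each thread holds one lock and waits for the other's. DeadLock's source is not shown, so the following is only a hypothetical portable C++ reconstruction of what trd5Fun and trd6Fun appear to do, with std::timed_mutex standing in for the named mutexes DL1 and DL2. To keep the demo from hanging, the second acquisition uses try_lock_for, and two rendezvous points make sure both threads hold their first mutex before either tries its second, and keep holding it until both attempts are over.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

struct AbbaResult {
    bool threadAGotDl2;  // thread A holds dl1, then tries dl2 (like trd5Fun)
    bool threadBGotDl1;  // thread B holds dl2, then tries dl1 (like trd6Fun)
};

// Spin until `parties` threads have arrived (a tiny stand-in for std::barrier).
static void Rendezvous(std::atomic<int>& arrived, int parties) {
    arrived.fetch_add(1);
    while (arrived.load() < parties) std::this_thread::yield();
}

// Two threads lock the same two mutexes in opposite order. Returns whether each
// thread got its second mutex within `timeout`; with untimed waits (as with
// WaitForSingleObject(h, INFINITE)) both threads would block forever instead.
AbbaResult RunAbbaDemo(std::chrono::milliseconds timeout) {
    std::timed_mutex dl1, dl2;    // hypothetical stand-ins for DL1 and DL2
    std::atomic<int> holding{0};  // threads holding their first mutex
    std::atomic<int> tried{0};    // threads done trying their second mutex
    AbbaResult r{false, false};

    auto worker = [&](std::timed_mutex& first, std::timed_mutex& second, bool& got) {
        std::lock_guard<std::timed_mutex> hold(first);
        Rendezvous(holding, 2);              // both first mutexes are now held
        got = second.try_lock_for(timeout);  // the other thread owns `second`
        if (got) second.unlock();
        Rendezvous(tried, 2);                // keep `first` until both have tried
    };

    std::thread a([&] { worker(dl1, dl2, r.threadAGotDl2); });
    std::thread b([&] { worker(dl2, dl1, r.threadBGotDl1); });
    a.join();
    b.join();
    return r;
}
```

Calling RunAbbaDemo(std::chrono::milliseconds(50)) returns false for both fields: neither thread can obtain its second mutex while the other holds it, which is exactly the cycle seen in the two !handle dumps.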

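If the resource being waited on is an event, this reasoning is hard to apply, because an event has no concept of an owner; it only has a signaled and a non-signaled state. In that case you have to work it out from the code.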
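To make the difference concrete, here is a hypothetical manual-reset event built from std::mutex and std::condition_variable (not the Win32 CreateEvent API). Its entire state is one signaled flag: unlike a mutex, there is no owner to record, so nothing in the object tells you which thread was supposed to call Set().

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical manual-reset event: the only state is `signaled_`.
class ManualResetEvent {
public:
    void Set() {
        {
            std::lock_guard<std::mutex> g(m_);
            signaled_ = true;
        }
        cv_.notify_all();  // wake every waiter; the event stays signaled
    }

    void Reset() {
        std::lock_guard<std::mutex> g(m_);
        signaled_ = false;
    }

    // Returns true if the event was (or became) signaled within `timeout`.
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_);
        return cv_.wait_for(lk, timeout, [this] { return signaled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;  // no owner field, unlike a mutex
};
```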

For reference, the full syntax of the !handle command:

!handle [Handle [UMFlags [TypeName]]]
!handle -?
When UMFlags is f, complete handle information is displayed.

------------------------------------------------Kernel mode-------------------------------------------------------------

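When using the kernel debugger, to view the handles opened by a particular process, first switch to that process with .process /r /p [Process], then use !handle to list the handles associated with it. Alternatively, without switching the current process, use !handle 0 3 [Process] [Typename] to display the handles of the specified process, for example: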

kd> !handle 0 3 88549990  Mutant

Searching for handles of type Mutant
PROCESS 88549990  SessionId: 1  Cid: 097c    Peb: 7ffd7000  ParentCid: 071c
DirBase: 3eded560  ObjectTable: 8c073fc0  HandleCount:  13.
Image: DeadLock2.exe

Handle table at 95335000 with 13 entries in use

0028: Object: 8846d610  GrantedAccess: 001f0001 Entry: 95335050
Object: 8846d610  Type: (863c9470) Mutant
ObjectHeader: 8846d5f8 (new version)
HandleCount: 1  PointerCount: 3
Directory Object: 92fc5a28  Name: DL3

002c: Object: 88522fe0  GrantedAccess: 001f0001 Entry: 95335058
Object: 88522fe0  Type: (863c9470) Mutant
ObjectHeader: 88522fc8 (new version)
HandleCount: 1  PointerCount: 3
Directory Object: 92fc5a28  Name: DL4


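To view the details of a kernel dispatcher object, use the dt command. Note that you do not need to switch to the process context associated with the object, because kernel objects live in kernel address space, which is the same for every process.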
kd> dt nt!_KMUTANT 87cc9f98
+0x000 Header           : _DISPATCHER_HEADER
+0x010 MutantListEntry  : _LIST_ENTRY [ 0x88750218 - 0x88750218 ]
+0x018 OwnerThread      : 0x88750030 _KTHREAD
+0x01c Abandoned        : 0 ''
+0x01d ApcDisable       : 0 ''
Use this information to analyze the deadlock further.
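The field that matters most here is OwnerThread (0x88750030 in this output), the owning thread's KTHREAD address, which can then be passed to !thread to see what that thread is doing. If you need to read the same field from raw bytes, a hypothetical mirror of the 32-bit layout printed by dt may help. It copies only the offsets shown above, models kernel pointers as uint32_t so the offsets also hold when compiled as 64-bit code, and is not the WDK definition; the layout differs on x64 and may differ between Windows versions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the x86 layout shown by `dt nt!_KMUTANT`.
// Kernel pointers are stored as 32-bit integers so the offsets match x86.
struct DispatcherHeader32 { std::uint8_t raw[0x10]; };  // _DISPATCHER_HEADER, 16 bytes here
struct ListEntry32 { std::uint32_t Flink; std::uint32_t Blink; };

struct KMutant32 {
    DispatcherHeader32 Header;    // +0x000
    ListEntry32 MutantListEntry;  // +0x010
    std::uint32_t OwnerThread;    // +0x018, address of the owning _KTHREAD
    std::uint8_t Abandoned;       // +0x01c
    std::uint8_t ApcDisable;      // +0x01d
};

// Compile-time check that the mirror matches the offsets printed by dt.
static_assert(offsetof(KMutant32, MutantListEntry) == 0x10, "MutantListEntry offset");
static_assert(offsetof(KMutant32, OwnerThread) == 0x18, "OwnerThread offset");
static_assert(offsetof(KMutant32, Abandoned) == 0x1c, "Abandoned offset");
static_assert(offsetof(KMutant32, ApcDisable) == 0x1d, "ApcDisable offset");

// Read OwnerThread from raw mutant bytes; x86 stores it little-endian,
// so assemble it byte by byte independent of the host's byte order.
std::uint32_t OwnerThreadFromBytes(const unsigned char* mutant) {
    const unsigned char* p = mutant + offsetof(KMutant32, OwnerThread);
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```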