您的位置:首页 > 其它

怎样分析crash dump(软件看门狗超时)

2014-09-15 10:28 281 查看
软件看门狗超时


1.Log | tail -xxx

从log得知,发生softlock的CPU为CPU0

[ 240.220702:0] BUG: soft lockup - CPU#0stuck for 22s! [kworker/u:2:803]

[ 240.227296:0] Modules linked in:

[ 240.227305:0]

[ 240.227311:0] Pid: 803, comm: kworker/u:2

[ 240.227322:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #654)

[ 240.227344:0] PC is at smd_ch_irq_tasklet_handler+0x58/0x124

[ 240.227359:0] LR is at console_unlock+0x1c0/0x1d4

[ 240.227370:0] pc : [<c0293318>] lr : [<c0028fe0>] psr: 60000113

[ 240.227376:0] sp : edb9fde0 ip : edb9fcf0 fp : edb9fe04

[ 240.227386:0] r10: 00902000 r9 : 00000089 r8 : c0830db0

[ 240.227396:0] r7 : 00000000 r6 : c0857394 r5 : ee218800 r4 :c09ceb68

[ 240.227406:0] r3 : c0841c24 r2 : c0841c24 r1 : 20000193 r0 :0000003d

[ 240.227418:0] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel

[ 240.227430:0] Control: 10c53c7d Table: af53c04a DAC: 00000015


2.ps-l

通过ps-l得到最后运行的进程和每个进程的开始时间点

helpps

-l display the task last_run or timestamp value, whichever applies,

ofselected, or all, tasks; the list is sorted with the most

recently-runtask (largest last_run/timestamp) shown first,

followedby the task's current state.

crash>ps -l

[240535764861][RU] PID: 1909 TASK: df34d260 CPU: 1 COMMAND: "MediaScannerSer"

[240535531694][IN] PID: 1685 TASK: ee237700 CPU: 1 COMMAND: "ActivityManager"

[240535306361][IN] PID: 1673 TASK: ee202200 CPU: 1 COMMAND: "Compiler"

[240427327861][IN] PID: 1583 TASK: ee1f0a20 CPU: 1 COMMAND: "SurfaceFlinger"

[240300713777][IN] PID: 11 TASK: ee0a88a0 CPU: 1 COMMAND: "watchdog/1"

[240296101611][IN] PID: 1694 TASK: ed545380 CPU: 1 COMMAND: "PowerManagerSer"

[240040696777][IN] PID: 761 TASK: ee1f1aa0 CPU: 1 COMMAND: "kworker/1:2"

[239822763611][IN] PID: 861 TASK: ee1af640 CPU: 1 COMMAND: "file-storage"

[235566477694][IN] PID: 1751 TASK: e286d560 CPU: 1 COMMAND: "Compiler"

[235563934944][IN] PID: 1683 TASK: ee1f01e0 CPU: 1 COMMAND: "UI"

[235560073027][IN] PID: 1742 TASK: e2871500 CPU: 1 COMMAND: "ndroid.systemui"

[235559963110][IN] PID: 1586 TASK: ee15cd60 CPU: 1 COMMAND: "EventThread"

[235543914860][IN] PID: 1844 TASK: e2adda20 CPU: 1 COMMAND: "ndroid.launcher"

[235543787944][IN] PID: 1382 TASK: ee225b00 CPU: 1 COMMAND: "surfaceflinger"

[235541809610][IN] PID: 1682 TASK: ee2372e0 CPU: 1 COMMAND: "er.ServerThread"

[235540563194][IN] PID: 1918 TASK: e2add600 CPU: 1 COMMAND: "Binder_7"

[235537902277][IN] PID: 1878 TASK: e2a285c0 CPU: 1 COMMAND: "Binder_1"

[235536598110][IN] PID: 1902 TASK: e2bdba40 CPU: 1 COMMAND: "Binder_5"

[235536269277][IN] PID: 1587 TASK: ee202620 CPU: 1 COMMAND: "Binder_1"

[235513680610][IN] PID: 1760 TASK: e28708a0 CPU: 1 COMMAND: "Binder_2"

[235512269110][IN] PID: 1666 TASK: ee1102a0 CPU: 1 COMMAND: "system_server"

[235511906694][IN] PID: 1614 TASK: ee111320 CPU: 1 COMMAND: "Binder_2"

[235509703360][IN] PID: 1759 TASK: e4fd5120 CPU: 1 COMMAND: "Binder_1"

[235508464944][IN] PID: 1888 TASK: df34ce40 CPU: 1 COMMAND: "Binder_3"

[235506297110][IN] PID: 1994 TASK: ddb6dbc0 CPU: 1 COMMAND: "Binder_4"

[235505862777][IN] PID: 1960 TASK: df324a00 CPU: 1 COMMAND: "Binder_3"

[235500523194][IN] PID: 1696 TASK: ee0da080 CPU: 1 COMMAND: "AlarmManager"

[234844758777][IN] PID: 1702 TASK: ee202e60 CPU: 1 COMMAND: "InputDispatcher"

[234844638193][IN] PID: 1703 TASK: e2915900 CPU: 1 COMMAND: "InputReader"

[234844421777][IN] PID: 1811 TASK: ed544720 CPU: 1 COMMAND: "watchdog"

[220496212025][IN] PID: 1966 TASK: ddbd0260 CPU: 1 COMMAND: "Binder_3"

[219740084192][IN] PID: 1862 TASK: e2bdade0 CPU: 1 COMMAND: "Compiler"

[218960706192][IN] PID: 1715 TASK: ee14ad40 CPU: 1 COMMAND: "WifiStateMachin"

[218916116442][IN] PID: 1684 TASK: ee202a40 CPU: 1 COMMAND: "WindowManager"

[218891194192][IN] PID: 1916 TASK: e2a6f160 CPU: 1 COMMAND: "RILReceiver"

[218891084358][IN] PID: 1427 TASK: ee25af40 CPU: 1 COMMAND: "rild-tl"

[218890785025][IN] PID: 1912 TASK: df325a80 CPU: 1 COMMAND: "RILSender"

[218878943358][IN] PID: 1736 TASK: e4fd44c0 CPU: 1 COMMAND: "AudioService"

[218863686442][IN] PID: 1377 TASK: ee18e160 CPU: 1 COMMAND: "servicemanager"

[218817791608][IN] PID: 2432 TASK: e36919a0 CPU: 1 COMMAND: "KeyguardWidgetP"

[218803093692][IN] PID: 1836 TASK: e2b2d1c0 CPU: 1 COMMAND: "m.android.phone"

[218801150275][IN] PID: 1915 TASK: dd903ae0 CPU: 1 COMMAND: "Binder_6"

[218799726775][IN] PID: 1886 TASK: df34d680 CPU: 1 COMMAND: "Binder_2"

[218798570692][IN] PID: 1679 TASK: ee236680 CPU: 1 COMMAND: "Binder_1"

[218798175858][IN] PID: 1890 TASK: df34c600 CPU: 1 COMMAND: "Binder_4"

[218797752775][IN] PID: 1680 TASK: ee24a2c0 CPU: 1 COMMAND: "Binder_2"

[218793706108][IN] PID: 1856 TASK: e2adc9a0 CPU: 1 COMMAND: "Compiler"

[218790382442][IN] PID: 1872 TASK: e2a28e00 CPU: 1 COMMAND: "Binder_1"

[218788745108][IN] PID: 2150 TASK: da3af160 CPU: 1 COMMAND: "Binder_5"

[218786199358][IN] PID: 2152 TASK: e6f8a120 CPU: 1 COMMAND: "Binder_6"

[218782019442][IN] PID: 1713 TASK: ee2152a0 CPU: 1 COMMAND: "NetworkPolicy"

[218780936858][IN] PID: 1385 TASK: ee2252c0 CPU: 1 COMMAND: "mediaserver"

[218768044275][IN] PID: 1724 TASK: e2937100 CPU: 1 COMMAND: "WifiWatchdogSta"

[218760058275][IN] PID: 2136 TASK: ee25a2e0 CPU: 1 COMMAND: "Binder_4"

[218703090608][IN] PID: 1448 TASK: ee2036a0 CPU: 1 COMMAND: "SurfaceFlinger"

[218262044192][IN] PID: 1978 TASK: dd04a6e0 CPU: 1 COMMAND: "Binder_3"

[218241775108][IN] PID: 10 TASK: ee0a90e0 CPU: 1 COMMAND: "ksoftirqd/1"

[218241682358][IN] PID: 5 TASK: ee02e880 CPU: 1 COMMAND: "kworker/u:0"

[218164762192][IN] PID: 523 TASK: ee18e9a0 CPU: 1 COMMAND: "irq/47-ricoh61x"

[216168824525][IN] PID: 1852 TASK: e2828960 CPU: 1 COMMAND: "UI"

[216168802275][IN] PID: 1848 TASK: e2828d80 CPU: 1 COMMAND: "UI"

[216168734858][IN] PID: 1851 TASK: e2a6e500 CPU: 1 COMMAND: "UI"

[216160701275][IN] PID: 14 TASK: ee0db940 CPU: 1 COMMAND: "kworker/u:1"

[216140694191][IN] PID: 1664 TASK: ee0f5560 CPU: 1 COMMAND: "flush-179:8"

[215897614108][IN] PID: 1676 TASK: ee236aa0 CPU: 1 COMMAND: "FinalizerWatchd"

[214352551858][RU] PID: 803 TASK: ee180280 CPU: 0 COMMAND: "kworker/u:2"

从上可知CPU0从时间点[214352551858]开始运行在worker/u:2直到在时间点[240.220702]crash发生。

[240.220702:0]BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u:2:803]

到此,可以看出问题肯定发生在kworker/u:2:803进程中

3.各进程使用CPU0的情况

仍然分析ps-l的内容,问题肯定出在上次调用watchdog/0的时间之后,看有没有占据CPU0很长时间4S左右的进程,可以看出只有 kworker/u:2存在可能性。

[214352551858][RU] PID: 803 TASK: ee180280 CPU: 0 COMMAND: "kworker/u:2"

[214352538524][IN] PID: 1428 TASK: ee25ab20 CPU: 1 COMMAND: "rild-tl"

[214352414274][IN] PID: 1678 TASK: ee180ee0 CPU: 0 COMMAND: "rild-tl"

[214352399358][RU] PID: 873 TASK: ee16d9e0 CPU: 0 COMMAND: "kworker/0:2"

[214276534358][IN] PID: 1986 TASK: dd04a2c0 CPU: 0 COMMAND: "ndroid.systemui"

[214276494441][IN] PID: 1988 TASK: dd3dc6c0 CPU: 0 COMMAND: "ndroid.systemui"

[214276416441][IN] PID: 1987 TASK: dd3dcae0 CPU: 0 COMMAND: "ndroid.systemui"

[214223750024][IN] PID: 1991 TASK: dd3e4b20 CPU: 0 COMMAND: "ndroid.launcher"

[214218263941][IN] PID: 1885 TASK: df34daa0 CPU: 1 COMMAND: "Binder_2"

[214217474524][IN] PID: 1993 TASK: dd3e42e0 CPU: 0 COMMAND: "ndroid.launcher"

[214217265441][IN] PID: 1992 TASK: dd3e4700 CPU: 0 COMMAND: "ndroid.launcher"

[214157832524][IN] PID: 2350 TASK: e78dd900 CPU: 0 COMMAND: "m.android.phone"

[214156591108][IN] PID: 2352 TASK: e78dd0c0 CPU: 0 COMMAND: "m.android.phone"

[214156309524][IN] PID: 2351 TASK: e78dd4e0 CPU: 0 COMMAND: "m.android.phone"

[214136821691][IN] PID: 1657 TASK: ee17e980 CPU: 1 COMMAND: "Binder_1"

[214116278774][IN] PID: 2246 TASK: e368aaa0 CPU: 0 COMMAND: "AsyncTask#4"

[214115527108][IN] PID: 1681 TASK: ee25b780 CPU: 0 COMMAND: "SensorService"

[214111774608][IN] PID: 3 TASK: ee02f0c0 CPU: 0 COMMAND: "ksoftirqd/0"

[212220695691][RU] PID: 7 TASK: ee02e040 CPU: 0 COMMAND: "watchdog/0"

4.traceshow

使用traceshow分析进程的调用情况,下面只关心watchdog/0进程的调度情况。

可以看出watchdog/0大概每4S被调度一次。

正常情况下。在[212.220733+4s]时间点,watchdog/0应该得到运行,但是这个时间被kworker/u:2

占用。

rild-tl-1678 [000] 214.352559: sched_switch: 1678:120:S ==> 803:120:kworker/u:2

traceshow | grep watchdog/0

<idle>-0 [000] 136.220727: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 136.220748: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 140.220727: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 140.220748: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 144.220716: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 144.220735: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 148.220727: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 148.220749: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 152.220729: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 152.220750: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 156.220726: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 156.220747: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 160.220727: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 160.220749: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 164.220728: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 164.220749: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 168.220727: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 168.220749: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 172.220728: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 172.220750: sched_switch: 7:0:S ==> 0:120:swapper/0

MediaScannerSer-1909 [000] 176.220720: sched_switch: 1909:131:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 176.220736: sched_switch: 7:0:S ==> 1909:131:MediaScannerSer

MediaScannerSer-1909 [000] 180.220719: sched_switch: 1909:131:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 180.220734: sched_switch: 7:0:S ==> 1909:131:MediaScannerSer

MediaScannerSer-1909 [000] 184.220755: sched_switch: 1909:131:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 184.220780: sched_switch: 7:0:S ==> 1909:131:MediaScannerSer

MediaScannerSer-1909 [000] 188.220725: sched_switch: 1909:131:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 188.220743: sched_switch: 7:0:S ==> 1909:131:MediaScannerSer

<idle>-0 [000] 192.220715: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 192.220731: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 196.220719: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 196.220738: sched_switch: 7:0:S ==> 0:120:swapper/0

MediaScannerSer-1909 [000] 200.220716: sched_switch: 1909:131:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 200.220731: sched_switch: 7:0:S ==> 1909:131:MediaScannerSer

<idle>-0 [000] 204.220721: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 204.220739: sched_switch: 7:0:S ==> 0:120:swapper/0

<idle>-0 [000] 208.220713: sched_switch: 0:120:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 208.220729: sched_switch: 7:0:S ==> 0:120:swapper/0

MediaScannerSer-1909 [000] 212.220718: sched_switch: 1909:131:R ==> 7:0:watchdog/0

watchdog/0-7 [000] 212.220733: sched_switch: 7:0:S ==> 1909:131:MediaScannerSer


找到问题点

已经确认出问题的进程为kworker/u:2

crash>bt -a

PID:803 TASK: ee180280 CPU: 0 COMMAND: "kworker/u:2"

#0[<c00772c0>] (crash_kexec) from [<c05d3bd0>]

#1[<c05d3b34>] (panic) from [<c007c348>]

#2[<c007c1f8>] (watchdog_timer_fn) from [<c004af9c>]

#3[<c004ae48>] (__run_hrtimer) from [<c004b338>]

#4[<c004b220>] (hrtimer_interrupt) from [<c0014864>]

#5[<c0014828>] (twd_handler) from [<c007fac8>]

#6[<c007f9c4>] (handle_percpu_devid_irq) from [<c007c7d0>]

#7[<c007c79c>] (generic_handle_irq) from [<c000f4a0>]

#8[<c000f41c>] (handle_IRQ) from [<c00084a4>]

#9[<c000845c>] (gic_handle_irq) from [<c000df40>]

#10[<c02932c4>] (smd_ch_irq_tasklet_handler) from [<c002f05c>]

#11[<c002efb8>] (tasklet_action) from [<c002f488>]

#12[<c002f388>] (__do_softirq) from [<c002f78c>]

#13[<c002f73c>] (irq_exit) from [<c000f4a4>]

#14[<c000f41c>] (handle_IRQ) from [<c00084a4>]

#15[<c000845c>] (gic_handle_irq) from [<c000df40>]

#16[<c0223ae4>] (dsi_config) from [<c0224880>]

#17[<c02247b8>] (nu7_dsi_early_resume) from [<c0064558>]

#18[<c006449c>] (late_resume) from [<c0040740>]

#19[<c00404d0>] (process_one_work) from [<c0040c88>]

#20[<c0040ac8>] (worker_thread) from [<c0046e34>]

#21[<c0046da0>] (kthread) from [<c002c3e8>]

#14[<c000f41c>] (handle_IRQ) from [<c00084a4>]

#15[<c000845c>] (gic_handle_irq) from [<c000df40>]

#16[<c0223ae4>] (dsi_config) from [<c0224880>]

#17[<c02247b8>] (nu7_dsi_early_resume) from [<c0064558>]

#18[<c006449c>] (late_resume) from [<c0040740>]

#19[<c00404d0>] (process_one_work) from [<c0040c88>]

#20[<c0040ac8>] (worker_thread) from [<c0046e34>]

#21[<c0046da0>] (kthread) from [<c002c3e8>]

问题是当执行函数smd_ch_irq_tasklet_handler时,timer中断发生,从此处开始看

staticvoid smd_ch_irq_tasklet_handler(unsigned long data)

{

staticnumber=0;

unsignedchar *ptr;

intavail;

structtty_struct *tty;

intindex=data;

unsignedlong flags;

structsmd_tty_info *tty_info = &driver_info->smd_tty[index];

if(index==0)

pr_err("smd_ch_irq_tasklet_handler:get number %d\n", number++);

if(number> 2700){

pr_err("smd_ch_irq_tasklet_handler:dead loop\n");

for(;;){

}

}

---------------------

}

问题是在tasklet函数中出现死循环导致的,因为tasklet的优先级高于watchdog/0的,导致watchdog/0得不到机会运行,最终导致看门狗超时softlock.
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: