您的位置:首页 > 运维架构 > Linux

使用 ftrace 调试 Linux 内核

2012-08-31 19:52 477 查看

使用ftrace调试Linux内核,第1部分

/article/2458804.html


简介:ftrace是Linux内核中提供的一种调试工具。使用ftrace可以对内核中发生的事情进行跟踪,这在调试bug或者分析内核时非常有用。本系列文章对ftrace进行了介绍,分为三部分。本文是第一部分,介绍了内核相关的编译选项、用户态访问ftrace的接口、ftrace的数据文件,并对ftrace提供的跟踪器的用途进行了介绍,以使读者更好的了解和使用该工具。

ftrace是内建于Linux内核的跟踪工具,从2.6.27开始加入主流内核。使用ftrace可以调试或者分析内核中发生的事情。ftrace提供了不同的跟踪器,以用于不同的场合,比如跟踪内核函数调用、对上下文切换进行跟踪、查看中断被关闭的时长、跟踪内核态中的延迟以及性能问题等。系统开发人员可以使用ftrace对内核进行跟踪调试,以找到内核中出现的问题的根源,方便对其进行修复。另外,对内核感兴趣的读者还可以通过ftrace来观察内核中发生的活动,了解内核的工作机制。

让内核支持ftrace

使用ftrace,首先要将其编译进内核。内核源码目录下的kernel/trace/Makefile文件给出了ftrace相关的编译选项。

清单1.ftrace相关的配置选项列表

CONFIG_FUNCTION_TRACER
CONFIG_FUNCTION_GRAPH_TRACER
CONFIG_CONTEXT_SWITCH_TRACER
CONFIG_NOP_TRACER
CONFIG_SCHED_TRACER
...

ftrace相关的配置选项比较多,针对不同的跟踪器有各自对应的配置选项。不同的选项有不同的依赖关系,内核源码目录下的kernel/trace/Kconfig文件描述了这些依赖关系。读者可以参考Makefile文件和Konfig文件,然后选中自己所需要的跟踪器。

通常在配置内核时,使用makemenuconfig会更直观一些。以2.6.33.1版本的内核为例,要将ftrace编译进内核,可以选中Kernelhacking(图1)下的Tracers菜单项(图2)。

图1.Kernelhacking



图2.Tracers



进入Tracers菜单下,可以看到内核支持的跟踪器列表。如图3所示,这里选中了所有的跟踪器,读者可以根据自己的需要选中特定的跟踪器。

图3.内核支持的跟踪器列表



这里要注意,如果是在32位x86机器上,编译时不要选中Generalsetup菜单项(图4)下的Optimizeforsize选项(图5),否则就无法看到图3中的KernelFunctionGraphTracer选项。这是因为在Konfig文件中,针对32位x86机器,表项FUNCTION_GRAPH_TRACER有一个特殊的依赖条件:

dependson!X86_32||!CC_OPTIMIZE_FOR_SIZE

图4.Generalsetup



图5.Optimizeforsize



ftrace通过debugfs向用户态提供了访问接口,所以还需要将debugfs编译进内核。激活对debugfs的支持,可以直接编辑内核配置文件.config,设置CONFIG_DEBUG_FS=y;或者在makemenuconfig时到Kernelhacking菜单下选中对debugfs文件系统的支持,如图6所示。

图6.debugfs编译选项



配置完成后,编译安装新内核,然后启动到新内核。注意,激活ftrace支持后,编译内核时会使用编译器的-pg选项,这是在kernel/trace/Makefile文件中定义的,如清单2所示。

清单2.激活编译选项-pg

ifdefCONFIG_FUNCTION_TRACER
ORIG_CFLAGS:=$(KBUILD_CFLAGS)
KBUILD_CFLAGS=$(subst-pg,,$(ORIG_CFLAGS))
...
endif
...

使用-pg选项会在编译得到的内核映像中加入大量的调试信息。一般情况下,只是在开发测试阶段激活ftrace支持,以调试内核,修复bug。最终用于发行版的内核则会关闭-pg选项,也就无法使用ftrace。

通过debugfs访问ftrace

ftrace通过debugfs向用户态提供访问接口。配置内核时激活debugfs后会创建目录/sys/kernel/debug,debugfs文件系统就是挂载到该目录。要挂载该目录,需要将如下内容添加到/etc/fstab文件:

debugfs/sys/kernel/debugdebugfsdefaults00

或者可以在运行时挂载:

mount-tdebugfsnodev/sys/kernel/debug

激活内核对ftrace的支持后会在debugfs下创建一个tracing目录/sys/kernel/debug/tracing。该目录下包含了ftrace的控制和输出文件,如图7所示。根据编译内核时针对ftrace的设定不同,该目录下实际显示的文件和目录与这里也会不同。

图7.tracing目录下的文件



回页首
ftrace的数据文件

/sys/kernel/debug/trace目录下文件和目录比较多,有些是各种跟踪器共享使用的,有些是特定于某个跟踪器使用的。在操作这些数据文件时,通常使用echo命令来修改其值,也可以在程序中通过文件读写相关的函数来操作这些文件的值。下面只对部分文件进行描述,读者可以参考内核源码包中Documentation/trace目录下的文档以及kernel/trace下的源文件以了解其余文件的用途。

README文件提供了一个简短的使用说明,展示了ftrace的操作命令序列。可以通过cat命令查看该文件以了解概要的操作流程。
current_tracer用于设置或显示当前使用的跟踪器;使用echo将跟踪器名字写入该文件可以切换到不同的跟踪器。系统启动后,其缺省值为nop,即不做任何跟踪操作。在执行完一段跟踪任务后,可以通过向该文件写入nop来重置跟踪器。
available_tracers记录了当前编译进内核的跟踪器的列表,可以通过cat查看其内容;其包含的跟踪器与图3中所激活的选项是对应的。写current_tracer文件时用到的跟踪器名字必须在该文件列出的跟踪器名字列表中。
trace文件提供了查看获取到的跟踪信息的接口。可以通过cat等命令查看该文件以查看跟踪到的内核活动记录,也可以将其内容保存为记录文件以备后续查看。
tracing_enabled用于控制current_tracer中的跟踪器是否可以跟踪内核函数的调用情况。写入0会关闭跟踪活动,写入1则激活跟踪功能;其缺省值为1。
set_graph_function设置要清晰显示调用关系的函数,显示的信息结构类似于C语言代码,这样在分析内核运作流程时会更加直观一些。在使用function_graph跟踪器时使用;缺省为对所有函数都生成调用关系序列,可以通过写该文件来指定需要特别关注的函数。
buffer_size_kb用于设置单个CPU所使用的跟踪缓存的大小。跟踪器会将跟踪到的信息写入缓存,每个CPU的跟踪缓存是一样大的。跟踪缓存实现为环形缓冲区的形式,如果跟踪到的信息太多,则旧的信息会被新的跟踪信息覆盖掉。注意,要更改该文件的值需要先将current_tracer设置为nop才可以。
tracing_on用于控制跟踪的暂停。有时候在观察到某些事件时想暂时关闭跟踪,可以将0写入该文件以停止跟踪,这样跟踪缓冲区中比较新的部分是与所关注的事件相关的;写入1可以继续跟踪。
available_filter_functions记录了当前可以跟踪的内核函数。对于不在该文件中列出的函数,无法跟踪其活动。
set_ftrace_filterset_ftrace_notrace在编译内核时配置了动态ftrace(选中CONFIG_DYNAMIC_FTRACE选项)后使用。前者用于显示指定要跟踪的函数,后者则作用相反,用于指定不跟踪的函数。如果一个函数名同时出现在这两个文件中,则这个函数的执行状况不会被跟踪。这些文件还支持简单形式的含有通配符的表达式,这样可以用一个表达式一次指定多个目标函数;具体使用在后续文章中会有描述。注意,要写入这两个文件的函数名必须可以在文件
available_filter_functions中看到。缺省为可以跟踪所有内核函数,文件set_ftrace_notrace的值则为空。

ftrace跟踪器

ftrace当前包含多个跟踪器,用于跟踪不同类型的信息,比如进程调度、中断关闭等。可以查看文件available_tracers获取内核当前支持的跟踪器列表。在编译内核时,也可以看到内核支持的跟踪器对应的选项,如之前图3所示。

nop跟踪器不会跟踪任何内核活动,将nop写入current_tracer文件可以删除之前所使用的跟踪器,并清空之前收集到的跟踪信息,即刷新trace文件。
function跟踪器可以跟踪内核函数的执行情况;可以通过文件set_ftrace_filter显示指定要跟踪的函数。
function_graph跟踪器可以显示类似C源码的函数调用关系图,这样查看起来比较直观一些;可以通过文件set_grapch_function显示指定要生成调用流程图的函数。
sched_switch跟踪器可以对内核中的进程调度活动进行跟踪。
irqsoff跟踪器和preemptoff跟踪器分别跟踪关闭中断的代码和禁止进程抢占的代码,并记录关闭的最大时长,preemptirqsoff跟踪器则可以看做它们的组合。

ftrace还支持其它一些跟踪器,比如initcall、ksym_tracer、mmiotrace、sysprof等。ftrace框架支持扩展添加新的跟踪器。读者可以参考内核源码包中Documentation/trace目录下的文档以及kernel/trace下的源文件,以了解其它跟踪器的用途和如何添加新的跟踪器。

小结

本系列文章对ftrace的配置和使用进行了介绍。本文是其中的第一部分,介绍了ftrace的编译配置、用户态访问ftrace的接口和ftrace的数据文件,并对ftrace所提供的部分跟踪器的用途进行了描述。由于篇幅的限制,本文没有对ftrace的具体使用和如何在代码中与ftrace进行交互进行描述,这些内容将分别在本系列文章的后续篇章中给出。


使用ftrace调试Linux内核,第2部分

/article/2458805.html

简介:ftrace是Linux内核中提供的一种调试工具。使用ftrace可以对内核中发生的事情进行跟踪,这在调试bug或者分析内核时非常有用。本系列文章对ftrace进行了介绍,分为三部分。本文是第二部分,通过示例介绍了如何使用ftrace中的跟踪器,并对不同格式的跟踪信息进行了描述,同时还对如何使用ftrace调试指定内核模块进行了介绍。通过对本文的学习,读者可以了解如何在实践中使用ftrace。

ftrace操作概述

使用ftrace提供的跟踪器来调试或者分析内核时需要如下操作:

切换到目录/sys/kernel/debug/tracing/下
查看available_tracers文件,获取当前内核支持的跟踪器列表
关闭ftrace跟踪,即将0写入文件tracing_enabled
激活ftrace_enabled,否则function跟踪器的行为类似于nop;另外,激活该选项还可以让一些跟踪器比如irqsoff获取更丰富的信息。建议使用ftrace时将其激活。要激活ftrace_enabled,可以通过proc文件系统接口来设置:

echo1>/proc/sys/kernel/ftrace_enabled

将所选择的跟踪器的名字写入文件current_tracer
将要跟踪的函数写入文件set_ftrace_filter,将不希望跟踪的函数写入文件set_ftrace_notrace。通常直接操作文件set_ftrace_filter就可以了
激活ftrace跟踪,即将1写入文件tracing_enabled。还要确保文件tracing_on的值也为1,该文件可以控制跟踪的暂停
如果是对应用程序进行分析的话,启动应用程序的执行,ftrace会跟踪应用程序运行期间内核的运作情况
通过将0写入文件tracing_on来暂停跟踪信息的记录,此时跟踪器还在跟踪内核的运行,只是不再向文件trace中写入跟踪信息;或者将0写入文件tracing_enabled来关闭跟踪
查看文件trace获取跟踪信息,对内核的运行进行分析调试

接下来将对跟踪器的使用以及跟踪信息的格式通过实例加以讲解。

fucntion跟踪器

function跟踪器可以跟踪内核函数的调用情况,可用于调试或者分析bug,还可用于了解和观察Linux内核的执行过程。清单1给出了使用function跟踪器的示例。

[b]清单1.function跟踪器使用示例


[cpp]
viewplaincopyprint?

<PREclass=cppname="code">[root@linuxtracing]#pwd /sys/kernel/debug/tracing [root@linuxtracing]#echo0>tracing_enabled [root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled [root@linuxtracing]#echofunction>current_tracer [root@linuxtracing]#echo1>tracing_on [root@linuxtracing]#echo1>tracing_enabled #让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪 [root@linuxtracing]#echo0>tracing_enabled [root@linuxtracing]#cattrace|head-10 #tracer:function # #TASK-PIDCPU#TIMESTAMPFUNCTION #||||| <idle>-0[000]20654.426521:_raw_spin_lock<-scheduler_tick <idle>-0[000]20654.426522:task_tick_idle<-scheduler_tick <idle>-0[000]20654.426522:cpumask_weight<-scheduler_tick <idle>-0[000]20654.426523:cpumask_weight<-scheduler_tick <idle>-0[000]20654.426523:run_posix_cpu_timers<-update_process_times <idle>-0[000]20654.426524:hrtimer_forward<-tick_sched_timer</PRE><BR>
<BR>
<PRE></PRE>
<P>trace文件给出的信息格式很清晰。首先,字段“tracer:”给出了当前所使用的跟踪器的名字,这里为function跟踪器。然后是跟踪信息记录的格式,TASK字段对应任务的名字,PID字段则给出了任务的进程ID,字段CPU#表示运行被跟踪函数的CPU号,这里可以看到idle进程运行在0号CPU上,其进程ID是0;字段TIMESTAMP是时间戳,其格式为“<secs>.<usecs><CODE>”</CODE>,表示执行该函数时对应的时间戳;FUNCTION
一列则给出了被跟踪的函数,函数的调用者通过符号“<-”标明,这样可以观察到函数的调用关系。</P>
<DIVclass=ibm-alternate-rule>
<HR>
</DIV>
<Pclass="ibm-ind-linkibm-back-to-top"></P>
<P><Aname=major3><SPANclass=atitle>function_graph跟踪器</SPAN></A></P>
<P>在function跟踪器给出的信息中,可以通过FUNCTION列中的符号“<-”来查看函数调用关系,但是由于中间会混合不同函数的调用,导致看起来非常混乱,十分不方便。function_graph跟踪器则可以提供类似C代码的函数调用关系信息。通过写文件set_graph_function可以显示指定要生成调用关系的函数,缺省会对所有可跟踪的内核函数生成函数调用关系图。清单2给出了使用function_grapch跟踪器的示例,示例中将内核函数__do_fault作为观察对象,该函数在内核运作过程中会被频繁调用。</P>
<Aname=listing2><STRONG>清单2.function_graph跟踪器使用示例</STRONG></A><SPANstyle="FONT-SIZE:16px"></SPAN>
<P>在文件trace的输出信息中,首先给出的也是当前跟踪器的名字,这里是function_graph。接下来是详细的跟踪信息,可以看到,函数的调用关系以类似C代码的形式组织。</P>
<P>CPU字段给出了执行函数的CPU号,本例中都为1号CPU。DURATION字段给出了函数执行的时间长度,以us为单位。FUNCTIONCALLS则给出了调用的函数,并显示了调用流程。注意,对于不调用其它函数的函数,其对应行以“;”结尾,而且对应的DURATION字段给出其运行时长;对于调用其它函数的函数,则在其“}”对应行给出了运行时长,该时间是一个累加值,包括了其内部调用的函数的执行时长。DURATION字段给出的时长并不是精确的,它还包含了执行ftrace自身的代码所耗费的时间,所以示例中将内部函数时长累加得到的结果会与对应的外围调用函数的执行时长并不一致;不过通过该字段还是可以大致了解函数在时间上的运行开销的。清单
2中最后通过echo命令重置了文件set_graph_function。</P>
<DIVclass=ibm-alternate-rule>
<HR>
</DIV>
<Pclass="ibm-ind-linkibm-back-to-top"></P>
<P><Aname=major4><SPANclass=atitle>sched_switch跟踪器</SPAN></A></P>
<P>sched_switch跟踪器可以对进程的调度切换以及之间的唤醒操作进行跟踪,如清单3所示。</P>
<Aname=listing3><STRONG>清单3.sched_switch跟踪器使用示例</STRONG></A><SPANstyle="FONT-SIZE:16px"></SPAN><PREclass=cppname="code">
[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echosched_switch>current_tracer
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echo1>tracing_enabled

#让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪

[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#cattrace|head-10
#tracer:sched_switch
#
#TASK-PIDCPU#TIMESTAMPFUNCTION
#|||||
bash-1408[000]26208.816058:1408:120:S+[000]1408:120:Sbash
bash-1408[000]26208.816070:1408:120:S+[000]1408:120:Sbash
bash-1408[000]26208.816921:1408:120:R+[000]9:120:Revents/0
bash-1408[000]26208.816939:1408:120:R==>[000]9:120:Revents/0
events/0-9[000]26208.817081:9:120:R+[000]1377:120:Rgnome-terminal
events/0-9[000]26208.817088:9:120:S==>[000]1377:120:Rgnome-terminal
</PRE><BR>
<BR>
<P>在sched_swich跟踪器获取的跟踪信息中记录了进程间的唤醒操作和调度切换信息,可以通过符号‘+’和‘==>’区分;唤醒操作记录给出了当前进程唤醒运行的进程,进程调度切换记录中显示了接替当前进程运行的后续进程。</P>
<P>描述进程状态的格式为“Task-PID:Priority:Task-State”。以示例跟踪信息中的第一条跟踪记录为例,可以看到进程bash的PID为1408,其对应的内核态优先级为120,当前状态为S(可中断睡眠状态),当前bash并没有唤醒其它进程;从第3条记录可以看到,进程bash将进程events/0唤醒,而在第4条记录中发生了进程调度,进程bash切换到进程events/0执行。</P>
<P>在Linux内核中,进程的状态在内核头文件include/linux/sched.h中定义,包括可运行状态TASK_RUNNING(对应跟踪信息中的符号‘R’)、可中断阻塞状态TASK_INTERRUPTIBLE(对应跟踪信息中的符号‘S’)等。同时该头文件也定义了用户态进程所使用的优先级的范围,最小值为MAX_USER_RT_PRIO(值为100),最大值为MAX_PRIO-1(对应值为139),缺省为DEFAULT_PRIO(值为120);在本例中,进程优先级都是缺省值
120。</P>
<DIVclass=ibm-alternate-rule>
<HR>
</DIV>
<Pclass="ibm-ind-linkibm-back-to-top"><BR>
</P>
<P><Aname=major5><SPANclass=atitle>irqsoff跟踪器</SPAN></A></P>
<P>当关闭中断时,CPU会延迟对设备的状态变化做出反应,有时候这样做会对系统性能造成比较大的影响。irqsoff跟踪器可以对中断被关闭的状况进行跟踪,有助于发现导致较大延迟的代码;当出现最大延迟时,跟踪器会记录导致延迟的跟踪信息,文件tracing_max_latency则记录中断被关闭的最大延时。</P>
<BR>
<Aname=listing4><STRONG>清单4.irqsoff跟踪器使用示例</STRONG></A><BR>
<SPANstyle="FONT-FAMILY:monospace"><SPANstyle="WHITE-SPACE:pre"></SPAN></SPAN><PREclass=cppname="code">
[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echoirqsoff>current_tracer
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echo1>tracing_enabled

#让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪

[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#cattrace|head-35
#tracer:irqsoff
#
#irqsofflatencytracev1.1.5on2.6.33.1
#--------------------------------------------------------------------
#latency:34380us,#6/6,CPU#1|(M:desktopVP:0,KP:0,SP:0HP:0#P:2)
#-----------------

#|task:-0(uid:0nice:0policy:0rt_prio:0)
#-----------------

#=>startedat:reschedule_interrupt
#=>endedat:restore_all_notrace
#
#
#_------=>CPU#
#/_-----=>irqs-off
#|/_----=>need-resched
#||/_---=>hardirq/softirq
#|||/_--=>preempt-depth
#||||/_--=>lock-depth
#|||||/delay
#cmdpid||||||time|caller
#\/||||||\|/
<idle>-01dN...4285us!:trace_hardirqs_off_thunk<-reschedule_interrupt
<idle>-01dN...34373us+:smp_reschedule_interrupt<-reschedule_interrupt
<idle>-01dN...34375us+:native_apic_mem_write<-smp_reschedule_interrupt
<idle>-01dN...34380us+:trace_hardirqs_on_thunk<-restore_all_notrace
<idle>-01dN...34384us:trace_hardirqs_on_caller<-restore_all_notrace
<idle>-01dN...34384us:<stacktrace>
=>trace_hardirqs_on_thunk
[root@linuxtracing]#cattracing_max_latency
34380</PRE><BR>
<BR>
<P>从清单4中的输出信息中,可以看到当前irqsoff延迟跟踪器的版本信息。接下来是最大延迟时间,以us为单位,本例中为34380us,查看文件tracing_max_latency也可以获取该值。从“task:”字段可以知道延迟发生时正在运行的进程为idle(其pid为0)。中断的关闭操作是在函数reschedule_interrupt中进行的,由“=>startedat:”标识,函数restore_all_ontrace重新激活了中断,由“=>endedat:”标识;中断关闭的最大延迟发生在函数
trace_hardirqs_on_thunk中,这个可以从最大延迟时间所在的记录项看到,也可以从延迟记录信息中最后的“=>”标识所对应的记录行知道这一点。</P>
<P>在输出信息中,irqs-off、need_resched等字段对应于进程结构structtask_struct的字段或者状态标志,可以从头文件arch/<platform>/include/asm/thread_info.h中查看进程支持的状态标志,include/linux/sched.h则给出了结构structtask_struct的定义。其中,irqs-off字段显示是否中断被禁止,为‘d’表示中断被禁止;need_resched字段显示为‘N’表示设置了进程状态标志
TIF_NEED_RESCHED。hardirq/softirq字段表示当前是否发生硬件中断/软中断;preempt-depth表示是否禁止进程抢占,例如在持有自旋锁的情况下进程是不能被抢占的,本例中进程idle是可以被其它进程抢占的。结构structtask_struct中的lock_depth字段是与大内核锁相关的,而最近的内核开发工作中有一部分是与移除大内核锁相关的,这里对该字段不再加以描述。</P>
<P>另外,还有用于跟踪禁止进程抢占的跟踪器preemptoff和跟踪中断/抢占禁止的跟踪器preemptirqsoff,它们的使用方式与输出信息格式与irqsoff跟踪器类似,这里不再赘述。</P>
<DIVclass=ibm-alternate-rule>
<HR>
</DIV>
<Pclass="ibm-ind-linkibm-back-to-top"><BR>
</P>
<P><Aname=major6><SPANclass=atitle>跟踪指定模块中的函数</SPAN></A></P>
<P>前面提过,通过文件set_ftrace_filter可以指定要跟踪的函数,缺省目标为所有可跟踪的内核函数;可以将感兴趣的函数通过echo写入该文件。为了方便使用,set_ftrace_filter文件还支持简单格式的通配符。</P>
<UL>
<LI><STRONG>begin*</STRONG>选择所有名字以begin字串开头的函数<LI><STRONG>*middle*</STRONG>选择所有名字中包含middle字串的函数<LI><STRONG>*end</STRONG>选择所有名字以end字串结尾的函数</LI></UL>
<P>需要注意的是,这三种形式不能组合使用,比如“begin*middle*end”实际的效果与“begin”相同。另外,使用通配符表达式时,需要用单引号将其括起来,如果使用双引号,shell可能会对字符‘*’进行扩展,这样最终跟踪的对象可能与目标函数不一样。</P>
<P>通过该文件还可以指定属于特定模块的函数,这要用到mod指令。指定模块的格式为:</P>
<TABLEborder=0cellSpacing=0cellPadding=0width="100%"sizcache="0"sizset="6"><TBODYsizcache="0"sizset="6"><TRsizcache="0"sizset="6"><TDclass=code-outlinesizcache="0"sizset="6"><PREclass=displaycodename="code">echo':mod:[module_name]'>set_ftrace_filter
</PRE></TD></TR></TBODY></TABLE>
<BR>
<P>下面给出了一个指定跟踪模块ipv6中的函数的例子。可以看到,指定跟踪模块ipv6中的函数会将文件set_ftrace_filter的内容设置为只包含该模块中的函数。</P>
<BR>
<Aname=listing5><STRONG>清单5.指定跟踪ipv6模块中的函数</STRONG></A><BR>
<SPANstyle="FONT-SIZE:16px"></SPAN><PREclass=cppname="code">
[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo':mod:ipv6'>set_ftrace_filter
[root@linuxtracing]#catset_ftrace_filter|head-5
ipv6_opt_accepted
inet6_net_exit
ipv6_gro_complete
inet6_create
ipv6_addr_copy
</PRE><BR>
<BR>
<DIVclass=ibm-alternate-rule>
<HR>
</DIV>
<Pclass="ibm-ind-linkibm-back-to-top"><BR>
</P>
<P><Aname=major7><SPANclass=atitle>小结</SPAN></A></P>
<P>本系列文章对ftrace的配置和使用进行了介绍。本文是其中的第二部分,介绍了ftrace所提供的跟踪器的使用,重点对不同格式的跟踪信息进行了描述,并对如何跟踪指定模块中的函数进行了介绍。当前Linux内核中ftrace支持多个跟踪器,由于篇幅限制,这里不能对其全部加以描述。读者可参照参考资料中给出的信息对ftrace跟踪器进行更深入的了解。下一篇文章将对如何在代码中通过ftrace提供的工具函数与ftrace交互进行描述。</P>
<BR>
<BR>
<PRE></PRE>

[cpp]viewplaincopyprint?
[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echofunction>current_tracer
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echo1>tracing_enabled

#让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪

[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#cattrace|head-10
#tracer:function
#
#TASK-PIDCPU#TIMESTAMPFUNCTION
#|||||
<idle>-0[000]20654.426521:_raw_spin_lock<-scheduler_tick
<idle>-0[000]20654.426522:task_tick_idle<-scheduler_tick
<idle>-0[000]20654.426522:cpumask_weight<-scheduler_tick
<idle>-0[000]20654.426523:cpumask_weight<-scheduler_tick
<idle>-0[000]20654.426523:run_posix_cpu_timers<-update_process_times
<idle>-0[000]20654.426524:hrtimer_forward<-tick_sched_timer

[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echofunction>current_tracer
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echo1>tracing_enabled

#让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪

[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#cattrace|head-10
#tracer:function
#
#TASK-PIDCPU#TIMESTAMPFUNCTION
#|||||
<idle>-0[000]20654.426521:_raw_spin_lock<-scheduler_tick
<idle>-0[000]20654.426522:task_tick_idle<-scheduler_tick
<idle>-0[000]20654.426522:cpumask_weight<-scheduler_tick
<idle>-0[000]20654.426523:cpumask_weight<-scheduler_tick
<idle>-0[000]20654.426523:run_posix_cpu_timers<-update_process_times
<idle>-0[000]20654.426524:hrtimer_forward<-tick_sched_timer



trace文件给出的信息格式很清晰。首先,字段“tracer:”给出了当前所使用的跟踪器的名字,这里为function跟踪器。然后是跟踪信息记录的格式,TASK字段对应任务的名字,PID字段则给出了任务的进程ID,字段CPU#表示运行被跟踪函数的CPU号,这里可以看到idle进程运行在0号CPU上,其进程ID是0;字段TIMESTAMP是时间戳,其格式为“<secs>.<usecs>[code]”
,表示执行该函数时对应的时间戳;FUNCTION一列则给出了被跟踪的函数,函数的调用者通过符号“<-”标明,这样可以观察到函数的调用关系。[/code]


function_graph跟踪器

在function跟踪器给出的信息中,可以通过FUNCTION列中的符号“<-”来查看函数调用关系,但是由于中间会混合不同函数的调用,导致看起来非常混乱,十分不方便。function_graph跟踪器则可以提供类似C代码的函数调用关系信息。通过写文件set_graph_function可以显示指定要生成调用关系的函数,缺省会对所有可跟踪的内核函数生成函数调用关系图。清单2给出了使用function_grapch跟踪器的示例,示例中将内核函数__do_fault作为观察对象,该函数在内核运作过程中会被频繁调用。

清单2.function_graph跟踪器使用示例


在文件trace的输出信息中,首先给出的也是当前跟踪器的名字,这里是function_graph。接下来是详细的跟踪信息,可以看到,函数的调用关系以类似C代码的形式组织。

CPU字段给出了执行函数的CPU号,本例中都为1号CPU。DURATION字段给出了函数执行的时间长度,以us为单位。FUNCTIONCALLS则给出了调用的函数,并显示了调用流程。注意,对于不调用其它函数的函数,其对应行以“;”结尾,而且对应的DURATION字段给出其运行时长;对于调用其它函数的函数,则在其“}”对应行给出了运行时长,该时间是一个累加值,包括了其内部调用的函数的执行时长。DURATION字段给出的时长并不是精确的,它还包含了执行ftrace自身的代码所耗费的时间,所以示例中将内部函数时长累加得到的结果会与对应的外围调用函数的执行时长并不一致;不过通过该字段还是可以大致了解函数在时间上的运行开销的。清单2中最后通过echo命令重置了文件set_graph_function。



sched_switch跟踪器

sched_switch跟踪器可以对进程的调度切换以及之间的唤醒操作进行跟踪,如清单3所示。

清单3.sched_switch跟踪器使用示例


[cpp]viewplaincopyprint?
[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echosched_switch>current_tracer
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echo1>tracing_enabled

#让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪

[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#cattrace|head-10
#tracer:sched_switch
#
#TASK-PIDCPU#TIMESTAMPFUNCTION
#|||||
bash-1408[000]26208.816058:1408:120:S+[000]1408:120:Sbash
bash-1408[000]26208.816070:1408:120:S+[000]1408:120:Sbash
bash-1408[000]26208.816921:1408:120:R+[000]9:120:Revents/0
bash-1408[000]26208.816939:1408:120:R==>[000]9:120:Revents/0
events/0-9[000]26208.817081:9:120:R+[000]1377:120:Rgnome-terminal
events/0-9[000]26208.817088:9:120:S==>[000]1377:120:Rgnome-terminal


[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echosched_switch>current_tracer
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echo1>tracing_enabled

#让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪

[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#cattrace|head-10
#tracer:sched_switch
#
#TASK-PIDCPU#TIMESTAMPFUNCTION
#|||||
bash-1408[000]26208.816058:1408:120:S+[000]1408:120:Sbash
bash-1408[000]26208.816070:1408:120:S+[000]1408:120:Sbash
bash-1408[000]26208.816921:1408:120:R+[000]9:120:Revents/0
bash-1408[000]26208.816939:1408:120:R==>[000]9:120:Revents/0
events/0-9[000]26208.817081:9:120:R+[000]1377:120:Rgnome-terminal
events/0-9[000]26208.817088:9:120:S==>[000]1377:120:Rgnome-terminal


在sched_swich跟踪器获取的跟踪信息中记录了进程间的唤醒操作和调度切换信息,可以通过符号‘+’和‘==>’区分;唤醒操作记录给出了当前进程唤醒运行的进程,进程调度切换记录中显示了接替当前进程运行的后续进程。

描述进程状态的格式为“Task-PID:Priority:Task-State”。以示例跟踪信息中的第一条跟踪记录为例,可以看到进程bash的PID为1408,其对应的内核态优先级为120,当前状态为S(可中断睡眠状态),当前bash并没有唤醒其它进程;从第3条记录可以看到,进程bash将进程events/0唤醒,而在第4条记录中发生了进程调度,进程bash切换到进程events/0执行。

在Linux内核中,进程的状态在内核头文件include/linux/sched.h中定义,包括可运行状态TASK_RUNNING(对应跟踪信息中的符号‘R’)、可中断阻塞状态TASK_INTERRUPTIBLE(对应跟踪信息中的符号‘S’)等。同时该头文件也定义了用户态进程所使用的优先级的范围,最小值为MAX_USER_RT_PRIO(值为100),最大值为MAX_PRIO-1(对应值为139),缺省为DEFAULT_PRIO(值为120);在本例中,进程优先级都是缺省值120。



irqsoff跟踪器

当关闭中断时,CPU会延迟对设备的状态变化做出反应,有时候这样做会对系统性能造成比较大的影响。irqsoff跟踪器可以对中断被关闭的状况进行跟踪,有助于发现导致较大延迟的代码;当出现最大延迟时,跟踪器会记录导致延迟的跟踪信息,文件tracing_max_latency则记录中断被关闭的最大延时。

清单4.irqsoff跟踪器使用示例


[cpp]viewplaincopyprint?
[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echoirqsoff>current_tracer
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echo1>tracing_enabled

#让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪

[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#cattrace|head-35
#tracer:irqsoff
#
#irqsofflatencytracev1.1.5on2.6.33.1
#--------------------------------------------------------------------
#latency:34380us,#6/6,CPU#1|(M:desktopVP:0,KP:0,SP:0HP:0#P:2)
#-----------------
#|task:-0(uid:0nice:0policy:0rt_prio:0)
#-----------------
#=>startedat:reschedule_interrupt
#=>endedat:restore_all_notrace
#
#
#_------=>CPU#
#/_-----=>irqs-off
#|/_----=>need-resched
#||/_---=>hardirq/softirq
#|||/_--=>preempt-depth
#||||/_--=>lock-depth
#|||||/delay
#cmdpid||||||time|caller
#\/||||||\|/
<idle>-01dN...4285us!:trace_hardirqs_off_thunk<-reschedule_interrupt
<idle>-01dN...34373us+:smp_reschedule_interrupt<-reschedule_interrupt
<idle>-01dN...34375us+:native_apic_mem_write<-smp_reschedule_interrupt
<idle>-01dN...34380us+:trace_hardirqs_on_thunk<-restore_all_notrace
<idle>-01dN...34384us:trace_hardirqs_on_caller<-restore_all_notrace
<idle>-01dN...34384us:<stacktrace>
=>trace_hardirqs_on_thunk
[root@linuxtracing]#cattracing_max_latency
34380


[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echoirqsoff>current_tracer
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echo1>tracing_enabled

#让内核运行一段时间,这样ftrace可以收集一些跟踪信息,之后再停止跟踪

[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#cattrace|head-35
#tracer:irqsoff
#
#irqsofflatencytracev1.1.5on2.6.33.1
#--------------------------------------------------------------------
#latency:34380us,#6/6,CPU#1|(M:desktopVP:0,KP:0,SP:0HP:0#P:2)
#-----------------
#|task:-0(uid:0nice:0policy:0rt_prio:0)
#-----------------
#=>startedat:reschedule_interrupt
#=>endedat:restore_all_notrace
#
#
#_------=>CPU#
#/_-----=>irqs-off
#|/_----=>need-resched
#||/_---=>hardirq/softirq
#|||/_--=>preempt-depth
#||||/_--=>lock-depth
#|||||/delay
#cmdpid||||||time|caller
#\/||||||\|/
<idle>-01dN...4285us!:trace_hardirqs_off_thunk<-reschedule_interrupt
<idle>-01dN...34373us+:smp_reschedule_interrupt<-reschedule_interrupt
<idle>-01dN...34375us+:native_apic_mem_write<-smp_reschedule_interrupt
<idle>-01dN...34380us+:trace_hardirqs_on_thunk<-restore_all_notrace
<idle>-01dN...34384us:trace_hardirqs_on_caller<-restore_all_notrace
<idle>-01dN...34384us:<stacktrace>
=>trace_hardirqs_on_thunk
[root@linuxtracing]#cattracing_max_latency
34380


从清单4中的输出信息中,可以看到当前irqsoff延迟跟踪器的版本信息。接下来是最大延迟时间,以us为单位,本例中为34380us,查看文件tracing_max_latency也可以获取该值。从“task:”字段可以知道延迟发生时正在运行的进程为idle(其pid为0)。中断的关闭操作是在函数reschedule_interrupt中进行的,由“=>startedat:”标识,函数restore_all_ontrace重新激活了中断,由“=>endedat:”标识;中断关闭的最大延迟发生在函数trace_hardirqs_on_thunk中,这个可以从最大延迟时间所在的记录项看到,也可以从延迟记录信息中最后的“=>”标识所对应的记录行知道这一点。

在输出信息中,irqs-off、need_resched等字段对应于进程结构structtask_struct的字段或者状态标志,可以从头文件arch/<platform>/include/asm/thread_info.h中查看进程支持的状态标志,include/linux/sched.h则给出了结构structtask_struct的定义。其中,irqs-off字段显示是否中断被禁止,为‘d’表示中断被禁止;need_resched字段显示为‘N’表示设置了进程状态标志TIF_NEED_RESCHED。hardirq/softirq字段表示当前是否发生硬件中断/软中断;preempt-depth表示是否禁止进程抢占,例如在持有自旋锁的情况下进程是不能被抢占的,本例中进程idle是可以被其它进程抢占的。结构structtask_struct中的lock_depth字段是与大内核锁相关的,而最近的内核开发工作中有一部分是与移除大内核锁相关的,这里对该字段不再加以描述。

另外,还有用于跟踪禁止进程抢占的跟踪器preemptoff和跟踪中断/抢占禁止的跟踪器preemptirqsoff,它们的使用方式与输出信息格式与irqsoff跟踪器类似,这里不再赘述。



跟踪指定模块中的函数

前面提过,通过文件set_ftrace_filter可以指定要跟踪的函数,缺省目标为所有可跟踪的内核函数;可以将感兴趣的函数通过echo写入该文件。为了方便使用,set_ftrace_filter文件还支持简单格式的通配符。

begin*选择所有名字以begin字串开头的函数
*middle*选择所有名字中包含middle字串的函数
*end选择所有名字以end字串结尾的函数

需要注意的是,这三种形式不能组合使用,比如“begin*middle*end”实际的效果与“begin”相同。另外,使用通配符表达式时,需要用单引号将其括起来,如果使用双引号,shell可能会对字符‘*’进行扩展,这样最终跟踪的对象可能与目标函数不一样。

通过该文件还可以指定属于特定模块的函数,这要用到mod指令。指定模块的格式为:

echo':mod:[module_name]'>set_ftrace_filter

下面给出了一个指定跟踪模块ipv6中的函数的例子。可以看到,指定跟踪模块ipv6中的函数会将文件set_ftrace_filter的内容设置为只包含该模块中的函数。

清单5.指定跟踪ipv6模块中的函数

[cpp]
viewplaincopyprint?

[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo':mod:ipv6'>set_ftrace_filter
[root@linuxtracing]#catset_ftrace_filter|head-5
ipv6_opt_accepted
inet6_net_exit
ipv6_gro_complete
inet6_create
ipv6_addr_copy


[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo':mod:ipv6'>set_ftrace_filter
[root@linuxtracing]#catset_ftrace_filter|head-5
ipv6_opt_accepted
inet6_net_exit
ipv6_gro_complete
inet6_create
ipv6_addr_copy


小结

本系列文章对ftrace的配置和使用进行了介绍。本文是其中的第二部分,介绍了ftrace所提供的跟踪器的使用,重点对不同格式的跟踪信息进行了描述,并对如何跟踪指定模块中的函数进行了介绍。当前Linux内核中ftrace支持多个跟踪器,由于篇幅限制,这里不能对其全部加以描述。读者可参照参考资料中给出的信息对ftrace跟踪器进行更深入的了解。下一篇文章将对如何在代码中通过ftrace提供的工具函数与ftrace交互进行描述。



使用ftrace调试Linux内核,第3部分

/article/2458806.html


简介:ftrace是Linux内核中提供的一种调试工具。使用ftrace可以对内核中发生的事情进行跟踪,这在调试bug或者分析内核时非常有用。本系列文章对ftrace进行了介绍,分为三部分。本文是第三部分,通过示例代码介绍如何在代码中使用ftrace提供的工具函数,以与ftrace交互。通过本文的讲解,读者可以在实际代码中使用ftrace,方便了调试和分析。

内核头文件include/linux/kernel.h中描述了ftrace提供的工具函数的原型,这些函数包括trace_printk、tracing_on/tracing_off等。本文通过示例模块程序向读者展示如何在代码中使用这些工具函数。

使用trace_printk打印跟踪信息

ftrace提供了一个用于向ftrace跟踪缓冲区输出跟踪信息的工具函数,叫做trace_printk(),它的使用方式与printk()类似。可以通过trace文件读取该函数的输出。从头文件include/linux/kernel.h中可以看到,在激活配置CONFIG_TRACING后,trace_printk()定义为宏:

#definetrace_printk(fmt,args...)		\
...

下面通过一个示例模块ftrace_demo来演示如何使用trace_printk()向跟踪缓冲区输出信息,以及如何查看这些信息。这里的示例模块程序中仅提供了初始化和退出函数,这样读者不会因为需要为模块创建必要的访问接口比如设备文件而分散注意力。注意,编译模块时要加入-pg选项。

清单1.示例模块ftrace_demo

[cpp]
viewplaincopyprint?

/*
*ftrace_demo.c
*/
#include<linux/init.h>

#include<linux/module.h>
#include<linux/kernel.h>

MODULE_LICENSE("GPL");

staticintftrace_demo_init(void)
{
trace_printk("Cannotseethisintraceunlessloadedforthesecondtime\n");
return0;
}

staticvoidftrace_demo_exit(void)
{
trace_printk("Moduleunloading\n");
}

module_init(ftrace_demo_init);
module_exit(ftrace_demo_exit);


/*
*ftrace_demo.c
*/
#include<linux/init.h>
#include<linux/module.h>
#include<linux/kernel.h>

MODULE_LICENSE("GPL");

staticintftrace_demo_init(void)
{
trace_printk("Cannotseethisintraceunlessloadedforthesecondtime\n");
return0;
}

staticvoidftrace_demo_exit(void)
{
trace_printk("Moduleunloading\n");
}

module_init(ftrace_demo_init);
module_exit(ftrace_demo_exit);


示例模块非常简单,仅仅是在模块初始化函数和退出函数中输出信息。接下来要对模块的运行进行跟踪,如清单2所示。

清单2.对模块ftrace_demo进行跟踪

[cpp]viewplaincopyprint?
[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echofunction_graph>current_tracer

#事先加载模块ftrace_demo

[root@linuxtracing]#echo':mod:ftrace_demo'>set_ftrace_filter
[root@linuxtracing]#catset_ftrace_filter
ftrace_demo_init
ftrace_demo_exit

#将模块ftrace_demo卸载

[root@linuxtracing]#echo1>tracing_enabled

#重新进行模块ftrace_demo的加载与卸载操作

[root@linuxtracing]#cattrace
#tracer:function_graph
#
#CPUDURATIONFUNCTIONCALLS
#|||||||
1)|/*Cannotseethisintraceunlessloadedforthesecondtime*/
0)|/*Moduleunloading*/

[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echofunction_graph>current_tracer

#事先加载模块ftrace_demo

[root@linuxtracing]#echo':mod:ftrace_demo'>set_ftrace_filter
[root@linuxtracing]#catset_ftrace_filter
ftrace_demo_init
ftrace_demo_exit

#将模块ftrace_demo卸载

[root@linuxtracing]#echo1>tracing_enabled

#重新进行模块ftrace_demo的加载与卸载操作

[root@linuxtracing]#cattrace
#tracer:function_graph
#
#CPUDURATIONFUNCTIONCALLS
#|||||||
1)|/*Cannotseethisintraceunlessloadedforthesecondtime*/
0)|/*Moduleunloading*/



在这个例子中,使用mod指令显式指定跟踪模块ftrace_demo中的函数,这需要提前加载该模块,否则在写文件set_ftrace_filter时会因为找不到该模块报错。这样在第一次加载模块时,其初始化函数ftrace_demo_init中调用trace_printk打印的语句就跟踪不到了。因此这里会将其卸载,然后激活跟踪,再重新进行模块ftrace_demo的加载与卸载操作。最终可以从文件trace中看到模块在初始化和退出时调用trace_printk()输出的信息。

这里仅仅是为了以简单的模块进行演示,故只定义了模块的init/exit函数,重复加载模块也只是为了获取初始化函数输出的跟踪信息。实践中,可以在模块的功能函数中加入对trace_printk的调用,这样可以记录模块的运作情况,然后对其特定功能进行调试优化。还可以将对trace_printk()的调用通过宏来控制编译,这样可以在调试时将其开启,在最终发布时将其关闭。

回页首
使用tracing_on/tracing_off控制跟踪信息的记录

在跟踪过程中,有时候在检测到某些事件发生时,想要停止跟踪信息的记录,这样,跟踪缓冲区中较新的数据是与该事件有关的。在用户态,可以通过向文件tracing_on写入0来停止记录跟踪信息,写入1会继续记录跟踪信息。而在内核代码中,可以通过函数tracing_on()和tracing_off()来做到这一点,它们的行为类似于对/sys/kernel/debug/tracing下的文件tracing_on分别执行写1和写0的操作。使用这两个函数,会对跟踪信息的记录控制地更准确一些,这是因为在用户态写文件
tracing_on到实际暂停跟踪,中间由于上下文切换、系统调度控制等可能已经经过较长的时间,这样会积累大量的跟踪信息,而感兴趣的那部分可能会被覆盖掉了。

现在对清单1中的代码进行修改,使用tracing_off()来控制跟踪信息记录的暂停。

清单3.使用tracing_off的模块ftrace_demo


[cpp]viewplaincopyprint?/*
*ftrace_demo.c
*modifiedtodemostratetheusageoftracing_off
*/
#include<linux/init.h>
#include<linux/module.h>
#include<linux/kernel.h>

MODULE_LICENSE("GPL");

staticintftrace_demo_init(void)
{
trace_printk("ftrace_demo_initcalled\n");
tracing_off();
return0;
}

staticvoidftrace_demo_exit(void)
{
trace_printk("ftrace_demo_exitcalled\n");
tracing_off();
}

module_init(ftrace_demo_init);
module_exit(ftrace_demo_exit);
/*
*ftrace_demo.c
*modifiedtodemostratetheusageoftracing_off
*/
#include<linux/init.h>
#include<linux/module.h>
#include<linux/kernel.h>

MODULE_LICENSE("GPL");

staticintftrace_demo_init(void)
{
trace_printk("ftrace_demo_initcalled\n");
tracing_off();
return0;
}

staticvoidftrace_demo_exit(void)
{
trace_printk("ftrace_demo_exitcalled\n");
tracing_off();
}

module_init(ftrace_demo_init);
module_exit(ftrace_demo_exit);


下面对其进行跟踪,如清单4所示。

清单4.跟踪


[cpp]viewplaincopyprint?[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echofunction>current_tracer
[root@linuxtracing]#echo1>tracing_enabled

#加载模块ftrace_demo,模块初始化函数ftrace_demo_init被调用

[root@linuxtracing]#cattracing_on
0
[root@linuxtracing]#cattrace|wc-l
120210
[root@linuxtracing]#cattrace|grep-nftrace_demo_init
120187:insmod-2897[000]2610.504611:ftrace_demo_init<-do_one_initcall
120193:insmod-2897[000]2610.504667:ftrace_demo_init:ftrace_demo_initcalled

[root@linuxtracing]#echo1>tracing_on#继续跟踪信息的记录

#卸载模块ftrace_demo,模块函数ftrace_demo_exit被调用

[root@linuxtracing]#cattracing_on
0
[root@linuxtracing]#wc-ltrace
120106trace
[root@linuxtracing]#grep-nftrace_demo_exittrace
120106:rmmod-2992[001]3016.884449::ftrace_demo_exitcalled
[root@linuxtracing]#pwd
/sys/kernel/debug/tracing
[root@linuxtracing]#echo0>tracing_enabled
[root@linuxtracing]#echo1>/proc/sys/kernel/ftrace_enabled
[root@linuxtracing]#echo1>tracing_on
[root@linuxtracing]#echofunction>current_tracer
[root@linuxtracing]#echo1>tracing_enabled

#加载模块ftrace_demo,模块初始化函数ftrace_demo_init被调用

[root@linuxtracing]#cattracing_on
0
[root@linuxtracing]#cattrace|wc-l
120210
[root@linuxtracing]#cattrace|grep-nftrace_demo_init
120187:insmod-2897[000]2610.504611:ftrace_demo_init<-do_one_initcall
120193:insmod-2897[000]2610.504667:ftrace_demo_init:ftrace_demo_initcalled

[root@linuxtracing]#echo1>tracing_on#继续跟踪信息的记录

#卸载模块ftrace_demo,模块函数ftrace_demo_exit被调用

[root@linuxtracing]#cattracing_on
0
[root@linuxtracing]#wc-ltrace
120106trace
[root@linuxtracing]#grep-nftrace_demo_exittrace
120106:rmmod-2992[001]3016.884449::ftrace_demo_exitcalled


在这个例子中,跟踪开始之前需要确保tracing_on的值为1。跟踪开始后,加载模块ftrace_demo,其初始化方法ftrace_demo_init被调用,该方法会调用tracing_off()函数来暂停跟踪信息的记录,这时文件tracing_on的值被代码设置为0。查看文件trace,可以看到ftrace_demo_init相关的记录位于跟踪信息的末端,这是因为从调用trace_off()到其生效需要一段时间,这段时间中的内核活动会被记录下来;相比从用户态读写tracing_on
文件,这段时间开销要小了许多。卸载模块时的情况与此类似。从这里可以看到,在代码中使用tracing_off()可以控制将感兴趣的信息保存在跟踪缓冲区的末端位置,不会很快被新的信息所覆盖,便于及时查看。

实际代码中,可以通过特定条件(比如检测到某种异常状况,等等)来控制跟踪信息的记录,函数的使用方式类似如下的形式:

if(condition)
tracing_on()ortracing_off()

跟踪模块运行状况时,使用ftrace命令操作序列在用户态进行必要的设置,而在代码中则可以通过traceing_on()控制在进入特定代码区域时开启跟踪信息,并在遇到某些条件时通过tracing_off()暂停;读者可以在查看完感兴趣的信息后,将1写入tracing_on文件以继续记录跟踪信息。实践中,可以通过宏来控制是否将对这些函数的调用编译进内核模块,这样可以在调试时将其开启,在最终发布时将其关闭。

用户态的应用程序可以通过直接读写文件tracing_on来控制记录跟踪信息的暂停状态,以便了解应用程序运行期间内核中发生的活动。

回页首
小结

本系列文章对ftrace的配置和使用进行了介绍。本文是其中的第三部分,通过示例代码介绍了ftrace所提供的部分工具函数的使用,包括trace_printk、tracing_on/tracing_off。至此,本系列文章的所有内容都已经介绍完毕。通过本系列文章,读者可以了解如何使用ftrace来对内核进行调试和分析。ftrace使用起来非常灵活,并且支持多种跟踪器,而且它的框架结构使得在添加新的跟踪器的时候会比较方便。读者可以在以后的工作和学习中探索ftrace的各种用法。


[/b]

贴上2.6.35内核里ftrace.txt

ftrace-FunctionTracer

========================

Copyright2008RedHatInc.Author:StevenRostedt<srostedt@redhat.com>License:TheGNUFreeDocumentationLicense,Version1.2(duallicensedundertheGPLv2)Reviewers:EliasOltmanns,Randy
Dunlap,AndrewMorton,JohnKacur,andDavidTeigland.Writtenfor:2.6.28-rc2

Introduction

------------

Ftraceisaninternaltracerdesignedtohelpoutdevelopersanddesignersofsystemstofindwhatisgoingoninsidethekernel.Itcanbeusedfordebuggingoranalyzinglatenciesandperformanceissuesthattakeplaceoutsideofuser-space.

Althoughftraceisthefunctiontracer,italsoincludesaninfrastructurethatallowsforothertypesoftracing.Someofthetracersthatarecurrentlyinftraceincludeatracertotracecontextswitches,thetimeittakesforahighprioritytasktorun
afteritwaswokenup,thetimeinterruptsaredisabled,andmore(ftraceallowsfortracerplugins,whichmeansthatthelistoftracerscanalwaysgrow).

ImplementationDetails

----------------------

Seeftrace-design.txtfordetailsforarchportersandsuch.

TheFileSystem

---------------

Ftraceusesthedebugfsfilesystemtoholdthecontrolfilesaswellasthefilestodisplayoutput.

Whendebugfsisconfiguredintothekernel(whichselectinganyftraceoptionwilldo)thedirectory/sys/kernel/debugwillbecreated.Tomountthisdirectory,youcanaddtoyour/etc/fstabfile:

debugfs/sys/kernel/debugdebugfsdefaults00

Oryoucanmountitatruntimewith:

mount-tdebugfsnodev/sys/kernel/debug

Forquickeraccesstothatdirectoryyoumaywanttomakeasoftlinktoit:

ln-s/sys/kernel/debug/debug

Anyselectedftraceoptionwillalsocreateadirectorycalledtracingwithinthedebugfs.Therestofthedocumentwillassumethatyouareintheftracedirectory(cd/sys/kernel/debug/tracing)andwillonlyconcentrateonthefileswithinthatdirectory
andnotdistractfromthecontentwiththeextended"/sys/kernel/debug/tracing"pathname.

That'sit!(assumingthatyouhaveftraceconfiguredintoyourkernel)

Aftermountingthedebugfs,youcanseeadirectorycalled"tracing".Thisdirectorycontainsthecontrolandoutputfilesofftrace.Hereisalistofsomeofthekeyfiles:

Note:alltimevaluesareinmicroseconds.

current_tracer:

Thisisusedtosetordisplaythecurrenttracer

thatisconfigured.

available_tracers:

Thisholdsthedifferenttypesoftracersthat

havebeencompiledintothekernel.The

tracerslistedherecanbeconfiguredby

echoingtheirnameintocurrent_tracer.

tracing_enabled:

Thissetsordisplayswhetherthecurrent_tracer

isactivatedandtracingornot.Echo0intothis

filetodisablethetraceror1toenableit.

trace:

Thisfileholdstheoutputofthetraceinahuman

readableformat(describedbelow).

trace_pipe:

Theoutputisthesameasthe"trace"filebutthis

fileismeanttobestreamedwithlivetracing.

Readsfromthisfilewillblockuntilnewdatais

retrieved.Unlikethe"trace"file,thisfileisa

consumer.Thismeansreadingfromthisfilecauses

sequentialreadstodisplaymorecurrentdata.Once

dataisreadfromthisfile,itisconsumed,and

willnotbereadagainwithasequentialread.The

"trace"fileisstatic,andifthetracerisnot

addingmoredata,theywilldisplaythesame

informationeverytimetheyareread.

trace_options:

Thisfileletstheusercontroltheamountofdata

thatisdisplayedinoneoftheaboveoutput

files.

tracing_max_latency:

Someofthetracersrecordthemaxlatency.

Forexample,thetimeinterruptsaredisabled.

Thistimeissavedinthisfile.Themaxtrace

willalsobestored,anddisplayedby"trace".

Anewmaxtracewillonlyberecordedifthe

latencyisgreaterthanthevalueinthis

file.(inmicroseconds)

buffer_size_kb:

ThissetsordisplaysthenumberofkilobyteseachCPU

buffercanhold.Thetracerbuffersarethesamesize

foreachCPU.Thedisplayednumberisthesizeofthe

CPUbufferandnottotalsizeofallbuffers.The

tracebuffersareallocatedinpages(blocksofmemory

thatthekernelusesforallocation,usually4KBinsize).

Ifthelastpageallocatedhasroomformorebytes

thanrequested,therestofthepagewillbeused,

makingtheactualallocationbiggerthanrequested.

(Note,thesizemaynotbeamultipleofthepagesizeduetobuffermanagementoverhead.)

Thiscanonlybeupdatedwhenthecurrent_tracer

issetto"nop".

tracing_cpumask:

Thisisamaskthatletstheuseronlytrace

onspecifiedCPUS.Theformatisahexstring

representingtheCPUS.

set_ftrace_filter:

Whendynamicftraceisconfiguredin(seethe

sectionbelow"dynamicftrace"),thecodeisdynamically

modified(codetextrewrite)todisablecallingofthe

functionprofiler(mcount).Thisletstracingbeconfigured

inwithpracticallynooverheadinperformance.Thisalso

hasasideeffectofenablingordisablingspecificfunctions

tobetraced.Echoingnamesoffunctionsintothisfile

willlimitthetracetoonlythosefunctions.

Thisinterfacealsoallowsforcommandstobeused.Seethe

"Filtercommands"sectionformoredetails.

set_ftrace_notrace:

Thishasaneffectoppositetothatof

set_ftrace_filter.Anyfunctionthatisaddedherewillnot

betraced.Ifafunctionexistsinbothset_ftrace_filter

andset_ftrace_notrace,thefunctionwill_not_betraced.

set_ftrace_pid:

Havethefunctiontraceronlytraceasinglethread.

set_graph_function:

Seta"trigger"functionwheretracingshouldstart

withthefunctiongraphtracer(Seethesection

"dynamicftrace"formoredetails).

available_filter_functions:

Thisliststhefunctionsthatftrace

hasprocessedandcantrace.Thesearethefunction

namesthatyoucanpassto"set_ftrace_filter"or

"set_ftrace_notrace".(Seethesection"dynamicftrace"

belowformoredetails.)

TheTracers

-----------

Hereisthelistofcurrenttracersthatmaybeconfigured.

"function"

Functioncalltracertotraceallkernelfunctions.

"function_graph"

Similartothefunctiontracerexceptthatthe

functiontracerprobesthefunctionsontheirentry

whereasthefunctiongraphtracertracesonbothentry

andexitofthefunctions.Itthenprovidestheability

todrawagraphoffunctioncallssimilartoCcode

source.

"sched_switch"

Tracesthecontextswitchesandwakeupsbetweentasks.

"irqsoff"

Tracestheareasthatdisableinterruptsandsaves

thetracewiththelongestmaxlatency.

Seetracing_max_latency.Whenanewmaxisrecorded,

itreplacestheoldtrace.Itisbesttoviewthis

tracewiththelatency-formatoptionenabled.

"preemptoff"

Similartoirqsoffbuttracesandrecordstheamountof

timeforwhichpreemptionisdisabled.

"preemptirqsoff"

Similartoirqsoffandpreemptoff,buttracesand

recordsthelargesttimeforwhichirqsand/orpreemption

isdisabled.

"wakeup"

Tracesandrecordsthemaxlatencythatittakesfor

thehighestprioritytasktogetscheduledafter

ithasbeenwokenup.

"hw-branch-tracer"

UsestheBTSCPUfeatureonx86CPUstotracesall

branchesexecuted.

"nop"

Thisisthe"tracenothing"tracer.Toremoveall

tracersfromtracingsimplyecho"nop"into

current_tracer.

Examplesofusingthetracer

----------------------------

Herearetypicalexamplesofusingthetracerswhencontrollingthemonlywiththedebugfsinterface(withoutusinganyuser-landutilities).

Outputformat:

--------------

Hereisanexampleoftheoutputformatofthefile"trace"

--------#tracer:function##TASK-PIDCPU#TIMESTAMPFUNCTION#|||||bash-4251[01]10152.583854:path_put<-path_walk

bash-4251[01]10152.583855:dput<-path_put

bash-4251[01]10152.583855:_atomic_dec_and_lock<-dput

--------

Aheaderisprintedwiththetracernamethatisrepresentedbythetrace.Inthiscasethetraceris"function".Thenaheadershowingtheformat.Taskname"bash",thetaskPID"4251",theCPUthatitwasrunningon"01",thetimestampin<secs>.<usecs>format,
thefunctionnamethatwastraced"path_put"andtheparentfunctionthatcalledthisfunction"path_walk".Thetimestampisthetimeatwhichthefunctionwasentered.

Thesched_switchtraceralsoincludestracingoftaskwakeupsandcontextswitches.

ksoftirqd/1-7[01]1453.070013:7:115:R+2916:115:Sksoftirqd/1-7[01]1453.070013:7:115:R+10:115:Sksoftirqd/1-7[01]1453.070013:7:115:R==>10:115:Revents/1-10[01]1453.070013:10:115:S==>2916:115:R
kondemand/1-2916[01]1453.070013:2916:115:S==>7:115:Rksoftirqd/1-7[01]1453.070013:7:115:S==>0:140:R

Wakeupsarerepresentedbya"+"andthecontextswitchesareshownas"==>".Theformatis:

Contextswitches:

PrevioustaskNextTask

<pid>:<prio>:<state>==><pid>:<prio>:<state>

Wakeups:

CurrenttaskTaskwakingup

<pid>:<prio>:<state>+<pid>:<prio>:<state>

Theprioistheinternalkernelpriority,whichistheinverseoftheprioritythatisusuallydisplayedbyuser-spacetools.Zerorepresentsthehighestpriority(99).Prio100startsthe"nice"prioritieswith100beingequaltonice-20and139beingnice
19.Theprio"140"isreservedfortheidletaskwhichisthelowestprioritythread(pid0).

Latencytraceformat

--------------------

Whenthelatency-formatoptionisenabled,thetracefilegivessomewhatmoreinformationtoseewhyalatencyhappened.Hereisatypicaltrace.

#tracer:irqsoff#irqsofflatencytracev1.1.5on2.6.26-rc8

--------------------------------------------------------------------latency:97us,#3/3,CPU#0|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:swapper-0(uid:0nice:0policy:0rt_prio:0)

-----------------=>startedat:apic_timer_interrupt

=>endedat:do_softirq

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/<idle>-00d..10us+:trace_hardirqs_off_thunk(apic_timer_interrupt)<idle>-00d.s.97us:__do_softirq(do_softirq)

<idle>-00d.s198us:trace_hardirqs_on(do_softirq)

Thisshowsthatthecurrenttraceris"irqsoff"tracingthetimeforwhichinterruptsweredisabled.Itgivesthetraceversionandtheversionofthekerneluponwhichthiswasexecutedon(2.6.26-rc8).Thenitdisplaysthemaxlatencyinmicrosecs(97us).
Thenumberoftraceentriesdisplayedandthetotalnumberrecorded(botharethree:#3/3).Thetypeofpreemptionthatwasused(PREEMPT).VP,KP,SP,andHParealwayszeroandarereservedforlateruse.#PisthenumberofonlineCPUS(#P:2).

Thetaskistheprocessthatwasrunningwhenthelatencyoccurred.(swapperpid:0).

Thestartandstop(thefunctionsinwhichtheinterruptsweredisabledandenabledrespectively)thatcausedthelatencies:

apic_timer_interruptiswheretheinterruptsweredisabled.

do_softirqiswheretheywereenabledagain.

Thenextlinesaftertheheaderarethetraceitself.Theheaderexplainswhichiswhich.

cmd:Thenameoftheprocessinthetrace.

pid:ThePIDofthatprocess.

CPU#:TheCPUwhichtheprocesswasrunningon.

irqs-off:'d'interruptsaredisabled.'.'otherwise.Note:Ifthearchitecturedoesnotsupportawaytoreadtheirqflagsvariable,an'X'willalways

beprintedhere.

need-resched:'N'taskneed_reschedisset,'.'otherwise.

hardirq/softirq:'H'-hardirqoccurredinsideasoftirq.

'h'-hardirqisrunning

's'-softirqisrunning

'.'-normalcontext.

preempt-depth:Thelevelofpreempt_disabled

Theaboveismostlymeaningfulforkerneldevelopers.

time:Whenthelatency-formatoptionisenabled,thetracefileoutputincludesatimestamprelativetothestartofthe

trace.Thisdiffersfromtheoutputwhenlatency-format

isdisabled,whichincludesanabsolutetimestamp.

delay:Thisisjusttohelpcatchyoureyeabitbetter.And

needstobefixedtobeonlyrelativetothesameCPU.

Themarksaredeterminedbythedifferencebetweenthis

currenttraceandthenexttrace.'!'-greaterthanpreempt_mark_thresh(default100)

'+'-greaterthan1microsecond

''-lessthanorequalto1microsecond.

Therestisthesameasthe'trace'file.

trace_options

-------------

Thetrace_optionsfileisusedtocontrolwhatgetsprintedinthetraceoutput.Toseewhatisavailable,simplycatthefile:

cattrace_options

print-parentnosym-offsetnosym-addrnoverbosenorawnohexnobin\noblocknostacktracenosched-treenouserstacktracenosym-userobj

Todisableoneoftheoptions,echointheoptionprependedwith"no".

echonoprint-parent>trace_options

Toenableanoption,leaveoffthe"no".

echosym-offset>trace_options

Herearetheavailableoptions:

print-parent-Onfunctiontraces,displaythecalling(parent)functionaswellasthefunctionbeingtraced.

print-parent:bash-4000[01]1477.606694:simple_strtoul<-strict_strtoul

noprint-parent:bash-4000[01]1477.606694:simple_strtoul

sym-offset-Displaynotonlythefunctionname,butalsotheoffsetinthefunction.Forexample,insteadof

seeingjust"ktime_get",youwillsee

"ktime_get+0xb/0x20".

sym-offset:bash-4000[01]1477.606694:simple_strtoul+0x6/0xa0

sym-addr-thiswillalsodisplaythefunctionaddressaswellasthefunctionname.

sym-addr:bash-4000[01]1477.606694:simple_strtoul<c0339346>

verbose-Thisdealswiththetracefilewhenthelatency-formatoptionisenabled.

bash4000100000000000010a95[58127d26]1720.415ms\

(+0.000ms):simple_strtoul(strict_strtoul)

raw-Thiswilldisplayrawnumbers.Thisoptionisbestforusewithuserapplicationsthatcantranslatetheraw

numbersbetterthanhavingitdoneinthekernel.

hex-Similartoraw,butthenumberswillbeinahexadecimalformat.

bin-Thiswillprintouttheformatsinrawbinary.

block-TBD(needsupdate)

stacktrace-Thisisoneoftheoptionsthatchangesthetraceitself.Whenatraceisrecorded,soisthestack

offunctions.Thisallowsforbacktracesof

tracesites.

userstacktrace-Thisoptionchangesthetrace.Itrecordsastacktraceofthecurrentuserspacethread.

sym-userobj-whenuserstacktraceareenabled,lookupwhich

objecttheaddressbelongsto,andprinta

relativeaddress.Thisisespeciallyusefulwhen

ASLRison,otherwiseyoudon'tgetachanceto

resolvetheaddresstoobject/file/lineafter

theappisnolongerrunning

Thelookupisperformedwhenyouread

trace,trace_pipe.Example:

a.out-1623[000]40874.465068:/root/a.out[+0x480]<-/root/a.out[+0x494]<-/root/a.out[+0x4a8]<-/lib/libc-2.7.so[+0x1e1a6]

sched-tree-tracealltasksthatareontherunqueue,ateveryschedulingevent.Willaddoverheadif

there'salotoftasksrunningatonce.

latency-format-Thisoptionchangesthetrace.Whenitisenabled,thetracedisplays

additionalinformationaboutthe

latencies,asdescribedin"Latency

traceformat".

sched_switch

------------

Thistracersimplyrecordsscheduleswitches.Hereisanexampleofhowtouseit.

#echosched_switch>current_tracer

#echo1>tracing_enabled

#sleep1

#echo0>tracing_enabled

#cattrace

#tracer:sched_switch##TASK-PIDCPU#TIMESTAMPFUNCTION#|||||bash-3997[01]240.132281:3997:120:R+4055:120:R

bash-3997[01]240.132284:3997:120:R==>4055:120:Rsleep-4055[01]240.132371:4055:120:S==>3997:120:Rbash-3997[01]240.132454:3997:120:R+4055:120:S

bash-3997[01]240.132457:3997:120:R==>4055:120:Rsleep-4055[01]240.132460:4055:120:D==>3997:120:Rbash-3997[01]240.132463:3997:120:R+4055:120:D

bash-3997[01]240.132465:3997:120:R==>4055:120:R<idle>-0[00]240.132589:0:140:R+4:115:S

<idle>-0[00]240.132591:0:140:R==>4:115:Rksoftirqd/0-4[00]240.132595:4:115:S==>0:140:R<idle>-0[00]240.132598:0:140:R+4:115:S

<idle>-0[00]240.132599:0:140:R==>4:115:Rksoftirqd/0-4[00]240.132603:4:115:S==>0:140:Rsleep-4055[01]240.133058:4055:120:S==>3997:120:R[...]

Aswehavediscussedpreviouslyaboutthisformat,theheadershowsthenameofthetraceandpointstotheoptions.The"FUNCTION"isamisnomersincehereitrepresentsthewakeupsandcontextswitches.

Thesched_switchfileonlyliststhewakeups(representedwith'+')andcontextswitches('==>')withtheprevioustaskorcurrenttaskfirstfollowedbythenexttaskortaskwakingup.TheformatforbothoftheseisPID:KERNEL-PRIO:TASK-STATE.Remember
thattheKERNEL-PRIOistheinverseoftheactualprioritywithzero(0)beingthehighestpriorityandthenicevaluesstartingat100(nice-20).Belowisaquickcharttomapthekernelprioritytouserlandpriorities.

KernelSpaceUserSpace===============================================================0(high)to98(low)userRTpriority99(high)to1(low)withSCHED_RRorSCHED_FIFO

---------------------------------------------------------------99sched_priorityisnotusedinschedulingdecisions(itmustbespecifiedas0)

---------------------------------------------------------------100(high)to139(low)usernice-20(high)to19(low)

---------------------------------------------------------------140idletaskpriority

---------------------------------------------------------------

Thetaskstatesare:

R-running:wantstorun,maynotactuallyberunning

S-sleep:processiswaitingtobewokenup(handlessignals)

D-disksleep(uninterruptiblesleep):processmustbewokenup(ignoressignals)T-stopped:processsuspended

t-traced:processisbeingtraced(withsomethinglikegdb)

Z-zombie:processwaitingtobecleanedup

X-unknown

ftrace_enabled

--------------

Thefollowingtracers(listedbelow)givedifferentoutputdependingonwhetherornotthesysctlftrace_enabledisset.Tosetftrace_enabled,onecaneitherusethesysctlfunctionorsetitviatheprocfilesysteminterface.

sysctlkernel.ftrace_enabled=1

or

echo1>/proc/sys/kernel/ftrace_enabled

Todisableftrace_enabledsimplyreplacethe'1'with'0'intheabovecommands.

Whenftrace_enabledissetthetracerswillalsorecordthefunctionsthatarewithinthetrace.Thedescriptionsofthetracerswillalsoshowanexamplewithftraceenabled.

irqsoff

-------

Wheninterruptsaredisabled,theCPUcannotreacttoanyotherexternalevent(besidesNMIsandSMIs).Thispreventsthetimerinterruptfromtriggeringorthemouseinterruptfromlettingthekernelknowofanewmouseevent.Theresultisalatencywith
thereactiontime.

Theirqsofftracertracksthetimeforwhichinterruptsaredisabled.Whenanewmaximumlatencyishit,thetracersavesthetraceleadinguptothatlatencypointsothateverytimeanewmaximumisreached,theoldsavedtraceisdiscardedandthenewtrace
issaved.

Toresetthemaximum,echo0intotracing_max_latency.Hereisanexample:

#echoirqsoff>current_tracer

#echolatency-format>trace_options

#echo0>tracing_max_latency

#echo1>tracing_enabled

#ls-ltr

[...]

#echo0>tracing_enabled

#cattrace#tracer:irqsoff#irqsofflatencytracev1.1.5on2.6.26

--------------------------------------------------------------------latency:12us,#3/3,CPU#1|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:bash-3730(uid:0nice:0policy:0rt_prio:0)

-----------------=>startedat:sys_setpgid

=>endedat:sys_setpgid

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/bash-37301d...0us:_write_lock_irq(sys_setpgid)

bash-37301d..11us+:_write_unlock_irq(sys_setpgid)

bash-37301d..214us:trace_hardirqs_on(sys_setpgid)

Hereweseethatthatwehadalatencyof12microsecs(whichisverygood).The_write_lock_irqinsys_setpgiddisabledinterrupts.Thedifferencebetweenthe12andthedisplayedtimestamp14usoccurredbecausetheclockwasincrementedbetweenthetimeof
recordingthemaxlatencyandthetimeofrecordingthefunctionthathadthatlatency.

Notetheaboveexamplehadftrace_enablednotset.Ifwesettheftrace_enabled,wegetamuchlargeroutput:

#tracer:irqsoff#irqsofflatencytracev1.1.5on2.6.26-rc8

--------------------------------------------------------------------latency:50us,#101/101,CPU#0|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:ls-4339(uid:0nice:0policy:0rt_prio:0)

-----------------=>startedat:__alloc_pages_internal

=>endedat:__alloc_pages_internal

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/ls-43390...10us+:get_page_from_freelist(__alloc_pages_internal)ls-43390d..13us:rmqueue_bulk(get_page_from_freelist)

ls-43390d..13us:_spin_lock(rmqueue_bulk)

ls-43390d..14us:add_preempt_count(_spin_lock)

ls-43390d..24us:__rmqueue(rmqueue_bulk)

ls-43390d..25us:__rmqueue_smallest(__rmqueue)

ls-43390d..25us:__mod_zone_page_state(__rmqueue_smallest)ls-43390d..26us:__rmqueue(rmqueue_bulk)

ls-43390d..26us:__rmqueue_smallest(__rmqueue)

ls-43390d..27us:__mod_zone_page_state(__rmqueue_smallest)ls-43390d..27us:__rmqueue(rmqueue_bulk)

ls-43390d..28us:__rmqueue_smallest(__rmqueue)[...]ls-43390d..246us:__rmqueue_smallest(__rmqueue)

ls-43390d..247us:__mod_zone_page_state(__rmqueue_smallest)ls-43390d..247us:__rmqueue(rmqueue_bulk)

ls-43390d..248us:__rmqueue_smallest(__rmqueue)

ls-43390d..248us:__mod_zone_page_state(__rmqueue_smallest)ls-43390d..249us:_spin_unlock(rmqueue_bulk)

ls-43390d..249us:sub_preempt_count(_spin_unlock)

ls-43390d..150us:get_page_from_freelist(__alloc_pages_internal)ls-43390d..251us:trace_hardirqs_on(__alloc_pages_internal)

Herewetraceda50microsecondlatency.Butwealsoseeallthefunctionsthatwerecalledduringthattime.Notethatbyenablingfunctiontracing,weincuranaddedoverhead.Thisoverheadmayextendthelatencytimes.Butnevertheless,thistracehasprovided
someveryhelpfuldebugginginformation.

preemptoff

----------

Whenpreemptionisdisabled,wemaybeabletoreceiveinterruptsbutthetaskcannotbepreemptedandahigherprioritytaskmustwaitforpreemptiontobeenabledagainbeforeitcanpreemptalowerprioritytask.

Thepreemptofftracertracestheplacesthatdisablepreemption.Liketheirqsofftracer,itrecordsthemaximumlatencyforwhichpreemptionwasdisabled.Thecontrolofpreemptofftracerismuchliketheirqsofftracer.

#echopreemptoff>current_tracer

#echolatency-format>trace_options

#echo0>tracing_max_latency

#echo1>tracing_enabled

#ls-ltr

[...]

#echo0>tracing_enabled

#cattrace#tracer:preemptoff#preemptofflatencytracev1.1.5on2.6.26-rc8

--------------------------------------------------------------------latency:29us,#3/3,CPU#0|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:sshd-4261(uid:0nice:0policy:0rt_prio:0)

-----------------=>startedat:do_IRQ

=>endedat:__do_softirq

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/sshd-42610d.h.0us+:irq_enter(do_IRQ)

sshd-42610d.s.29us:_local_bh_enable(__do_softirq)

sshd-42610d.s130us:trace_preempt_on(__do_softirq)

Thishassomemorechanges.Preemptionwasdisabledwhenaninterruptcamein(noticethe'h'),andwasenabledwhiledoingasoftirq.(noticethe's').Butwealsoseethatinterruptshavebeendisabledwhenenteringthepreemptoffsectionandleavingit
(the'd').Wedonotknowifinterruptswereenabledinthemeantime.

#tracer:preemptoff#preemptofflatencytracev1.1.5on2.6.26-rc8

--------------------------------------------------------------------latency:63us,#87/87,CPU#0|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:sshd-4261(uid:0nice:0policy:0rt_prio:0)

-----------------=>startedat:remove_wait_queue

=>endedat:__do_softirq

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/sshd-42610d..10us:_spin_lock_irqsave(remove_wait_queue)

sshd-42610d..11us:_spin_unlock_irqrestore(remove_wait_queue)sshd-42610d..12us:do_IRQ(common_interrupt)

sshd-42610d..12us:irq_enter(do_IRQ)

sshd-42610d..12us:idle_cpu(irq_enter)

sshd-42610d..13us:add_preempt_count(irq_enter)

sshd-42610d.h13us:idle_cpu(irq_enter)

sshd-42610d.h.4us:handle_fasteoi_irq(do_IRQ)[...]sshd-42610d.h.12us:add_preempt_count(_spin_lock)

sshd-42610d.h112us:ack_ioapic_quirk_irq(handle_fasteoi_irq)sshd-42610d.h113us:move_native_irq(ack_ioapic_quirk_irq)

sshd-42610d.h113us:_spin_unlock(handle_fasteoi_irq)

sshd-42610d.h114us:sub_preempt_count(_spin_unlock)

sshd-42610d.h114us:irq_exit(do_IRQ)

sshd-42610d.h115us:sub_preempt_count(irq_exit)

sshd-42610d..215us:do_softirq(irq_exit)

sshd-42610d...15us:__do_softirq(do_softirq)

sshd-42610d...16us:__local_bh_disable(__do_softirq)

sshd-42610d...16us+:add_preempt_count(__local_bh_disable)sshd-42610d.s420us:add_preempt_count(__local_bh_disable)

sshd-42610d.s421us:sub_preempt_count(local_bh_enable)

sshd-42610d.s521us:sub_preempt_count(local_bh_enable)[...]sshd-42610d.s641us:add_preempt_count(__local_bh_disable)

sshd-42610d.s642us:sub_preempt_count(local_bh_enable)

sshd-42610d.s742us:sub_preempt_count(local_bh_enable)

sshd-42610d.s543us:add_preempt_count(__local_bh_disable)sshd-42610d.s543us:sub_preempt_count(local_bh_enable_ip)

sshd-42610d.s644us:sub_preempt_count(local_bh_enable_ip)sshd-42610d.s544us:add_preempt_count(__local_bh_disable)

sshd-42610d.s545us:sub_preempt_count(local_bh_enable)[...]sshd-42610d.s.63us:_local_bh_enable(__do_softirq)

sshd-42610d.s164us:trace_preempt_on(__do_softirq)

Theaboveisanexampleofthepreemptofftracewithftrace_enabledset.Hereweseethatinterruptsweredisabledtheentiretime.Theirq_entercodeletsusknowthatweenteredaninterrupt'h'.Beforethat,thefunctionsbeingtracedstillshowthatit
isnotinaninterrupt,butwecanseefromthefunctionsthemselvesthatthisisnotthecase.

Noticethat__do_softirqwhencalleddoesnothaveapreempt_count.Itmayseemthatwemissedapreemptenabling.Whatreallyhappenedisthatthepreemptcountisheldonthethread'sstackandweswitchedtothesoftirqstack(4Kstacksineffect).Thecode
doesnotcopythepreemptcount,butbecauseinterruptsaredisabled,wedonotneedtoworryaboutit.Havingatracerlikethisisgoodforlettingpeopleknowwhatreallyhappensinsidethekernel.

preemptirqsoff

--------------

Knowingthelocationsthathaveinterruptsdisabledorpreemptiondisabledforthelongesttimesishelpful.Butsometimeswewouldliketoknowwheneitherpreemptionand/orinterruptsaredisabled.

Considerthefollowingcode:

local_irq_disable();

call_function_with_irqs_off();

preempt_disable();

call_function_with_irqs_and_preemption_off();

local_irq_enable();

call_function_with_preemption_off();

preempt_enable();

Theirqsofftracerwillrecordthetotallengthofcall_function_with_irqs_off()andcall_function_with_irqs_and_preemption_off().

Thepreemptofftracerwillrecordthetotallengthofcall_function_with_irqs_and_preemption_off()andcall_function_with_preemption_off().

Butneitherwilltracethetimethatinterruptsand/orpreemptionisdisabled.Thistotaltimeisthetimethatwecannotschedule.Torecordthistime,usethepreemptirqsofftracer.

Again,usingthistraceismuchliketheirqsoffandpreemptofftracers.

#echopreemptirqsoff>current_tracer

#echolatency-format>trace_options

#echo0>tracing_max_latency

#echo1>tracing_enabled

#ls-ltr

[...]

#echo0>tracing_enabled

#cattrace#tracer:preemptirqsoff#preemptirqsofflatencytracev1.1.5on2.6.26-rc8

--------------------------------------------------------------------latency:293us,#3/3,CPU#0|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:ls-4860(uid:0nice:0policy:0rt_prio:0)

-----------------=>startedat:apic_timer_interrupt

=>endedat:__do_softirq

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/ls-48600d...0us!:trace_hardirqs_off_thunk(apic_timer_interrupt)ls-48600d.s.294us:_local_bh_enable(__do_softirq)

ls-48600d.s1294us:trace_preempt_on(__do_softirq)

Thetrace_hardirqs_off_thunkiscalledfromassemblyonx86wheninterruptsaredisabledintheassemblycode.Withoutthefunctiontracing,wedonotknowifinterruptswereenabledwithinthepreemptionpoints.Wedoseethatitstartedwithpreemptionenabled.

Hereisatracewithftrace_enabledset:

#tracer:preemptirqsoff#preemptirqsofflatencytracev1.1.5on2.6.26-rc8

--------------------------------------------------------------------latency:105us,#183/183,CPU#0|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:sshd-4261(uid:0nice:0policy:0rt_prio:0)

-----------------=>startedat:write_chan

=>endedat:__do_softirq

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/ls-44730.N..0us:preempt_schedule(write_chan)

ls-44730dN.11us:_spin_lock(schedule)

ls-44730dN.12us:add_preempt_count(_spin_lock)

ls-44730d..22us:put_prev_task_fair(schedule)[...]ls-44730d..213us:set_normalized_timespec(ktime_get_ts)

ls-44730d..213us:__switch_to(schedule)sshd-42610d..214us:finish_task_switch(schedule)

sshd-42610d..214us:_spin_unlock_irq(finish_task_switch)

sshd-42610d..115us:add_preempt_count(_spin_lock_irqsave)sshd-42610d..216us:_spin_unlock_irqrestore(hrtick_set)

sshd-42610d..216us:do_IRQ(common_interrupt)

sshd-42610d..217us:irq_enter(do_IRQ)

sshd-42610d..217us:idle_cpu(irq_enter)

sshd-42610d..218us:add_preempt_count(irq_enter)

sshd-42610d.h218us:idle_cpu(irq_enter)

sshd-42610d.h.18us:handle_fasteoi_irq(do_IRQ)

sshd-42610d.h.19us:_spin_lock(handle_fasteoi_irq)

sshd-42610d.h.19us:add_preempt_count(_spin_lock)

sshd-42610d.h120us:_spin_unlock(handle_fasteoi_irq)

sshd-42610d.h120us:sub_preempt_count(_spin_unlock)[...]sshd-42610d.h128us:_spin_unlock(handle_fasteoi_irq)

sshd-42610d.h129us:sub_preempt_count(_spin_unlock)

sshd-42610d.h229us:irq_exit(do_IRQ)

sshd-42610d.h229us:sub_preempt_count(irq_exit)

sshd-42610d..330us:do_softirq(irq_exit)

sshd-42610d...30us:__do_softirq(do_softirq)

sshd-42610d...31us:__local_bh_disable(__do_softirq)

sshd-42610d...31us+:add_preempt_count(__local_bh_disable)sshd-42610d.s434us:add_preempt_count(__local_bh_disable)[...]sshd-42610d.s343us:sub_preempt_count(local_bh_enable_ip)

sshd-42610d.s444us:sub_preempt_count(local_bh_enable_ip)sshd-42610d.s344us:smp_apic_timer_interrupt(apic_timer_interrupt)sshd-42610d.s345us:irq_enter(smp_apic_timer_interrupt)

sshd-42610d.s345us:idle_cpu(irq_enter)

sshd-42610d.s346us:add_preempt_count(irq_enter)

sshd-42610d.H346us:idle_cpu(irq_enter)

sshd-42610d.H347us:hrtimer_interrupt(smp_apic_timer_interrupt)sshd-42610d.H347us:ktime_get(hrtimer_interrupt)[...]sshd-42610d.H381us:tick_program_event(hrtimer_interrupt)

sshd-42610d.H382us:ktime_get(tick_program_event)

sshd-42610d.H382us:ktime_get_ts(ktime_get)

sshd-42610d.H383us:getnstimeofday(ktime_get_ts)

sshd-42610d.H383us:set_normalized_timespec(ktime_get_ts)sshd-42610d.H384us:clockevents_program_event(tick_program_event)sshd-42610d.H384us:lapic_next_event(clockevents_program_event)sshd-42610d.H385us:irq_exit(smp_apic_timer_interrupt)

sshd-42610d.H385us:sub_preempt_count(irq_exit)

sshd-42610d.s486us:sub_preempt_count(irq_exit)

sshd-42610d.s386us:add_preempt_count(__local_bh_disable)[...]sshd-42610d.s198us:sub_preempt_count(net_rx_action)

sshd-42610d.s.99us:add_preempt_count(_spin_lock_irq)

sshd-42610d.s199us+:_spin_unlock_irq(run_timer_softirq)

sshd-42610d.s.104us:_local_bh_enable(__do_softirq)

sshd-42610d.s.104us:sub_preempt_count(_local_bh_enable)

sshd-42610d.s.105us:_local_bh_enable(__do_softirq)

sshd-42610d.s1105us:trace_preempt_on(__do_softirq)

Thisisaveryinterestingtrace.Itstartedwiththepreemptionofthelstask.Weseethatthetaskhadthe"need_resched"bitsetviathe'N'inthetrace.Interruptsweredisabledbeforethespin_lockatthebeginningofthetrace.Weseethataschedule
tookplacetorunsshd.Whentheinterruptswereenabled,wetookaninterrupt.Onreturnfromtheinterrupthandler,thesoftirqran.Wetookanotherinterruptwhilerunningthesoftirqasweseefromthecapital'H'.

wakeup

------

InaReal-Timeenvironmentitisveryimportanttoknowthewakeuptimeittakesforthehighestprioritytaskthatiswokenuptothetimethatitexecutes.Thisisalsoknownas"schedulelatency".IstressthepointthatthisisaboutRTtasks.Itisalso
importanttoknowtheschedulinglatencyofnon-RTtasks,buttheaverageschedulelatencyisbetterfornon-RTtasks.ToolslikeLatencyToparemoreappropriateforsuchmeasurements.

Real-Timeenvironmentsareinterestedintheworstcaselatency.Thatisthelongestlatencyittakesforsomethingtohappen,andnottheaverage.Wecanhaveaveryfastschedulerthatmayonlyhavealargelatencyonceinawhile,butthatwouldnotwork
wellwithReal-Timetasks.ThewakeuptracerwasdesignedtorecordtheworstcasewakeupsofRTtasks.Non-RTtasksarenotrecordedbecausethetraceronlyrecordsoneworstcaseandtracingnon-RTtasksthatareunpredictablewilloverwritetheworstcase
latencyofRTtasks.

SincethistraceronlydealswithRTtasks,wewillrunthisslightlydifferentlythanwedidwiththeprevioustracers.Insteadofperformingan'ls',wewillrun'sleep1'under'chrt'whichchangesthepriorityofthetask.

#echowakeup>current_tracer

#echolatency-format>trace_options

#echo0>tracing_max_latency

#echo1>tracing_enabled

#chrt-f5sleep1

#echo0>tracing_enabled

#cattrace#tracer:wakeup#wakeuplatencytracev1.1.5on2.6.26-rc8

--------------------------------------------------------------------latency:4us,#2/2,CPU#1|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:sleep-4901(uid:0nice:0policy:1rt_prio:5)

-----------------

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/<idle>-01d.h40us+:try_to_wake_up(wake_up_process)

<idle>-01d..44us:schedule(cpu_idle)

Runningthisonanidlesystem,weseethatitonlytook4microsecondstoperformthetaskswitch.Note,sincethetracemarkerinthescheduleisbeforetheactual"switch",westopthetracingwhentherecordedtaskisabouttoschedulein.Thismaychange
ifweaddanewmarkerattheendofthescheduler.

Noticethattherecordedtaskis'sleep'withthePIDof4901andithasanrt_prioof5.Thispriorityisuser-spacepriorityandnottheinternalkernelpriority.Thepolicyis1forSCHED_FIFOand2forSCHED_RR.

Doingthesamewithchrt-r5andftrace_enabledset.

#tracer:wakeup#wakeuplatencytracev1.1.5on2.6.26-rc8

--------------------------------------------------------------------latency:50us,#60/60,CPU#1|(M:preemptVP:0,KP:0,SP:0HP:0#P:2)

-----------------

|task:sleep-4068(uid:0nice:0policy:2rt_prio:5)

-----------------

#_------=>CPU##/_-----=>irqs-off#|/_----=>need-resched#||/_---=>hardirq/softirq#|||/_--=>preempt-depth#||||/#|||||delay#cmdpid
|||||time|caller#\/|||||\|/ksoftirq-71d.H30us:try_to_wake_up(wake_up_process)ksoftirq-71d.H41us:sub_preempt_count(marker_probe_cb)ksoftirq-71d.H32us:check_preempt_wakeup(try_to_wake_up)ksoftirq-7
1d.H33us:update_curr(check_preempt_wakeup)ksoftirq-71d.H34us:calc_delta_mine(update_curr)ksoftirq-71d.H35us:__resched_task(check_preempt_wakeup)ksoftirq-71d.H36us:task_wake_up_rt(try_to_wake_up)ksoftirq-7
1d.H37us:_spin_unlock_irqrestore(try_to_wake_up)[...]ksoftirq-71d.H217us:irq_exit(smp_apic_timer_interrupt)ksoftirq-71d.H218us:sub_preempt_count(irq_exit)ksoftirq-71d.s319us:sub_preempt_count(irq_exit)ksoftirq-7
1..s220us:rcu_process_callbacks(__do_softirq)[...]ksoftirq-71..s226us:__rcu_process_callbacks(rcu_process_callbacks)ksoftirq-71d.s227us:_local_bh_enable(__do_softirq)ksoftirq-71d.s228us:sub_preempt_count(_local_bh_enable)
ksoftirq-71.N.329us:sub_preempt_count(ksoftirqd)ksoftirq-71.N.230us:_cond_resched(ksoftirqd)ksoftirq-71.N.231us:__cond_resched(_cond_resched)ksoftirq-71.N.232us:add_preempt_count(__cond_resched)ksoftirq-7
1.N.233us:schedule(__cond_resched)ksoftirq-71.N.233us:add_preempt_count(schedule)ksoftirq-71.N.334us:hrtick_clear(schedule)ksoftirq-71dN.335us:_spin_lock(schedule)ksoftirq-71dN.336us:add_preempt_count(_spin_lock)
ksoftirq-71d..437us:put_prev_task_fair(schedule)ksoftirq-71d..438us:update_curr(put_prev_task_fair)[...]ksoftirq-71d..547us:_spin_trylock(tracing_record_cmdline)ksoftirq-71d..548us:add_preempt_count(_spin_trylock)
ksoftirq-71d..649us:_spin_unlock(tracing_record_cmdline)ksoftirq-71d..649us:sub_preempt_count(_spin_unlock)ksoftirq-71d..450us:schedule(__cond_resched)

Theinterruptwentoffwhilerunningksoftirqd.ThistaskrunsatSCHED_OTHER.Whydidnotweseethe'N'setearly?Thismaybeaharmlessbugwithx86_32and4Kstacks.Onx86_32with4Kstacksconfigured,theinterruptandsoftirqrunwiththeirownstack.
Someinformationisheldonthetopofthetask'sstack(need_reschedandpreempt_countarebothstoredthere).ThesettingoftheNEED_RESCHEDbitisdonedirectlytothetask'sstack,butthereadingoftheNEED_RESCHEDisdonebylookingatthecurrent
stack,whichinthiscaseisthestackforthehardinterrupt.ThishidesthefactthatNEED_RESCHEDhasbeenset.Wedonotseethe'N'untilweswitchbacktothetask'sassignedstack.

function

--------

Thistraceristhefunctiontracer.Enablingthefunctiontracercanbedonefromthedebugfilesystem.Makesuretheftrace_enabledisset;otherwisethistracerisanop.

#sysctlkernel.ftrace_enabled=1

#echofunction>current_tracer

#echo1>tracing_enabled

#usleep1

#echo0>tracing_enabled

#cattrace#tracer:function##TASK-PIDCPU#TIMESTAMPFUNCTION#|||||bash-4003[00]123.638713:finish_task_switch<-schedule

bash-4003[00]123.638714:_spin_unlock_irq<-finish_task_switchbash-4003[00]123.638714:sub_preempt_count<-_spin_unlock_irq

bash-4003[00]123.638715:hrtick_set<-schedule

bash-4003[00]123.638715:_spin_lock_irqsave<-hrtick_setbash-4003[00]123.638716:add_preempt_count<-_spin_lock_irqsavebash-4003[00]123.638716:_spin_unlock_irqrestore<-hrtick_set

bash-4003[00]123.638717:sub_preempt_count<-_spin_unlock_irqrestorebash-4003[00]123.638717:hrtick_clear<-hrtick_set

bash-4003[00]123.638718:sub_preempt_count<-schedulebash-4003[00]123.638718:sub_preempt_count<-preempt_schedule

bash-4003[00]123.638719:wait_for_completion<-__stop_machine_runbash-4003[00]123.638719:wait_for_common<-wait_for_completion

bash-4003[00]123.638720:_spin_lock_irq<-wait_for_commonbash-4003[00]123.638720:add_preempt_count<-_spin_lock_irq[...]

Note:functiontracerusesringbufferstostoretheaboveentries.Thenewestdatamayoverwritetheoldestdata.Sometimesusingechotostopthetraceisnotsufficientbecausethetracingcouldhaveoverwrittenthedatathatyouwantedtorecord.Forthis
reason,itissometimesbettertodisabletracingdirectlyfromaprogram.Thisallowsyoutostopthetracingatthepointthatyouhitthepartthatyouareinterestedin.TodisablethetracingdirectlyfromaCprogram,somethinglikefollowingcodesnippet
canbeused:

inttrace_fd;[...]intmain(intargc,char*argv[]){[...]

trace_fd=open(tracing_file("tracing_enabled"),O_WRONLY);

[...]

if(condition_hit()){write(trace_fd,"0",1);}

[...]}

Singlethreadtracing

---------------------

Bywritingintoset_ftrace_pidyoucantraceasinglethread.Forexample:

#catset_ftrace_pidnopid#echo3111>set_ftrace_pid#catset_ftrace_pid3111#echofunction>current_tracer#cattrace|head#tracer:function

#

#TASK-PIDCPU#TIMESTAMPFUNCTION

#|||||yum-updatesd-3111[003]1637.254676:finish_task_switch<-thread_returnyum-updatesd-3111[003]1637.254681:hrtimer_cancel<-schedule_hrtimeout_rangeyum-updatesd-3111[003]1637.254682:hrtimer_try_to_cancel
<-hrtimer_cancelyum-updatesd-3111[003]1637.254683:lock_hrtimer_base<-hrtimer_try_to_cancelyum-updatesd-3111[003]1637.254685:fget_light<-do_sys_poll

yum-updatesd-3111[003]1637.254686:pipe_poll<-do_sys_poll#echo-1>set_ftrace_pid#cattrace|head#tracer:function

#

#TASK-PIDCPU#TIMESTAMPFUNCTION

#|||||

#####CPU3bufferstarted####yum-updatesd-3111[003]1701.957688:free_poll_entry<-poll_freewaityum-updatesd-3111[003]1701.957689:remove_wait_queue<-free_poll_entryyum-updatesd-3111[003]1701.957691:fput<-free_poll_entry

yum-updatesd-3111[003]1701.957692:audit_syscall_exit<-sysret_audityum-updatesd-3111[003]1701.957693:path_put<-audit_syscall_exit

Ifyouwanttotraceafunctionwhenexecuting,youcouldusesomethinglikethissimpleprogram:

#include<stdio.h>#include<stdlib.h>#include<sys/types.h>#include<sys/stat.h>#include<fcntl.h>#include<unistd.h>#include<string.h>

#define_STR(x)#x#defineSTR(x)_STR(x)#defineMAX_PATH256

constchar*find_debugfs(void){staticchardebugfs[MAX_PATH+1];

staticintdebugfs_found;

chartype[100];

FILE*fp;

if(debugfs_found)returndebugfs;

if((fp=fopen("/proc/mounts","r"))==NULL){perror("/proc/mounts");

returnNULL;}

while(fscanf(fp,"%*s%"STR(MAX_PATH)

"s%99s%*s%*d%*d\n",

debugfs,type)==2){if(strcmp(type,"debugfs")==0)break;}

fclose(fp);

if(strcmp(type,"debugfs")!=0){fprintf(stderr,"debugfsnotmounted");

returnNULL;}

strcat(debugfs,"/tracing/");

debugfs_found=1;

returndebugfs;}

constchar*tracing_file(constchar*file_name){staticchartrace_file[MAX_PATH+1];

snprintf(trace_file,MAX_PATH,"%s/%s",find_debugfs(),file_name);returntrace_file;}

intmain(intargc,char**argv){if(argc<1)exit(-1);

if(fork()>0){intfd,ffd;

charline[64];

ints;

ffd=open(tracing_file("current_tracer"),O_WRONLY);

if(ffd<0)exit(-1);write(ffd,"nop",3);

fd=open(tracing_file("set_ftrace_pid"),O_WRONLY);

s=sprintf(line,"%d\n",getpid());

write(fd,line,s);

write(ffd,"function",8);

close(fd);

close(ffd);

execvp(argv[1],argv+1);}

return0;}

hw-branch-tracer(x86only)

---------------------------

Thistracerusesthex86lastbranchtracinghardwarefeaturetocollectabranchtraceonallcpuswithrelativelylowoverhead.

Thetracerusesafixed-sizecircularbufferpercpuandonlytracesring0branches.Thetracefiledumpsthatbufferinthefollowingformat:

#tracer:hw-branch-tracer##CPU#TO<-FROM0scheduler_tick+0xb5/0x1bf<-task_tick_idle+0x5/0x62run_posix_cpu_timers+0x2b/0x72a<-run_posix_cpu_timers+0x25/0x72a0scheduler_tick+0x139/0x1bf<-scheduler_tick+0xed/0x1bf0scheduler_tick+0x17c/0x1bf
<-scheduler_tick+0x148/0x1bf2run_posix_cpu_timers+0x9e/0x72a<-run_posix_cpu_timers+0x5e/0x72a0scheduler_tick+0x1b6/0x1bf<-scheduler_tick+0x1aa/0x1bf

Thetracermaybeusedtodumpthetracefortheoops'ingcpuonakerneloopsintothesystemlog.Toenablethis,ftrace_dump_on_oopsmustbeset.Tosetftrace_dump_on_oops,onecaneitherusethesysctlfunctionorsetitviatheprocsysteminterface.

sysctlkernel.ftrace_dump_on_oops=n

or

echon>/proc/sys/kernel/ftrace_dump_on_oops

Ifn=1,ftracewilldumpbuffersofallCPUs,ifn=2ftracewillonlydumpthebufferoftheCPUthattriggeredtheoops.

Here'sanexampleofsuchadumpafteranullpointerdereferenceinakernelmodule:

[57848.105921]BUG:unabletohandlekernelNULLpointerdereferenceat0000000000000000[57848.106019]IP:[<ffffffffa0000006>]open+0x6/0x14[oops][57848.106019]PGD2354e9067PUD2375e7067PMD0[57848.106019]Oops:0002[#1]SMP[57848.106019]lastsysfs
file:/sys/devices/pci0000:00/0000:00:1e.0/0000:20:05.0/local_cpus[57848.106019]Dumpingftracebuffer:[57848.106019]---------------------------------[...][57848.106019]0chrdev_open+0xe6/0x165<-cdev_put+0x23/0x24[57848.106019]0chrdev_open+0x117/0x165
<-chrdev_open+0xfa/0x165[57848.106019]0chrdev_open+0x120/0x165<-chrdev_open+0x11c/0x165[57848.106019]0chrdev_open+0x134/0x165<-chrdev_open+0x12b/0x165[57848.106019]0open+0x0/0x14[oops]<-chrdev_open+0x144/0x165[57848.106019]
0page_fault+0x0/0x30<-open+0x6/0x14[oops][57848.106019]0error_entry+0x0/0x5b<-page_fault+0x4/0x30[57848.106019]0error_kernelspace+0x0/0x31<-error_entry+0x59/0x5b[57848.106019]0error_sti+0x0/0x1<-error_kernelspace+0x2d/0x31
[57848.106019]0page_fault+0x9/0x30<-error_sti+0x0/0x1[57848.106019]0do_page_fault+0x0/0x881<-page_fault+0x1a/0x30[...][57848.106019]0do_page_fault+0x66b/0x881<-is_prefetch+0x1ee/0x1f2[57848.106019]0do_page_fault+0x6e0/0x881
<-do_page_fault+0x67a/0x881[57848.106019]0oops_begin+0x0/0x96<-do_page_fault+0x6e0/0x881[57848.106019]0trace_hw_branch_oops+0x0/0x2d<-oops_begin+0x9/0x96[...][57848.106019]0ds_suspend_bts+0x2a/0xe3<-ds_suspend_bts+0x1a/0xe3
[57848.106019]---------------------------------[57848.106019]CPU0[57848.106019]Moduleslinkedin:oops[57848.106019]Pid:5542,comm:catTainted:GW2.6.28#23[57848.106019]RIP:0010:[<ffffffffa0000006>][<ffffffffa0000006>]open+0x6/0x14
[oops][57848.106019]RSP:0018:ffff880235457d48EFLAGS:00010246[...]

functiongraphtracer

---------------------------

Thistracerissimilartothefunctiontracerexceptthatitprobesafunctiononitsentryanditsexit.Thisisdonebyusingadynamicallyallocatedstackofreturnaddressesineachtask_struct.Onfunctionentrythetraceroverwritesthereturnaddress
ofeachfunctiontracedtosetacustomprobe.Thustheoriginalreturnaddressisstoredonthestackofreturnaddressinthetask_struct.

Probingonbothendsofafunctionleadstospecialfeaturessuchas:

-measureofafunction'stimeexecution

-havingareliablecallstacktodrawfunctioncallsgraph

Thistracerisusefulinseveralsituations:

-youwanttofindthereasonofastrangekernelbehaviorandneedtoseewhathappensindetailonanyareas(orspecific

ones).

-youareexperiencingweirdlatenciesbutit'sdifficulttofinditsorigin.

-youwanttofindquicklywhichpathistakenbyaspecificfunction

-youjustwanttopeekinsideaworkingkernelandwanttoseewhathappensthere.

#tracer:function_graph##CPUDURATIONFUNCTIONCALLS#|||||||

0)|sys_open(){0)|do_sys_open(){0)|getname(){0)|kmem_cache_alloc(){0)1.382us|__might_sleep();0)2.478us|}0)|strncpy_from_user()
{0)|might_fault(){0)1.389us|__might_sleep();0)2.553us|}0)3.807us|}0)7.876us|}0)|alloc_fd(){0)0.668us|_spin_lock();0)
0.570us|expand_files();0)0.586us|_spin_unlock();

Thereareseveralcolumnsthatcanbedynamicallyenabled/disabled.Youcanuseeverycombinationofoptionsyouwant,dependingonyourneeds.

-Thecpunumberonwhichthefunctionexecutedisdefaultenabled.Itissometimesbettertoonlytraceonecpu(seetracing_cpu_maskfile)oryoumightsometimesseeunordered

functioncallswhilecputracingswitch.

hide:echonofuncgraph-cpu>trace_options

show:echofuncgraph-cpu>trace_options

-Theduration(function'stimeofexecution)isdisplayedontheclosingbracketlineofafunctionoronthesameline

thanthecurrentfunctionincaseofaleafone.Itisdefault

enabled.

hide:echonofuncgraph-duration>trace_options

show:echofuncgraph-duration>trace_options

-Theoverheadfieldprecedesthedurationfieldincaseofreacheddurationthresholds.

hide:echonofuncgraph-overhead>trace_options

show:echofuncgraph-overhead>trace_options

dependson:funcgraph-duration

ie:

0)|up_write(){0)0.646us|_spin_lock_irqsave();0)0.684us|_spin_unlock_irqrestore();0)3.123us|}0)0.548us|fput();0)+58.628us|}

[...]

0)|putname(){0)|kmem_cache_free(){0)0.518us|__phys_addr();0)1.757us|}0)2.861us|}0)!115.305us|}0)!116.402us|}

+meansthatthefunctionexceeded10usecs.

!meansthatthefunctionexceeded100usecs.

-Thetask/pidfielddisplaysthethreadcmdlineandpidwhichexecutedthefunction.Itisdefaultdisabled.

hide:echonofuncgraph-proc>trace_options

show:echofuncgraph-proc>trace_options

ie:

#tracer:function_graph

#

#CPUTASK/PIDDURATIONFUNCTIONCALLS

#|||||||||0)sh-4802||d_free(){0)sh-4802||call_rcu(){0)sh-4802||__call_rcu()
{0)sh-4802|0.616us|rcu_process_gp_end();0)sh-4802|0.586us|check_for_new_grace_period();0)sh-4802|2.899us|}0)sh-4802|4.040
us|}0)sh-4802|5.151us|}0)sh-4802|+49.370us|}

-Theabsolutetimefieldisanabsolutetimestampgivenbythesystemclocksinceitstarted.Asnapshotofthistimeis

givenoneachentry/exitoffunctions

hide:echonofuncgraph-abstime>trace_options

show:echofuncgraph-abstime>trace_options

ie:

#

#TIMECPUDURATIONFUNCTIONCALLS

#||||||||

360.774522|1)0.541us|}

360.774522|1)4.663us|}

360.774523|1)0.541us|__wake_up_bit();

360.774524|1)6.796us|}

360.774524|1)7.952us|}

360.774525|1)9.063us|}

360.774525|1)0.615us|journal_mark_dirty();

360.774527|1)0.578us|__brelse();

360.774528|1)|reiserfs_prepare_for_journal(){

360.774528|1)|unlock_buffer(){

360.774529|1)|wake_up_bit(){

360.774529|1)|bit_waitqueue(){

360.774530|1)0.594us|__phys_addr();

Youcanputsomecommentsonspecificfunctionsbyusingtrace_printk()Forexample,ifyouwanttoputacommentinsidethe__might_sleep()function,youjusthavetoinclude<linux/ftrace.h>andcalltrace_printk()inside__might_sleep()

trace_printk("I'macomment!\n")

willproduce:

1)|__might_sleep(){1)|/*I'macomment!*/1)1.449us|}

Youmightfindotherusefulfeaturesforthistracerinthefollowing"dynamicftrace"sectionsuchastracingonlyspecificfunctionsortasks.

dynamicftrace

--------------

IfCONFIG_DYNAMIC_FTRACEisset,thesystemwillrunwithvirtuallynooverheadwhenfunctiontracingisdisabled.Thewaythisworksisthemcountfunctioncall(placedatthestartofeverykernelfunction,producedbythe-pgswitchingcc),startsofpointing
toasimplereturn.(EnablingFTRACEwillincludethe-pgswitchinthecompilingofthekernel.)

AtcompiletimeeveryCfileobjectisrunthroughtherecordmcount.plscript(locatedinthescriptsdirectory).ThisscriptwillprocesstheCobjectusingobjdumptofindallthelocationsinthe.textsectionthatcallmcount.(Note,onlythe.textsection
isprocessed,sinceprocessingothersectionslike.init.textmaycauseracesduetothosesectionsbeingfreed).

Anewsectioncalled"__mcount_loc"iscreatedthatholdsreferencestoallthemcountcallsitesinthe.textsection.Thissectioniscompiledbackintotheoriginalobject.Thefinallinkerwilladdallthesereferencesintoasingletable.

Onbootup,beforeSMPisinitialized,thedynamicftracecodescansthistableandupdatesallthelocationsintonops.Italsorecordsthelocations,whichareaddedtotheavailable_filter_functionslist.Modulesareprocessedastheyareloadedandbefore
theyareexecuted.Whenamoduleisunloaded,italsoremovesitsfunctionsfromtheftracefunctionlist.Thisisautomaticinthemoduleunloadcode,andthemoduleauthordoesnotneedtoworryaboutit.

Whentracingisenabled,kstop_machineiscalledtopreventraceswiththeCPUSexecutingcodebeingmodified(whichcancausetheCPUtodoundesirablethings),andthenopsarepatchedbacktocalls.Butthistime,theydonotcallmcount(whichisjusta
functionstub).Theynowcallintotheftraceinfrastructure.

Onespecialside-effecttotherecordingofthefunctionsbeingtracedisthatwecannowselectivelychoosewhichfunctionswewishtotraceandwhichoneswewantthemcountcallstoremainasnops.

Twofilesareused,oneforenablingandonefordisablingthetracingofspecifiedfunctions.Theyare:

set_ftrace_filter

and

set_ftrace_notrace

Alistofavailablefunctionsthatyoucanaddtothesefilesislistedin:

available_filter_functions

#catavailable_filter_functionsput_prev_task_idlekmem_cache_createpick_next_task_rtget_online_cpuspick_next_task_fairmutex_lock[...]

IfIamonlyinterestedinsys_nanosleepandhrtimer_interrupt:

#echosys_nanosleephrtimer_interrupt\>set_ftrace_filter#echofunction>current_tracer

#echo1>tracing_enabled

#usleep1

#echo0>tracing_enabled

#cattrace#tracer:ftrace##TASK-PIDCPU#TIMESTAMPFUNCTION#|||||usleep-4134[00]1317.070017:hrtimer_interrupt<-smp_apic_timer_interruptusleep-4134[00]1317.070111:sys_nanosleep<-syscall_call

<idle>-0[00]1317.070115:hrtimer_interrupt<-smp_apic_timer_interrupt

Toseewhichfunctionsarebeingtraced,youcancatthefile:

#catset_ftrace_filterhrtimer_interruptsys_nanosleep

Perhapsthisisnotenough.Thefiltersalsoallowsimplewildcards.Onlythefollowingarecurrentlyavailable

<match>*-willmatchfunctionsthatbeginwith<match>

*<match>-willmatchfunctionsthatendwith<match>

*<match>*-willmatchfunctionsthathave<match>init

Thesearetheonlywildcardswhicharesupported.

<match>*<match>willnotwork.

Note:Itisbettertousequotestoenclosethewildcards,otherwisetheshellmayexpandtheparametersintonames

offilesinthelocaldirectory.

#echo'hrtimer_*'>set_ftrace_filter

Produces:

#tracer:ftrace##TASK-PIDCPU#TIMESTAMPFUNCTION#|||||bash-4003[00]1480.611794:hrtimer_init<-copy_process

bash-4003[00]1480.611941:hrtimer_start<-hrtick_setbash-4003[00]1480.611956:hrtimer_cancel<-hrtick_clear

bash-4003[00]1480.611956:hrtimer_try_to_cancel<-hrtimer_cancel<idle>-0[00]1480.612019:hrtimer_get_next_event<-get_next_timer_interrupt<idle>-0[00]1480.612025:hrtimer_get_next_event<-get_next_timer_interrupt<idle>-0
[00]1480.612032:hrtimer_get_next_event<-get_next_timer_interrupt<idle>-0[00]1480.612037:hrtimer_get_next_event<-get_next_timer_interrupt<idle>-0[00]1480.612382:hrtimer_get_next_event<-get_next_timer_interrupt

Noticethatwelostthesys_nanosleep.

#catset_ftrace_filterhrtimer_run_queueshrtimer_run_pendinghrtimer_inithrtimer_cancelhrtimer_try_to_cancelhrtimer_forwardhrtimer_starthrtimer_reprogramhrtimer_force_reprogramhrtimer_get_next_eventhrtimer_interrupthrtimer_nanosleephrtimer_wakeup
hrtimer_get_remaininghrtimer_get_reshrtimer_init_sleeper

Thisisbecausethe'>'and'>>'actjustliketheydoinbash.Torewritethefilters,use'>'Toappendtothefilters,use'>>'

Toclearoutafiltersothatallfunctionswillberecordedagain:

#echo>set_ftrace_filter

#catset_ftrace_filter

#

Again,nowwewanttoappend.

#echosys_nanosleep>set_ftrace_filter

#catset_ftrace_filtersys_nanosleep#echo'hrtimer_*'>>set_ftrace_filter

#catset_ftrace_filterhrtimer_run_queueshrtimer_run_pendinghrtimer_inithrtimer_cancelhrtimer_try_to_cancelhrtimer_forwardhrtimer_starthrtimer_reprogramhrtimer_force_reprogramhrtimer_get_next_eventhrtimer_interruptsys_nanosleephrtimer_nanosleep
hrtimer_wakeuphrtimer_get_remaininghrtimer_get_reshrtimer_init_sleeper

Theset_ftrace_notracepreventsthosefunctionsfrombeingtraced.

#echo'*preempt*''*lock*'>set_ftrace_notrace

Produces:

#tracer:ftrace##TASK-PIDCPU#TIMESTAMPFUNCTION#|||||bash-4043[01]115.281644:finish_task_switch<-schedule

bash-4043[01]115.281645:hrtick_set<-schedule

bash-4043[01]115.281645:hrtick_clear<-hrtick_set

bash-4043[01]115.281646:wait_for_completion<-__stop_machine_runbash-4043[01]115.281647:wait_for_common<-wait_for_completion

bash-4043[01]115.281647:kthread_stop<-stop_machine_runbash-4043[01]115.281648:init_waitqueue_head<-kthread_stop

bash-4043[01]115.281648:wake_up_process<-kthread_stopbash-4043[01]115.281649:try_to_wake_up<-wake_up_process

Wecanseethatthere'snomorelockorpreempttracing.

Dynamicftracewiththefunctiongraphtracer

---------------------------------------------

Althoughwhathasbeenexplainedaboveconcernsboththefunctiontracerandthefunction-graph-tracer,therearesomespecialfeaturesonlyavailableinthefunction-graphtracer.

Ifyouwanttotraceonlyonefunctionandallofitschildren,youjusthavetoechoitsnameintoset_graph_function:

echo__do_fault>set_graph_function

willproducethefollowing"expanded"traceofthe__do_fault()function:

0)|__do_fault(){0)|filemap_fault(){0)|find_lock_page(){0)0.804us|find_get_page();0)|__might_sleep(){0)1.329us|}0)3.904us|
}0)4.979us|}0)0.653us|_spin_lock();0)0.578us|page_add_file_rmap();0)0.525us|native_set_pte_at();0)0.585us|_spin_unlock();0)|unlock_page(){0)0.541us|page_waitqueue();
0)0.639us|__wake_up_bit();0)2.786us|}0)+14.237us|}0)|__do_fault(){0)|filemap_fault(){0)|find_lock_page(){0)0.698us|find_get_page();0)
|__might_sleep(){0)1.412us|}0)3.950us|}0)5.098us|}0)0.631us|_spin_lock();0)0.571us|page_add_file_rmap();0)0.526us|native_set_pte_at();0)0.586us|_spin_unlock();
0)|unlock_page(){0)0.533us|page_waitqueue();0)0.638us|__wake_up_bit();0)2.793us|}0)+14.012us|}

Youcanalsoexpandseveralfunctionsatonce:

echosys_open>set_graph_function

echosys_close>>set_graph_function

Nowifyouwanttogobacktotraceallfunctionsyoucanclearthisspecialfiltervia:

echo>set_graph_function

Filtercommands

---------------

Afewcommandsaresupportedbytheset_ftrace_filterinterface.Tracecommandshavethefollowingformat:

<function>:<command>:<parameter>

Thefollowingcommandsaresupported:

-modThiscommandenablesfunctionfilteringpermodule.The

parameterdefinesthemodule.Forexample,ifonlythewrite*functionsintheext3modulearedesired,run:

echo'write*:mod:ext3'>set_ftrace_filter

Thiscommandinteractswiththefilterinthesamewayas

filteringbasedonfunctionnames.Thus,addingmorefunctions

inadifferentmoduleisaccomplishedbyappending(>>)tothe

filterfile.Removespecificmodulefunctionsbyprepending

'!':

echo'!writeback*:mod:ext3'>>set_ftrace_filter

-traceon/traceoffThesecommandsturntracingonandoffwhenthespecified

functionsarehit.Theparameterdetermineshowmanytimesthe

tracingsystemisturnedonandoff.Ifunspecified,thereis

nolimit.Forexample,todisabletracingwhenaschedulebug

ishitthefirst5times,run:

echo'__schedule_bug:traceoff:5'>set_ftrace_filter

Thesecommandsarecumulativewhetherornottheyareappended

toset_ftrace_filter.Toremoveacommand,prependitby'!'

anddroptheparameter:

echo'!__schedule_bug:traceoff'>set_ftrace_filter

trace_pipe

----------

Thetrace_pipeoutputsthesamecontentasthetracefile,buttheeffectonthetracingisdifferent.Everyreadfromtrace_pipeisconsumed.Thismeansthatsubsequentreadswillbedifferent.Thetraceislive.

#echofunction>current_tracer

#cattrace_pipe>/tmp/trace.out&[1]4153#echo1>tracing_enabled

#usleep1

#echo0>tracing_enabled

#cattrace#tracer:function##TASK-PIDCPU#TIMESTAMPFUNCTION#|||||

#

#cat/tmp/trace.outbash-4043[00]41.267106:finish_task_switch<-schedule

bash-4043[00]41.267106:hrtick_set<-schedule

bash-4043[00]41.267107:hrtick_clear<-hrtick_set

bash-4043[00]41.267108:wait_for_completion<-__stop_machine_runbash-4043[00]41.267108:wait_for_common<-wait_for_completion

bash-4043[00]41.267109:kthread_stop<-stop_machine_runbash-4043[00]41.267109:init_waitqueue_head<-kthread_stop

bash-4043[00]41.267110:wake_up_process<-kthread_stopbash-4043[00]41.267110:try_to_wake_up<-wake_up_process

bash-4043[00]41.267111:select_task_rq_rt<-try_to_wake_up

Note,readingthetrace_pipefilewillblockuntilmoreinputisadded.Bychangingthetracer,trace_pipewillissueanEOF.Weneededtosetthefunctiontracer_before_we"cat"thetrace_pipefile.

traceentries

-------------

Havingtoomuchornotenoughdatacanbetroublesomeindiagnosinganissueinthekernel.Thefilebuffer_size_kbisusedtomodifythesizeoftheinternaltracebuffers.ThenumberlistedisthenumberofentriesthatcanberecordedperCPU.Toknowthe
fullsize,multiplythenumberofpossibleCPUSwiththenumberofentries.

#catbuffer_size_kb1408(unitskilobytes)

Note,tomodifythis,youmusthavetracingcompletelydisabled.Todothat,echo"nop"intothecurrent_tracer.Ifthecurrent_tracerisnotsetto"nop",anEINVALerrorwillbereturned.

#echonop>current_tracer

#echo10000>buffer_size_kb

#catbuffer_size_kb10000(unitskilobytes)

Thenumberofpageswhichwillbeallocatedislimitedtoapercentageofavailablememory.Allocatingtoomuchwillproduceanerror.

#echo1000000000000>buffer_size_kb

-bash:echo:writeerror:Cannotallocatememory#catbuffer_size_kb85

-----------

Moredetailscanbefoundinthesourcecode,inthekernel/trace/*.cfiles.

http://www.ibm.com/developerworks/cn/linux/l-cn-ftrace/

参考一下这个link也不错。
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: