
Linux Process Scheduling: the FIFO and RR Scheduling Policies

2013-12-19 20:48

Author: manuscola.bean@gmail.com

Blog: bean.blog.chinaunix.net

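I recently spent ten-odd days reading the Linux kernel's process scheduling code about twice over, along with a number of articles on the subject. I kept wanting to write a series that dissects process scheduling from end to end, but I never felt up to it, to the point where I did not dare start writing. So I will stop being hard on myself and simply jot down what I have learned.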

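In user space, that is, in application programming, Linux provides a number of APIs and system calls that influence the kernel scheduler or retrieve information from it. For example, you can get or set a process's scheduling policy and priority, and query the length of its CPU time slice.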

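Calling these interfaces from an application is like throwing a stone into a lake. What ripples they cause inside the kernel is something I am excited about and eager to share, but I have not organized that material yet, so this article does not go deep into these system calls or the kernel code behind them.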

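Strictly speaking, priority means different things for real-time processes and for normal processes.

1 Broadly, real-time processes outrank normal ones: while a runnable real-time process exists, normal processes get no chance at the CPU. (As the previous post explained, though, since real-time group scheduling was added to the kernel, normal processes may be allowed a small share of CPU time, depending on configuration.)

2 Among real-time processes, a lower-priority process does not get onto the CPU while a runnable higher-priority one exists. The so-called scheduling policy of a real-time process governs scheduling among processes of the same priority. A FIFO real-time process that holds the CPU keeps running until one of the following happens: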

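a) The FIFO process has a change of heart and gives up the CPU voluntarily by calling sched_yield.

b) A higher-priority process appears and preempts the FIFO process. Some people find this odd: if the FIFO process is hogging the CPU, how can a higher-priority process turn up? Remember that we have multiple cores and CPUs: if a process with higher priority than the FIFO process appears on another CPU, it may be pushed to the CPU the FIFO process is running on. A sleeping higher-priority process can also be woken on the same CPU, for example when its timer expires.

c) The FIFO process is stopped (TASK_STOPPED or TASK_TRACED state) or killed (EXIT_ZOMBIE or EXIT_DEAD state).

d) The FIFO process makes a blocking call and goes to sleep (TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE).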

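If the scheduling policy is round-robin (RR), then in addition to a) through d) above, an RR real-time process that uses up its time slice is moved to the tail of the real-time run queue for its priority, and the scheduler picks again.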

Next we explore how the FIFO and RR policies behave. To keep things easy to follow, I bind all the real-time processes we start to the same core.

#define _GNU_SOURCE     /* needed for cpu_set_t, CPU_ZERO, CPU_SET and sched_setaffinity */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <sched.h>

#define COUNT 300000
#define MILLION 1000000L

/* CPU-bound busy work with no system calls; volatile keeps the loop from being optimized away. */
void test_func(void)
{
    int i;
    volatile unsigned long long result = 0;

    for (i = 0; i < 8000; i++)
    {
        result += 2;
    }
}

int main(int argc, char *argv[])
{
    int i;
    long interval;
    struct timeval tend, tstart;
    struct sched_param param;
    cpu_set_t mask;
    int ret = 0;

    if (argc != 3)
    {
        fprintf(stderr, "usage:./test sched_method sched_priority\n");
        return -1;
    }

    /* Bind to CPU 1 so that every test instance competes for the same core. */
    CPU_ZERO(&mask);
    CPU_SET(1, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
    {
        printf("warning: could not set CPU affinity, continuing...\n");
    }

    int sched_method = atoi(argv[1]);
    int sched_priority = atoi(argv[2]);

    /* Range checks, disabled in this version:
    if(sched_method > 2 || sched_method < 0)
    {
        fprintf(stderr,"sched_method scope [0,2]\n");
        return -2;
    }
    if(sched_priority > 99 || sched_priority < 1)
    {
        fprintf(stderr,"sched_priority scope [1,99]\n");
        return -3;
    }
    if(sched_method == 1 || sched_method == 2)*/
    {
        param.sched_priority = sched_priority;
        ret = sched_setscheduler(getpid(), sched_method, &param);
        if (ret)
        {
            /* %m is a glibc extension that prints strerror(errno). */
            fprintf(stderr, "set scheduler to %d %d failed %m\n", sched_method, sched_priority);
            return -4;
        }
    }

    int scheduler = sched_getscheduler(getpid());
    fprintf(stderr, "the scheduler of PID(%ld) is %d, priority (%d),BEGIN time is :%ld\n",
            (long)getpid(), scheduler, sched_priority, (long)time(NULL));

    /* Sleep so that the other instances can be started before anyone hogs the CPU. */
    sleep(2);

    /* The posted listing never set tstart and tend; record them around the busy loop. */
    gettimeofday(&tstart, NULL);
    for (i = 0; i < COUNT; i++)
    {
        test_func();
    }
    gettimeofday(&tend, NULL);

    /* Elapsed wall-clock time of the busy loop, in microseconds. */
    interval = MILLION * (tend.tv_sec - tstart.tv_sec)
               + (tend.tv_usec - tstart.tv_usec);

    fprintf(stderr, " PID = %d\t priority: %d\tEND TIME is %ld\telapsed %ld us\n",
            (int)getpid(), sched_priority, (long)time(NULL), interval);
    return 0;
}

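A few points about the program above:

1 To reduce complexity, the processes are bound to the same core. The machine I ran the experiment on has four cores (as cat /proc/cpuinfo shows).

2 The sleep(2) gives the other processes a chance to be scheduled; without it there is no way to create a scenario in which several real-time processes of different priorities are runnable at the same time. After the sleep there are no more blocking system calls, so the highest-priority process holds the CPU (FIFO), or processes of equal priority take turns (RR).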

struct sched_param {
    /* ... */
    int sched_priority;
    /* ... */
};

int sched_setscheduler(pid_t pid, int policy, const struct sched_param *sp);

The second argument of sched_setscheduler is the scheduling policy:

#define SCHED_OTHER 0
#define SCHED_FIFO  1
#define SCHED_RR    2
#ifdef __USE_GNU
# define SCHED_BATCH 3
#endif

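SCHED_OTHER denotes a normal process; for normal processes, sp->sched_priority in the third argument must be 0.

SCHED_FIFO and SCHED_RR are the real-time scheduling policies; for them sched_priority ranges over [1,99].

If the priority passed to sched_setscheduler does not match the scheduling policy, the call fails.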

The kernel has this comment:

/*
 * Valid priorities for SCHED_FIFO and SCHED_RR are
 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL,
 * SCHED_BATCH and SCHED_IDLE is 0.
 */

Linux provides further system calls that return the valid priority range for each policy:

#include <sched.h>

int sched_get_priority_min (int policy);

int sched_get_priority_max (int policy);

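One more thing to point out: priority values mean different things at the application level and inside the kernel.

Real-time processes first. Their priority can be set with sched_setscheduler, or with sched_setparam, which changes only the priority value:

int sched_setparam(pid_t pid, const struct sched_param *sp);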

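At the user or application level, 1 is the lowest priority and 99 the highest. Inside the kernel, [0,99] is the priority range of real-time processes, with 0 the highest and 99 the lowest, while [100,139] is where normal processes live. A user-level real-time priority p corresponds to kernel priority 99 - p, which is why the traces later in this article show user priority 70 as pri=29. The application level is simple and direct: it just compares numbers, and the larger number means the higher priority. The priority shown by ps follows the same convention. Interestingly, a real-time process with the highest user-level priority, 99, shows up in the ps priority column as 139.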

Here is the output of ps -C test -o pid,pri,cmd,time,psr:

 PID PRI CMD             TIME PSR
6303 139 ./test 1 99 00:00:04   1

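Although this article is mainly about real-time processes, a brief aside is in order. For normal processes the priority is adjusted with the nice system call. From the kernel's point of view [100,139] is the priority range of normal processes: 100 is the highest, 139 the lowest, and 120 the default. Priority plays a different role for normal processes than for real-time ones: it determines the share of CPU time a process receives. Professional Linux Kernel Architecture (深入Linux内核架构) notes that the higher the normal priority (100 highest, 139 lowest), the more CPU time the process enjoys, and that of two adjacent levels the higher one gets roughly 10% more CPU than the lower one. For example, a process with kernel priority 120 gets about 10% more CPU than one with kernel priority 121.

The kernel has an array, prio_to_weight. prio_to_weight[20] is the weight for the default priority 120, namely 1024; prio_to_weight[21] is the weight for nice value 1, priority 121, namely 820. This brings us to how CFS works.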

static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

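Suppose there is one computer and ten people who want to use it. How do we keep things fair?

1 Agree on a time slice: each person plays for one hour, and afterwards it goes into the ledger (Zhang XX: 1 hour). Whoever has the least time on the ledger plays next.

2 Introduce priorities. Li Si has an urgent need and should get more computer time. How? He plays for one hour but is billed half an hour. All else being equal, Li Si is then picked more often than the others, and that is what priority amounts to.

3 Wang Wu also has an urgent need, but on inspection it is less urgent than Li Si's. Fine: he plays one hour and is billed 45 minutes.

4 Things change: word gets out that there is a computer here and ten more people show up. With a fixed one-hour slice, the fellow at the end of the line would have started cursing long ago. What to do? Make the time slice dynamic and derive it from the number of people. The more people there are, the less time each one plays per turn, so that nobody waits so long that he loses patience and starts cursing.

This ledger is what prio_to_weight is for. In short, prio_to_weight[20] is the baseline: play one hour, get billed one hour. The entries before index 20 are the privileged ones, who play one hour and are billed, say, 20 minutes; the entries after index 20 are the unlucky ones, who play one hour and are billed, say, 1.5 hours. The nice thing about CFS is that everyone gets a turn.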

After that detour into priorities, back to the main topic. I compiled the C program above into an executable named test and wrote a script, comp.sh.

[root@localhost sched]# cat comp.sh
#!/bin/sh
./test $1 99 &
usleep 1000;
./test $1 70 &
usleep 1000;
./test $1 70 &
usleep 1000;
./test $1 70 &
usleep 1000;
./test $1 50 &
usleep 1000;
./test $1 30 &
usleep 1000;
./test $1 10 &

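Because each test process sleeps for 2 seconds, comp.sh has time to start the other test processes. As you can see, there is one real-time process at priority 99 (the highest), three at priority 70, and one each at priorities 50, 30 and 10.

Under FIFO, once the sleep is over the highest-priority process runs and lower-priority ones get no chance; among processes of equal priority, the one that runs first must finish before the later ones get a turn.

Under RR, higher priority still runs first, while processes of equal priority take turns: you play, then I play, then you play again, each turn lasting one time slice. In the kernel source I read, the default RR time slice is 100 ms: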

#define DEF_TIMESLICE (100 * HZ / 1000)

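Now let us verify this. I wrote two monitoring scripts to observe how the real-time processes are scheduled.

The first script is simple: it watches the CPU time consumed by each process, and ps is all it needs: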

[root@localhost sched]# cat getpsinfo.sh
#!/bin/bash
for((i = 0; i < 40; i++))
do
    ps -C test -o pid,pri,cmd,time,psr >>psinfo.log 2>&1
    sleep 2;
done

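The second script is more involved. It is a SystemTap script that watches context switches involving processes named test (who replaced test, or whom test replaced) and also records when a test process exits:

[root@localhost sched]# cat cswmon_spec.stp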

global time_offset

probe begin { time_offset = gettimeofday_us() }

probe scheduler.ctxswitch {
    if (next_task_name == "test" || prev_task_name == "test")
    {
        t = gettimeofday_us()
        printf(" time_off (%8d ) %20s(%6d)(pri=%4d)(state=%d)-> %20s(%6d)(pri=%4d)(state=%d)\n",
               t-time_offset,
               prev_task_name,
               prev_pid,
               prev_priority,
               (prevtsk_state),
               next_task_name,
               next_pid,
               next_priority,
               (nexttsk_state))
    }
}

probe scheduler.process_exit
{
    if (execname() == "test")
        printf("task :%s PID(%d) PRI(%d) EXIT\n", execname(), pid, priority);
}

probe timer.s($1) {
    printf("--------------------------------------------------------------\n")
    exit();
}

A) Output under the FIFO policy:

Terminal 1:

stap ./cswmon_spec.stp 70

Terminal 2:

./getpsinfo.sh

Terminal 3:

./comp.sh 1

The output is as follows:

[Screenshot: ps output of the FIFO run]

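Only after the priority-99 process finishes does priority 70 get its turn; and although there are three priority-70 processes, the second one cannot run until the one that started first has finished. This output does not display well as text, so I took a screenshot, and the screenshot cannot capture all of the output either, which is rather frustrating. For those who want the full results, they are attached at the end.

Now the output of the second script: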

time_off ( 689546 ) test( 6305)(pri= 120)(state=0)-> migration/2( 11)(pri= 0)(state=0)
time_off ( 689977 ) stap( 5895)(pri= 120)(state=0)-> test( 6305)(pri= 120)(state=0)
time_off ( 690067 ) test( 6305)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 697899 ) test( 6303)(pri= 120)(state=0)-> migration/2( 11)(pri= 0)(state=0)
time_off ( 698042 ) test( 6307)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
time_off ( 699114 ) stap( 5895)(pri= 120)(state=0)-> test( 6303)(pri= 120)(state=0)
time_off ( 699307 ) test( 6303)(pri= 0)(state=1)-> test( 6307)(pri= 120)(state=0)
time_off ( 699371 ) test( 6307)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 699392 ) test( 6309)(pri= 120)(state=0)-> migration/3( 15)(pri= 0)(state=0)
time_off ( 699966 ) events/1( 20)(pri= 120)(state=1)-> test( 6309)(pri= 120)(state=0)
time_off ( 700034 ) test( 6309)(pri= 29)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 707379 ) test( 6311)(pri= 120)(state=0)-> migration/3( 15)(pri= 0)(state=0)
time_off ( 707587 ) test( 6313)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
time_off ( 712021 ) stap( 5895)(pri= 120)(state=0)-> test( 6311)(pri= 120)(state=0)
time_off ( 712145 ) test( 6311)(pri= 49)(state=1)-> test( 6313)(pri= 120)(state=0)
time_off ( 712252 ) test( 6313)(pri= 69)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 727057 ) test( 6315)(pri= 120)(state=0)-> migration/0( 3)(pri= 0)(state=0)
time_off ( 727952 ) stap( 5895)(pri= 120)(state=0)-> test( 6315)(pri= 120)(state=0)
time_off ( 728047 ) test( 6315)(pri= 89)(state=1)-> stap( 5895)(pri= 120)(state=0)
time_off ( 2690181 ) stap( 5895)(pri= 120)(state=0)-> test( 6305)(pri= 29)(state=0)
time_off ( 2699316 ) test( 6305)(pri= 29)(state=0)-> test( 6303)(pri= 0)(state=0)
task :test PID(6303) PRI(0) EXIT
time_off (13057854 ) test( 6303)(pri= 0)(state=64)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (13057864 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
time_off (15333340 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (15333354 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
time_off (18743409 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (18743422 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
time_off (22154757 ) test( 6305)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (22154771 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6305)(pri= 29)(state=0)
task :test PID(6305) PRI(29) EXIT
time_off (22466855 ) test( 6305)(pri= 29)(state=64)-> test( 6307)(pri= 29)(state=0)
time_off (25563548 ) test( 6307)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (25563566 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6307)(pri= 29)(state=0)
time_off (28973602 ) test( 6307)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (28973616 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6307)(pri= 29)(state=0)
task :test PID(6307) PRI(29) EXIT
time_off (31846121 ) test( 6307)(pri= 29)(state=64)-> test( 6309)(pri= 29)(state=0)
time_off (32383671 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (32383683 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
time_off (35793735 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (35793747 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
time_off (39203797 ) test( 6309)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (39203809 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6309)(pri= 29)(state=0)
task :test PID(6309) PRI(29) EXIT
time_off (41200440 ) test( 6309)(pri= 29)(state=64)-> test( 6311)(pri= 49)(state=0)
time_off (42613866 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (42613898 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
time_off (46024070 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (46024082 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
time_off (49434004 ) test( 6311)(pri= 49)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (49434017 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6311)(pri= 49)(state=0)
task :test PID(6311) PRI(49) EXIT

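You can clearly see that among the processes at priority 70 (29 inside the kernel), 6307 never gets to run before 6305 exits; likewise, 6309 never gets to run before 6307 exits. That is FIFO.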

B) The RR case

Terminal 1:

stap ./cswmon_spec.stp 70

Terminal 2:

./getpsinfo.sh

Terminal 3:

./comp.sh 2

[Screenshot: ps output of the RR run]

The three processes at real-time priority 70 advance neck and neck. Now the output of the second script:

time_off ( 4188015 ) test( 6428)(pri= 0)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off ( 4188025 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6428)(pri= 0)(state=0)
time_off ( 7612014 ) test( 6428)(pri= 0)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off ( 7612024 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6428)(pri= 0)(state=0)
task :test PID(6428) PRI(0) EXIT
time_off (10679062 ) test( 6428)(pri= 0)(state=64)-> test( 6430)(pri= 29)(state=0)
time_off (10964413 ) test( 6430)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (10964422 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6430)(pri= 29)(state=0)
time_off (11709024 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (12736030 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (13779022 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (13879021 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (13984075 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (14084020 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (14184023 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (14284024 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (14374486 ) test( 6434)(pri= 29)(state=0)-> watchdog/1( 10)(pri= 0)(state=0)
time_off (14374502 ) watchdog/1( 10)(pri= 0)(state=1)-> test( 6434)(pri= 29)(state=0)
time_off (14384097 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (14484066 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (14584023 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (14684020 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (14786032 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (14886020 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (14986026 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (15089023 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (15192030 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (15292026 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (15396085 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (15496022 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)
time_off (15596027 ) test( 6434)(pri= 29)(state=0)-> test( 6430)(pri= 29)(state=0)
time_off (15696153 ) test( 6430)(pri= 29)(state=0)-> test( 6432)(pri= 29)(state=0)
time_off (15796022 ) test( 6432)(pri= 29)(state=0)-> test( 6434)(pri= 29)(state=0)

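After process 6428 (user-level real-time priority 99, kernel priority 0) exits, the three processes with user-level real-time priority 70, namely 6430, 6432 and 6434, take the stage one after another. How long does each one "perform"? Look at the time difference between adjacent records (time_off is in microseconds): apart from the first few switches, it is almost always about 100 ms. That is the time slice.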

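Afterword: if you lift the restriction of binding to one CPU and increase the number of real-time processes, the processes get pulled and pushed between CPUs, which is a more complex situation. I hope this post serves as a starting point and that someone will try simulating that case.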

The attachment contains the test code and output: study_sched.txt

References

1 Professional Linux Kernel Architecture (深入Linux内核架构)

2 Linux System Programming

3 SystemTap examples
