
Analysis of the kernel's string functions written in assembly

2012-02-05 13:36
***************************************************************
Working through the string functions written in assembly:
***************************************************************
---------------------------------------------------------------
_I386_STRING_H_ macro
---------------------------------------------------------------
include/asm-i386/string.h

#ifndef _I386_STRING_H_
#define _I386_STRING_H_
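Once this header of assembly string functions has been included, this macro is defined to record that fact; it is the usual include guard.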
---------------------------------------------------------------
__KERNEL__ macro
---------------------------------------------------------------
include/asm-i386/string.h

#ifdef __KERNEL__
#include <linux/config.h>
Note:
config.h is included only when the __KERNEL__ macro is defined.
/*
* On a 486 or Pentium, we are better off not using the
* byte string operations. But on a 386 or a PPro the
* byte string ops are faster than doing it by hand
* (MUCH faster on a Pentium).
*/
The following comment is important and worth reading:
/*
* This string-include defines all string functions as inline
* functions. Use gcc. It also assumes ds=es=data space, this should be
* normal. Most of the string-functions are rather heavily hand-optimized,
* see especially strsep,strstr,str[c]spn. They should work, but are not
* very easy to understand. Everything is done entirely within the register
* set, making the functions fast and clean. String instructions have been
* used through-out, making for "slightly" unclear code :-)
*
* NO Copyright (C) 1991, 1992 Linus Torvalds,
* consider these trivial functions to be PD.
*/

/* AK: in fact I bet it would be better to move this stuff all out of line. */
---------------------------------------------------------------
__HAVE_ARCH_STRCPY strcpy()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRCPY
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__(
"1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
:"0" (src),"1" (dest)
: "memory");
return dest;
}

Analysis:
1. The instructions rewritten in a more explicit form:
1: ---> 1:
lodsb ---> mov al,ds:[si]
inc si
stosb ---> mov es:[di],al
inc di
testb al,al ---> test al,al
jne 1 ---> jne 1
The loop is clearly terminated by the 0 byte: once the terminating 0 has been loaded and stored, testb sets ZF and the loop ends.

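2. Operand constraints:
S: si/esi
D: di/edi
a: ax/eax
&: By default gcc may place an input operand and an output operand in the same register, because it assumes every input is consumed before any output is produced. Putting "&" (earlyclobber) in front of an output operand guarantees that the output will not overlap an input: gcc gives it a register that no input is using, unless the two are tied explicitly (with a digit 0-9, see below).

0-9: A matching constraint. It ties an input operand to the output operand with that number, so both occupy the same location (register). Digits may appear only in the input list; digit I means "the same location as output operand I".

int d0, d1, d2;
"=&S" (d0), "=&D" (d1), "=&a" (d2)
"0" (src),"1" (dest)
Code analysis:
src and dest are the function's arguments. "0" ties src to output operand 0 (d0), so src is loaded into si/esi before the asm starts; "1" ties dest to output operand 1 (d1), so dest is loaded into di/edi. While the loop runs, lodsb and stosb advance esi and edi. When the asm finishes, the advanced register values are written to the dummy locals d0 and d1 and then discarded. This is how gcc is told that esi, edi and eax are modified, since a register used for an input operand may not also be named in the clobber list. The C variables src and dest themselves are never changed, which is why "return dest" still returns the original destination pointer.

"memory": This is the clobber list. It tells the compiler that the asm reads or writes memory not named in the operands, so values held in memory must not be kept cached in registers across the asm.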

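3. Instruction analysis:
lodsb: == mov al,ds:[si]
inc si / dec si (inc when DF == 0, dec when DF == 1)
stosb: == mov es:[di],al
inc di / dec di (same rule)
testb: == test oprd1,oprd2
Computes oprd1 & oprd2, discards the result, and sets ZF, PF and SF accordingly.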
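
The copy loop can also be restated in portable C. The following is only a sketch of what the assembly does, not the kernel's own source; the name model_strcpy and the local variable names are made up for illustration.

```c
/* model_strcpy is a hypothetical name: a portable C sketch of the
 * lodsb / stosb / testb / jne loop, not the kernel source. */
char *model_strcpy(char *dest, const char *src)
{
    const char *esi = src;   /* role of ESI, loaded through "0" (src)  */
    char *edi = dest;        /* role of EDI, loaded through "1" (dest) */
    char al;

    do {
        al = *esi++;         /* lodsb: load one byte, advance ESI    */
        *edi++ = al;         /* stosb: store the byte, advance EDI   */
    } while (al != 0);       /* testb %al,%al ; jne 1b               */

    return dest;             /* dest was never modified by the loop  */
}
```

The terminating 0 is stored before it is tested, which is why the destination always ends up '\0'-terminated.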

---------------------------------------------------------------
__HAVE_ARCH_STRNCPY strncpy()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRNCPY
static inline char * strncpy(char * dest,const char *src,size_t count)
{
int d0, d1, d2, d3;
__asm__ __volatile__(
"1:\tdecl %2\n\t"
"js 2f\n\t"
"lodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n\t"
"rep\n\t"
"stosb\n"
"2:"
: "=&S" (d0), "=&D" (d1), "=&c" (d2), "=&a" (d3)
:"0" (src),"1" (dest),"2" (count)
: "memory");
return dest;
}

Instructions rewritten:
1: decl ecx ===> 1: dec cx
js 2 ===> js 2
lodsb ===> mov al,ds:[si]
inc si / dec si
stosb ===> mov es:[di],al
inc di / dec di
testb al,al ===> test al,al
jne 1 ===> jne 1

rep ===> rep
stosb ===> mov es:[di],al
inc di / dec di
2: ===> 2:

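Analysis:
The behavior is examined case by case.
Suppose memory holds: abcde\0,
1) Copy 3 characters:
(1) Initial value CX == 3
Each pass decrements CX, copies one character, then tests whether that character was 0
3-->2: copy a
2-->1: copy b
1-->0: copy c
0-->-1 js 2
(2) So 3 characters are copied and no '\0' is appended.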

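2) Copy 5 characters:
(1) Initial value CX == 5
Each pass decrements CX, copies one character, then tests whether that character was 0
5-->4: copy a
4-->3: copy b
3-->2: copy c
2-->1: copy d
1-->0: copy e
0-->-1 js 2
(2) So 5 characters are copied: the 5 characters, with no terminating '\0'.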

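3) Copy 6 characters:
(1) Initial value CX == 6
Each pass decrements CX, copies one character, then tests whether that character was 0
6-->5: copy a
5-->4: copy b
4-->3: copy c
3-->2: copy d
2-->1: copy e
1-->0: copy \0
test al,al ===> al == \0, so ZF == 1.
jne 1 ===> the jump to 1 is not taken

Execution continues; now CX == 0 and al == \0
rep: checks whether CX is 0; since cx == 0 the repetition ends at once
(2) So 6 characters are copied: 5 characters plus one '\0'.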

4) Copy 10 characters:
(1) Initial value CX == 10
Each pass decrements CX, copies one character, then tests whether that character was 0
10-->9: copy a
9-->8: copy b
8-->7: copy c
7-->6: copy d
6-->5: copy e
5-->4: copy \0
test al,al ===> al == \0, so ZF == 1.
jne 1 ===> the jump to 1 is not taken

Execution continues; now CX == 4 and al == \0
rep : CX==4, CX!=0, (CX=CX-1)==3, continue
copy al == \0
Repeat:
rep : CX==3, CX!=0, (CX=CX-1)==2, continue
copy al == \0
Repeat:
rep : CX==2, CX!=0, (CX=CX-1)==1, continue
copy al == \0
Repeat:
rep : CX==1, CX!=0, (CX=CX-1)==0, continue
copy al == \0
Repeat: rep: cx==0, so the repetition ends
(2) So for a count of 10, 6 characters are copied first (5 characters plus one '\0'), then 4 more '\0' bytes are filled in.

5) Copy 0 characters:
(1) Initial value CX == 0
0-->-1 js 2
(2) So 0 characters are copied.

6) Copy -1 characters:
(1) Initial value CX == -1
-1-->-2 js 2
(2) So 0 characters are copied.
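Note:
In static inline char * strncpy(char * dest,const char *src,size_t count), count is loaded into cx/ecx. ecx is 32 bits wide and the loop tests its sign bit with js, so the count is effectively treated as a signed number: roughly 2G bytes at most, and any count above 0x80000000 copies nothing.
At first I wondered why, given that nothing is copied whenever CX <= 0, an unsigned count was not used instead, which would extend the range to 4G. The next function suggests an answer: when two strings are concatenated, ECX is again the counter, and since ECX is 32 bits with a maximum of 4G-1, each of the two strings gets half of that range, 2G-1.
About the rep prefix:
It repeats the string instruction that follows. Before each repetition it checks whether CX is 0; if so the loop ends, otherwise the string instruction executes and CX is decremented by 1.
It resembles the loop instruction, except that loop decrements CX first and only then tests it for 0.
Note that the decrement performed during repetition does not affect any flags.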
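
The cases traced above can be checked against a small C model of the loop. This is a sketch under assumptions: model_strncpy is a hypothetical name, and a 32-bit unsigned variable stands in for ecx so that the js test on the sign bit can be written explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* model_strncpy is a hypothetical name: a portable C sketch of the
 * decl / js / lodsb / stosb / testb / jne / rep stosb sequence. */
char *model_strncpy(char *dest, const char *src, size_t count)
{
    const char *esi = src;
    char *edi = dest;
    uint32_t ecx = (uint32_t)count;  /* stands in for the 32-bit ECX */
    char al;

    for (;;) {
        ecx--;                       /* 1: decl %2                   */
        if (ecx & 0x80000000u)       /* js 2f: sign bit is set       */
            return dest;
        al = *esi++;                 /* lodsb                        */
        *edi++ = al;                 /* stosb                        */
        if (al == 0)                 /* testb ; jne 1b falls through */
            break;
    }
    while (ecx != 0) {               /* rep stosb with al == 0       */
        *edi++ = 0;
        ecx--;
    }
    return dest;
}
```

With the source "abcde", a count of 3 stores three bytes and no terminator, while a count of 10 stores "abcde", its '\0', and four padding zeros.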
---------------------------------------------------------------
strcat()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRCAT
static inline char * strcat(char * dest,const char * src)
{
int d0, d1, d2, d3;
__asm__ __volatile__(
"repne\n\t"
"scasb\n\t"
"decl %1\n"
"1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
:"memory");
return dest;
}
Instructions rewritten:
repne ===> while(ECX != 0 && ZF != 1)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}

decl %1 ===> dec edi
1: ===> 1:
lodsb ===> mov al, ds:[esi]
inc esi
stosb ===> mov es:[edi], al
inc edi
testb %%al,%%al ===> test al, al
jne 1 ===> jne 1

Initial operand values:
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
src ==> si/esi here: esi
dest ==> di/edi here: edi
0 ==> ax/eax here: eax (only al is compared)
0xffffffffu ===> ecx here: ecx
So esi and edi point at the start of the two strings, eax == 0 and ecx == 0xffffffffu.

General case:
Initial values:
esi--->'abc\0' (src)
edi--->'123\0' (dest)
al == 0
ecx == 0xffffffffu
while(ECX != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}
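The scan runs through the string at edi until it reaches the terminating '\0'. scasb still advances past that byte, so when the loop ends edi points to the byte after the '\0' and ECX has been decremented once per byte scanned. Here edi = edi + 4 and ECX = ECX - 4.

Note: the loop ends either because a '\0' is found in the string at es:[edi], or because the string is at least 0xffffffff (4G-1) bytes long (not counting the final '\0'), so that ECX reaches 0.

dec edi
edi = edi - 1; edi now points at the terminating '\0' of the string at es:[edi].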

Register values at this point:
esi--->'abc\0' (src)
edi--->the terminating '\0' of '123\0' (dest)
al == 0
ecx == 0xfffffffbu

1:
mov al, ds:[esi]
inc esi
mov es:[edi], al
inc edi
test al, al
jne 1
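The string at ds:[esi] is copied to the end of the string at es:[edi], starting at that string's '\0', which is overwritten.

esi--->the '?' in 'abc\0?' (src)
edi--->the final '?' in '123abc\0?' (dest)
al == 0; note that this 0 is the terminator fetched from the string at esi, not the initial 0

Function: strcat(char * dest,const char * src) copies the string at src onto the end of the string at dest, overwriting dest's '\0'. Once dest and src form one string, src's '\0' is copied as well to terminate the result.

Algorithm:
1. Scan the string at dest to find its '\0'.
2. Copy the bytes of the string at src one by one, starting at dest's '\0', up to and including src's final '\0'; then stop.
So the function requires both src and dest to be '\0'-terminated.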
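
The two steps of the algorithm can be sketched in portable C as follows. model_strcat is a hypothetical name, and the sketch deliberately leaves out the ECX limit of 0xffffffff on the scan, so it only models strings that are properly terminated.

```c
/* model_strcat is a hypothetical name: a portable C sketch of
 * repne scasb / decl / lodsb-stosb loop, ignoring the ECX limit. */
char *model_strcat(char *dest, const char *src)
{
    char *edi = dest;
    const char *esi = src;
    char al;

    while (*edi++ != 0)      /* repne scasb with al == 0: stops one  */
        ;                    /* byte past the terminating '\0'       */
    edi--;                   /* decl %1: step back onto the '\0'     */

    do {
        al = *esi++;         /* lodsb                                */
        *edi++ = al;         /* stosb, first store overwrites '\0'   */
    } while (al != 0);       /* testb %al,%al ; jne 1b               */

    return dest;
}
```

The caller has to provide room in dest for both strings plus one terminator; the function itself performs no bounds check.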

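Special case 1:
Initial values:
esi--->'abc\0' (src)
edi--->'123456789... ...YX' a string of at least 0xffffffff bytes (dest)
Assume edi points to the start of the es segment, which has base 0.
That is, edi[0]=='1' and edi[0xffffffff]=='X'. edi is only 32 bits wide, so it covers 0x0--->0xffffffff, 4G characters in total. Even if the string had more than 4G characters, edi could not address them, so the string at edi ends at edi[0xffffffff]=='X'. With any more characters, a further edi++ wraps edi back to 0.
The same reasoning applies to esi.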
al == 0
ecx == 0xffffffffu
while(ECX != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}
The loop body executes 0xffffffff times.
Because the string at edi has at least 0xffffffff bytes, the search for its '\0' drives ECX to 0 and the loop ends with edi pointing at the byte at 0xffffffff (segment limit violations are ignored here).
On exit ECX == 0 and edi == 0xffffffff.

dec edi
edi = edi - 1; edi == 0xffffffff-1, i.e. edi[0xffffffff-1]=='Y'.

Register values at this point:
esi--->'abc\0' (src)
edi--->'123456......YX', edi==0xffffffff-1, pointing at the byte edi[0xffffffff-1]=='Y' (dest)
al == 0
ecx == 0x00000000u

1:
mov al, ds:[esi]
inc esi
mov es:[edi], al
inc edi
test al, al
jne 1
esi[0]=='a' of the string 'abc\0' at ds:[esi] is copied to es:[edi]==es:edi[0xffffffff-1]=='Y', so the byte 'Y' is overwritten with 'a'. That is: esi[0]=='a'--->edi[0xffffffff-1]=='Y'
edi--->'123456......aX'.
Now esi++, esi[1]=='b'; edi++, edi[0xffffffff]=='X'.

Next esi[1]=='b' is copied from ds:[esi] to edi[0xffffffff]=='X'
edi--->'123.....ab', edi++, edi==0x00000000, pointing at the byte edi[0]=='1'
esi++, esi[2]=='c'. esi--->the 'c' in 'abc\0?' (src)

Next esi[2]=='c' is copied to edi[0x00000000]=='1'.
esi++, esi[3]=='\0', esi--->the '\0' in 'abc\0?' (src)
edi--->'c23.....ab', edi++, edi==0x00000001, pointing at the byte edi[0x00000001]=='2'

Next esi[3]=='\0' is copied to edi[0x00000001]=='2'.
esi++, esi[4]=='?', esi--->the '?' in 'abc\0?' (src)
edi--->'c\03.....ab', edi++, edi==0x00000002, pointing at the byte edi[0x00000002]=='3'.

So the resulting string is "c\0".

Likewise, the same happens when src holds 4G characters, and when src and dest both hold 4G characters.
As long as src and dest together hold no more than 4G-1 characters, leaving one byte for the '\0', everything is fine.

When one or both of src and dest are empty, things are simple:
dest empty, src not empty: the string at src is copied into dest together with its '\0'.
src empty, dest not empty: dest is left as it is; only src's '\0' is copied over dest's final '\0'.
src empty and dest empty: only src's '\0' is copied over dest's '\0'.

Reference:
S:si/esi
D:di/edi
a:ax/eax
c:cx/ecx
&: earlyclobber; the output operand gets a register that no input operand is using, unless it is explicitly tied to an input with a digit 0-9.

0-9: matching constraint; it may appear only in the input list and places that input in the same location (register) as output operand number I.
---------------------------------------------------------------
strncat()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRNCAT
static inline char * strncat(char * dest,const char * src,size_t count)
{
int d0, d1, d2, d3;
__asm__ __volatile__(
"repne\n\t"
"scasb\n\t"
"decl %1\n\t"
"movl %8,%3\n"
"1:\tdecl %3\n\t"
"js 2f\n\t"
"lodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n"
"2:\txorl %2,%2\n\t"
"stosb"
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
: "memory");
return dest;
}
Instructions rewritten:
repne ===> while(ecx != 0 && ZF != 1)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
decl %1 ===> decl edi
movl %8,%3 ===> movl count,ecx
1: ===> 1:
decl %3 ===> decl ecx
js 2 ===> js 2
lodsb ===> mov al,ds:[esi]
inc esi
stosb ===> mov es:[edi],al
inc edi
testb %%al,%%al ===> test al,al
jne 1 ===> jne 1
2: ===> 2:
xorl %2,%2 ===> xor eax,eax
stosb ===> mov es:[edi],al
inc edi
Initial operand values:
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
esi: esi = src
edi: edi = dest
eax: eax = 0
ecx: ecx = 0xffffffff
"g": lets the compiler decide how to supply the operand (general register, memory or immediate).

Code analysis:
while(ecx != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
decl edi
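This searches the string at es:[edi] for its '\0', then steps edi back so that it points at that '\0'.
If the string fits within 4G-1 bytes, the scan ends normally at the '\0'. If the string is exactly 4G bytes long, the loop ends with ecx==0 and edi, after stepping back, points at edi[0xffffffff-1]. A string longer than 4G is impossible.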

movl count,ecx
1:
decl ecx
js 2
mov al,ds:[esi]
inc esi
mov es:[edi],al
inc edi
test al,al
jne 1
2:
xor eax,eax
mov es:[edi],al
inc edi

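1: starts copying the string at esi to edi.
2: once copying is finished, appends a '\0' at the end.
Case by case:
1) count is greater than the number of characters in the string at ds:[esi]. The string at esi is copied together with its '\0', which ends loop 1:. At 2: one more '\0' is stored after that '\0', edi++, and the function ends.

2) count is less than the number of characters in the string at ds:[esi]. Only count characters are copied; ecx then drops to -1 and js 2 leaves loop 1:. At 2: a '\0' is stored after them, edi++, and the function ends.

3) count equals the number of characters in the string at ds:[esi]. After count characters are copied ecx is 0; the next ecx-- at the top of the loop makes ecx == -1 and js 2 leaves loop 1:. At 2: a '\0' is stored after them, edi++, and the function ends.

4) count is negative. The first ecx-- leaves ecx negative and js 2 leaves loop 1: at once. At 2: a '\0' is stored, which simply rewrites the '\0' that already terminates the string at edi, edi++, and the function ends.

Although ecx could count 4G bytes, count is treated as a signed number, so at most 2G-2 bytes (leaving out the '\0') are copied. This evidently assumes that the string at es:[edi] is at most 2G long: the author cannot know how long that string is, and although it will almost never approach 2G, the code handles the most general case.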
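
The four cases can be reproduced with a C model of the instruction sequence. This is a sketch, not the kernel source: model_strncat is a hypothetical name, the ECX limit on the initial scan is ignored, and a 32-bit unsigned variable stands in for ecx.

```c
#include <stddef.h>
#include <stdint.h>

/* model_strncat is a hypothetical name: a portable C sketch of the
 * i386 strncat sequence, ignoring the ECX limit on the first scan. */
char *model_strncat(char *dest, const char *src, size_t count)
{
    char *edi = dest;
    const char *esi = src;
    uint32_t ecx;
    char al;

    while (*edi++ != 0)          /* repne scasb with al == 0        */
        ;
    edi--;                       /* decl %1: back onto the '\0'     */

    ecx = (uint32_t)count;       /* movl %8,%3                      */
    for (;;) {
        ecx--;                   /* 1: decl %3                      */
        if (ecx & 0x80000000u)   /* js 2f                           */
            break;
        al = *esi++;             /* lodsb                           */
        *edi++ = al;             /* stosb                           */
        if (al == 0)             /* testb ; jne 1b falls through    */
            break;
    }
    *edi++ = 0;                  /* 2: xorl %2,%2 ; stosb           */
    return dest;
}
```

Note the consequence of case 1): when src is shorter than count, a second '\0' is stored one byte past the terminator, so the destination buffer needs one byte more than the result string occupies.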
---------------------------------------------------------------
__HAVE_ARCH_STRCMP strcmp()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRCMP
static inline int strcmp(const char * cs,const char * ct)
{
int d0, d1;
register int __res;
__asm__ __volatile__(
"1:\tlodsb\n\t"
"scasb\n\t"
"jne 2f\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n\t"
"xorl %%eax,%%eax\n\t"
"jmp 3f\n"
"2:\tsbbl %%eax,%%eax\n\t"
"orb $1,%%al\n"
"3:"
:"=a" (__res), "=&S" (d0), "=&D" (d1)
:"1" (cs),"2" (ct)
:"memory");
return __res;
}

Initial values:
ax/eax:register int __res;
si/esi:const char* cs;
di/edi:const char* ct;
ZF == 0

Instructions rewritten:
1: lodsb ===> 1: mov al,ds:[esi]
inc esi
scasb ===> if((al-es:[edi])==0)
ZF = 1;
edi++;
jne 2 ===> jne 2;
testb %%al,%%al ===> testb al,al
jne 1 ===> jne 1
xorl %%eax,%%eax ===> xorl eax,eax
jmp 3 ===> jmp 3
2: sbbl %%eax,%%eax ===> 2: sbbl eax,eax
orb $1,%%al ===> orb al ,1
3: ===> 3:

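1) Code walk-through:
The function compares the two strings at ds:[esi] and es:[edi], both of which should be '\0'-terminated. The return value is placed in eax. Each character of ds:[esi] is loaded into al and compared with the corresponding character of es:[edi]; if they are equal ZF is set to 1, and al is then tested for '\0'. If it is not '\0' the next character is compared; if it is '\0', eax is cleared to 0 and the comparison ends with eax as the return value.

2) Cases:
1. The strings at ds:[esi] and es:[edi] are equal: as above, eax returns 0
2. The strings at ds:[esi] and es:[edi] are not equal:
(1) The character at ds:[esi] has a smaller ASCII value than the one at es:[edi]
ds:[esi]=="abc\0"
es:[edi]=="xyz\0"
if((al-es:[edi])==0) ===>if( ('a'-'x')==0 )
ZF = 1; condition false; the subtraction borrows, CF == 1
edi++; edi++; edi points to 'y'
jne 2 ; jne 2
2: sbbl eax,eax eax = eax-eax-CF=-1=0xffffffff
orb al ,1 al = 0xff

Conclusion:
First differing character of cs < first differing character of ct (by ASCII value). Return value: eax==0xffffffff==-1

(2) The character at ds:[esi] has a larger ASCII value than the one at es:[edi]
ds:[esi]=="xyz\0"
es:[edi]=="abc\0"
if((al-es:[edi])==0) ===>if( ('x'-'a')==0 )
ZF = 1; condition false; no borrow, CF == 0
edi++; edi++; edi points to 'b'
jne 2 ; jne 2
2: sbbl eax,eax eax = eax-eax-CF=0
orb al ,1 al = 0|1=1=0x00000001
Output: eax==0x00000001
Conclusion:
First differing character of cs > first differing character of ct (by ASCII value). Return value: eax==0x00000001==1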

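(3) One string is a prefix of the other:
ds:[esi]=="abc\0"
es:[edi]=="abc123\0"
The loop ends at the comparison '\0'-'1' and -1 is returned.
In the opposite situation:
ds:[esi]=="abc123\0"
es:[edi]=="abc\0"
The loop ends at the comparison '1'-'\0' and 1 is returned.

(4) One string is unbounded and the other is finite:
Either they differ at some position and the loop exits as analysed above, or one is effectively a prefix of the other and the previous case applies.
So as long as one of the strings is properly '\0'-terminated, the function terminates normally even if the other has no terminating '\0'.

(5) Both strings are unbounded:
If they differ somewhere, the loop exits as analysed above.
If they are identical and unbounded, the comparison simply keeps going. esi and edi count up to 0xffffffff and wrap to 0x00000000. If both strings start at 0x00000000 the comparison repeats from the beginning: an endless loop. If they start somewhere in the middle and the bytes at or after 0x00000000 differ, the function terminates.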
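
The sbbl/orb trick that turns the carry flag into -1 or 1 is easy to miss, so here is a portable C sketch of it. model_strcmp is a hypothetical name; the carry flag is modelled by an unsigned byte comparison, which is how scasb sets CF.

```c
/* model_strcmp is a hypothetical name: a portable C sketch of the
 * lodsb / scasb loop and the sbbl / orb result computation. */
int model_strcmp(const char *cs, const char *ct)
{
    const unsigned char *esi = (const unsigned char *)cs;
    const unsigned char *edi = (const unsigned char *)ct;

    for (;;) {
        unsigned char al = *esi++;   /* lodsb                         */
        unsigned char m  = *edi++;   /* byte that scasb compares with */
        if (al != m) {               /* jne 2f                        */
            int cf  = al < m;        /* CF == 1 when al - m borrows   */
            int eax = -cf;           /* sbbl %eax,%eax: 0 or -1       */
            return eax | 1;          /* orb $1,%al: 1 or -1           */
        }
        if (al == 0)                 /* testb %al,%al                 */
            return 0;                /* xorl %eax,%eax                */
    }
}
```

Unlike some C library implementations, this version only ever returns -1, 0 or 1, never the raw difference of the two characters.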
---------------------------------------------------------------
__HAVE_ARCH_STRNCMP strncmp()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRNCMP
static inline int strncmp(const char * cs,const char * ct,size_t count)
{
register int __res;
int d0, d1, d2;
__asm__ __volatile__(
"1:\tdecl %3\n\t"
"js 2f\n\t"
"lodsb\n\t"
"scasb\n\t"
"jne 3f\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n"
"2:\txorl %%eax,%%eax\n\t"
"jmp 4f\n"
"3:\tsbbl %%eax,%%eax\n\t"
"orb $1,%%al\n"
"4:"
:"=a" (__res), "=&S" (d0), "=&D" (d1), "=&c" (d2)
:"1" (cs),"2" (ct),"3" (count)
:"memory");
return __res;
}
Initial values:
ax/eax:__res
si/esi:const char * cs
di/edi:const char * ct
cx/ecx:count

Instructions rewritten:
1: decl %3 ===> 1: decl ecx
js 2 ===> js 2
lodsb ===> mov al,ds:[esi]
inc esi
scasb ===> if((al-es:[edi])==0)
ZF = 1;
edi++;
jne 3 ===> jne 3
testb %%al,%%al ===> testb al,al
jne 1 ===> jne 1
2: xorl %%eax,%%eax ===> 2: xorl eax,eax
jmp 4 ===> jmp 4
3: sbbl %%eax,%%eax ===> 3: sbbl eax,eax
orb $1,%%al ===> orb 1,al
4: ===> 4:

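The analysis follows that of strcmp():
1) count is smaller than the length of both strings:
a: the first count characters are equal: ecx becomes -1, js 2 leaves the loop, and xorl eax,eax clears eax to 0, which is the return value.
b: the strings differ within the first count characters: the loop is left through jne 3:
b-1: first differing character of cs > that of ct (by ASCII value). Return value: eax==0x00000001==1;
b-2: first differing character of cs < that of ct (by ASCII value). Return value: eax==0xffffffff==-1

2) count equals the length of both strings:
a: the strings are equal:
If the length is counted without the '\0', ecx reaches -1 right after the last character and js 2 leaves the loop; if the '\0' is counted, testb al,al leaves the loop at the '\0'. Either way xorl eax,eax clears eax and 0 is returned.
b: the strings differ:
Same analysis as above.

3) count is larger than both strings:
a: the strings are equal:
At the '\0', testb al,al leaves the loop; xorl eax,eax clears eax and 0 is returned.
b: the strings differ:
Same analysis as above.

4) count <= 0 (count is a size_t, so "negative" means the sign bit is set after decl):
No comparison takes place at all; 0 is returned directly. The flow is: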
1: decl %3 ===> 1: decl ecx
js 2 ===> js 2
... ...
2: xorl %%eax,%%eax ===> 2: xorl eax,eax
jmp 4 ===> jmp 4
... ...
4: ===> 4:

---------------------------------------------------------------
__HAVE_ARCH_STRCHR strchr()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRCHR
static inline char * strchr(const char * s, int c)
{
int d0;
register char * __res;
__asm__ __volatile__(
"movb %%al,%%ah\n"
"1:\tlodsb\n\t"
"cmpb %%ah,%%al\n\t"
"je 2f\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n\t"
"movl $1,%1\n"
"2:\tmovl %1,%0\n\t"
"decl %0"
:"=a" (__res), "=&S" (d0)
:"1" (s),"0" (c)
:"memory");
return __res;
}

Initial values:
ax/eax:int c
si/esi:const char *s

Instructions rewritten:
movb %%al,%%ah ===> movb al,ah
1: lodsb ===> 1: mov al,ds:[esi]
inc esi
cmpb %%ah,%%al ===> cmpb ah,al
je 2 ===> je 2
testb %%al,%%al ===> testb al,al
jne 1 ===> jne 1
movl $1,%1 ===> movl 1,esi
2: movl %1,%0 ===> 2: movl esi,eax
decl %0 ===> decl eax

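Function:
The string at ds:[esi] is '\0'-terminated and is searched from front to back for the character c. If c is found, the address of its first occurrence is returned; otherwise 0 (NULL) is returned.

Rewritten as C (goto style, one statement per instruction):
/* al holds the character c being searched for */
/* esi holds the start address of the string */
char *esi = (char *)s;
char *eax;
char al, ah;
ah = (char)c;           /* movb %al,%ah */
again:
al = *esi;              /* lodsb: ds:[esi] in a flat address space */
esi++;
if( al == ah )
goto found;
if( al != 0 )
goto again;
esi = (char *)1;        /* not found: 1 - 1 gives the result 0 */
found:
eax = esi;
eax--;                  /* step back to the matching character */
return eax;

Extreme case:
If the string at ds:[esi] is not '\0'-terminated, esi keeps incrementing until it reaches 0xffffffff and wraps to 0x00000000, and the search starts again from the bottom of memory. If neither the character c nor a '\0' occurs anywhere from there up to ds:[esi], the loop never ends.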
---------------------------------------------------------------
__HAVE_ARCH_STRRCHR strrchr()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRRCHR
static inline char * strrchr(const char * s, int c)
{
int d0, d1;
register char * __res;
__asm__ __volatile__(
"movb %%al,%%ah\n"
"1:\tlodsb\n\t"
"cmpb %%ah,%%al\n\t"
"jne 2f\n\t"
"leal -1(%%esi),%0\n"
"2:\ttestb %%al,%%al\n\t"
"jne 1b"
:"=g" (__res), "=&S" (d0), "=&a" (d1)
:"0" (0),"1" (s),"2" (c)
:"memory");
return __res;
}

Initial values:
__res : 0
si/esi : const char * s
ax/eax : c

Instructions rewritten:
movb %%al,%%ah ===> movb al,ah
1: lodsb ===> 1: mov al,ds:[esi]
inc esi
cmpb %%ah,%%al ===> cmpb ah,al
jne 2 ===> jne 2
leal -1(%%esi),%0 ===> leal [esi-1],__res(g)
2: testb %%al,%%al ===> 2: testb al,al
jne 1 ===> jne 1
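The analysis parallels strchr() above. The difference is that this function finds the last position at which c occurs in the string pointed to by const char *s. If c is found, its address is returned; otherwise 0 is returned.
strrchr - Find the last occurrence of a character in a string.

If s is a null pointer, the behavior is undefined.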
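
Since the walk-through for strrchr() is brief, a portable C sketch of the loop may help. model_strrchr is a hypothetical name; the variable res plays the role of operand %0, which starts at 0 and is overwritten on every match.

```c
#include <stddef.h>

/* model_strrchr is a hypothetical name: a portable C sketch of the
 * lodsb / cmpb / leal / testb loop. */
char *model_strrchr(const char *s, int c)
{
    const char *esi = s;
    char ah = (char)c;                 /* movb %al,%ah                */
    char *res = NULL;                  /* "0" (0): result starts at 0 */
    char al;

    do {
        al = *esi++;                   /* lodsb                       */
        if (al == ah)                  /* cmpb %ah,%al ; jne 2f       */
            res = (char *)(esi - 1);   /* leal -1(%esi),%0            */
    } while (al != 0);                 /* 2: testb %al,%al ; jne 1b   */

    return res;
}
```

Because the comparison happens before the test for '\0', searching for c == 0 returns the address of the terminator.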
---------------------------------------------------------------
__HAVE_ARCH_STRLEN strlen()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRLEN
static inline size_t strlen(const char * s)
{
int d0;
register int __res;
__asm__ __volatile__(
"repne\n\t"
"scasb\n\t"
"notl %0\n\t"
"decl %0"
:"=c" (__res), "=&D" (d0)
:"1" (s),"a" (0), "0" (0xffffffffu)
:"memory");
return __res;
}

Initial operand values:
di/edi:const char * s
ax/eax:0
cx/ecx:0xffffffff
size_t ecx = 0xffffffff;
ZF = 0;
char * edi = s;
eax = 0;
Instructions rewritten:
repne ===> while(ecx != 0 && ZF == 0)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
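notl %0 ===> ecx = ~ecx;
decl %0 ===> ecx--;

The key step is ecx = ~ecx. ecx counts down from 0xffffffff, and counting down is equivalent to counting up: a bitwise NOT at the end converts one into the other. The scan also counts the terminating '\0', so after the NOT the result is one too large and must be decremented once.

The individual cases are simple and follow the earlier functions, so they are not repeated.
In the extreme case, edi++ and ecx-- continue until ecx goes from 0xffffffff down to 0x00000000; the situation is the same as before.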

Reference:
typedef unsigned int __kernel_size_t;
typedef __kernel_size_t size_t;
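
The count-down-then-invert step of strlen() can be verified with a short C model. For a string of length n the scan touches n+1 bytes, so ecx ends at 0xffffffff - (n+1); the bitwise NOT of that is n+1, and the final decrement gives n. The sketch below uses the hypothetical name model_strlen and ignores the case where ecx runs out before a '\0' is found.

```c
#include <stddef.h>
#include <stdint.h>

/* model_strlen is a hypothetical name: a portable C sketch of
 * repne scasb / notl / decl, ignoring the ECX == 0 exit. */
size_t model_strlen(const char *s)
{
    const char *edi = s;
    uint32_t ecx = 0xffffffffu;    /* "0" (0xffffffffu)             */

    do {
        ecx--;                     /* one decrement per byte scanned */
    } while (*edi++ != 0);         /* repne scasb with al == 0       */

    ecx = ~ecx;                    /* notl %0: bitwise NOT           */
    ecx--;                         /* decl %0: drop the '\0' itself  */
    return ecx;
}
```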
---------------------------------------------------------------
__memcpy()
---------------------------------------------------------------
include/asm-i386/string.h

static inline void * __memcpy(void * to, const void * from, size_t n)
{
int d0, d1, d2;
__asm__ __volatile__(
"rep ; movsl\n\t"
"movl %4,%%ecx\n\t"
"andl $3,%%ecx\n\t"
#if 1 /* want to pay 2 byte penalty for a chance to skip microcoded rep? */
"jz 1f\n\t"
#endif
"rep ; movsb\n\t"
"1:"
: "=&c" (d0), "=&D" (d1), "=&S" (d2)
: "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
: "memory");
return (to);
}

Initial operand values:
cx/ecx:n/4
di/edi:to
si/esi:from

Instructions rewritten:
ecx = n/4;
rep ===> for( ; ecx != 0; ecx-- )
movsl ===> { *(long *)es:[edi] = *(long *)ds:[esi]; esi += 4; edi += 4; }
movl %4,%%ecx ===> ecx = n;
andl $3,%%ecx ===> ecx &= 0x00000003; ZF = (ecx == 0);
#if 1
jz 1 ===> if(ZF==1) goto 1;
#endif
rep ===> for( ; ecx != 0; ecx-- )
movsb ===> { *(char *)es:[edi] = *(char *)ds:[esi]; esi++; edi++; }
1: ===> 1:

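Analysis:
1. First copy in units of 4 bytes:
ecx = n/4; then the copy starts.
2. Then ecx = n % 4 (computed as n & 3) and the remaining bytes, fewer than 4, are copied.
ecx &= 0x00000003; ZF is set when the result is 0, in which case jz skips the byte copy.
The above is the general case.

3. If 0 < n < 4:
ecx = n/4 == 0;
The first loop condition fails, so nothing is copied in 4-byte units; the copy proceeds directly byte by byte.

4. If n == 0:
Neither loop runs; nothing is copied at all.

5. If n < 0:
size_t is unsigned, so a negative argument arrives as a huge positive count. The function still runs, but it copies far beyond the intended buffers and the outcome is unpredictable.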

Reference:
typedef unsigned int __kernel_size_t;
typedef __kernel_size_t size_t;
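
The two-phase structure of __memcpy() (n/4 long moves, then n & 3 byte moves) can be sketched in portable C. model_memcpy is a hypothetical name, and each movsl is written as four byte stores so that the sketch has no alignment requirements; it copies from esi to edi, the direction the movs instructions use.

```c
#include <stddef.h>

/* model_memcpy is a hypothetical name: a portable C sketch of
 * rep movsl followed by rep movsb for the n & 3 tail. */
void *model_memcpy(void *to, const void *from, size_t n)
{
    unsigned char *edi = to;
    const unsigned char *esi = from;
    size_t ecx;

    for (ecx = n / 4; ecx != 0; ecx--) {   /* rep ; movsl            */
        edi[0] = esi[0];
        edi[1] = esi[1];
        edi[2] = esi[2];
        edi[3] = esi[3];
        esi += 4;
        edi += 4;
    }

    ecx = n & 3;                   /* movl %4,%ecx ; andl $3,%ecx    */
    if (ecx != 0) {                /* jz 1f skips the byte loop      */
        for (; ecx != 0; ecx--)    /* rep ; movsb                    */
            *edi++ = *esi++;
    }
    return to;
}
```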
---------------------------------------------------------------
__constant_memcpy()
---------------------------------------------------------------
include/asm-i386/string.h

/*
* This looks ugly, but the compiler can optimize it totally,
* as the count is constant.
*/
static inline void * __constant_memcpy(void * to, const void * from, size_t n)
{
long esi, edi;
if (!n) return to;
#if 1 /* want to do small copies with non-string ops? */
switch (n)
{
case 1: *(char*)to = *(char*)from; return to;
case 2: *(short*)to = *(short*)from; return to;
case 4: *(int*)to = *(int*)from; return to;
#if 1 /* including those doable with two moves? */
case 3: *(short*)to = *(short*)from;
*((char*)to+2) = *((char*)from+2); return to;
case 5: *(int*)to = *(int*)from;
*((char*)to+4) = *((char*)from+4); return to;
case 6: *(int*)to = *(int*)from;
*((short*)to+2) = *((short*)from+2); return to;
case 8: *(int*)to = *(int*)from;
*((int*)to+1) = *((int*)from+1); return to;
#endif/* 1 */
}/* switch */
#endif/* 1 */
esi = (long) from;
edi = (long) to;
if (n >= 5*4)
{
/* large block: use rep prefix */
int ecx;
__asm__ __volatile__(
"rep ; movsl"
: "=&c" (ecx), "=&D" (edi), "=&S" (esi)
: "0" (n/4), "1" (edi),"2" (esi)
: "memory"
);
}/* if */

else
{
/* small block: don't clobber ecx + smaller code */
if (n >= 4*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

if (n >= 3*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

if (n >= 2*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

if (n >= 1*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
}/* else */

switch (n % 4)
{
/* tail */
case 0: return to;

case 1: __asm__ __volatile__(
"movsb"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;

case 2: __asm__ __volatile__(
"movsw"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;

default: __asm__ __volatile__(
"movsw\n\tmovsb"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;
}/* switch */
}

Code analysis:
1. Copies of 1 to 8 bytes (7 excluded) are done with plain assignments through pointers of suitable types:
#if 1 /* want to do small copies with non-string ops? */
switch (n)
{
case 1: *(char*)to = *(char*)from; return to;
case 2: *(short*)to = *(short*)from; return to;
case 4: *(int*)to = *(int*)from; return to;
#if 1 /* including those doable with two moves? */
case 3: *(short*)to = *(short*)from;
*((char*)to+2) = *((char*)from+2); return to;
case 5: *(int*)to = *(int*)from;
*((char*)to+4) = *((char*)from+4); return to;
case 6: *(int*)to = *(int*)from;
*((short*)to+2) = *((short*)from+2); return to;
case 8: *(int*)to = *(int*)from;
*((int*)to+1) = *((int*)from+1); return to;
#endif/* 1 */
}/* switch */
#endif/* 1 */
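This code runs when the byte count is between 1 and 8 (except 7). For a count of:
1: a char * is used
2: a short * is used
4: an int * is used
3, 5, 6, 8: two of these moves are combined

2. Byte counts n >= 20, and the ranges [16,19], [12,15], [8,11], [4,7]:
if (n >= 5*4) /* byte count n >= 20 */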
{
/* large block: use rep prefix */
int ecx;
__asm__ __volatile__(
"rep ; movsl"
: "=&c" (ecx), "=&D" (edi), "=&S" (esi)
: "0" (n/4), "1" (edi),"2" (esi)
: "memory"
);
}/* if */

Analysis:
esi = (long) from;
edi = (long) to;
ecx = n/4;
rep ===> while( ecx != 0 )
movsl ===> {
*(unsigned long *)es:[edi] = *(unsigned long *)ds:[esi];
esi += 4; edi += 4; ecx--;
}
Execution then continues with the following switch statement:
switch (n % 4)
{
/* tail */
case 0: return to;

case 1: __asm__ __volatile__(
"movsb"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;

case 2: __asm__ __volatile__(
"movsw"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;

default: __asm__ __volatile__(
"movsw\n\tmovsb"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;
}/* switch */
The code is simple and needs little comment: it copies the remaining bytes, fewer than 4.
default covers n%4 == 3: one word is copied, then one byte, 3 bytes in all.
--------------------------------------------------------------
else /* byte count n <= 19 */
{
/* small block: don't clobber ecx + smaller code */
/* n >= 16: executed for n in [16,19] */
if (n >= 4*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

/* n >= 12: executed for n in [12,19] */
if (n >= 3*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

/* n >= 8: executed for n in [8,19] */
if (n >= 2*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

/* n >= 4: executed for n in [4,19] */
if (n >= 1*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
}/* else */

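Analysis:
At first sight it looks as though ecx is never initialised and should be n/4. In fact none of these movsl instructions has a rep prefix, so ecx is not used at all; that is what the comment "don't clobber ecx" refers to. The if statements cascade: for n in [16,19] all four movsl execute, for [12,15] three, for [8,11] two and for [4,7] one, i.e. n/4 moves in every case. Because n is a compile-time constant, the compiler discards the tests and keeps only the moves that are needed.
The same effect could be had with a single rep, at the price of occupying ecx:
if( n >= 1*4 )//7,[9,19]
{
int ecx;
__asm__ __volatile__(
"rep ; movsl"
:"=&c"(ecx),"=&D"(edi),"=&S"(esi)
:"0"(n/4),"1"(edi),"2"(esi)
:"memory");
}

Note:
__constant_memcpy() and __memcpy() are very much alike: the same number and types of parameters, and the same function.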
---------------------------------------------------------------
__constant_memcpy3d()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_MEMCPY
#ifdef CONFIG_X86_USE_3DNOW /* covers __constant_memcpy3d(),
__memcpy3d() and memcpy() below */
#include <asm/mmx.h>
/*
* This CPU favours 3DNow strongly (eg AMD Athlon)
*/
static inline void * __constant_memcpy3d(void * to, const void * from, size_t len)
{
if (len < 512)
return __constant_memcpy(to, from, len);
return _mmx_memcpy(to, from, len);
}
_mmx_memcpy() is not defined in this header and I could not locate it, so the analysis stops here.
---------------------------------------------------------------
__memcpy3d()
---------------------------------------------------------------
include/asm-i386/string.h

static __inline__ void *__memcpy3d(void *to, const void *from, size_t len)
{
if (len < 512)
return __memcpy(to, from, len);
return _mmx_memcpy(to, from, len);
}
Again, _mmx_memcpy() could not be located, so the analysis stops here.
---------------------------------------------------------------
memcpy()
---------------------------------------------------------------
include/asm-i386/string.h

#define memcpy(t, f, n) \
(__builtin_constant_p(n) ? \
__constant_memcpy3d((t),(f),(n)) : \
__memcpy3d((t),(f),(n)))
#else/* CONFIG_X86_USE_3DNOW */
/*
* No 3D Now!
*/
#define memcpy(t, f, n) \
(__builtin_constant_p(n) ? \
__constant_memcpy((t),(f),(n)) : \
__memcpy((t),(f),(n)))
#endif/* CONFIG_X86_USE_3DNOW */

Notes on int __builtin_constant_p(exp):
You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the ‘-O’ option.
You would typically use this function in an embedded application where memory was a critical resource. If you have some complex calculation, you may want it to be folded if it involves constants, but need to call a function if it does not. For example:

#define Scale_Value(X) \
(__builtin_constant_p (X) \
? ((X) * SCALE + OFFSET) : Scale (X))

You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC will never return 1 when you call the inline function with a string constant or compound literal and will not return 1 when you pass a constant numeric value to the inline function unless you specify the ‘-O’ option.

__builtin_constant_p() has to be used together with gcc's -O option.

You may also use __builtin_constant_p in initializers for static data. For instance, you can write
static const int table[] = {
__builtin_constant_p (EXPRESSION) ? (EXPRESSION) : -1,
/* . . . */
};
This is an acceptable initializer even if EXPRESSION is not a constant expression.
GCC must be more conservative about evaluating the built-in in this case, because it has no opportunity to perform optimization. Previous versions of GCC did not accept this built-in in data initializers. The earliest version where it is completely safe is 3.0.1.
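
The dispatch pattern used by the memcpy() macro can be tried out in isolation. The sketch below assumes a GCC-compatible compiler; copy_constant_len, copy_variable_len and my_memcpy are hypothetical names standing in for __constant_memcpy(), __memcpy() and memcpy(), and both helpers are plain byte copies so that the result is the same whichever branch the compiler picks.

```c
#include <stddef.h>

/* Hypothetical stand-in for __constant_memcpy(): a plain byte copy. */
void *copy_constant_len(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    while (n--)
        *d++ = *s++;
    return to;
}

/* Hypothetical stand-in for __memcpy(): also a plain byte copy. */
void *copy_variable_len(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    while (n--)
        *d++ = *s++;
    return to;
}

/* Same shape as the kernel macro: pick a helper at compile time
 * depending on whether the length is a known constant. */
#define my_memcpy(t, f, n)                      \
    (__builtin_constant_p(n)                    \
        ? copy_constant_len((t), (f), (n))      \
        : copy_variable_len((t), (f), (n)))
```

A call such as my_memcpy(dst, src, 4) can be resolved to the first helper, while my_memcpy(dst, src, len) with a run-time len goes to the second; which one is chosen for borderline cases depends on the optimization level, so code should not rely on a particular branch being taken.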

---------------------------------------------------------------
__HAVE_ARCH_MEMMOVE
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_MEMMOVE
void *memmove(void * dest,const void * src, size_t n);
memmove() is only declared here; it continues to use the function in string.c.

#define memcmp __builtin_memcmp
---------------------------------------------------------------
__HAVE_ARCH_MEMCHR memchr()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_MEMCHR
static inline void * memchr(const void * cs,int c,size_t count)
{
int d0;
register void * __res;
if (!count) return NULL;
__asm__ __volatile__(
"repne\n\t"
"scasb\n\t"
"je 1f\n\t"
"movl $1,%0\n"
"1:\tdecl %0"
:"=D" (__res), "=&c" (d0)
:"a" (c),"0" (cs),"1" (count)
:"memory");
return __res;
}

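Function: cs gives the start of the memory area, count the number of bytes to search and c the value to look for. The search starts at cs and ends at cs+count. If c is found its address is returned; otherwise 0 is returned.

Initial operand values: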
ax/eax: c
di/edi: const void * cs
cx/ecx: count
ZF = 0;
ax = c;
edi = cs;
ecx = count;
Instructions rewritten:
repne ===> while( ecx-- != 0 && ZF == 0)
{
scasb ===> if((al-es:[edi++])==0)
ZF = 1;
}
je 1 ===> if(ZF == 1) goto 1;
movl $1,%0 ===> edi = 1;
1: ===> 1:
decl %0 ===> edi--;
return edi;
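Return value: the address of c if it is found, otherwise 0.
The general case is straightforward and is not analysed further.

Special cases:
1. count == 0: the C statement if (!count) return NULL; returns before the asm is reached. The check is needed because with ecx == 0 repne scasb executes no comparison and leaves ZF unchanged, so je would test a stale flag.
2. ecx is a huge number such as 0xffffffff: either a byte matching c is found within the range and its address is returned, or nothing matches, ecx reaches 0, the loop ends and 0 is returned. (The extra ecx-- to 0xffffffff on exit belongs only to the while( ecx-- != 0 ) model above; the real rep prefix stops at ecx == 0.)
3. count is unsigned, so the case ecx < 0 need not be considered. Since this is a memory function, '\0' bytes are compared like any other value.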
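
A portable C sketch of memchr() makes the role of the movl $1 / decl pair explicit. model_memchr is a hypothetical name; the zf variable models the zero flag left behind by the last scasb.

```c
#include <stddef.h>

/* model_memchr is a hypothetical name: a portable C sketch of
 * repne scasb / je / movl $1 / decl. */
void *model_memchr(const void *cs, int c, size_t count)
{
    const unsigned char *edi = cs;
    unsigned char al = (unsigned char)c;
    size_t ecx = count;
    int zf = 0;

    if (!count)                  /* same early exit as the original  */
        return NULL;

    while (ecx != 0 && !zf) {    /* repne scasb                      */
        zf = (*edi == al);       /* scasb sets ZF on a match         */
        edi++;
        ecx--;
    }

    if (!zf)                     /* je 1f not taken:                 */
        return NULL;             /* movl $1,%0 ; decl %0 gives 0     */
    return (void *)(edi - 1);    /* decl %0: back to the match       */
}
```

The assembly produces the null result by loading 1 and decrementing it; the sketch returns NULL directly to avoid arithmetic on an invalid pointer.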
---------------------------------------------------------------
__memset_generic()
---------------------------------------------------------------
include/asm-i386/string.h

static inline void * __memset_generic(void * s, char c,size_t count)
{
int d0, d1;
__asm__ __volatile__(
"rep\n\t"
"stosb"
: "=&c" (d0), "=&D" (d1)
:"a" (c),"1" (s),"0" (count)
:"memory");
return s;
}
al = c;
edi = s;
ecx = count;
rep ====> while( ecx != 0 )
{
stosb ====> es:[edi] = al; edi++; ecx--;
}
return s;
---------------------------------------------------------------
__constant_count_memset()
---------------------------------------------------------------
include/asm-i386/string.h

/* we might want to write optimized versions of these later */
#define __constant_count_memset(s,c,count) __memset_generic((s),(c),(count))
---------------------------------------------------------------
__constant_c_memset()
---------------------------------------------------------------
include/asm-i386/string.h

/*
* memset(x,0,y) is a reasonably common thing to do, so we want to fill
* things 32 bits at a time even when we don't know the size of the
* area at compile-time..
*/
static inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
{
int d0, d1;
__asm__ __volatile__(
"rep ; stosl\n\t"
"testb $2,%b3\n\t"
"je 1f\n\t"
"stosw\n"
"1:\ttestb $1,%b3\n\t"
"je 2f\n\t"
"stosb\n"
"2:"
:"=&c" (d0), "=&D" (d1)
:"a" (c), "q" (count), "0" (count/4), "1" ((long) s)
:"memory");
return (s);
}
Initial operand values:
ax/eax: c
cx/ecx: count/4
di/edi: void *s

Instructions rewritten:
rep ====> while( ecx-- != 0 )
{
stosl ====> (long)es:[edi] = eax;
edi += 4;
}
testb $2,%b3 ====> if( (0x02 & (char)count) == 0 )
je 1 ====> goto 1;
stosw ====> (short)es:[edi] = ax;
edi += 2;
1: testb $1,%b3 ====> 1: if( (0x01 & (char)count) == 0)
je 2 ====> goto 2;
stosb ====> (char)es:[edi] = al;
2: ====> 2:
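Analysis:
Memory is first filled in units of 4 bytes. Bit 1 and bit 0 of count are then tested in turn to determine whether 3, 2, 1 or 0 bytes remain. If 3 bytes remain, one word is stored and 1 byte is left; if 2 bytes remain, one word is stored and 0 bytes are left. The final test then stores the last byte, if any.

Special case:
If count==0, neither the while nor the if conditions hold, and nothing is stored.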
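
The three stages (longs, then an optional word, then an optional byte) can be sketched in portable C. model_c_memset is a hypothetical name. The sketch assumes the caller passes the fill byte already replicated into all four bytes of c, and it writes each stosl and stosw as little-endian byte stores to stay free of alignment requirements.

```c
#include <stddef.h>
#include <stdint.h>

/* model_c_memset is a hypothetical name: a portable C sketch of
 * rep stosl, then stosw if bit 1 of count is set, then stosb if
 * bit 0 is set. c is assumed to hold the fill byte four times. */
void *model_c_memset(void *s, uint32_t c, size_t count)
{
    unsigned char *edi = s;
    size_t ecx;

    for (ecx = count / 4; ecx != 0; ecx--) {   /* rep ; stosl        */
        edi[0] = (unsigned char)c;
        edi[1] = (unsigned char)(c >> 8);
        edi[2] = (unsigned char)(c >> 16);
        edi[3] = (unsigned char)(c >> 24);
        edi += 4;
    }
    if (count & 2) {                           /* testb $2,%b3       */
        edi[0] = (unsigned char)c;             /* stosw              */
        edi[1] = (unsigned char)(c >> 8);
        edi += 2;
    }
    if (count & 1)                             /* testb $1,%b3       */
        *edi = (unsigned char)c;               /* stosb              */
    return s;
}
```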
---------------------------------------------------------------
__HAVE_ARCH_STRNLEN strnlen()
---------------------------------------------------------------
include/asm-i386/string.h

/* Added by Gertjan van Wingerde to make minix and sysv module work */
#define __HAVE_ARCH_STRNLEN
static inline size_t strnlen(const char * s, size_t count)
{
int d0;
register int __res;
__asm__ __volatile__(
"movl %2,%0\n\t"
"jmp 2f\n"
"1:\tcmpb $0,(%0)\n\t"
"je 3f\n\t"
"incl %0\n"
"2:\tdecl %1\n\t"
"cmpl $-1,%1\n\t"
"jne 1b\n"
"3:\tsubl %2,%0"
:"=a" (__res), "=&d" (d0)
:"c" (s),"1" (count)
:"memory");
return __res;
}
/* end of additional stuff */

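Initial register values:
cx/ecx: const char * s
dx/edx: count
ax/eax: __res

Instructions rewritten as pseudocode:
size_t edx;
edx = count;
char *eax, *ecx;
ecx = s;

movl %2,%0 ====> eax = ecx; // eax = ecx = s
jmp 2f ====> goto 2;

1: cmpb $0,(%0) ====> 1: if( ((char)(ds:[eax])) == 0 )
je 3f ====> goto 3;
incl %0 ====> eax++;

2: decl %1 ====> 2: edx--;
cmpl $-1,%1 ====> if( edx != 0xffffffff )
jne 1b ====> goto 1;

3: subl %2,%0 ====> 3: eax -= ecx;
return eax;

Note that the counter is compared with -1, not with 0. edx therefore takes the values count-1, ..., 1, 0, -1, and the comparison at label 1 runs at most count times.

Case analysis:
1. String length (excluding '\0') < count:
s==>"abcd\0?"
count == 5: 'a', 'b', 'c' and 'd' are examined with edx == 4, 3, 2, 1. edx then becomes 0, which is still not -1, so a fifth comparison is made and finds '\0' at s+4. After eax -= ecx, eax == 4, the string length (excluding '\0'), which is the return value.

count == 6: the same four characters are examined, and '\0' is found while edx == 1. The loop exits through the je, and eax == 4 is returned.

2. String length (excluding '\0') == count:
s==>"abcd\0?"
count == 4: four comparisons are made ('a' to 'd'), eax ends up at s+4 and edx reaches -1, which ends the loop. The '\0' itself is never examined. After eax -= ecx, eax == 4, which equals count.

3. String length (excluding '\0') > count:
s==>"abcd\0?"
count == 3: three comparisons are made ('a', 'b', 'c'), then edx reaches -1 and the loop ends with eax pointing at 'd'. After eax -= ecx, eax == 3, which equals count.

4. String length (excluding '\0') == 0:
s==>"\0?"
count == 4: the first comparison finds '\0', so eax == 0 is returned.

5. count == 1
s==>"abcd\0?"
count == 1: 'a' is examined with edx == 0, eax becomes s+1, edx becomes -1 and the loop ends. eax == 1 is returned.

6. count == 0
s==>"abcd\0?" : the first edx-- turns edx into 0xffffffff, the jne is not taken, and eax == 0 is returned without reading any byte of the string.

Function summary:
s is the start address of a string and count is a length limit. The string is scanned for at most count bytes. If the string length (excluding '\0') is less than count, that length is returned; otherwise count is returned. In other words the result is min(strlen(s), count), and count == 0 simply returns 0.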
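The control flow of the assembly maps onto a short C loop. The sketch below is a model for checking the case analysis, not the kernel source; the name strnlen_model is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the i386 strnlen loop. */
size_t strnlen_model(const char *s, size_t count)
{
    const char *eax = s;            /* movl %2,%0 */
    size_t edx = count;

    /* "decl %1; cmpl $-1,%1; jne 1b" stops once edx has wrapped to -1,
     * which is the same as testing edx != 0 before the decrement. */
    while (edx-- != 0) {
        if (*eax == '\0')           /* cmpb $0,(%0); je 3f */
            break;
        eax++;                      /* incl %0 */
    }
    return (size_t)(eax - s);       /* subl %2,%0 */
}
```

For "abcd" this returns 4 for any count of 4 or more, count itself for smaller limits, and 0 when count is 0.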

---------------------------------------------------------------
__HAVE_ARCH_STRSTR strstr()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRSTR
extern char *strstr(const char *cs, const char *ct);
This is only a declaration; it presumably refers to the out-of-line strstr() implemented in string.c.
---------------------------------------------------------------
__constant_c_and_count_memset()
---------------------------------------------------------------
include/asm-i386/string.h
/*
* This looks horribly ugly, but the compiler can optimize it totally,
* as we by now know that both pattern and count is constant..
*/
static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
{
switch (count)
{
case 0:
return s;
case 1:
*(unsigned char *)s = pattern;
return s;
case 2:
*(unsigned short *)s = pattern;
return s;
case 3:
*(unsigned short *)s = pattern;
*(2+(unsigned char *)s) = pattern;
return s;
case 4:
*(unsigned long *)s = pattern;
return s;
}
#define COMMON(x) \
__asm__ __volatile__( \
"rep ; stosl" \
x \
: "=&c" (d0), "=&D" (d1) \
: "a" (pattern),"0" (count/4),"1" ((long) s) \
: "memory")
{
int d0, d1;
switch (count % 4)
{
case 0: COMMON(""); return s;
case 1: COMMON("\n\tstosb"); return s;
case 2: COMMON("\n\tstosw"); return s;
default: COMMON("\n\tstosw\n\tstosb"); return s;
}
}

#undef COMMON
}
Analysis:
1. count in [0,4]:
switch (count)
{
case 0:
return s;
case 1:
*(unsigned char *)s = pattern;
return s;
case 2:
*(unsigned short *)s = pattern;
return s;
case 3:
*(unsigned short *)s = pattern;
*(2+(unsigned char *)s) = pattern;
return s;
case 4:
*(unsigned long *)s = pattern;
return s;
}

2.count > 4 :
#define COMMON(x) \
__asm__ __volatile__( \
"rep ; stosl" \
x \
: "=&c" (d0), "=&D" (d1) \
: "a" (pattern),"0" (count/4),"1" ((long) s) \
: "memory")
{
int d0, d1;
switch (count % 4)
{
case 0: COMMON(""); return s;
case 1: COMMON("\n\tstosb"); return s;
case 2: COMMON("\n\tstosw"); return s;
default: COMMON("\n\tstosw\n\tstosb"); return s;
}
}

#undef COMMON

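a): Note this way of using a macro inside a function:
1) first define the macro with #define;
2) then wrap the code that uses it in a pair of braces {};
3) finally cancel the macro with #undef right after that block;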
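The three steps can be seen in isolation in a tiny example. Everything here (sum_clamped, CLAMP, limit) is a hypothetical illustration and has nothing to do with the kernel source; it only shows a macro that exists for the duration of one function and may refer to that function's local names.

```c
#include <assert.h>

/* Hypothetical example of a function-local macro. */
int sum_clamped(int a, int b, int c, int limit)
{
/* Step 1: define the macro; it can use the local name 'limit'. */
#define CLAMP(v) ((v) > limit ? limit : (v))
    {
        /* Step 2: the braced block is the only place the macro is used. */
        int total = CLAMP(a) + CLAMP(b) + CLAMP(c);
        return total;
    }
/* Step 3: remove the macro so it cannot leak into later code. */
#undef CLAMP
}
```

For instance, sum_clamped(1, 5, 10, 4) evaluates 1 + 4 + 4.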

b):#define COMMON(x) \
__asm__ __volatile__( \
"rep ; stosl" \
x \
: "=&c" (d0), "=&D" (d1) \
: "a" (pattern),"0" (count/4),"1" ((long) s) \
: "memory")

Initial register values:
ax/eax: pattern
cx/ecx: count/4
di/edi: s

Instructions rewritten as pseudocode:
COMMON("") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 )
{
stosl ===> es:[edi] = eax;
edi += 4;
}
return s;

COMMON("\n\tstosb") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 )
{
stosl ===> es:[edi] = eax;
edi += 4;
}
x ===> stosb ===> es:[edi] = al;
edi += 1;
return s;

COMMON("\n\tstosw") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 )
{
stosl ===> es:[edi] = eax;
edi += 4;
}
x ===> stosw ===> es:[edi] = ax;
edi += 2;
return s;

COMMON("\n\tstosw\n\tstosb") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 )
{
stosl ===> es:[edi] = eax;
edi += 4;
}
x => stosw;stosb=> es:[edi] = ax;
edi += 2;
es:[edi] = al;
edi += 1;

return s;

c): Further analysis:
{
int d0, d1;
switch (count % 4)
{
case 0: COMMON(""); return s;
case 1: COMMON("\n\tstosb"); return s;
case 2: COMMON("\n\tstosw"); return s;
default: COMMON("\n\tstosw\n\tstosb"); return s;
}
}
The switch on count % 4 selects the extra stores that fill the 0 to 3 bytes left over after the 32-bit fill.
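A portable C sketch of the count > 4 path makes the tail handling explicit. The function name fill_words_then_tail is hypothetical, and memcpy stands in for stosl/stosw so that the sketch makes no alignment assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the COMMON(x) variants: count/4 dword stores,
 * then the stores selected by count % 4. */
void *fill_words_then_tail(void *s, uint32_t pattern, size_t count)
{
    unsigned char *edi = s;
    uint16_t ax = (uint16_t)pattern;
    size_t ecx;

    for (ecx = count / 4; ecx != 0; ecx--) {   /* rep ; stosl */
        memcpy(edi, &pattern, 4);
        edi += 4;
    }
    switch (count % 4) {
    case 0:                                     /* COMMON("") */
        break;
    case 1:                                     /* COMMON("\n\tstosb") */
        *edi = (unsigned char)pattern;
        break;
    case 2:                                     /* COMMON("\n\tstosw") */
        memcpy(edi, &ax, 2);
        break;
    default:                                    /* stosw, then stosb */
        memcpy(edi, &ax, 2);
        edi[2] = (unsigned char)pattern;
        break;
    }
    return s;
}
```

In the kernel version count is a compile-time constant, so the compiler can discard every branch except the one that applies.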

---------------------------------------------------------------
__constant_c_x_memset()
---------------------------------------------------------------
include/asm-i386/string.h

#define __constant_c_x_memset(s, c, count) \
(__builtin_constant_p(count) ? \
__constant_c_and_count_memset((s),(c),(count)) : \
__constant_c_memset((s),(c),(count)))

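Function: fill count bytes of the memory area at s with the pattern c. If count is a compile-time constant, __constant_c_and_count_memset() is used; otherwise __constant_c_memset() is used.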
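The dispatch idiom itself can be tried outside the kernel. The sketch below assumes GCC or Clang, since __builtin_constant_p is a compiler builtin; the helper names fill_known_size, fill_runtime_size and the FILL macro are hypothetical stand-ins for the two kernel routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for sizes the compiler knows at compile time. */
void *fill_known_size(void *s, int c, size_t n)
{
    return memset(s, c, n);
}

/* Hypothetical helper for sizes only known at run time. */
void *fill_runtime_size(void *s, int c, size_t n)
{
    unsigned char *p = s;
    while (n--)
        *p++ = (unsigned char)c;
    return s;
}

/* Pick a helper depending on whether n is a compile-time constant.
 * Both helpers produce the same bytes, so the choice only affects speed. */
#define FILL(s, c, n) \
    (__builtin_constant_p(n) ? fill_known_size((s), (c), (n)) \
                             : fill_runtime_size((s), (c), (n)))
```

Whether a given argument counts as constant can depend on the optimization level, which is why both branches must be functionally equivalent.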

Reference:
1.__constant_c_and_count_memset():
static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
{
switch (count)
{
case 0:
return s;
case 1:
*(unsigned char *)s = pattern;
return s;
case 2:
*(unsigned short *)s = pattern;
return s;
case 3:
*(unsigned short *)s = pattern;
*(2+(unsigned char *)s) = pattern;
return s;
case 4:
*(unsigned long *)s = pattern;
return s;
}
#define COMMON(x) \
__asm__ __volatile__( \
"rep ; stosl" \
x \
: "=&c" (d0), "=&D" (d1) \
: "a" (pattern),"0" (count/4),"1" ((long) s) \
: "memory")
{
int d0, d1;
switch (count % 4)
{
case 0: COMMON(""); return s;
case 1: COMMON("\n\tstosb"); return s;
case 2: COMMON("\n\tstosw"); return s;
default: COMMON("\n\tstosw\n\tstosb"); return s;
}
}

#undef COMMON
}

2.__constant_c_memset():
static inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
{
int d0, d1;
__asm__ __volatile__(
"rep ; stosl\n\t"
"testb $2,%b3\n\t"
"je 1f\n\t"
"stosw\n"
"1:\ttestb $1,%b3\n\t"
"je 2f\n\t"
"stosb\n"
"2:"
:"=&c" (d0), "=&D" (d1)
:"a" (c), "q" (count), "0" (count/4), "1" ((long) s)
:"memory");
return (s);
}
---------------------------------------------------------------
__memset()
---------------------------------------------------------------
include/asm-i386/string.h

#define __memset(s, c, count) \
(__builtin_constant_p(count) ? \
__constant_count_memset((s),(c),(count)) : \
__memset_generic((s),(c),(count)))

Function: fill count bytes of the memory area at s with the character c.

Reference:
1.__constant_count_memset():
#define __constant_count_memset(s,c,count) __memset_generic((s),(c),(count))

2.__memset_generic():
static inline void * __memset_generic(void * s, char c,size_t count)
{
int d0, d1;
__asm__ __volatile__(
"rep\n\t"
"stosb"
: "=&c" (d0), "=&D" (d1)
:"a" (c),"1" (s),"0" (count)
:"memory");
return s;
}
---------------------------------------------------------------
__HAVE_ARCH_MEMSET memset()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_MEMSET
#define memset(s, c, count) \
(__builtin_constant_p(c) ? \
__constant_c_x_memset((s),(0x01010101UL*(unsigned char)(c)),(count)) : \
__memset((s),(c),(count)))

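Function: the same as __memset(), that is, fill count bytes of the memory area at s with the character c.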

Reference:
1.__constant_c_x_memset():
#define __constant_c_x_memset(s, c, count) \
(__builtin_constant_p(count) ? \
__constant_c_and_count_memset((s),(c),(count)) : \
__constant_c_memset((s),(c),(count)))

2. __memset(): see the previous section.

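Question: what does (0x01010101UL*(unsigned char)(c)) mean? Multiplying a byte value by 0x01010101 copies that byte into all four bytes of a 32-bit word (for example 0xAB becomes 0xABABABAB), which gives stosl a ready-made 4-byte fill pattern.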
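The multiplication is easy to check in isolation. The helper name replicate_byte is hypothetical; uint32_t is used here in place of the kernel's unsigned long, which is 32 bits wide on i386.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replicate the low byte of c into all four bytes
 * of a 32-bit word. Each 0x01 byte of the multiplier yields one copy of
 * the byte, and no carries occur because the copies do not overlap. */
uint32_t replicate_byte(int c)
{
    return 0x01010101u * (uint32_t)(unsigned char)c;
}
```

The cast to unsigned char matters: without it a negative c, or one with bits above the low byte set, would spill into the neighbouring byte positions.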
---------------------------------------------------------------
__HAVE_ARCH_MEMSCAN memscan()
---------------------------------------------------------------
include/asm-i386/string.h

/*
* find the first occurrence of byte 'c', or 1 past the area if none
*/
#define __HAVE_ARCH_MEMSCAN
static inline void * memscan(void * addr, int c, size_t size)
{
if (!size) return addr;
__asm__("repnz; scasb\n\t"
"jnz 1f\n\t"
"dec %%edi\n"
"1:"
: "=D" (addr), "=c" (size)
: "0" (addr), "1" (size), "a" (c)
: "memory");
return addr;
}

Instructions rewritten as pseudocode:
edi = addr;
ecx = size;
eax = c;
ZF = 0; // modelling assumption; size != 0 guarantees at least one scasb
repnz ====> while( ecx != 0 && ZF == 0 )
{
ecx--;
scasb ====> ZF = ( (al - es:[edi]) == 0 );
edi++;
}
jnz 1f ====> if( ZF == 0 ) goto 1;
dec %%edi ====> edi--;
1: ====> 1:

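The assembly of this function is simple enough that no further walk-through is needed.
It scans memory linearly. If the byte c is found, the address of its first occurrence is returned; if it is not found, the address one past the end of the area (addr + size) is returned.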
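For comparison, here is a portable C sketch with the same return-value behaviour. The name memscan_model is hypothetical, and the early return on a match replaces the "dec %edi" fix-up that the assembly needs because scasb always advances edi.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of memscan: find the first byte equal to c,
 * or return the address one past the area if there is none. */
void *memscan_model(void *addr, int c, size_t size)
{
    unsigned char *edi = addr;
    unsigned char al = (unsigned char)c;

    while (size-- != 0) {           /* repnz: at most 'size' comparisons */
        if (*edi == al)             /* scasb found a match (ZF set) */
            return edi;             /* address of the matching byte */
        edi++;
    }
    return edi;                     /* not found: addr + size */
}
```

With size == 0 the loop is skipped and addr is returned unchanged, matching the explicit check at the top of the kernel function.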
#endif /* __KERNEL__ */

#endif /* !_I386_STRING_H_ */
***************************************************************
Finally done chewing through the string functions written in assembly!!!