
Code analysis of the string functions written in assembly in the kernel

2012-02-05 13:36

***************************************************************
Digging into the string functions written in assembly:
***************************************************************
---------------------------------------------------------------
_I386_STRING_H_ macro
---------------------------------------------------------------
include/asm-i386/string.h

#ifndef _I386_STRING_H_
#define _I386_STRING_H_
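This is the include guard: once this header of assembly string functions has been included, the macro is defined to record that fact.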
---------------------------------------------------------------
__KERNEL__ macro
---------------------------------------------------------------
include/asm-i386/string.h

#ifdef __KERNEL__
#include <linux/config.h>
Note:
config.h is included only when the __KERNEL__ macro is defined.
/*
* On a 486 or Pentium, we are better off not using the
* byte string operations. But on a 386 or a PPro the
* byte string ops are faster than doing it by hand
* (MUCH faster on a Pentium).
*/
The following comment is important and worth reading:
/*
* This string-include defines all string functions as inline
* functions. Use gcc. It also assumes ds=es=data space, this should be
* normal. Most of the string-functions are rather heavily hand-optimized,
* see especially strsep,strstr,str[c]spn. They should work, but are not
* very easy to understand. Everything is done entirely within the register
* set, making the functions fast and clean. String instructions have been
* used through-out, making for "slightly" unclear code :-)
*
* NO Copyright (C) 1991, 1992 Linus Torvalds,
* consider these trivial functions to be PD.
*/

/* AK: in fact I bet it would be better to move this stuff all out of line. */
---------------------------------------------------------------
__HAVE_ARCH_STRCPY strcpy()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRCPY
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__(
"1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
:"0" (src),"1" (dest)
: "memory");
return dest;
}

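Analysis:
1. Rewriting the instructions makes the loop clearer:
1: ---> 1:
lodsb ---> mov al,ds:[esi]
inc esi
stosb ---> mov es:[edi],al
inc edi
testb al,al ---> test al,al
jne 1b ---> jne 1
The loop is terminated by the 0 byte: the final '\0' is read, copied, and then the loop ends.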

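2. Operand analysis:
S: si/esi
D: di/edi
a: ax/eax
&: By default gcc may allocate an input operand and an output operand to the same register, because it assumes that all inputs are consumed before any output is produced. Putting "&" (early clobber) in front of an output operand guarantees that the output does not overwrite an input: gcc gives that output a register that no input uses, unless an input is tied to it explicitly (with a digit 0-9, see below).

0-9: A matching constraint. It may appear only on an input operand and says that this input occupies the same location (register) as output operand number I, so that the register serves as both input and output.

int d0, d1, d2;
"=&S" (d0), "=&D" (d1), "=&a" (d2)
"0" (src),"1" (dest)
Code analysis:
src and dest are the function's parameters and are inputs only. Because of "0" and "1", src is loaded into the register of output operand 0 (si/esi, tied to d0) and dest into the register of output operand 1 (di/edi, tied to d1). While the loop runs, lodsb and stosb advance esi and edi, and al is overwritten. The dummy outputs d0, d1 and d2 exist only to tell gcc that esi, edi and eax are modified by the asm: when it finishes, the final register values are stored into d0, d1 and d2 and never used. The C variables src and dest themselves are not changed, which is why "return dest" still returns the original destination pointer.

"memory": this is the clobber list. It tells gcc that memory is modified in ways it cannot predict, so it must not keep memory values cached in registers across the asm.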

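3. Instruction analysis:
lodsb: == mov al,ds:[esi]
inc esi (DF=0) / dec esi (DF=1)
stosb: == mov es:[edi],al
inc edi (DF=0) / dec edi (DF=1)
testb: == test oprd1,oprd2
Computes oprd1 & oprd2, discards the result, and sets the flags ZF, PF and SF accordingly (CF and OF are cleared).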
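
The loop analysed above can be sketched in portable C. This is only a model for reading the asm, not the kernel's code: the name model_strcpy is hypothetical, and the local variables merely stand in for the registers esi, edi and al.

```c
/* Hypothetical name: a plain-C model of the inline asm, not kernel code. */
static char *model_strcpy(char *dest, const char *src)
{
    char *edi = dest;          /* "1"(dest): destination pointer in edi */
    const char *esi = src;     /* "0"(src): source pointer in esi */
    char al;

    do {
        al = *esi++;           /* lodsb */
        *edi++ = al;           /* stosb */
    } while (al != 0);         /* testb %al,%al ; jne 1b */

    return dest;               /* dest itself was never modified */
}
```

Only the copies esi and edi move; dest keeps its value, which mirrors the role of the dummy outputs d0 and d1 in the asm.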

---------------------------------------------------------------
__HAVE_ARCH_STRNCPY strncpy()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRNCPY
static inline char * strncpy(char * dest,const char *src,size_t count)
{
int d0, d1, d2, d3;
__asm__ __volatile__(
"1:\tdecl %2\n\t"
"js 2f\n\t"
"lodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n\t"
"rep\n\t"
"stosb\n"
"2:"
: "=&S" (d0), "=&D" (d1), "=&c" (d2), "=&a" (d3)
:"0" (src),"1" (dest),"2" (count)
: "memory");
return dest;
}

Instructions rewritten:
1: decl ecx ===> 1: dec ecx
js 2 ===> js 2
lodsb ===> mov al,ds:[esi]
inc esi / dec esi
stosb ===> mov es:[edi],al
inc edi / dec edi
testb al,al ===> test al,al
jne 1 ===> jne 1

rep ===> rep
stosb ===> mov es:[edi],al
inc edi / dec edi
2: ===> 2:

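Analysis:
The analysis distinguishes six cases.
Suppose memory holds: abcde\0
1) Copy 3 characters:
(1) Initially ECX == 3
ECX is decremented on each pass and one character is copied; the copied character is then tested for 0.
3-->2: copy a
2-->1: copy b
1-->0: copy c
0-->-1 js 2
(2) So with a count of 3, exactly 3 characters are copied and no '\0' is appended.

2) Copy 5 characters:
(1) Initially ECX == 5
ECX is decremented on each pass and one character is copied; the copied character is then tested for 0.
5-->4: copy a
4-->3: copy b
3-->2: copy c
2-->1: copy d
1-->0: copy e
0-->-1 js 2
(2) So with a count of 5, the 5 characters are copied, again without a terminating '\0'.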

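3) Copy 6 characters:
(1) Initially ECX == 6
ECX is decremented on each pass and one character is copied; the copied character is then tested for 0.
6-->5: copy a
5-->4: copy b
4-->3: copy c
3-->2: copy d
2-->1: copy e
1-->0: copy \0
test al,al ===> al == \0, so ZF == 1.
jne 1 ===> the jump to 1 is not taken

Execution continues: now ECX == 0 and al == \0
rep: checks whether ECX is 0; since ECX == 0 the repetition ends at once
(2) So with a count of 6, 6 bytes are copied: 5 characters plus one '\0'.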

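4) Copy 10 characters:
(1) Initially ECX == 10
ECX is decremented on each pass and one character is copied; the copied character is then tested for 0.
10-->9: copy a
9-->8: copy b
8-->7: copy c
7-->6: copy d
6-->5: copy e
5-->4: copy \0
test al,al ===> al == \0, so ZF == 1.
jne 1 ===> the jump to 1 is not taken

Execution continues: now ECX == 4 and al == \0
rep : ECX==4, ECX!=0, (ECX=ECX-1)==3, continue
copy al == \0
Repeat:
rep : ECX==3, ECX!=0, (ECX=ECX-1)==2, continue
copy al == \0
Repeat:
rep : ECX==2, ECX!=0, (ECX=ECX-1)==1, continue
copy al == \0
Repeat:
rep : ECX==1, ECX!=0, (ECX=ECX-1)==0, continue
copy al == \0
Repeat: rep: ECX==0, so the repetition ends
(2) So with a count of 10, 6 bytes are copied first (5 characters plus one '\0'), and then 4 more '\0' bytes are filled in.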

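5) Copy 0 characters:
(1) Initially ECX == 0
0-->-1 js 2
(2) So 0 characters are copied.

6) Copy -1 characters:
(1) Initially ECX == -1
-1-->-2 js 2
(2) So 0 characters are copied.
Note:
In static inline char * strncpy(char * dest,const char *src,size_t count), count is loaded into ecx. ecx is 32 bits wide and js tests its sign bit, so count is effectively treated as a signed number: at most 2G-1 bytes can be copied, i.e. the string may not exceed (2G-1) bytes.
This raised a question at the time: since nothing is copied whenever ECX <= 0, why not simply treat the count as unsigned and extend the range to 4G? My guess, prompted by the next function, is this: when two strings are concatenated, ECX is again the counter, and since the 32-bit ECX can represent at most 4G-1, each of the two strings gets half of that range, 2G-1.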
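About the rep prefix:
It repeats the string instruction that follows it. Before each repetition it checks whether ECX is 0: if so the repetition ends, otherwise the string instruction executes once and ECX is decremented by 1.
It resembles the loop instruction, except that loop first decrements ECX and only then tests it for 0.
Note that the decrement performed during the repetition does not affect any flags.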
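
The six cases above can be checked against a small C model. This is a sketch for illustration only: model_strncpy is a hypothetical name, ecx is modelled as a 32-bit unsigned value, and the js test is modelled by looking at its top bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a plain-C model of the strncpy() asm, not kernel code. */
static char *model_strncpy(char *dest, const char *src, size_t count)
{
    char *edi = dest;
    const char *esi = src;
    uint32_t ecx = (uint32_t)count;     /* "2"(count) */
    char al;

    for (;;) {
        ecx--;                          /* decl %2 */
        if (ecx & 0x80000000u)          /* js 2f: result is negative */
            return dest;
        al = *esi++;                    /* lodsb */
        *edi++ = al;                    /* stosb */
        if (al == 0)                    /* testb ; jne 1b not taken */
            break;
    }
    while (ecx != 0) {                  /* rep stosb with al == 0 */
        *edi++ = 0;
        ecx--;
    }
    return dest;
}
```

With the source "abcde", a count of 3 writes only "abc", a count of 6 writes the five characters plus one '\0', and a count of 10 writes those six bytes followed by four more '\0' bytes, matching cases 1), 3) and 4).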
---------------------------------------------------------------
strcat()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRCAT
static inline char * strcat(char * dest,const char * src)
{
int d0, d1, d2, d3;
__asm__ __volatile__(
"repne\n\t"
"scasb\n\t"
"decl %1\n"
"1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
:"memory");
return dest;
}
Instructions rewritten:
repne ===> while(ECX != 0 && ZF != 1)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}

decl %1 ===> dec edi
1: ===> 1:
lodsb ===> mov al, ds:[esi]
inc esi
stosb ===> mov es:[edi], al
inc edi
testb %%al,%%al ===> test al, al
jne 1 ===> jne 1

Initial operand values:
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
src ==> si/esi, here: esi
dest ==> di/edi, here: edi
0 ==> ax/eax, here: eax (scasb compares against al)
0xffffffffu ===> ecx, here: ecx
So esi and edi point to the start of the two strings, eax == 0 and ecx == 0xffffffffu.

General case:
Initial values:
esi--->'abc\0' (src)
edi--->'123\0' (dest)
al == 0
ecx == 0xffffffffu
while(ECX != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}
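The scan runs through the string at edi until it reaches the terminating '\0'. After that comparison edi points to the byte following the '\0', ECX has been decremented once more, and the loop ends. At this point edi = edi+4 and ECX = ECX-4.

Note: the loop therefore ends in one of two ways. Either a '\0' is found in the string at es:[edi], or the string is at least 0xffffffff (4G-1) bytes long (not counting a terminating '\0'), so that ECX reaches 0.

dec edi
edi = edi - 1; edi now points to the terminating '\0' of the string at es:[edi].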

Register values at this point:
esi--->'abc\0' (src)
edi--->the terminating '\0' of '123\0' (dest)
al == 0
ecx == 0xfffffffbu

1:
mov al, ds:[esi]
inc esi
mov es:[edi], al
inc edi
test al, al
jne 1
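This copies the string at ds:[esi] onto the end of the string at es:[edi], starting at the latter's '\0', which is overwritten.

esi--->the '?' in 'abc\0?' (src)
edi--->the trailing '?' in '123abc\0?' (dest)
al == 0; note that this 0 is the terminator fetched from the string at esi, not the 0 from the initialization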

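Function: strcat(char * dest,const char * src) copies the string at src to the end of the string at dest, overwriting dest's '\0'; once dest and src have been joined, src's '\0' is copied as well and terminates the combined string.

Algorithm:
1. Scan the string at dest to find its '\0'.
2. Copy the bytes of the string at src one by one to dest, starting at that '\0', up to and including src's terminating '\0'; then stop.
Both strings, src and dest, must therefore be terminated by '\0'.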
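
The two steps of the algorithm can be modelled in C as follows. The name model_strcat is hypothetical, and the variables only stand in for the registers; the sketch assumes both strings are '\0'-terminated and that dest has room for the result.

```c
#include <stdint.h>

/* Hypothetical name: a plain-C model of the strcat() asm, not kernel code. */
static char *model_strcat(char *dest, const char *src)
{
    char *edi = dest;
    const char *esi = src;
    uint32_t ecx = 0xffffffffu;         /* "3"(0xffffffffu) */
    char al;

    while (ecx != 0) {                  /* repne scasb with al == 0 */
        ecx--;
        if (*edi++ == 0)
            break;                      /* ZF set: terminator found */
    }
    edi--;                              /* decl %1: back onto the '\0' */

    do {
        al = *esi++;                    /* lodsb */
        *edi++ = al;                    /* stosb */
    } while (al != 0);                  /* testb ; jne 1b */

    return dest;
}
```

For dest "123" and src "abc" the scan stops four bytes in, edi is stepped back onto the '\0', and the copy loop produces "123abc".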

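Special case 1:
Initial values:
esi--->'abc\0' (src)
edi--->'123456789... ...YX', a string of at least 0xffffffff bytes (dest)
Assume edi points to the start of the es segment, whose base is 0.
That is, edi[0]=='1' and edi[0xffffffff]=='X'. edi has only 32 bits, so it covers 0x0--->0xffffffff, 4G characters in all. Even if the string had more than 4G characters, edi could not address them, so the string at edi ends at edi[0xffffffff]=='X'. With more characters, a further edi++ wraps edi back to 0.
The same reasoning applies to esi.
al == 0
ecx == 0xffffffffu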
while(ECX != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}
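The loop body executes 0xffffffff times.
Because the string at edi is at least 0xffffffff bytes long, the search for its '\0' terminator drives ECX to 0 and the loop ends; edi then points to the byte at 0xffffffff (segment overrun is ignored here).
On leaving the loop, ECX == 0 and edi == 0xffffffff.

dec edi
edi = edi - 1; edi == 0xffffffff-1, i.e. edi[0xffffffff-1]=='Y'.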

Register values at this point:
esi--->'abc\0' (src)
edi--->'123456......YX'; edi==0xffffffff-1, so edi points to the byte edi[0xffffffff-1]=='Y' (dest)
al == 0
ecx == 0x00000000u

1:
mov al, ds:[esi]
inc esi
mov es:[edi], al
inc edi
test al, al
jne 1
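The first byte of the string 'abc\0' at ds:[esi], esi[0]=='a', is copied to es:[edi]==es:edi[0xffffffff-1]=='Y'. The byte 'Y' at es:[0xffffffff-1] is overwritten with 'a'. That is: esi[0]=='a'--->edi[0xffffffff-1]=='Y'
edi--->'123456......aX'.
Now esi++ gives esi[1]=='b'; edi++ gives edi[0xffffffff]=='X'.

Next esi[1]=='b' is copied from ds:[esi] to edi[0xffffffff]=='X'
edi--->'123.....ab'; edi++ wraps to edi==0x00000000, pointing to the byte edi[0]=='1'
esi++, esi[2]=='c'; esi--->the 'c' in 'abc\0?' (src)

Next esi[2]=='c' is copied to edi[0x00000000]=='1'.
esi++, esi[3]=='\0'; esi--->the '\0' in 'abc\0?' (src)
edi--->'c23.....ab'; edi++, edi==0x00000001, pointing to the byte edi[0x00000001]=='2'

Next esi[3]=='\0' is copied to edi[0x00000001]=='2'.
esi++, esi[4]=='?'; esi--->the '?' in 'abc\0?' (src)
edi--->'c\03.....ab'; edi++, edi==0x00000002, pointing to the byte edi[0x00000002]=='3'.

So the concatenated string is "c\0".

The same happens when src holds 4G characters, and also when src and dest both hold 4G characters.
As long as the characters in src and dest together number no more than 4G-1, leaving one byte for the '\0', everything is fine.

When one or both of src and dest are empty, things are simple:
dest empty, src not empty: the string at src is copied to dest together with its '\0'.
src empty, dest not empty: dest is left as it is; only the '\0' at src is copied, overwriting dest's final '\0'.
src empty and dest empty: only the '\0' at src is copied, overwriting the '\0' in dest.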

References:
S: si/esi
D: di/edi
a: ax/eax
c: cx/ecx
The & modifier and the digit constraints 0-9 are as described in the operand analysis of strcpy() above.
---------------------------------------------------------------
strncat()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRNCAT
static inline char * strncat(char * dest,const char * src,size_t count)
{
int d0, d1, d2, d3;
__asm__ __volatile__(
"repne\n\t"
"scasb\n\t"
"decl %1\n\t"
"movl %8,%3\n"
"1:\tdecl %3\n\t"
"js 2f\n\t"
"lodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n"
"2:\txorl %2,%2\n\t"
"stosb"
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
: "memory");
return dest;
}
Instructions rewritten:
repne ===> while(ecx != 0 && ZF != 1)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
decl %1 ===> decl edi
movl %8,%3 ===> movl count,ecx
1: ===> 1:
decl %3 ===> decl ecx
js 2 ===> js 2
lodsb ===> mov al,ds:[esi]
inc esi
stosb ===> mov es:[edi],al
inc edi
testb %%al,%%al ===> test al,al
jne 1 ===> jne 1
2: ===> 2:
xorl %2,%2 ===> xor eax,eax
stosb ===> mov es:[edi],al
inc edi
Initial operand values:
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
esi: esi = src
edi: edi = dest
eax: eax = 0
ecx: ecx = 0xffffffff
"g": lets the compiler decide how to supply count (any general register, memory location or immediate).

Code analysis:
while(ecx != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
decl edi
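This searches the string at es:[edi] for its '\0', then steps edi back so that it points to that '\0'.
If the string fits within 4G-1 bytes, the scan ends normally at the '\0'. If the string is 4G long, the loop ends with ecx==0, and after stepping back edi points to edi[0xffffffff-1]. A string longer than 4G is impossible.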

movl count,ecx
1:
decl ecx
js 2
mov al,ds:[esi]
inc esi
mov es:[edi],al
inc edi
test al,al
jne 1
2:
xor eax,eax
mov es:[edi],al
inc edi

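1: starts copying the string at esi to edi.
2: after the copy ends, appends one more '\0' at the end.
Cases:
1) count is greater than the number of characters in the string at ds:[esi]: the string at esi is copied together with its '\0', which ends loop 1:. At 2: a second '\0' is then written after that '\0', edi++ follows, and the function ends.

2) count is less than the number of characters in the string at ds:[esi]: only count characters of the string at esi are copied; ecx then drops to -1, js 2 leaves loop 1:, and at 2: a '\0' is written after them, then edi++, and the function ends.

3) count equals the number of characters in the string at ds:[esi]: count characters are copied and ecx reaches 0; the decrement at the top of the loop then makes ecx == -1, js 2 leaves loop 1:, and at 2: a '\0' is written after them, then edi++, and the function ends.

4) count is negative (as a signed value): the first decrement leaves ecx negative, js 2 leaves loop 1: at once, and at 2: a '\0' is written, i.e. the '\0' that ends the string at edi is simply rewritten with '\0'; then edi++, and the function ends.

Although ecx could count up to 4G bytes, count is treated as a signed number, so at most 2G-1 characters (not counting the '\0') are copied. This evidently assumes that the string at es:[edi] is at most 2G long. The author of the code cannot know how long that string is, and although it will almost never be anywhere near 2G, the code handles the most general case.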
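
A C model makes the behaviour of label 2: easy to observe, in particular the second '\0' of case 1). This is a sketch only: model_strncat is a hypothetical name, and the initial repne scasb is simplified to a plain scan without the ecx limit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a plain-C model of the strncat() asm, not kernel code. */
static char *model_strncat(char *dest, const char *src, size_t count)
{
    char *edi = dest;
    const char *esi = src;
    uint32_t ecx;
    char al;

    while (*edi != 0)                   /* repne scasb ; decl %1 (simplified) */
        edi++;

    ecx = (uint32_t)count;              /* movl %8,%3 */
    for (;;) {
        ecx--;                          /* decl %3 */
        if (ecx & 0x80000000u)          /* js 2f */
            break;
        al = *esi++;                    /* lodsb */
        *edi++ = al;                    /* stosb */
        if (al == 0)                    /* testb ; jne 1b not taken */
            break;
    }
    *edi = 0;                           /* 2: xorl %2,%2 ; stosb */
    return dest;
}
```

When count exceeds the length of src, the model writes one byte past the copied terminator, so in this model the destination needs room for that extra byte.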
---------------------------------------------------------------
__HAVE_ARCH_STRCMP strcmp()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRCMP
static inline int strcmp(const char * cs,const char * ct)
{
int d0, d1;
register int __res;
__asm__ __volatile__(
"1:\tlodsb\n\t"
"scasb\n\t"
"jne 2f\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n\t"
"xorl %%eax,%%eax\n\t"
"jmp 3f\n"
"2:\tsbbl %%eax,%%eax\n\t"
"orb $1,%%al\n"
"3:"
:"=a" (__res), "=&S" (d0), "=&D" (d1)
:"1" (cs),"2" (ct)
:"memory");
return __res;
}

Initial values:
ax/eax: register int __res;
si/esi: const char* cs;
di/edi: const char* ct;
ZF == 0 (assumed)

Instructions rewritten:
1: lodsb ===> 1: mov al,ds:[esi]
inc esi
scasb ===> if((al-es:[edi])==0)
ZF = 1;
edi++;
jne 2 ===> jne 2;
testb %%al,%%al ===> testb al,al
jne 1 ===> jne 1
xorl %%eax,%%eax ===> xorl eax,eax
jmp 3 ===> jmp 3
2: sbbl %%eax,%%eax ===> 2: sbbl eax,eax
orb $1,%%al ===> orb al,1
3: ===> 3:

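1) Walkthrough:
The code tests whether the two strings at ds:[esi] and es:[edi] are equal. Both should be terminated by '\0'. The return value is left in eax. Each character of ds:[esi] is loaded into al and compared with the corresponding character of es:[edi]; if they are equal ZF is set to 1, and al is then tested for '\0'. If it is not '\0', the next character is compared; if it is '\0', eax is cleared to 0 and the comparison ends, with eax as the return value.

2) Cases:
1. The strings at ds:[esi] and es:[edi] are equal: as above, eax returns 0.
2. The strings at ds:[esi] and es:[edi] are not equal:
(1) The character in ds:[esi] has a smaller ASCII code than the one in es:[edi]
ds:[esi]=="abc\0"
es:[edi]=="xyz\0"
if((al-es:[edi])==0) ===>if( ('a'-'x')==0 )
ZF = 1; condition false; CF == 1
edi++; edi++; edi points to 'y'
jne 2 ; jne 2
2: sbbl eax,eax eax = eax-eax-CF=-1=0xffffffff
orb al,1 al = 0xff

Conclusion:
If the first differing character of the string at cs has a smaller ASCII code than the first differing character of the string at ct, the return value is eax==0xffffffff==-1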

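(2) The character in ds:[esi] has a larger ASCII code than the one in es:[edi]
ds:[esi]=="xyz\0"
es:[edi]=="abc\0"
if((al-es:[edi])==0) ===>if( ('x'-'a')==0 )
ZF = 1; condition false; CF == 0
edi++; edi++; edi points to 'b'
jne 2 ; jne 2
2: sbbl eax,eax eax = eax-eax-CF=0
orb al,1 al = 0|1=1, so eax = 0x00000001
Output: eax==0x00000001
Conclusion:
If the first differing character of the string at cs has a larger ASCII code than the first differing character of the string at ct, the return value is eax==0x00000001==1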

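(3) One string is a prefix of the other:
ds:[esi]=="abc\0"
es:[edi]=="abc123\0"
The loop ends at the comparison '\0'-'1', and -1 is returned.
In the opposite situation:
ds:[esi]=="abc123\0"
es:[edi]=="abc\0"
the loop ends at the comparison '1'-'\0', and 1 is returned.

(4) One string is unbounded and the other is finite:
Either the strings differ at some position, and the function exits as analysed above; or one of them is effectively a prefix of the other, which is also analysed above.
So as long as one string obeys the rule of ending in '\0', the function terminates normally even if the other has no '\0' terminator.

(5) Both strings are unbounded:
If they differ somewhere in the middle, the function exits there, as analysed above.
If they are identical and unbounded, the comparison simply goes on. esi and edi increase to 0xffffffff and then wrap to 0x00000000. If the two strings start at 0x00000000, the comparison repeats from the start, an infinite loop. If they start somewhere in the middle and the memory at 0x00000000 or after it differs between the two, the function terminates.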
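
The return values derived above (0, -1 and 1) can be reproduced with a C model of the flag logic. The name model_strcmp is hypothetical; the comparison is done on unsigned bytes because CF after scasb reflects an unsigned borrow.

```c
/* Hypothetical name: a plain-C model of the strcmp() asm, not kernel code. */
static int model_strcmp(const char *cs, const char *ct)
{
    for (;;) {
        unsigned char al = (unsigned char)*cs++;    /* lodsb */
        unsigned char m  = (unsigned char)*ct++;    /* byte read by scasb */

        if (al != m)                    /* jne 2f */
            /* sbbl %eax,%eax gives -CF; orb $1,%al turns 0 into 1 */
            return (al < m) ? -1 : 1;
        if (al == 0)                    /* testb ; jne 1b not taken */
            return 0;                   /* xorl %eax,%eax */
    }
}
```

The result is always exactly -1, 0 or 1, never the raw difference of the two characters.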
---------------------------------------------------------------
__HAVE_ARCH_STRNCMP strncmp()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRNCMP
static inline int strncmp(const char * cs,const char * ct,size_t count)
{
register int __res;
int d0, d1, d2;
__asm__ __volatile__(
"1:\tdecl %3\n\t"
"js 2f\n\t"
"lodsb\n\t"
"scasb\n\t"
"jne 3f\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n"
"2:\txorl %%eax,%%eax\n\t"
"jmp 4f\n"
"3:\tsbbl %%eax,%%eax\n\t"
"orb $1,%%al\n"
"4:"
:"=a" (__res), "=&S" (d0), "=&D" (d1), "=&c" (d2)
:"1" (cs),"2" (ct),"3" (count)
:"memory");
return __res;
}
Initial values:
ax/eax：__res
si/esi：const char * cs
di/edi：const char * ct
cx/ecx：count

Instructions rewritten:
1: decl %3 ===> 1: decl ecx
js 2 ===> js 2
lodsb ===> mov al,ds:[esi]
inc esi
scasb ===> if((al-es:[edi])==0)
ZF = 1;
edi++;
jne 3 ===> jne 3
testb %%al,%%al ===> testb al,al
jne 1 ===> jne 1
2: xorl %%eax,%%eax ===> 2: xorl eax,eax
jmp 4 ===> jmp 4
3: sbbl %%eax,%%eax ===> 3: sbbl eax,eax
orb $1,%%al ===> orb 1,al
4: ===> 4:

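This function is analysed like the previous one:
1) The number of characters to compare, count, is less than the length of both strings:
a: the first count characters are equal: ecx becomes -1, js 2 leaves the loop, xorl eax,eax clears eax to 0, and that is the return value.
b: the strings differ within the first count characters: jne 3 leaves the loop:
b-1: if the first differing character of the string at cs has a larger ASCII code than that of the string at ct, the return value is eax==0x00000001==1;
b-2: if the first differing character of the string at cs has a smaller ASCII code than that of the string at ct, the return value is eax==0xffffffff==-1

2) count equals the length of both strings:
a: the strings are equal:
After the last character has been compared ecx is 0; the next decrement makes it -1, so js 2 leaves the loop before the '\0' is compared, xorl eax,eax clears eax, and 0 is returned.
b: the strings differ:
As analysed above.

3) count is greater than the length of the strings:
a: the strings are equal:
When the '\0' is compared, testb al,al ends the loop, xorl eax,eax clears eax to 0, and 0 is returned.
b: the strings differ:
As analysed above.

4) count <= 0 (as a signed value):
Nothing is compared at all; 0 is returned immediately.
The control flow is: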
1: decl %3 ===> 1: decl ecx
js 2 ===> js 2
... ...
2: xorl %%eax,%%eax ===> 2: xorl eax,eax
jmp 4 ===> jmp 4
... ...
4: ===> 4:

---------------------------------------------------------------
__HAVE_ARCH_STRCHR strchr()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRCHR
static inline char * strchr(const char * s, int c)
{
int d0;
register char * __res;
__asm__ __volatile__(
"movb %%al,%%ah\n"
"1:\tlodsb\n\t"
"cmpb %%ah,%%al\n\t"
"je 2f\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n\t"
"movl $1,%1\n"
"2:\tmovl %1,%0\n\t"
"decl %0"
:"=a" (__res), "=&S" (d0)
:"1" (s),"0" (c)
:"memory");
return __res;
}

Initial values:
ax/eax: int c
si/esi: const char *s

Instructions rewritten:
movb %%al,%%ah ===> movb al,ah
1: lodsb ===> 1: mov al,ds:[esi]
inc esi
cmpb %%ah,%%al ===> cmpb ah,al
je 2 ===> je 2
testb %%al,%%al ===> testb al,al
jne 1 ===> jne 1
movl $1,%1 ===> movl 1,esi
2: movl %1,%0 ===> 2: movl esi,eax
decl %0 ===> decl eax

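Function:
The string at ds:[esi] is terminated by '\0'; it is searched from front to back for the character c. If c is found, the address of that character within the string is returned. If it is not found, 0 is returned.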

Rewritten in C:
/* al == the character c being sought; esi == start address of the string */
char *eax;
char ah, al;
ah = c;
L1:
al = *esi;
esi++;
if( al == ah )
goto L2;
if( al != 0 )
goto L1;
esi = (char *)1;
L2:
eax = esi;
eax--;
return eax;

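Extreme case:
If the string at ds:[esi] is not terminated by '\0', esi keeps incrementing until it reaches 0xffffffff and then wraps to 0x00000000, so the search starts again from the beginning of memory. If neither the character c nor a '\0' occurs anywhere from there up to ds:[esi], the function is stuck in an infinite loop.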
---------------------------------------------------------------
__HAVE_ARCH_STRRCHR strrchr()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRRCHR
static inline char * strrchr(const char * s, int c)
{
int d0, d1;
register char * __res;
__asm__ __volatile__(
"movb %%al,%%ah\n"
"1:\tlodsb\n\t"
"cmpb %%ah,%%al\n\t"
"jne 2f\n\t"
"leal -1(%%esi),%0\n"
"2:\ttestb %%al,%%al\n\t"
"jne 1b"
:"=g" (__res), "=&S" (d0), "=&a" (d1)
:"0" (0),"1" (s),"2" (c)
:"memory");
return __res;
}

Initial values:
__res : 0
si/esi : const char * s
ax/eax : c

Instructions rewritten:
movb %%al,%%ah ===> movb al,ah
1: lodsb ===> 1: mov al,ds:[esi]
inc esi
cmpb %%ah,%%al ===> cmpb ah,al
jne 2 ===> jne 2
leal -1(%%esi),%0 ===> leal [esi-1],__res(g)
2: testb %%al,%%al ===> 2: testb al,al
jne 1 ===> jne 1
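This function is analysed like strchr() above, except that it finds the last position at which c occurs in the string pointed to by const char *s. If c is found, its address is returned; if not, 0 is returned. The details mirror strchr() and are not repeated.
strrchr - Find the last occurrence of a character in a string.

If s is a null pointer, the result is unpredictable.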
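
Since the analysis is not spelled out, here is a C sketch of the loop for reference. The name model_strrchr is hypothetical; res plays the role of __res, which starts at 0 and is overwritten on every match.

```c
#include <stddef.h>

/* Hypothetical name: a plain-C model of the strrchr() asm, not kernel code. */
static char *model_strrchr(const char *s, int c)
{
    const char *esi = s;
    const char *res = NULL;             /* "0"(0): __res starts as 0 */
    char ah = (char)c;                  /* movb %al,%ah */
    char al;

    do {
        al = *esi++;                    /* lodsb */
        if (al == ah)                   /* cmpb %ah,%al ; jne 2f not taken */
            res = esi - 1;              /* leal -1(%esi),%0 */
    } while (al != 0);                  /* 2: testb ; jne 1b */

    return (char *)res;
}
```

Because the comparison happens before the test for '\0', searching for c == 0 returns the address of the terminator itself.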
---------------------------------------------------------------
__HAVE_ARCH_STRLEN strlen()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRLEN
static inline size_t strlen(const char * s)
{
int d0;
register int __res;
__asm__ __volatile__(
"repne\n\t"
"scasb\n\t"
"notl %0\n\t"
"decl %0"
:"=c" (__res), "=&D" (d0)
:"1" (s),"a" (0), "0" (0xffffffffu)
:"memory");
return __res;
}

Initial operand values:
di/edi: const char * s
ax/eax: 0
cx/ecx: 0xffffffff
size_t ecx = 0xffffffff;
ZF = 0;
char * edi = s;
Instructions rewritten: eax = 0;
repne ===> while(ecx != 0 && ZF == 0)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
notl %0 ===> ecx = ~ecx;
decl %0 ===> ecx--;

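The key step here is ecx = ~ecx (a bitwise NOT, not a logical one), because ecx counts down from 0xffffffff. Remember: counting down and counting up are the same count, and a bitwise NOT at the end converts one into the other. Whatever was counted in excess along the way has to be removed after the NOT, hence the final decrement.

The individual cases are simple and analogous to the earlier functions, so they are not repeated.
For the extreme case, edi++ and ecx-- continue until ecx has gone from 0xffffffff down to 0x00000000; the situation is the same as before.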
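
The arithmetic behind notl/decl can be verified with a short C model. If the scan makes k comparisons (k = length + 1, since the '\0' is compared too), then ecx == 0xffffffff - k, so ~ecx == k, and the decrement gives the length. The name model_strlen is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a plain-C model of the strlen() asm, not kernel code. */
static size_t model_strlen(const char *s)
{
    const char *edi = s;
    uint32_t ecx = 0xffffffffu;         /* "0"(0xffffffffu) */

    while (ecx != 0) {                  /* repne scasb with al == 0 */
        ecx--;
        if (*edi++ == 0)
            break;
    }
    ecx = ~ecx;                         /* notl %0: comparisons made */
    ecx--;                              /* decl %0: drop the '\0' */
    return ecx;
}
```

For "abc" four comparisons are made, ecx ends at 0xfffffffb, the NOT yields 4 and the decrement yields 3.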

Reference:
typedef unsigned int __kernel_size_t;
typedef __kernel_size_t size_t;
---------------------------------------------------------------
__memcpy()
---------------------------------------------------------------
include/asm-i386/string.h

static inline void * __memcpy(void * to, const void * from, size_t n)
{
int d0, d1, d2;
__asm__ __volatile__(
"rep ; movsl\n\t"
"movl %4,%%ecx\n\t"
"andl $3,%%ecx\n\t"
#if 1 /* want to pay 2 byte penalty for a chance to skip microcoded rep? */
"jz 1f\n\t"
#endif
"rep ; movsb\n\t"
"1:"
: "=&c" (d0), "=&D" (d1), "=&S" (d2)
: "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
: "memory");
return (to);
}

Initial operand values:
cx/ecx: n/4
di/edi: to
si/esi: from

Instructions rewritten: ecx = n/4;
rep ===> while( ecx-- != 0 )
movsl ===> { *(long *)es:[edi] = *(long *)ds:[esi]; esi += 4; edi += 4; }
movl %4,%%ecx ===> ecx = n;
andl $3,%%ecx ===> ecx = ecx & 0x00000003; ZF = (ecx == 0);
#if 1
jz 1 ===> if(ZF==1) goto 1;
#endif
rep ===> while( ecx-- != 0 )
movsb ===> { *(char *)es:[edi] = *(char *)ds:[esi]; esi++; edi++; }
1: ===> 1:

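Analysis:
1. First the copy proceeds in units of 4 bytes:
ecx = n/4; then the copy starts.
2. Then ecx = n % 4 is computed, and the remaining bytes (fewer than 4) are copied.
ecx = n & 0x00000003; ZF = (ecx == 0);
The above is the general case.

3. If 0 < n < 4:
then ecx = n/4 == 0;
the condition of while( ecx-- != 0 ) fails at once, so no 4-byte copy takes place and the copy goes straight to the byte-by-byte part.

4. If n == 0:
neither loop runs, and nothing is copied at all.

5. If n < 0:
n is a size_t, which is unsigned (see the typedefs below), so a negative argument arrives as a very large positive count. The function still runs, but the result is undefined.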
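
The two-phase copy can be sketched in C as below. model_memcpy is a hypothetical name; the inner 4-byte move uses the standard memcpy purely to sidestep alignment questions in portable C, which is an assumption of this sketch and not something the asm does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: a plain-C model of __memcpy(), not kernel code. */
static void *model_memcpy(void *to, const void *from, size_t n)
{
    unsigned char *edi = to;
    const unsigned char *esi = from;
    size_t ecx;

    for (ecx = n / 4; ecx != 0; ecx--) {    /* rep ; movsl */
        memcpy(edi, esi, 4);                /* one dword per step */
        esi += 4;
        edi += 4;
    }
    for (ecx = n & 3; ecx != 0; ecx--)      /* rep ; movsb (jz skips it when 0) */
        *edi++ = *esi++;

    return to;
}
```

For n == 11 the first loop moves two dwords and the second loop moves the remaining three bytes; for n == 0 neither loop runs.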

Reference:
typedef unsigned int __kernel_size_t;
typedef __kernel_size_t size_t;
---------------------------------------------------------------
__constant_memcpy()
---------------------------------------------------------------
include/asm-i386/string.h

/*
* This looks ugly, but the compiler can optimize it totally,
* as the count is constant.
*/
static inline void * __constant_memcpy(void * to, const void * from, size_t n)
{
long esi, edi;
if (!n) return to;
#if 1 /* want to do small copies with non-string ops? */
switch (n)
{
case 1: *(char*)to = *(char*)from; return to;
case 2: *(short*)to = *(short*)from; return to;
case 4: *(int*)to = *(int*)from; return to;
#if 1 /* including those doable with two moves? */
case 3: *(short*)to = *(short*)from;
*((char*)to+2) = *((char*)from+2); return to;
case 5: *(int*)to = *(int*)from;
*((char*)to+4) = *((char*)from+4); return to;
case 6: *(int*)to = *(int*)from;
*((short*)to+2) = *((short*)from+2); return to;
case 8: *(int*)to = *(int*)from;
*((int*)to+1) = *((int*)from+1); return to;
#endif/* 1 */
}/* switch */
#endif/* 1 */
esi = (long) from;
edi = (long) to;
if (n >= 5*4)
{
/* large block: use rep prefix */
int ecx;
__asm__ __volatile__(
"rep ; movsl"
: "=&c" (ecx), "=&D" (edi), "=&S" (esi)
: "0" (n/4), "1" (edi),"2" (esi)
: "memory"
);
}/* if */

else
{
/* small block: don't clobber ecx + smaller code */
if (n >= 4*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

if (n >= 3*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

if (n >= 2*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

if (n >= 1*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
}/* else */

switch (n % 4)
{
/* tail */
case 0: return to;

case 1: __asm__ __volatile__(
"movsb"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;

case 2: __asm__ __volatile__(
"movsw"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;

default: __asm__ __volatile__(
"movsw\n\tmovsb"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;
}/* switch */
}

Code analysis:
1. Copies of 1 to 8 bytes (7 excluded) are done with plain assignments through pointers of suitable types:
#if 1 /* want to do small copies with non-string ops? */
switch (n)
{
case 1: *(char*)to = *(char*)from; return to;
case 2: *(short*)to = *(short*)from; return to;
case 4: *(int*)to = *(int*)from; return to;
#if 1 /* including those doable with two moves? */
case 3: *(short*)to = *(short*)from;
*((char*)to+2) = *((char*)from+2); return to;
case 5: *(int*)to = *(int*)from;
*((char*)to+4) = *((char*)from+4); return to;
case 6: *(int*)to = *(int*)from;
*((short*)to+2) = *((short*)from+2); return to;
case 8: *(int*)to = *(int*)from;
*((int*)to+1) = *((int*)from+1); return to;
#endif/* 1 */
}/* switch */
#endif/* 1 */
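The code above runs when the number of bytes to copy is between 1 and 8 (except 7). For a byte count of:
1: a char * is used
2: a short * is used
4: an int * is used
3, 5, 6, 8: two moves are combined (short+char, int+char, int+short, int+int)

2. Byte counts of 20 or more, and the ranges [16,19], [12,15], [8,11], [4,7]:
if (n >= 5*4) //when the number of bytes to copy is 20 or more: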
{
/* large block: use rep prefix */
int ecx;
__asm__ __volatile__(
"rep ; movsl"
: "=&c" (ecx), "=&D" (edi), "=&S" (esi)
: "0" (n/4), "1" (edi),"2" (esi)
: "memory"
);
}/* if */

Analysis: esi = (long) from;
edi = (long) to;
ecx = n/4;
rep ===> while( ecx-- != 0 )
movsl ===> {
*(unsigned long *)es:[edi] = *(unsigned long *)ds:[esi]; esi += 4; edi += 4;
}
Execution then continues with the following switch statement:
switch (n % 4)
{
/* tail */
case 0: return to;

case 1: __asm__ __volatile__(
"movsb"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;

case 2: __asm__ __volatile__(
"movsw"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;

default: __asm__ __volatile__(
"movsw\n\tmovsb"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
return to;
}/* switch */
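The code is simple and needs little comment: it copies the remaining bytes, fewer than 4.
default means n%4 == 3: one word (2 bytes) is copied, then one byte, 3 bytes in all.
--------------------------------------------------------------
else //when the number of bytes to copy satisfies 4 <= n <= 19: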
{
/* small block: don't clobber ecx + smaller code */
//executed only when n is in [16,19]:
if (n >= 4*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

//executed when n is in [12,19]:
if (n >= 3*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

//executed when n is in [8,19]:
if (n >= 2*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");

//executed when n is in [4,19]:
if (n >= 1*4) __asm__ __volatile__(
"movsl"
:"=&D"(edi),"=&S"(esi)
:"0"(edi),"1"(esi)
:"memory");
}/* else */

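Analysis:
At first sight the initial value of ecx seems to be missing (should it not be ecx = n/4?). It is not needed: a movsl without a rep prefix moves exactly one dword and does not use ecx. The if tests are cumulative, so for n in [16,19] all four movsl execute, for n in [12,15] three of them, and so on. In total n/4 dwords are moved without touching ecx, as the "don't clobber ecx" comment says.
The four blocks are therefore equivalent to a single rep version, at the price of clobbering ecx:
if( n >= 1*4 )
__asm__ __volatile__(
"rep ; movsl"
:"=&c"(ecx),"=&D"(edi),"=&S"(esi)
:"0"(n/4),"1"(edi),"2"(esi)
:"memory");

Note:
__constant_memcpy() and __memcpy() are very similar: they take the same number and types of parameters, and they do the same job.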
---------------------------------------------------------------
__constant_memcpy3d()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_MEMCPY
#ifdef CONFIG_X86_USE_3DNOW /* covers __constant_memcpy3d(), __memcpy3d() and memcpy() below */
#include <asm/mmx.h>
/*
* This CPU favours 3DNow strongly (eg AMD Athlon)
*/
static inline void * __constant_memcpy3d(void * to, const void * from, size_t len)
{
if (len < 512)
return __constant_memcpy(to, from, len);
return _mmx_memcpy(to, from, len);
}
I could not find the definition of _mmx_memcpy(), so the analysis stops here.
---------------------------------------------------------------
__memcpy3d()
---------------------------------------------------------------
include/asm-i386/string.h

static __inline__ void *__memcpy3d(void *to, const void *from, size_t len)
{
if (len < 512)
return __memcpy(to, from, len);
return _mmx_memcpy(to, from, len);
}
Again, _mmx_memcpy() could not be found, so the analysis stops here.
---------------------------------------------------------------
memcpy()
---------------------------------------------------------------
include/asm-i386/string.h

#define memcpy(t, f, n) \
(__builtin_constant_p(n) ? \
__constant_memcpy3d((t),(f),(n)) : \
__memcpy3d((t),(f),(n)))
#else/* CONFIG_X86_USE_3DNOW */
/*
* No 3D Now!
*/
#define memcpy(t, f, n) \
(__builtin_constant_p(n) ? \
__constant_memcpy((t),(f),(n)) : \
__memcpy((t),(f),(n)))
#endif/* CONFIG_X86_USE_3DNOW */

Notes on int __builtin_constant_p(exp):
You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the ‘-O’ option.
You would typically use this function in an embedded application where memory was a critical resource. If you have some complex calculation, you may want it to be folded if it involves constants, but need to call a function if it does not. For example:

#define Scale_Value(X) \
(__builtin_constant_p (X) \
? ((X) * SCALE + OFFSET) : Scale (X))

You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC will never return 1 when you call the inline function with a string constant or compound literal and will not return 1 when you pass a constant numeric value to the inline function unless you specify the ‘-O’ option.

__builtin_constant_p() should be used together with gcc's -O option.

You may also use __builtin_constant_p in initializers for static data. For instance,you can write
static const int table[] = {
__builtin_constant_p (EXPRESSION) ? (EXPRESSION) : -1,
/* . . . */
};
This is an acceptable initializer even if EXPRESSION is not a constant expression.
GCC must be more conservative about evaluating the built-in in this case, because it has no opportunity to perform optimization. Previous versions of GCC did not accept this built-in in data initializers. The earliest version where it is completely safe is 3.0.1.
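
The dispatch done by the memcpy() macro can be illustrated with a tiny GCC-specific sketch. COPY_PATH and the two helper functions are hypothetical names introduced only for this example; the macro has the same shape as memcpy() above but returns a label instead of calling a copy routine.

```c
/* COPY_PATH is a hypothetical macro shaped like the memcpy() macro above. */
#define COPY_PATH(n) \
    (__builtin_constant_p(n) ? "constant" : "runtime")

static const char *path_for_literal(void)
{
    /* 16 is a literal, so GCC selects the first arm */
    return COPY_PATH(16);
}

static const char *path_for_argument(unsigned long n)
{
    /* n is known only at run time, unless GCC inlines the call
       and propagates a constant argument under -O */
    return COPY_PATH(n);
}
```

Which arm path_for_argument() takes depends on the optimization level and on inlining, which is exactly the caveat made in the quoted GCC documentation.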

---------------------------------------------------------------
__HAVE_ARCH_MEMMOVE
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_MEMMOVE
void *memmove(void * dest,const void * src, size_t n);
memmove() is only declared here; it continues to use the function from string.c.

#define memcmp __builtin_memcmp
---------------------------------------------------------------
__HAVE_ARCH_MEMCHR memchr()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_MEMCHR
static inline void * memchr(const void * cs,int c,size_t count)
{
int d0;
register void * __res;
if (!count) return NULL;
__asm__ __volatile__(
"repne\n\t"
"scasb\n\t"
"je 1f\n\t"
"movl $1,%0\n"
"1:\tdecl %0"
:"=D" (__res), "=&c" (d0)
:"a" (c),"0" (cs),"1" (count)
:"memory");
return __res;
}

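Function: cs gives the start of the memory area, count the number of bytes to search, and c the value to look for. The search for c starts at cs and stops at cs+count. If c is found, the address where it was found is returned; otherwise 0 is returned.

Initial operand values:
ax/eax: c
di/edi: const void * cs
cx/ecx: count
ZF = 0;
al = c;
edi = cs;
ecx = count;
Instructions rewritten: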
repne ===> while( ecx != 0 )
{
scasb ===> ecx--; ZF = ((al-es:[edi++])==0);
if(ZF == 1) break;
}
je 1 ===> if(ZF == 1) goto 1;
movl $1,%0 ===> edi = 1;
1: ===> 1:
decl %0 ===> edi--;
return edi;
Return value: if c is found, the address of c is returned; if not, 0 is returned.
The general case is straightforward, so the walkthrough stops here.

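Special cases:
1. count == 0: the explicit test if (!count) return NULL; returns 0 before the asm is reached. This matters because with ecx == 0 repne scasb would not execute at all, and je would then test stale flags.
2. ecx is a huge number such as 0xffffffff: either a byte matching c is found and its address is returned, or none is found, ecx reaches 0, the loop ends (rep does not decrement ecx again once it is 0), and 0 is returned.
3. count is unsigned here, so the case ecx<0 need not be considered. Since this is a memory function, '\0' bytes are compared like any other value.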
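
The "set to 1, then decrement" trick that produces the 0 return value can be seen in a C model. model_memchr is a hypothetical name, and zf stands in for the zero flag left by the last scasb.

```c
#include <stddef.h>

/* Hypothetical name: a plain-C model of the memchr() asm, not kernel code. */
static void *model_memchr(const void *cs, int c, size_t count)
{
    const unsigned char *edi = cs;
    unsigned char al = (unsigned char)c;
    int zf = 0;

    if (!count)
        return NULL;                    /* handled in C before the asm */

    while (count != 0) {                /* repne scasb */
        count--;
        zf = (*edi++ == al);
        if (zf)
            break;
    }
    if (!zf)
        return NULL;                    /* movl $1,%0 ; decl %0 gives 0 */
    return (void *)(edi - 1);           /* je 1f ; decl %0 */
}
```

After a match edi already points one byte past the hit, so the single decl serves both exits: it corrects the pointer on a match and turns the constant 1 into 0 otherwise.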
---------------------------------------------------------------
__memset_generic()
---------------------------------------------------------------
include/asm-i386/string.h

static inline void * __memset_generic(void * s, char c,size_t count)
{
int d0, d1;
__asm__ __volatile__(
"rep\n\t"
"stosb"
: "=&c" (d0), "=&D" (d1)
:"a" (c),"1" (s),"0" (count)
:"memory");
return s;
}
al = c;
edi = s;
ecx = count;
rep ====> while( ecx != 0 )
{
stosb ====> es:[edi] = al; edi++; ecx--;
}
return s;
---------------------------------------------------------------
__constant_count_memset()
---------------------------------------------------------------
include/asm-i386/string.h

/* we might want to write optimized versions of these later */
#define __constant_count_memset(s,c,count) __memset_generic((s),(c),(count))
---------------------------------------------------------------
__constant_c_memset()
---------------------------------------------------------------
include/asm-i386/string.h

/*
* memset(x,0,y) is a reasonably common thing to do, so we want to fill
* things 32 bits at a time even when we don't know the size of the
* area at compile-time..
*/
static inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
{
int d0, d1;
__asm__ __volatile__(
"rep ; stosl\n\t"
"testb $2,%b3\n\t"
"je 1f\n\t"
"stosw\n"
"1:\ttestb $1,%b3\n\t"
"je 2f\n\t"
"stosb\n"
"2:"
:"=&c" (d0), "=&D" (d1)
:"a" (c), "q" (count), "0" (count/4), "1" ((long) s)
:"memory");
return (s);
}
Initial operand values:
ax/eax: c
cx/ecx: count/4
di/edi: void *s

Instructions rewritten:
rep ====> while( ecx-- != 0 )
{
stosl ====> (long)es:[edi] = eax;
edi += 4;
}
testb $2,%b3 ====> if( (0x02 & (char)count) == 0 )
je 1 ====> goto 1;
stosw ====> (short)es:[edi] = ax;
edi += 2;
1: testb $1,%b3 ====> 1: if( (0x01 & (char)count) == 0)
je 2 ====> goto 2;
stosb ====> (char)es:[edi] = al;
2: ====> 2:
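Analysis:
Memory is first filled in units of 4 bytes. Then bit 1 and bit 0 of count are tested in turn to determine whether 3, 2, 1 or 0 bytes remain. If 3 bytes remain, one word is stored and 1 byte is left; if 2 bytes remain, one word is stored and 0 bytes are left. The remaining 1 or 0 bytes are then handled by the bit-0 test.

Special case:
If count==0, neither the while condition nor the if conditions hold, and nothing is stored.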
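
The tail handling through the two bit tests can be modelled in C as follows. model_constant_c_memset is a hypothetical name, and the sketch assumes the caller passes c as a 32-bit pattern whose four bytes are all the same fill byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: a plain-C model of __constant_c_memset(), not kernel code. */
static void *model_constant_c_memset(void *s, uint32_t c, size_t count)
{
    unsigned char *edi = s;
    size_t ecx;

    for (ecx = count / 4; ecx != 0; ecx--) {    /* rep ; stosl */
        memcpy(edi, &c, 4);
        edi += 4;
    }
    if (count & 2) {                            /* testb $2,%b3 ; je 1f */
        uint16_t ax = (uint16_t)c;
        memcpy(edi, &ax, 2);                    /* stosw */
        edi += 2;
    }
    if (count & 1)                              /* testb $1,%b3 ; je 2f */
        *edi = (unsigned char)c;                /* stosb */

    return s;
}
```

For count == 7 one dword, one word and one byte are stored, which adds up to 7 bytes.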
---------------------------------------------------------------
__HAVE_ARCH_STRNLEN strnlen()
---------------------------------------------------------------
include/asm-i386/string.h

/* Added by Gertjan van Wingerde to make minix and sysv module work */
#define __HAVE_ARCH_STRNLEN
static inline size_t strnlen(const char * s, size_t count)
{
int d0;
register int __res;
__asm__ __volatile__(
"movl %2,%0\n\t"
"jmp 2f\n"
"1:\tcmpb $0,(%0)\n\t"
"je 3f\n\t"
"incl %0\n"
"2:\tdecl %1\n\t"
"cmpl $-1,%1\n\t"
"jne 1b\n"
"3:\tsubl %2,%0"
:"=a" (__res), "=&d" (d0)
:"c" (s),"1" (count)
:"memory");
return __res;
}
/* end of additional stuff */

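Initial register values:
ecx: const char * s
edx: count
eax: __res

Instruction walkthrough:
size_t edx;
edx = count;
char *eax, *ecx;
ecx = s;

movl %2,%0 ====> eax = s; //ecx = eax = s;
jmp 2 ====> goto 2;

1: cmpb $0,(%0) ====> 1: if( ((char)(ds:[eax]))==0 )
je 3 ====> goto 3;
incl %0 ====> eax++;

2: decl %1 ====> 2: edx--;
cmpl $-1,%1 ====> if( edx != 0xffffffff )
jne 1 ====> goto 1;

3: subl %2,%0 ====> 3: eax -= ecx;
return eax;
Note that cmpl $-1,%1 compares edx with -1; it is not a bit test. The loop therefore ends only after edx has been decremented past 0, that is, after count bytes have been examined.

Case analysis:
1. String length (excluding '\0') < count:
s==>"abcd\0?"
count == 5: after 'a' to 'd' have been compared, eax points at '\0' and edx == 1. edx-- gives 0, which is still not -1, so the byte at eax is compared, found to be '\0', and control jumps to 3. After eax -= ecx, eax == 4, the string length (excluding '\0'), which is returned.

count == 6: the same, except that edx == 1 when the '\0' is found. eax == 4 is returned.

2. String length (excluding '\0') == count:
s==>"abcd\0?"
count == 4: all four characters are compared. After 'd' has been compared eax points at '\0', but edx-- now gives -1, so the loop ends without reading the '\0'. After eax -= ecx, eax == 4, which equals count.

3. String length (excluding '\0') > count:
s==>"abcd\0?"
count == 3: 'a', 'b' and 'c' are compared, then edx becomes -1 and the loop ends. eax points at 'd', which is never compared. After eax -= ecx, eax == 3, which equals count.

4. String length (excluding '\0') == 0:
s==>"\0?"
count == 4: the first comparison finds '\0'; eax == 0 is returned.

5. count == 1
s==>"abcd\0?"
count == 1: 'a' is compared and eax is incremented, then edx becomes -1. eax == 1 is returned.

6. count == 0
s==>"abcd\0?" : the first edx-- immediately gives 0xffffffff, so the jne is not taken and no byte is read. eax == 0 is returned.

Summary of behaviour:
s is the start address of a string and count is an upper bound on the number of bytes to examine. The function scans from s and returns the string length (excluding '\0') if that is less than count, and count otherwise; in other words it returns min(strlen(s), count). It returns 0 if the string is empty or if count == 0, and in the latter case no byte of s is read at all.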
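The control flow of the assembly can be transcribed label by label into C, which makes it easy to check the individual cases. In this reading, cmpl $-1,%1 is a comparison of edx with -1, so the loop examines at most count bytes and the result is the smaller of the string length and count. The function name strnlen_model is hypothetical and the assert is an added precondition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model that follows the control flow of the inline assembly. */
static size_t strnlen_model(const char *s, size_t count)
{
    const char *eax = s;              /* movl %2,%0 */
    size_t edx = count;

    assert(s != NULL);                /* added check, not in the kernel */

    goto l2;                          /* jmp 2f */
l1:
    if (*eax == '\0')                 /* cmpb $0,(%0) */
        goto l3;                      /* je 3f */
    eax++;                            /* incl %0 */
l2:
    edx--;                            /* decl %1 (0 wraps to all ones) */
    if (edx != (size_t)-1)            /* cmpl $-1,%1 ; jne 1b */
        goto l1;
l3:
    return (size_t)(eax - s);         /* subl %2,%0 */
}
```

With s pointing at "abcd", counts of 6, 5, 4, 3, 1 and 0 give 4, 4, 4, 3, 1 and 0 respectively. The gotos are kept on purpose so that each line maps onto one instruction; idiomatic C would use a simple for loop.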

---------------------------------------------------------------
__HAVE_ARCH_STRSTR strstr()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_STRSTR
extern char *strstr(const char *cs, const char *ct);
This presumably declares the strstr() function that is implemented out of line in string.c.
---------------------------------------------------------------
__constant_c_and_count_memset()
---------------------------------------------------------------
include/asm-i386/string.h
/*
* This looks horribly ugly, but the compiler can optimize it totally,
* as we by now know that both pattern and count is constant..
*/
static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
{
switch (count)
{
case 0:
return s;
case 1:
*(unsigned char *)s = pattern;
return s;
case 2:
*(unsigned short *)s = pattern;
return s;
case 3:
*(unsigned short *)s = pattern;
*(2+(unsigned char *)s) = pattern;
return s;
case 4:
*(unsigned long *)s = pattern;
return s;
}
#define COMMON(x) \
__asm__ __volatile__( \
"rep ; stosl" \
x \
: "=&c" (d0), "=&D" (d1) \
: "a" (pattern),"0" (count/4),"1" ((long) s) \
: "memory")
{
int d0, d1;
switch (count % 4)
{
case 0: COMMON(""); return s;
case 1: COMMON("\n\tstosb"); return s;
case 2: COMMON("\n\tstosw"); return s;
default: COMMON("\n\tstosw\n\tstosb"); return s;
}
}

#undef COMMON
}
Analysis:
1. count in [0,4]:
switch (count)
{
case 0:
return s;
case 1:
*(unsigned char *)s = pattern;
return s;
case 2:
*(unsigned short *)s = pattern;
return s;
case 3:
*(unsigned short *)s = pattern;
*(2+(unsigned char *)s) = pattern;
return s;
case 4:
*(unsigned long *)s = pattern;
return s;
}

2.count > 4 :
#define COMMON(x) \
__asm__ __volatile__( \
"rep ; stosl" \
x \
: "=&c" (d0), "=&D" (d1) \
: "a" (pattern),"0" (count/4),"1" ((long) s) \
: "memory")
{
int d0, d1;
switch (count % 4)
{
case 0: COMMON(""); return s;
case 1: COMMON("\n\tstosb"); return s;
case 2: COMMON("\n\tstosw"); return s;
default: COMMON("\n\tstosw\n\tstosb"); return s;
}
}

#undef COMMON

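a): Note this technique of using a macro inside a function:
1) first define the macro with #define;
2) then wrap the code that uses it in a pair of {}, which also lets that block declare the locals (d0, d1) that the macro refers to;
3) finally remove the macro with #undef;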
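The same three steps can be tried outside the kernel. The example below is a hypothetical illustration only: fill_tail_demo and PUT are invented names, and the macro writes bytes with a plain loop instead of inline assembly. It writes the count % 4 leftover bytes and returns how many it wrote.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example of a macro that exists only inside one function. */
static size_t fill_tail_demo(unsigned char *dst, unsigned char value, size_t count)
{
/* 1) define the helper macro; it relies on the local variable `written` */
#define PUT(n)                                   \
    do {                                         \
        for (size_t i = 0; i < (n); i++)         \
            dst[written + i] = value;            \
        written += (n);                          \
    } while (0)

    {   /* 2) a block that declares the local the macro expects */
        size_t written = 0;

        assert(dst != NULL || count == 0);
        switch (count % 4) {
        case 0: break;
        case 1: PUT(1); break;
        case 2: PUT(2); break;
        default: PUT(2); PUT(1); break;
        }
        return written;
    }

/* 3) remove the macro so that it cannot leak into later code */
#undef PUT
}

#ifdef PUT
#error "PUT should no longer be defined here"
#endif
```

Preprocessor directives are handled before compilation, so the #undef takes effect even though it sits after the return statement.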

b):#define COMMON(x) \
__asm__ __volatile__( \
"rep ; stosl" \
x \
: "=&c" (d0), "=&D" (d1) \
: "a" (pattern),"0" (count/4),"1" ((long) s) \
: "memory")

Initial register values:
ax/eax: pattern
cx/ecx: count/4
di/edi: s

Instruction walkthrough:
COMMON("") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 )
{
stosl ===> es:[edi] = eax;
edi += 4;
}
return s;

COMMON("\n\tstosb") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 )
{
stosl ===> es:[edi] = eax;
edi += 4;
}
x ===> stosb ===> es:[edi] = al;
edi += 1;
return s;

COMMON("\n\tstosw") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 )
{
stosl ===> es:[edi] = eax;
edi += 4;
}
x ===> stosw ===> es:[edi] = ax;
edi += 2;
return s;

COMMON("\n\tstosw\n\tstosb") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 )
{
stosl ===> es:[edi] = eax;
edi += 4;
}
x => stosw;stosb=> es:[edi] = ax;
edi += 2;
es:[edi] = al;
edi += 1;

return s;

c): Further analysis:
{
int d0, d1;
switch (count % 4)
{
case 0: COMMON(""); return s;
case 1: COMMON("\n\tstosb"); return s;
case 2: COMMON("\n\tstosw"); return s;
default: COMMON("\n\tstosw\n\tstosb"); return s;
}
}
The switch on count % 4 stores the 0 to 3 bytes that are left over after the 32-bit stores.

---------------------------------------------------------------
__constant_c_x_memset()
---------------------------------------------------------------
include/asm-i386/string.h

#define __constant_c_x_memset(s, c, count) \
(__builtin_constant_p(count) ? \
__constant_c_and_count_memset((s),(c),(count)) : \
__constant_c_memset((s),(c),(count)))

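Purpose: fill count bytes of the memory that s points to with c. Which of the two implementations is used depends on whether count is a compile-time constant.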
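The dispatch on __builtin_constant_p can be tried in isolation. __builtin_constant_p is a GCC extension (also supported by Clang) that yields 1 when the compiler can prove its argument to be a compile-time constant, without evaluating the argument. Everything else below is a hypothetical stand-in: the two fill functions and the fill_dispatch macro are invented for illustration and simply produce the same result by two routes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the version specialised for a known count. */
static void *fill_known_count(void *s, int c, size_t count)
{
    return memset(s, c, count);
}

/* Hypothetical stand-in for the general version. */
static void *fill_unknown_count(void *s, int c, size_t count)
{
    unsigned char *p = s;

    assert(s != NULL || count == 0);
    while (count--)
        *p++ = (unsigned char)c;
    return s;
}

/* Same shape as __constant_c_x_memset(): pick an implementation depending
 * on whether the compiler sees count as a constant. */
#define fill_dispatch(s, c, count)                 \
    (__builtin_constant_p(count)                   \
         ? fill_known_count((s), (c), (count))     \
         : fill_unknown_count((s), (c), (count)))
```

Because the condition is itself a constant, the compiler keeps only one branch of the ?: expression. Which branch is taken for a variable argument can depend on the optimisation level, so both branches must give the same observable result.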

Reference:
1.__constant_c_and_count_memset():
static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
{
switch (count)
{
case 0:
return s;
case 1:
*(unsigned char *)s = pattern;
return s;
case 2:
*(unsigned short *)s = pattern;
return s;
case 3:
*(unsigned short *)s = pattern;
*(2+(unsigned char *)s) = pattern;
return s;
case 4:
*(unsigned long *)s = pattern;
return s;
}
#define COMMON(x) \
__asm__ __volatile__( \
"rep ; stosl" \
x \
: "=&c" (d0), "=&D" (d1) \
: "a" (pattern),"0" (count/4),"1" ((long) s) \
: "memory")
{
int d0, d1;
switch (count % 4)
{
case 0: COMMON(""); return s;
case 1: COMMON("\n\tstosb"); return s;
case 2: COMMON("\n\tstosw"); return s;
default: COMMON("\n\tstosw\n\tstosb"); return s;
}
}

#undef COMMON
}

2.__constant_c_memset():
static inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
{
int d0, d1;
__asm__ __volatile__(
"rep ; stosl\n\t"
"testb $2,%b3\n\t"
"je 1f\n\t"
"stosw\n"
"1:\ttestb $1,%b3\n\t"
"je 2f\n\t"
"stosb\n"
"2:"
:"=&c" (d0), "=&D" (d1)
:"a" (c), "q" (count), "0" (count/4), "1" ((long) s)
:"memory");
return (s);
}
---------------------------------------------------------------
__memset()
---------------------------------------------------------------
include/asm-i386/string.h

#define __memset(s, c, count) \
(__builtin_constant_p(count) ? \
__constant_count_memset((s),(c),(count)) : \
__memset_generic((s),(c),(count)))

Purpose: fill count bytes of the memory area that s points to with the character c.

Reference:
1.__constant_count_memset():
#define __constant_count_memset(s,c,count) __memset_generic((s),(c),(count))

2.__memset_generic():
static inline void * __memset_generic(void * s, char c,size_t count)
{
int d0, d1;
__asm__ __volatile__(
"rep\n\t"
"stosb"
: "=&c" (d0), "=&D" (d1)
:"a" (c),"1" (s),"0" (count)
:"memory");
return s;
}
---------------------------------------------------------------
__HAVE_ARCH_MEMSET memset()
---------------------------------------------------------------
include/asm-i386/string.h

#define __HAVE_ARCH_MEMSET
#define memset(s, c, count) \
(__builtin_constant_p(c) ? \
__constant_c_x_memset((s),(0x01010101UL*(unsigned char)(c)),(count)) : \
__memset((s),(c),(count)))

Purpose: same as above.

Reference:
1.__constant_c_x_memset():
#define __constant_c_x_memset(s, c, count) \
(__builtin_constant_p(count) ? \
__constant_c_and_count_memset((s),(c),(count)) : \
__constant_c_memset((s),(c),(count)))

2.__memset(): see above.

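What does (0x01010101UL*(unsigned char)(c)) mean? Multiplying the fill byte by 0x01010101 copies it into all four bytes of a 32-bit word (0xAB becomes 0xABABABAB), so that each stosl in the __constant_c_* functions stores four fill bytes at once.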
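The effect of the multiplication by 0x01010101 can be checked with a few lines of C. The helper name replicate_byte is made up for this sketch, and the assert inside it is only a sanity check on the first and last byte of the product.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: copy one byte into all four bytes of a 32-bit word. */
static uint32_t replicate_byte(unsigned char c)
{
    /* 0x01010101 has a 1 in every byte, so the product is
     * c + (c << 8) + (c << 16) + (c << 24); no partial sum carries over. */
    uint32_t pattern = UINT32_C(0x01010101) * c;

    assert(((pattern >> 24) & 0xFF) == c && (pattern & 0xFF) == c);
    return pattern;
}
```

For example, replicate_byte(0xAB) yields 0xABABABAB and replicate_byte(0) yields 0.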
---------------------------------------------------------------
__HAVE_ARCH_MEMSCAN memscan()
---------------------------------------------------------------
include/asm-i386/string.h

/*
* find the first occurrence of byte 'c', or 1 past the area if none
*/
#define __HAVE_ARCH_MEMSCAN
static inline void * memscan(void * addr, int c, size_t size)
{
if (!size) return addr;
__asm__("repnz; scasb\n\t"
"jnz 1f\n\t"
"dec %%edi\n"
"1:"
: "=D" (addr), "=c" (size)
: "0" (addr), "1" (size), "a" (c)
: "memory");
return addr;
}

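Instruction walkthrough:
edi = addr;
ecx = size;
eax = c;
repnz ====> while( ecx != 0 )
{
ecx--;
scasb ====> ZF = ( al == es:[edi] );
edi++;
if( ZF == 1 ) break;
}
jnz 1 ====> if( ZF == 0 ) goto 1;
dec %%edi ====> edi--;
1: ====> 1:

The assembly for this function is very simple, so a short summary is enough.
The memory is scanned linearly. If a byte equal to c is found, scasb has already advanced edi one past it, so dec %%edi steps back and the address of that byte is returned. If no byte matches, edi is left at addr + size, one past the end of the area, and that address is returned.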
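A portable C model makes the two return values explicit. This is a sketch rather than the kernel implementation: memscan_model is a hypothetical name, and the assert is an added precondition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable model of memscan(). */
static void *memscan_model(void *addr, int c, size_t size)
{
    unsigned char *edi = addr;
    unsigned char al = (unsigned char)c;   /* scasb only compares AL */
    size_t ecx = size;

    assert(addr != NULL || size == 0);     /* added check, not in the kernel */

    while (ecx != 0) {                     /* repnz */
        ecx--;
        if (*edi++ == al)                  /* scasb: compare, then advance */
            return edi - 1;                /* match: dec %edi */
    }
    return edi;                            /* no match: addr + size */
}
```

A caller can tell the two outcomes apart by comparing the result with addr + size. With size == 0 the loop never runs and addr is returned unchanged, which matches the early return in the original.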
#endif /* __KERNEL__ */

#endif /* !_I386_STRING_H_ */
***************************************************************
That finally finishes chewing through the string functions written in assembly!!!
