您的位置:首页 > 运维架构 > Linux

Arm Linux Head.S 文件的分析(转载)

2010-10-30 19:50 411 查看
还有一篇讲到怎么编译zImage的,也挺牛的
http://blog.csdn.net/pottichu/archive/2009/06/11/4261150.aspx http://blog.csdn.net/arriod/archive/2008/08/21/2808861.aspx
这是
ARM-Linux
运行的第一个文件,这些代码是一个比较独立的代码包裹器。其作用就是解压
Linux
内核,并将
PC
指针跳到内核(
vmlinux
)的第一条指令。

Bootloader
中传入到
Linux
中的参数总共有三个,
Linux
中用到的是第二个和第三个。第二个参数是
architecture id
,第三个是
taglist
的地址。
Architecture id

arm
芯片在
Linux
中一定要唯一。
Taglist

bootload

Linux
传入的参数列表(详细的解释请参考《
booting arm linux.pdf
》)。

//
程序的入口点

.section ".start", #alloc, #execinstr

/*

* sort out different calling conventions

*/

.align

start:

.type
start,#function

.rept
8
//

重复
8
次下面的指令,也就是空出中断向量表的位置

mov
r0, r0
//

就是
nop
指令

.endr

b
1f

.word
0x016f2818
@ Magic numbers to help the loader

.word
start

@ absolute load/run zImage address

.word
_edata

@ zImage end address

1:
mov
r7, r1

@ save architecture ID

mov
r8, r2

@ save atags pointer

#ifndef __ARM_ARCH_2__

/*

* Booting from Angel - need to enter SVC mode and disable

* FIQs/IRQs (numeric definitions from angel arm.h source).

* We only do this if we were in user mode on entry.

*/

mrs
r2, cpsr
@ get current mode

tst
r2, #3

@ not user?

bne
not_angel

mov
r0, #0x17

@ angel_SWIreason_EnterSVC

swi
0x123456
@ angel_SWI_ARM

not_angel:

mrs
r2, cpsr
@ turn off interrupts to

orr
r2, r2, #0xc0

@ prevent angel from running

msr
cpsr_c, r2

#else

teqp
pc, #0x0c000003

@ turn off interrupts

#endif

一定要保证当前运行在
SVC
模式下,否则会跳到
swi
里面去(为什么?我不清楚,而且我没有处理过这个
swi
)。然后再关闭
irq

fiq


/*

* Note that some cache flushing and other stuff may

* be needed here - is there an Angel SWI call for this?

*/

/*

* some architecture specific code can be inserted

* by the linker here, but it should preserve r7, r8, and r9.

*/

读入地址表。因为我们的代码可以在任何地址执行,也就是位置无关代码(
PIC
),所以我们需要加上一个偏移量。下面有每一个列表项的具体意义。

GOT
表的初值是连接器指定的,当时程序并不知道代码在哪个地址执行。如果当前运行的地址已经和表上的地址不一样,还要修正
GOT
表。

.text

adr
r0, LC0

ldmia
r0, {r1, r2, r3, r4, r5, r6, ip, sp}

subs
r0, r0, r1

@ calculate the delta offset

@ if delta is zero, we are

beq
not_relocated
@ running at the address we

@ were linked at.

/*

* We're running at a different address.
We need to fix

* up various pointers:

*
r5 - zImage base address

*
r6 - GOT start

*
ip - GOT end

*/

add
r5, r5, r0

add
r6, r6, r0

add
ip, ip, r0

/*

* If we're running fully PIC === CONFIG_ZBOOT_ROM = n,

* we need to fix up pointers into the BSS region.

*
r2 - BSS start

*
r3 - BSS end

*
sp - stack pointer

*/

add
r2, r2, r0

add
r3, r3, r0

add
sp, sp, r0

修改
GOT
(全局偏移表)表。根据当前的运行地址,修正该表。

/*

* Relocate all entries in the GOT table.

*/

1:
ldr
r1, [r6, #0]

@ relocate entries in the GOT

add
r1, r1, r0

@ table.
This fixes up the

str
r1, [r6], #4

@ C references.

cmp
r6, ip

blo
1b


BSS
段,所有的
arm
程序都需要做这些的。

not_relocated:
mov
r0, #0

1:
str
r0, [r2], #4

@ clear bss

str
r0, [r2], #4

str
r0, [r2], #4

str
r0, [r2], #4

cmp
r2, r3

blo
1b

正如下面的注释所说,
C
环境我们已经设置好了。下面我们要打开
cache

mmu
。为什么要这样做呢?这只是一个解压程序呀?为了速度。那为什么要开
mmu
呢,而且只是做一个平板式的映射?还是为了速度。如果不开
mmu
的话,就只能打开
icache
。因为不开
mmu
的话就无法实现内存管理,而
io
区是决不能开
dcache
的。

/*

* The C runtime environment should now be setup

* sufficiently.
Turn the cache on, set up some

* pointers, and start decompressing.

*/

bl
cache_on

是不是要跟读进去呢?对于只是对流程感兴趣的人只是知道打开
cache
就行了。不过跟进去是很有乐趣的,这就是为什么虽然
Linux
如此庞大,但仍有人会孜孜不倦的研究它的每一行代码的原因吧。反过来说,对于
Linux
内核的整体把握更加重要,要不然就成盲人摸象了。还有,想做
ARM
高手的人可以读
Linux
下的每一个汇编文件,因为
Linux
内核用
ARM
的东西还是比较全的。

mov
r1, sp

@ malloc space above stack

add
r2, sp, #0x10000
@ 64k max

对下面这些地址的理解其实还是很麻烦,但有篇文档写得很清楚《
About TEXTADDR, ZTEXTADDR, PAGE_OFFSET etc...
》。下面程序的意义就是保证解压地址和当前程序的地址不重叠。上面分配了
64KB
的空间来做解压时的数据缓存。

/*

* Check to see if we will overwrite ourselves.

*
r4 = final kernel address
//

内核执行的最终实地址

*
r5 = start of this image
//

该程序的首地址

*
r2 = end of malloc space (and therefore this image)

* We basically want:

*
r4 >= r2 -> OK

*
r4 + image length <= r5 -> OK

*/

cmp
r4, r2

bhs
wont_overwrite

add
r0, r4, #4096*1024
@ 4MB largest kernel size

cmp
r0, r5

bls
wont_overwrite

如果空间不够了,只好解压到缓冲区地址后面。调用
decompress_kernel
进行解压缩,这段代码是用
c
实现的,和架构无关。

mov
r5, r2

@ decompress after malloc space

mov
r0, r5

mov
r3, r7

bl
decompress_kernel

完成了解压缩之后,由于空间不够,内核也没有解压到正确的地址,必须通过代码搬移来搬到指定的地址。搬运过程中有可能会覆盖掉现在运行的这段代码,所以必须将有可能会执行到的代码搬运到安全的地方,这里用的是解压缩了的代码的后面。

add
r0, r0, #127

bic
r0, r0, #127

@ align the kernel length

/*

* r0
= decompressed kernel length

* r1-r3
= unused

* r4
= kernel execution address

* r5
= decompressed kernel start

* r6
= processor ID

* r7
= architecture ID

* r8
= atags pointer

* r9-r14 = corrupted

*/

add
r1, r5, r0

@ end of decompressed kernel

adr
r2, reloc_start

ldr
r3, LC1

add
r3, r2, r3

1:
ldmia
r2!, {r9 - r14}
@ copy relocation code

stmia
r1!, {r9 - r14}

ldmia
r2!, {r9 - r14}

stmia
r1!, {r9 - r14}

cmp
r2, r3

blo
1b

bl
cache_clean_flush
//

因为有代码搬移,所以必须先清理(
clean
)清除(
flush

cache


add
pc, r5, r0

@ call relocation code

decompress_kernel
共有
4
个参数,解压的内核地址、缓存区首地址、缓存区尾地址、和芯片
ID
,返回解压缩代码的长度。

/*

* We're not in danger of overwriting ourselves.
Do this the simple way.

*

* r4
= kernel execution address

* r7
= architecture ID

*/

wont_overwrite:
mov
r0, r4

mov
r3, r7

bl
decompress_kernel

b
call_kernel

针对于不会出现代码覆盖的情况,就简单了。直接解压缩内核并且跳转到首地址运行。
call_kernel
这个函数我们会在下面分析它。

.type
LC0, #object

LC0:

.word
LC0

@ r1

.word
__bss_start
@ r2

.word
_end

@ r3

.word
zreladdr

@ r4

.word
_start

@ r5

.word
_got_start
@ r6

.word
_got_end

@ ip

.word
user_stack+4096

@ sp

LC1:

.word
reloc_end - reloc_start

.size
LC0, . - LC0

上面这个就是刚才我们说过的地址表,里面有几个符号的地址定义。
LC0
是在这里定义的。
Zreladdr
是在当前目录下的
Makfile
里定义的。其他的符号是在
lds
里定义的。

下面我们来分析一下有关
cache

mmu
的代码。通过这些代码我们可以看到
Linux
的高手们是如何通过汇编来实现各个
ARM
处理器的识别,以达到通用的目的。

/*

* Turn on the cache.
We need to setup some page tables so that we

* can have both the I and D caches on.

*

* We place the page tables 16k down from the kernel execution address,

* and we hope that nothing else is using it.
If we're using it, we

* will go pop!

*

* On entry,

*
r4 = kernel execution address

*
r6 = processor ID

*
r7 = architecture number

*
r8 = atags pointer

*
r9 = run-time address of "start"
(???)

* On exit,

*
r1, r2, r3, r9, r10, r12 corrupted

* This routine must preserve:

*
r4, r5, r6, r7, r8

*/

.align
5

cache_on:
mov
r3, #8

@ cache_on function

b
call_cache_fn

这里涉及到了很多
MMU

cache

writebuffer

TLB
的操作和协处理器的编程。具体编程的东西,我就不想多说了,可以对这
ARM
的手册逐行的理解。至于为什么要这样做,熟悉了他们的工作原理后也就不难理解了(《
ARM
嵌入式系统开发》这本书就有个比较好的说明)。因为这里包含了太多的代码搬运、解压等费时的操作,所以打开
cache
是有必要的。由于要用到数据
cache
所以需要对
mmu
进行配置。为了简单这里制作了一级映射,而且是物理地址和虚拟地址相同的
1:1
映射。

__setup_mmu:
sub
r3, r4, #16384

@ Page directory size

bic
r3, r3, #0xff

@ Align the pointer

bic
r3, r3, #0x3f00

/*

* Initialise the page tables, turning on the cacheable and bufferable

* bits for the RAM area only.

*/

mov
r0, r3

mov
r9, r0, lsr #18

mov
r9, r9, lsl #18

@ start of RAM

add
r10, r9, #0x10000000
@ a reasonable RAM size

mov
r1, #0x12

orr
r1, r1, #3 << 10

add
r2, r3, #16384

1:
cmp
r1, r9

@ if virt > start of RAM

orrhs
r1, r1, #0x0c

@ set cacheable, bufferable

cmp
r1, r10

@ if virt > end of RAM

bichs
r1, r1, #0x0c

@ clear cacheable, bufferable

str
r1, [r0], #4

@ 1:1 mapping

add
r1, r1, #1048576

teq
r0, r2

bne
1b

参考下面的注释,如果当前在
flash
中运行,我们再映射
2MB
。就算是当前在
RAM
中执行其实也没关系,只不过是做了重复工作。

/*

* If ever we are running from Flash, then we surely want the cache

* to be enabled also for our execution instance...
We map 2MB of it

* so there is no map overlap problem for up to 1 MB compressed kernel.

* If the execution is in RAM then we would only be duplicating the above.

*/

mov
r1, #0x1e

orr
r1, r1, #3 << 10

mov
r2, pc, lsr #20

orr
r1, r1, r2, lsl #20

add
r0, r3, r2, lsl #2

str
r1, [r0], #4

add
r1, r1, #1048576

str
r1, [r0]

mov
pc, lr

__armv4_cache_on:

mov
r12, lr

bl
__setup_mmu

mov
r0, #0

mcr
p15, 0, r0, c7, c10, 4
@ drain write buffer

mcr
p15, 0, r0, c8, c7, 0
@ flush I,D TLBs

mrc
p15, 0, r0, c1, c0, 0
@ read control reg

orr
r0, r0, #0x5000

@ I-cache enable, RR cache replacement

orr
r0, r0, #0x0030

bl
__common_cache_on

mov
r0, #0

mcr
p15, 0, r0, c8, c7, 0
@ flush I,D TLBs

mov
pc, r12

__common_cache_on:

#ifndef DEBUG

orr
r0, r0, #0x000d

@ Write buffer, mmu

#endif

mov
r1, #-1

mcr
p15, 0, r3, c2, c0, 0
@ load page table pointer

mcr
p15, 0, r1, c3, c0, 0
@ load domain access control

mcr
p15, 0, r0, c1, c0, 0
@ load control register

mov
pc, lr

/*

* All code following this line is relocatable.
It is relocated by

* the above code to the end of the decompressed kernel image and

* executed there.
During this time, we have no stacks.

*

* r0
= decompressed kernel length

* r1-r3
= unused

* r4
= kernel execution address

* r5
= decompressed kernel start

* r6
= processor ID

* r7
= architecture ID

* r8
= atags pointer

* r9-r14 = corrupted

*/

下面这段代码是在解压空间不够的情况下需要重新定位的,具体原因上面已经说明。

.align
5

reloc_start:
add
r9, r5, r0

debug_reloc_start

mov
r1, r4

1:

.rept
4

ldmia
r5!, {r0, r2, r3, r10 - r14}
@ relocate kernel

stmia
r1!, {r0, r2, r3, r10 - r14}

.endr

cmp
r5, r9

blo
1b

debug_reloc_end

这是最后一个函数了,这个时候一切实质性的工作已经做完。关闭
cache
,并跳转到真正的内核入口。

call_kernel:
bl
cache_clean_flush

bl
cache_off

mov
r0, #0

@ must be zero

mov
r1, r7

@ restore architecture number

mov
r2, r8

@ restore atags pointer

mov
pc, r4

@ call kernel

/*

* Here follow the relocatable cache support functions for the

* various processors.
This is a generic hook for locating an

* entry and jumping to an instruction at the specified offset

* from the start of the block.
Please note this is all position

* independent code.

*

*
r1
= corrupted

*
r2
= corrupted

*
r3
= block offset

*
r6
= corrupted

*
r12 = corrupted

*/

通过下面函数我们可以通过
proc_types
结构体数组我们可以顺利的找到现在的处理器型号,并且会根据
R3
的偏移量跳转到相应的函数中。里面涉及到协处理器
CP15

c0
的操作,如果有疑问,可以参考
ARM
相关手册。

call_cache_fn:
adr
r12, proc_types

mrc
p15, 0, r6, c0, c0
@ get processor ID

1:
ldr
r1, [r12, #0]

@ get value

ldr
r2, [r12, #4]

@ get mask

eor
r1, r1, r6

@ (real ^ match)

tst
r1, r2

@
& mask

addeq
pc, r12, r3
@ call cache function

add
r12, r12, #4*5

b
1b

/*

* Table for cache operations.
This is basically:

*
- CPU ID match

*
- CPU ID mask

*
- 'cache on' method instruction

*
- 'cache off' method instruction

*
- 'cache flush' method instruction

*

* We match an entry using: ((real_id ^ match) & mask) == 0

*

* Writethrough caches generally only need 'on' and 'off'

* methods.
Writeback caches _must_ have the flush method

* defined.

*/

.type
proc_types,#object

proc_types:

.word
0x41560600
@ ARM6/610

.word
0xffffffe0

b
__arm6_cache_off
@ works, but slow

b
__arm6_cache_off

mov
pc, lr

@
b
__arm6_cache_on

@ untested

@
b
__arm6_cache_off

@
b
__armv3_cache_flush

.word
0x00000000
@ old ARM ID

.word
0x0000f000

mov
pc, lr

mov
pc, lr

mov
pc, lr

.word
0x41007000
@ ARM7/710

.word
0xfff8fe00

b
__arm7_cache_off

b
__arm7_cache_off

mov
pc, lr

.word
0x41807200
@ ARM720T (writethrough)

.word
0xffffff00

b
__armv4_cache_on

b
__armv4_cache_off

mov
pc, lr

.word
0x00007000
@ ARM7 IDs

.word
0x0000f000

mov
pc, lr

mov
pc, lr

mov
pc, lr

@ Everything from here on will be the new ID system.

.word
0x4401a100
@ sa110 / sa1100

.word
0xffffffe0

b
__armv4_cache_on

b
__armv4_cache_off

b
__armv4_cache_flush

.word
0x6901b110
@ sa1110

.word
0xfffffff0

b
__armv4_cache_on

b
__armv4_cache_off

b
__armv4_cache_flush

@ These match on the architecture ID

.word
0x00020000
@ ARMv4T

.word
0x000f0000

b
__armv4_cache_on

b
__armv4_cache_off

b
__armv4_cache_flush

.word
0x00050000
@ ARMv5TE

.word
0x000f0000

b
__armv4_cache_on

b
__armv4_cache_off

b
__armv4_cache_flush

.word
0x00060000
@ ARMv5TEJ

.word
0x000f0000

b
__armv4_cache_on

b
__armv4_cache_off

b
__armv4_cache_flush

.word
0x00070000
@ ARMv6

.word
0x000f0000

b
__armv4_cache_on

b
__armv4_cache_off

b
__armv6_cache_flush

.word
0

@ unrecognised type

.word
0

mov
pc, lr

mov
pc, lr

mov
pc, lr

.size
proc_types, . - proc_types

/*

* Turn off the Cache and MMU.
ARMv3 does not support

* reading the control register, but ARMv4 does.

*

* On entry,
r6 = processor ID

* On exit,
r0, r1, r2, r3, r12 corrupted

* This routine must preserve: r4, r6, r7

*/

.align
5

cache_off:
mov
r3, #12

@ cache_off function

b
call_cache_fn

//
代码略

这里分配了
4K
的空间用来做堆栈。

reloc_end:

.align

.section ".stack", "w"

user_stack:
.space
4096
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: