
Xen Block Device Architecture (6)

2011-09-12 22:36

blktap (continued)

blktap_device

The blktap_device structure is very simple:

```c
struct blktap_device {
	spinlock_t lock;
	struct gendisk *gd;
};
```

Here, struct gendisk is the generic disk structure that the kernel's block device structure, struct block_device, refers to.

blktap_device_open

It obtains the disk structure, struct gendisk, from the kernel's generic block_device structure via block_device->bd_disk, and then obtains the blktap_device from gendisk->private_data.

The /dev/xen/blktap-2/tapdevXXX block devices we use are backed by this blktap_device structure.

blktap_device_release

From the gendisk passed in, it obtains the blktap_device, block_device, and blktap structures. Finally, it sets the BLKTAP_DEVICE_CLOSED bit in the blktap structure's dev_inuse field and calls blktap_ring_kick_user, which does a wake_up on the poll_wait queue of the blktap->ring device.

blktap_device_getgeo

Returns a struct hd_geometry describing the block device's geometry: heads, cylinders, sectors per track, and so on.

blktap_device_create

When an ioctl with cmd BLKTAP2_IOCTL_CREATE_DEVICE is issued on the blktap ring device blktapXXX, blktap_device_create is called to create the tapdevXXX device:

```c
if (test_bit(BLKTAP_DEVICE, &tap->dev_inuse))
	return -EEXIST;

if (blktap_device_validate_params(tap, params))
	return -EINVAL;

gd = alloc_disk(1);
if (!gd) {
	err = -ENOMEM;
	goto fail;
}

if (minor < 26) {
	sprintf(gd->disk_name, "td%c", 'a' + minor % 26);
} else if (minor < (26 + 1) * 26) {
	sprintf(gd->disk_name, "td%c%c",
		'a' + minor / 26 - 1, 'a' + minor % 26);
} else {
	const unsigned int m1 = (minor / 26 - 1) / 26 - 1;
	const unsigned int m2 = (minor / 26 - 1) % 26;
	const unsigned int m3 = minor % 26;
	sprintf(gd->disk_name, "td%c%c%c",
		'a' + m1, 'a' + m2, 'a' + m3);
}

gd->major = blktap_device_major;
gd->first_minor = minor;
gd->fops = &blktap_device_file_operations;
gd->private_data = tapdev;

spin_lock_init(&tapdev->lock);
rq = blk_init_queue(blktap_device_do_request, &tapdev->lock);
if (!rq) {
	err = -ENOMEM;
	goto fail;
}
elevator_init(rq, "noop");

gd->queue = rq;
rq->queuedata = tapdev;
tapdev->gd = gd;

blktap_device_configure(tap, params);
add_disk(gd);

if (params->name[0])
	strncpy(tap->name, params->name, sizeof(tap->name)-1);

set_bit(BLKTAP_DEVICE, &tap->dev_inuse);

dev_info(disk_to_dev(gd), "sector-size: %u capacity: %llu\n",
	 queue_logical_block_size(rq),
	 (unsigned long long)get_capacity(gd));

return 0;
```

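test_bit checks whether the tap device is already in use and fails with -EEXIST if it is. blktap_device_validate_params then checks the blktap_params argument: for example, the sector size must be no smaller than 512 and no larger than 4096 bytes, and the disk capacity must not exceed the maximum. alloc_disk allocates a gendisk structure, which is then initialized as follows: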

```c
gd->major = blktap_device_major;
gd->first_minor = minor;
gd->fops = &blktap_device_file_operations;
gd->private_data = tapdev;
```

blk_init_queue is then called to initialize the request queue. Its kernel documentation describes it as follows:

```c
/*
 * Description:
 *    If a block device wishes to use the standard request handling procedures,
 *    which sorts requests and coalesces adjacent requests, then it must
 *    call blk_init_queue(). The function @rfn will be called when there
 *    are requests on the queue that need to be processed.
 */
```

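elevator_init sets the I/O scheduler (elevator) of request_queue rq to "noop".

add_disk(struct gendisk *) registers the gendisk with the kernel.

blktap_device_configure configures the tapdevXXX device. Its blktap_params argument was obtained from user space via copy_from_user: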

set_capacity: sets the gendisk's size to the capacity passed in.

blk_queue_logical_block_size: sets the logical block size to the sector_size passed in.

blk_queue_max_sectors: max_sectors is at least 8 and at most 1024 sectors. Note that a sector here is the fixed 512-byte unit the block layer always assumes, independent of the logical block size.

blk_queue_segment_boundary / blk_queue_max_segment_size: each segment is at most 4 KB.

blk_queue_max_phys_segments / blk_queue_max_hw_segments: each request in the request_queue has at most 11 segments; each segment is 4 KB, i.e. 8 sectors.

blktap_device_destroy

When an ioctl with command BLKTAP2_IOCTL_REMOVE_DEVICE is issued on the blktapXXX device, blktap_device_destroy is executed.

blktap_device_destroy calls blk_cleanup_queue, a generic kernel function:

```c
void blk_cleanup_queue(struct request_queue *q)
{
	/*
	 * We know we have process context here, so we can be a little
	 * cautious and ensure that pending block actions on this device
	 * are done before moving on. Going into this function, we should
	 * not have processes doing IO to this device.
	 */
	blk_sync_queue(q);

	mutex_lock(&q->sysfs_lock);
	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
	mutex_unlock(&q->sysfs_lock);

	if (q->elevator)
		elevator_exit(q->elevator);

	blk_put_queue(q);
}
```

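The block layer performs work on a request_queue asynchronously, so before the queue of a tapdevXXX device is torn down, any pending asynchronous activity on it must be stopped. This is done by blk_sync_queue: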

```c
/**
 * blk_sync_queue - cancel any pending callbacks on a queue
 * @q: the queue
 *
 * Description:
 *     The block layer may perform asynchronous callback activity
 *     on a queue, such as calling the unplug function after a timeout.
 *     A block device may call blk_sync_queue to ensure that any
 *     such activity is cancelled, thus allowing it to release resources
 *     that the callbacks might use. The caller must already have made sure
 *     that its ->make_request_fn will not re-add plugging prior to calling
 *     this function.
 *
 */
void blk_sync_queue(struct request_queue *q)
{
	del_timer_sync(&q->unplug_timer);
	del_timer_sync(&q->timeout);
	cancel_work_sync(&q->unplug_work);
}
```

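In other words, blk_sync_queue cancels the callbacks the block layer has scheduled on the queue (the unplug timer, the request timeout timer, and the unplug work item); it does not itself complete or discard outstanding I/O requests. That is why, as the comment in blk_cleanup_queue says, no process should still be doing I/O to the device when the queue is cleaned up.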

blktap_device_fail_queue

This function calls __blktap_next_queued_rq to walk the request_queue and calls __blktap_end_queued_rq(rq, -EIO) on each request.

Recall that the blktapXXX device provides the following operations:

```c
static struct file_operations blktap_ring_file_operations = {
	.owner   = THIS_MODULE,
	.open    = blktap_ring_open,
	.release = blktap_ring_release,
	.ioctl   = blktap_ring_ioctl,
	.mmap    = blktap_ring_mmap,
	.poll    = blktap_ring_poll,
};
```

blktap_ring_poll

blktap_ring_poll calls blktap_device_run_queue, which loops over every request in the request_queue and calls blktap_device_make_request on each one.

blktap_device_make_request first calls blktap_ring_make_request to create a blktap_request structure, then calls blktap_request_get_pages to allocate page frames for that blktap_request, and finally calls blktap_ring_submit_request.

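blktap_device_do_request is the request function passed to blk_init_queue when the tapdevXXX block device is initialized; the block layer calls it when there are requests on the queue to be processed (see the kernel's block device documentation for details). blktap_device_do_request calls blktap_ring_kick_user, which does a wake_up on the blktap_ring->poll_wait wait queue. Recall that blktap_ring_poll calls poll_wait(filp, &ring->poll_wait, wait). poll_wait itself does not sleep: it registers the wait queue with the poll table, and the user-space process polling the ring device (tapdisk2) then sleeps in poll()/select() until that wait queue is woken. So blktap_ring_kick_user effectively wakes the ring's poller; blktap_ring_poll then runs again and submits the requests in the request_queue to the ring.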

blktap_ring_submit_request places the request on the I/O ring; from there, tapdisk2 takes over and processes the I/O requests.