
Notes on how Ceph reads multipart data

2016-11-08 13:54

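radosgw computes the various offsets held in the obj_iterator of RGWObjManifest to obtain the location of the next multipart. The goal is to use the object field of that location to read the required obj from Ceph, length by length.

Here is what a location looks like: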

    location = {

      orig_obj = "pydev.tar.gz.2~8fanG4JO3SshIxkLVlGfVZZG0IdxGLV.1",

      loc = "",

      object = "_multipart_pydev.tar.gz.2~8fanG4JO3SshIxkLVlGfVZZG0IdxGLV.1",    <-- this field is used to assemble the multipart object name

      instance = "",

      bucket = {

        tenant = "",

        name = "zhou",

        data_pool = "default.rgw.buckets.data",

        data_extra_pool = "default.rgw.buckets.non-ec",

        index_pool = "default.rgw.buckets.index",

        marker = "64fc737b-5e37-4154-8c29-da273b587feb.244109.1",

        bucket_id = "64fc737b-5e37-4154-8c29-da273b587feb.244109.1",

        oid = ""

      },

      ns = "multipart",

      in_extra_data = false,

      index_hash_source = ""

    }

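The multipart obj ID is generated from two of the fields above (highlighted in red in the original post): the bucket marker/bucket_id value and object.

64fc737b-5e37-4154-8c29-da273b587feb.244109.1_multipart_pydev.tar.gz.2~8fanG4JO3SshIxkLVlGfVZZG0IdxGLV.1

This value has the same format as the names listed by the command ./rados -p default.rgw.buckets.data ls.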

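The size of each part is taken from the upload request. If a part is larger than 4M, radosgw cuts it into pieces using Ceph's own 4M unit.

For example, if the client uploads in 5M parts, radosgw first cuts off a 4M multipart object and stores the remaining 1M as a shadow object.

The first 4M cut from each part is the multipart object, with stripe index 0. Every further piece of the same part is a shadow object, and the index is incremented for each one.

Parts are read in order (one multipart object plus any number of shadow objects). If a part has shadow objects, they are read after its multipart object.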
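The splitting rule above can be sketched in a few lines of C++. This is only an illustration, not Ceph code: `split_part` is a hypothetical helper, and the 4M stripe size is passed in as a parameter rather than read from any radosgw setting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Ceph): returns the sizes of the RADOS
// objects that one uploaded part is cut into. Element 0 corresponds to the
// multipart object, the remaining elements to the shadow objects.
std::vector<uint64_t> split_part(uint64_t part_size, uint64_t stripe_max_size) {
  assert(stripe_max_size > 0);
  std::vector<uint64_t> pieces;
  for (uint64_t done = 0; done < part_size; ) {
    // Each piece is at most one stripe long; the last one may be shorter.
    uint64_t len = std::min(part_size - done, stripe_max_size);
    pieces.push_back(len);
    done += len;
  }
  return pieces;
}
```

With a 5M part and a 4M stripe size this yields two pieces, 4M and 1M, matching the example in the text.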

So how do we tell whether a multipart has shadow objects?

Let's look at the obj_iterator member that is part of the RGWObjManifest data structure:

class obj_iterator {

    RGWObjManifest *manifest;   // the manifest this iterator belongs to

    uint64_t part_ofs; /* where current part starts */   // start offset of the current part; it accumulates one part size per part and is used to work out the piece of a part that is smaller than 4M

    uint64_t stripe_ofs; /* where current stripe starts */   // start offset of the current stripe; tracks the total amount read so far and grows by whatever has been read

    uint64_t ofs;       /* current position within the object */   // current position within the object; a copy of stripe_ofs. Reading stops when ofs == object.size()

    uint64_t stripe_size;      /* current part size */   // size of the current piece, used to compute the read length: stripe_size = MIN(rule->part_size - (stripe_ofs - part_ofs), rule->stripe_max_size)

    int cur_part_id;   // current part ID; this is the multipart index only, starting from 1

    int cur_stripe;   // current stripe index, starting from 0; reset to 0 when moving to the next part

    string cur_override_prefix;   // prefix used to build the object field of location

    rgw_obj location;   // the current object; it contains the object name, so the object can be accessed directly to read data. Obtained through get_location()

    map<uint64_t, RGWObjManifestRule>::iterator rule_iter;   // points to the beginning of the manifest's rules

    map<uint64_t, RGWObjManifestRule>::iterator next_rule_iter;   // points to the rule after rule_iter, or to the end of the manifest's rules

    map<uint64_t, RGWObjManifestPart>::iterator explicit_iter;   // cursor over explicit objects; not involved in the flow described here

......

}

By the time RGWObjManifest::obj_iterator iter = astate->manifest.obj_find(ofs) has run, the fields above already hold the following values:

(gdb) p stripe_size

$12 = 4194304

(gdb) p stripe_ofs

$13 = 0

(gdb) p part_ofs

$14 = 0

(gdb) p ofs

$15 = 0

(gdb) p cur_part_id

$16 = 1

(gdb) p stripe_size

$17 = 4194304

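By default stripe_ofs advances by 4M on every step. When the current stripe offset + 4M >= part_ofs + rule->part_size, the part contains a piece of at most 4M (that is, a __shadow__ piece). At that point cur_stripe is reset to 0, part_ofs += rule->part_size (the part size), and stripe_ofs = part_ofs.

This prepares the iterator for reading the __shadow__ piece: when get_location() is called it returns the matching location, and the name of the __shadow__ object to read is contained in that location.

stripe_size = MIN(rule->part_size - (stripe_ofs - part_ofs), rule->stripe_max_size) gives the length to read.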
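To make the offset bookkeeping easier to follow, here is a simplified stand-in for the iterator. It is a sketch under stated assumptions, not the Ceph implementation: `StripeCursor` and `advance` are hypothetical names, there is a single rule with a fixed part size, and the end of the object and the rule switching are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified model of the iterator state discussed above (not Ceph code).
struct StripeCursor {
  uint64_t part_size;        // stands in for rule->part_size
  uint64_t stripe_max_size;  // stands in for rule->stripe_max_size
  uint64_t part_ofs = 0;     // where the current part starts
  uint64_t stripe_ofs = 0;   // where the current stripe starts
  uint64_t stripe_size = 0;  // length of the current stripe
  int cur_part_id = 1;       // parts are numbered from 1
  int cur_stripe = 0;        // 0 = multipart object, >0 = shadow object

  StripeCursor(uint64_t part, uint64_t stripe)
      : part_size(part), stripe_max_size(stripe) {
    assert(part > 0 && stripe > 0);
    update_stripe_size();
  }

  // The formula from the text.
  void update_stripe_size() {
    stripe_size = std::min(part_size - (stripe_ofs - part_ofs), stripe_max_size);
  }

  // Move to the next stripe; cross into the next part when the part is used up.
  void advance() {
    stripe_ofs += stripe_max_size;
    ++cur_stripe;
    if (stripe_ofs >= part_ofs + part_size) {
      cur_stripe = 0;
      part_ofs += part_size;
      stripe_ofs = part_ofs;
      ++cur_part_id;  // assumption: single rule, so the part ID just increments
    }
    update_stripe_size();
  }
};
```

For 5M parts and 4M stripes the cursor visits part 1 stripe 0 (4M), part 1 stripe 1 (1M), then part 2 stripe 0 (4M) starting at offset 5M.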

When the last multipart is reached, stripe_ofs >= next_rule_iter->second.start_ofs holds, so rule_iter is moved to next_rule_iter (the last rule) and cur_part_id is set to rule_iter->second.start_part_num:

      bool last_rule = (next_rule_iter == manifest->rules.end());

      /* move to the next rule? */

      if (!last_rule && stripe_ofs >= next_rule_iter->second.start_ofs) {

        rule_iter = next_rule_iter;

        last_rule = (next_rule_iter == manifest->rules.end());

        if (!last_rule) {

          ++next_rule_iter;

        }

        cur_part_id = rule_iter->second.start_part_num;

      } else {

        cur_part_id++;

      }

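In this example cur_part_id runs from 1 to 9. cur_stripe is the sequence number of the shadow objects under each cur_part_id, starting from 1; it is reset to 0 whenever cur_part_id changes.

update_location() in the RGWObjManifest class runs the following code: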

if (cur_stripe == 0) {   // cur_stripe == 0: the object name in the manifest's location ends with .cur_part_id

      snprintf(buf, sizeof(buf), ".%d", (int)cur_part_id);

      oid += buf;

      ns= RGW_OBJ_NS_MULTIPART;

    } else {   // cur_stripe != 0: the object name in the manifest's location ends with .cur_part_id_cur_stripe

      snprintf(buf, sizeof(buf), ".%d_%d", (int)cur_part_id, (int)cur_stripe);

      oid += buf;

      ns = shadow_ns;

}

This matches the output of rados -p default.rgw.buckets.data ls mentioned above.

The read logic lives in the following function, which is analyzed in detail below:
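The naming branch can be tried in isolation with a small sketch. `stripe_suffix` is a hypothetical helper that only reproduces the two snprintf formats shown above; the prefix and the namespace handling are left out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Ceph): returns the suffix appended to the
// object name for a given part ID and stripe index.
std::string stripe_suffix(int cur_part_id, int cur_stripe) {
  assert(cur_part_id >= 1 && cur_stripe >= 0);
  char buf[32];
  if (cur_stripe == 0) {
    // First stripe of a part: the multipart object, ".<part>"
    std::snprintf(buf, sizeof(buf), ".%d", cur_part_id);
  } else {
    // Later stripes of the same part: shadow objects, ".<part>_<stripe>"
    std::snprintf(buf, sizeof(buf), ".%d_%d", cur_part_id, cur_stripe);
  }
  return std::string(buf);
}
```

For part 1 this gives ".1" for the multipart object and ".1_1" for its first shadow object.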

int RGWRados::iterate_obj(RGWObjectCtx& obj_ctx, rgw_obj& obj,

                          off_t ofs, off_t end,

                          uint64_t max_chunk_size,

                          int (*iterate_obj_cb)(rgw_obj&, off_t, off_t, off_t, bool, RGWObjState *, void *),

                          void *arg)

{

--------

if (astate->has_manifest) {

    /* now get the relevant object stripe */

    RGWObjManifest::obj_iterator iter = astate->manifest.obj_find(ofs);

    RGWObjManifest::obj_iterator obj_end = astate->manifest.obj_end();

    for (; iter != obj_end && ofs <= end; ++iter) {

      off_t stripe_ofs = iter.get_stripe_ofs();   // offset of the current stripe, initially 0

      off_t next_stripe_ofs = stripe_ofs + iter.get_stripe_size();   // offset of the next stripe, used to exit the while loop

      while (ofs < next_stripe_ofs && ofs <= end) {

        read_obj = iter.get_location();   // returns the location, which contains the name of the object to read

        uint64_t read_len = min(len, iter.get_stripe_size() - (ofs - stripe_ofs));   // length to read

        read_ofs = iter.location_ofs() + (ofs - stripe_ofs);   // offset at which to start reading

        if (read_len > max_chunk_size) {

          read_len = max_chunk_size;

        }

        reading_from_head = (read_obj == obj);

        r = iterate_obj_cb(read_obj, ofs, read_ofs, read_len, reading_from_head, astate, arg);   // start reading the data

        if (r < 0) {

          return r;

        }

        len -= read_len;

        ofs += read_len;   // advance by the length just read

      }

    }

}

---------

}
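The arithmetic of this loop can be replayed without a cluster. The following is a rough sketch, not Ceph code: `ReadOp` and `plan_reads` are hypothetical names, the stripes are given as a plain list of sizes, and the location offset (`iter.location_ofs()`) is assumed to be 0 for every stripe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ReadOp {
  size_t stripe;      // index of the stripe (RADOS object) being read
  uint64_t read_ofs;  // offset inside that object
  uint64_t read_len;  // number of bytes to read
};

// Hypothetical planner: splits the inclusive logical range [ofs, end] into
// per-object reads, each capped at max_chunk_size, like the loop above.
std::vector<ReadOp> plan_reads(const std::vector<uint64_t>& stripe_sizes,
                               uint64_t ofs, uint64_t end,
                               uint64_t max_chunk_size) {
  assert(max_chunk_size > 0 && ofs <= end);
  std::vector<ReadOp> ops;
  uint64_t len = end - ofs + 1;  // bytes still to read
  uint64_t stripe_ofs = 0;       // logical offset where stripe i starts
  for (size_t i = 0; i < stripe_sizes.size() && ofs <= end; ++i) {
    uint64_t next_stripe_ofs = stripe_ofs + stripe_sizes[i];
    while (ofs < next_stripe_ofs && ofs <= end) {
      // Never read past the end of the stripe or past the requested range.
      uint64_t read_len = std::min(len, stripe_sizes[i] - (ofs - stripe_ofs));
      read_len = std::min(read_len, max_chunk_size);
      ops.push_back({i, ofs - stripe_ofs, read_len});
      len -= read_len;
      ofs += read_len;
    }
    stripe_ofs = next_stripe_ofs;
  }
  return ops;
}
```

Reading a whole 5M part stored as a 4M and a 1M object with a 2M chunk limit produces three reads: two 2M reads from the first object and one 1M read from the second.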
