A few small surprises in the IPv6 stack---(1)

2015-04-13 11:18
#

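# Two weeks ago, the networking team and QA ran into some IPv6 problems and came to me (because I provide support for the kernel-level network stack).

#

# Some of the problems went somewhat beyond what I expected from my original understanding.

#

# So I created this document to record these unexpected problems.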

#

===============================================================================================================

--- index:

---------------------------------------------------------------------------------------------------------------

=> sysctl "ipv6_devconf->max_addresses" only limits the number of autoconfigured global addresses, not manually configured addresses

---------------------------------------------------------------------------------------------------------------

=> "autoconf route" and "autoconf address" are *independent* of each other; deleting the "autoconf address" does not invalidate the "autoconf route"

---------------------------------------------------------------------------------------------------------------

=> delete / add a link-local address, and the autoconf process then fails to start: because of timing, the newly added link-local address is still in DAD, in IFA_F_TENTATIVE state, and not yet valid

---------------------------------------------------------------------------------------------------------------

=> autoconf prefix length *MUST* be 64; an "autoconf address" cannot be configured successfully with any non-64 RA prefix length

---------------------------------------------------------------------------------------------------------------

===============================================================================================================

@@ sysctl "ipv6_devconf->max_addresses" only limits the number of autoconfigured global addresses, not manually configured addresses

---------------------------------------------------------------------------------------------------------------

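Going by the name "max_addresses", one would intuitively expect it to limit the number of IPv6 addresses configured on an interface.

But what actually happens is:

It only limits the number of autoconfigured global addresses.

It places no limit on the number of manually configured addresses.
# That is, addresses coming from user level, whether configured by hand or by DHCPv6.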

---------------------------------------------------------------------------------------------------------------

--- see:

[bug] [IPv6]shutdown eth0 and then no shutdown eth0,mgt0 appear two global ipv6 address.html

--- chat log:

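[13:37:01] Colleague W: Oh

[13:37:24] Colleague W: I know the cause of this problem. It is caused by DHCPv6

[13:38:02] Colleague W: What I was originally wondering was why the configuration still succeeds when max_address is 2

[13:39:04] Me: Oh, I got that wrong just now. After sending the message I checked again, and I really had not thought it through. If DHCPv6 is involved, that fills in the logic.

[13:39:12] Me: It makes sense now.

[13:42:34] Colleague W: Yeah. The extra address was assigned by DHCPv6, not by autoconfig. If it had been assigned by autoconfig, the last bits should be the same as those of the link-local address

[13:42:46] Colleague W: I just wonder why the configuration still succeeds when max_address is 2

[13:42:54] Colleague W: I don't know whether that is reasonable

[13:45:11] Me: autoconf is configured automatically by the kernel; DHCPv6 and manual addresses are configured by the upper level through IOCTL.

[13:45:34] Me: I checked the code: max_address only limits autoconf, and does not limit configuration from the upper level through IOCTL.

[13:46:03] Colleague W: Oh. I don't know whether that is reasonable

[13:48:35] Me: Purely from the kernel's point of view, you could call it reasonable or unreasonable. But when we decided earlier to allow only one global address, we meant to rely on "max_address" to do the checking and limiting for us. Our understanding of max_address was off, and its checking and limiting now looks insufficient.

[13:49:01] Me: Would it be easy to add another check in the code on your side?

[13:49:59] Colleague W: It's a bit difficult

[13:50:21] Colleague W: Because my side is not aware of it when an autoconfig address is added

[13:50:38] Colleague W: If some other address is added first, and then the auto one is added

[13:50:47] Colleague W: my side cannot detect it

[13:51:15] Colleague W: max_address only limits autoconf, and does not limit configuration from the upper level through IOCTL.

[13:51:25] Colleague W: Can this be solved?

[13:51:45] Me: That can be solved too; it needs a kernel change.

[13:52:08] Me: How about this, since the kernel has to be changed anyway.

[13:52:33] Me: I'll see whether I can, on one hand, add the limit in IOCTL.

[13:52:54] Me: And on the other hand also make it possible to "be aware of autoconfig addresses"

[13:53:05] Me: "Being aware of autoconfig addresses" will also be convenient later on.

[13:53:24] Colleague W: OK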
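
Colleague W's way of telling the DHCPv6 address apart from an autoconfigured one (comparing the low bits with the link-local address) can be written down as a small check. This is only a sketch: `model_same_interface_id()` is a hypothetical helper, and it assumes EUI-64 style interface identifiers, so that an autoconfigured global address shares its low 64 bits with the link-local address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a kernel function.
 * Returns true when two 128-bit IPv6 addresses (in network byte order)
 * carry the same 64-bit interface identifier, i.e. the same low 8 bytes. */
bool model_same_interface_id(const uint8_t a[16], const uint8_t b[16])
{
    assert(a != NULL && b != NULL);
    return memcmp(a + 8, b + 8, 8) == 0;
}
```

A global address for which this returns false against the link-local address was probably not produced by EUI-64 autoconfiguration, which is the hint that pointed at DHCPv6 in the chat above.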

---------------------------------------------------------------------------------------------------------------

--- code trace:

------------------------------------------------------

#

# autoconf trace

#

"icmpv6_protocol->handler()" = icmpv6_rcv() ->

case NDISC_ROUTER_ADVERTISEMENT: ndisc_rcv(skb); ->

case NDISC_ROUTER_ADVERTISEMENT: ndisc_router_discovery(skb);

addrconf_prefix_rcv(skb->dev, (u8 *)p, (p->nd_opt_len) << 3, ndopts.nd_opts_src_lladdr != NULL);

int max_addresses = in6_dev->cnf.max_addresses;

/* Do not allow to create too much of autoconfigured

* addresses; this would be too easy way to crash kernel.

*/

# "max_addresses" limits the number of autoconfigured global addresses.

if (!max_addresses ||

ipv6_count_addresses(in6_dev) < max_addresses)

ifp = ipv6_add_addr(in6_dev, &addr, NULL,

pinfo->prefix_len,

addr_type&IPV6_ADDR_SCOPE_MASK,

addr_flags, valid_lft,

prefered_lft);

------------------------------------------------------

#

# manual conf trace from user-level

#

inet6_ioctl() ->

case SIOCSIFADDR: return addrconf_add_ifaddr(net, (void __user *) arg);

err = inet6_addr_add(net, ireq.ifr6_ifindex, &ireq.ifr6_addr, NULL,

ireq.ifr6_prefixlen, IFA_F_PERMANENT,

INFINITY_LIFE_TIME, INFINITY_LIFE_TIME);

#

# the manual conf path performs no "max_addresses" check at all

#

ifp = ipv6_add_addr(idev, pfx, peer_pfx, plen, scope, ifa_flags,

valid_lft, prefered_lft);
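
The difference between the two traces can be reduced to a small user-space model. This is a sketch under stated assumptions, not kernel code: `struct iface_model`, `model_add_autoconf_addr()` and `model_add_manual_addr()` are hypothetical names, and `addr_count` stands in for whatever ipv6_count_addresses() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one interface; not a kernel structure. */
struct iface_model {
    int max_addresses;  /* models in6_dev->cnf.max_addresses; 0 means no limit */
    int addr_count;     /* models the value returned by ipv6_count_addresses() */
};

/* Models the RA path (addrconf_prefix_rcv): the limit is checked first. */
bool model_add_autoconf_addr(struct iface_model *ifc)
{
    assert(ifc != NULL);
    if (ifc->max_addresses != 0 && ifc->addr_count >= ifc->max_addresses)
        return false;   /* limit reached, no address is added */
    ifc->addr_count++;
    return true;
}

/* Models the SIOCSIFADDR path (inet6_addr_add): no limit check at all. */
bool model_add_manual_addr(struct iface_model *ifc)
{
    assert(ifc != NULL);
    ifc->addr_count++;
    return true;
}
```

With max_addresses set to 2 and one address already present, the autoconf path in this model accepts one more address and then refuses, while the manual path keeps succeeding.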

---------------------------------------------------------------------------------------------------------------

===============================================================================================================

@@ "autoconf route" and "autoconf address" are *independent* of each other; deleting the "autoconf address" does not invalidate the "autoconf route"

---------------------------------------------------------------------------------------------------------------

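What I already knew:

Under autoconfiguration, IPv6 $(addrconf) will:

configure an "autoconf address" and an "autoconf route" based on the router's RA

The experiment QA ran, and its result:

manually delete the "autoconf address"; the "autoconf route" still remains
# The expectation was that the kernel would delete it automatically.

afterwards, the router and the other hosts with the same prefix can still be pinged
# Because the "autoconf route" was not deleted.

After studying the code and experimenting hands-on, the conclusion is:

"autoconf route" and "autoconf address" are *independent* of each other; deleting the "autoconf address" does not invalidate the "autoconf route"

The configuration and deletion of the two are completely independent.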

---------------------------------------------------------------------------------------------------------------

--- see:

[bug] [IPv6 route]When mgt0 get ipv6 address via auto mode,ipv6 route table can't delete ipv6 address entry auto.html

test log---1455.txt

---------------------------------------------------------------------------------------------------------------

--- code trace:

------------------------------------------------------

#

# receiving router RA

#

"icmpv6_protocol->handler()" = icmpv6_rcv() ->

case NDISC_ROUTER_ADVERTISEMENT: ndisc_rcv(skb); ->

case NDISC_ROUTER_ADVERTISEMENT: ndisc_router_discovery(skb);

addrconf_prefix_rcv(skb->dev, (u8 *)p, (p->nd_opt_len) << 3, ndopts.nd_opts_src_lladdr != NULL);

#

# *First*, the "autoconf route" is configured.

#

/*

* Two things going on here:

* 1) Add routes for on-link prefixes

* 2) Configure prefixes with the auto flag set

*/

addrconf_prefix_route(&pinfo->prefix, pinfo->prefix_len,

dev, expires, flags);

struct fib6_config cfg = {

.fc_table = RT6_TABLE_PREFIX,

.fc_metric = IP6_RT_PRIO_ADDRCONF,

.fc_ifindex = dev->ifindex,

.fc_expires = expires,

.fc_dst_len = plen,

.fc_flags = RTF_UP | flags,

.fc_nlinfo.nl_net = dev_net(dev),

.fc_protocol = RTPROT_KERNEL,

};

ip6_route_add(&cfg);

#

# *Then*, the "autoconf address" is configured.

#

/* Try to figure out our local address for this prefix */

if (pinfo->autoconf && in6_dev->cnf.autoconf) {

/* Do not allow to create too much of autoconfigured

* addresses; this would be too easy way to crash kernel.

*/

if (!max_addresses ||

ipv6_count_addresses(in6_dev) < max_addresses)

ifp = ipv6_add_addr(in6_dev, &addr, NULL,

pinfo->prefix_len,

addr_type&IPV6_ADDR_SCOPE_MASK,

addr_flags, valid_lft,

prefered_lft);

------------------------------------------------------

#

# manually delete an address ( assuming it is an "autoconf address" )

#

inet6_ioctl() ->

case SIOCDIFADDR: return addrconf_del_ifaddr(net, (void __user *) arg);

err = inet6_addr_del(net, ireq.ifr6_ifindex, &ireq.ifr6_addr,

ireq.ifr6_prefixlen);

#

# [*][*] Manually deleting an address, assuming it is an "autoconf address", involves no operation at all that deletes the "autoconf route" (__not even through the notification or rtm event in ipv6_del_addr()).

#

list_for_each_entry(ifp, &idev->addr_list, if_list) {

if (ifp->prefix_len == plen &&

ipv6_addr_equal(pfx, &ifp->addr)) {

in6_ifa_hold(ifp);

read_unlock_bh(&idev->lock);

ipv6_del_addr(ifp);

return 0;

}

}

---------------------------------------------------------------------------------------------------------------

--- conclusion and final understanding:

Tested with 2 Linux PCs running kernel version 3.10; they show the same behavior.

In fact, this is not a bug; it is simply how the Linux kernel IPv6 stack is implemented:

autoconfigured global address

autoconfigured route

They are configured separately on reception of a router RA, and they are *independent*.

An autoconfigured route means:

the local host learns from the router RA that it is a neighbor of the other hosts with the prefix announced by the RA, and that it can reach those neighbor hosts simply by using its link-local address.

Therefore, even if an autoconfigured global address is deleted by hand, the Linux kernel does NOT delete the autoconfigured route automatically, because it assumes the local host can still reach its neighbors through this route, using its link-local address.

Some may find this behavior inappropriate; others may find it fine.

But this is how the Linux kernel IPv6 stack is implemented.
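
The behavior traced above can be condensed into a toy model. It is a sketch only: `struct host_model` and the `model_*` functions are hypothetical names, and the `onlink` parameter is my assumption, taken from the "Add routes for on-link prefixes" comment in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-prefix state of a host; not a kernel structure. */
struct host_model {
    bool has_prefix_route;   /* models the "autoconf route" */
    bool has_autoconf_addr;  /* models the "autoconf address" */
};

/* Models addrconf_prefix_rcv(): two separate steps, route first, then address. */
void model_prefix_rcv(struct host_model *h, bool onlink, bool autoconf)
{
    assert(h != NULL);
    if (onlink)
        h->has_prefix_route = true;    /* step 1: route for the on-link prefix */
    if (autoconf)
        h->has_autoconf_addr = true;   /* step 2: address for the prefix */
}

/* Models inet6_addr_del(): only the address is touched, never the route. */
void model_addr_del(struct host_model *h)
{
    assert(h != NULL);
    h->has_autoconf_addr = false;
}
```

After model_prefix_rcv() followed by model_addr_del(), the route flag is still set, which matches what QA observed: the prefix stays reachable after the address is gone.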

---------------------------------------------------------------------------------------------------------------

===============================================================================================================

@@ delete / add a link-local address, and the autoconf process then fails to start: because of timing, the newly added link-local address is still in DAD, in IFA_F_TENTATIVE state, and not yet valid

---------------------------------------------------------------------------------------------------------------

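In fact this is not a problem in the official kernel itself, but a problem in our own code.

Because our own code provides these 2 interfaces:

modify (first delete, then add) the link-local address

forcibly re-kick the autoconf process
# the official kernel does not provide such an interface

So, if these 2 operations are performed back to back, then:

the link-local address that was just added is still in DAD, in IFA_F_TENTATIVE state, and is treated as temporarily unusable.
# DAD is asynchronous.

and addrconf_kick_global_autoconf() then fails at:
# I wrote this function myself; it is carved out of the official kernel's logic.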

if ((ll_ifp = __ipv6_get_lladdr(dev, IFA_F_TENTATIVE)) == NULL) {
//no link-local address configured on this interface

ah_debug("no non-TENTATIVE link-local address yet ( might still in DAD )\n");

err = -EACCES;

goto out_unlock;

}
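
The timing problem can be reproduced in a small user-space model. Everything here is a hypothetical sketch: `struct lladdr_model`, `MODEL_F_TENTATIVE` (an illustrative flag value, not the kernel's) and the `model_*` functions only mimic the ordering of events, not the real DAD machinery.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_F_TENTATIVE 0x1u  /* illustrative stand-in for IFA_F_TENTATIVE */

/* Hypothetical model of the interface's link-local address. */
struct lladdr_model {
    bool present;
    unsigned int flags;
};

/* Models delete + add of the link-local address: it comes back tentative. */
void model_readd_lladdr(struct lladdr_model *ll)
{
    assert(ll != NULL);
    ll->present = true;
    ll->flags = MODEL_F_TENTATIVE;
}

/* Models DAD finishing later, asynchronously, without finding a duplicate. */
void model_dad_completed(struct lladdr_model *ll)
{
    assert(ll != NULL);
    ll->flags &= ~MODEL_F_TENTATIVE;
}

/* Models the check at the top of addrconf_kick_global_autoconf(). */
int model_kick_autoconf(const struct lladdr_model *ll)
{
    assert(ll != NULL);
    if (!ll->present || (ll->flags & MODEL_F_TENTATIVE))
        return -EACCES;  /* no usable (non-tentative) link-local address yet */
    return 0;
}
```

Calling model_kick_autoconf() right after model_readd_lladdr() returns -EACCES in this model; the same call succeeds once model_dad_completed() has run.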

---------------------------------------------------------------------------------------------------------------

===============================================================================================================

@@ autoconf prefix length *MUST* be 64; an "autoconf address" cannot be configured successfully with any non-64 RA prefix length

---------------------------------------------------------------------------------------------------------------

--- see chat log:

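[2015/4/7 16:17:55] Colleague G: There is another IPv6 problem: when the IPv6 address prefix on the peer device's port is longer than 64 bits, mgt0 can no longer obtain an address in auto mode

[2015/4/7 16:17:55] Colleague G: Is this problem on your side?

[2015/4/7 16:18:28] Me: let me check

[2015/4/7 16:18:34] Colleague G: OK

[2015/4/7 16:19:28] Me: The kernel IPv6 stack only supports 64 bits

[2015/4/7 16:19:33] Me: Neither more nor fewer works.

[2015/4/7 16:20:36] Me: Try it with another Linux PC; it should have the same problem

[2015/4/7 16:22:01] Colleague G: Can't it just append the remaining bytes the same way as with EUI-64?

[2015/4/7 16:22:48] Colleague G: Let me see how Cisco does it

[2015/4/7 16:36:01] Colleague G: Cisco behaves the same way

[2015/4/7 16:36:23] Colleague G: When the network prefix is not 64, it cannot negotiate an IPv6 address either

[2015/4/7 16:37:16] Me: Maybe the RFC defines it this way

[2015/4/7 16:37:28] Colleague G: Yes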

---------------------------------------------------------------------------------------------------------------

--- code trace:

#

# When receiving an RA and configuring the autoconf address

#

"icmpv6_protocol->handler()" = icmpv6_rcv() ->

case NDISC_ROUTER_ADVERTISEMENT: ndisc_rcv(skb); ->

case NDISC_ROUTER_ADVERTISEMENT: ndisc_router_discovery(skb);

addrconf_prefix_rcv(skb->dev, (u8 *)p, (p->nd_opt_len) << 3, ndopts.nd_opts_src_lladdr != NULL);

/* Try to figure out our local address for this prefix */

if (pinfo->autoconf && in6_dev->cnf.autoconf) {

#

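# Only if the prefix length in the RA is 64 does the code jump to the "ok" label and go on to configure the "autoconf address";

#

# otherwise, it prints the "prefix with wrong length" message and returns immediately.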

#

if (pinfo->prefix_len == 64) {

goto ok;

}

net_dbg_ratelimited("IPv6 addrconf: prefix with wrong length %d\n",

pinfo->prefix_len);

in6_dev_put(in6_dev);

return;

ok:

if (!max_addresses ||

ipv6_count_addresses(in6_dev) < max_addresses)

ifp = ipv6_add_addr(in6_dev, &addr, NULL,

pinfo->prefix_len,

addr_type&IPV6_ADDR_SCOPE_MASK,

addr_flags, valid_lft,

prefered_lft);

---------------------------------------------------------------------------------------------------------------

--- conclusion:

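I skimmed the RFC and did not see a mandatory requirement that the autoconf prefix length be 64.
# Maybe I just did not find it.

Given that Cisco behaves the same way, 64 is probably an industry convention.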
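
For reference, RFC 4862 (section 5.5.3) appears to be where this comes from: a Prefix Information option is to be ignored when the prefix length plus the interface identifier length does not add up to 128 bits, and with the 64-bit modified EUI-64 identifier used on Ethernet that leaves 64 as the only usable prefix length. The sketch below shows why the arithmetic only works for /64. The function names are hypothetical, and the code is a user-space illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modified EUI-64 interface identifier from a 48-bit MAC address:
 * insert ff:fe in the middle and flip the universal/local bit. */
void model_eui64_from_mac(const uint8_t mac[6], uint8_t iid[8])
{
    assert(mac != NULL && iid != NULL);
    iid[0] = mac[0] ^ 0x02;
    iid[1] = mac[1];
    iid[2] = mac[2];
    iid[3] = 0xff;
    iid[4] = 0xfe;
    iid[5] = mac[3];
    iid[6] = mac[4];
    iid[7] = mac[5];
}

/* Builds prefix (high 64 bits) + interface identifier (low 64 bits).
 * Returns 0 on success, -1 when prefix_len is not 64, mirroring the
 * "prefix with wrong length" branch in the trace above. */
int model_slaac_addr(const uint8_t prefix[16], int prefix_len,
                     const uint8_t mac[6], uint8_t addr[16])
{
    assert(prefix != NULL && addr != NULL);
    if (prefix_len != 64)
        return -1;   /* 64-bit identifier would not fit the remaining bits */
    memcpy(addr, prefix, 8);
    model_eui64_from_mac(mac, addr + 8);
    return 0;
}
```

With a /80 or /48 prefix the model returns -1 and leaves the address untouched, which is the behavior both the Linux host and the Cisco device showed in the chat above.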

---------------------------------------------------------------------------------------------------------------

--- attention:

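[*][*] Note: because the "autoconf address" and the "autoconf route" are independent, even when the autoconf prefix length is not 64 and the "autoconf address" cannot be configured, the "autoconf route" can still be configured successfully.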

---------------------------------------------------------------------------------------------------------------

===============================================================================================================
Tags: network---ipv6