Network topology

                         br0
                          +
   vrf-h1                 |                 vrf-h2
     +                +---+----+              +
     |                |        |              |
192.0.2.1/24          +        +        192.0.2.2/24
   swp1             swp2     swp3           swp4
     +                +        +              +
     |                |        |              |
     +------veth------+        +-----veth-----+

See the linked guide for how to set up the test environment.

Testing MAC address learning

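Before the ping from swp1 to swp4 is run, swp1 (inside vrf-h1) does not know the MAC address of swp4, so it sends an ARP broadcast. From this ARP request br0 learns that swp1's MAC address is behind port swp2, and it floods the broadcast out of port swp3. swp3 is connected to swp4, so swp4 receives the ARP request and answers with an ARP reply. From this reply br0 learns that swp4's MAC address is behind port swp3. The destination MAC of the reply is that of swp1; br0 looks it up in the FDB, finds that it must go out of port swp2, and forwards it. swp1 finally receives the ARP reply, and the ICMP messages can now be sent.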
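The ARP exchange described above can be replayed on a toy model. This is only a minimal sketch of a generic learning bridge, not the kernel implementation; the class `LearningBridge` and its method `receive` are hypothetical names introduced for illustration, and the MAC addresses are the ones printed by the test machine in the next section.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Toy learning bridge (hypothetical helper, not kernel code)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # MAC address -> port the MAC was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the list of egress ports."""
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is behind the ingress port: nothing to forward.
            return []
        return [out_port]


SWP1_MAC = "92:82:b2:63:db:c1"
SWP4_MAC = "4a:c3:b7:bc:e3:2b"

br0 = LearningBridge(["swp2", "swp3"])
# ARP request from swp1 enters br0 on swp2 and is flooded.
arp_request_out = br0.receive("swp2", SWP1_MAC, BROADCAST)
# ARP reply from swp4 enters br0 on swp3 and is forwarded by FDB lookup.
arp_reply_out = br0.receive("swp3", SWP4_MAC, SWP1_MAC)
print(arp_request_out, arp_reply_out, br0.fdb)
```

After both frames, the model holds one entry per host, which matches the two FDB entries that appear on br0 after the ping.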

root@raspi:~# ip neigh flush dev swp1             
root@raspi:~# ip vrf exec vrf-h1 ping 192.0.2.2 -c 1
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.413 ms

--- 192.0.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.413/0.413/0.413/0.000 ms
root@raspi:~#

root@raspi:~# tcpdump -nei br0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:25:18.480791 92:82:b2:63:db:c1 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.2 tell 192.0.2.1, length 28
10:25:18.480970 4a:c3:b7:bc:e3:2b > 92:82:b2:63:db:c1, ethertype ARP (0x0806), length 42: Reply 192.0.2.2 is-at 4a:c3:b7:bc:e3:2b, length 28
10:25:18.481010 92:82:b2:63:db:c1 > 4a:c3:b7:bc:e3:2b, ethertype IPv4 (0x0800), length 98: 192.0.2.1 > 192.0.2.2: ICMP echo request, id 8, seq 1, length 64
10:25:18.481082 4a:c3:b7:bc:e3:2b > 92:82:b2:63:db:c1, ethertype IPv4 (0x0800), length 98: 192.0.2.2 > 192.0.2.1: ICMP echo reply, id 8, seq 1, length 64
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
root@raspi:~#

We will observe how the FDB of br0 changes before and after the ping.

root@raspi:~# ip neigh flush dev swp1 
root@raspi:~# ip -j addr show dev swp1 | jq -r '.[].address'
92:82:b2:63:db:c1
root@raspi:~# ip -j addr show dev swp4 | jq -r '.[].address'
4a:c3:b7:bc:e3:2b
root@raspi:~#
# br0 has no FDB entries for the MACs of swp1 and swp4
root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[0] | select(.mac == "92:82:b2:63:db:c1")'
root@raspi:~# bridge -j fdb show br br0 brport swp3 | jq '.[0] | select(.mac == "4a:c3:b7:bc:e3:2b")'
root@raspi:~#
root@raspi:~#
# ping test
root@raspi:~# ip vrf exec vrf-h1 ping 192.0.2.2 -c 1
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.413 ms

--- 192.0.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.413/0.413/0.413/0.000 ms
root@raspi:~#
# two new entries appear in the FDB
root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[0] | select(.mac == "92:82:b2:63:db:c1")'
{
"mac": "92:82:b2:63:db:c1",
"vlan": 1,
"flags": [],
"master": "br0",
"state": ""
}
root@raspi:~# bridge -j fdb show br br0 brport swp3 | jq '.[0] | select(.mac == "4a:c3:b7:bc:e3:2b")'
{
"mac": "4a:c3:b7:bc:e3:2b",
"vlan": 1,
"flags": [],
"master": "br0",
"state": ""
}
root@raspi:~#
# after a while, the entries age out
root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[0] | select(.mac == "92:82:b2:63:db:c1")'
root@raspi:~# bridge -j fdb show br br0 brport swp3 | jq '.[0] | select(.mac == "4a:c3:b7:bc:e3:2b")'
root@raspi:~#
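The disappearance of the two entries at the end of the transcript can be pictured with a small aging model. This is a sketch under stated assumptions: the class `AgingFdb` is hypothetical, the 300-second ageing time is only an assumed value (check the value actually configured on your bridge), and entries are expired lazily at lookup time, which is a simplification and not a description of how the kernel garbage-collects them.

```python
class AgingFdb:
    """Toy FDB whose learned entries expire (hypothetical helper)."""

    def __init__(self, ageing_time=300.0):
        self.ageing_time = ageing_time  # seconds, assumed value
        self.entries = {}  # MAC -> (port, time the MAC was last seen)

    def learn(self, mac, port, now):
        # Learning again from the same MAC refreshes the timestamp.
        self.entries[mac] = (port, now)

    def lookup(self, mac, now):
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.ageing_time:
            # Entry is too old: drop it and report a miss.
            del self.entries[mac]
            return None
        return port


fdb = AgingFdb(ageing_time=300.0)
fdb.learn("92:82:b2:63:db:c1", "swp2", now=0.0)
fresh = fdb.lookup("92:82:b2:63:db:c1", now=10.0)
expired = fdb.lookup("92:82:b2:63:db:c1", now=301.0)
print(fresh, expired)
```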

A closer look

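In the previous section we triggered address learning with ping. In this section we use mausezahn to craft frames and look at the MAC address learning process in more detail.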

Disabling address learning

Disable address learning on swp2 and observe what happens.

root@raspi:~# bridge link set dev swp2 learning off

Use mz (mausezahn) to send one frame from swp1 with source MAC de:ad:be:ef:13:37 and destination MAC ff:ff:ff:ff:ff:ff:

root@raspi:~# mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip
Mausezahn will send 1 frames... 0.00 seconds (2762 packets per second)
root@raspi:~#
root@raspi:~# tcpdump -nei swp2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on swp2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:56:51.601352 de:ad:be:ef:13:37 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 98: 192.0.2.1 > 255.255.255.255: ip-proto-0 64

Bridge port swp2 of br0 has not learned an FDB entry for this MAC.

# not learned
root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq -e ".[] | select(.mac == \"de:ad:be:ef:13:37\")"
root@raspi:~#
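The jq filter used in these checks can be mirrored in a few lines of Python, which is handy when the check has to run inside a test script. This is a sketch: `fdb_has_mac` is a hypothetical helper, and the sample input only reproduces the entry shape printed by `bridge -j fdb show` earlier in this post.

```python
import json


def fdb_has_mac(fdb_json, mac):
    """Return True if the JSON FDB dump contains an entry for mac."""
    entries = json.loads(fdb_json) if fdb_json.strip() else []
    return any(e.get("mac", "").lower() == mac.lower() for e in entries)


# Sample shaped like the entries shown by `bridge -j fdb show` above.
SAMPLE = (
    '[{"mac": "92:82:b2:63:db:c1", "vlan": 1, "flags": [], '
    '"master": "br0", "state": ""}]'
)

learned = fdb_has_mac(SAMPLE, "92:82:b2:63:db:c1")
not_learned = fdb_has_mac(SAMPLE, "de:ad:be:ef:13:37")
print(learned, not_learned)
```

On the test machine the input string would be the standard output of `bridge -j fdb show br br0 brport swp2`.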

Code location

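/* net/bridge/br_input.c (abridged) */
int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	/* ... */
	/* learn the source MAC only if learning is enabled on the ingress port */
	if (p->flags & BR_LEARNING)
		br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, 0);
	/* ... */
}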
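The effect of the check above can be sketched outside the kernel. The snippet below is an illustration only: `PortFlags` and `ingress_learn` are hypothetical names, and the flag values produced by `auto()` are not the kernel's bit values.

```python
from enum import Flag, auto


class PortFlags(Flag):
    """Illustrative per-port flags; values differ from the kernel's."""
    LEARNING = auto()
    FLOOD = auto()


def ingress_learn(fdb, port_name, port_flags, src_mac):
    """Record src_mac behind port_name only if the port has LEARNING set."""
    if PortFlags.LEARNING in port_flags:
        fdb[src_mac] = port_name
    return fdb


MAC = "de:ad:be:ef:13:37"
# learning off on swp2: the FDB stays empty
fdb_off = ingress_learn({}, "swp2", PortFlags.FLOOD, MAC)
# learning on: the source MAC is recorded behind swp2
fdb_on = ingress_learn({}, "swp2", PortFlags.LEARNING | PortFlags.FLOOD, MAC)
print(fdb_off, fdb_on)
```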

Enabling address learning

Re-enable MAC address learning on swp2:

root@raspi:~# bridge link set dev swp2 learning on 

Use mz to send one frame from swp1 with source MAC de:ad:be:ef:13:37 and destination MAC ff:ff:ff:ff:ff:ff:

root@raspi:~# mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip
Mausezahn will send 1 frames... 0.02 seconds (42 packets per second)

Use tpoint to trace the new FDB entry being added:

root@raspi:/home/zrf/git/perf-tools/bin# ./tpoint bridge:br_fdb_update
Tracing bridge:br_fdb_update. Ctrl-C to end.
<...>-2411 [001] .Ns2. 11324.815010: br_fdb_update: br_dev br0 source swp2 addr de:ad:be:ef:13:37 vid 1 flags 0x0

At the same time, look up this entry on bridge port swp2:

root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[] | select(.mac == "de:ad:be:ef:13:37")'
{
"mac": "de:ad:be:ef:13:37",
"vlan": 1,
"flags": [],
"master": "br0",
"state": ""
}
root@raspi:~#

Forwarding to a port based on FDB entries

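The previous experiment showed that, with MAC address learning enabled, the frame sent from swp1 makes br0 learn that MAC de:ad:be:ef:13:37 is behind bridge port swp2. In this experiment swp4 sends frames with destination MAC de:ad:be:ef:13:37, and we observe under which conditions br0 forwards them.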

Turn off unknown-unicast flooding on swp2, so that a frame is forwarded to this port only when the FDB contains a matching entry:

root@raspi:~# bridge link set dev swp2 flood off

Capture on swp2 and swp3 and check whether br0 forwards the frame sent by swp4 from swp3 to swp2.

# swp4 sends a unicast frame
root@raspi:~# mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip
Mausezahn will send 1 frames... 0.02 seconds (43 packets per second)
root@raspi:~#

# swp3 receives the frame 72:1b:95:be:e5:9f > de:ad:be:ef:13:37
root@raspi:/home/zrf/git/perf-tools/bin# tcpdump -nei swp3
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on swp3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
14:45:33.505397 72:1b:95:be:e5:9f > de:ad:be:ef:13:37, ethertype IPv4 (0x0800), length 98: 192.0.2.2 > 255.255.255.255: ip-proto-0 64

# br0 did not forward it
root@raspi:~# tcpdump -nei swp2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on swp2, link-type EN10MB (Ethernet), snapshot length 262144 bytes

Code location

/* net/bridge/br_input.c (abridged; the body of br_flood() is shown inline) */
int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	bool mcast_hit = false;

	switch (pkt_type) {
	case BR_PKT_UNICAST:
		dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
		break;
	}

	if (dst) {
		/* FDB hit: forward to the port recorded in the entry */
	} else {
		if (!mcast_hit) {
			list_for_each_entry_rcu(p, &br->port_list, list) {
				switch (pkt_type) {
				case BR_PKT_UNICAST:
					/* unicast flooding is off and no dst was found, so this brport is skipped */
					if (!(p->flags & BR_FLOOD))
						continue;
				}
				prev = maybe_deliver(prev, p, skb, local_orig);
			}
		} else {
			br_multicast_flood(mdst, skb, brmctx, local_rcv, false);
		}
	}
}

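This is because unknown-unicast flooding is turned off on swp2: br0 finds no FDB entry for destination MAC de:ad:be:ef:13:37 (the entry learned in the previous experiment has presumably aged out by now), so the frame is not forwarded to swp2.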
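The forwarding decision for a unicast frame can be condensed into one small function. This is a sketch of the logic discussed here, not kernel code: `egress_ports` is a hypothetical helper, and `flood_enabled` is an assumed plain mapping from port name to the port's flood setting.

```python
def egress_ports(fdb, flood_enabled, in_port, dst_mac):
    """Pick egress ports for a unicast frame.

    fdb maps MAC -> port; flood_enabled maps port -> bool and tells
    whether unknown-unicast flooding is on for that port.
    """
    known = fdb.get(dst_mac)
    if known is not None:
        # FDB hit: forward to that port only, regardless of its flood flag.
        return [] if known == in_port else [known]
    # FDB miss: flood, but skip ports with flooding turned off.
    return [p for p, flood in flood_enabled.items() if flood and p != in_port]


DST = "de:ad:be:ef:13:37"
flood_enabled = {"swp2": False, "swp3": True}

# No FDB entry: swp2 is skipped because flooding is off there.
miss = egress_ports({}, flood_enabled, "swp3", DST)
# Entry present: the frame reaches swp2 even though flooding is off.
hit = egress_ports({DST: "swp2"}, flood_enabled, "swp3", DST)
print(miss, hit)
```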

If we first send a frame with source MAC de:ad:be:ef:13:37 from swp1, br0 learns that this MAC is reachable through swp2, and the frame from swp4 can then be captured on swp2.

# let br0 learn the forwarding entry for this MAC
root@raspi:~# mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip
Mausezahn will send 1 frames... 0.02 seconds (50 packets per second)
root@raspi:~#
# swp4 sends a frame
root@raspi:~# mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip
Mausezahn will send 1 frames... 0.02 seconds (44 packets per second)
root@raspi:~#
# the forwarding entry is still present on br0
root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[] | select(.mac == "de:ad:be:ef:13:37")'
{
"mac": "de:ad:be:ef:13:37",
"vlan": 1,
"flags": [],
"master": "br0",
"state": ""
}

# the capture shows that br0 forwarded the unicast frame to swp2
root@raspi:~# tcpdump -nei swp2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on swp2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:34:26.817210 de:ad:be:ef:13:37 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 98: 192.0.2.1 > 255.255.255.255: ip-proto-0 64
15:34:30.559814 72:1b:95:be:e5:9f > de:ad:be:ef:13:37, ethertype IPv4 (0x0800), length 98: 192.0.2.2 > 255.255.255.255: ip-proto-0 64

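Another way to verify

Put swp1 into promiscuous mode and add a filter rule that matches on the destination MAC address and drops frames addressed to de:ad:be:ef:13:37.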

root@raspi:~# ip link set swp1 promisc on
root@raspi:~# tc qdisc add dev swp1 ingress
# requires a kernel built with CONFIG_NET_CLS_FLOWER
root@raspi:~# tc filter add dev swp1 ingress protocol ip pref 1 handle 101 flower dst_mac de:ad:be:ef:13:37 action drop

swp4 sends a frame with destination MAC de:ad:be:ef:13:37:

root@raspi:~# mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip
Mausezahn will send 1 frames... 0.00 seconds (3745 packets per second)

root@raspi:~# tcpdump -nei swp3
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on swp3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:37:45.825442 72:1b:95:be:e5:9f > de:ad:be:ef:13:37, ethertype IPv4 (0x0800), length 98: 192.0.2.2 > 255.255.255.255: ip-proto-0 64

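swp1 does not receive the frame sent by swp4, because unknown-unicast flooding is turned off on swp2 (the drop counter stands at 9 packets at this point):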

root@raspi:~# tc -s filter show dev swp1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x65
dst_mac de:ad:be:ef:13:37
eth_type ipv4
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 15352 sec used 224 sec firstused 14312 sec
Action statistics:
Sent 756 bytes 9 pkt (dropped 9, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

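Now let swp1 send a frame with source MAC de:ad:be:ef:13:37, so that br0 learns the forwarding entry for this MAC. If swp4 then sends a frame with destination MAC de:ad:be:ef:13:37 before the entry ages out, br0 forwards it, and on swp1 we can see one more packet dropped by the tc rule.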

root@raspi:~# mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip  
Mausezahn will send 1 frames... 0.02 seconds (44 packets per second)
root@raspi:~# mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip
Mausezahn will send 1 frames... 0.02 seconds (44 packets per second)
# 10 packets dropped now: one more frame reached swp1 and matched the rule
root@raspi:~# tc -s filter show dev swp1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x65
dst_mac de:ad:be:ef:13:37
eth_type ipv4
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 15558 sec used 9 sec firstused 14518 sec
Action statistics:
Sent 840 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
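Comparing the drop counter by eye works for a single run; for repeated runs it can be extracted from the `tc -s filter show` output. The following is a small sketch: `dropped_packets` is a hypothetical helper, and the two sample strings are the "Sent ..." lines from the outputs shown above.

```python
import re


def dropped_packets(tc_output):
    """Extract the 'dropped N' counter from `tc -s filter show` output."""
    match = re.search(r"\(dropped (\d+),", tc_output)
    if match is None:
        raise ValueError("no 'dropped' counter found")
    return int(match.group(1))


BEFORE = "Sent 756 bytes 9 pkt (dropped 9, overlimits 0 requeues 0)"
AFTER = "Sent 840 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)"

# A delta of 1 means exactly one forwarded frame hit the drop rule.
delta = dropped_packets(AFTER) - dropped_packets(BEFORE)
print(delta)
```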