Network topology

                                br0
                                 +
                 vrf-h1          |          vrf-h2
                    +        +---+----+        +
                    |        |        |        |
       192.0.2.1/24 +        +        +        + 192.0.2.2/24
                  swp1     swp2     swp3     swp4
                    +        +        +        +
                    |        |        |        |
                    +--veth--+        +--veth--+
The test environment can be set up by following the guide linked here.
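Testing MAC address learning

Before the ping from swp1 to swp4 is run (written here as ping -I swp1 swp4; the exact command is in the transcript that follows), swp1 inside vrf-h1 does not know swp4's MAC address, so swp1 sends an ARP broadcast. From this ARP request br0 learns that swp1's MAC address sits behind port swp2, and it forwards the broadcast out of port swp3. swp3 is connected to swp4, so swp4 receives the ARP request and answers with an ARP reply. From this reply br0 learns that swp4's MAC address sits behind port swp3. The destination MAC of the ARP reply is swp1's, so br0 looks it up in the fdb, finds that this MAC must be sent out of port swp2, and swp1 finally receives the ARP reply. At this point the ICMP messages can be sent.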
    root@raspi:~# ip neigh flush dev swp1
    root@raspi:~# ip vrf exec vrf-h1 ping 192.0.2.2 -c 1
    PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
    64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.413 ms

    --- 192.0.2.2 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.413/0.413/0.413/0.000 ms
    root@raspi:~#

    root@raspi:~# tcpdump -nei br0
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    10:25:18.480791 92:82:b2:63:db:c1 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.2 tell 192.0.2.1, length 28
    10:25:18.480970 4a:c3:b7:bc:e3:2b > 92:82:b2:63:db:c1, ethertype ARP (0x0806), length 42: Reply 192.0.2.2 is-at 4a:c3:b7:bc:e3:2b, length 28
    10:25:18.481010 92:82:b2:63:db:c1 > 4a:c3:b7:bc:e3:2b, ethertype IPv4 (0x0800), length 98: 192.0.2.1 > 192.0.2.2: ICMP echo request, id 8, seq 1, length 64
    10:25:18.481082 4a:c3:b7:bc:e3:2b > 92:82:b2:63:db:c1, ethertype IPv4 (0x0800), length 98: 192.0.2.2 > 192.0.2.1: ICMP echo reply, id 8, seq 1, length 64
    ^C
    4 packets captured
    4 packets received by filter
    0 packets dropped by kernel
    root@raspi:~#
Next, we compare br0's fdb table before and after the ping.
    root@raspi:~# ip neigh flush dev swp1
    root@raspi:~# ip -j addr show dev swp1 | jq -r '.[].address'
    92:82:b2:63:db:c1
    root@raspi:~# ip -j addr show dev swp4 | jq -r '.[].address'
    4a:c3:b7:bc:e3:2b
    root@raspi:~# # br0 has no fdb entries for the MACs of swp1 and swp4
    root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[] | select(.mac == "92:82:b2:63:db:c1")'
    root@raspi:~# bridge -j fdb show br br0 brport swp3 | jq '.[] | select(.mac == "4a:c3:b7:bc:e3:2b")'
    root@raspi:~#
    root@raspi:~# # ping test
    root@raspi:~# ip vrf exec vrf-h1 ping 192.0.2.2 -c 1
    PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
    64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.413 ms

    --- 192.0.2.2 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.413/0.413/0.413/0.000 ms
    root@raspi:~# # two new fdb entries appear
    root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[] | select(.mac == "92:82:b2:63:db:c1")'
    {
      "mac": "92:82:b2:63:db:c1",
      "vlan": 1,
      "flags": [],
      "master": "br0",
      "state": ""
    }
    root@raspi:~# bridge -j fdb show br br0 brport swp3 | jq '.[] | select(.mac == "4a:c3:b7:bc:e3:2b")'
    {
      "mac": "4a:c3:b7:bc:e3:2b",
      "vlan": 1,
      "flags": [],
      "master": "br0",
      "state": ""
    }
    root@raspi:~# # after a while the entries age out
    root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[] | select(.mac == "92:82:b2:63:db:c1")'
    root@raspi:~# bridge -j fdb show br br0 brport swp3 | jq '.[] | select(.mac == "4a:c3:b7:bc:e3:2b")'
    root@raspi:~#
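How long "a while" is depends on the bridge's ageing time. The helper below is a minimal sketch for reading it, assuming the usual sysfs layout in which /sys/class/net/<bridge>/bridge/ageing_time holds the value in hundredths of a second. The function name bridge_ageing_seconds and the SYSFS_NET override are hypothetical and exist only for illustration and testing.

```shell
# Print the FDB ageing time of bridge $1 in seconds.
# sysfs reports the value in centiseconds (1/100 s), so divide by 100.
# SYSFS_NET is a hypothetical override that points the function at a
# fake directory tree; by default the real /sys/class/net is used.
bridge_ageing_seconds() {
    ageing_cs=$(cat "${SYSFS_NET:-/sys/class/net}/$1/bridge/ageing_time") || return 1
    echo $((ageing_cs / 100))
}

# Example (on the test host):
#   bridge_ageing_seconds br0
```

Waiting at least this long after the last frame from a given source MAC should be enough for the learned entry to disappear from the fdb.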
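A closer look

In the previous section a ping triggered address learning. In this section we use mausezahn to craft packets and observe the MAC learning process in more detail.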
Disabling address learning

We disable address learning on swp2 and observe what happens.

    root@raspi:~# bridge link set dev swp2 learning off
Use mausezahn (mz) to send, from swp1, a frame with source MAC de:ad:be:ef:13:37 and destination MAC FF:FF:FF:FF:FF:FF:
    root@raspi:~# mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip
    Mausezahn will send 1 frames... 0.00 seconds (2762 packets per second)
    root@raspi:~#
    root@raspi:~# tcpdump -nei swp2
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on swp2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    11:56:51.601352 de:ad:be:ef:13:37 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 98: 192.0.2.1 > 255.255.255.255: ip-proto-0 64
The frame arrives on swp2, yet brport swp2 of br0 has not learned a MAC entry for it.

    # not learned
    root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq -e ".[] | select(.mac == \"de:ad:be:ef:13:37\")"
    root@raspi:~#
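Code location

    /* Excerpt from net/bridge/br_input.c; unrelated lines omitted. */
    int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
    {
        /* ... */
        /* The source MAC is learned only if the ingress port has BR_LEARNING set. */
        if (p->flags & BR_LEARNING)
            br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, 0);
        /* ... */
    }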
Enabling address learning

Turn MAC address learning back on:

    root@raspi:~# bridge link set dev swp2 learning on
Again use mausezahn to send, from swp1, a frame with source MAC de:ad:be:ef:13:37 and destination MAC FF:FF:FF:FF:FF:FF:
    root@raspi:~# mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip
    Mausezahn will send 1 frames... 0.02 seconds (42 packets per second)
Use tpoint (from perf-tools) to watch the new fdb entry being added:

    root@raspi:/home/zrf/git/perf-tools/bin# ./tpoint bridge:br_fdb_update
    Tracing bridge:br_fdb_update. Ctrl-C to end.
               <...>-2411 [001] .Ns2. 11324.815010: br_fdb_update: br_dev br0 source swp2 addr de:ad:be:ef:13:37 vid 1 flags 0x0
The entry is also visible on brport swp2:

    root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[] | select(.mac == "de:ad:be:ef:13:37")'
    {
      "mac": "de:ad:be:ef:13:37",
      "vlan": 1,
      "flags": [],
      "master": "br0",
      "state": ""
    }
    root@raspi:~#
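Forwarding to a port based on fdb entries

The previous experiment showed that, thanks to MAC address learning, a frame sent from swp1 makes br0 learn that MAC de:ad:be:ef:13:37 is behind brport swp2. In this experiment swp4 sends frames with destination MAC de:ad:be:ef:13:37, and we observe under which conditions br0 forwards them.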
Turn off unknown-unicast flooding on swp2, so that a frame is forwarded to this port only when the FDB contains a matching entry:

    root@raspi:~# bridge link set dev swp2 flood off
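To confirm that the port flags changed in this walkthrough actually took effect, they can be read back from sysfs. This is a minimal sketch, assuming the standard per-port attribute files learning and unicast_flood under /sys/class/net/<port>/brport/. The function name brport_flag and the SYSFS_NET override are hypothetical, introduced only for illustration and testing.

```shell
# Print a bridge-port flag (0 = off, 1 = on) as exposed under sysfs.
#   $1  port interface, e.g. swp2
#   $2  attribute name, e.g. "learning" or "unicast_flood"
# SYSFS_NET is a hypothetical override that lets the function run
# against a fake directory tree instead of the real /sys/class/net.
brport_flag() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/brport/$2"
}

# Example (on the test host):
#   brport_flag swp2 learning
#   brport_flag swp2 unicast_flood   # expected to read 0 after "flood off"
```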
We capture on both swp2 and swp3 to see whether br0 forwards the frame sent by swp4 from swp3 to swp2.
    # swp4 sends a unicast frame
    root@raspi:~# mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip
    Mausezahn will send 1 frames... 0.02 seconds (43 packets per second)
    root@raspi:~#

    # swp3 receives the frame 72:1b:95:be:e5:9f > de:ad:be:ef:13:37
    root@raspi:/home/zrf/git/perf-tools/bin# tcpdump -nei swp3
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on swp3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    14:45:33.505397 72:1b:95:be:e5:9f > de:ad:be:ef:13:37, ethertype IPv4 (0x0800), length 98: 192.0.2.2 > 255.255.255.255: ip-proto-0 64

    # br0 does not forward it
    root@raspi:~# tcpdump -nei swp2
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on swp2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
Code location

    /* Condensed from net/bridge/br_input.c; unrelated lines omitted. */
    int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
    {
        bool mcast_hit = false;

        switch (pkt_type) {
        case BR_PKT_UNICAST:
            /* Look up the destination MAC in the FDB. */
            dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
            break;
        }

        if (dst) {
            /* FDB hit: forward to the learned port (omitted here). */
        } else {
            if (!mcast_hit) {
                /* FDB miss: this loop is the flooding logic of br_flood(), shown inline. */
                list_for_each_entry_rcu(p, &br->port_list, list) {
                    switch (pkt_type) {
                    case BR_PKT_UNICAST:
                        /* Skip ports with unknown-unicast flooding disabled. */
                        if (!(p->flags & BR_FLOOD))
                            continue;
                    }
                    prev = maybe_deliver(prev, p, skb, local_orig);
                }
            } else
                br_multicast_flood(mdst, skb, brmctx, local_rcv, false);
        }
    }
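The frame is not forwarded because unknown-unicast flooding is off on swp2 and br0 finds no fdb entry for destination MAC de:ad:be:ef:13:37 (the entry learned in the previous experiment has presumably aged out by now), so nothing is sent out of swp2.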
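The per-port decision for a unicast frame can be summarized as a small truth table. The function below is only an illustrative model of that decision, not kernel code: the name unicast_egress and its arguments are hypothetical, it ignores VLANs, local (permanent) entries and multicast, and it assumes hairpin mode is off so a frame is never sent back out of its ingress port.

```shell
# unicast_egress FDB_PORT INGRESS PORT FLOOD
#   FDB_PORT  port the destination MAC was learned on, "" on an FDB miss
#   INGRESS   port the frame arrived on
#   PORT      candidate egress port being evaluated
#   FLOOD     1 if unknown-unicast flooding is enabled on PORT, else 0
# Prints "forward" if the frame leaves through PORT, otherwise "skip".
unicast_egress() {
    fdb_port=$1 ingress=$2 port=$3 flood=$4
    if [ "$port" = "$ingress" ]; then
        echo skip                # assumption: hairpin mode is off
    elif [ -n "$fdb_port" ]; then
        # FDB hit: only the learned port receives the frame.
        if [ "$port" = "$fdb_port" ]; then echo forward; else echo skip; fi
    elif [ "$flood" = 1 ]; then
        echo forward             # FDB miss, flooding enabled on this port
    else
        echo skip                # FDB miss, flooding disabled on this port
    fi
}

# The situation in this experiment: no entry, ingress swp3, flood off on swp2.
#   unicast_egress "" swp3 swp2 0
```

With an entry for the destination MAC on swp2, the same call with swp2 as the first argument yields "forward" regardless of the flood flag.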
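If, before swp4 sends, we first send a frame with source MAC de:ad:be:ef:13:37 from swp1, br0 learns that this MAC is to be forwarded to swp2, and the frame from swp4 can then be captured on swp2.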
    # let br0 learn the forwarding entry for the MAC
    root@raspi:~# mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip
    Mausezahn will send 1 frames... 0.02 seconds (50 packets per second)
    root@raspi:~#

    # swp4 sends the frame
    root@raspi:~# mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip
    Mausezahn will send 1 frames... 0.02 seconds (44 packets per second)
    root@raspi:~#

    # br0 still holds the forwarding entry for the MAC
    root@raspi:~# bridge -j fdb show br br0 brport swp2 | jq '.[] | select(.mac == "de:ad:be:ef:13:37")'
    {
      "mac": "de:ad:be:ef:13:37",
      "vlan": 1,
      "flags": [],
      "master": "br0",
      "state": ""
    }

    # the capture shows that br0 forwarded the unicast frame to swp2
    root@raspi:~# tcpdump -nei swp2
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on swp2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    15:34:26.817210 de:ad:be:ef:13:37 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 98: 192.0.2.1 > 255.255.255.255: ip-proto-0 64
    15:34:30.559814 72:1b:95:be:e5:9f > de:ad:be:ef:13:37, ethertype IPv4 (0x0800), length 98: 192.0.2.2 > 255.255.255.255: ip-proto-0 64
Another way to verify

Put swp1 into promiscuous mode and add a filter rule that matches on the destination MAC address and drops frames sent to de:ad:be:ef:13:37.
    root@raspi:~# ip link set swp1 promisc on
    root@raspi:~# tc qdisc add dev swp1 ingress
    # requires a kernel built with CONFIG_NET_CLS_FLOWER
    root@raspi:~# tc filter add dev swp1 ingress protocol ip pref 1 handle 101 flower dst_mac de:ad:be:ef:13:37 action drop
swp4 sends a frame with destination MAC de:ad:be:ef:13:37:
    root@raspi:~# mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip
    Mausezahn will send 1 frames... 0.00 seconds (3745 packets per second)
    root@raspi:~# tcpdump -nei swp3
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on swp3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    11:37:45.825442 72:1b:95:be:e5:9f > de:ad:be:ef:13:37, ethertype IPv4 (0x0800), length 98: 192.0.2.2 > 255.255.255.255: ip-proto-0 64
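swp1 does not receive the frame sent by swp4, because unknown-unicast flooding is off on swp2. The filter's drop counter reads 9 packets at this point, which serves as the baseline for the next step: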
    root@raspi:~# tc -s filter show dev swp1 ingress
    filter protocol ip pref 1 flower chain 0
    filter protocol ip pref 1 flower chain 0 handle 0x65
      dst_mac de:ad:be:ef:13:37
      eth_type ipv4
      not_in_hw
            action order 1: gact action drop
             random type none pass val 0
             index 1 ref 1 bind 1 installed 15352 sec used 224 sec firstused 14312 sec
            Action statistics:
            Sent 756 bytes 9 pkt (dropped 9, overlimits 0 requeues 0)
            backlog 0b 0p requeues 0
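Now let swp1 send a frame with source MAC de:ad:be:ef:13:37 so that br0 learns the forwarding entry for this MAC. If swp4 then sends a frame with destination MAC de:ad:be:ef:13:37 before the entry ages out, br0 forwards it, and on swp1 we can see one more packet dropped by the tc rule.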
    root@raspi:~# mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip
    Mausezahn will send 1 frames... 0.02 seconds (44 packets per second)
    root@raspi:~# mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip
    Mausezahn will send 1 frames... 0.02 seconds (44 packets per second)

    # 10 packets dropped: one more frame reached swp1 and matched the rule
    root@raspi:~# tc -s filter show dev swp1 ingress
    filter protocol ip pref 1 flower chain 0
    filter protocol ip pref 1 flower chain 0 handle 0x65
      dst_mac de:ad:be:ef:13:37
      eth_type ipv4
      not_in_hw
            action order 1: gact action drop
             random type none pass val 0
             index 1 ref 1 bind 1 installed 15558 sec used 9 sec firstused 14518 sec
            Action statistics:
            Sent 840 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
            backlog 0b 0p requeues 0
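Comparing the counter by eye works, but the before/after check can also be scripted. The sketch below extracts the "dropped" value from the tc statistics output, assuming the "Sent ... (dropped N, ...)" line format shown in the transcripts here. The function name tc_ingress_dropped is hypothetical, and only the first matching statistics line is used, which is sufficient when a single filter is installed as in this setup.

```shell
# Print the "dropped" counter from the ingress filter statistics of
# device $1. Relies on the "(dropped N, ..." text in `tc -s` output
# and takes the first such line only.
tc_ingress_dropped() {
    tc -s filter show dev "$1" ingress |
        sed -n 's/.*(dropped \([0-9][0-9]*\),.*/\1/p' |
        head -n 1
}

# Example (on the test host):
#   before=$(tc_ingress_dropped swp1)
#   mausezahn swp1 -c 1 -p 64 -a de:ad:be:ef:13:37 -t ip   # teach br0 the MAC
#   mausezahn swp4 -c 1 -p 64 -b de:ad:be:ef:13:37 -t ip   # frame to be forwarded
#   after=$(tc_ingress_dropped swp1)
#   echo "new drops on swp1: $((after - before))"
```

A difference of 1 corresponds to the forwarded case, and a difference of 0 to the case where br0 has no entry and flooding is off on swp2.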