NSX-T East-West Routing

Network · published 4 months ago · 刘丰源

This post briefly walks through the east-west path taken when two logical switches communicate through a Tier-1 logical router.

The lab topology is as follows:
[Figure: NSX-T east-west routing lab topology]
VM t1 has IP 6.6.100.11 and VM t2 has IP 6.6.200.12.

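Ports on an N-VDS or DVS are isolated from one another by Geneve VNI, so a single Geneve VNI identifies one logical switch/segment. The Geneve VNIs of the two logical switches in my environment are:
[Figure: Geneve VNIs of the two logical switches]
ls-geneve-100 has VNI 65537 and ls-geneve-200 has VNI 65536.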

The VNIs can be checked with net-vdl2 -l:

[root@esxi-01:~] net-vdl2 -l
Global States:
    Control Plane Out-Of-Sync:  No
    VXLAN UDP Port: 4789
    Geneve UDP Port:    6081
NSX VDS:    DSwitch
    VDS ID: 50 02 70 16 c2 cd 74 37-fb a6 ff 0b 1b cd 0e ee
    MTU:    1600
    Segment ID: 10.10.10.0
    Transport VLAN ID:  300
    VTEP Count: 1
    CDO status: enabled (deactivated)
        VTEP Interface: vmk10
            DVPort ID:  b58c174b-a07f-43a6-b0ca-7830de39f50f
            Switch Port ID: 67108877
            Endpoint ID:    0
            VLAN ID:    300
            Label:      10292
            Uplink Port ID: 2214592537
            Is Uplink Port LAG: No
            IP:     10.10.10.101
            Netmask:    255.255.255.0
            Segment ID: 10.10.10.0
            GW IP:      10.10.10.1
            GW MAC:     ff:ff:ff:ff:ff:ff
            IP Acquire Timeout: 0
            Multicast Group Count:  0
            Is DRVTEP:  Yes
    Network Count:  3
        Logical Network:    65538
            Routing Domain: 00000000-0000-0000-0000-000000000000
            Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
            Replication Mode:   Source Unicast
            Control Plane:  Enabled (Multicast Proxy,ARP proxy)
            Controller: 10.44.205.85 (up)
            MAC Entry Count:    0
            ARP Entry Count:    0
            Port Count: 1
        Logical Network:    65537
            Routing Domain: 98334210-1ec6-4176-a718-581908b718c5
            Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
            Replication Mode:   MTEP Unicast
            Control Plane:  Enabled (Multicast Proxy,ARP proxy)
            Controller: 10.44.205.85 (up)
            MAC Entry Count:    0
            ARP Entry Count:    1
            Port Count: 2
        Logical Network:    65536
            Routing Domain: 98334210-1ec6-4176-a718-581908b718c5
            Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
            Replication Mode:   MTEP Unicast
            Control Plane:  Enabled (Multicast Proxy,ARP proxy)
            Controller: 10.44.205.85 (up)
            MAC Entry Count:    0
            ARP Entry Count:    0
            Port Count: 2
    Routing Domain Count:   2
        Routing DomainID:   00000000-0000-0000-0000-000000000000
        Routing DomainID:   98334210-1ec6-4176-a718-581908b718c5
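
To make the VNI-to-routing-domain relationship in this output easier to see, here is a minimal Python sketch that parses an excerpt of it. The function name parse_logical_networks and the trimmed SAMPLE text are my own for illustration; the field names are taken from the output above, and other NSX-T versions may print them differently.

```python
import re

# Excerpt of the `net-vdl2 -l` output above (only the fields used here).
SAMPLE = """\
        Logical Network:    65538
            Routing Domain: 00000000-0000-0000-0000-000000000000
            Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
            Port Count: 1
        Logical Network:    65537
            Routing Domain: 98334210-1ec6-4176-a718-581908b718c5
            Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
            Port Count: 2
        Logical Network:    65536
            Routing Domain: 98334210-1ec6-4176-a718-581908b718c5
            Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
            Port Count: 2
"""


def parse_logical_networks(text):
    """Return {vni: {"routing_domain": str, "port_count": int}}."""
    networks = {}
    current = None  # VNI of the "Logical Network" block being read
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Logical Network:\s+(\d+)$", line)
        if m:
            current = int(m.group(1))
            networks[current] = {}
            continue
        if current is None:
            continue
        # re.match anchors at the start of the line, so the
        # "Multicast Routing Domain:" line is not picked up here.
        m = re.match(r"Routing Domain:\s+(\S+)$", line)
        if m:
            networks[current]["routing_domain"] = m.group(1)
            continue
        m = re.match(r"Port Count:\s+(\d+)$", line)
        if m:
            networks[current]["port_count"] = int(m.group(1))
    return networks


nets = parse_logical_networks(SAMPLE)
for vni, info in sorted(nets.items()):
    print(vni, info["routing_domain"], info["port_count"])
```

In this output, VNIs 65536 and 65537 share the routing domain 98334210-1ec6-4176-a718-581908b718c5, the same UUID that the DR reports later on.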

List the DVS ports on ESXi-01:

[root@esxi-01:~] nsxdp-cli vswitch instance list
DvsPortset-1 (DSwitch)           50 02 70 16 c2 cd 74 37-fb a6 ff 0b 1b cd 0e ee
Total Ports:2560 Available:2540
  Client                         PortID          DVPortID                             MAC                  Uplink
  Management                     67108868                                             00:00:00:00:00:00    n/a
  vmnic0                         2214592520      10                                   00:00:00:00:00:00
  Shadow of vmnic0               67108873                                             00:50:56:5c:37:04    n/a
  vmk0                           67108876        1                                    00:50:56:b1:59:3e    vmnic0
  vmk10                          67108877        b58c174b-a07f-43a6-b0ca-7830de39f50f 00:50:56:69:15:41    vmnic1
  vmk50                          67108878        8b2a4724-274f-46d0-a99b-580352399aa9 00:50:56:61:3f:85    void
  vdr-vdrPort                    67108883        vdrPort                              02:50:56:56:44:52    vmnic1
  spf-spfPort                    67108886        spfPort50027016c2cd7437              02:50:56:56:45:52    vmnic1
  vmnic1                         2214592537      11                                   00:00:00:00:00:00
  Shadow of vmnic1               67108890                                             00:50:56:5f:1e:d7    n/a
  t1.eth0                        67108910        e25a8fa7-0c21-4dae-b252-6d22ef33c1c5 00:50:56:82:70:f0    vmnic1
  t3.eth0                        67108917        c932ef38-c49f-4e28-8672-6ca34db2b38c 00:50:56:82:a0:05    vmnic1

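As shown, all logical switch ports attach to the same virtual switch. A logical router consists of an SR (Service Router) and a DR (Distributed Router). The DR is distributed across the transport nodes of the relevant transport zone, while the SR is deployed on Edge nodes. The vdrPort switch port above is where the DR instance on the ESXi host attaches to the virtual switch. It can be thought of as a trunk port that carries traffic for the broadcast domains of all logical switches.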

Note that the vdrPort MAC address is identical on every transport node; by default it is 02:50:56:56:44:52.

View the DR on host ESXi-01:

[root@esxi-01:~] nsxcli -c get logical-routers
Tue Nov 22 2022 UTC 03:53:42.083
                                  Logical Routers Summary
 ------------------------------------------------------------------------------------------
               VDR UUID                LIF num  Route num  Max Neighbors  Current Neighbors
 98334210-1ec6-4176-a718-581908b718c5     2         2          50000              3

Next, view the DR's interfaces:

[root@esxi-01:~] nsxcli -c get logical-router 98334210-1ec6-4176-a718-581908b718c5 interfaces
Tue Nov 22 2022 UTC 03:57:19.784
                         Logical Router Interfaces
---------------------------------------------------------------------------
IPv6 DAD Status Legend:  [A: DAD_Sucess], [F: DAD_Duplicate], [T: DAD_Tentative], [U: DAD_Unavailable]

LIF UUID                 : 39c68523-a185-49a7-9f86-4792e6696a8f
Mode                     : [b'Routing']
Overlay VNI              : 65536
IP/Mask                  : 6.6.200.1/24
Mac                      : 02:50:56:56:44:52
Connected DVS            : DSwitch
Control plane enable     : True
Replication Mode         : 0.0.0.1
Multicast Routing        : [b'Enabled', b'Oper Down']
State                    : [b'Enabled']
Flags                    : 0x80388
DHCP relay               : Not enable
DAD-mode                 : ['LOOSE']
RA-mode                  : ['UNKNOWN']

LIF UUID                 : 4adea6ee-5dbf-4ff8-8fa4-6670bb70982f
Mode                     : [b'Routing']
Overlay VNI              : 65537
IP/Mask                  : 6.6.100.1/24
Mac                      : 02:50:56:56:44:52
Connected DVS            : DSwitch
Control plane enable     : True
Replication Mode         : 0.0.0.1
Multicast Routing        : [b'Enabled', b'Oper Down']
State                    : [b'Enabled']
Flags                    : 0x80388
DHCP relay               : Not enable
DAD-mode                 : ['LOOSE']
RA-mode                  : ['UNKNOWN']

Both interfaces, 6.6.100.1 and 6.6.200.1, have the MAC address 02:50:56:56:44:52.

Now let's trace the network path from 6.6.100.11 to 6.6.200.12.

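Clear the ARP cache on t1, then ping VM t2. Because the destination IP 6.6.200.12 is not in t1's subnet, t1 first sends an ARP request to resolve the MAC address of its gateway, 6.6.100.1.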
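
The on-link decision that t1 makes here can be sketched in a few lines of Python. This is only an illustration: the helper name arp_target is mine, and the /24 prefix on t1 is an assumption taken from the mask of the 6.6.100.1/24 LIF.

```python
import ipaddress


def arp_target(src_interface, dst_ip, gateway_ip):
    """Return the IP address a host must ARP for before sending to dst_ip.

    If dst_ip is inside the host's own subnet, the host ARPs for it directly;
    otherwise it ARPs for its default gateway.
    """
    iface = ipaddress.ip_interface(src_interface)
    if ipaddress.ip_address(dst_ip) in iface.network:
        return dst_ip
    return gateway_ip


# t1 pinging t2; the /24 prefix is assumed from the LIF mask.
target = arp_target("6.6.100.11/24", "6.6.200.12", "6.6.100.1")
print(target)
```

For the addresses in this lab the result is the gateway 6.6.100.1, which is exactly the address requested in the ARP capture.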

We capture packets on t1.eth0, vdrPort, and the uplink.

The ARP request is captured only on the t1.eth0 port:

[root@esxi-01:~] pktcap-uw --switchport 67108910 --dir 2 -o - | tcpdump-uw -ner -
The switch port id is 0x0400002e.
pktcap: The output file is -.
pktcap: No server port specifed, select 7799 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 7799.
reading from file -, link-type EN10MB (Ethernet)
pktcap: Accept...
pktcap: Vsock connection from port 1096 cid 2.
11:45:24.879494 00:50:56:82:70:f0 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 6.6.100.1 tell 6.6.100.11, length 46
11:45:24.879541 02:50:56:56:44:52 > 00:50:56:82:70:f0, ethertype ARP (0x0806), length 60: Reply 6.6.100.1 is-at 02:50:56:56:44:52, length 46
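
pktcap-uw echoes the switch port ID in hexadecimal, while the port list earlier shows it in decimal. A small Python helper (the name port_id_hex is mine) makes it easy to cross-check the two. The zero-padded 8-digit format is an assumption based on the single value 0x0400002e printed above.

```python
def port_id_hex(port_id: int) -> str:
    """Format a decimal switch port ID as zero-padded 32-bit hex."""
    return f"0x{port_id:08x}"


# PortIDs copied from the `nsxdp-cli vswitch instance list` output.
ports = {
    "t1.eth0": 67108910,
    "vdr-vdrPort": 67108883,
    "vmnic1": 2214592537,
}
for name, pid in ports.items():
    print(f"{name:12} {pid:>10} {port_id_hex(pid)}")
```

The t1.eth0 entry comes out as 0x0400002e, matching the "The switch port id is" line in the capture.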

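My guess is that the virtual switch layer answers ARP on behalf of the virtual subnet's gateway, so traffic sent to the gateway is steered to the local host's vdrPort. Although the vdrPort MAC address is the same on every ESXi host, there is no conflict, because such ARP requests are never sent to other ESXi hosts.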
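
The guess above can be written down as a toy model in Python. It is not NSX-T's actual implementation, only a sketch of the behaviour the capture suggests: ARP requests for a gateway (LIF) address are answered on the local host with the shared vdrPort MAC and go no further. All names here are hypothetical, and what happens to other ARP requests is deliberately not modelled.

```python
VDR_MAC = "02:50:56:56:44:52"          # vdrPort MAC, same on every host
LIF_IPS = {"6.6.100.1", "6.6.200.1"}   # gateway addresses from the DR's LIFs


def handle_arp_request(target_ip, sender_mac):
    """Toy model: decide how the local host treats an ARP request.

    Returns (action, reply). Requests for a LIF address are answered
    locally; anything else is left to normal switch processing.
    """
    if target_ip in LIF_IPS:
        reply = {
            "src_mac": VDR_MAC,
            "dst_mac": sender_mac,
            "ip": target_ip,
            "is_at": VDR_MAC,
        }
        return "reply_locally", reply
    return "normal_processing", None


action, reply = handle_arp_request("6.6.100.1", "00:50:56:82:70:f0")
print(action, reply)
```

The reply produced for t1's request has the same source MAC, destination MAC, and is-at value as the ARP reply captured on t1.eth0.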

Next, run a continuous ping from VM t1 to t2 and capture on the vdrPort of both ESXi-01 and ESXi-02.

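On the vdrPort of ESXi-01, where the sender t1 runs, each echo request appears twice (once as sent by t1 to the gateway MAC, and once after routing, addressed to t2's MAC), but no echo replies appear: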

12:17:25.381567 00:50:56:82:70:f0 > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 11, length 64
12:17:25.381593 02:50:56:56:44:52 > 00:50:56:82:a6:ae, ethertype IPv4 (0x0800), length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 11, length 64
12:17:26.382613 00:50:56:82:70:f0 > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 12, length 64
12:17:26.382645 02:50:56:56:44:52 > 00:50:56:82:a6:ae, ethertype IPv4 (0x0800), length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 12, length 64

On the vdrPort of ESXi-02, where VM t2 runs, only the replies appear:

12:17:25.588603 00:50:56:82:a6:ae > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 6.6.200.12 > 6.6.100.11: ICMP echo reply, id 9652, seq 11, length 64
12:17:25.588627 02:50:56:56:44:52 > 00:50:56:82:70:f0, ethertype IPv4 (0x0800), length 98: 6.6.200.12 > 6.6.100.11: ICMP echo reply, id 9652, seq 11, length 64
12:17:26.590845 00:50:56:82:a6:ae > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 6.6.200.12 > 6.6.100.11: ICMP echo reply, id 9652, seq 12, length 64
12:17:26.590873 02:50:56:56:44:52 > 00:50:56:82:70:f0, ethertype IPv4 (0x0800), length 98: 6.6.200.12 > 6.6.100.11: ICMP echo reply, id 9652, seq 12, length 64

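So routing is performed by the DR instance on the sending host. Once the packet reaches the destination host, it is simply decapsulated and delivered to the destination VM.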
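
The MAC rewrite visible in the two vdrPort captures can be reproduced with a small Python model of the source-side DR. This is a simplified sketch, not NSX-T code: the LIF table comes from the interface listing, the neighbor entries are the MACs seen in the captures, and the function name dr_route is hypothetical. TTL handling and Geneve encapsulation are left out.

```python
import ipaddress

VDR_MAC = "02:50:56:56:44:52"

# Connected networks of the DR, from its two LIFs.
LIFS = [
    {"vni": 65537, "net": ipaddress.ip_network("6.6.100.0/24")},
    {"vni": 65536, "net": ipaddress.ip_network("6.6.200.0/24")},
]

# IP-to-MAC neighbor entries, taken from the packet captures.
NEIGHBORS = {
    "6.6.100.11": "00:50:56:82:70:f0",  # t1
    "6.6.200.12": "00:50:56:82:a6:ae",  # t2
}


def dr_route(frame):
    """Route a frame on the local DR: pick the egress LIF, rewrite the MACs."""
    dst = ipaddress.ip_address(frame["dst_ip"])
    for lif in LIFS:
        if dst in lif["net"]:
            return {
                **frame,
                "src_mac": VDR_MAC,                    # DR becomes the L2 source
                "dst_mac": NEIGHBORS[frame["dst_ip"]],  # destination VM's MAC
                "vni": lif["vni"],                      # egress segment
            }
    raise LookupError(f"no connected LIF for {frame['dst_ip']}")


request_in = {
    "src_mac": "00:50:56:82:70:f0",
    "dst_mac": VDR_MAC,
    "src_ip": "6.6.100.11",
    "dst_ip": "6.6.200.12",
    "vni": 65537,
}
request_out = dr_route(request_in)
print(request_out)
```

The routed frame leaves with source MAC 02:50:56:56:44:52 and destination MAC 00:50:56:82:a6:ae on VNI 65536, which is the second line of each request pair in the ESXi-01 capture. Swapping the roles gives the reply pair seen on ESXi-02.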

The overall path is shown below:
[Figure: overall east-west packet path]
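The vdrPort MAC address is the same on every ESXi host, and the vdrPort can receive packets from the physical network that the uplink connects to. Normally this MAC address is not exposed to the physical network. However, when an uplink on the virtual switch goes down and a standby uplink takes over, ESXi broadcasts Reverse ARP frames to tell the physical switch that these MACs now sit behind that port, which exposes the vdrPort MAC address to the physical network. For example: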

14:53:52.919368 00:50:56:6c:e2:6a > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 00:50:56:6c:e2:6a tell 00:50:56:6c:e2:6a, length 46
14:53:52.919379 02:50:56:56:44:52 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 02:50:56:56:44:52 tell 02:50:56:56:44:52, length 46
14:53:52.919397 00:50:56:6c:e2:6a > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 00:50:56:6c:e2:6a tell 00:50:56:6c:e2:6a, length 46
14:53:52.919397 00:50:56:6c:e2:6a > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 00:50:56:6c:e2:6a tell 00:50:56:6c:e2:6a, length 46
14:53:52.919406 00:50:56:53:71:23 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 00:50:56:53:71:23 tell 00:50:56:53:71:23, length 46
14:53:52.919409 2c:f0:5d:1d:b0:41 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 2c:f0:5d:1d:b0:41 tell 2c:f0:5d:1d:b0:41, length 46

When different uplinks fail and several ESXi hosts switch to different uplinks, this MAC shows up on different physical switch ports, so the switch may raise mac-address flapping alarms.
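
Why the physical switch complains can be illustrated with a short Python sketch of MAC learning. The switch port names (Gi1/0/1, Gi1/0/7) and the event list are made up for the example; only the MAC addresses come from the capture above.

```python
from collections import defaultdict


def find_flapping(events):
    """Return {mac: [ports]} for every MAC learned on more than one port.

    `events` is an iterable of (source_mac, switch_port) pairs in the
    order the switch saw them.
    """
    seen = defaultdict(list)
    for mac, port in events:
        if port not in seen[mac]:
            seen[mac].append(port)
    return {mac: ports for mac, ports in seen.items() if len(ports) > 1}


# Hypothetical learn events after two hosts fail over to standby uplinks.
events = [
    ("02:50:56:56:44:52", "Gi1/0/1"),  # RARP from host A, carrying the vdrPort MAC
    ("00:50:56:6c:e2:6a", "Gi1/0/1"),  # a per-host MAC stays on a single port
    ("02:50:56:56:44:52", "Gi1/0/7"),  # RARP from host B, same vdrPort MAC
]
flaps = find_flapping(events)
print(flaps)
```

Only the shared vdrPort MAC is reported, because it is the one address that every host announces.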
