Troubleshooting a sudden traffic drop on RouterOS

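The site uses a 500M China Mobile line. After the RouterOS (ROS) server went live, its 82580 network ports were bonded (port aggregation), the S5700 switch was connected to ROS through an Eth-Trunk, and port 22 of the S5700 (GE0/0/22) was connected to China Mobile. Once in service, traffic on both the China Mobile line and the downstream link to the BRAS would momentarily drop to 0 and then quickly recover. Downgrading ROS from 6.22 to 5.25 made no difference.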

By chance I noticed that the Huawei S5700 switch was reporting MAC address flapping (MAC move) alarms. The messages were as follows:

#Dec 19 2014 14:04:39+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, Flapping port = GE0/0/22. Please check the network accessed to flapping port.
#Dec 19 2014 14:03:55+08:00 QZ_WAN1_S5700 DS/4/DATASYNC_CFGCHANGE:OID 1.3.6.1.4.1.2011.5.25.191.3.1 configurations have been changed. The current change number is 125, the change loop count is 0, and the maximum number of records is 4095.
#Dec 19 2014 13:54:39+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, Flapping port = GE0/0/22. Please check the network accessed to flapping port.
#Dec 19 2014 13:44:39+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, Flapping port = GE0/0/22. Please check the network accessed to flapping port.
#Dec 19 2014 13:34:39+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, Flapping port = GE0/0/22. Please check the network accessed to flapping port.
#Dec 19 2014 13:24:39+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, Flapping port = GE0/0/22. Please check the network accessed to flapping port.
#Dec 19 2014 13:23:35+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, Flapping port = GE0/0/22. Please check the network accessed to flapping port.
#Dec 19 2014 13:14:39+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, Flapping port = GE0/0/22. Please check the network accessed to flapping port.
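
To see at a glance which MAC address is flapping between which ports, alarm lines like those above can be tallied with a short script. The following is a minimal Python sketch, not a Huawei tool: the regular expression is written against the exact message format shown here, and `summarize_mac_moves` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "MAC move detected" alarm text in the format shown above.
MAC_MOVE = re.compile(
    r"MAC move detected, VlanId = (?P<vlan>\d+), "
    r"MacAddress = (?P<mac>[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}), "
    r"Original-Port = (?P<orig>[^,\s]+), "
    r"Flapping port = (?P<flap>[^.\s]+)\."
)


def summarize_mac_moves(log_text):
    """Count alarms per (vlan, mac, original port, flapping port)."""
    counts = Counter()
    for m in MAC_MOVE.finditer(log_text):
        counts[(int(m["vlan"]), m["mac"], m["orig"], m["flap"])] += 1
    return counts


# Three lines copied from the switch log above; only two are MAC move alarms.
sample_log = "\n".join([
    "#Dec 19 2014 14:04:39+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:"
    "OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, "
    "MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, "
    "Flapping port = GE0/0/22. Please check the network accessed to flapping port.",
    "#Dec 19 2014 14:03:55+08:00 QZ_WAN1_S5700 DS/4/DATASYNC_CFGCHANGE:"
    "OID 1.3.6.1.4.1.2011.5.25.191.3.1 configurations have been changed.",
    "#Dec 19 2014 13:54:39+08:00 QZ_WAN1_S5700 L2IFPPI/4/MFLPVLANALARM:"
    "OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 202, "
    "MacAddress = 001b-cd03-23a0, Original-Port = Eth-Trunk1, "
    "Flapping port = GE0/0/22. Please check the network accessed to flapping port.",
])

moves = summarize_mac_moves(sample_log)
for (vlan, mac, orig, flap), n in moves.items():
    print(f"vlan {vlan} mac {mac}: {orig} -> {flap}, {n} alarm(s)")
```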

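The MAC address of the ROS network card was drifting onto the fiber interface and VLAN coming from China Mobile, which is clearly abnormal. A switch learns from the source address and forwards by the destination address, so once a frame carrying the ROS MAC as its source arrives on GE0/0/22, the switch relearns that MAC on the carrier-facing port and sends traffic destined for ROS the wrong way until ROS transmits again. If the ROS MAC address is floating around inside China Mobile's core network, traffic will inevitably fail to reach ROS or fail to get out. My guess was that a loop exists somewhere in the carrier's network.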
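
The learn-by-source, forward-by-destination behaviour can be illustrated with a toy model. This is a simplified Python sketch of a learning switch, not the S5700's actual implementation: the `LearningSwitch` class and the gateway MAC `00e0-fc00-0001` are assumptions made up for the illustration, while the ROS MAC, VLAN and port names come from the alarms above.

```python
class LearningSwitch:
    """Toy L2 switch: learn source MACs per VLAN, forward by destination MAC."""

    def __init__(self):
        self.mac_table = {}  # (vlan, mac) -> port
        self.moves = []      # recorded MAC moves: (mac, old_port, new_port)

    def receive(self, port, vlan, src_mac, dst_mac):
        key = (vlan, src_mac)
        old_port = self.mac_table.get(key)
        if old_port is not None and old_port != port:
            self.moves.append((src_mac, old_port, port))  # MAC move detected
        self.mac_table[key] = port  # learn (or relearn) the source address

        out_port = self.mac_table.get((vlan, dst_mac))
        if out_port is None:
            return "flood"          # unknown destination
        if out_port == port:
            return "drop"           # never forward back out the ingress port
        return out_port


ROS_MAC = "001b-cd03-23a0"   # from the alarm log
GW_MAC = "00e0-fc00-0001"    # hypothetical carrier gateway MAC

sw = LearningSwitch()
results = [
    sw.receive("Eth-Trunk1", 202, ROS_MAC, GW_MAC),  # ROS sends: learned on Eth-Trunk1
    sw.receive("GE0/0/22", 202, GW_MAC, ROS_MAC),    # reply reaches ROS normally
    sw.receive("GE0/0/22", 202, ROS_MAC, GW_MAC),    # looped frame returns from carrier
    sw.receive("GE0/0/22", 202, GW_MAC, ROS_MAC),    # reply can no longer reach ROS
]
print(results)
print(sw.moves)
```

In this model the fourth frame is dropped because the ROS MAC now points at the carrier port, which matches the brief loss of traffic followed by recovery once ROS sends its next frame.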

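A Huawei engineer supplied a traffic statistics configuration to check whether frames whose source address is the ROS network card's MAC were arriving at the local S5700 on the China Mobile interface (aaaa-bbbb-cccc in the configuration is a placeholder for that MAC address):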

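Traffic statistics are configured on the device, matching the two addresses 172.16.111.124 and 172.16.100.139. If the device is dropping packets, the traffic statistics function lets us confirm on which device the packets are being lost.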

acl number 4001
rule permit source-mac aaaa-bbbb-cccc
#
traffic classifier c2 operator and
if-match acl 4001
#
traffic behavior b2
statistic enable
#
traffic policy p2
classifier c2 behavior b2
#
interface GigabitEthernet0/0/22
traffic-policy p2 inbound

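Command to clear the interface statistics counters (the interface shown is GigabitEthernet 0/0/12; use the interface the policy is applied to, GigabitEthernet 0/0/22 in this setup):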

reset traffic policy statistics interface  GigabitEthernet 0/0/12 inbound

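The query command below confirmed that abnormal frames were indeed arriving on the China Mobile port, so we reported the fault to China Mobile. China Mobile has been slow to act, and the problem is still being resolved.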

display traffic policy  statistics interface  GigabitEthernet 0/0/12 inbound verbose rule-base
[QZ_WAN1_S5700-GigabitEthernet0/0/21]display traffic policy  statistics interface GigabitEthernet 0/0/22 inbound verbose  rule-base

Interface: GigabitEthernet0/0/22
Traffic policy inbound: p2
Rule number: 1
Current status: OK!
---------------------------------------------------------------------
Classifier: c2 operator and
Behavior: b2
Board : 0
rule 5 permit source-mac 001b-cd03-23a0
Passed Packet 442,Passed Bytes -
Dropped Packet 0,Dropped Bytes -
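
When the counters are checked repeatedly, it can help to pull the numbers out of the command output programmatically. The snippet below is a small Python sketch that assumes the output has already been captured as text (for example, pasted from a terminal session). `parse_policy_counters` is a hypothetical helper, and the pattern is written only against the output format shown above.

```python
import re

# Matches "Passed Packet 442,..." and "Dropped Packet 0,..." lines.
COUNTER_LINE = re.compile(r"^\s*(Passed|Dropped) Packet\s+(\d+)", re.MULTILINE)


def parse_policy_counters(output):
    """Return {'passed': n, 'dropped': n} from the statistics output text."""
    return {kind.lower(): int(count) for kind, count in COUNTER_LINE.findall(output)}


# Output lines copied from the display command above.
captured = "\n".join([
    "Interface: GigabitEthernet0/0/22",
    "Traffic policy inbound: p2",
    "rule 5 permit source-mac 001b-cd03-23a0",
    "Passed Packet 442,Passed Bytes -",
    "Dropped Packet 0,Dropped Bytes -",
])

counters = parse_policy_counters(captured)
if counters.get("passed", 0) > 0:
    print("frames with the ROS source MAC arrived on the carrier port:", counters["passed"])
```

Any non-zero passed count on the inbound direction of GE0/0/22 means frames carrying the ROS source MAC came in from the carrier side, which should not happen in a loop-free network.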