Output Drops on Interface/s on Cisco Catalyst 3650
=====================
+ Problem Description:
=====================
The customer reports output drops in the traffic flow and suspects that a Cisco Catalyst 3650 switch is causing them.
=====================
+ Action Plan:
=====================
- Understand the issue better by restating the problem from a technical point of view and by collecting additional information that will help troubleshoot it:
- Is this a new implementation?
- Is this a standalone or stack setup?
- What are the interface/s IDs that show such behavior (output drops)?
- In addition, we are going to collect the following outputs from the switch(es):
- Show tech
- Show platform hard fed switch [#] qos queue config interface [interface_ID]
- Show platform hard fed switch [#] qos queue stats interface [interface_ID]
=========================================================================
- The answers to these questions help us identify the environment in which we are working.
- Show tech helps identify the software version running on the switch and whether the output drops are related to an error recorded in the logs or to a hardware failure.
- In (show platform hard fed switch [#] qos queue config interface [interface_ID]), # is the number of the switch in the stack; if the switch is running as a standalone, always use 1. The interface_ID must be replaced with the full name of the interface (Gig x/y or Gig x/y/z). The output shows whether quality of service is applied on the interface and how it is configured.
- In (show platform hard fed switch [#] qos queue stats interface [interface_ID]), # and interface_ID are filled in the same way as for the previous command. The output helps us identify the queueing system and which queue is dropping traffic.
=========================================================================
- After checking the output, if we observe output drops on one of the queues, we need to provide the customer with the following configuration, to be applied on the switch(es) to mitigate the output drops:
Switch# conf t
Switch(config)# qos queue-stats-frame-count
Switch(config)# exit
Switch# clear counters
- After implementing the above commands, we will keep the switch(es) under monitoring for 24 hours. Afterward, we will collect the following outputs to confirm whether the output drops are still observed:
Switch# show interface
Switch# show platform hard fed switch [#] qos queue config interface [interface_ID]
Switch# show platform hard fed switch [#] qos queue stats interface [interface_ID]
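The before/after comparison of the interface counters can be scripted. The following is a minimal Python sketch, assuming the `show interface` output has been saved as text and contains the usual "Total output drops: N" field. The function names and the sample strings are hypothetical, and the excerpts are illustrative, not real device output.

```python
import re

# Matches the "Total output drops: N" field printed by `show interface`.
DROPS_RE = re.compile(r"Total output drops:\s*(\d+)")


def total_output_drops(show_interface_text: str) -> int:
    """Return the 'Total output drops' counter found in the given text."""
    match = DROPS_RE.search(show_interface_text)
    if match is None:
        raise ValueError("no 'Total output drops' field found")
    return int(match.group(1))


def drops_since_clear(before: str, after: str) -> int:
    """Difference between two snapshots of the same interface."""
    return total_output_drops(after) - total_output_drops(before)


# Illustrative excerpts (assumed values), taken right after `clear counters`
# and again after the 24-hour monitoring window.
before = "Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0"
after = "Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 1532"

print(drops_since_clear(before, after))
```

A non-zero difference after the monitoring window means the drops are still occurring and the next step (SPAN capture) applies.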
- If output drops are still observed after implementing the configuration, we will collect a SPAN capture on the interface that experiences the drops:
Switch# conf t
Switch(config)# no monitor session all
Switch(config)# monitor session [session_number] source interface [interface_ID] [both | rx | tx]
Switch(config)# monitor session [session_number] destination interface [interface_ID]
- The session number must be the same for both the source and the destination, and the source interface must be different from the destination interface. Finally, connect a laptop with Wireshark installed to the destination interface to capture the traffic.
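The two rules above (same session number, different interfaces) can be checked before the commands are sent to the customer. This is a small Python sketch that only renders the two `monitor session` lines shown earlier; the helper name `span_commands` and the interface names in the example are hypothetical.

```python
def span_commands(session: int, source: str, destination: str,
                  direction: str = "both") -> list[str]:
    """Render the SPAN source/destination commands for one session."""
    if direction not in ("both", "rx", "tx"):
        raise ValueError("direction must be 'both', 'rx' or 'tx'")
    # The source and destination interfaces must be different.
    if source.strip().lower() == destination.strip().lower():
        raise ValueError("source and destination interface must differ")
    # Using one `session` value for both lines keeps the numbers identical.
    return [
        f"monitor session {session} source interface {source} {direction}",
        f"monitor session {session} destination interface {destination}",
    ]


# Example with assumed interface names.
for line in span_commands(1, "Gi1/0/1", "Gi1/0/24", "tx"):
    print(line)
```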
- After collecting the capture, we should determine from the capture file if the output drops are being observed due to a burst of traffic:
- Capture the right data. For example, if you are seeing drops in a particular class, capture data for only that class, or use the proper Wireshark filter to isolate the packets that should be hitting the class under investigation.
- Once you have the proper packets identified, go to Statistics > I/O Graph in Wireshark.
- Click on I/O Graph, and a window pops up showing packets per second.
In the example graph, there is a very clear spike in packets per second, going from fewer than 100 to 700 packets within seconds. The “Interval” setting at the bottom can be changed to get more granular data in milliseconds, which shows how many packets were seen per millisecond.
At that granularity, the graph shows a few fairly long bars where more than 30 packets arrived per millisecond. If the CIR on the router allowed 10 packets in that interval and the buffer length was 5 packets, then at least 15 packets (30 - 10 - 5) were dropped in that interval.
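The same estimate can be reproduced outside Wireshark. The sketch below is a simplified model, assuming packet timestamps exported as integer microseconds and a buffer that is empty at the start of each interval; the function names and the sample timestamps are hypothetical and only mirror the 30 / 10 / 5 example above.

```python
from collections import Counter


def packets_per_interval(timestamps_us, interval_us):
    """Count packets per interval bucket, like the bars of an I/O graph."""
    return Counter(ts // interval_us for ts in timestamps_us)


def estimated_drops(packets, rate_per_interval, buffer_packets):
    """Packets exceeding what the rate plus the buffer can absorb."""
    return max(0, packets - rate_per_interval - buffer_packets)


# Assumed sample: 30 packets in the first millisecond, 4 in the second.
timestamps_us = [i * 30 for i in range(30)] + [1000 + i * 200 for i in range(4)]

counts = packets_per_interval(timestamps_us, interval_us=1000)
for bucket in sorted(counts):
    drops = estimated_drops(counts[bucket], rate_per_interval=10, buffer_packets=5)
    print(f"ms {bucket}: {counts[bucket]} packets, about {drops} dropped")
```

Integer timestamps are used so that the bucketing is exact; floating-point seconds can land on the wrong side of an interval boundary.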
- Finally, to mitigate the observed burst of traffic on the switch(es), we need to apply one of the following solutions after consulting the customer:
- Increase the bandwidth of the links (port channel).
- Upgrade the hardware to more capable switch(es) that can handle the traffic generated by the network.
====================
+ Link References:
====================
- How to read capture:
https://protocoholic.com/2018/05/24/wireshark-how-to-identify-burst-of-traffic-in-network/
- How to collect SPAN & RSPAN:
- Cisco link on how to troubleshoot output drops issue on catalyst 3650:
- Useful links:
https://community.cisco.com/t5/switching/cisco-3650-high-total-output-drops/td-p/4020630
https://community.cisco.com/t5/switching/c3650-output-queue-drop-oqd-problem/td-p/4309374