Experiencing extremely slow SAN traffic? It may be caused by a "severe latency bottleneck detected" error on an ISL link, a trunk, or even a single port link.
How do you diagnose a severe latency bottleneck error on a SAN link? Here is an example that shows how to troubleshoot a SAN latency bottleneck issue.
Detect and check the error
If you don't have monitoring alerts configured at your site, you will most likely notice the problem through a traffic drop on the port (traffic on a single port, traffic between SAN switches, etc.) and through application slowness. Once the other components (servers, HBAs, etc.) have been checked, turn to the SAN switch. Below is the output of two troubleshooting commands for SAN switch diagnostics: the switch error log, which shows the AN-1010 warnings, and porterrshow.
2016/01/20-08:04:32, [AN-1010], 297, FID 128, WARNING, san48b-5-sw2, Severe latency bottleneck detected at slot 0 port 10.
2016/02/04-13:14:04, [AN-1010], 298, FID 128, WARNING, san48b-5-sw2, Severe latency bottleneck detected at slot 0 port 10.
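If you collect these messages from several switches, a small script can count how often AN-1010 fires per port. The sketch below is a minimal Python example. It assumes the log lines are available as text in exactly the comma-separated format shown above. The function and pattern names are hypothetical and are not Brocade commands.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines above:
# "<date-time>, [<msg id>], <seq>, FID <fid>, <severity>, <switch>, <text>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}), "
    r"\[(?P<msgid>[A-Z]+-\d+)\], (?P<seq>\d+), FID (?P<fid>\d+), "
    r"(?P<sev>\w+), (?P<switch>[^,]+), (?P<text>.*)$"
)
PORT_RE = re.compile(r"slot (\d+) port (\d+)")


def count_latency_events(lines):
    """Count AN-1010 (severe latency bottleneck) messages per (switch, slot, port)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or m["msgid"] != "AN-1010":
            continue  # not a log line, or a different message ID
        p = PORT_RE.search(m["text"])
        if p:
            counts[(m["switch"], int(p.group(1)), int(p.group(2)))] += 1
    return counts
```

Run on the two lines above, it reports two AN-1010 events for slot 0 port 10 on san48b-5-sw2.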
         frames          enc     crc     crc     too     too     bad     enc    disc    link    loss    loss
          tx      rx      in     err   g_eof    shrt    long     eof     out      c3    fail    sync     sig
 8:        0       0       0       0       0       0       0       0       0       0       0       0       0
 9:     2.3g    2.7g       0       0       0       0       0       0       0       0       2       0       2
10:   249.0m   35.3m   59.2k   59.1k   59.0k       0       0      32   70.0k     816       0       0       0
11:   247.3m   27.3m       0       0       0       0       0       0       0       0       0       0       0
In the output above, port 10 shows high counts for enc in, crc err, crc g_eof and enc out, along with 816 discarded class 3 frames (disc c3).
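To spot such ports in a longer porterrshow dump, you can parse and filter the rows. This is a minimal Python sketch with a few assumptions. It expects the 13-column layout shown above and reads the k, m and g suffixes as thousands, millions and billions. The column keys, the function names and the `threshold` parameter are hypothetical choices for illustration.

```python
# Column order of the 13-column porterrshow output shown above (assumed keys).
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig",
]
ERROR_COLUMNS = COLUMNS[2:]  # everything except the two frame counters

SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}  # assumed decimal abbreviations


def to_number(token):
    """Convert an abbreviated counter such as '59.2k' or '2.3g' to a float."""
    if token[-1] in SUFFIX:
        return float(token[:-1]) * SUFFIX[token[-1]]
    return float(token)


def parse_porterrshow(text):
    """Return {port: {column: value}} for every 'NN: v1 v2 ...' row."""
    ports = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # header lines have no "NN:" prefix
        values = rest.split()
        if len(values) < len(COLUMNS):
            continue
        # zip() keeps only the first 13 values, so wider layouts also parse.
        ports[int(head)] = {c: to_number(v) for c, v in zip(COLUMNS, values)}
    return ports


def ports_with_errors(ports, threshold=0):
    """Return {port: {column: value}} keeping only error counters above threshold."""
    flagged = {}
    for port, counters in ports.items():
        errors = {c: counters[c] for c in ERROR_COLUMNS if counters[c] > threshold}
        if errors:
            flagged[port] = errors
    return flagged
```

With `threshold=0`, port 9 is also flagged because of its 2 link failures and 2 loss-of-signal events. A higher, site-specific threshold such as 100 leaves only port 10, flagged for its enc in, crc err, crc g_eof, enc out and disc c3 counters.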
Many things can cause a severe latency issue on a SAN switch. First, make sure you don't have a bandwidth issue, especially when using ISLs or trunking.
Note: when a SAN latency issue appears, you usually won't see the port or link saturated. On the contrary, the traffic drops very low because of the errors.
In this case, according to the Brocade explanation of the porterrshow counters, enc_in/enc_out errors most likely indicate an external problem: the cable or the SFP.
This is a sign of a hardware problem. The suggested actions are to replace the cable or SFP, move the cable to another port, or run porttest.
enc_out errors on their own imply a cable/connector problem. enc_out and crc_err errors together imply a GBIC/SFP problem.
The same applies to crc err and crc g_eof:
- The sending port computes a CRC value for each frame with a mathematical formula. The receiving port applies the same formula to the received frame and compares the two values, and a mismatch is counted as a CRC error. Together with enc_out errors, this again points to the GBIC/SFP, with the same suggested actions: replace the cable or SFP, move the cable to another port, or run porttest.
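You can write the rules of thumb above as a small decision function. This hedged Python sketch encodes only the rules stated in this section. The counter names and the `threshold` parameter are assumptions, and the result is a hint about where to look, not a diagnosis.

```python
def likely_cause(counters, threshold=0):
    """Map porterrshow error counters to the rules of thumb described above.

    `counters` is a dict such as {"enc_out": 70000.0, "crc_err": 59100.0};
    missing keys count as zero. Returns None when no rule applies.
    """
    def high(name):
        return counters.get(name, 0) > threshold

    if high("enc_out") and high("crc_err"):
        return "GBIC/SFP problem"          # enc_out + crc_err together
    if high("enc_out"):
        return "cable/connector problem"   # enc_out on its own
    if high("enc_in") or high("crc_err") or high("crc_g_eof"):
        return "external problem: cable or SFP"
    return None
```

For port 10 in the output above, both enc_out and crc_err are high, so the function points at the GBIC/SFP.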
Solve the problem
So this case is clearly related to the cable or SFP, but don't rush to replace the cable yet. First, check whether the cable has a sharp bend. If it does, recable it and clean it with professional tools. Then take the following actions on both SAN switches.
         frames          enc     crc     crc     too     too     bad     enc    disc    link    loss    loss    frjt    fbsy    c3timeout        pcs
          tx      rx      in     err   g_eof    shrt    long     eof     out      c3    fail    sync     sig                      tx      rx     err
10:     2.4g    2.2g       0       0       0       0       0       0       0       1       0       0       0       0       0       0       0       0
11:     1.8g   80.2m       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
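One way to confirm the fix is to compare CRC errors per received frame before and after. The sketch below is a minimal Python example under several assumptions. It takes porterrshow output as text and reads the k/m/g suffixes as thousands/millions/billions. It also relies on the first four columns being frames tx, frames rx, enc in and crc err, as in both outputs above. The function name is hypothetical.

```python
SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}  # assumed decimal abbreviations


def to_number(token):
    """Convert '35.3m' to 35.3e6 and '0' to 0.0."""
    scale = SUFFIX.get(token[-1])
    return float(token[:-1]) * scale if scale else float(token)


def crc_error_ratio(porterrshow_text):
    """Return {port: crc_err / frames_rx} for every data row of a porterrshow dump."""
    ratios = {}
    for line in porterrshow_text.splitlines():
        head, sep, rest = line.partition(":")
        values = rest.split()
        # Skip header lines (no "NN:" prefix) and rows too short to hold crc err.
        if not sep or not head.strip().isdigit() or len(values) < 4:
            continue
        frames_rx = to_number(values[1])  # 2nd column: frames rx
        crc_err = to_number(values[3])    # 4th column: crc err
        ratios[int(head)] = crc_err / frames_rx if frames_rx else 0.0
    return ratios
```

For port 10, the first output gives 59.1k / 35.3m, so roughly 0.17% of received frames had CRC errors. After the fix, the ratio is 0.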
If this doesn't solve the problem, replace the cable first, and then the SFP.