Network Outage? Where to start?

So you have a network outage?

This is a real-world scenario, not focused on any certification. So you get a call: “There is a network outage”, or so they say. Your first task as a network engineer is to gather proof that this is indeed an outage, or that the issue got escalated too soon. First order of business: we want to follow the OSI model for troubleshooting. Before even getting there, though, you must ask some important questions of whoever brought the issue to you. Those questions are:

  • How many devices are affected?
  • How many areas are being affected?
  • How many users are being affected?
  • Is this affecting day-to-day business?

You want to ask as many questions as possible to narrow down the issue. As a network engineer you will always be plagued with “the network is down” and P1 emergencies that turn out to be out of your hands. Anything from a user being locked out, to an application/server issue, or even user error can look like a network outage. We need to gather enough proof to decide whether we need to step in and possibly open a case with Cisco TAC, or to show that the network is healthy and the issue is elsewhere.

Severity

With the questions above you will not be able to tell if this is a network issue, but you will be able to tell how much time you have. You will also be able to tell how fast you need to move before more people become aware of the issue (upper management upset = no good). After determining the severity of the outage, you will also have a baseline of how many people and devices are affected. From here you will need to gather the following:

  • IPs
  • MAC Addresses
  • Device Types
  • Users Affected

From this, you can figure out which subnets are affected and how many, what types of devices are affected, what protocols these devices must be using, and what types of users are affected.
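As a quick illustration of turning the gathered IPs into a list of affected subnets, here is a minimal Python sketch. It assumes every subnet is a /24 and uses made-up sample addresses; the function name `group_by_subnet` is hypothetical, so adjust the prefix length to match your own addressing plan.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(ips, prefix_len=24):
    """Group IPv4 address strings by the subnet that contains them.

    prefix_len is an assumption (a /24 everywhere); real networks often
    mix prefix lengths, so treat the result as a first pass.
    """
    groups = defaultdict(list)
    for ip in ips:
        # strict=False accepts a host address and returns its network
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(network)].append(ip)
    return dict(groups)


if __name__ == "__main__":
    # Made-up sample addresses standing in for the affected devices
    affected = ["10.1.20.15", "10.1.20.48", "10.1.30.7"]
    for subnet, hosts in group_by_subnet(affected).items():
        print(subnet, len(hosts), hosts)
```

If most of the affected hosts land in one subnet, that points at a shared gateway, VLAN, or uplink rather than at individual devices.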

Detective Stage

Now it's time to start testing whether the network is actually down. Start by pinging the affected IP addresses. Though extremely simple, this test alone can already start telling you what might be wrong. Keep in mind that some places have ping disabled for security purposes, so you must also try other methods to test network connectivity.
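One other method, for environments where ping is blocked, is to attempt a TCP connection to a port the device is expected to serve. The sketch below uses only Python's standard `socket` module; the function name `tcp_reachable`, the sample hosts, and the ports are assumptions for illustration, not values from any real environment.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Hypothetical targets: replace with the affected IPs and their service ports
    targets = [("10.1.20.15", 443), ("10.1.30.7", 22)]
    for host, port in targets:
        state = "reachable" if tcp_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

A failed connection here does not prove the network is down (the service itself may be stopped), but a successful one is solid evidence that the path to that host works.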

The next test is to check the switchport and the cables. On the switchport, look for a port that is in an error state or administratively down. Many things could cause this, but it can also happen if the end user moved the Ethernet cable to a different port that is shut. You can also check cables at this stage, depending on whether you are remote or on site: replace the Ethernet cable yourself, or send an IT tech to do it.

Whether these two tests passed or failed, if you are still having issues it is time to check the firewall. On the firewall you will need the source IP. Check for any blocks. If you see blocks, then something might be wrong on the network and something must have changed. If this is the case, find the firewall rule that might be blocking the traffic, come up with a theory, and update the rule. However, if you see no blocks, only allows, the issue is probably not on the network side. Instead it may be higher in the OSI model, such as the application layer.
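The firewall check above boils down to a small decision: filter the log by source IP, then see whether any blocks show up. Here is a rough Python sketch of that logic. The record layout (`src` and `action` keys, with `deny`/`allow` values) is entirely hypothetical, since every firewall exports logs differently, and the "no matching entries" branch is an extra case added for completeness rather than something covered in the text.

```python
from collections import Counter


def summarize_firewall_hits(entries, source_ip):
    """Count firewall actions (e.g. allow/deny) logged for one source IP.

    entries is a list of dicts with hypothetical keys "src" and "action".
    """
    return Counter(e["action"] for e in entries if e["src"] == source_ip)


def next_step(summary):
    """Suggest where to look next based on the action counts."""
    if summary.get("deny", 0) > 0:
        return "blocks found: review the rule producing them"
    if summary.get("allow", 0) > 0:
        return "only allows: look higher in the OSI model"
    # Extra case not covered in the text: nothing logged for this source
    return "no matching entries: confirm the source IP"


if __name__ == "__main__":
    # Made-up log records for illustration only
    log = [
        {"src": "10.1.20.15", "action": "allow"},
        {"src": "10.1.20.15", "action": "deny"},
        {"src": "10.1.30.7", "action": "allow"},
    ]
    for ip in ("10.1.20.15", "10.1.30.7"):
        print(ip, next_step(summarize_firewall_hits(log, ip)))
```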

The Hardest Part: Announcing Your Findings

This may be the toughest part of being a network engineer: announcing your findings to your team. The network gets blamed on a daily basis, and that does not mean it is never the network. But in the case where you know the network in your environment is healthy, the affected devices ping, there are no blocks on the firewall, and you are able to see traffic between the affected endpoints/resources, there is not much more you can do to troubleshoot, and you must relay this information.

Be ready to have proof and documentation on why this is or is not a network issue. Also be ready to defend your theory. In the world of IT, everyone wants to make sure the end user and the company are online and working. With this comes added stress, especially if upper management is involved. This means things may get heated fast, but as long as you stay composed and certain of your findings, you will be in a good position. To move forward, the other teams working on higher layers of the OSI model must get involved to continue looking for a solution.

Conclusion

This may be a bit different from the textbook, but it's real-world advice. This post marks a turn for this website toward real-world troubleshooting and protocols. Hopefully this is helpful to new network engineers who stumble on this page. Thanks for reading!
