What caused today’s Internet hiccup
Like others, you may have noticed some instability and general sluggishness on the Internet today. In this post we’ll take a closer look at what happened, including some of the BGP details! At around 8am UTC Internet users on different mailing lists, forums and twitter, reported slow connectivity and intermediate outages. Examples can be found on the Outages mailing list company support site such as liquidweb and of course on Nanog. How stable is the Internet So how do we know if the Internet was really unstable today? One way to look at this is by looking at the outages visible in BGP over the last 12 months. On average we see outages for about 6,033 unique prefixes per day, affecting on average 1470 unique Autonomous Systems. These numbers are global averages and it’s worth noting that certain networks or geographical areas are more stable than others. If we look at the number of detected outages by BGPmon today we see outage for 12,563 unique prefixes affecting 2,587unique Autonomous Systems. This is well above the daily average and indeed both the unique prefixes and the unique Autonomous Systems count are the highest we’ve seen in the last 12 months. Based on this data it is fair to say that Internet did indeed suffer from more hiccups than normally. The graph below shows the number of autonomous systems that were affected by an outage just over the last month. Note that most outage occur during week days and that the Internet is in fact much more stable during the weekend and holidays.
The table above shows that depending on the provider and location the numbers differ slightly, but are indeed very close. It also shows that right now the number of prefixes is still several thousands under the 512,000 limit so it shouldn’t be an issue. However when we take a closer look at our BGP telemetry we see that starting at 07:48 UTC about 15,000 new prefixes were introduced into the global routing table.
details see this Cisco doc.
What caused the instability Folks quickly started to speculate that this might be related to a known default limitation in older Cisco routers. These routers have a default limit of 512K routing entries in their TCAM memory. The BGP routing table size depends on a number of variables. The primary contributors are Internal, or local routes (more specific and VPN routes used only in the local AS) and External routes. These External routes are all the routes of everyone on the Internet, we sometimes refer to this as the global routing table of Default Free Zone (DFZ). For most networks on the Internet today the global routing table is the major contributor, while local routes vary from a few dozen to a few hundred. Only a small percentage of the networks have a few thousand Internal routes. The size of the global routing table today (August 12 2014) is about 500k, this number marginally varies per provider. Let’s compare a few major ISP’s. The table below shows the number of IPv4 prefixes received per provider on a full feed BGP session.
|Provider||Location||Number of IPv4 routes on full BGP feed|