Alarm bells are ringing in the Internet engineering community over an obscure statistic that indicates the ‘Net is growing – in size and complexity – at a faster rate than today’s routers can handle.
This recent finding has prompted the Internet Engineering Task Force (IETF) to embark on an overhaul of the communications protocol that handles routing across the Internet’s backbone.
At stake for large companies may be the need to buy more powerful network gear earlier than originally intended.
“The sky is not falling, but the sky is hanging a little low,” says Fred Baker, a routing pioneer with Cisco Systems Inc. and a member of the IETF’s Internet Architecture Board. “The issue needs to be addressed soon.”
The statistic in question is the number of entries in the backbone’s routing table, the master list of network destinations that is stored in backbone routers and used to determine the best path from one network to another. The size of the routing table and the traffic in it are indicators of the overall health of the Internet, in particular how well the individual networks that make up the Internet are communicating with each other.
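To make the idea of a routing table concrete, here is a minimal sketch in Python of how a router might pick an entry for a destination, assuming the usual longest-prefix-match rule. The prefixes, next-hop names and the `lookup` helper are hypothetical and chosen only for illustration; real backbone routers hold on the order of 100,000 entries and use far more efficient data structures.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next hop.
# The prefixes come from address ranges reserved for documentation.
ROUTES = {
    ipaddress.ip_network("198.51.100.0/24"): "isp-a",
    ipaddress.ip_network("198.51.100.128/25"): "isp-b",
}

def lookup(table, destination):
    """Return the next hop of the most specific prefix that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        # No default route in this sketch, so an unknown destination is unroutable.
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]

print(lookup(ROUTES, "198.51.100.200"))  # covered by both entries; the /25 wins
print(lookup(ROUTES, "198.51.100.5"))    # covered only by the /24
print(lookup(ROUTES, "203.0.113.1"))     # covered by no entry
```

Every additional prefix announced to the backbone becomes one more entry that each router must store and search in this way.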
After years of predictable growth, the size of the routing table and traffic in it exploded during the past six months, topping 104,000 entries in March, compared with 75,000 a year ago. Even more troubling is evidence that frequent updates to the routing table entries by network managers are causing instability in the Internet’s backbone routing infrastructure.
Nobody knows how big or how active the routing table can get before the Internet’s core routers start crashing. But current projections show that if the growth goes unchecked, the Internet could face a router processing-power crunch in as little as 18 months.
“It’s not the size of the table, but the number of updates per second that kills a router stone dead,” explains Geoff Huston, a Telstra official who tracks this statistic for the IETF. “By the time the table gets to around 200,000 entries, we may be pushing a default-free router well beyond its processing capability.”
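As a rough back-of-the-envelope check on these figures, the sketch below extrapolates from the two table sizes quoted above (75,000 entries a year ago, 104,000 in March) to Huston’s 200,000-entry mark. It assumes constant compound growth at the year-over-year rate, which is a simplification introduced here and not the method behind the projections cited in this article.

```python
import math

# Figures quoted in the article.
entries_year_ago = 75_000
entries_now = 104_000
threshold = 200_000  # table size at which Huston expects routers to struggle

def years_to_reach(current, target, annual_growth):
    """Years until current reaches target at a constant annual growth factor."""
    return math.log(target / current) / math.log(annual_growth)

# Assumption: next year's growth factor equals last year's.
annual_growth = entries_now / entries_year_ago
years = years_to_reach(entries_now, threshold, annual_growth)

print(f"annual growth: {(annual_growth - 1) * 100:.1f}%")
print(f"years to {threshold:,} entries: {years:.1f}")
```

Under that assumption the table grows by a little under 40 per cent a year and reaches 200,000 entries in roughly two years. A shorter horizon, such as the 18 months mentioned above, would follow if the faster growth of the past six months continued.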
In addition, the churn in the Internet’s routing table means that it is taking longer to propagate accurate routing information globally across the Internet.
The time it takes the Internet to process a route withdrawal or a route announcement is getting longer, Huston says. “This is again a processor overload issue.”
What’s driving the increase in routing table entries is the rising popularity of multihoming on corporate networks. Multihoming describes a configuration in which one network is connected to two or more ISPs for improved reliability and redundancy. A multihomed network requires a separate entry in the routing table for each ISP, and most large companies and dot-coms multihome their networks. Huston estimates that 70 per cent of the announcements in today’s routing table are related to multihoming.
Multihoming is popular because the cost of transmission circuits is plummeting, making it less expensive to buy Internet access services from two or more ISPs. At the same time, companies are more concerned about the reliability of their networks and less willing to trust one service provider.
“Half of the companies that are multihomed should have gotten better service from their providers,” says Patrik Faltstrom, a Cisco engineer and co-chair of the IETF’s Applications Area. “ISPs haven’t done a good enough job explaining to their customers that they don’t need to multihome.”
For network managers with multihomed networks, the growing size and complexity of the Internet’s routing table mean they may need to buy bigger, more expensive routers and upgrade them more frequently, experts say. That’s because routers must store a view of the routing table for each ISP they use. The router processing problem is worse for backbone providers, which store hundreds of views of the routing table in their routers.
“When you talk about the size of the routing tables, it’s a symptom that you’re talking about, like a cough,” Baker says. “The issue isn’t that the routing tables are too big or multihoming is bad. It’s that these trends are driving equipment costs and putting more burdens on the routing protocols.”
In response, the IETF plans to revamp the 6-year-old standard used in multihoming and backbone routing: Border Gateway Protocol version 4 (BGP4). Network managers run BGP4 on routers to load balance and back up their Internet traffic across multiple ISPs.
The IETF is concerned about whether BGP4 “can scale up to carrying millions or even tens of millions of distinct routing entries,” Huston says. “I believe we can scale up BGP to about two to three times of today without too much drama … we could tweak BGP to scale up to 10 to 20 times the size of today if we had better use of route attributes that allow selective aggregation.”
Selective aggregation would reduce the number and frequency of changes to the routing tables. For example, when the Internet’s backbone loses a link today, hundreds of messages are sent to the routing tables saying that individual routes are down. In the future, BGP might be able to announce the failure once. Similarly, a router using BGP4 today can address routing information only to a neighbor or to the whole Internet. In the future, routers might also be able to communicate with just the other routers along one path of the Internet.
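The general effect of aggregation can be sketched with Python’s standard `ipaddress` module. This shows plain address aggregation of adjacent prefixes, not the attribute-based selective aggregation Huston describes, whose details are not spelled out here. The prefixes and the `aggregate` helper are hypothetical.

```python
import ipaddress

def aggregate(prefixes):
    """Merge adjacent or overlapping prefixes into the fewest covering prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Four adjacent /26 routes that share one path can be announced as a single /24.
contiguous = ["192.0.2.0/26", "192.0.2.64/26", "192.0.2.128/26", "192.0.2.192/26"]
print(aggregate(contiguous))

# If one of the four is routed elsewhere, the rest no longer merge into one entry.
with_hole = ["192.0.2.0/26", "192.0.2.64/26", "192.0.2.192/26"]
print(aggregate(with_hole))
```

In the first case one table entry, and one withdrawal message if the link fails, stands in for four. In the second case, which resembles a prefix carved out of a provider’s block, two entries remain.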
To tackle these questions of how best to revamp BGP4, the IETF has launched a new effort called Prefix Taxonomy Ongoing Measurement and Inter Network Experiment (Ptomaine). The BGP4 redesign should be done in about a year.
The need to revamp BGP “is a problem we’ve been playing ostrich with for years,” says Randy Bush, co-chair of the IETF’s Operations and Management Area and vice president of IP Networking at Verio. “An upgrade to BGP will give network managers improved tools that allow them to meet their redundancy and reliability needs with multihoming and balance their traffic without being bad ‘Net citizens” by overloading the routing tables.
Ultimately, the IETF may need to develop a new framework for routing, in which load balancing and other traffic engineering messages used in multihomed networks are carried by protocols other than BGP.