Measure for measure: Making metrics matter

David Newman

23 years ago

Describing a network service by measuring a single box “is an attempt to describe a beach by looking at…each and every grain of sand.”

Geoff Huston, chief scientist at Australian carrier Telstra Corp. Ltd., said that. Huston was upbraiding me for my part in an effort to measure quality-of-service (QoS) mechanisms in routers. For his purposes, he was right: Single-box measurements aren’t helpful in describing service levels.

Huston’s statement underscores a key testing issue: Services and devices are different, and the right metrics for one may not be right for the other.

Consider latency, the delay added by a device or system. It’s a vital metric in assessing the performance of virtually any device. In Gigabit Ethernet gear, latencies in the tens of microseconds are common.

Now look at service measurements. To cross the U.S. and back, a beam of light travelling through fibre takes at least 40 msec. Even on shorter trips, latency won’t begin to degrade application performance until it climbs well into the milliseconds.
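The arithmetic behind that round-trip figure can be sketched in a few lines. The path length and refractive index below are my own round-number assumptions, not measured values, so treat the result as an order-of-magnitude check rather than a specification.

```python
# Rough propagation-delay estimate for a cross-country fibre round trip.
# Assumptions (illustrative only): a one-way path of about 4,500 km and a
# refractive index of about 1.47 for the glass. Real routes are longer and
# add equipment delay, so this is a lower bound.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum
FIBRE_REFRACTIVE_INDEX = 1.47          # assumed typical value
ONE_WAY_PATH_KM = 4_500                # assumed coast-to-coast distance


def round_trip_ms(one_way_km, refractive_index=FIBRE_REFRACTIVE_INDEX):
    """Return the round-trip propagation delay in milliseconds."""
    speed_in_fibre = SPEED_OF_LIGHT_KM_PER_S / refractive_index
    return 2 * one_way_km / speed_in_fibre * 1000


rtt_ms = round_trip_ms(ONE_WAY_PATH_KM)
device_latency_ms = 0.020  # a hypothetical 20-microsecond switch

print(f"Round trip: {rtt_ms:.1f} ms")
print(f"The path delay is about {rtt_ms / device_latency_ms:,.0f}x the box delay")
```

Under these assumptions the round trip lands in the mid-40-msec range, three orders of magnitude above the delay a single Gigabit Ethernet box adds.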

Conversely, uptime is one of the most important service metrics, yet in a test lab, where test durations tend to be in seconds, it’s not all that important.

So what are meaningful box and service benchmarks?

Let’s take the service case first. Reliability and uptime metrics are always critical. Questions to ask providers include: What percentage of time is a circuit available? How many outages occur, and how long does it take to restore service?
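Those three questions map onto simple calculations. Here is a minimal sketch, assuming the provider hands over a list of outage durations for a billing period; the outage figures and function names are hypothetical.

```python
# Availability and restoration time from an outage log.
# The outage list and the 30-day period are made-up illustrative values.

def availability_percent(outage_minutes, period_minutes):
    """Percentage of the period during which the circuit was up."""
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes


def mean_time_to_restore(outage_minutes):
    """Average outage duration in minutes (0.0 if there were no outages)."""
    if not outage_minutes:
        return 0.0
    return sum(outage_minutes) / len(outage_minutes)


outages = [12, 45, 3]             # hypothetical outages, in minutes
period = 30 * 24 * 60             # a 30-day month, in minutes

availability = availability_percent(outages, period)
outage_count = len(outages)
mttr = mean_time_to_restore(outages)

print(f"Availability: {availability:.3f}%")
print(f"Outages: {outage_count}, mean time to restore: {mttr:.1f} min")
```

Note how one hour of downtime in a month still leaves availability above 99.8 per cent, which is why the outage count and restoration time are worth asking for separately.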

Throughput is also useful, but a more meaningful predictor of application performance is “goodput” – the amount of user data received in a given interval, minus any bytes that had to be retransmitted.
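As a rough illustration of that definition, the sketch below subtracts retransmitted bytes from the user data received and divides by the interval. The byte counts are invented for the example, and the function name is my own.

```python
# Goodput: user data received in an interval, minus retransmitted bytes.
# All figures below are hypothetical.

def goodput_bps(payload_bytes_received, retransmitted_bytes, interval_s):
    """Return goodput in bits per second."""
    useful_bytes = payload_bytes_received - retransmitted_bytes
    return useful_bytes * 8 / interval_s


received = 125_000_000       # bytes of user data seen in the interval
retransmitted = 5_000_000    # bytes that were repeats of earlier data
interval = 10                # seconds

raw_rate = received * 8 / interval
goodput = goodput_bps(received, retransmitted, interval)

print(f"Raw receive rate: {raw_rate / 1e6:.0f} Mbit/s")
print(f"Goodput:          {goodput / 1e6:.0f} Mbit/s")
```

In this made-up case the circuit carries 100 Mbit/s of user data, but only 96 Mbit/s of it is new to the application.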

Latency may not be terribly meaningful by itself, but a few related metrics can be. Jitter – variation in packet arrival times – is critical for voice and video applications, where even small amounts of change can lead to severely degraded performance.
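One simple way to put a number on jitter is to take the difference in delay between each packet and the one before it. That is only one of several definitions in use, and the delay samples below are invented, so read this as a sketch rather than a standard method.

```python
# Packet-by-packet delay variation from a list of one-way delays.
# The sample delays are hypothetical.
from statistics import mean


def delay_variation(delays_ms):
    """Absolute change in delay between each pair of consecutive packets."""
    return [abs(later - earlier)
            for earlier, later in zip(delays_ms, delays_ms[1:])]


delays = [20.0, 22.0, 21.0, 35.0, 20.0]   # one-way delay per packet, in ms

variation = delay_variation(delays)
mean_jitter = mean(variation)
peak_jitter = max(variation)

print(f"Per-packet variation: {variation}")
print(f"Mean jitter: {mean_jitter:.1f} ms, peak: {peak_jitter:.1f} ms")
```

The average delay in this sample looks harmless, yet a single late packet produces swings of 14 and 15 msec, the kind of change a voice codec notices.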

Latency distribution is another useful way to assess services. While jitter describes packet-by-packet variations in delay, a latency distribution histogram describes a service at its best and worst.
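A histogram of this kind is easy to build from raw delay samples. The following is a minimal sketch; the 10-msec bucket width and the samples are arbitrary choices of mine.

```python
# Bucket delay samples into a latency distribution histogram.
# Samples and bucket width are hypothetical.
from collections import Counter


def latency_histogram(latencies_ms, bucket_ms):
    """Map each bucket's lower edge (in ms) to the number of samples in it."""
    counts = Counter(int(latency // bucket_ms) * bucket_ms
                     for latency in latencies_ms)
    return dict(sorted(counts.items()))


samples = [20.0, 22.0, 21.0, 35.0, 20.0, 48.0, 23.0]   # delays in ms

histogram = latency_histogram(samples, bucket_ms=10)
best, worst = min(samples), max(samples)

for lower_edge, count in histogram.items():
    print(f"{lower_edge:3d}-{lower_edge + 10:3d} ms | {'#' * count}")
print(f"Best: {best} ms, worst: {worst} ms")
```

The printout shows at a glance where most packets land and how far the stragglers stray from them.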

In assessing box performance, the classic metrics include throughput, forwarding rate, latency and jitter. For applications that are sensitive to reordering of packets (including anything running over TCP), I’d add sequencing to the list.
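Sequencing can be checked by stamping each test packet with a sequence number and counting arrivals that show up behind a higher-numbered packet. This is one plausible way to count, assumed for illustration, and the received sequence is invented.

```python
# Count packets that arrive out of order.
# A packet counts as reordered if a higher sequence number has already
# arrived. The received sequence below is hypothetical.

def count_reordered(sequence_numbers):
    """Return how many packets arrived after a higher-numbered packet."""
    highest_seen = -1
    reordered = 0
    for seq in sequence_numbers:
        if seq < highest_seen:
            reordered += 1
        else:
            highest_seen = seq
    return reordered


received_order = [1, 2, 4, 3, 5, 7, 6, 8]

reordered_count = count_reordered(received_order)
print(f"{reordered_count} of {len(received_order)} packets arrived out of order")
```

Nothing was lost in this sample, so a loss counter would report a clean run even though two packets arrived late.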

Latency matters for boxes, although lately it’s been suggested that it’s time to revise the standard practice of measuring it only at the throughput rate, the highest load a device can forward without loss. It’s also useful to measure delay at lighter loads.

It’s not valid to label as latency any delay measurement taken where loss exists. In this situation, what’s really being measured is queue depth. Still, it may be valid to measure delay where congestion exists, especially when assessing QoS mechanisms.

Tests can produce gigabytes of raw data. Rendering the data fit for human consumption requires two things: an understanding of what’s being measured, and the application of meaningful metrics to describe it. Without the latter, the raw data is just so many grains of sand.

David Newman is president of Network Test, an independent benchmarking and network design consultancy. He can be reached at dnewman@networktest.com.