FRAMINGHAM, Mass. — High port density, high throughput, and very low latency are bedrock requirements in the data centre, and Force10 Networks Inc.’s new S4810 top-of-rack switch delivers on all three counts.
At the same time, testing revealed some interesting limitations in the “merchant silicon” chips increasingly seen in data-centre switches. Tests turned up anomalies in cut-through latency, media access control (MAC) address learning, and link aggregation failover handling. Beyond the switching silicon, the S4810 also turned in mixed results in multicast scalability.
The S4810 is a 1U top-of-rack switch with multiple interface options. It has 48 SFP+ ports for 1G/10G Ethernet (we tested it with 48 10G Ethernet transceivers) and four QSFP+ ports for 40G uplinks. With 10GBase-SR transceivers, the switch drew 202 watts when idle and 219 watts with its data plane fully loaded.
The switch runs the Force10 Operating System (FTOS), which includes a command-line interface (CLI) that’s nearly a clone of Cisco Systems Inc.’s IOS. Experienced Cisco users will have no trouble configuring and managing this switch.
Although we tested the switch as a layer-2 data centre device, it also supports layer-3 features, including major IPv4 routing protocols and static routing of IPv6 traffic, via a US$2,000 software upgrade.
Significantly, the switch does not yet support some key data centre protocols, according to a features questionnaire completed by Force10. These include the data centre bridging exchange protocol (DCBX); IEEE 802.1Qbb priority-based flow control (PFC); 802.1Qau congestion notification; and 802.1Qaz enhanced transmission selection. Force10 says these features are slated for third-quarter 2011 release.
The S4810 put up solid numbers in basic unicast traffic handling, delivering line-rate throughput regardless of unicast frame size. Better still for delay-sensitive applications, the S4810 offered sub-microsecond average latency when configured in store-and-forward mode, making it one of the first store-and-forward switches we’ve tested to break the microsecond barrier.
We expected average latency to be lower still with the S4810 configured as a cut-through device, but that wasn’t always the case. For frame sizes of 256 bytes and larger, cut-through latency was significantly higher than in the equivalent store-and-forward tests. Further, cut-through latency increased with frame length.
Cut-through devices usually have two properties: they tend to be very fast (since they start forwarding a frame before it’s fully received, unlike store-and-forward devices, which buffer the entire frame before switching it), and they show roughly the same average latency regardless of frame length. With the S4810, these properties better described the store-and-forward results than the cut-through ones.
This is partially explained by a characteristic of the Broadcom 56845 application-specific integrated circuit (ASIC) used in the S4810. According to Force10, the chip still acts in store-and-forward mode for frames shorter than 624 bytes, even when set for cut-through operation. This could explain higher cut-through latency for medium-length frames (say, between 256 and 624 bytes), but it’s still puzzling why cut-through latency would be higher for longer frames. The testing RFCs require different measurement methods for store-and-forward and cut-through latency (RFC 1242 measures store-and-forward devices from the last bit in, but cut-through devices from the first bit in), and we checked and rechecked results to verify we’d used the appropriate method for each. Force10 and other labs have also confirmed this behavior.
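Those two measurement conventions hint at part of the explanation: a frame’s serialization time is counted in the cut-through reading whenever the device actually buffers the frame. The minimal sketch below is our own illustration, with a hypothetical fixed 800-nanosecond internal delay standing in for the switch’s real forwarding path; it shows how a device that buffers internally would report latency that grows with frame length under the cut-through measurement method.

```python
# Back-of-the-envelope sketch: why a switch that internally buffers frames
# reports frame-length-dependent latency under the cut-through measurement
# method. The 800 ns internal delay is a hypothetical stand-in.

LINE_RATE_BPS = 10e9     # 10G Ethernet
PREAMBLE_BYTES = 8       # preamble plus start-of-frame delimiter
INTERNAL_DELAY_NS = 800  # hypothetical fixed forwarding delay

def serialization_ns(frame_bytes):
    """Time to clock one frame onto a 10G link, in nanoseconds."""
    return (frame_bytes + PREAMBLE_BYTES) * 8 / LINE_RATE_BPS * 1e9

for size in (64, 256, 512, 1024, 1518):
    sf = INTERNAL_DELAY_NS                           # last bit in, first bit out
    ct = INTERNAL_DELAY_NS + serialization_ns(size)  # first bit in, first bit out
    print(f"{size:>5}B  store-and-forward {sf:7.1f} ns   "
          f"cut-through method {ct:7.1f} ns")
```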
Given the latency results, we’d recommend leaving the switch in its default store-and-forward mode. There’s a performance advantage to doing so, and users get the extra benefit of the error checking that store-and-forward operation provides.
Another anomaly appeared in tests of MAC address capacity, which determines how many devices can be attached to a switch. This metric is especially important for virtualization and cloud computing, where virtual machine counts in a single broadcast domain can rise into the tens of thousands.
The S4810’s data sheet states its MAC capacity as 128,000; in practice, we found the limit to be somewhat lower, averaging 117,145 addresses, with the exact count depending on which set of pseudorandom addresses we used. The switch ASIC’s hashing algorithm accounts for the difference.
What’s more, the actual number of addresses the switch can learn in production is likely to be far lower than 117,000. Typically, address capacity tests are conducted using only three ports. When we configured the Spirent TestCenter traffic generator to offer a set of nearly 100,000 pseudorandom addresses across 48 ports, the switch learned only about 94,000 of these due to hash collisions.
Through trial and error, we found that the switch would learn at most around 25,000 addresses without hash collisions when we distributed addresses across 48 ports.
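That pattern is consistent with a hash table built from fixed-depth buckets: once every slot in the bucket an address hashes to is full, that address can’t be learned, no matter how much total capacity remains elsewhere. The simulation below is a minimal sketch of the effect; the table geometry (16,384 buckets of eight entries) and the hash function are our guesses for illustration, not Broadcom’s actual design.

```python
# Minimal collision simulation; the table geometry and hash are guesses for
# illustration, not Broadcom's actual design.
import random

BUCKETS = 16_384   # hypothetical: 16K buckets x 8 entries = 131,072 slots
DEPTH = 8

def learned(offered, seed):
    """Count how many pseudorandom MACs fit before their buckets fill up."""
    rng = random.Random(seed)
    buckets = [0] * BUCKETS
    count = 0
    for _ in range(offered):
        mac = rng.getrandbits(48)   # pseudorandom 48-bit MAC address
        b = mac % BUCKETS           # stand-in for the ASIC's hash function
        if buckets[b] < DEPTH:      # room left in this bucket?
            buckets[b] += 1
            count += 1
    return count

for offered in (100_000, 128_000):
    runs = [learned(offered, seed) for seed in range(5)]
    print(f"offered {offered:,}: learned about {sum(runs) // len(runs):,}")
```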
To be sure, 25,000 addresses is still a huge number, more than enough for the vast majority of data centres. Then again, some heavy users of virtualization are already pushing above this figure. Further, we think data-sheet numbers should give users meaningful guidance on the limits of switch performance, not theoretical best-case estimates.
The S4810 allows up to eight ports to be combined into a link aggregation group (LAG) and uses the link aggregation control protocol (LACP) to dynamically add and remove LAG members. We took one LAG member offline, as might occur in the event of a link or transceiver failure, to see how the switch would distribute that port’s traffic across remaining members of the LAG.
Traffic distribution was not uniform in this failover test. After we disabled a port, the switch redistributed all of its traffic to the first two ports in the LAG. On a lightly loaded network this wouldn’t be a problem, but it could result in oversubscription and frame loss on a heavily loaded LAG. Still, this is an improvement over LAG behavior we saw on some switches in last year’s test, where all traffic from a failed LAG port was redistributed to just one other LAG member.
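The uneven spread likely comes down to failover policy rather than the hash itself. The sketch below, our own illustration rather than Force10’s actual algorithm, contrasts two ways a switch might reassign flows from a failed LAG member: rehashing them across all survivors, versus dumping them onto a fixed pair of ports, which roughly matches the behavior we observed.

```python
# Illustrative sketch, not Force10's actual algorithm: two policies for
# redistributing a failed LAG member's flows.
from collections import Counter
import random

MEMBERS = list(range(8))   # eight-port LAG
FAILED = 3

random.seed(1)
flows = [random.getrandbits(32) for _ in range(100_000)]
# Flows the hash had mapped to the failed member (stand-in hash: modulo).
displaced = [f for f in flows if f % len(MEMBERS) == FAILED]
survivors = [m for m in MEMBERS if m != FAILED]

# Policy A: rehash displaced flows across all seven survivors (even spread).
rehash = Counter(survivors[f % len(survivors)] for f in displaced)

# Policy B: dump displaced flows onto a fixed pair of members, roughly the
# behavior seen in the failover test.
pair = survivors[:2]
dump = Counter(pair[(f >> 3) % 2] for f in displaced)

print("rehash across survivors:", dict(sorted(rehash.items())))
print("fixed-pair redirection: ", dict(sorted(dump.items())))
```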
As a final test of unicast performance, we checked the S4810 for “forward pressure,” a mechanism some switches use to avoid congestion by forwarding frames illegally fast. The S4810 doesn’t have that problem. Its clock runs 40 parts per million (ppm) faster than Ethernet’s theoretical line rate, but that’s well within the 100-ppm tolerance allowed in the Ethernet specification.
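To put those tolerances in concrete terms, the quick calculation below, our own arithmetic rather than test output, converts clock offsets into 64-byte frame rates on a 10G link.

```python
# Quick arithmetic: clock offsets in ppm translated into 64-byte frame rates
# on a 10G link (our own illustration, not test output).
NOMINAL_BPS = 10e9
FRAME_BYTES = 64
OVERHEAD_BYTES = 8 + 12   # preamble/SFD plus minimum interframe gap

bits_per_frame = (FRAME_BYTES + OVERHEAD_BYTES) * 8   # 672 bits per frame slot
for label, ppm in (("nominal", 0), ("S4810 clock", 40), ("100-ppm limit", 100)):
    fps = NOMINAL_BPS * (1 + ppm / 1e6) / bits_per_frame
    print(f"{label:>13}: {fps:,.0f} frames/sec")
```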
We measured the S4810’s multicast performance with tests of IGMP group capacity; group join and leave times; and throughput and latency. The first two of these stress the switch’s control plane (its software and CPU), while throughput stresses the data plane (the ASIC). Using IGMP snooping, the switch learned 3,000 multicast groups in our capacity test. That’s higher than all but one top-of-rack switch tested last year, and a useful figure for trading and videoconferencing applications that require large numbers of multicast groups.
The switch’s join/leave times were another story. With all receivers subscribed to 989 multicast groups, the S4810 took an average of 21.7 seconds to join each group and 18.3 seconds to leave. That’s much higher than most switches in last year’s test, which also handled 989 groups. The S4810’s maximum join and leave times were higher still, at 49.8 and 53.7 seconds respectively. These high IGMP processing times suggest an overload of the switch’s CPU.
More evidence of an overload came in a buffer-overflow message we saw when running this test (and the group capacity test) immediately after a switch reboot. The fact that the switch did not display this message on the second and subsequent test iterations suggests an issue with initial loading of a multicast software module into memory when large group counts are involved. Another issue, seen on all iterations and not just the first, was that the switch’s CLI erroneously reported the same port twice as a member of a given multicast group.
Force10 said it replicated these results in-house, and found much lower join and leave times, of 1 second or less, when 100 groups were involved instead of nearly 1,000. The manufacturer also says it’s doing more optimization work on this new platform.
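For readers who want to gauge join behavior on their own gear, the sketch below shows the basic principle of the measurement, though it is not Spirent TestCenter’s method: timestamp the moment a receiver joins a group, then wait for the first frame of that group to arrive. The group address and port here are hypothetical, and a source must already be streaming to the group.

```python
# Rough sketch of the join-time principle, not Spirent TestCenter's method:
# timestamp an IGMP join, then block until the group's first frame arrives.
# Assumes a source is already streaming to this (hypothetical) group address.
import socket
import struct
import time

GROUP, PORT = "239.1.1.1", 5000   # hypothetical group and UDP port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
start = time.monotonic()
# Joining makes the kernel send an IGMP membership report; the switch's
# snooping logic must then start forwarding this group out our port.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
sock.recv(2048)   # blocks until the first frame for the group arrives
print(f"join-to-first-frame: {time.monotonic() - start:.3f} seconds")
```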
For network managers whose foremost switch requirements are high port density and very low latency, the S4810 is a good fit. The switch still has work to do in the areas of data centre feature support and multicast processing speed; these involve software fixes, and Force10 says they’re already in the works. Hardware anomalies, such as those involving MAC address learning and link aggregation failover, are harder to fix and may take longer to address.
(From Network World U.S.)