Microprocessor giants such as Intel Corp. and Advanced Micro Devices Inc. (AMD) have been performing a risky technological high-wire act ever since their chips – some as fast as 1GHz – began exceeding the system bus speed, according to several industry experts.
The problem involves chip testing. Because internal processor speeds now outrun the surrounding system bus by such a huge margin, some experts say it is practically impossible to thoroughly test the logic within a high-speed processor without incurring costs and delays that would put the fully tested chip out of reach. Maximum bus speeds range from 133MHz to 200MHz.
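To put the "huge margin" in rough numbers, the short sketch below divides a 1GHz core clock by the two bus speeds quoted above. It treats both figures as simple clock rates, which is a simplifying assumption, and the names in it are illustrative only.

```python
# Rough arithmetic only: how many core clock cycles fit in one bus cycle.
# The 1GHz core and the 133MHz/200MHz bus figures come from the article;
# treating them as plain clock rates is a simplifying assumption.
CORE_MHZ = 1000
BUS_MHZ = (133, 200)


def core_cycles_per_bus_cycle(core_mhz, bus_mhz):
    """Return the ratio of the core clock to the bus clock."""
    return core_mhz / bus_mhz


ratios = {bus: core_cycles_per_bus_cycle(CORE_MHZ, bus) for bus in BUS_MHZ}

for bus, ratio in ratios.items():
    print(f"{bus}MHz bus: about {ratio:.1f} core cycles per bus cycle")
```

Under these assumptions the core completes roughly 7.5 cycles for every cycle of a 133MHz bus, and 5 for every cycle of a 200MHz bus, so most of what happens inside the chip is never directly visible from outside it.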
“We’ve hit the speed wall. You could crank out a 2GHz chip today but you couldn’t tell if it was any good,” said Joe Jones, the CEO of Bridgepoint, an Austin, Texas-based semiconductor testing company with clients such as Texas Instruments Inc. and Philips Electronics NV. “We’ve painted ourselves [into] a corner.”
Jones warns that if a new chip architecture that allows for more thorough internal testing is not developed, more recalls of high-performance chips can be expected. And, he added, a huge increase in network and Internet communication errors will begin to occur as even minor miscalculations multiply.
The SIA (Semiconductor Industry Association) agrees, and has identified built-in self-test for processors as one of five challenges the processor industry must resolve if it is to continue to follow Moore’s Law, the industry precept of steadily rising transistor counts that is popularly equated with ever-increasing clock speeds.
Achieving the SIA’s objective, however, will require a radical and expensive re-design that chip makers are not prepared to shoulder, Jones said.
“It will take a serious amount of resources to see any change,” Jones said.
Currently, high-speed chips such as Intel’s Pentium III and AMD’s Athlon processor are tested by submitting the chips to a series of known algorithms that yield predictable results. Jones said this “black box” approach does not reveal everything that’s happening in the chip, but just confirms a narrow set of tests based on prior knowledge.
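A toy illustration of the "black box" approach Jones describes: feed known inputs to a device, compare the outputs with expected values, and pass the part if they match. Everything here is hypothetical, including the adder standing in for a chip, the test vectors and the planted defect; it is a sketch of the idea, not of any vendor's actual test flow.

```python
# Toy black-box test: known inputs, known expected outputs, nothing else.
# The "devices" are hypothetical 32-bit adders, not real chip models.
MASK = 0xFFFFFFFF


def good_device(a, b):
    """A correct 32-bit adder."""
    return (a + b) & MASK


def faulty_device(a, b):
    """An adder with a planted defect that only shows for one input value."""
    if a == 12345:
        return (a + b + 1) & MASK
    return (a + b) & MASK


# Test vectors chosen from prior knowledge of typical corner cases.
TEST_VECTORS = [
    ((0, 0), 0),
    ((1, 1), 2),
    ((0xFFFFFFFF, 1), 0),
    ((0x7FFFFFFF, 1), 0x80000000),
]


def run_black_box_test(device, vectors):
    """Return a list of (inputs, expected, actual) for every mismatch."""
    failures = []
    for inputs, expected in vectors:
        actual = device(*inputs)
        if actual != expected:
            failures.append((inputs, expected, actual))
    return failures


print("good device failures:  ", run_black_box_test(good_device, TEST_VECTORS))
print("faulty device failures:", run_black_box_test(faulty_device, TEST_VECTORS))
print("faulty device, untested input:", faulty_device(12345, 1))
```

Both devices pass every vector, yet the faulty one returns 12347 instead of 12346 for an input the vectors never exercise. That is the limitation in miniature: the test confirms only what its authors already thought to check.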
“We are at the breaking point of the current algorithms,” he said. “If you introduce the least amount of unreliability, you wind up shipping computers that are grossly unreliable.”
Dave Ranhoff, COO of Credence Systems, a Fremont, Calif.-based test equipment maker, concurs. “Suppliers will hit a wall trying to meet next-generation device test pressures.”
Officials for Santa Clara, Calif.-based Intel and Sunnyvale, Calif.-based AMD each responded by saying they were confident of their testing procedures.
Yet Nathan Brookwood, an analyst at Saratoga, Calif.-based Insight 64, said that Intel and AMD scientists have indeed pushed the limits of physics. “The fact that the engineering and testers do as well as they do is really quite awesome.” But “the industry tends to make evolutionary changes, not revolutionary changes. And the changes they make to accommodate this extra complexity will not dramatically change things, but incrementally change things,” Brookwood said.
In fairness, Jones credits the technological savvy of Intel and AMD engineers for being able to crank up the speed of their processors to the rate they have, and compares them with the engineers who developed supersonic jets in the pre-transistor age.
“The market has done the right thing. They have the most performance with the lowest cost, but the time is now for a change,” Jones said.
At ITC 2000, an international testing conference held last month in Atlantic City, N.J., “the talk was all about high-speed testing and how to fix it,” Jones said. “There was no clear solution – the entire theme of the conference was finding forward solutions [for] high-speed tests, but the people in the hallways were shaking their heads asking ‘how the heck are we going to do this?’”
Jones believes a software approach to a high-speed processor self-test has the best chance for success. By adding intelligence to high-speed processors at the software level, precise testing may be feasible, he said.
Two specific problems plague the hardware components in high-speed processors, according to Jones. The first he called a “race condition,” where data moving at vastly different speeds within the processor fails to accurately synchronize, causing a miscalculation. This condition can cause “addition to get there faster than multiplication, for example, mixing up the logic and yielding a bad result,” Jones said.
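The race condition Jones describes can be mimicked with a toy model in which each operation takes a fixed number of cycles and a consumer simply assumes results arrive in the order they were issued. The latencies, the two-operation "program" and the function names are all assumptions made for illustration; this is not a model of any real processor.

```python
import operator

# Hypothetical latencies, in cycles, for two functional units.
MATCHED = {"mul": 1, "add": 1}      # both units keep pace with each other
MISMATCHED = {"mul": 4, "add": 1}   # the adder now finishes far sooner

# A two-operation "program": multiply first, then add.
PROGRAM = [
    ("mul", operator.mul, (6, 7)),
    ("add", operator.add, (1, 2)),
]


def arrival_order(ops, latency):
    """Issue one op per cycle and return results in the order they complete."""
    arrivals = []
    for issue_cycle, (name, fn, args) in enumerate(ops):
        done_cycle = issue_cycle + latency[name]
        arrivals.append((done_cycle, issue_cycle, fn(*args)))
    arrivals.sort()  # earliest completion first; ties keep issue order
    return [value for _, _, value in arrivals]


def consume(ops, latency):
    """Naive consumer: assumes the first arrival is the product, the second the sum."""
    product, total = arrival_order(ops, latency)
    return product - total


print("matched latencies:   ", consume(PROGRAM, MATCHED))
print("mismatched latencies:", consume(PROGRAM, MISMATCHED))
```

With matched latencies the consumer computes 42 - 3 = 39. With mismatched latencies the sum arrives first, the consumer mistakes it for the product, and the result becomes 3 - 42 = -39: the same instructions and the same data, but a wrong answer caused purely by timing.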
Second, “there are chip arbiters that decide when and where things occur on a branch instruction inside the processor, but when you speed them up, things that occur in a certain order in the slower surrounding architecture can wind up being in reverse order, also bringing a bad result,” Jones said.
Using a group of slower processors in parallel could resolve the problem, but if this fix were applied to PCs, Jones said, “you would then be facing supercomputing problems with a PC.”
“Servers went multiprocessor a few years ago to avoid the problems PCs are hitting today,” Jones said.