Livermore Labs CIO David Cooper manages an advanced technology lab, but what he has learned about data management and employee recruitment and retention could benefit any IT operation.
Like many other young men at the time, David Cooper was enthralled by the Sputnik satellite, which the Soviet Union launched in 1957, when he was a high school student in Bolling, Tex.
But unlike most of his peers, Cooper followed through on his dream. He went to work on the Gemini, Apollo and space shuttle missions and became one of the most influential IT managers at NASA. Now, as the CIO at Livermore, Calif.-based Lawrence Livermore National Laboratory, he drives IT policy for what is arguably the most advanced computer installation in the world.
Cooper, who holds a doctorate in physics, is an unapologetic optimist about the potential benefits of computing technology to society. He says he sees his role as providing scientists with the tools to make the world a better place to live.
For example, the Accelerated Strategic Computing Initiative (ASCI), which he oversees, will make it possible for scientists to simulate a nuclear bomb blast in 3-D, eliminating the need for live testing.
Cooper shared his experiences and observations on everything from new technology initiatives to security, data management and recruiting with Computerworld’s (U.S.) Mark Hall.
CW: What are the technology advances of the ASCI program at Lawrence Livermore Labs?
Cooper: Cray [Research Inc., the original Cray computer company] spent between US$200 million and US$300 million in research and development for each generation of new supercomputer. They would sell, at most, a few hundred of them. It was apparent that pursuing this technology path would not allow us to perform a 3-D simulation of nuclear weapons in the lifetime of a weapons scientist. As a matter of fact, we estimated that it would take up to 6,000 years running on a Cray X-MP.
So we looked for another solution that was faster and hopefully less costly. The answer was to use commodity parts – the workstations you and I have on our desks – and couple them together with a high-speed switch and special software to make the entire collection of parts work at some reasonable level of efficiency. This produces a “parallel” computer that consists of thousands of processors.
Taking a sophisticated computer code, like a nuclear weapons code, and partitioning it across thousands of processors is a very difficult task. But we have a lot of very smart people working on this problem.
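The partitioning idea Cooper describes can be illustrated with a toy sketch. It is not the lab's method: the function names (`partition`, `local_work`, `parallel_sum_of_squares`) and the sum-of-squares workload are assumptions chosen for illustration, and Python threads stand in for the separate processors that a real parallel machine would couple through its switch.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n_cells, n_workers):
    """Split cell indices 0..n_cells-1 into contiguous, near-equal ranges."""
    base, extra = divmod(n_cells, n_workers)
    ranges = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers each take one additional cell.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def local_work(grid, bounds):
    """Work done by one worker on its own slice of the grid (hypothetical workload)."""
    lo, hi = bounds
    return sum(x * x for x in grid[lo:hi])


def parallel_sum_of_squares(grid, n_workers):
    """Hand each worker one slice, then combine the partial results."""
    ranges = partition(len(grid), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda b: local_work(grid, b), ranges))
    return sum(partials)
```

Even in this toy case the two hard parts are visible: dividing the work evenly and combining the pieces afterward. A real simulation code must also exchange data between neighbouring pieces at every step, which is where the high-speed switch and the software effort come in.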
CW: Parallel computing has been going on for a long time. It has lots of failures, such as Thinking Machines and Kendall Square. What has made ASCI work?
Cooper: First, we went to several large companies that have computing as a part of their business plan and got a commitment from them. Second, and perhaps more important, we concentrated on the software. It’s almost always the software, both application and operational, that makes these large computers work. The development of a high-speed switch to achieve efficiency was also critical.
The ASCI program made a long-term commitment to invest in the development of all aspects of these machines. Converting any sophisticated computer code … that’s running fine on a Cray-type computer to a massively parallel machine requires a large investment and an awful lot of time.
CW: Describe the ASCI program commitment. Was it hard to sell?
Cooper: The ASCI program plan calls for a 10-year investment in all aspects of large parallel computers. The budget is currently about US$600 million per year. Only a quarter of the budget goes for computing platforms. About 40 per cent goes to the application teams to develop new codes. At Livermore, we have teams of up to 30 people all working together to develop codes that work efficiently on these large systems.
A large part of the selling of the [ASCI] program was convincing computer companies to bid on machines that consisted of thousands of processors. Everyone knew that they would not be selling many 8,000-processor machines. But they would be selling thousands of similar but smaller systems, such as 32- or 64-processor machines.
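The budget shares Cooper cites can be turned into dollar amounts with simple arithmetic. The sketch below assumes the US$600 million annual figure and the two shares he states; the interview does not itemize the remaining 35 per cent, so it is labelled only as "other". The names `TOTAL_MUSD`, `SHARES_PCT` and `split_budget` are illustrative.

```python
# Annual ASCI budget in millions of US dollars, as stated in the interview.
TOTAL_MUSD = 600

# Shares stated in the interview, in whole per cent.
SHARES_PCT = {"computing platforms": 25, "application teams": 40}


def split_budget(total, shares_pct):
    """Convert percentage shares into amounts; the remainder is reported as 'other'."""
    amounts = {name: total * pct // 100 for name, pct in shares_pct.items()}
    amounts["other"] = total - sum(amounts.values())
    return amounts


for name, amount in split_budget(TOTAL_MUSD, SHARES_PCT).items():
    print(f"{name}: US${amount} million per year")
```

On these figures, computing platforms account for about US$150 million a year and application teams for about US$240 million.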
CW: You already have nearly 3 petabytes of data with the ASCI project. Have you learned anything about data management that’s applicable to other CIOs?
Cooper: The techniques we use for data storage, compression and analysis are readily applicable to a number of applications. Weapons designers previously used two-dimensional tables of numbers in their analysis. With terabyte-size data files, they can no longer do this, so we made a large investment in scientific visualization capability. At the lab, we have a 9-by-15-foot visualization wall that we use to display the results of simulations. I have seen weapons designers who have been designing weapons for 30 years say, “I didn’t know this was going on.” These visualization techniques are also applicable to numerous other scientific and engineering disciplines. The ASCI program, by investing in scientific visualization and data analysis techniques, has opened up new markets for these capabilities.
CW: The ASCI project has massive power and cooling requirements. How do you handle the facility infrastructure around the computing infrastructure?
Cooper: We work with multiple vendors to get estimates of the power and cooling requirements of their future systems. If the estimates are too far apart, we go back and ask for a refinement. We do this until we are satisfied that we have a reasonable estimate of the requirements. Back-of-the-envelope calculations won’t do. We need to know the details about these systems’ requirements because we are building the facilities to house them right now.
CW: How do you attract computer scientists to the lab when in nearby Silicon Valley they can earn more and get stock options?
Cooper: I use the ASCI program as a recruiting tool. I say to people, “Look, here’s an opportunity, maybe once in your life, to make a difference in the environment [eliminating underground nuclear tests] and to work on a pre-eminent, leading-edge, defining state-of-the-art program. How many times in your career do you think you’re going to be able to do this? Come work for me for three years and make a difference. Then you can go over to Silicon Valley and make your millions.” I know that once they come to work at the lab, they will love the challenge and the environment, and many will stay.
I believe that if we continue to develop ASCI-like computers, we will, in the next generation or two, be able to simulate the human body and determine whether or not a drug taken into the human body will result in a deformed child or even result in cancer at some later stage of life.
If the automobile companies had the supercomputers 10 or 12 years ago that we have today, they would have been able to design an air bag that when deployed would be gentle on a young adult or infant.
Other applications include [accurate] weather prediction up to, say, 21 days in advance, prediction of the impact of deforestation, environmental cleanup, and safer and more efficient aircraft.
CW: Once you get people into the lab, how do you develop their careers?
Cooper: We have a variety of programs and opportunities for training available to our employees. We don’t have a formal mentoring program yet, [but] we make sure that people are associated with someone in their field. The worst thing you can do is hire someone and turn them loose without any real guidance.
CW: How do you train employees to be managers?
Cooper: Well, I think Livermore has not done a very good job of this. Consequently, some of the organizations, including mine, have started an Emerging Leaders program. First-line supervisors nominate people, or people can self-nominate to participate in the program. We introduce them to management and management techniques. We invite in speakers. We expose them to different parts of the lab. By having a colleague or a working relationship with someone in another organization at the laboratory, we can hopefully avoid the “stovepiping” that takes place in large laboratories like Livermore.
CW: If you had to give one piece of advice to a CIO building a new data centre, what would you tell him or her?
Cooper: The most important thing I could tell them is to pay strict attention to cybersecurity. I’m convinced that I could select a team and get to virtually anything connected to the Web. There are so many vulnerabilities out there. If one needs to worry about the integrity of the data, then one must worry about cybersecurity. Before one is willing to accept the risks, I strongly recommend that there be a detailed threat identification and a formal risk analysis. In my opinion, CIOs need to go out and hire a chief security officer and fund a staff to support this activity. There are too many people who simply put too much faith in firewalls. There are many levels of sophistication of firewalls and many are just capable of keeping out the “kiddie” hackers, quite frankly.