Mapping out the human genome was relatively easy compared to the task that now faces scientists in the genomics field – understanding how complex 3D proteins work. But the largest supercomputer in Canada should help MDS Proteomics Inc. in its quest.
The drug discovery company, which recently opened its doors, is now home to one of the 10 most powerful non-military supercomputers in the world – in this case, an IBM Linux-based Beowulf cluster with 202 nodes, each of which has twice the computing power of an average computer.
The computer is used, among other things, to check which molecules fit in with which 3D protein structures.
“So we can run through about two million small molecules a week on one half of that computer system to test whether or not they’ll stick to a protein target,” said Christopher Hogue, the CIO at MDS Proteomics and a scientist and assistant professor of biochemistry at the University of Toronto. “Screening on the computer is an advance that allows us to lower the cost of the physical screening.”
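Two million molecules a week on half the cluster works out to a little over three candidates scored per second. The article does not describe MDS’s actual docking software, but the shape of such a computational screen can be sketched as a simple filter: score every candidate against the target and keep the likely binders. The scoring function below is a hypothetical stand-in, not MDS’s method.

```python
# Minimal sketch of a virtual (in-silico) screen: score each small molecule
# against a protein target and keep the likely binders. score_binding() is
# a hypothetical placeholder; real docking evaluates 3D geometry and chemistry.

def score_binding(molecule, target):
    """Hypothetical docking score; lower means tighter predicted binding."""
    return abs(hash((molecule, target))) % 1000

def screen(library, target, threshold=50):
    """Return the candidates whose predicted binding passes the threshold."""
    return [m for m in library if score_binding(m, target) < threshold]

if __name__ == "__main__":
    library = [f"mol_{i}" for i in range(1_000_000)]  # candidate library
    hits = screen(library, target="protein_X")
    print(f"{len(hits)} hits out of {len(library):,} screened")
```

Only the hits would then go on to physical screening, which is where the cost saving Hogue describes comes from.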
Hogue spent 10 weeks last year determining which type of supercomputer would best fit MDS Proteomics’ needs.
“And it was clearly an advantage to put it on an Intel platform as opposed to a RISC platform like (Compaq’s) Alpha or the PA-RISC from HP or even the AIX system from IBM. There was clearly a cost advantage for using Intel processors – a striking cost advantage. So a lot of our competitors in bioinformatics have adopted the Alpha platform, and they’ve paid a lot more per processor for what is now roughly the same speed that we have,” he said.
“Three years ago, the Alpha platform was the best floating-point processor on the market. However, most of bioinformatics, especially DNA sequencing, has no floating point (calculations) at all,” Hogue added.
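Hogue’s point is easy to see in code: sequence comparison, the workhorse of DNA bioinformatics, runs almost entirely on integer arithmetic. The sketch below computes a Smith-Waterman-style local alignment score using nothing but integer additions, comparisons and maximums; the scoring values are illustrative, not taken from any particular program.

```python
# Integer-only core of a Smith-Waterman-style local alignment: no floating
# point anywhere, which is why DNA sequencing workloads gain little from
# strong floating-point hardware. The scoring scheme is illustrative.
MATCH, MISMATCH, GAP = 2, -1, -2

def align_score(a, b):
    """Best local alignment score between two DNA sequences (integers only)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]  # first column of each row is zero
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if ca == cb else MISMATCH)
            curr.append(max(0, diag, prev[j] + GAP, curr[j - 1] + GAP))
            best = max(best, curr[j])
        prev = curr
    return best

print(align_score("GATTACA", "GCATGCA"))  # small integer score
```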
What’s important for the type of research MDS is doing is the ability to run parallel computations, and the biggest challenge was implementing the software in a way that was parallelized and scalable, Hogue said.
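The article does not detail how MDS parallelized its code, but screening is a naturally parallel workload: each molecule is scored independently, so the library can simply be split across processors. A minimal sketch of that pattern on a single machine, using Python’s standard multiprocessing module (the function and molecule names are illustrative):

```python
# Sketch of the "parallelized and scalable" structure of a screening run:
# the molecule library is partitioned across worker processes, each scoring
# its share independently. An illustrative stand-in for cluster-scale code.
from multiprocessing import Pool

def score_binding(molecule):
    # Hypothetical stand-in for docking one molecule against a fixed target.
    return molecule, abs(hash(molecule)) % 1000

if __name__ == "__main__":
    library = [f"mol_{i}" for i in range(100_000)]
    with Pool() as pool:  # defaults to one worker per CPU core
        scored = pool.map(score_binding, library, chunksize=1_000)
    hits = [m for m, s in scored if s < 50]
    print(f"{len(hits)} hits from {len(library):,} molecules")
```

On a cluster, the same split happens across nodes rather than cores, which is what lets the workload scale with the size of the machine.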
Though large strides are being made in the field, the knowledge is still rudimentary, said Caroline Kovac, general manager of IBM Life Sciences, during a panel discussion on proteomics held recently at MDS’ new lab.
“There’s exponential growth of data, but there’s not exponential growth of knowledge or wisdom about that data yet. What we’re doing right now is using pretty rudimentary algorithms to mine the data and mine points of interest,” she said.
“It seems like every time we discover something here, we open a door, we discover 10 more questions.”
Despite the unanswered questions, the field of proteomics is making incredible strides thanks to the growing relationship between life sciences and information technology, said Anthony Pawson, co-director of the Samuel Lunenfeld Research Institute at Mount Sinai Hospital in Toronto and a professor at the University of Toronto.
“When I was a graduate student, people were just starting to sequence DNA . . . We’re in a similar revolution. In the protein world we’ve only just acquired the technology to really automate protein analysis. And we’re just starting to organize the proteomic data into a comprehensible form, because it’s only in the last two years we’ve really learned the principles underlying how proteins and cells are put together. We’ve never really had a way to compute all these complex things before.”
Information technology is also benefiting, Kovac said.
“Life sciences is one of the big growth segments in information technology. It is in fact driving high-performance computing.”