By letting users worldwide share and distribute music for free over the Internet, Napster has already made headlines – and ruffled more than a few feathers – in challenging the music industry’s traditional distribution system. Soon, similar technology may be used in a whole new way: to help scientists involved in the Human Genome Project share genetic data.
When it began in 1990, the Human Genome Project’s first objective was to sequence the estimated 100,000 or more genes in human DNA. A working draft of that sequence was announced in June 2000. Now it’s on to the next step: figuring out the interactions of the three billion base pairs of guanine, thymine, adenine and cytosine that make up the genome. That could take a while, especially because much of the data is scattered among the databases of scientists around the world.
Lincoln Stein, a bioinformaticist at the Cold Spring Harbor Laboratory in Cold Spring Harbor, N.Y., has proposed a way for researchers working on the human genome to share that data: the Distributed Sequence Annotation System (DAS). It would use technology similar to that behind both Napster and Gnutella, the Nullsoft software that lets users share more than just music files.
At the heart of DAS would be a centralized reference server holding a detailed map of the genome; researchers could then compare their findings against that map and annotate it, using information maintained on their own hard drives.
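To make that arrangement concrete, here is a minimal Python sketch of the idea, not Stein's actual implementation. It assumes the reference map can be reduced to named segments with known lengths, and that each lab keeps its own list of annotations expressed in those shared coordinates. Every name in it (`Annotation`, `REFERENCE`, `overlay`, the segment and lab names) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    segment: str  # name of a region on the shared reference map
    start: int    # 1-based, inclusive
    end: int      # inclusive
    label: str
    source: str   # which lab's machine supplied the annotation


# Hypothetical reference map: segment name -> length in base pairs.
REFERENCE = {"chr1_demo": 1000, "chr2_demo": 500}


def overlay(reference, sources, segment, start, end):
    """Collect annotations from every source that overlap the requested region."""
    if segment not in reference:
        raise KeyError(f"unknown segment: {segment}")
    length = reference[segment]
    hits = []
    for annotations in sources:
        for a in annotations:
            # Ignore annotations that do not fit the shared coordinate system.
            if a.segment != segment or a.start < 1 or a.end > length or a.start > a.end:
                continue
            # Standard interval-overlap test against the requested region.
            if a.start <= end and a.end >= start:
                hits.append(a)
    return sorted(hits, key=lambda a: (a.start, a.end))


# Two labs hold their own data; only the coordinates are shared.
lab_a = [Annotation("chr1_demo", 100, 250, "gene_x", "lab_a")]
lab_b = [
    Annotation("chr1_demo", 200, 300, "repeat", "lab_b"),
    Annotation("chr1_demo", 900, 1200, "bad_coords", "lab_b"),  # runs past the segment
]
merged = overlay(REFERENCE, [lab_a, lab_b], "chr1_demo", 150, 400)
```

The point of the sketch is that the labs never have to agree on anything except the reference coordinates; each one can add or withdraw its annotations without touching the map itself.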
“About a year ago, a lot of the data was inaccessible,” Stein says. “It was in paper journals or ad hoc Web sites. You couldn’t take it [from the Web sites] and easily compare it because of different formats.”
DAS would provide a common XML format, which would be searchable through add-ons Stein and his team have built for an Apache Web server. The result? Users worldwide would be able to view relevant data from one another’s hard drives.
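As a rough illustration of why a common XML format matters, the Python sketch below reads annotations from a small XML document using the standard library. The element and attribute names (`annotations`, `feature`, `segment`, `start`, `end`, `type`) are invented for this example and should not be taken as the actual DAS format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout, made up for illustration only.
SAMPLE = """<annotations source="lab_a">
  <feature id="f1" segment="chr1_demo" start="100" end="250" type="gene"/>
  <feature id="f2" segment="chr1_demo" start="200" end="300" type="repeat"/>
</annotations>"""


def parse_features(xml_text):
    """Turn one lab's XML document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    source = root.get("source")
    return [
        {
            "id": f.get("id"),
            "segment": f.get("segment"),
            "start": int(f.get("start")),
            "end": int(f.get("end")),
            "type": f.get("type"),
            "source": source,
        }
        for f in root.iter("feature")
    ]


features = parse_features(SAMPLE)
```

Once every lab publishes in one agreed layout, a single parser like this can read all of them, which is what makes side-by-side comparison practical.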
Stein says DAS differs from Napster in that it depends on someone manually creating links to data sources, whereas Napster goes through the entire Web. As a result, DAS “won’t fill up the bandwidth the way Napster does,” Stein says. “We run terabytes. Napster runs tens to hundreds of terabytes.”