Meta believes that its new AI Research SuperCluster (RSC) will be among the fastest AI supercomputers in the world once it is fully built.
The supercomputer is being constructed in phases. In the current phase 1, the RSC already features 6,080 Nvidia A100 GPUs, 175 petabytes of bulk storage, 46 petabytes of cache storage, and 10 petabytes of network file system storage. Each GPU communicates over a 200 Gb/s HDR InfiniBand network.
Once it is completed, the RSC will have 16,000 GPUs and a data system that can serve an exabyte of training data, reaching 2.5 times the performance of its current state. Meta hopes to reach this final stage in July.
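The quoted 2.5x performance gain roughly tracks the planned growth in GPU count. A quick back-of-the-envelope check, assuming near-linear scaling with GPU count (an assumption, not something Meta states):

```python
# GPU counts taken from the article: phase 1 vs. the completed cluster.
phase1_gpus = 6_080
final_gpus = 16_000

# Under the (idealized) assumption of near-linear scaling,
# the raw GPU-count ratio bounds the expected speedup.
ratio = final_gpus / phase1_gpus
print(f"GPU-count ratio: {ratio:.2f}x")  # ~2.63x, close to the quoted 2.5x
```

The quoted 2.5x figure sitting slightly below the 2.63x GPU ratio is consistent with real clusters scaling somewhat sub-linearly due to communication overhead.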
Meta built the RSC to train large AI models in natural language processing (NLP) and to research and train models using trillions of examples. It will also aid in building AI that works across hundreds of different languages, in analyzing media, and in developing augmented reality tools. Moreover, the company says that more advanced AI for speech and vision analysis will help it better identify harmful content.
Looking ahead, Meta wants to leverage the RSC to build entirely new AI models for workloads such as real-time translation and collaboration, and ultimately to help construct a richer metaverse.
Meta says the new system is already 20 times faster than its current Nvidia V100-based clusters on computer vision workloads, and three times faster than its current research clusters at training large-scale NLP models.
In its announcement, the company also spoke about privacy in broad strokes. It explained that the entire data path, from the storage systems to the GPU, is encrypted. All data must go through a privacy review to confirm that it has been correctly anonymized. And because data is only decrypted in memory, even a physical security breach will not leak any information.
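The pattern described here, ciphertext everywhere at rest and in transit, with plaintext existing only in the consumer's memory, can be sketched in a few lines. This is a minimal illustrative toy: a repeating-key XOR stands in for real authenticated encryption (e.g. AES-GCM), and every identifier is hypothetical rather than anything from Meta's actual system.

```python
import os

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for a real cipher: XOR with a repeating key.
    XOR is its own inverse, so the same call encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = os.urandom(32)                    # key held only by the training job
record = b"anonymized training record"  # hypothetical record, post privacy review

# What the storage tier and the network ever see is ciphertext.
ciphertext = xor_stream(record, key)
assert ciphertext != record

# Decryption happens only in memory on the consuming side; the plaintext
# is never written back to storage, so a physical breach of the storage
# or network layers yields only ciphertext.
plaintext = xor_stream(ciphertext, key)
assert plaintext == record
```

The key design point the announcement makes is where the trust boundary sits: compromising disks or network links exposes only encrypted bytes, because the decryption key and the plaintext live solely in process memory.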