Meta has redesigned its data centers to meet growing demand from artificial intelligence (AI) workloads.
Meta temporarily paused other data center developments to focus on AI, and reconsidered its infrastructure strategy. That review led to the scrapping of Meta's in-house chip, which had failed to meet expectations. Meta then adopted graphics processing units (GPUs) and launched a new effort to build an entirely new in-house microprocessor.
According to Meta's global head of data center strategic engineering, Alan Duong, the design would support multiple white space deployments, with clusters ranging from 1,000 to more than 30,000 accelerators, depending on Meta's needs. Each configuration and accelerator choice would require a distinct approach to hardware and network system architecture.
Duong also described the data center's architecture, emphasizing the integration of accelerators and network technologies for fast AI training. The close proximity of the servers allowed fiber deployment to be optimized, which is critical for linking them.
To address the higher power and cooling demands of GPUs compared to CPUs, Meta's new data center design incorporates water cooling. The majority of servers will still rely on air handling units, while a growing number of AI training servers will use a cooling distribution unit that supplies water directly to the chips. Notably, some of Meta's cooling infrastructure includes the USystems ColdLogik RDHx.
The sources for this piece include an article in DataCenterDynamics.