Tesla CEO and Twitter/X owner Elon Musk announced Monday that his AI startup, xAI, had officially begun training on its Memphis supercomputer, which he describes as “the most powerful AI training cluster in the world.”
Once the cluster is fully operational, Musk plans to use it to build the “world’s most powerful AI by every metric by December of this year,” which presumably will be Grok 3.
Nice work by @xAI team, @X team, @Nvidia & supporting companies getting Memphis Supercluster training started at ~4:20am local time.
With 100k liquid-cooled H100s on a single RDMA fabric, it’s the most powerful AI training cluster in the world!
— Elon Musk (@elonmusk) July 22, 2024
xAI’s “Gigafactory of Compute,” where the supercomputer is housed, is located in a former Electrolux production facility in Memphis, Tennessee, and was announced just last month. Per Musk, the training cluster will use 100,000 of Nvidia’s H100 GPUs, which are based on the Hopper microarchitecture, in a network roughly four times larger than current state-of-the-art clusters. Those include Aurora at Argonne National Laboratory, with 60k Intel GPUs; Frontier at Oak Ridge, with roughly 38k AMD GPUs; and Microsoft’s Eagle, which runs 14,400 Nvidia H100 GPUs.
The opening of this training facility constitutes the largest capital investment by a new-to-market company in Memphis’ history, according to Greater Memphis Chamber President and CEO Ted Townsend. The supercomputer will be used “to fuel and fund the AI space for all of his [Musk’s] companies first, obviously with Tesla and SpaceX,” he said. “If you can imagine the computational power necessary to place humans on the surface of Mars, that is going to happen here in Memphis.”
However, despite the multibillion-dollar investment by xAI, the facility is only expected to generate a few hundred local jobs. What’s more, the “[Tennessee Valley Authority] does not have a contract in place with xAI,” per a report from WREG.
The utility said, “[We] are working with xAI and our partners at [Memphis Light, Gas and Water] on the details of the proposal and electricity demand needs.” The TVA also pointed out that any project over 100 megawatts (MW) needs its approval to connect to the state’s power grid. MLGW President Doug McGowen estimates that Musk’s facility could draw up to 150 MW during peak usage.