Determined to keep pace with competitors in generative AI, Meta is investing billions in its own AI initiatives. Some of that money is going toward recruiting AI researchers. A significant portion, however, is going toward hardware, particularly chips for running and training Meta’s AI models.
Meta unveiled its latest chip today, a day after Intel announced its newest AI accelerator hardware. The chip, which Meta calls the “next-gen” Meta Training and Inference Accelerator (MTIA), runs a variety of models, including those used to rank and recommend display ads on Meta’s platforms, such as Facebook.
The next-gen MTIA is built on a 5nm process, down from the 7nm process used for its predecessor, MTIA v1. (In chip manufacturing, “process” refers to the size of the smallest component that can be built on the chip.) The new chip is physically larger and has more processing cores than its predecessor. It consumes more power, 90W versus 25W, but it also has more internal memory (128MB versus 64MB) and runs at a higher average clock speed (1.35GHz, up from 800MHz).
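To put those figures side by side, here is a minimal sketch that computes the generation-over-generation ratio for each quoted spec. The dictionary keys and the function name are made up for illustration; only the numbers come from the article.

```python
# Figures quoted above, stored as (MTIA v1, next-gen MTIA).
# The key names are illustrative labels, not Meta's terminology.
SPECS = {
    "power_w": (25.0, 90.0),
    "internal_memory_mb": (64.0, 128.0),
    "clock_mhz": (800.0, 1350.0),
}


def generation_ratios(specs):
    """Return new/old for each spec, i.e. how many times larger the new value is."""
    return {name: new / old for name, (old, new) in specs.items()}


ratios = generation_ratios(SPECS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers, power draw grows the most (3.6x), ahead of internal memory (2x) and average clock speed (about 1.69x). These are ratios of spec-sheet values only and say nothing on their own about real-world performance.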
Meta says the next-gen MTIA is now live in 16 data center regions and delivers a 3x improvement in performance compared to MTIA v1. If that “3x” claim sounds vague, you’re not wrong; we thought so too. Meta would say only that the figure came from testing the performance of “four key models” on both chips.
“Because we control the whole stack, we can achieve greater efficiency compared to commercially available GPUs,” Meta writes in a blog post shared with Eltrys.
Meta’s hardware showcase, which comes just 24 hours after a press briefing on the company’s ongoing generative AI initiatives, is unusual for a few reasons.
First, according to the blog post, Meta is not currently using the next-gen MTIA for generative AI training workloads, although the company says it is exploring this through several ongoing programs. Second, Meta acknowledges that the next-gen MTIA will not replace GPUs for running or training models. Instead, it will complement them.
Reading between the lines, Meta is moving slowly, perhaps more slowly than it would like.
Meta’s AI teams are almost certainly under pressure to cut costs. The company is set to spend an estimated $18 billion by the end of 2024 on GPUs for training and running generative AI models. With training costs for advanced generative models reaching tens of millions of dollars, in-house hardware is an appealing alternative.
And while Meta’s hardware lags, competitors are pulling ahead, which, I suspect, is frustrating Meta’s leadership.
This week, Google made its latest custom chip, TPU v5p, available to Google Cloud customers and introduced Axion, its new dedicated chip for running models. Amazon has several custom AI chip families of its own. Microsoft joined the race last year with the Azure Maia AI Accelerator and the Azure Cobalt 100 CPU.
In the blog post, Meta says it took less than nine months to go from first silicon to production models of the next-gen MTIA, which, to be fair, is shorter than the typical gap between Google TPU generations. Still, Meta has a long way to go if it hopes to gain some independence from third-party GPUs and keep up with its strong competitors.