Nvidia reveals Blackwell B200 GPU, the ‘world’s most powerful chip’ for AI

‘Built to democratize trillion-parameter AI.’

The Blackwell B200 GPU.
Image: Nvidia

Nvidia’s must-have H100 AI chip made it a multitrillion-dollar company, one that may be worth more than Alphabet and Amazon, and competitors have been fighting to catch up. But perhaps Nvidia is about to extend its lead — with the new Blackwell B200 GPU and GB200 “superchip.”

Nvidia CEO Jensen Huang holds up his new GPU on the left, next to an H100 on the right, from the GTC livestream.
Image: Nvidia

Nvidia says the new B200 GPU offers up to 20 petaflops of FP4 horsepower from its 208 billion transistors, and that a GB200, which combines two of those GPUs with a single Grace CPU, can offer 30 times the performance for LLM inference workloads while also potentially being substantially more efficient. It “reduces cost and energy consumption by up to 25x” over an H100, says Nvidia.

Training a 1.8-trillion-parameter model would previously have taken 8,000 Hopper GPUs and 15 megawatts of power, Nvidia claims. Today, its CEO says, 2,000 Blackwell GPUs can do the same job while consuming just four megawatts.
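
The per-GPU math behind that claim is worth a quick check; the wattages per chip below are derived from Nvidia's cluster-level figures, not quoted by the company:

```python
# Sanity check of Nvidia's training claim, using the cluster-level numbers
# above; the per-GPU wattages are derived here, not quoted by Nvidia.
hopper_gpus, hopper_mw = 8_000, 15
blackwell_gpus, blackwell_mw = 2_000, 4

print(f"Per Hopper GPU:    {hopper_mw * 1_000 / hopper_gpus:.2f} kW")       # ~1.88 kW
print(f"Per Blackwell GPU: {blackwell_mw * 1_000 / blackwell_gpus:.2f} kW")  # 2.00 kW

# The win comes from needing 4x fewer GPUs, not from lower draw per chip:
print(f"GPU count:   {hopper_gpus / blackwell_gpus:.0f}x fewer")   # 4x
print(f"Total power: {hopper_mw / blackwell_mw:.2f}x lower")       # 3.75x
```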

On a GPT-3 LLM benchmark with 175 billion parameters, Nvidia says the GB200 offers a somewhat more modest 7x the performance of an H100 and 4x the training speed.

Here’s what one GB200 looks like. Two GPUs, one CPU, one board.
Image: Nvidia

Nvidia told journalists one of the key improvements is a second-gen transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight (thus, the 20 petaflops of FP4 I mentioned earlier). A second key difference only comes when you link up huge numbers of these GPUs: a next-gen NVLink switch that lets 576 GPUs talk to each other, with 1.8 terabytes per second of bidirectional bandwidth.
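
The doubling is simple arithmetic: half the bits per value means half the memory per parameter, so the same VRAM and bandwidth carry twice the model. A minimal sketch (weights only; real deployments also need room for activations and caches):

```python
def weights_gb(params: float, bits: int) -> float:
    """Memory needed just to hold the model weights, in gigabytes."""
    return params * bits / 8 / 1e9

PARAMS = 1.8e12  # the 1.8-trillion-parameter model discussed above

print(f"FP8 weights: {weights_gb(PARAMS, 8):,.0f} GB")  # 1,800 GB
print(f"FP4 weights: {weights_gb(PARAMS, 4):,.0f} GB")  #   900 GB
# The same memory and bandwidth now carry twice as many parameters,
# hence the claimed doubling of compute, bandwidth, and model size.
```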

That required Nvidia to build an entirely new network switch chip, one with 50 billion transistors and some of its own onboard compute: 3.6 teraflops of FP8, says Nvidia.

Nvidia says it’s adding both FP4 and FP6 with Blackwell.
Image: Nvidia

Previously, Nvidia says, the GPUs in a cluster of just 16 would spend 60 percent of their time communicating with one another and only 40 percent actually computing.
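
That communication overhead is effectively wasted compute, which is why Nvidia is attacking interconnect bandwidth. A toy illustration (the 80 percent figure below is hypothetical, just to show the leverage):

```python
def usable_pflops(peak_pflops: float, compute_fraction: float) -> float:
    """Throughput actually delivered when the rest of the time goes to communication."""
    return peak_pflops * compute_fraction

peak = 16 * 20  # 16 GPUs at the 20-petaflop FP4 figure quoted earlier
print(f"At 40% compute time: {usable_pflops(peak, 0.40):.0f} PFLOPS delivered")  # 128
print(f"At 80% compute time: {usable_pflops(peak, 0.80):.0f} PFLOPS delivered")  # 256
# Doubling the compute fraction doubles delivered performance without
# touching the silicon -- that is the case for a faster NVLink switch.
```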

Nvidia is counting on companies to buy large quantities of these GPUs, of course, and is packaging them in larger designs, like the GB200 NVL72, which plugs 36 CPUs and 72 GPUs into a single liquid-cooled rack for a total of 720 petaflops of AI training performance or 1,440 petaflops (aka 1.4 exaflops) of inference. It has nearly two miles of cables inside, with 5,000 individual cables.
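
Those rack totals are consistent with the per-chip figures quoted earlier, assuming the training number reflects a lower-throughput precision such as FP8 (that's an assumption here, not something Nvidia specified):

```python
gpus = 72
training_pflops, inference_pflops = 720, 1_440

print(f"Training per GPU:  {training_pflops / gpus:.0f} PFLOPS")   # 10 PFLOPS each
print(f"Inference per GPU: {inference_pflops / gpus:.0f} PFLOPS")  # 20 PFLOPS, the FP4 figure
print(f"Rack inference: {inference_pflops / 1_000:.2f} exaflops")  # 1.44, rounded to 1.4 above
```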

The GB200 NVL72.
Image: Nvidia

Each tray in the rack contains either two GB200 chips or two NVLink switches, with 18 of the former and nine of the latter per rack. In total, Nvidia says one of these racks can support a 27-trillion-parameter model. GPT-4 is rumored to be around a 1.7-trillion-parameter model.
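
Those tray counts tally with the rack's 36-CPU, 72-GPU total:

```python
compute_trays, switch_trays = 18, 9  # per rack, as described above
superchips = compute_trays * 2       # two GB200s per compute tray -> 36
cpus = superchips                    # one Grace CPU per GB200     -> 36
gpus = superchips * 2                # two Blackwell GPUs per GB200 -> 72
print(f"{superchips} GB200s per rack = {cpus} CPUs + {gpus} GPUs")
```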

The company says Amazon, Google, Microsoft, and Oracle are all already planning to offer the NVL72 racks in their cloud service offerings, though it’s not clear how many they’re buying.

And of course, Nvidia is happy to offer companies the rest of the solution, too. Here’s the DGX SuperPOD for DGX GB200, which combines eight systems in one for a total of 288 CPUs, 576 GPUs, 240TB of memory, and 11.5 exaflops of FP4 computing.
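
Those SuperPOD totals follow from the per-rack figures, again using the 20-petaflop FP4 rating from earlier (a quick check, not Nvidia's own math):

```python
systems = 8
cpus, gpus = systems * 36, systems * 72   # 288 CPUs, 576 GPUs
fp4_pflops = gpus * 20                    # 20 PFLOPS FP4 per B200, from earlier
print(f"{cpus} CPUs, {gpus} GPUs, {fp4_pflops / 1_000:.2f} exaflops FP4")  # 11.52, i.e. ~11.5
```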

DGX SuperPOD with DGX GB200 systems.
Image: Nvidia

Nvidia says its systems can scale to tens of thousands of the GB200 superchips, connected with 800Gbps networking via its new Quantum-X800 InfiniBand (for up to 144 connections) or Spectrum-X800 Ethernet (for up to 64 connections).

We don’t expect to hear anything about new gaming GPUs today, as this news is coming out of Nvidia’s GPU Technology Conference, which is usually almost entirely focused on GPU computing and AI, not gaming. But the Blackwell GPU architecture will likely also power a future RTX 50-series lineup of desktop graphics cards.

Sean Hollister
