Google Dethrones NVIDIA With Split Results in Latest Artificial Intelligence Benchmarking Tests

Digital transformation is creating artificial intelligence workloads on an unprecedented scale. These workloads require companies to collect and store mountains of data. Even as business intelligence is extracted from current machine learning models, new data streams are used to create new models and update existing ones.

Building AI models is complex and expensive. It is also very different from traditional software development. AI models need specialized hardware for accelerated computing and high-performance storage, as well as purpose-built infrastructure to handle the technical nuances of AI.

In today’s world, many critical business decisions and customer services rely on accurate machine learning insights. To train, run, and scale models as quickly and accurately as possible, a company needs the knowledge to choose the best hardware and software for its machine learning applications.

Performance benchmarking

MLCommons is an open engineering consortium that has made it easier for companies to make machine learning decisions through standardized benchmarking. Its mission is to make machine learning better for everyone. Its tests and unbiased comparisons help companies determine which vendor best meets their AI application requirements. The MLCommons Foundation began its first MLPerf benchmarking in 2018.

MLCommons recently conducted a benchmarking round called MLPerf Training v2.0 to measure the performance of hardware and software used to train machine learning models. A total of 250 performance results were reported by 21 different submitters, including Azure, Baidu, Dell, Fujitsu, GIGABYTE, Google, Graphcore, HPE, Inspur, Intel-Habana Labs, Lenovo, Nettrix, NVIDIA, Samsung and Supermicro.

This series of tests focused on determining how long it takes to train various neural networks. Faster model training allows for faster model deployment, which impacts total cost of ownership and model ROI.

A new object detection benchmark has been added to MLPerf Training 2.0, which trains the new RetinaNet reference model on a larger and more diverse dataset called Open Images. This new test reflects state-of-the-art ML training for applications such as collision avoidance for vehicles and robotics, retail analytics and many more.

Results

Machine learning has seen a lot of innovation since 2021, in both hardware and software. For the first time since the debut of MLPerf, Google’s cloud-based TPU v4 ML supercomputer outperformed NVIDIA’s A100 in four of eight training tests covering language (2), computer vision (4), reinforcement learning (1) and recommendation systems (1).

According to the graph comparing the performance of Google and NVIDIA, Google had the fastest training times for BERT (language), ResNet (image recognition), RetinaNet (object detection), and MaskRCNN (image recognition). On DLRM (recommendation), Google narrowly edged out NVIDIA, but that entry was a research project and unavailable for public use.

Overall, Google submitted scores for five of the eight benchmarks. The best training times are shown below:

In a chat with Vikram Kasivajhula, Google’s director of product management for ML infrastructure, I asked what approach Google was using to make such dramatic improvements to TPU v4.

“We focused on the problems of heavy model users innovating at the frontiers of machine learning,” he said. “Our cloud product is actually an instantiation of that goal. We also focused on performance per dollar. As you can imagine, these models get incredibly large and expensive to train. One of our priorities is to make sure it is affordable.”

A one-of-a-kind submission

A unique submission was made to MLPerf Training 2.0 by Stanford graduate student Tri Dao. Dao submitted an 8-A100 system for BERT training.

NVIDIA also had a submission using the same setup as Dao. I suspect this was a courtesy submission from NVIDIA to provide Dao with a documented point of comparison.

NVIDIA finished training the BERT model on its 8-A100 system in 18.442 minutes, while Dao’s submission took only 17.402 minutes, about 5.6% less time. He achieved the faster training time using a method called FlashAttention. Attention is a technique that mimics cognitive attention: it enhances some parts of the input data while diminishing others, the motivation being that the network should focus more on the small but important parts of the data.
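To make the idea of attention concrete, here is a minimal sketch of standard scaled dot-product attention in plain Python. It is an illustration only, not Dao's FlashAttention code and not anything taken from the MLPerf submissions; the function names `softmax` and `attention` and the toy inputs are assumptions made for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Each query is compared with every key; the resulting weights decide
    how much of each value vector flows into that query's output.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input (hypothetical): the query points in the same direction as the
# first key, so the first value should dominate the output.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
result = attention([[5.0, 0.0]], keys, values)
```

In this toy case the output leans heavily toward the first value vector, which is the "enhance some parts, diminish others" behavior described above. FlashAttention, as its authors describe it, is designed to produce the same attention result while reorganizing the computation to reduce memory traffic on the GPU; this sketch does not attempt that optimization.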

Wrapping up

Over the past three years, Google has made a lot of progress with its TPU. Likewise, NVIDIA has been using its A100 successfully for four years. Many software improvements have been made for the A100, as evidenced by its long track record.

We’re likely to see NVIDIA submissions in 2023 using both its A100 and the new H100, a beast by any current standard. Everyone was hoping to see the performance of the H100 this year, but NVIDIA didn’t submit it because it wasn’t publicly available.

Software improvements in general were evident in the latest results. Kasivajhula said hardware is only half the story of Google’s improved benchmarks. The other half was software optimizations.

“Many of the optimizations were learned from our own industry-leading YouTube and search benchmark use cases,” he said. “We are now making them available to users.”

Google has also made several performance improvements to the virtualization stack to fully utilize the computing power of CPU hosts and TPU chips. The results of Google’s software improvements have been demonstrated by its peak performance on the image and recommendation models.

Overall, Google’s Cloud TPUs deliver significant performance and cost savings at scale. It will take time to see if the benefits are enough to entice more customers to switch to Google Cloud TPUs.

In the longer term, Google’s better results in the main categories could presage fewer top MLPerf results for NVIDIA in the future. It is in the interest of the ecosystem to see strong competition among multiple vendors for the best MLPerf performance results.

One thing is for sure: MLPerf Training 2.0 was much more interesting than previous rounds, in which NVIDIA picked up performance wins in almost every category.

Full results of MLPerf Training 2.0 are available here.

Paul Smith-Goodson is Vice President and Principal Analyst for Quantum Computing, Artificial Intelligence and Space at Moor Insights & Strategy. You can follow him on Twitter for current information on quantum, AI and space.

Note: Moor Insights & Strategy writers and editors may have contributed to this article.

Moor Insights & Strategy, like all research and technology industry analyst firms, provides or has provided paid services to technology companies. These services include research, analysis, consulting, benchmarking, acquisition matching, and speaking sponsorships. The company has had or currently has paid business relationships with 8×8, Accenture, A10 Networks, Advanced Micro Devices, Amazon, Amazon Web Services, Ambient Scientific, Anuta Networks, Applied Brain Research, Applied Micro, Apstra, Arm, Aruba Networks (now HPE), Atom Computing, AT&T, Aura, Automation Anywhere, AWS, A-10 Strategies, Bitfusion, Blaize, Box, Broadcom, C3.AI, Calix, Campfire, Cisco Systems, Clear Software, Cloudera, Clumio, Cognitive Systems, CompuCom, Cradlepoint, CyberArk, Dell, Dell EMC, Dell Technologies, Diablo Technologies, Dialogue Group, Digital Optics, Dreamium Labs, D-Wave, Echelon, Ericsson, Extreme Networks, Five9, Flex, Foundries.io, Foxconn, Frame (now VMware), Fujitsu, Gen Z Consortium, Glue Networks, GlobalFoundries, Revolve (now Google), Google Cloud, Graphcore, Groq, Hiregenics, Hotwire Global, HP Inc., Hewlett Packard Enterprise, Honeywell, Huawei Technologies, IBM, Infinidat, Infosys, Inseego, IonQ, IonVR, Infiot, Intel, Interdigital, Jabil Circuit, Keysight, Konica Minolta, Lattice Semiconductor, Lenovo, Linux Foundation, Lightbits Labs, LogicMonitor, Luminar, MapBox, Marvell Technology, Mavenir, Marseille Inc, Mayfair Equity, Meraki (Cisco), Merck KGaA, Mesophere, Micron Technology, Microsoft, MiTEL, Mojo Networks, MongoDB, MulteFire Alliance, National Instruments, Neat, NetApp, Nightwatch, NOKIA (Alcatel-Lucent), Nortek, Novumind, NVIDIA, Nutanix, Nuvia (now Qualcomm), onsemi, ONUG, OpenStack Foundation, Oracle, Palo Alto Networks, Panasas, Peraso, Pexip, Pixelworks, Plume Design, PlusAI, Poly (formerly Plantronics), Portworx, Pure Storage, Qualcomm, Quantinuum, Rackspace, Rambus, Rayvolt electric bikes, Red Hat, Renesas, Residio, Samsung Electronics, Samsung Semi, SAP, SAS, Scale Computing, Schneider Electric, SiFive, Silver Peak (now Aruba-HPE), SkyWorks, SONY Optical Storage, Splunk, Springpath (now Cisco), Spirent, Sprint (now T-Mobile), Stratus Technologies, Symantec, Synaptics, Syniverse, Synopsys, Tanium, Telesign, TE Connectivity, TensTorrent, Tobii Technology, Teradata, T-Mobile, Treasure Data, Twitter, Unity Technologies, UiPath, Verizon Communications, VAST Data, Ventana Micro Systems, Vidyo, VMware, Wave Computing, Wellsmith, Xilinx, Zayo, Zebra, Zededa, Zendesk, Zoho, Zoom and Zscaler. Patrick Moorhead, Founder, CEO and Chief Analyst of Moor Insights & Strategy, is an investor in dMY Technology Group Inc. VI, Dreamium Labs, Groq, Luminar Technologies, MemryX and Movandi.