NVIDIA's Vera CPU posts its first independent numbers, on NVIDIA's bench

2026-05-28

#benchmark #hardware #arm #cpu #nvidia #performance #measurement #linux

TL;DR

On 2026-05-26 Phoronix published the first public benchmarks of NVIDIA's Vera CPU (88 Olympus Arm cores, single-socket, pre-production). The geomean came in at ~1.5x the throughput of a 128-core Intel Xeon 6980P and ~10% over a 64-core AMD EPYC 9575F. Phoronix measured these independently, but it ran an NVIDIA-curated test list at NVIDIA's HQ. (Phoronix)
A default Linux kernel compiled in ~20 seconds, which Phoronix calls the fastest it has measured. (Phoronix)
STREAM TRIAD sustained ~90% of Vera's rated 1.2 TB/s, which Phoronix calls the highest fraction of rated peak of any CPU it has tested. (Phoronix)
NVIDIA reposted these results the same week under a "heavy-hitting punch" headline. That is vendor amplification of a third-party measurement, not a vendor measurement, though the test configuration was still NVIDIA's. (NVIDIA)
Phoronix was not allowed to measure perf-per-watt, and power and frequency monitoring were disabled. Efficiency, the whole point of a datacenter part, is therefore absent from the data. (Tom's Hardware)

What got measured

Vera is NVIDIA's Arm server CPU for its next datacenter generation: 88 in-house Olympus cores, aimed at agentic-AI hosting. On 2026-05-26 Phoronix's Michael Larabel published the first public benchmarks. The framing you need before any number: NVIDIA invited Phoronix to its Santa Clara HQ to run them on pre-production silicon in a single-socket configuration.

The headline geomean numbers, independently measured by Phoronix on that hardware:

~1.5x the throughput of the Intel Xeon 6980P, a 128-core Granite Rapids part (Phoronix puts the lead at roughly 55%).
~10% over the AMD EPYC 9575F, a 64-core Zen 5 part clocked at 5.0 GHz.
~63% over NVIDIA's own prior 72-core Grace CPU.

Two single-test results carry the story:

Kernel compile: ~20 seconds for a default Linux kernel build, which Phoronix flags as the fastest result it has recorded. This is the one to internalize if you wait on CI: a defconfig-class build collapsing to ~20s on one socket is a real number, not a marketing ratio.
STREAM TRIAD: ~90% of rated peak. Vera is rated at 1.2 TB/s of memory bandwidth, and TRIAD sustained about 90% of that. Phoronix says that is the highest fraction of rated bandwidth it has seen from any CPU. Bandwidth efficiency, not just bandwidth.
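The ~90% figure is a ratio of two bandwidths, and STREAM's own convention fixes how the measured one is computed. TRIAD runs `a[i] = b[i] + q * c[i]` over arrays of doubles and credits 24 bytes per element (two reads, one write), ignoring any extra write-allocate traffic. Below is a minimal Python sketch of that accounting. The function names and the 10^9-element, 24 ms run are hypothetical, chosen only to keep the arithmetic easy.

```python
# STREAM TRIAD computes a[i] = b[i] + q * c[i] over arrays of doubles.
# STREAM credits 2 reads + 1 write = 3 * 8 bytes per element and ignores
# any extra write-allocate traffic the hardware may generate.
BYTES_PER_DOUBLE = 8
TRIAD_ARRAYS = 3
RATED_VERA_BPS = 1.2e12  # 1.2 TB/s rated bandwidth quoted above


def triad_bandwidth(n_elements: int, seconds: float) -> float:
    """STREAM-style TRIAD bandwidth in bytes/s for one timed pass."""
    return TRIAD_ARRAYS * BYTES_PER_DOUBLE * n_elements / seconds


def fraction_of_rated(measured_bps: float, rated_bps: float) -> float:
    """Sustained bandwidth as a fraction of the rated peak."""
    return measured_bps / rated_bps


# Hypothetical run: 10**9 elements per array finishing in 24 ms.
bw = triad_bandwidth(10**9, 0.024)
print(f"{bw / 1e12:.2f} TB/s, {fraction_of_rated(bw, RATED_VERA_BPS):.0%} of rated")

# What ~90% of Vera's rated 1.2 TB/s works out to:
print(f"{0.9 * RATED_VERA_BPS / 1e12:.2f} TB/s")  # prints 1.08 TB/s
```

So the sustained TRIAD figure behind the ~90% claim is roughly 1.08 TB/s on a single socket.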

The per-core claims NVIDIA leans on (~2x kernel-compile throughput per core, ~4x memory bandwidth per core vs x86) are derived ratios against the 128-core Xeon, not separate measurements. Treat them as the same result re-expressed.
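The text above does not say exactly how the per-core ratios were derived, so this sketch assumes the simplest derivation: each chip's per-core figure is its socket-level result divided by its core count. Under that assumption, a per-core ratio converts back to a socket-level ratio when multiplied by 88/128, which shows how much smaller the underlying socket gap is. The function names are hypothetical.

```python
VERA_CORES = 88
XEON_6980P_CORES = 128


def per_core_ratio(socket_ratio: float, cores_a: int, cores_b: int) -> float:
    """A-over-B per-core ratio, ASSUMING per-core = socket result / core count."""
    return socket_ratio * cores_b / cores_a


def socket_ratio(per_core: float, cores_a: int, cores_b: int) -> float:
    """Inverse of per_core_ratio: the socket-level ratio a per-core claim implies."""
    return per_core * cores_a / cores_b


for claim in (2.0, 4.0):  # the "~2x compile" and "~4x bandwidth" per-core claims
    implied = socket_ratio(claim, VERA_CORES, XEON_6980P_CORES)
    print(f"{claim:.0f}x per core -> {implied:.3f}x per socket")
```

Under that assumption, "2x per core" and "4x per core" correspond to 1.375x and 2.75x at the socket level, which is the form the 1.5x headline is stated in.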

Test environment, from Phoronix via Tom's Hardware: Ubuntu 24.04 LTS, a patched Linux 6.18 LTS kernel, GCC 16.1. Every test ran single-socket. There are no dual-socket numbers, so the results say nothing about multi-socket scaling or the cross-socket interconnect.

Read the asterisks before the bars

Every number above is a Phoronix measurement, which is the part that gives it weight. The asterisks are about the conditions Phoronix ran under, and the article is upfront about them.

The workload list was NVIDIA's. Testing was restricted to compilation, STREAM, Python, Java, video encoding, and database-style tests. That is a defensible slice for an agentic-AI host, and it is also a slice. There is no SPEC suite, no adversarial mixed load, no workload NVIDIA did not pick. A geomean over a vendor-selected basket is a weaker claim than a geomean over a fixed public suite, because the basket itself is a degree of freedom.
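To see why the basket is a degree of freedom, here is a sketch of how a geomean headline is built from per-test ratios, using Python's `statistics.geometric_mean`. Every ratio and test name below is invented for illustration and is not Phoronix data. The only point is that the same two chips produce different headlines depending on which tests go into the basket.

```python
from statistics import geometric_mean

# Hypothetical per-test ratios (chip A score / chip B score), NOT measured data.
RESULTS = {
    "compile": 1.8,
    "stream": 2.4,
    "python": 1.2,
    "java": 1.1,
    "encode": 0.9,
    "branchy_int": 0.8,    # hypothetical tests a different basket might add
    "latency_bound": 0.7,
}


def basket_geomean(results: dict, names: list) -> float:
    """Geometric mean of the per-test ratios for the chosen basket."""
    return geometric_mean([results[n] for n in names])


picked = ["compile", "stream", "python", "java", "encode"]
wider = picked + ["branchy_int", "latency_bound"]
print(f"picked basket: {basket_geomean(RESULTS, picked):.2f}x")
print(f"wider basket:  {basket_geomean(RESULTS, wider):.2f}x")
```

With these made-up numbers, the picked basket comes out near 1.4x and the wider one near 1.2x, with no change to either chip.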

Efficiency is missing on purpose. NVIDIA asked that power-consumption monitoring stay off, and CPU frequency monitoring was disabled too, because power-management tuning was still in progress on pre-production silicon. That is reasonable for an unfinished part, but it means perf-per-watt, the metric a datacenter buyer actually optimizes, is not in the dataset. The 128-core Xeon and the 64-core EPYC also differ widely from the 88-core Vera in core count and power envelope, so "1.5x" and "10%" describe socket-vs-socket throughput, not perf-per-core or perf-per-watt.
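Perf-per-watt is throughput divided by power, so the missing number is easy to state as arithmetic. A 1.5x throughput lead survives per watt only if Vera draws less than 1.5x the Xeon's power on the same workloads. The sketch below sweeps a few power ratios. Those ratios are hypothetical, since no power data was collected, and treating the geomean as a single workload is a simplification.

```python
def perf_per_watt_ratio(throughput_ratio: float, power_ratio: float) -> float:
    """A-over-B perf/W, from A-over-B throughput and A-over-B power draw."""
    return throughput_ratio / power_ratio


THROUGHPUT_VS_XEON = 1.5  # headline socket-level geomean from above

# Hypothetical Vera/Xeon power ratios: no power data was collected.
for power_ratio in (0.8, 1.0, 1.5, 2.0):
    ppw = perf_per_watt_ratio(THROUGHPUT_VS_XEON, power_ratio)
    verdict = "ahead" if ppw > 1 else ("tied" if ppw == 1 else "behind")
    print(f"Vera draws {power_ratio:.1f}x the power -> {ppw:.2f}x perf/W ({verdict})")
```

The breakeven point is simply the throughput ratio itself, and that is exactly the number a power measurement would have pinned down.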

Pre-production, single-socket, vendor's room. Shipping silicon can land slower or faster, and Phoronix could not bring its own comparison rigs into NVIDIA's setup the way it would in its own lab. None of this makes the numbers fake. It makes them provisional and venue-bound, which is a different thing from wrong.

NVIDIA's own recap repeats the 1.5x, 2x-compile, and 4x-bandwidth figures verbatim. That is vendor amplification of a third-party measurement — better than a pure vendor benchmark, since Phoronix controlled the runs and published the caveats, but the caveats are exactly what does not survive the trip into a launch post.

What to do with it

If you run Arm CI or memory-bound services, this is a "watch and re-measure" signal, not a procurement decision. The ~20s kernel compile and ~90%-of-rated TRIAD are the genuinely notable results, both bandwidth-and-throughput stories that map to build farms and in-memory data paths. Wait for shipping silicon, a dual-socket configuration, a perf-per-watt figure, and a run on a public fixed suite outside NVIDIA's building before you let any of these ratios into a capacity plan. The honest takeaway is narrow: on a vendor-curated single-socket bench, an 88-core Arm part beat a 128-core Xeon and edged a 64-core EPYC on throughput, and posted the best kernel-compile and STREAM-efficiency numbers Phoronix has recorded. Everything past that needs a second measurement someone else controls.
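Re-measuring mostly means timing the same job on hardware you control and recording the environment next to the result, the way Phoronix lists its OS, kernel, and compiler. Below is a minimal Python sketch, assuming a POSIX system. `time_command` and its `prep` hook are hypothetical helpers, not part of any benchmarking suite. `prep` runs untimed before each run, for example `make clean`, so every kernel build starts from scratch.

```python
import os
import platform
import statistics
import subprocess
import sys
import time
from typing import Optional


def time_command(cmd: list, cwd: Optional[str] = None,
                 prep: Optional[list] = None, runs: int = 3) -> dict:
    """Time `cmd` over several runs and record basic environment info.

    `prep` (if given) runs before each timed run and is excluded from timing.
    """
    times = []
    for _ in range(runs):
        if prep:
            subprocess.run(prep, cwd=cwd, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return {
        "cmd": cmd,
        "times_s": times,
        "best_s": min(times),
        "median_s": statistics.median(times),
        "machine": platform.machine(),   # e.g. "aarch64" or "x86_64"
        "kernel": platform.release(),    # running kernel release string
        "os": platform.platform(),
        "cpus": os.cpu_count(),
    }


if __name__ == "__main__":
    # Smoke test: time a no-op Python process instead of a real build.
    print(time_command([sys.executable, "-c", "pass"], runs=2))
```

For a kernel build, a call such as `time_command(["make", f"-j{os.cpu_count()}"], cwd="linux", prep=["make", "clean"])` in a tree that has already run `make defconfig` roughly approximates the defconfig-class build above. Phoronix's exact build configuration and flags may differ, and the compiler version is worth recording alongside the result.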

Sources

Phoronix: NVIDIA Vera CPU Benchmarks — Olympus Cores (primary independent measurement, published 2026-05-26; geomean ratios, 20s kernel compile, STREAM TRIAD, test environment)
NVIDIA blog: Vera CPU "Packing a Heavy-Hitting Punch" (vendor recap amplifying the Phoronix numbers)
Tom's Hardware: Nvidia offers restricted access to Vera CPU in first round of Linux benchmarks (reporting the access restrictions, disabled power/frequency monitoring, and OS/toolchain config that the primary's caveats reference)