WOA Issue 86
It’s been a very good year for Arm, and not just because Arm-based chips are coming soon to a MacBook near you.
As workloads across the spectrum continue to scale, Arm’s knack for enabling highly customizable implementations means the UK-based company’s star is on the rise in more than one ecosystem. With new announcements about Arm-based hardware from companies as diverse as Amazon and Apple hitting the wires almost weekly, it’s time to take a deeper look at what’s cooking for Arm in the datacenter specifically.
While there is a lot going on in this space, here are the top companies (and trends) we think you should keep your eye on over the next year.
Let’s start with Arm themselves. They are making a big push to deliver more IP to server vendors with the Neoverse product line, and this is driving a lot of the implementations that you’ll find below. The Neoverse N1 (its codename is “Ares”) design hits a sweet spot by providing licensees like AWS and Ampere with a ready-to-go server-class core as a starting point. With the Neoverse roadmap, Arm is looking to help its licensees produce servers with less risk, higher value features for chosen use cases, and a faster time to market.
While Arm doesn’t make any mass-market systems of its own, they have made a limited number of Server Development Platforms (SDP) to help software teams get their code in order.
The big roll-out from Amazon Web Services at their 2019 re:Invent conference was Graviton2, a new chip powering their 6th generation CPU offerings. Priced to compete, Graviton2 is hailed as the “first in a number of ARM chips to challenge dominance in the overall server industry.”
The chip is based on Neoverse N1 cores, and promises performance comparable to Intel Xeon at an attractive price. The new M6g instance types from Amazon built on this core were in private preview for some months, with claims of up to 40 percent cost savings from combined price and performance improvements. As of spring of this year, they are available to the general market.
The performance improvements that AWS has helped drive for the Arm64 architecture have been substantial. One example is the work Amazon has done on PHP, which showed a 37 percent improvement in execution time on a benchmark between PHP 7.3 and PHP 7.4 due to these optimizations. In the real world, this means more throughput and scalability for applications like WordPress. Interested? More details are available in the write-up on improving performance of PHP for Arm64 and its impact on AWS Graviton2 based EC2 instances.
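To make the throughput claim concrete, here is a minimal sketch of the arithmetic. It assumes that the "37 percent improvement" means a 37 percent reduction in execution time and that the workload is CPU-bound; the function name `throughput_gain` is hypothetical and used only for illustration.

```python
def throughput_gain(time_reduction: float) -> float:
    """Return the throughput multiplier implied by a fractional cut in execution time.

    Assumption: the work is CPU-bound, so requests per second scale as
    1 / (time per request).
    """
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - time_reduction)


if __name__ == "__main__":
    # 37 percent less time per request (the benchmark figure quoted above).
    gain = throughput_gain(0.37)
    print(f"{gain:.2f}x requests per unit time")  # prints "1.59x requests per unit time"
```

Under those assumptions, a 37 percent cut in execution time works out to roughly 1.59 times the requests served by the same hardware, which is larger than the headline number suggests.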
Ampere is leveraging Neoverse-derived cores in their forthcoming Ampere® Altra™, due out in late 2020. The roadmap includes Armv8.2 support and CCIX, a high-speed, cache-coherent interconnect for attaching accelerators.
Ampere® Altra™ offers up to 80 cores at up to 3.0 GHz, and can sustain uniform performance across all cores, delivering predictable performance 100% of the time by fully eliminating the noisy neighbor challenge. Another selling point is power efficiency: Ampere® Altra™ claims industry leading power efficiency per core, while packing 80 cores in a single socket and 160 cores in a dual socket, establishing new levels of power efficiency with scalability.
And after Altra comes the Altra Max, with up to 128 cores per socket, coming in 2021.
Marvell is planning to use its own architecture license to deliver ThunderX3. They shared their processor roadmap with WikiChip, and it spells out a strategy for high performance cores, independent of the N1 designs. Of special interest in this design is 4-way multithreading, which produced the massive 256 threads featured in the previous ThunderX2 design.
ThunderX3’s high core count (96) is particularly appealing for HPC applications, where scientific computing routines can readily take advantage of the vast number of cores for things like simulations. Marvell has had a number of design wins in the HPC and supercomputer space, powering high-end systems from HPE and Cray. We’ll see Marvell push the performance envelope.
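The thread counts quoted for these parts follow from simple multiplication, sketched below. The helper `hardware_threads` is hypothetical, and the ThunderX2 line assumes a 32-core, dual-socket configuration, which is one way to arrive at the 256 threads mentioned above.

```python
def hardware_threads(cores_per_socket: int, smt_ways: int, sockets: int = 1) -> int:
    """Hardware threads exposed to the OS: cores x SMT ways x sockets."""
    if min(cores_per_socket, smt_ways, sockets) < 1:
        raise ValueError("all arguments must be positive integers")
    return cores_per_socket * smt_ways * sockets


if __name__ == "__main__":
    # ThunderX3 as described above: 96 cores, 4-way multithreading, one socket.
    print(hardware_threads(96, 4))             # prints 384
    # Assumed ThunderX2 configuration: 32 cores, 4-way SMT, two sockets.
    print(hardware_threads(32, 4, sockets=2))  # prints 256
```

For HPC schedulers, the distinction matters: a job sized by hardware threads sees four times as many slots as one sized by physical cores.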
Fujitsu’s A64FX features exceptional memory bandwidth. Its flagship supercomputer design, the “Fugaku” (Post-K) in Japan, is in the top 10 on the Green500 list of the most energy efficient supercomputers. The Fujitsu design is the first to incorporate SVE (Scalable Vector Extension), an Arm vector instruction set extension that targets HPC applications, alongside the huge memory bandwidth the chip makes available to those applications.
All the design wins for this Fujitsu system have been for HPC applications so far. I do know quite a few developers who wouldn’t mind having one under their desk or in the cloud to perfect their SVE algorithms. Failing that, you can use ArmIE (“Arm Instruction Emulator”) to build SVE-ready codes and test them on non-SVE systems.
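Before deciding whether you need an emulator, it helps to know whether the machine in front of you has SVE at all. The sketch below assumes a Linux arm64 system, where the kernel lists CPU capabilities on the `Features` line of `/proc/cpuinfo`; the function name `has_sve` is hypothetical.

```python
def has_sve(cpuinfo_text: str) -> bool:
    """Return True if any 'Features' line in /proc/cpuinfo-style text lists 'sve'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            # Match the whole token so that 'sve2' or 'svei8mm' alone does not count.
            if "sve" in value.split():
                return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("SVE available:", has_sve(f.read()))
    except OSError:
        print("No /proc/cpuinfo here; run this on a Linux system.")
```

On a system that reports no SVE support, an emulator such as ArmIE remains the way to exercise SVE code paths.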
Brand new on the chip scene, and recently out of stealth mode, is Nuvia, a Santa Clara, CA startup founded by some of the best and brightest SoC designers in the industry. The company has backing from Dell Technologies Capital and a dream team hailing from Apple, Broadcom, and Red Hat.
Promising to provide performance uplift with each new generation, the Nuvia team is putting its design chops up against industry stalwarts, building a custom CPU architecture focused on the hyperscaler market. Targeting very high single-thread performance (what Nuvia vice president of software Jon Masters calls a “sweet spot” in the market), Nuvia aims to raise the bar a step-function higher. While keeping a lid on the details for now, Nuvia has a goal to build the “highest performing server part” in the industry, led by Arm innovation, says Masters.
NVIDIA has been producing both embedded systems with 64-bit Arm cores for AI applications, and GPU designs for high performance computing applications. An announcement out of SC19 promised widespread support for CUDA libraries on Arm systems, with a wide-ranging partnership planned with chipmakers Marvell and Ampere and HPC systems builders HPE and Cray.
On the smartNIC side, NVIDIA’s acquisition of Mellanox in March of this year opens the door to adding Mellanox’s BlueField line of Arm-powered smartNICs to their stable. The BlueField 2 offloads network functions like deep packet inspection, as well as NVMe storage, from the host CPU.
By the end of the calendar year, the long-awaited and much-rumored Arm-based Apple Silicon should be powering new MacBooks. Apple announced availability of a developer kit at WWDC, and the open source world began porting software to the new hardware right away.
Just like any porting process, some things work right away, some things take a little longer, and other projects are major milestones. What Apple has going for it is a large and vigorous development community that can tackle the thousands of ports found in the Homebrew package management system and work collaboratively to get thousands of independently developed software packages going on new hardware.
No Raspberry Pi is ever going to win a performance race in the server design market, but engineers from VMware and others are working away at getting standard Arm “Server Ready” firmware on this $75, quad-core, 8 gigabyte device to enable a whole class of new developers to have access to standards-based equipment in their home office and labs.
A team from VMware is leading the effort on UEFI firmware for the Pi and several other single-board computers, with a stated goal of getting off-the-shelf standard Arm64 server images booted without needing to support a Pi-specific device tree.
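If you want to see how a given board actually booted, the Linux kernel exposes hints under `/sys/firmware`. The sketch below is a rough heuristic only, and it assumes a Linux system. The function name `boot_interfaces` is hypothetical, and the presence of a directory says nothing about whether the firmware is fully standards-compliant.

```python
from pathlib import Path
from typing import Dict


def boot_interfaces(sysfs_firmware: Path = Path("/sys/firmware")) -> Dict[str, bool]:
    """Report which firmware interfaces the running kernel exposes in sysfs.

    Assumption: on Linux, 'efi' appears after a UEFI boot, 'acpi' when ACPI
    tables are in use, and 'devicetree' when a device tree was provided.
    """
    return {
        "uefi": (sysfs_firmware / "efi").is_dir(),
        "acpi": (sysfs_firmware / "acpi").is_dir(),
        "devicetree": (sysfs_firmware / "devicetree").is_dir(),
    }


if __name__ == "__main__":
    for name, present in boot_interfaces().items():
        print(f"{name}: {'yes' if present else 'no'}")
```

A board booted with server-style firmware would be expected to report UEFI and ACPI, while a stock Pi image would typically report only a device tree.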