Over 36 petaflops: New supercomputer from Oak Ridge

IT equipment, especially servers, should be replaced every three to five years, according to conventional wisdom, because that is usually when the service warranty expires. However, some companies use their hardware longer. There are various reasons for this: a lack of financial resources, a volatile economic environment, the ongoing debate over on-premises versus cloud, and so on.

And often the hardware itself did not justify an upgrade because new processor generations were hardly faster than the old ones. The result has been longer lifecycles for server hardware. A 2020 IDC survey found that around percent of respondents keep their servers for six years and 12.4 percent for seven years or more.

The latter group includes the Oak Ridge Leadership Computing Facility (OLCF), part of Oak Ridge National Laboratory (ORNL). In 2019, it decommissioned the supercomputer Titan, which had gone into operation in 2012 and had become unusable due to its outdated CPUs. It has been replaced by Crusher, which takes up just one-hundredth of the space of Titan while delivering higher performance and lower power consumption.

Crusher is a Mini Me version of Frontier, an exascale supercomputer that’s scheduled to launch later this year or next. Both computers are based on the same hardware, but Crusher is much smaller and serves as a test bed for applications that will later run on Frontier.

The hardware consists of HPE Cray EX blades, each with a 64-core AMD EPYC “Trento” CPU and four AMD MI250X GPUs. Trento is a derivative of the Milan generation of EPYC processors. The MI250X is part of AMD’s new family of enterprise GPUs, with which the manufacturer competes against Nvidia’s offerings. At least on paper, the hardware delivers extremely impressive numbers.

According to ORNL, Crusher has 192 HPE Cray EX blades in 1.5 rack cabinets that take up just 44 square feet. Titan, on the other hand, required 200 cabinets occupying 4,352 square feet, roughly a hundred times as much. Titan achieved a peak performance of 27 petaflops. While there are no benchmarks for Crusher yet, the MI250X is known to achieve a peak performance of 47.9 teraflops in double-precision floating point. Multiply that by 768 (four GPUs per blade times 192 blades) and you get about 36.8 petaflops, well above Titan’s peak performance. And that’s not even counting the CPUs.
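The back-of-the-envelope calculation above can be reproduced in a few lines of Python. This is only a sketch using the figures quoted in this article; the constant and function names are made up for illustration, and the result is a theoretical GPU-only peak, not a measured benchmark.

```python
# Rough peak-performance estimate using only the figures quoted in the article.
# All names here are illustrative; the result is a theoretical peak, not a benchmark.

BLADES = 192                  # HPE Cray EX blades in Crusher
GPUS_PER_BLADE = 4            # AMD MI250X GPUs per blade
MI250X_FP64_TFLOPS = 47.9     # quoted double-precision peak per GPU
TITAN_PEAK_PFLOPS = 27.0      # quoted peak performance of Titan


def crusher_gpu_peak_pflops(blades=BLADES,
                            gpus_per_blade=GPUS_PER_BLADE,
                            tflops_per_gpu=MI250X_FP64_TFLOPS):
    """Return the combined GPU peak in petaflops (CPUs are not counted)."""
    gpu_count = blades * gpus_per_blade
    return gpu_count * tflops_per_gpu / 1000.0  # 1 petaflop = 1,000 teraflops


if __name__ == "__main__":
    peak = crusher_gpu_peak_pflops()
    print(f"GPUs in total: {BLADES * GPUS_PER_BLADE}")
    print(f"Crusher GPU-only peak: {peak:.1f} petaflops")
    print(f"Relative to Titan: {peak / TITAN_PEAK_PFLOPS:.2f}x")
```

Run as a script, this reports 768 GPUs and a GPU-only peak of roughly 36.8 petaflops, about 1.36 times Titan’s quoted peak. Real-world benchmark results would be lower than this theoretical figure.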

Crusher is currently validating important science projects that will later run on Frontier. Since both systems share the same hardware, projects validated on Crusher should run smoothly on Frontier as well.