AMD Epyc Rome really is EPIC for CFD applications! I have been benchmarking two PowerEdge machines from Dell: one 24-core R640 (Intel Cascade Lake) and one 32-core R6525 (Epyc Rome), both running Windows Server 2019.

Specs:



R640

2 x Intel Xeon Gold 6246 (Cascade Lake) 12c, 4.1 GHz all core turbo

12 x 16GB 2933 MHz RAM (Dual rank)

Sub-NUMA clustering enabled



R6525

2 x Epyc Rome 7302 16c, 3.3 GHz all core turbo

16 x 16GB 3200 MHz RAM (Dual rank)

NPS set to 4



The R6525 machine is 15 % cheaper than the R640 in the above spec; the rest of the specification is identical between the two machines.



I've run a number of the official Fluent and CFX benchmarks from ANSYS. For CFX I've used Intel MPI, and for Fluent the default IBM MPI (ibmmpi).



Averaged across the different benchmarks I've run, the Epyc Rome system is:

On a core-for-core basis: 6.5 % faster in Fluent and 28 % faster in CFX(!). This is with the R6525 run on 24 cores (so as to compare like-for-like with the Intel machine). It appears CFX is much more dependent on memory bandwidth (and the Epyc's eight memory channels) than Fluent.

On a machine-for-machine basis: 28 % faster in Fluent and a whopping 48 % faster in CFX. This is when running on all 32 cores (compared to the 24-core load on the Intel system).
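A quick back-of-the-envelope (purely illustrative, and assuming the same benchmark mix underlies both averages so the ratios can be chained) shows what those two sets of figures imply about how much the R6525 gains from its last eight cores:

```python
# Chain the averaged ratios quoted above to estimate the AMD box's own
# 24 -> 32 core scaling. Assumption: the same benchmark mix underlies
# both the core-for-core and machine-for-machine averages.

core_for_core = {"Fluent": 1.065, "CFX": 1.28}       # AMD @ 24c vs Intel @ 24c
machine_for_machine = {"Fluent": 1.28, "CFX": 1.48}  # AMD @ 32c vs Intel @ 24c

ideal = 32 / 24  # perfect scaling from 24 to 32 cores (~1.33x)

for solver in ("Fluent", "CFX"):
    implied = machine_for_machine[solver] / core_for_core[solver]
    efficiency = implied / ideal
    print(f"{solver}: 24 -> 32 cores gives ~x{implied:.2f} ({efficiency:.0%} of ideal)")

# Fluent: 24 -> 32 cores gives ~x1.20 (90% of ideal)
# CFX: 24 -> 32 cores gives ~x1.16 (87% of ideal)
```

In other words, under that assumption even the last eight cores still deliver close to their theoretical share of extra throughput.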

Changing from NPS=1 (default) to NPS=4 on the AMD Epyc gave a roughly 10 % gain in CFD performance. Enabling sub-NUMA clustering on the Intel system gave a roughly 3 % gain.



Here's an example of my results. The Fluent and CFX charts are in the post below (the forum spam filter is breaking my balls).



Something interesting to note is the scaling on the AMD Epyc - there's a very clear jump in performance at every multiple of 8 cores. Look at the aircraft_wing_14m Fluent benchmark, for example: there are scaling and performance peaks at 16, 24 and 32 cores. You do not want to run the AMD system at 26 cores - it is slower than at 24 cores.



I'm guessing this is related to the CPU architecture and the splitting of cores into CCXs.
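To put some rough arithmetic behind that guess (a sketch only; it assumes the solver splits the mesh evenly per rank and that memory bandwidth per NUMA domain is the limiter - with NPS=4 on two sockets there are eight such domains, which lines up with the multiples of 8):

```python
# Illustrative sketch: spread N solver ranks as evenly as possible over the
# eight memory domains of the 2 x Epyc 7302 box (2 sockets x NPS=4).
# If per-domain bandwidth is the bottleneck, runtime follows the busiest
# domain, i.e. the largest share of the mesh mapped onto a single domain.

DOMAINS = 8  # 2 sockets x NPS=4

def occupancy(n_cores: int) -> list[int]:
    """Ranks per domain for an even round-robin placement."""
    base, extra = divmod(n_cores, DOMAINS)
    return [base + 1] * extra + [base] * (DOMAINS - extra)

for n in (16, 24, 26, 32):
    ranks = occupancy(n)
    busiest = max(ranks) / n  # share of the mesh on the most loaded domain
    print(f"{n:2d} cores -> {ranks}  busiest domain holds {busiest:.1%} of the mesh")

# 16 cores -> [2, 2, 2, 2, 2, 2, 2, 2]  busiest domain holds 12.5% of the mesh
# 24 cores -> [3, 3, 3, 3, 3, 3, 3, 3]  busiest domain holds 12.5% of the mesh
# 26 cores -> [4, 4, 3, 3, 3, 3, 3, 3]  busiest domain holds 15.4% of the mesh
# 32 cores -> [4, 4, 4, 4, 4, 4, 4, 4]  busiest domain holds 12.5% of the mesh
```

Under that (crude) model, 26 cores puts a bigger slice of the mesh behind a single domain's bandwidth than 24 cores does, which matches the dip seen in the benchmark.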



Another interesting observation is that the Intel system runs both hot and power-hungry - approx. 550 W at full load with CPU temps of 80 °C, compared to approx. 400 W at full load and CPU temps of 60 °C for the AMD system.



The decision is clear for me - I'll be building a mini-cluster consisting of four AMD Epyc Rome machines, for a total of 128 cores.

The alternative would be to purchase five Intel Xeon Gold Cascade Lake systems (for a total of 120 cores). The Intel setup would be 30 % more expensive and 10 % slower overall! I could also go for six machines, which ought in theory to match the four AMD machines, but at a dizzying 50 % price premium.
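For what it's worth, the options can be compared on throughput per unit cost using just the relative figures above (illustrative only; it assumes roughly linear scaling across nodes and ignores interconnect, licensing and power):

```python
# Relative comparison of the cluster options, with the 4-node AMD cluster
# as the baseline (cost = 1.0, throughput = 1.0). The figures are the rough
# relative numbers quoted above, not vendor quotes.
options = {
    "4 x AMD R6525  (128 cores)": {"cost": 1.0, "throughput": 1.0},
    "5 x Intel R640 (120 cores)": {"cost": 1.3, "throughput": 0.9},  # ~30 % dearer, ~10 % slower
    "6 x Intel R640 (144 cores)": {"cost": 1.5, "throughput": 1.0},  # ~matches AMD, ~50 % dearer
}

for name, o in options.items():
    print(f"{name}: throughput per unit cost = {o['throughput'] / o['cost']:.2f}")

# 4 x AMD R6525  (128 cores): throughput per unit cost = 1.00
# 5 x Intel R640 (120 cores): throughput per unit cost = 0.69
# 6 x Intel R640 (144 cores): throughput per unit cost = 0.67
```

Either Intel option ends up roughly 30 % worse on price/performance.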


