All the way back at Hot Chips 2017, we saw Intel launch its upgraded Xeon Phi processors, Knights Mill. These were updated versions of Knights Landing, using the same modified Silvermont x86 cores paired with AVX-512 units and MCDRAM, but focused on variable-precision instructions for machine learning. Despite the launch way back when, we had not heard much about anyone actually using them. Until this past week, that is.

The Xeon Phi ecosystem was actually fairly popular for high-performance computing, with several equipped systems in the Top500 supercomputer list. The combination of additional vector processing hardware with x86 compatibility helped spawn new collective groups, such as the Xeon Phi User Group (now the Intel Extreme Performance User Group, or IXPUG), covering projects in machine learning, matrix processing, and even visual computing. IXPUG holds a panel session every year at Supercomputing.

The road for Xeon Phi has not been easy: over the last two years Intel cancelled the Knights Landing-generation PCIe cards, then stated that Knights Mill would be a socketed-only product, before essentially killing off the entire family altogether. (We believe the Cascade Lake-AP platform is designed to fill that role in 2019.) However, despite Knights Mill technically launching, I had not seen any mention of it in action.

At Supercomputing 2018 this past week, as is usual, I had a walk through the poster presentation room. This is usually a mix of doctoral student work, academic research, and vendor-assisted implementations for software solutions to mathematical problems. Every so often there's an interesting hardware-related poster for new silicon or a better way to manage silicon, such as the fabric network controller for GPUs we reported on in 2015. Not much luck this year; however, I did spot one presentation explaining that the researchers were using Knights Mill for their work. It was the only place in the *whole show* that I saw Knights Mill mentioned.

The poster, from Joshua Davis, a student at the University of Delaware, looked at limiting the peak power consumption of the Xeon Phi processor and correlating the limit with compute time for data-intensive benchmarks. This naturally relies on the processor's internal DVFS implementation for voltage and frequency adjustments, as well as accurate internal power reporting by the system to itself.

The idea behind the research is that there is typically a fine balance between power and execution time, and the out-of-the-box settings typically sit outside the ideal power efficiency window. By adjusting the power limit and profiling the result, the ideal power efficiency point can be found for each benchmark.
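The search for that efficiency point can be sketched as picking, from a sweep of measured (power cap, runtime) pairs, the cap that minimizes energy-to-solution. The cap values and runtimes below are illustrative assumptions, not figures from the poster, and the energy model (cap multiplied by runtime) is a simplification since real power draw varies during a run.

```python
def energy_to_solution(samples):
    """Given (power_cap_watts, runtime_seconds) pairs, return each cap's
    approximate energy-to-solution in joules, assuming the node draws
    its full cap for the whole job -- a deliberate simplification."""
    return {cap: cap * runtime for cap, runtime in samples}

def best_cap(samples):
    """Pick the power cap that minimizes energy-to-solution."""
    energies = energy_to_solution(samples)
    return min(energies, key=energies.get)

# Hypothetical sweep: runtime lengthens as the cap tightens.
sweep = [(215, 1000.0), (140, 1150.0), (120, 1400.0)]
print(best_cap(sweep))  # 140 (161 kJ, vs 215 kJ at 215 W and 168 kJ at 120 W)
```

In this made-up sweep, the middle cap wins: the uncapped run burns power faster than it saves time, while the tightest cap stretches the runtime enough to erase its power savings.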

Results showed a mix of outcomes: reducing the power limit either made no difference to the run time, lengthened the run time but saved power, or lengthened the run time while consuming the same power. Ultimately the difficulty with this sort of testing, aside from accurate measurement, is that a full profiling sweep is not easily automated – power limits are often set at the BIOS level, requiring a restart between tests. It also means that if high-performance software containing parts of these benchmarks is deployed, a compromise has to be found, given that the power limit can only be applied once at boot.
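Those outcome categories can be expressed as a small classifier over a capped run versus its uncapped baseline. This is my own framing of the categories above, not code from the poster, and the tolerance threshold and example figures are assumptions.

```python
def classify_capping(base, capped, tol=0.02):
    """Classify a power-capped run against the uncapped baseline.
    base and capped are (runtime_seconds, energy_joules) tuples;
    tol is the relative change treated as measurement noise.
    All figures used with this function here are hypothetical."""
    runtime_up = capped[0] > base[0] * (1 + tol)
    energy_down = capped[1] < base[1] * (1 - tol)
    if not runtime_up:
        return "free savings" if energy_down else "no effect"
    return "time/energy trade-off" if energy_down else "slower, no savings"

# A capped run that takes 15% longer but uses 25% less energy:
print(classify_capping((1000.0, 215000.0), (1150.0, 161000.0)))
# -> time/energy trade-off
```

The interesting cases for an operator are the first two: where the cap is effectively free, there is no reason not to apply it.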

The systems in play look like single-node development towers, which I know Intel commissioned SuperMicro to build as part of its push into development work. The poster states that the KNL system was a mid-range part with 68 cores and a TDP of 215W, whereas the KNM system was a top-end model with 72 cores and a TDP of 320W. Both systems were tested at 215W, 140W, and 120W.

On a scale of zero to 'I'm amazed someone is using Knights Mill', I'm amazed that someone is using Knights Mill. It wasn't stated whether the KNM-specific instruction sets were being used (or even whether they were applicable). I wonder if we'll see any mention of KNM at next year's Supercomputing.

Related Reading