Benchmarks Influence AI Server Design
Processor manufacturers are racing to entrench themselves in the growing AI market. As a result, a slew of computing products have been introduced or reinvented to serve AI use cases. These include well-known processing options like CPUs and GPUs, and more novel solutions like vision processing units (VPUs).
But as these devices make their way to deployment in real-world systems, the datasheet performance specifications become essentially meaningless. What matters to design engineers is how processors fare in particular use cases. They want to know about features and optimizations that can boost efficiency, reduce cost, lower power consumption, and enable new capabilities.
But evaluating multiple solutions against these parameters can take a lot of time.
Engineers at ComBox Technology, an IT and neural network systems integrator, made time to benchmark several computing solutions before designing the company's AI server. They measured each option's cost per frame per second (FPS) when executing AI algorithms, a key metric for calculating ROI in computer vision systems.
“What we found is that the Intel® NUC8i5BEK, based on 8th generation Intel® Core™ processors, provided the most value in these workloads—with an average cost per FPS of just over $4.00 per month,” says Dmitriy Rytvinskiy, general director at ComBox.
Processor Cost per FPS Revealed
The ComBox engineering team began their cost-per-FPS experiment with several options for the main deep-learning processor. These included chips, graphics cards, and accelerator modules from multiple vendors.
They tested these platforms using two popular convolutional neural networks (CNNs): U-Net, an image segmentation network, and DarkNet-19, an image classification network. The ComBox evaluation used image input sizes of 768 x 512 and 576 x 384 pixels for the U-Net algorithm, and 256 x 256-pixel image data for DarkNet-19.
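The per-workload FPS figures behind such a comparison come from timing a fixed batch of frames. A minimal, framework-agnostic sketch of that measurement loop is shown below; the `run_inference` callable is a placeholder standing in for a real network, not ComBox's actual harness.

```python
import time

def measure_fps(run_inference, frames):
    """Time a batch of frames through an inference callable and
    return throughput in frames per second."""
    start = time.perf_counter()
    for frame in frames:
        run_inference(frame)  # placeholder for a real forward pass
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Example: a trivial no-op "network" over 100 dummy frames.
fps = measure_fps(lambda frame: None, list(range(100)))
print(f"{fps:.1f} FPS")
```

In a real benchmark the loop would also include a warm-up pass and multiple repetitions, so that one-time costs like model loading do not skew the result.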
The two CNNs were run separately and on individual processing elements, even within the same device. In other words, devices that contain both a CPU and a GPU or integrated graphics unit—like select Intel Atom® processors, Intel Core processors, or Intel NUC platforms based on either processor—were tested more than once. In all cases, the neural networks were optimized with inference frameworks and engines such as the Intel® OpenVINO™ Toolkit, TensorFlow, or TensorRT.
To calculate the value of each contender, ComBox testers simply divided the cost of the product by its FPS performance per workload and selected the device that provided the most value across all of the workloads. And as noted, the NUC8i5BEK provided the best cost/performance value.
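The selection logic described above can be sketched in a few lines. The device names, prices, and FPS figures below are made-up placeholders for illustration, not ComBox's actual benchmark data.

```python
# Hypothetical cost-per-FPS comparison: lower dollars-per-FPS is better.
devices = {
    "device_a": {"price_usd": 300.0,
                 "fps": {"unet_768x512": 12.0, "darknet19_256": 95.0}},
    "device_b": {"price_usd": 450.0,
                 "fps": {"unet_768x512": 15.0, "darknet19_256": 110.0}},
}

def cost_per_fps(price_usd, fps):
    """Dollars of hardware cost per frame-per-second of throughput."""
    return price_usd / fps

def best_device(devices):
    """Pick the device with the lowest average cost-per-FPS
    across all benchmarked workloads."""
    def avg_cost(entry):
        ratios = [cost_per_fps(entry["price_usd"], f)
                  for f in entry["fps"].values()]
        return sum(ratios) / len(ratios)
    return min(devices, key=lambda name: avg_cost(devices[name]))

print(best_device(devices))  # → device_a (25.00 + 3.16 averaged beats 30.00 + 4.09)
```

Averaging across workloads is one reasonable way to aggregate; a team could equally weight workloads by how often each runs in production.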
Beating the Benchmark with a Video Encode/Decode Cheat Code
The NUC8i5BEK is built around the Intel Core i5-8259U. But in the ComBox battery of inferencing benchmark tests, it was not the device's CPU cores that provided the most bang for the buck; it was the integrated Intel® Iris® Plus 655 graphics unit. And that's not the only trick up the NUC's sleeve.
While the Core i5 CPU cores played no part in the algorithm execution itself, they did handle the image encoding and decoding, allowing the graphics unit to remain dedicated to the inferencing workloads. This isn’t to say that other SoCs and cards in the benchmark didn’t take advantage of a similar architecture. Some did. But the combination of Intel® Iris® Plus 655 graphics, the multi-threaded quad-core CPU, and OpenVINO outperformed them all at a lower price point.
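That division of labor is a classic producer-consumer pipeline: CPU threads decode frames into a queue, and a dedicated inference worker drains it. The sketch below is illustrative only—the decode and inference steps are string placeholders, not ComBox's actual code or the OpenVINO API.

```python
import queue
import threading

frame_queue = queue.Queue(maxsize=16)  # bounded, so decode can't outrun inference
results = []

def decode_worker(frames):
    """CPU-side stage: decode/resize frames and hand them to inference."""
    for frame in frames:
        decoded = f"decoded({frame})"  # placeholder for real H.264/JPEG decode
        frame_queue.put(decoded)
    frame_queue.put(None)  # sentinel: no more frames

def inference_worker():
    """GPU-side stage: stays dedicated to running the neural network."""
    while True:
        item = frame_queue.get()
        if item is None:
            break
        results.append(f"inferred({item})")  # placeholder for the forward pass

producer = threading.Thread(target=decode_worker, args=(["f0", "f1", "f2"],))
consumer = threading.Thread(target=inference_worker)
producer.start(); consumer.start()
producer.join(); consumer.join()
print(results)  # one result per input frame, in order
```

The bounded queue is the key design choice: it applies backpressure, so decode throughput automatically matches the pace of the inference stage rather than piling up frames in memory.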
“Based on the benchmark results, we designed the NUC8i5BEK into our server,” says Rytvinskiy. The platform can simultaneously execute neural networks against up to 80 Full HD IP camera video streams.
Packaged for AI and Vision Processing Power
Off the shelf, the NUC is packaged as a complete system with an enclosure, I/O, and other trimmings that make it ready for uses ranging from prototyping to light commercial deployment. But clearly, the form factor and packaging are not suitable for integration into a rack server, so the ComBox team integrated the NUC motherboards, eight at a time, into a 1U rack-mount chassis (Figure 1).
The eight hot-swappable NUC modules are accompanied by two hot-swappable power supply units (PSUs) and a front-panel display that provides control over the modules. Collectively, the eight modules provide 32 cores and 64 threads of processing power, with a combined total of 3,072 integrated GPU cores and 1 GB of eDRAM.
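Those aggregate figures follow from the per-module specs. The sanity check below assumes the Core i5-8259U's published specifications: 4 cores / 8 threads, Iris Plus 655 graphics with 384 shading units (48 EUs x 8 ALUs each), and 128 MB of eDRAM per module.

```python
# Back-of-the-envelope check of the server's quoted totals,
# assuming published Intel Core i5-8259U per-module specs.
MODULES = 8

totals = {
    "cpu_cores":   MODULES * 4,    # 4 cores per module  -> 32
    "cpu_threads": MODULES * 8,    # 8 threads per module -> 64
    "gpu_shaders": MODULES * 384,  # 384 shading units    -> 3,072
    "edram_mb":    MODULES * 128,  # 128 MB eDRAM         -> 1,024 MB ~ 1 GB
}
print(totals)
```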
“According to our own modeling, the 8xNUC-based server can outperform other solutions by allocating the right amount of resources to each workload,” Rytvinskiy says. “And because of the NUC’s low cost per FPS, the server is just half the cost of similar platforms based on alternative AI processing technologies.”
Designed for AI and DL
After going through the process of testing AI compute alternatives, ComBox has produced a power-efficient, performant inferencing solution for many types of workloads. The company has published a paper illustrating how NUCs can be used to create high-efficiency, low-cost AI solutions, including one project to build a computer vision-aided smoke detector based on the same NUC8i5BEK.
While perhaps unexpected, the familiar pairing of Iris Plus 655 graphics and Core processor CPUs brings more value to CV inferencing than even newer AI processing solutions. So why spend more for less?