Penguin Unveils New Products at SC16

by Phil Pokorny, CTO

Knights Landing with NVMe Drives

One of the really exciting things going on in Penguin Engineering right now is the second iteration of our Knights Landing sled (with the Intel® Xeon Phi™ processor), the Relion Phi X1030e. The Relion Phi X1030e is a building block of our Tundra™ Extreme Scale (ES) Series Solution. This new version offers a high-performance storage option with up to 8 NVMe drives. NVMe drives are relatively new, and most of our platforms have only one, maybe two, of them. But here's an opportunity to put 8 drives in one system, where each drive is 1.5 to 2 TB in capacity. That's a lot of storage, and it will be really fast because the drives are SSDs and low latency because they're NVMe.
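
Back-of-the-envelope, here is what those 8 drives add up to. The per-drive bandwidth figure below is an assumed ballpark for a PCIe Gen3 x4 NVMe SSD, not a measured number for the X1030e.

```cpp
// Rough sketch (not a benchmark): aggregate capacity and sequential bandwidth
// for an 8-drive NVMe configuration. The per-drive bandwidth is an assumed
// ballpark for a PCIe Gen3 x4 NVMe SSD.
#include <cstdio>

int main() {
    const int    drives        = 8;    // NVMe drives per sled
    const double capacity_tb   = 2.0;  // per-drive capacity (1.5-2 TB option)
    const double seq_read_gbps = 2.8;  // assumed per-drive sequential read, GB/s

    std::printf("Aggregate capacity : %.1f TB\n", drives * capacity_tb);
    std::printf("Aggregate bandwidth: %.1f GB/s (assuming %.1f GB/s per drive)\n",
                drives * seq_read_gbps, seq_read_gbps);
    return 0;
}
```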

I am excited to see how well these perform. They should deliver a ridiculous amount of storage bandwidth, so the Knights Landing processor's floating point capability is truly unleashed. Think about it: local storage and network communication with equally world-class performance. This would really benefit workloads that need random access to large data sets, such as image processing.

New GPU Option for OCP

The other product is our Relion X1904GT sled, a 4-GPU option for the Open Compute Project (OCP) platform. Here, we support 4 GPUs in one open rack unit, the same GPU-per-rack-unit density as our densest 19” platform option, the Relion 2908GT. The CPU-to-GPU ratio is different: where the 19” platform puts 8 GPUs and 2 CPUs in two rack units, the new OCP platform puts 4 GPUs and 2 CPUs in one open rack unit. The exciting thing about the OCP platform is its modular components, which allow flexible GPU-to-CPU ratios with different PCIe topologies.

Historically, GPU platforms have come in fixed configurations with fixed PCIe topologies that determine how the GPUs are attached to the CPUs. The platform designer has to make decisions up front, like which GPUs are attached to which CPUs and how the bus bandwidth gets allocated.
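
As a quick illustration, here is a minimal sketch using the standard CUDA runtime API that prints each GPU's PCIe bus ID, which is one way to see which root complex, and therefore which CPU, a GPU hangs off of. Tools like `nvidia-smi topo -m` or `lspci -tv` show the same topology in more detail.

```cpp
// Minimal sketch: list each visible GPU and its PCIe bus ID via the CUDA
// runtime API. On a multi-socket system, the PCIe domain/bus numbers indicate
// which root complex (and therefore which CPU) a GPU is attached to.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        char busId[32] = {0};
        cudaGetDeviceProperties(&prop, dev);
        cudaDeviceGetPCIBusId(busId, sizeof(busId), dev);
        std::printf("GPU %d: %s  PCIe %s\n", dev, prop.name, busId);
    }
    return 0;
}
```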

We are really excited to offer this new Penguin Computing platform for GPU-accelerated computing. The initial road map has 4 different configurations. The first is the classic model, where each GPU is directly attached to a CPU. The second configuration uses a PLX switch chip to let GPUs talk directly to the network interfaces. Technology from Mellanox and NVIDIA (GPUDirect RDMA) allows a GPU to talk directly to the NIC and send and receive packets without involving the CPU, which enables very fast, low latency access to GPUs over the network. NVIDIA is very excited about this, and there may be customers who are interested in lowering the cost of their CPUs because they barely use them: they really only use the GPUs, so they don't want to pay for an extra CPU just to get the extra PCIe ports for the GPUs. Others have software patterns where GPUs talk to each other; with this topology, all 4 GPUs can talk to each other without involving the CPU, which also improves performance. This should excite anyone working in GPU-focused workflows, such as machine learning.
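
For readers who want to experiment, here is a minimal sketch of the standard CUDA peer-to-peer API that probes whether GPU pairs can access each other directly. It illustrates the general peer-access mechanism only and is not specific to the X1904GT; GPUDirect RDMA to a NIC additionally requires an RDMA-capable network stack.

```cpp
// Minimal sketch: probe and enable CUDA peer-to-peer (P2P) access between GPU
// pairs. When P2P is available (e.g. GPUs behind the same PCIe switch),
// cudaMemcpyPeer and kernel loads/stores can move data GPU-to-GPU without
// staging through host memory.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            std::printf("GPU %d -> GPU %d : P2P %s\n",
                        src, dst, canAccess ? "supported" : "not supported");
            if (canAccess) {
                cudaSetDevice(src);
                cudaDeviceEnablePeerAccess(dst, 0);  // flags must be 0
            }
        }
    }
    return 0;
}
```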
