Preparing for the Future of HPC with Open Technology
By Phil Pokorny, CTO, Penguin Computing
It seems like exciting new high-performance computing (HPC) technology is constantly appearing these days. That means systems need to be flexible enough to adopt new technology and, at the same time, scale to meet the increasing demand for HPC.
The trouble is that vendor lock-in restricts you to one vendor’s ecosystem rather than allowing you to choose from a wider, less expensive and often more innovative set of options. Open source technology, especially Open Compute Project (OCP) solutions, can help your organization remove this obstacle, increase its return on investment (ROI) today and prepare for the inevitable shift to software-defined anything (SDx).
OCP vendors share design specifications so that each can build efficient, interoperable infrastructure components or even full systems, as is the case with vendors like Penguin Computing. This approach drives down costs while providing robust performance, as demonstrated by the Department of Energy’s (DOE) National Nuclear Security Administration (NNSA) CTS-1 program, where Penguin Computing has deployed 16 OCP-based Top500 supercomputers since 2016.
These systems were all built on the OCP-based Penguin Computing Tundra® Extreme Scale HPC platform with Intel® Xeon® processors, ranging from the Xeon E5 to the latest Xeon Scalable Processors. As a result, the DOE was able to bring down the cost of HPC systems from approximately $100 million per teraFLOP in 1995 to less than $5,000 per teraFLOP today (a factor of 20,000), with greater computing power and energy efficiency in each generation.
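As a quick check of the factor quoted above, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative, and the two dollar figures are simply the approximate values cited in this article.

```python
# Approximate figures quoted in the article (dollars per teraFLOP).
cost_per_teraflop_1995 = 100_000_000
cost_per_teraflop_today = 5_000  # the article says "less than" this amount

# Ratio of the 1995 cost to today's cost.
reduction_factor = cost_per_teraflop_1995 / cost_per_teraflop_today

print(f"Cost reduction factor: at least {reduction_factor:,.0f}x")
```

Because today's figure is an upper bound, the computed factor of 20,000 is a lower bound on the actual reduction.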
Benefits like these helped sales of OCP-based infrastructure exceed $1.2 billion in 2017, excluding spending by OCP board members Facebook, Intel, Rackspace, Microsoft, and Goldman Sachs, according to industry research firm IHS Markit.
There are some differences between OCP and traditional designs. Usable width is larger (21 inches vs. 17 inches in a “19-inch” EIA rack), so you can put more high-value technology into each rack unit. Complex rails are replaced with simple shelves. Power is provided by a common power source, which dramatically reduces the number of power supplies and eliminates power distribution units (PDUs) and power cords as opportunities for failure.
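To put the width difference in perspective, here is a minimal sketch that compares the two usable widths quoted above. The variable names are illustrative, and the calculation considers width only, not depth or height.

```python
# Usable widths quoted in the article (inches).
ocp_usable_width_in = 21
eia_usable_width_in = 17

# Fractional gain in usable width per rack unit.
extra_fraction = ocp_usable_width_in / eia_usable_width_in - 1

print(f"Extra usable width: {extra_fraction:.1%}")
```

By this measure, an OCP rack offers roughly a quarter more usable width per rack unit than a traditional EIA rack.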
As a result, node costs are 15-20% lower and reliability is improved compared to 19-inch servers. Add in the efficiency of processors such as the Intel Xeon Scalable Processor family and the savings grow even more. Plus, OCP servers are serviced from the front and use field-replaceable components, making maintenance simpler and less costly.
There are also thousands of supported and tested open operating systems, software stacks, networking stacks and other software to choose from, all of which cost less than proprietary software. The design specifications give you the flexibility to use different hardware technologies, such as x86, ARM, storage, and software-defined anything (SDx).
OCP includes the entire ecosystem of infrastructure technologies, including data center, HPC, and artificial intelligence (AI) components. Combined with standardization and modularity, this breadth lets you configure systems for whatever you need. You can even integrate graphics processing unit (GPU) accelerators in the servers, paving the way for future heterogeneous environments.
This also means you no longer have to replace the whole system based on the lifecycle of the shortest-lived component. Instead, you replace or upgrade the modular components and keep your investment in the overall system. So, for example, as new, more powerful processors come out, you can swap them in for your old processors and gain computing power without having to buy a whole new system.
Some people have concerns about security, but OCP-based systems are no more or less vulnerable to attack than other computing systems. Your security needs are no different than in a 19-inch environment: you still need to encrypt your data appropriately and properly restrict networking and access to your systems. And because the designs are open, you have the benefit of more security experts examining and improving the solutions you deploy.
In short, there are many reasons to choose open technologies, especially OCP. Next time your organization is considering an infrastructure change, take a moment to see how OCP can help you gain greater ROI and maybe even leapfrog your competition.
Learn more about the value of OCP-based technology at www.penguincomputing.com/tundra.