Welcome to the Penguin Blog
It’s now official … we are an OCP solution provider
This week Penguin announced that it is now an official OCP solution provider. I know … that somebody is ‘excited’ about something is the last resort for marketing guys to convey a boring message … but we at Penguin are really excited … we believe that OCP will be here to stay and have a lasting impact on the hardware landscape (Is Open Compute a Game Changer?). I don’t want to regurgitate all the benefits of the open hardware concept (a great summary can be found at Tech Republic) but talk a little bit about why Penguin believes that OCP presents a great opportunity not only for customers but also for us. At first glance it is not that obvious how a company like Penguin that focuses on delivering hardware solutions would see OCP as an opportunity. One could argue that opening up hardware designs will lead to less potential for vendors to differentiate their offering which will result in lower profits as ‘everybody and their dog’ will be offering the same standardized hardware.
While it is true that efforts like OCP will lead to further ‘commoditization’ of hardware it takes expertise and skill to integrate this open hardware and the corresponding software, particularly in areas that tend to be more complex such as High Performance Computing. So on the one hand further ‘commoditization’ will add pressure on hardware prices … no doubt about that. On the other hand, lower prices drive more and larger deployments with inherently increasing complexity. While there are numerous providers that know how to build servers, the air gets thinner when it comes to solution providers that really know how to make things work. And that is exactly what we at Penguin have been doing for the last 15 years … deliver trusted scalable Linux solutions that work. And because delivering quality solutions rather than pushing boxes is our sweet spot we are excited about the Open Compute Project.
BTW ... If you want to know more about how OCP is expected to change the server market and how Penguin is embracing 'open hardware' ... an insightful article based on an interview with our CEO Charles Wuischpard was just published by The Register
Old Standards Never Die
Last week, I learned that old standards never die, they live on to make life more difficult for years to come.
A looong time ago, when I was in college, we used a VAX server running the VMS operating system, which had a file naming standard that was case insensitive, allowed only a single dot between the name and the extension, and included a version number. It clearly influenced the design of the High Sierra and ISO-9660 standards for CD-ROM filesystems. And last week, while working on some hardware where I couldn't enable PXE due to customer restrictions, it bit me. In this situation, I was working with our IceBreaker 2716 servers and IceBreaker 4772 JBOD chassis configured using Nexenta software to create a cost effective, high performance VM storage subsystem. I needed the IceBreaker 2716s to be configured to meet a customer requirement of PXE being disabled in the BIOS. But to be able to work with the system, it would have been very helpful to be able to boot a Linux NFS or readonly-root with access to a set of custom tools for firmware and diagnostics. Normally, we do that in the lab using PXE and NFS readonly root. But with PXE disabled, that wasn't an option. There were multiple systems, so the solution needed to be quick, easy and cheap. I could reboot the systems multiple times to go into the BIOS, enable and disable PXE, but with multiple systems that would be a huge waste of time. A pile of USB keys could work, but I didn't have enough USB thumb drives on hand. What I did have were CD-ROM blanks. Cheap! Easy to replicate! Awesome!
Because I already had infrastructure using the excellent PXELINUX support from H. Peter Anvin's SYSLINUX package, it seemed like using ISOLINUX (the equivalent tool for CD-ROMs) would let me just copy the config files, kernel and initrd to a CD and go. ISOLINUX says it supports long names, so I thought it should just work. It would be a tiny image that I could write quickly, and it would leverage the infrastructure I already had configured.
WRONG.
I created the CD, rebooted the machine and it failed to boot because mkisofs had to munge the filenames so that "vmlinuz-2.6.18-308.13.1.el5" became "VMLINUZ_2_6_18_308_13_1.EL5;1" but I had told ISOLINUX to look for the original name. Oops.
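To see why the lookup failed, here is a rough Python sketch of the kind of name translation involved. It is not mkisofs's actual algorithm, only an approximation that reproduces the example above: `iso9660_name` is a hypothetical helper that upper-cases the name, treats only the last dot as the extension separator, replaces every other disallowed character with an underscore, and appends the `;1` version suffix. Length limits are ignored.

```python
import re


def iso9660_name(name, version=1):
    """Approximate an ISO-9660 style name (illustrative, not mkisofs's code)."""
    # Only one dot is allowed, so split at the last one.
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # Simplification: names without a dot get no extension here.
        stem, ext = name, ""

    def clean(part):
        # Upper-case, then replace anything outside A-Z, 0-9 and "_".
        return re.sub(r"[^A-Z0-9_]", "_", part.upper())

    result = clean(stem)
    if ext:
        result += "." + clean(ext)
    return f"{result};{version}"


print(iso9660_name("vmlinuz-2.6.18-308.13.1.el5"))
# prints VMLINUZ_2_6_18_308_13_1.EL5;1
```

A short, dot-free name such as `vmlinuz` only needs to be upper-cased and given a version suffix, with no character substitutions, which is part of why a simple name is the safer choice.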
To avoid the issue altogether I just renamed the kernel to vmlinuz and created an empty file with a file name reflecting the kernel version as a reminder for myself.
- [ppokorny@rps1 ~]$ ls -l _iso
- total 25928
- -rw-rw-r--. 1 ppokorny ppokorny 0 Jan 29 00:30 2_6_32_279.el6
- -rw-r--r--. 1 ppokorny ppokorny 22531009 Jan 28 20:26 initramfs.img
- -rw-r--r--. 1 ppokorny ppokorny 24576 Jan 29 00:53 isolinux.bin
- -rw-r--r--. 1 ppokorny ppokorny 168 Jan 29 00:53 syslinux.cfg
- -rwxr-xr-x. 1 ppokorny ppokorny 3986608 Jan 28 20:26 vmlinuz
and changed syslinux.cfg to read:
- [ppokorny@rps1 ~]$ cat _iso/syslinux.cfg
- default linux
- label linux
- kernel vmlinuz
- append initrd=initramfs.img root=nfs:192.168.54.1:/var/lib/tftpboot/centos6u3 ro readonlyroot rd_NO_MD rd_NO_LVM rd_NO_DM
Now I could get back to doing the real work...
What's cool about "Software Defined Networking"?
"Software Defined Networking" products are a new breed. One can find early examples of these switches on internet auction sites or searching for "open source switch". The latest versions of software defined networking (or SDN) give users more control over how their network is put together and how it works. This allows users to make the network an integral part of a flexible infrastructure where resources are allocated and configured in response to services being provisioned at the endpoints.
As an example of what SDN can do for users, consider one of the best and worst parts of Ethernet: Spanning Tree. This protocol allows Ethernet networks to recognize when a loop or multiple paths exist between switches and disables redundant or parallel links to prevent packets from being repeated endlessly. While this was extremely handy 20 years ago, it now limits network engineers' ability to build high performance networks because spanning tree (and LACP bonding) place limits on how many parallel paths there can be between switches. Contrast this with Infiniband, where truly massive fabrics with full bandwidth between all endpoints are trivial to construct and manage.
The difference is in the way the network is managed. Ethernet requires an algorithm that can be evaluated in a distributed environment with only local information because there is no central agent in an Ethernet network. But Infiniband has a central agent called a subnet manager that sees all the paths in the network and can distribute and allocate traffic to make use of all the parallel links between endpoints. It does this once at connection setup and then gets out of the way so there is no performance impact for this central intelligence.
In a similar way, SDN provides that central intelligence for an Ethernet network of switches and allows the network to make global decisions to optimize the network for the workload.
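The difference can be illustrated with a toy model. The Python sketch below is only an illustration under stated assumptions: the two-leaf, two-spine topology and all names in it are made up, `spanning_tree` uses a plain breadth-first tree as a stand-in for what a spanning tree protocol leaves active, and `all_shortest_paths` stands in for a controller that sees the whole network. It is not how any particular switch or SDN controller is implemented.

```python
from collections import deque

# Hypothetical topology: every leaf switch connects to every spine switch.
LINKS = [("leaf1", "spine1"), ("leaf1", "spine2"),
         ("leaf2", "spine1"), ("leaf2", "spine2")]


def neighbors(links):
    """Build an adjacency map from a list of undirected links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def spanning_tree(links, root):
    """Links kept by a breadth-first tree from root; all others are blocked."""
    adj = neighbors(links)
    seen = {root}
    kept = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                kept.append((node, nxt))
                queue.append(nxt)
    return kept


def all_shortest_paths(links, src, dst):
    """Every shortest path from src to dst, as a global view would allow."""
    adj = neighbors(links)
    best = None
    paths = []
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # longer than the shortest path already found
        node = path[-1]
        if node == dst:
            best = len(path)
            paths.append(path)
            continue
        for nxt in adj.get(node, []):
            if nxt not in path:
                queue.append(path + [nxt])
    return paths


tree = spanning_tree(LINKS, "spine1")
print(len(all_shortest_paths(tree, "leaf1", "leaf2")), "path with spanning tree")
print(len(all_shortest_paths(LINKS, "leaf1", "leaf2")), "paths with a global view")
```

In this toy network the tree keeps three of the four links, so traffic between the two leaves has a single path, while the global view finds both equal-cost paths and could spread traffic across them.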
The icing on the cake is that it's cheaper too.
Is Open Compute a Game Changer?
Last week’s fourth Open Compute summit in Santa Clara was accompanied by a huge media buzz. More participants attended than ever before and an increasing number of vendors are jumping on the bandwagon. Why is it such a ‘big deal’ that specifications for hardware are openly available? Don’t we have enough contract manufacturers and large vendors to satisfy what the market needs?
As Penguin's CTO Phil Pokorny pointed out in his AMD guest blog, motherboard designs are often a compromise. Customer requirements can be very specific. For manufacturers, removal of components and customization of motherboards is typically more expensive than supplying a ‘one size fits all’ board with features that some customers don’t need or without features that some customers would like to see. Moreover, motherboards are often radically optimized for cost. This can lead to compromises in power efficiency, reliability and specific features.
The Open AMD 3.0 OCP design specifies a ‘bare bones’ motherboard that can be configured for different use cases. The platform was designed with the input of the financial services industry and is intended to provide a ‘universal, highly re-useable common motherboard that targets 70% to 80% of enterprise infrastructure’ (OCP Project AMD Motherboard Hardware). Even though designed based on feedback from Wall Street the server flavors (HPC, Storage, General Purpose) outlined in the specification are generally applicable and should cover a large percentage of use cases in any enterprise data center.
The design offers benefits on many fronts
Management: Having one motherboard design as a ‘common denominator’ in an enterprise data center simplifies system provisioning and system management as well as the management of an inventory of spare parts. OS images and drivers can be used across a wider range of servers.
Capital expense: The ‘bare bones’ design approach enables customers to pick and choose rather than ‘bundle purchase’ components that they don’t really need e.g. fully featured BMCs when only a subset of functionality is required. While these cost savings may seem small at the level of the individual server they add up in large scale and hyperscale deployments.
Economies of scale: With a higher level of standardization customers will benefit from better economies of scale.
Compatibility: OCP 1.0 servers deployed at Facebook were built to fit a custom rack design. OCP 2.0 designs were built to be compatible with the Open Rack specification. The Open AMD 3.0 reference design is compatible with industry standard 19’’ racks. While it makes sense to follow a “holistic design process that considers the interdependence of everything from the power grid to the gates in the chips on each motherboard” (OCP Open Rack blog), compatibility with the 19’’ de-facto industry standard will drastically accelerate the mainstream adoption of Open Compute Project server platforms.
The biggest benefit of the Open Compute Project, though, is its ‘openness’. It is quite likely that with OCP, control over hardware designs will shift from large established vendors to a community of users and cooperating manufacturers. Analogous to the way Linux obliterated the market for ‘closed source’ UNIX implementations, OCP has the potential to give established vendors a ‘run for their money’. The open design also provides a great opportunity for new players that can now build on existing open specifications and customize these specifications for specific market niches.
At Penguin we realize that OCP has the potential to turn the server market ‘upside down’. We are an active member of the OCP alliance and recently extended our Altus product line to include servers built according to version 3 of the OCP specification. For Penguin Computing the bottom line is ‘Yes, OCP is a game changer’.
BTW ... If you want to know more about how OCP is expected to change the server market and how Penguin is embracing 'open hardware' ... an insightful article based on an interview with our CEO Charles Wuischpard was just published by The Register
Micro Data Centers – A New Approach to Deploying Compute Capacity
At SC’12 we showcased the Micro Data Center (MDC), a new concept that AOL designed in partnership with Penguin Computing. MDCs are small, self-sufficient ‘Data Centers in a Box’ that just require external hookups for networking, power and water. There are two MDC flavors, one for outdoor use and one less ruggedized version for indoors. The outdoor MDC is housed in a 42U rack-size enclosure provided by Elliptical Mobile Solutions that is NEMA 3 rated and protects against fire, water, humidity and vandalism (the first MDC used in production survived ‘Sandy’ without a hitch). The indoor MDC is housed in a 37U rack-size enclosure from AST Modular and was designed for deployments in loosely controlled environments such as warehouse spaces. Each MDC contains high density servers and storage nodes from Penguin Computing’s Relion product line as well as PDUs, switches and load balancers. The outdoor MDC is cooled by a direct expansion cooling module that is integrated with the enclosure, and has an option for using air-side economization. AOL’s first outdoor MDC deployed in production is currently handling 30% of the traffic to AOL’s main site aol.com.
So why is this so exciting? Because moving away from the traditional datacenter deployment approach to a model where capacity can be deployed in small increments wherever it is needed allows for huge cost savings and more flexibility. For applications where the MDC approach is applicable, AOL estimates over 90% cost savings. Beyond cost savings, the new model also inherently enables the distribution of compute capacity so that natural disasters don’t incapacitate an entire operation. MDCs make it easier for providers that want to offer online services in countries where privacy laws require that data is kept in-country. MDCs can also reduce reliance on commercial content delivery networks, as servers can be deployed in the local vicinity of content consumers. Of course the MDC deployment model also has limitations. The MDC software architecture has to support self-sufficiency of each MDC. Applications that depend on centralized services that need to be accessible with short latencies are obviously not a good fit. Neither are ‘Big Data’ applications or distributed applications that require a multitude of services with low latency.
While the upside potential is huge, there is of course also a cost. That cost is mostly related to software. To enable the ‘self-sufficiency’ of the services running in the MDC, problems like database replication, configuration management and system dependencies need to be solved. ‘Cloudifying’ services may be one way to help address these issues. Overall, MDCs are a promising approach to deploying data center capacity. MDCs could also be a good fit for small scale HPC deployments, as HPC applications are by nature more self-sufficient than large scale enterprise applications.
How To Calculate HPC Efficiency
Summary
HPC efficiency is a measure (percentage) of the actual performance of an HPC system against its theoretical peak performance.
Theoretical Peak Performance
The theoretical peak performance (GFLOPS) is calculated by the following equation...
GFLOPS = node * ( sockets / node ) * ( cores / socket ) * GHz * FLOPS
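In this formula, FLOPS is the number of floating point operations a core can complete per clock cycle (not the usual per-second rate), and it is specific to the kind of CPU. The following table shows the values for some Intel and AMD CPUs.
| CPU | FLOPS (per core, per cycle) |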
|---|---|
| Intel Xeon E5-2600 (Sandy Bridge) series | 8 |
| Intel Xeon E3-1200 (Ivy Bridge) series | 8 |
| AMD Opteron 6200 (Bulldozer) series | 4 |
| AMD Opteron 6300 (Piledriver) series | 4 |
For example, for a single node with two 12-core 2.4 GHz CPUs that deliver 4 FLOPS per cycle:
1 node * ( 2 sockets / node ) * ( 12 cores / socket ) * 2.4 GHz * 4 FLOPS = 230.4 GFLOPS
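The same calculation as a small Python sketch. The function name `peak_gflops` and its parameter names are made up for illustration; the arithmetic is just the equation above.

```python
def peak_gflops(nodes, sockets_per_node, cores_per_socket, ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: total cores * clock (GHz) * FLOPS per cycle."""
    return nodes * sockets_per_node * cores_per_socket * ghz * flops_per_cycle


# One node, two sockets, 12 cores per socket, 2.4 GHz, 4 FLOPS per cycle.
print(f"{peak_gflops(1, 2, 12, 2.4, 4):.1f} GFLOPS")
# prints 230.4 GFLOPS
```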
Actual Performance
The actual performance can be found by running XHPL. For more information, see...
How To Install / Configure / Execute XHPL (ACML) for AMD FMA4
How To Install / Configure / Execute XHPL (MKL) for Intel AVX
For example, the actual performance of an Altus 1804i with dual Opteron 6234 (2.4 GHz, 12 core), 128GB RAM (90%) is...
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR01C2L2      123378   160     4     6            7469.13              1.776e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0027476 ...... PASSED
================================================================================
HPC Efficiency
The HPC efficiency is simply...
Efficiency = Actual Performance GFLOPS / Theoretical Peak Performance GFLOPS
Using the previous Altus 1804i example the HPC efficiency calculates to...
177.6 / 230.4 = 77.1 %
To increase the HPC efficiency, increase the actual performance. This can be done by tweaking compilers, math libraries, shared/distributed memory, numactl, kernel parameters, etc.
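When you are comparing many tuning runs, it can help to pull the number straight out of the XHPL result line. The following is a minimal Python sketch, assuming the result line has the layout shown in the example above, with Gflops as the last whitespace-separated field; `hpl_gflops` and `hpc_efficiency` are hypothetical helper names, not part of HPL.

```python
def hpl_gflops(result_line):
    """Return the Gflops value, assumed to be the last field of the result line."""
    return float(result_line.split()[-1])


def hpc_efficiency(actual_gflops, peak_gflops):
    """Efficiency as a fraction: actual performance / theoretical peak."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return actual_gflops / peak_gflops


line = "WR01C2L2 123378 160 4 6 7469.13 1.776e+02"
actual = hpl_gflops(line)
print(f"{hpc_efficiency(actual, 230.4) * 100:.1f} %")
# prints 77.1 %
```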
The UDX1 - Penguin’s first ARM based Server
Why are so many people interested in this platform? The short answer is power and density. Traditionally ARM based processors have been used for mobile devices, where low power consumption is key. At the same time every data center has power and cooling constraints and more and more cloud and ‘Big Data’ applications require scale out architectures. Our partner Calxeda is one of the first organizations to bring low-power ARM technology to the data center. The UDX1 is based on Calxeda EnergyCore SoCs (System on Chip) and can be configured with up to 48 servers and 192 cores in a 2U (we chose a 4U chassis to accommodate a larger number of drives – up to 36 3.5’’ drives). The power consumption per server is around 7W including RAM but excluding HDDs. What makes this super low power envelope possible is Calxeda’s SoC architecture that integrates the entire system logic on a single die: quad core ARM Cortex-A9 processors including the Neon SIMD engine, dedicated logic for power management, L2 cache, BMC (accessible through SoL), PCI-E, SATA and memory controllers.
Two issues that always pop up in the context of ARM based systems are the lack of 64-bit support, with the inherent limitation of the addressable RAM to 4GB, and the lack of applications and OSs that run on ARM. The first issue is being worked on. The next generation processor, code named Midway and slated for next year, will support 40-bit memory addressing, and a 64-bit architecture built on the ARMv8 architecture is scheduled for 2014. The second issue matters for applications that cannot be recompiled on the platform or for customers that need to run enterprise distributions. As far as the enterprise distribution is concerned, there is an effort to build a RHEL based distribution for ARM. If applications cannot be recompiled, emulators that facilitate the execution of x86 code through on-the-fly binary translation (and retrieval of already translated code from a cache) could be of interest. While certainly not as fast as native execution, this type of technology could help with the transition to ARM. Also interesting … at the time of writing it looks like AMD is very likely to announce a 64-bit ARM based micro server based on the SeaMicro platform acquired in March.
Even if the UDX1 may not be the perfect fit for your current workload, it makes sense to deploy a UDX1 to port your applications to be ready when the more powerful platform hits the market.
Current thinking regarding XFS and RAID arrays
OK. First, a note about performance testing. You don't want to use the default dd block size of 512 bytes. That's much too small to get the best performance. 1 megabyte (bs=1M) is probably the minimum I would use for streaming copy/read/write testing.
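If you want to see the effect of block size for yourself without dd, here is a rough Python sketch. It is only an illustration: `write_throughput` and `compare_block_sizes` are made-up names, the file is written to the system temp directory (which may not be on the array you care about, so point it at the filesystem under test for real measurements), and the numbers will vary with caching and hardware.

```python
import os
import tempfile
import time


def write_throughput(path, block_size, total_bytes):
    """Write total_bytes of zeros to path in block_size chunks; return MB/s."""
    block = b"\0" * block_size
    count = total_bytes // block_size
    start = time.perf_counter()
    # buffering=0 so each write() call reaches the OS with the chosen size.
    with open(path, "wb", buffering=0) as f:
        for _ in range(count):
            f.write(block)
        os.fsync(f.fileno())  # include the flush to disk in the timing
    elapsed = max(time.perf_counter() - start, 1e-9)
    return count * block_size / elapsed / 1e6


def compare_block_sizes(total_bytes=64 * 1024 * 1024,
                        block_sizes=(512, 1024 * 1024)):
    """Return {block_size: MB/s}, using a scratch file per block size."""
    results = {}
    for bs in block_sizes:
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            results[bs] = write_throughput(path, bs, total_bytes)
        finally:
            os.remove(path)
    return results


if __name__ == "__main__":
    for bs, mbps in compare_block_sizes().items():
        print(f"bs={bs:>8} bytes: {mbps:8.1f} MB/s")
```

The 512 byte run issues 2048 times as many write calls as the 1 MiB run for the same amount of data, which is where the per-call overhead comes from.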
How setting up our AMD Fusion12 Developer Summit Demo made me appreciate AMD's HSA announcement
Returning from the AMD Fusion Developer Summit I am sitting here at SeaTac airport pondering the information overload from the last few days … but I should take a step back … the fun leading up to the conference actually started last week when we needed to setup a demo for the show …
As you may know, Penguin released the first rack mount server based on AMD's APU architecture last year and we deployed a cluster of over 100 systems at Sandia National Laboratories. So the idea was to show a demo that illustrates that the APU's GPU cores can be used for HPC-type workloads. My first, in hindsight admittedly naïve, thought was to modify Octave (an open source version of Matlab) to take advantage of AMD's CLBLAS libraries. After researching a little bit I realized quickly that I had been a bit too ambitious ...
Penguin's New Blog
Welcome to the Iceberg! As part of our new website and direction and looking at the kind of projects we are working on, we’ve decided to start our official Penguin blog. Going forward, this will be our way of sharing more detail on the cool developments we are working on in our chosen markets: High Performance Computing (HPC) or Supercomputing, what we are calling the Efficient Data Center, and Cloud Computing for the HPC or Big Data user. We will post material not only from our engineers and executives but also from our customers and partners.
It’s hard to believe but Penguin is now fourteen (14) years old and I’ve had the pleasure of being a Penguin for over five years with nearly four in the top job. In that time, the industry has changed dramatically and continues to do so. Traditional older players have exited or have been acquired. IBM has largely exited the x86 business, Sun was acquired by Oracle and exited the HPC market, Rackable Systems was subsumed into SGI, Linux Networx was acquired by SGI, Verari filed for bankruptcy, etc. On the other hand, new players (primarily the largely Asian contract manufacturers) have entered the market targeting the very largest data center opportunities. And more recently, cloud computing threatens to upend all traditional business models. Penguin has benefited from all these changes as customers seek a trusted established supplier who focuses on and delivers custom-built and optimized turn-key deployments, whether they be one-of-a-kind HPC clusters, custom compute and storage farms, or partial or fully outsourced HPC cloud solutions. These are big long term trends affecting a $70B market so we think the world is bright and that our skills and focus are in the right place at the right time.

