
Cloud computing versus high-performance computing: who is more competitive?

  

The debate over whether a high-performance computing cluster should be built in-house or purchased as a service is in full swing, partly because the key performance components and the software ecosystem that used to be gaps in the market have gradually been filled in.


 

After years of development, the feasibility of high-performance computing in the cloud has finally been established to some extent, at least for some applications. Large cloud service providers have made tentative moves into high-performance computing with more powerful networks and processors, while other vendors, Rescale among them, have begun helping independent software developers offer high-performance computing codes through their own licensing models, lifting the veil that has long covered high-performance computing software. It must be stressed, however, that running high-performance computing workloads in the cloud still poses many difficulties. It remains hard to say exactly which workloads should stay on the in-house cluster and which, particularly those prone to sudden surges in resource demand, should go to cloud infrastructure so that the strengths of both are fully used.

As a recent article on The Platform website shows, building out facilities, procuring equipment and arranging colocation space are genuinely difficult tasks in high-performance computing. To better understand the reasoning behind the two positions, we have used figures provided by Rescale, a high-performance computing cloud service provider (which, as mentioned above, mainly works on bringing the software of independent high-performance computing software developers to the cloud).

The figures below come from Rescale's cost comparison between running an in-house high-performance computing cluster and leasing capacity and licenses from a cloud provider. They reflect the cost of a typical midrange cluster handling high-performance computing workloads; note that no high-end processors or accelerators are considered. In an article on the cost of high-performance computing cloud services, Rescale CEO Joris Poort explained that the aim is to reflect the median cost for end users: some need high to extreme performance, while others care more about cost, so the figures are only a baseline reference. Of course, once new Haswell processors, InfiniBand or other high-cost elements are added, the base cost will rise significantly, especially in the first year after the cluster is purchased.

 

 

Under this configuration, users who operate a typical 100-node cluster in a physical data center bear a fixed cost of nearly $70,000 per month, of which about $16,000 goes to energy and cooling. Poort's figures also include the salary of one full-time engineer who manages the cluster. He stressed that for most users, even those who have begun moving to the cloud, this position still exists, because the enterprise usually keeps running a large number of workloads.

Next, we can see the specific cost composition of the typical cluster provided by Poort:

 

This covers only cluster operation and maintenance. Poort said that the total cost of ownership should also be taken into account, including the technical support team and other services that fall outside the table. The overall monthly cost is about $110,000. Interestingly, hardware accounts for only about $40,000 of that, while the other operating costs (power, personnel and related items) come to as much as $70,000.

This number sounds high, especially since enterprises tend to split the budget for high-performance computing resources across departments. In some enterprises, for example, bandwidth costs are folded into company-wide bandwidth accounting. The same goes for power, which is often not billed directly to the high-performance computing cluster because the data center's energy bill also covers other equipment. The enterprise pays these costs either way, but Poort said that listing the figures directly is easier to understand: the totals here are the simple sum of all expenses, which may include some items that are charged to the high-performance computing department but actually used by other departments.

 

Given this, it is hard to break the overall cost down accurately into an hourly operating expense, especially since in most cases the calculation is done separately for each piece of hardware. In other words, the overall operating and day-to-day data center costs listed above are often missing from the result. For now, we put the cost per core-hour at full load at 10 cents. The result can of course vary significantly from one enterprise to another. "If you only include the power consumption cost but not the overall data center cost, you may arrive at only 5 cents per core-hour. On the surface there seems to be little difference, but the cost level is 25% higher than under our calculation method, and once the burden of other facilities and elements is added, the difference becomes very obvious," Poort explained. The saving may not show up immediately, but it is an important difference between the two approaches, and it becomes more significant over time.
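As a rough check on the 10-cent figure, the sketch below divides the monthly totals quoted above by the number of core-hours a 100-node cluster can deliver in a month. The core count per node and the number of hours per month are assumptions for illustration only; the article does not state them.

```python
# Monthly figures quoted in the article (US dollars).
HARDWARE_COST_PER_MONTH = 40_000
OPERATIONS_COST_PER_MONTH = 70_000  # power, cooling, staff, facilities

NODES = 100            # the article's "typical 100-node cluster"
CORES_PER_NODE = 16    # ASSUMPTION: the article gives no core count
HOURS_PER_MONTH = 730  # ASSUMPTION: 365 * 24 / 12


def cost_per_core_hour(monthly_cost, nodes=NODES,
                       cores_per_node=CORES_PER_NODE,
                       hours=HOURS_PER_MONTH):
    """Cost of one core-hour if every core is busy for the whole month."""
    return monthly_cost / (nodes * cores_per_node * hours)


total_monthly = HARDWARE_COST_PER_MONTH + OPERATIONS_COST_PER_MONTH
full_load_rate = cost_per_core_hour(total_monthly)

print(f"total monthly cost: ${total_monthly:,}")
print(f"full-load cost per core-hour: ${full_load_rate:.3f}")
```

Under these assumptions the result is a little over 9 cents per core-hour, in line with the rounded 10-cent figure. A different core count per node would shift the result proportionally.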

However, the cost of 10 cents per core-hour holds only under the assumption of full-load operation, as shown in the following table. "Another big mistake that often occurs when assessing such costs is that people look at what they spend on servers in the cloud and conclude, on the surface, that it costs more than buying servers and running the infrastructure themselves. This is why we need to emphasize the difference in resource utilization: in the cloud, we can turn leased machines on and off at any time, and while they are not in use they cost nothing at all," Poort pointed out.

 


 

In other words, with a typical in-house system, most high-performance computing engineering teams prefer to provision for the highest capacity so that peak demand can be met. After all, being able to put every resource behind product development matters more than keeping utilization at 100%. "Many enterprises have realized that their actual resource utilization is probably only 60% to 70%. Sizing capacity this way is nevertheless smart for them, because they need to meet peak demand, and engineers cannot be kept waiting."
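The effect of utilization on cost can be made concrete with a small calculation. The sketch below takes the article's full-load figure of 10 cents per core-hour and divides it by the utilization rate, on the simplifying assumption that the monthly bill for an in-house cluster is fixed regardless of how busy it is. The function name is illustrative.

```python
def effective_cost_per_core_hour(full_load_rate, utilization):
    """Cost per core-hour actually used when the monthly bill is fixed."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return full_load_rate / utilization


FULL_LOAD_RATE = 0.10  # dollars per core-hour, the article's full-load figure

for utilization in (1.0, 0.7, 0.6):
    rate = effective_cost_per_core_hour(FULL_LOAD_RATE, utilization)
    print(f"{utilization:.0%} utilization -> ${rate:.3f} per used core-hour")
```

At the 60% to 70% utilization quoted above, each core-hour actually used costs roughly 14 to 17 cents rather than 10, which is the number that should be compared with a pay-per-use cloud price.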

This goes back to the point Poort made earlier: for high-performance computing customers, the ideal approach is to combine in-house resources with cloud-based capacity, so that peak demand can be handled easily while actual operating costs are balanced against existing hardware investment. He does not expect enterprise customers to move all critical high-performance computing workloads into the cloud. In his view, however, using the hardware and software tools offered by cloud services to extend an enterprise's existing capacity is an ideal choice.
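To illustrate the hybrid approach, the sketch below compares a cluster sized for peak demand with a smaller cluster that bursts to the cloud. Everything except the 10-cent in-house rate is hypothetical: the demand profile, the cloud price, and the simplification that in-house cost scales linearly with core count (in practice, fixed costs such as staff do not shrink in proportion).

```python
def monthly_cost(demand, in_house_cores, in_house_rate, cloud_rate):
    """Total monthly cost for an hourly demand profile (cores needed per hour).

    in_house_rate: dollars per core-hour of owned capacity, paid whether
                   the cores are busy or idle.
    cloud_rate:    dollars per core-hour of cloud capacity, paid only for
                   demand that exceeds the in-house capacity.
    """
    fixed = in_house_cores * in_house_rate * len(demand)
    burst_core_hours = sum(max(0, d - in_house_cores) for d in demand)
    return fixed + burst_core_hours * cloud_rate


# HYPOTHETICAL demand: 600 cores for 90% of a 730-hour month, 1,600 at peak.
demand = [600] * 657 + [1600] * 73

IN_HOUSE_RATE = 0.10  # article's full-load figure, dollars per core-hour
CLOUD_RATE = 0.20     # ASSUMPTION: on-demand price, not from the article

sized_for_peak = monthly_cost(demand, 1600, IN_HOUSE_RATE, CLOUD_RATE)
hybrid = monthly_cost(demand, 600, IN_HOUSE_RATE, CLOUD_RATE)

print(f"in-house sized for peak: ${sized_for_peak:,.0f}")
print(f"in-house base + cloud burst: ${hybrid:,.0f}")
```

With these made-up numbers the hybrid setup costs about half as much, even though the assumed cloud price per core-hour is twice the in-house rate. The outcome depends heavily on how spiky the demand is; a flat demand profile would favor the in-house cluster.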

Finally, it should be stressed that in addition to its considerable global pool of resources, Rescale also holds a rich portfolio of software licenses, which is enough to free independent software developers from expensive and complex high-performance computing engineering software. For users, being able to pay software license fees by the hour is clearly very important and can even be regarded as a unique advantage. This point is not clearly reflected in the table above, because the real differences between dedicated high-performance computing systems and specific workloads make it hard to build an accurate benchmark for it, but we believe users fully understand its value.