Generative AI was front and center this past year, with huge advancements in the technology’s ability to create everything from visual art and written theses to new recipes and biological breakthroughs. The headlines surrounding AI-generated art beating human artists in competition, or Meta’s AI program, Cicero, winning online strategy games by deceiving human players, highlight the very real advancements we have made in the sophistication of these algorithmic technologies.
These examples of advanced machine learning and artificial intelligence have all been enabled by high-performance computing (HPC). In practical terms, HPC simply refers to a network or networks of hardware capable of computing large, complex calculations very quickly – much like high-performance computing’s predecessor, the supercomputer, and its distant cousin, the quantum computer. However, whereas both supercomputers and quantum computers generally consist of one large system purpose-built to perform a particular task, HPC systems combine multiple machines and parallel computing techniques that can be adjusted to suit evolving needs. The resulting agility and scalability of high-performance computing have produced astonishing technological breakthroughs at a relatively low cost.
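The parallel computing idea at the heart of HPC can be seen in miniature even on a single machine: divide one large computation into chunks and hand each chunk to a separate worker. The sketch below is a toy illustration only (real HPC clusters coordinate thousands of nodes over specialized interconnects, not a local process pool), and the function names are our own invention, but the divide-and-combine pattern is the same.

```python
# Toy illustration of the parallel-computing pattern behind HPC:
# split one large job into chunks, run the chunks on parallel workers,
# then combine the partial results. Here the "large job" is summing
# the squares of the integers in [0, n).
from multiprocessing import Pool


def sum_of_squares(bounds):
    """Serial worker: sum i^2 for i in [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n, workers=4):
    """Split [0, n) into one chunk per worker and combine the partial sums."""
    step = max(n // workers, 1)
    chunks = [
        (w * step, (w + 1) * step if w < workers - 1 else n)
        for w in range(workers)
    ]
    with Pool(workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    n = 1_000_000
    # Parallel and serial results must agree.
    assert parallel_sum_of_squares(n) == sum_of_squares((0, n))
```

The payoff of this pattern is that adding workers (or, in a real cluster, adding nodes) scales throughput without redesigning the computation – the agility and scalability described above.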
With a host of data-hungry applications emerging and expanding around the world, HPC will only continue to grow in significance and scope, radically transforming the conventional data center landscape in several key ways:
Shifting power in the value chain
In the heyday of the supercomputer, semiconductor companies were king. AMD, Nvidia, and Intel ruled the industry, and their ability to achieve exponential increases in processing power every couple of years governed the pace of growth for computing in general. Today, however, chip features are approaching atomic scales, and the transistor-density gains predicted by Moore’s Law are slowing. While these companies are still pushing that boundary and may one day produce higher-performing chips, near-term growth in computing power will have to come from somewhere else – and, so far, a leading solution appears to be high-performance computing.
As a result, influence in the computing industry has shifted forward in the value chain to companies that build, own, and operate data centers. And much of that power is concentrated in the “Big 9” operators (Microsoft, Google, Meta, Amazon, Apple, Tencent, IBM, Alibaba, and Baidu). Rather than waiting for Nvidia to release the next, more powerful generation of semiconductors, these companies are proactively designing their own chips and networks and competing with each other for the most powerful machines on the planet – which brings us to the next big industry shift facilitated by adoption of HPC:
Proprietary technology development
For decades, technological developments in computing were first adopted by academic or government research labs, which used their huge budgets to commission the latest supercomputers from semiconductor innovators. Then, after the technology was proven out over a few years, the semiconductor companies would sell the technology to other industrial users with large server rooms, the predecessors of today’s cloud data centers. These advanced but ultimately general-purpose chips provided benefits across the computing industry and drove competition between chip makers and other hardware designers.
As influence has moved forward in the value chain, however, there has also been a shift toward innovating everything in-house, from hardware to software to data center design. The Big 9, along with other data-hungry companies such as Tesla, deploy HPC clusters within their own data centers to accelerate and optimize certain high-complexity workloads. These operators have formed well-funded teams of talented engineers who develop proprietary designs customized for their particular computational needs.
For example, Tesla has intensive processing needs for training and developing autonomous vehicles, as its self-driving algorithm must analyze hundreds of thousands of hours of video to “learn” how to operate. To accommodate these demanding video-analytic workloads, Tesla has designed an HPC system named Dojo. Dojo is completely custom-built, from the AI chip – the D1, which is among the most powerful chips in the world – to the packaging of the chips on a motherboard, to the communication network, and even the cabinet design. This technology, which Tesla has no intention of selling to other players, is extremely powerful and could serve as a massive competitive “moat” (although some might argue that recent and widely publicized failures of Tesla’s self-driving function could be traced back to its insistence on custom-building chips in-house with no external verification).
Spillover effects to data center design
The expansion of HPC within data centers may also change performance expectations for cloud computing as a whole. HPC networks are designed differently than traditional data center layouts, allowing them to allocate and pool resources more dynamically and efficiently in real time. As these systems proliferate, inefficiencies in computing and networking in the rest of the data center will become more obvious, and operators may begin to push for better performance in mainstream data center equipment. Low-latency fibers and low-loss connectors, among other technologies, will likely be used in HPC clusters first but will then begin to permeate the rest of the data center infrastructure.
Changes in the manufacturing landscape
Advanced computing trends are driving adoption of new materials (e.g., alternative metals and semiconductors) that can perform at higher temperatures, fail less often, and be fabricated into smaller, more complex chip features. Incorporating new materials in the chip fab will in turn influence processes for growing semiconductors, polishing wafers, etching surface features, cleaning nanoparticle residues, etc., and may have implications for suppliers further back in the value chain. Another aspect to consider is shifts in manufacturing geography, which are happening now and are driven largely by global supply chain issues and growing political urgency around reshoring capacity in the U.S. (e.g., via the CHIPS Act). While HPC is not the cause of these trends, its trajectory is intertwined with them – semiconductor companies traditionally use older facilities for older chip designs and dedicate new plants to cutting-edge chips, so the new fabs being constructed in the U.S. (e.g., Intel’s new Ohio plant) will likely be outfitted with the technology required to build advanced HPC hardware.
Overall, the rise of high-performance computing poses a threat to the traditional supply chain in the computing industry. As influence accrues to large data center operators who innovate largely in-house, suppliers further back in the value chain may struggle to succeed with differentiated materials and components unless they work closely with data center operators as design partners. That said, demand for materials that let operators take full advantage of low-latency, low-loss networking across their campuses will be on the rise. And HPC will continue to have broad-reaching impacts – not only on society and art, but also on the very businesses that supply the materials that make it run, provided they can find ways to benefit as users themselves.