Register Debate In one sense, driving as much complexity as possible out of the IT stack is entirely artificial; in another sense, it is desirable. And accepting the need for choice, while acknowledging the unavoidable trade-offs that complexity brings, is also completely human.
There is a tension between the two, as many Register readers pointed out in our most recent debate: Can a unified, agnostic software environment spanning HPC and AI applications be achieved?
Readers were more skeptical than hopeful on this one, with 53 per cent voting against the motion and 47 per cent voting for it.
The debate kicked off with Nicole Hemsoth, co-editor of our sister publication The Next Platform, arguing for the motion and contending that it may be time for HPC and AI centers to have the smallest number of tools that can support the widest range of high-performance, large-scale applications.
“Creating a unified HPC and AI software stack that is both open and agnostic sounds like common sense to us, and it sounds that way because it is,” Hemsoth argued.
“What is keeping us from focusing on solving problems instead of endlessly porting tools and code matrices? Vanity and an almost religious insistence on preferred platforms, not-invented-here syndrome, and a lack of collaboration are the root causes of this particular evil, and the result is a vicious circle of reinventing the wheel, over and over again.”
Having made this logical and hopeful case for converged HPC and AI development tools and runtimes, Hemsoth conceded that a single stack may never happen, even if it makes sense in the abstract, and that the best we can probably hope for is a bunch of different stacks that generate their own native code but can also be translated to other platforms, much as AMD’s ROCm platform can take in CUDA code, or spit out CUDA code so it can run on Nvidia GPU accelerators. Perhaps Nvidia will return the favor in kind, and perhaps Intel will push hard in the same direction with oneAPI.
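To make the translation idea a little more concrete, here is a minimal sketch (our illustration, not Hemsoth’s) of the sort of code that already moves between the two ecosystems: a plain CUDA vector-add that AMD’s hipify tooling can mechanically rewrite for ROCm, swapping the cuda* runtime calls for their hip* equivalents while leaving the kernel body alone.

    // vector_add.cu -- standard CUDA; hipify-perl rewrites cudaMallocManaged as
    // hipMallocManaged, cudaDeviceSynchronize as hipDeviceSynchronize, and so on.
    // That mechanical mapping is the CUDA/ROCm portability story in miniature.
    #include <cuda_runtime.h>
    #include <cstdio>

    // One GPU thread per array element.
    __global__ void vector_add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        // The triple-chevron launch syntax is accepted by both nvcc and hipcc.
        vector_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);   // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

The hard part, of course, is everything above this level: the vendor-specific libraries, profilers, and framework plumbing that no mechanical translator fully covers.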
Then came Rob Farber, who worked at Los Alamos National Laboratory, Lawrence Berkeley National Laboratory, and Pacific Northwest National Laboratory over his long career and who is now chief executive of the consultancy TechEnablement. He surprised us with an intricate technical argument supporting the idea that a unified, agnostic software environment is a laudable goal, but one that is hard to achieve at the source-code level, because no single machine architecture, current or as yet undesigned, can be excluded.
Interestingly, the key insight Farber put forward is that any unification may happen not at the source-code level but in the computational graphs generated by compilers, such as those built on LLVM. These graphs are the data structures a compiler produces, regardless of the source language, that describe how data flows through the hardware and how it is processed.
“These graphs constitute the ‘software environment’ that can exploit all the hardware density and parallelism that modern semiconductor manufacturing can pack onto a chip,” Farber explained. “Performance comes from decades of work by compiler writers on optimizing these computational graphs to maximize the use of the hardware’s compute capability and minimize the performance-limiting effect of external memory accesses. Parallelism can be achieved by pipelining work through a computational graph, and by instantiating multiple computational graphs to process data in parallel.”
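As a rough illustration of Farber’s point (our example, not his), consider the classic SAXPY loop below. Asking an LLVM-based compiler such as Clang to emit its intermediate representation exposes the graph of loads, multiply-adds, and stores that the optimizer and back end actually work on; a Fortran front end such as Flang would lower an equivalent loop to much the same form, which is where any unification would live.

    // saxpy.cpp -- y = a*x + y, the canonical HPC kernel.
    // Compile to LLVM IR with:  clang++ -O2 -S -emit-llvm saxpy.cpp -o saxpy.ll
    // The IR that comes out no longer records whether the source was C++ or
    // Fortran; vectorization, unrolling, and scheduling are all decided on that
    // compiler-level graph rather than on the source text.
    void saxpy(float a, const float *x, float *y, int n) {
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }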
Dan Olds, chief research officer at Intersect360 Research, argued strongly against the motion.
“There is not a chance in hell of this happening,” Olds argued. “Why? Because this is a world of humans working in their own interests and those of their organizations. APIs are a source of competitive advantage for many companies, so these vendors have no reason to want anything fully standardized, particularly when the standard would be driven by the largest and most influential suppliers in the industry.”
What we would add is that, as Olds points out, it is hard to reach agreement when there are three major suppliers of compute engines in the datacenter, namely Intel, AMD, and Nvidia, and any standard that serves only one of them will not stand for long. But the long run can be a very long time. Like decades.
We closed out the debate with yours truly, the other co-editor of The Next Platform, arguing that a single, unified HPC and AI development and runtime environment may not be as desirable as it seems at first glance.
“Maybe someday there will be a grand unification across the broader realm of high-performance computing, spanning simulation and modeling as well as machine learning, much as physicists hope to unify quantum mechanics and relativity, and maybe there will be a single programming environment that can span all of it,” I said.
“But right now, in a post-Moore’s Law world, every transistor matters, every bit of data movement and processing matters, every joule of energy matters, and there is no room for inefficiency anywhere in the hardware and software stack. That means there will be complexity in the programming stack. It is the unavoidable trade-off between application performance and application portability that we have seen in both commercial and HPC computing for decades.”
History has shown that it is a lot easier to get plumbing standards in the hardware stack, such as interconnects and protocols, than it is to get higher-level standards in programming environments. Without knowing what the others were writing, I agreed with my partner at The Next Platform that emulation or translation layers, like those that have emerged between AMD’s ROCm and Nvidia’s CUDA, are perhaps the best we can hope for.
But in the end, as we face a post-Moore’s Law world in which it gets harder and harder to do more work in the same thermal envelope and on the same budget (software keeps getting more complex and the hardware can’t keep up, which is why it takes on the order of $500 million to build a leadership-class supercomputer today instead of the $50 million it took to build a terascale machine a few decades ago), every piece of code in the HPC and AI stack has to be highly tuned to push efficiency up and heat and cost down. That means a wider variety of hardware, and therefore more compilers, more frameworks, and more libraries.
Readers weigh in
One of the many Anonymous Cowards summed up a lot of the commentary in this debate:
“It’s a nice idea: write it once, run it anywhere.
The problems are:
- Vendors need lock-in. AWS doesn’t want customers to be able to migrate to Google, which doesn’t want them to be able to migrate to Azure, and there aren’t many other places to go after that.
- You don’t want to be building exciting new products with competitive features at the pleasure of your competitors, let alone some other self-appointed arbiter. That has an incredibly chilling effect.
It’s a lovely daydream, but it doesn’t work in reality.
That’s not to say it isn’t possible within a limited scope. HTML, for example, is universal enough to give us the World Wide Web.”
And reader Scott seconded the notion that vendor lock-in is the real obstacle:
“Vendor lock-in. That’s exactly it. The reasons usually given for why it hasn’t happened aren’t the real ones. The only obstacle to this pie in the sky is that it won’t make anyone any money, and there is no incentive to spend the time and resources to build and maintain such a system.”
As far as the four of us, and many of you, are concerned, that is the real rub. ®