Kai-Fu Lee: How Can the AI Large Model Industry Find the Optimal Solution for Computing Power

07:23, June 17, 2024 | Source: Economic Information Daily | Reporter: Xiong Lin

"As AI has moved from 1.0 to 2.0, we have ushered in the most important technological revolution and platform revolution ever." Li Kaifu, CEO of OneThing and Chairman of Innovation Works, said in an interview with reporters a few days ago that computing, as a key link in the development of AI 2.0, needs to adopt a more pragmatic, coordinated and optimized approach to address current challenges. In the future, it is expected that AI cutting-edge enterprises will be more widely and deeply involved in the construction of national AI computing power, and all advantageous resources will be fully integrated to better enable the development of AI big model industry.

"It is a very good exploration and attempt to actively build supercomputing centers in many places, build a national computing base, and promote the integrated operation of supercomputing." Li Kaifu believes that supercomputing centers gather multiple functions and elements such as cloud computing, big data, and artificial intelligence research and development, and gather rich AI development resources and momentum. The future is foreseeable. How to give better play to the advantages of all parties to build super AI computing power can be started from the following four aspects:

First, follow the scaling law to improve computing efficiency. The importance of the scaling law for large models has become evident in this era: by applying more compute and more data, the intelligence of large models can be continuously increased. This path has been verified by many parties, is still playing out, and is far from reaching its ceiling.
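
For context on the scaling law mentioned above (an illustration added here, not part of the interview): the relationship is commonly written as a power law in which loss falls predictably as compute grows, roughly L(C) = L_inf + a * C^(-b). The sketch below fits such a curve to made-up data points; none of the numbers come from 01.AI.

```python
# Illustrative sketch (not from the article): scaling laws are usually stated as a
# power law in compute, L(C) ~= L_inf + a * C**(-b). The data points below are
# invented purely to show how such a curve is fitted and extrapolated.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, l_inf, a, b):
    """Power-law fit: irreducible loss plus a term that shrinks with compute."""
    return l_inf + a * compute ** (-b)

# Hypothetical (compute in PF-days, validation loss) observations.
compute = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
loss = np.array([3.10, 2.55, 2.18, 1.93, 1.78])

params, _ = curve_fit(scaling_law, compute, loss, p0=[1.5, 5.0, 0.1], maxfev=10000)
l_inf, a, b = params
print(f"fitted: L(C) = {l_inf:.2f} + {a:.2f} * C^(-{b:.3f})")

# Extrapolate: estimated loss at 10x more compute than the largest observed run.
print(f"predicted loss at 1e7 PF-days: {scaling_law(1e7, *params):.2f}")
```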

At the same time, the intelligence of a large model comes from near-lossless compression, which is also very important. Because GPUs cannot simply be stacked blindly while following the scaling law, a method is needed to evaluate whether a large model company, or a particular approach, is actually doing better. Following the scaling law has thus become one of the effective ways for many large model companies to break through computing power constraints and optimize computing efficiency when deploying large models at scale.

Within 01.AI there is a rigorous methodology that uses compression as the evaluation criterion, making the otherwise unguided model training process more systematic, scientific, and mathematically grounded, and greatly improving the computing efficiency of its own large models.
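
The interview does not describe 01.AI's internal methodology. As a general, hypothetical illustration of compression-based evaluation, a common metric is bits per byte (BPB): a model that assigns higher probability to a text needs fewer bits to encode it, so a lower BPB means the model is closer to lossless compression of that text.

```python
# General illustration of compression-based evaluation (not 01.AI's internal method):
# a model that assigns probability p(x) to a text can encode it in -log2 p(x) bits,
# so lower bits-per-byte (BPB) means better (near-lossless) compression.
import math

def bits_per_byte(token_logprobs, num_bytes):
    """Convert a model's natural-log token log-probs into bits per UTF-8 byte."""
    total_nats = -sum(token_logprobs)       # total negative log-likelihood in nats
    total_bits = total_nats / math.log(2)   # nats -> bits
    return total_bits / num_bytes

# Hypothetical per-token log-probs for the same short sentence, from two models.
text = "large models compress."
model_a = [-2.1, -0.9, -1.4, -0.6, -1.1]    # stronger model: higher probabilities
model_b = [-3.0, -1.8, -2.2, -1.5, -2.0]

n = len(text.encode("utf-8"))
print(f"model A: {bits_per_byte(model_a, n):.3f} BPB")
print(f"model B: {bits_per_byte(model_b, n):.3f} BPB")  # higher BPB = worse compression
```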

Second, strengthen the "model base co construction" to find the optimal solution of computing power. At present, the number and scale of GPUs in the AI field in developed countries are about several times that in China. In the face of such a gap, we need to take more pragmatic and effective measures - self research AI Infra (AI Infrastructure Artificial Intelligence Infrastructure Technology). AI Infra mainly covers large model training and deployment and provides various underlying technical facilities. In foreign first-line factories, the most efficient way to train models is to co build algorithms and Infra, not only focusing on model architecture, but also starting from optimizing the underlying training methods. Since its establishment, OneThing has set the self-developed AI Infra as an important direction. It has chosen the strategy of "model based co construction", which is in line with the international first-line echelon. The model team and AI Infra team are highly co built, with a population ratio of 1:1. On this basis, OneThing has developed the AI infrastructure technology by itself, optimized the training methods from the bottom layer, greatly saved costs, and found a high-quality way to use computing power under the current conditions. From the perspective of the training process of Yi Large, the world leading 100 billion parameter model of OneThing, after optimization, the training cost of Yi Large 100 billion parameter model dropped significantly year on year.
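
The interview gives no figures for the cost savings. As a rough, hypothetical illustration of why infrastructure optimization matters, training compute for a dense transformer is often approximated as C = 6 * N * D floating-point operations (N parameters, D training tokens), and the resulting cost depends heavily on the hardware utilization that the AI Infra stack achieves. Every number below is an assumption, not a 01.AI figure.

```python
# Back-of-the-envelope illustration (all numbers are assumptions, not 01.AI figures):
# training compute for a dense transformer is commonly approximated as 6 * N * D FLOPs,
# and wall-clock cost depends heavily on how much of the GPU's peak FLOPs is achieved.

def training_cost_usd(params, tokens, gpu_peak_flops, utilization, gpu_hour_usd):
    """Estimate training cost from the 6*N*D approximation and hardware utilization."""
    total_flops = 6 * params * tokens
    gpu_seconds = total_flops / (gpu_peak_flops * utilization)
    return gpu_seconds / 3600 * gpu_hour_usd

N = 100e9       # 100B parameters (hypothetical)
D = 2e12        # 2T training tokens (hypothetical)
PEAK = 312e12   # A100 BF16 peak, ~312 TFLOPs
PRICE = 2.0     # assumed $/GPU-hour

# The same model and data, trained at 30% vs 50% model FLOPs utilization (MFU):
for mfu in (0.30, 0.50):
    cost = training_cost_usd(N, D, PEAK, mfu, PRICE)
    print(f"MFU {mfu:.0%}: ~${cost / 1e6:.1f}M")
```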

Third, build "model-application integration" and explore TC-PMF (Technology-Cost × Product-Market Fit). In the large model era, the cost of model training and inference constitutes a growth trap that almost every startup must face. User growth requires high-quality applications; high-quality applications depend on a strong base model; a strong base model usually entails high training costs; and on top of that, inference costs grow with the user base. Companies that can identify and achieve TC-PMF first will undoubtedly take the lead. Achieving this requires the "trinity" of model, AI Infra, and application. On this basis, 01.AI has made "model-application integration" and "model-infrastructure co-construction" its top core strategies. In terms of talent density and collaboration, 01.AI has also quickly built an organizational system that brings together outstanding talent from different disciplines to create cross-disciplinary businesses. With the base model, AI Infra, APIs, and the consumer-facing AI assistant "Wanzhi", this full-stack deployment lets 01.AI view the growth trap of the large model era from a more comprehensive perspective and strengthens its commercial deployment at the application layer.
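
The interview discusses training and inference costs only qualitatively. The hypothetical sketch below illustrates the "growth trap" it describes: a one-off training cost is amortized over time, while inference cost scales roughly linearly with the active user base. All figures are invented for illustration and are not 01.AI data.

```python
# Hypothetical unit-economics sketch of the "growth trap" (all numbers invented):
# a one-off training cost is amortized monthly, while inference cost grows with users.

def monthly_cost_usd(users, training_cost, amortize_months,
                     queries_per_user, tokens_per_query, usd_per_million_tokens):
    training = training_cost / amortize_months
    inference = users * queries_per_user * tokens_per_query / 1e6 * usd_per_million_tokens
    return training, inference

TRAINING_COST = 5_000_000     # assumed one-off training cost
AMORTIZE = 12                 # amortized over 12 months
QPU, TOKENS = 30 * 20, 800    # ~20 queries per user per day, ~800 tokens each
PRICE = 1.0                   # assumed blended $ per million tokens served

for users in (100_000, 1_000_000, 10_000_000):
    train, infer = monthly_cost_usd(users, TRAINING_COST, AMORTIZE, QPU, TOKENS, PRICE)
    print(f"{users:>10,} users: training ${train / 1e3:,.0f}k/mo, "
          f"inference ${infer / 1e3:,.0f}k/mo")
```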

Fourth, give full play to the demonstration role of cutting-edge AI companies. Lee believes China has certain advantages in developing AI computing power and related fields. In addition to rich big data resources, China has a strong capacity to mobilize and integrate social resources, and its R&D efficiency and implementation results in both research and application are good. At the same time, China has many outstanding Internet companies and phenomenally popular apps, such as Douyin and Meituan. These companies' exploration of AI computing power and algorithms is backed by large user bases, which makes it easier for them to develop more forward-looking, application-oriented products, an important driving force for the large model industry. "In the future, we hope that national supercomputing centers, reliable and cutting-edge Chinese AI companies, and enterprises in various fields can enhance the integration and utilization of resources, give full play to each party's advantages, and develop in a more integrated way, adding new momentum to the development of AI."

(Editor in charge: Liu Peng)
