Chuangke Universe | Broad prospects for the "embodied intelligence" industry
By Liang Zhiyu, convener of the Venture Capital Alliance

Time: 2023-11-21 04:03:02 Source: The Dagong Daily

After writing several consecutive columns on generative artificial intelligence (AI), it is easy to see that today's mature applications interact with users mainly through terminals running software and algorithms. In science-fiction films we see intelligent robots that understand speech, talk, answer fluently and move freely just like humans, but so far they have not left the virtual world of cinema. To reach that ultimate goal, the next challenge for the AI field will be "embodied artificial intelligence": advanced robots that can master a wide range of skills through self-learning and have the ability to carry them out.

Put simply, embodied intelligence means giving an AI system a body that supports physical interaction. Once the two are combined into an agent that integrates software and hardware, it can perceive and interact with its environment as humans do, completing real-world tasks by observing, moving, speaking and engaging with the world. Some of the intelligent service robots, autonomous cars and chatbots seen in daily life today are embryonic forms of embodied intelligence. However, because they are controlled mainly by preloaded programs, they remain far from its ultimate form.

A good way to understand embodied intelligence is to start with the non-embodied intelligence (Disembodied AI, or Internet AI) that many people have already used. Non-embodied intelligence centres on AI software: it is not tied to any physical form, needs no physical interaction, and concentrates on developing abstract algorithms such as deep learning and the large generative AI models that have advanced rapidly in recent years. These have given rise to many multimodal applications, such as ChatGPT and Midjourney. Embodied intelligence also contains AI, but with one essential condition: it has a physical body that can sense and act. Ideally, embodied intelligence can actively perceive the world as humans do, understand human language, analyze a task and then act on it, verifying and adjusting its model on the spot along the way until the task is complete.

Replicating the human senses of eyes, ears, mouth, nose, body and mind is no easy task. Embodied intelligence draws on almost every technology in the AI field, including machine vision, natural language understanding, cognition and reasoning, robotics, game theory and ethics, and machine learning, which makes it the epitome of AI. As deep learning continues to advance, large generative AI models are becoming increasingly multimodal, with large language models developing especially fast. Combined with complex multimodal models fed by vision and other sensors, they have greatly accelerated the development and real-world deployment of embodied intelligence.

The "mind" of embodied intelligence is usually driven by a deep neural network, and the emergence of large models such as GPT offers new ideas here. General-purpose large language models and vision-language models, jointly trained on images, text and embodied data, can deepen a model's understanding of objects in the real environment and give embodied intelligence strong generalization ability. Robotics, in turn, provides a "body" that can interact with the physical world. By integrating sensors such as cameras, microphones and tactile sensors, AI can perceive the world through its senses as humans do; coupled with actuators such as wheels and motorized joints, it gains a body capable of movement.

More importantly, non-embodied intelligence has no eyes, ears, mouth, nose, limbs or senses. It cannot collect data on its own and can only passively accept data that humans have already gathered. Most deep learning models today are trained on historical data from the Internet. When a model encounters a problem that never appeared in its training environment, new data must be collected and the model optimized again and again, which is an inefficient process. In future, the training and testing of embodied intelligence models could be combined with cloud services: in cloud-based virtual simulation scenarios, end-to-end training and testing can run in real time without relying on manually written code updates, which would greatly speed up the evolution of embodied intelligence.

Expected to take over dangerous work

Google, Microsoft, Tesla and other technology companies have already announced their own embodied intelligence products. In March this year, Google launched PaLM-E, a multimodal embodied vision-language model (VLM) that lets robots use a large model to understand images, language and other data and to execute complex instructions without retraining. This all-in-one R&D route is impressive, but it takes a long time and is still far from industrial use.

A more practical path to deployment is to assign different tasks to different models: a language model for dialogue, a vision model for recognizing images, and a multimodal model for driving the limbs. Every instruction is broken down into steps that are executed separately, and a large model then handles the automated scheduling and coordination between them.
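For readers curious what this "decompose, then dispatch" arrangement might look like, here is a minimal Python sketch under loudly stated assumptions. The specialist functions (`language_model`, `vision_model`, `motion_model`) and the `plan` function are hypothetical stand-ins, not any company's actual system: the "models" only return descriptive strings, and the planner is a fixed rule playing the role the column assigns to the large scheduling model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    skill: str     # which specialist model should handle this step
    argument: str  # what the specialist should act on


# Hypothetical stand-ins for specialist models; each only reports what it would do.
def language_model(arg: str) -> str:
    return f"say: {arg}"


def vision_model(arg: str) -> str:
    return f"locate: {arg}"


def motion_model(arg: str) -> str:
    return f"move: {arg}"


# Registry mapping each skill name to the specialist responsible for it.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "speak": language_model,
    "see": vision_model,
    "act": motion_model,
}


def plan(instruction: str) -> List[Subtask]:
    """Toy stand-in for the large scheduling model: a fixed rule that splits
    'fetch <object>' into look, grasp and report steps."""
    if not instruction.startswith("fetch "):
        raise ValueError("this toy planner only understands 'fetch <object>'")
    target = instruction[len("fetch "):].strip()
    return [
        Subtask("see", target),
        Subtask("act", f"grasp {target}"),
        Subtask("speak", f"I have the {target}"),
    ]


def run(instruction: str) -> List[str]:
    """Decompose the instruction, then dispatch each step to its specialist in order."""
    return [SPECIALISTS[step.skill](step.argument) for step in plan(instruction)]


if __name__ == "__main__":
    print(run("fetch cup"))
```

Calling `run("fetch cup")` yields `["locate: cup", "move: grasp cup", "say: I have the cup"]`. The point of the design is that each specialist can be improved or replaced independently, while only the planner needs to know how a task breaks down.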

Chinese policy is also promoting embodied intelligence. On November 2 this year, the Ministry of Industry and Information Technology issued the Guiding Opinions on the Innovative Development of Humanoid Robots, setting out the direction for China's humanoid robot industry. The document proposes taking breakthroughs in AI technologies such as large models as the lead and, building on existing mature robotics technologies, concentrating on breakthroughs in key humanoid robot technologies such as the "brain", "cerebellum" and "limbs", as well as in the technological innovation system.

The wave of industrial intelligence has created a market for embodied intelligence. For example, "grasping, holding and releasing" robots in industrial settings can replace manual labor in dangerous, tedious or repetitive processes, in fields such as underground coal mining, port and warehouse unloading, parcel handling, accident-site clean-up and disaster relief. In addition, with multimodal input, users could control robotic arms, drones, home-assistance robots and similar machines intuitively through spoken language, rapidly extending applications into daily life and generating enormous industrial value.
