Fudan MOSS team discusses its latest progress: from large language models to multimodal large models

Source: Contributed article
2024-05-24 15:56:27

At Fudan University's 2023 "Top Ten Scientific and Technological Progress" selection meeting, MOSS, the trustworthy general-purpose AI large model from Professor Qiu Xipeng's team, was chosen as one of the ten achievements. The research team has carried out in-depth work on extending the large model to multiple modalities and improving its trustworthiness, jointly released the "Dandelion" open platform for AI governance, systematically assessed the trustworthiness of text, image and video models, and proposed an innovative six-dimension evaluation framework, helping to address the difficulty of putting global AI governance into practice.

Professor Huang Xuanjing of Fudan University, a member of the team, told Yicai (First Financial) that they have made several new advances based on MOSS, one of which is to turn all kinds of signals into tokens. "At present we are exploring different approaches to multimodal large models. One idea is to map all signals, whether images or music, into a unified token space."

She explained to reporters that the team proposed a model called AnyGPT, which can understand and reason over multimodal content such as text, speech, images and music. It handles the different modalities in a unified way through discrete representations, and achieves efficient multimodal alignment pre-training via a two-stage generation framework. Given a speech prompt, AnyGPT can generate a comprehensive response combining speech, images and music; given a text-plus-image prompt, it can generate music that matches the request.
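To make the idea of a unified token space more concrete, here is a minimal sketch of how discrete codes from several modalities can be shifted into one shared vocabulary so that a single language model can read and emit any mix of them. The codebook sizes, class names and helper functions below are illustrative assumptions, not the actual AnyGPT implementation.

```python
# Minimal sketch (not official AnyGPT code): every modality is mapped to
# discrete tokens in one shared vocabulary, so one language model can
# consume and emit interleaved text, speech, image and music tokens.

from dataclasses import dataclass
from typing import List


@dataclass
class ModalitySpec:
    name: str          # e.g. "text", "speech", "image", "music"
    vocab_size: int    # size of this modality's discrete codebook
    offset: int = 0    # where its codes start in the shared vocabulary


def build_shared_vocab(specs: List[ModalitySpec]) -> int:
    """Assign each modality a disjoint id range inside one token space."""
    offset = 0
    for spec in specs:
        spec.offset = offset
        offset += spec.vocab_size
    return offset  # total size of the unified vocabulary


def to_shared_ids(spec: ModalitySpec, local_codes: List[int]) -> List[int]:
    """Shift modality-local discrete codes into the shared id space."""
    return [spec.offset + c for c in local_codes]


if __name__ == "__main__":
    specs = [
        ModalitySpec("text", vocab_size=32000),
        ModalitySpec("speech", vocab_size=1024),  # e.g. codes from a speech codec
        ModalitySpec("image", vocab_size=8192),   # e.g. codes from an image VQ model
        ModalitySpec("music", vocab_size=4096),
    ]
    total = build_shared_vocab(specs)
    speech_codes = [3, 17, 255]  # dummy codec output standing in for a speech prompt
    # Text tokens and speech tokens become one ordinary id sequence for the model.
    sequence = to_shared_ids(specs[0], [101, 2045]) + to_shared_ids(specs[1], speech_codes)
    print(total, sequence)
```

Once every modality lives in the same id space, generating an image or a piece of music is, from the model's point of view, no different from generating text.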

In addition to the above progress, in January this year they also launched "MouSi", built on MOSS. "We hope it can serve visually impaired people. We have also developed an app called 'See the World', which can tell users what goods are available in a store, help them navigate on the road, and offer some entertainment modes. We have already begun pilot services with some communities," said Huang Xuanjing.

Regarding the data and computing power required by today's large models, Huang Xuanjing explained: "Everyone knows that large models need to be fed with 'fuel', that is, data. We collected cross-lingual, multi-source data, applied various preprocessing steps, extracted high-quality data to train the model, and then studied alignment from AI feedback."
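As a rough illustration of the kind of quality filtering this describes, the sketch below scores documents with a couple of crude heuristics and keeps only those above a threshold. The heuristics and threshold are assumptions chosen for illustration, not the team's actual data pipeline.

```python
# Illustrative quality filter, not the team's pipeline: score each document
# with simple heuristics and keep only the ones that clear a threshold.

import re


def quality_score(doc: str) -> float:
    """Crude heuristic score: penalize very short docs and noisy character runs."""
    if len(doc) < 200:
        return 0.0
    letters = sum(ch.isalpha() for ch in doc)
    alpha_ratio = letters / len(doc)               # share of alphabetic characters
    repeated = len(re.findall(r"(.)\1{5,}", doc))  # long runs of one repeated character
    return alpha_ratio - 0.1 * repeated


def filter_corpus(docs, threshold=0.6):
    """Keep only documents whose heuristic score clears the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]
```

Real pretraining pipelines layer many such filters (deduplication, language identification, model-based scoring) before the surviving data is used for training and alignment.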

In terms of algorithms, they proposed a low-memory optimization algorithm. "Because the training cost of large models is very high and world knowledge is constantly being updated, this algorithm reduces the compute resources required to train large models."
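The quote does not describe the algorithm's internals, but a common way to cut training memory is to fuse the parameter update into the backward pass, so full gradients and optimizer states never need to be held all at once. The sketch below shows that general idea as a plain SGD variant using PyTorch's public hook API (PyTorch 2.1+); it illustrates the technique, not the team's released algorithm.

```python
# A minimal "optimizer-in-backward" sketch: update each parameter as soon as
# its gradient is ready, then discard the gradient immediately.
# Illustrative SGD variant, not the team's low-memory optimizer.

import torch
import torch.nn as nn


def attach_fused_sgd(model: nn.Module, lr: float = 1e-2) -> None:
    """Register hooks that apply an SGD step right after each grad is accumulated."""

    @torch.no_grad()
    def sgd_step(param: torch.Tensor) -> None:
        param.add_(param.grad, alpha=-lr)  # in-place parameter update
        param.grad = None                  # free the gradient right away

    for p in model.parameters():
        if p.requires_grad:
            # Requires PyTorch >= 2.1 for register_post_accumulate_grad_hook.
            p.register_post_accumulate_grad_hook(sgd_step)


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    attach_fused_sgd(model, lr=1e-2)
    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()  # parameters are updated inside the backward pass; no optimizer.step()
```

Because each gradient is consumed and released as soon as it is produced, peak memory no longer has to hold every gradient at the same time, which is the kind of saving the quote refers to.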
