Open Source Daily | The large-model "national team" makes its move; a new alternative to RLHF; ICQ to shut down soon; the terminator of China's large-model melee; MoE is an art of compromise

Source: OSCHINA
Editor: game
2024-05-27 19:41:00

Welcome to the Open Source Daily, produced by the OSCHINA editorial team and updated once a day.

# 2024.5.27

Today's key points

Unreal Engine's coding standard prohibits profanity and the words "slave" and "master"

A netizen recently shared Epic Games' official Unreal Engine coding standard, pointing out that some of its provisions are absurd and asking why Epic "turns things that were never problems into problems".

An industry first: end-to-end training of a large AI model on domestic GPUs, from 0 to 1

Moore Threads and Wuwen Xinqiong (Infinigence AI) jointly announced today that they have completed training of "MT-infini-3B", a large model trained end to end on a thousand-card cluster of domestic full-featured GPUs. The cluster is built from Moore Threads' domestic full-featured MTT S4000 GPUs, combined with Wuwen Xinqiong's AIStudio PaaS platform.

This MT-infini-3B training run reportedly took 13.2 days in total and ran stably without interruption throughout, with cluster training stability reaching 100% and the scaling efficiency of thousand-card training exceeding 90% relative to a single machine. The companies say this "fully verifies the reliability of the Kua'e thousand-card intelligent computing cluster for large-model training scenarios, and pioneers a new paradigm of deep cooperation between domestic large language models and domestic thousand-card GPU intelligent computing clusters" in the industry.

Friction between GNOME and its sponsor STF may affect the project's development

Last month, Germany's Sovereign Tech Fund (STF) announced continued investment in GNOME, and thanks to STF's support the GNOME project has made remarkable progress in recent months.

However, last week's status update points out that there are unspecified problems between STF and the GNOME Foundation which may affect the project's subsequent development.

As part of the GNOME STF program, many community members are working on infrastructure-related projects.

We are currently facing a major problem originating from the GNOME Foundation. We hope it can be resolved before it affects coordination of the STF project, but if it cannot, the future of parts of the project will be highly uncertain.


Today's observations

Social observation

Startups spend $50 billion on NVIDIA chips to train large language models

Sequoia Capital estimated in a March presentation that startups had spent $50 billion on NVIDIA chips to train large language models, yet these startups had brought in only $3 billion in revenue in total. The huge gap shows that most companies are struggling to find a viable business model.

Some companies are now downsizing or pivoting. Inflection's CEO and most of its employees have left to join Microsoft. Character.ai's usage and popularity are growing, yet reports put its annual revenue at less than $20 million. Meanwhile OpenAI and Anthropic, the two leaders in the field, account for the majority of sales, while most of the rest of the ecosystem is still searching for a way to make money.

-Weibo Baoyu xp

The large-model "national team" makes its move

The latest progress on domestic large models comes from the "national team" this time!

Just now, the full-stack, domestically developed "Jiutian" intelligent base large model was officially released!

Developed by China Mobile, it comprises three parts: ten-thousand-card computing power, a hundred-billion-parameter large model, and the Baihui platform.

The model itself is a hundred-billion-parameter large model independently developed by the Jiutian team, trained on a fully domestic stack from operators to framework, with capability said to reach 90% of GPT-4's level.

On top of it, 17 industry models have been built, covering government affairs, healthcare, office work, financial risk control, customer service, code, and other sectors.

- Weibo @QbitAI

AI search engine open-source project Farfalle

An open-source AI search engine similar to Perplexity AI, supporting local or cloud LLMs.

Project: github.com/rashadphz/farfalle
Demo: farfalle.dev

Tech stack
- Frontend: Next.js
- Backend: FastAPI
- Search API: SearXNG or Tavily
- Logging: Logfire
- Rate limiting: Redis
- Components: shadcn/ui

Features:
- Search with multiple search providers (Tavily, SearXNG)
- Answer questions with cloud models (OpenAI GPT-4o, OpenAI GPT-3.5-turbo, Groq Llama 3)
- Answer questions with local models (Llama 3, Mistral, Gemma, Phi-3)

-Weibo Huang Jian
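Below is a minimal sketch of the "search, then answer" pattern that a stack like this implies; it is not code from the Farfalle repository. It assumes a SearXNG instance with its JSON output format enabled at a hypothetical SEARXNG_URL and an OpenAI API key in the environment; the endpoint name and prompt are illustrative only.

```python
# Minimal sketch of a Perplexity-style "search, then answer" endpoint.
# Assumptions (not taken from the Farfalle codebase): a SearXNG instance
# reachable at SEARXNG_URL with JSON output enabled, and an OpenAI key
# available via the OPENAI_API_KEY environment variable.
import os

import httpx
from fastapi import FastAPI
from openai import OpenAI

SEARXNG_URL = os.getenv("SEARXNG_URL", "http://localhost:8080")

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def web_search(query: str, limit: int = 5) -> list[dict]:
    """Query SearXNG's JSON API and return the top results."""
    resp = httpx.get(
        f"{SEARXNG_URL}/search",
        params={"q": query, "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])[:limit]


@app.get("/answer")
def answer(q: str) -> dict:
    """Search the web, then ask the LLM to answer using the snippets."""
    results = web_search(q)
    context = "\n".join(
        f"[{i + 1}] {r.get('title', '')}: {r.get('content', '')}"
        for i, r in enumerate(results)
    )
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using the numbered sources and cite them like [1]."},
            {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {q}"},
        ],
    )
    return {"answer": completion.choices[0].message.content,
            "sources": [r.get("url") for r in results]}
```

Pointing the client at a local OpenAI-compatible server instead of the cloud API is, in spirit, what the project's "local models" feature amounts to.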

LLM inference latency optimization assistant

An overview of inference latency optimization tools for open-source large language models (LLMs): through a range of optimization techniques and server configurations, they provide efficient model serving that helps users cut latency while preserving model quality and improving user experience.
- MLC performs best on inference latency, but its model quality may need further testing.
- CTranslate2 does well on speed and ease of use, but it does not currently support distributed inference.
- vLLM is fast, and its advantage is that it supports distributed inference, making it suitable for serving very large models.
- TGI is easy to use and integrates with the Hugging Face ecosystem, but its new license may limit commercial use.
- The choice of inference optimization tool depends on specific requirements, including inference speed, ease of use, model size, license restrictions, and other factors.
- The purpose of the benchmark is a rough performance comparison between tools, not a precise measurement.
- Inference servers and model optimization techniques are usually used together to achieve the best inference performance.
'Optimizing Latency for LLMs'

-Weibo Love Coco - Love Life
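As a concrete illustration of the serving side of this comparison, here is a minimal offline-batching sketch with vLLM; the model name and sampling settings are only examples and are not taken from the benchmark above. For online, latency-sensitive serving, vLLM also ships an OpenAI-compatible API server.

```python
# Minimal vLLM sketch: offline batched generation, with continuous batching
# and PagedAttention handled by the engine. The model name is only an
# example; any HuggingFace-format causal LM that vLLM supports will do.
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV-cache paging in one sentence.",
    "Why does batching improve GPU utilization?",
]
sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

# tensor_parallel_size > 1 enables the distributed inference mentioned above.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=1)

for output in llm.generate(prompts, sampling):
    print(output.prompt, "->", output.outputs[0].text.strip())
```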

Kunlun Wanwei announced that the daily active users (DAUs) of Tiangong AI exceeded 1 million

On May 27, Kunlun Wanwei Group announced that the daily active users (DAU) of its Tiangong AI have exceeded 1 million, and that the product has been warmly received by users.

Having initially validated its large-model capabilities, Tiangong AI will continue to focus on product-market fit (PMF), keep user experience at the center, continuously optimize the product, and build AI products that fit users' real usage scenarios more closely.

According to QuestMobile, a domestic business-intelligence data provider, the Kunlun Wanwei Tiangong AI app already had nearly 10 million monthly active users as early as March 2024, ranking third among domestic AIGC apps by monthly active users, behind only Doubao and Wenxin.

- Kunlun Wanwei Group

Media observation

Built on quant trading, with ten thousand GPUs in hand and an appetite for price cuts: DeepSeek may be the terminator of China's large-model melee

The melee involving ByteDance, Alibaba, Baidu, and Zhipu was originally started by a "mysterious" "financial company". On May 6, DeepSeek, the AI company incubated by quantitative hedge fund High-Flyer, released its latest model and at the same time announced a steep cut in API prices, to roughly one percent of GPT-4 Turbo's. This soon triggered a chain reaction: ByteDance and Alibaba followed one after another, and the price war was officially on.

High-Flyer and its model DeepSeek may sound unfamiliar to most people, but among model researchers and in open-source circles DeepSeek was once one of the most talked-about models and developers. Even when Mistral and Llama dominated, DeepSeek had a group of loyal fans. Many developers feel its math and reasoning abilities in particular are very strong, clearly setting it apart from models that focus on reciting poetry and prose.

Its main business is secondary-market trading, yet it has taken on AGI; it keeps an unusually low profile, yet it sways the direction of the whole industry; it does little publicity, yet it earns spontaneous word-of-mouth praise from the community. This series of contrasts makes the company all the more mysterious.

But this "mystery" may not last long. A number of people close to Magic Square said that the next plan of Magic Square is to let it face the market independently. It will probably become the last player in China's big model Jianghu that seems to have set the pattern, and it is also destined to be a player that can stir up trouble.

- PingWest

A deep dive into the large-model price war: 15 companies and 45 models compared; who is genuinely cheap and who is just putting on a show?

The "war of a hundred models" has entered deep water, and large-model vendors have moved from competing on performance and on applications to competing on price. On the whole, though, most companies have left themselves an out when cutting prices, for example by discounting only lightweight models or offering limited-time promotions.

At present, most of the players in the price war are Internet giants with their own cloud services. For them, joining the price war may be more a matter of strategic positioning and market dominance, since they can absorb a certain degree of price volatility and cost pressure.

Large-model startups, by contrast, have relatively limited resources. In a price war they must pay more attention to the balance between survival and profitability, respond more carefully to price changes, and offer competitive prices while still ensuring some level of margin.

- Zhidongxi

CIO change at China Merchants Bank: Jiang Chaoyang, Chief Information Officer for six years, transfers to China Merchants Group

The senior management of CMB's technology line has been significantly adjusted. Titanium Media App has learned that Jiang Chaoyang, Chief Information Officer of China Merchants Bank, has been transferred to serve as General Manager of the Strategic Development Department / Science and Technology Innovation Department of China Merchants Group, and that Zhou Tianhong, General Manager of CMB's Information Technology Department, is set to take over as Chief Information Officer.

According to public records, Jiang Chaoyang, born in December 1967, holds a master's degree in management science from Shanghai Jiao Tong University and is a senior economist. He joined the bank in November 2013 and has successively served as General Manager of the head office's Strategic Customer Department, General Manager of the head office's Retail Internet Banking Department, Deputy General Manager and then General Manager of the head office's Wealth Management Department, and, since November 2019, Chief Information Officer.

-Titanium Media

The A and B sides of efficient MoE training: a deal with the devil, trading "GPU memory" for "performance"

In fact, even players who choose the MoE architecture admit that although it offers efficient training, flexibility, and interpretability, its limitations remain obvious; it is "an art of compromise".

Song Chenyang said that MoE's advantage is that, when computing power is limited, it can significantly reduce the amount of computation at the same parameter scale. However, given enough computing power to train both a MoE model and a dense model with the same number of parameters, the dense model will perform better.

In other words, at the same parameter scale the dense model outperforms the MoE model. In essence, a dense model can draw on the full expressive capacity of its FFN, whereas a MoE model, in order to activate sparsely, allows only a portion of the partitioned FFN to be activated for each token, which effectively shrinks the model's representational space.

- Leiphone
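To make the "only part of the FFN is activated" point concrete, here is a toy top-k MoE feed-forward layer in NumPy. It is illustrative only: the shapes are arbitrary, the router is a bare linear layer with a softmax over the selected experts, and load balancing, shared experts, and efficient batched dispatch are all omitted.

```python
# Toy top-k MoE feed-forward layer in NumPy, to illustrate sparse activation:
# the layer owns E expert FFNs' worth of parameters, but only k of them
# actually run for each token.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, E, k = 64, 256, 8, 2   # hidden size, expert FFN size, experts, top-k

W_gate = rng.normal(size=(d_model, E)) * 0.02
W_in   = rng.normal(size=(E, d_model, d_ff)) * 0.02
W_out  = rng.normal(size=(E, d_ff, d_model)) * 0.02


def moe_ffn(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_model), routing each token to k experts."""
    logits = x @ W_gate                                   # (tokens, E) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]            # indices of the k best experts
    weights = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(weights - weights.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the chosen k

    y = np.zeros_like(x)
    for t in range(x.shape[0]):                           # token loop for clarity, not speed
        for j, e in enumerate(topk[t]):
            h = np.maximum(x[t] @ W_in[e], 0.0)           # ReLU FFN of expert e only
            y[t] += weights[t, j] * (h @ W_out[e])
    return y


tokens = rng.normal(size=(4, d_model))
print(moe_ffn(tokens).shape)                              # (4, 64)
print(f"active FFN fraction per token: {k}/{E} = {k / E:.2f}")
```

Per token, only k of the E expert FFNs do any work (here 2 of 8), which is exactly the trade described above: all E experts must be held in GPU memory, but each token pays the compute cost of only k of them.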

Global AI financing totaled RMB 156.4 billion in the first quarter of this year, down 31.2% year on year

On May 27, it was reported that market research firm PitchBook had recently released its latest report on global artificial intelligence (AI) and machine learning investment and financing for Q1 2024.

The report shows that in the first quarter of 2024 (January to March), the global AI field completed 1,779 financing deals, raising a total of $21.6 billion (about RMB 156.427 billion) in venture capital; deal value fell 7.8% quarter on quarter and 31.2% year on year. Of this quarter's financing, $5.3 billion (about RMB 38.383 billion) came from large deals by the foundation-model companies Anthropic, Mistral AI, and xAI. In the second quarter of 2024, xAI and Mistral AI are expected to continue the trend of large financing deals. The report adds that the median valuation across 78 completed AI financing deals reached $55 million.

-Titanium Media

New work from Chen Danqi's team: a fine-tuned 8B model surpasses Claude 3 Opus, backed by a new alternative to RLHF

SimPO, from Chen Danqi's team, like the DPO proposed at Stanford, reworks the reward function used in RLHF.

In traditional RLHF, the reward is usually provided by a separate reward model, which requires extra training and inference. DPO uses the relationship between human preferences and model outputs to construct the reward directly from the language model's log probabilities, bypassing reward-model training.

Compared with DPO, SimPO's reward is built solely on the model being optimized, π_θ, completely removing the dependence on a reference model π_ref.

Specifically, SimPO uses the length-normalized log probability as the reward.

- QbitAI
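For reference, the two implicit rewards can be written out explicitly; the following tracks the SimPO paper's formulation as I understand it, where β is a scaling hyperparameter, |y| is the response length in tokens, and γ is the target reward margin SimPO adds between the preferred response y_w and the rejected response y_l.

```latex
% DPO's implicit reward relies on a frozen reference model \pi_{\text{ref}}:
r_{\text{DPO}}(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}

% SimPO drops the reference model and length-normalizes the policy log probability:
r_{\text{SimPO}}(x, y) = \frac{\beta}{|y|} \log \pi_\theta(y \mid x)
                       = \frac{\beta}{|y|} \sum_{i=1}^{|y|} \log \pi_\theta\!\left(y_i \mid x, y_{<i}\right)

% Both rewards are plugged into a Bradley-Terry-style preference loss;
% SimPO additionally enforces a target reward margin \gamma:
\mathcal{L}_{\text{SimPO}} = -\,\mathbb{E}\left[\log \sigma\!\big(r_{\text{SimPO}}(x, y_w) - r_{\text{SimPO}}(x, y_l) - \gamma\big)\right]
```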


Recommended today

Open source project

linuxmint/timeshift

https://github.com/linuxmint/timeshift

Timeshift is an application that provides functionality similar to the System Restore feature in Windows and the Time Machine tool in macOS. It protects your system by periodically taking incremental snapshots of the file system; these snapshots can be restored later to undo all changes made to the system since.
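As a small usage sketch, assuming the timeshift CLI is installed and the script runs with root privileges: the flags used below (--create, --comments, --tags, --list) are the ones documented in the project's README; everything else is illustrative.

```python
# Minimal sketch of driving the Timeshift CLI from Python.
# Assumes timeshift is installed and this runs with root privileges;
# the flags used are the ones documented in the project README.
import subprocess


def timeshift(*args: str) -> str:
    """Run a timeshift subcommand and return its stdout."""
    result = subprocess.run(
        ["timeshift", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


# Take an on-demand incremental snapshot before a risky change ("O" = on-demand tag).
print(timeshift("--create", "--comments", "before system upgrade", "--tags", "O"))

# List existing snapshots; a later rollback would use `timeshift --restore`.
print(timeshift("--list"))
```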

A daily blog

My CEO thinks all engineering managers are redundant

I recently joined a startup as vice president of technology, managing a team of about 40 engineers. However, I have a major disagreement with the CEO (a former engineer) about whether to hire full-time engineering managers. Currently the engineers are split into small teams of 3-4 people, each with a team lead responsible for the team, but the leads' primary responsibility is still writing code and shipping product.



Event comments

Friction between GNOME and its sponsor STF may affect the project's development

Last month, Germany's Sovereign Tech Fund (STF) announced continued investment in GNOME, and thanks to STF's support the GNOME project has made remarkable progress in recent months.

However, last week's status update points out that there are unspecified problems between STF and the GNOME Foundation which may affect the project's subsequent development.

comment

As an open-source project, GNOME is highly dependent on external funding. STF's financial support is crucial to the project's sustainable development, and any funding problem could have a significant impact on its progress. A healthy relationship between sponsor and open-source project is critical to the project's stability and predictability; problems between STF and the GNOME Foundation may cause concern among community members and users.

This uncertainty may affect the morale of community members and the long-term planning of the project. Members of the GNOME community may feel uneasy about funding, which may affect their motivation and commitment to the project. When there are problems between the sponsors and the Open Source Foundation, maintaining transparency and open communication is the key to solving the problems. This helps maintain community trust and confidence in the project.

Events like this highlight how important diversified funding sources are for open-source projects, reducing dependence on any single sponsor and improving resilience. It is also a warning to other open-source projects of the need to build solid financial and cooperative relationships and to develop strategies for dealing with potential risks.

If the problem involves policies or regulations, relevant institutions may be required to intervene to ensure that the legitimate rights and interests of open source projects are protected. This event may also prompt the open source community and sponsors to think about the future cooperation model and how to establish a more stable and sustainable cooperative relationship.

RustRover officially released, free for personal non-commercial use

For RustRover, JetBrains provides a new licensing model, including a paid commercial license and a free personal non-commercial license, which is suitable for individuals to use RustRover for non-commercial purposes.

comment

The release of RustRover is a positive signal for the developers of Rust language. It shows JetBrains' support for the Rust ecosystem and its commitment to providing professional development tools.

By offering both a paid commercial license and a free personal non-commercial license, JetBrains has shown flexibility, which helps attract a broader user base. This trust-based "honor system" licensing model is a novel attempt in the industry. The free personal non-commercial license lowers the barrier for developers to try and adopt RustRover, helps promote the Rust language, and may attract new developers to the Rust community.

In addition, the release of RustRover may further promote the popularization and industrial application of Rust language. JetBrains also shows support for the open source community by providing free and discounted licenses, which may strengthen the company's ties with the open source community.

Research: ChatGPT has an error rate of 52% when answering programming questions

52% of ChatGPT's answers contain incorrect information, 77% are overly verbose, and 78% are inconsistent with human answers to varying degrees. In-depth manual analysis also found many conceptual and logical errors in ChatGPT's answers.

comment

This research highlights that even advanced AI technologies such as ChatGPT may have limitations in dealing with problems in professional fields. The 52% error rate indicates that AI still has much room for improvement in providing programming solutions. Given the high error rate of AI in programming, this emphasizes the importance of professional education and training. Developers and programming learners should improve their skills through reliable resources and practices.

35% of the study participants preferred ChatGPT's answers, probably because AI-generated answers are usually clear in language and style. But this also means users can be misled by incorrect information and need to treat AI-provided answers with caution. 39% of users failed to notice any errors in ChatGPT's answers, which suggests that users' critical thinking about AI-generated content and their ability to spot errors need to improve.

The research results call for more rigorous audit and quality control of AI generated content. This may require additional tools and techniques to identify and correct incorrect information provided by AI. Companies and researchers developing AI solutions need to take responsibility to ensure that their products are as accurate and reliable as possible when they are introduced to the market.

This research also shows that the application of AI in the field of programming may be more suitable as an auxiliary tool, combined with the knowledge and experience of human experts, rather than as an independent solution provider. In the long run, if the answers provided by AI are frequently wrong, it may damage users' trust in AI technology and affect its application in a wider range of fields.

For open source communities like Stack Overflow, this study reminds them to be more careful when using AI auxiliary tools to ensure that the information provided by the community is accurate and useful.


Voice of Open Source

Media opinion

Altman swims against the tide in May as OpenAI is mired in scandal

OpenAI and its CEO Sam Altman could really use a visit to a temple to change their luck.

......

Google and Microsoft have rolled out their AI products and strategies one after another, and GPT-4o has gradually faded from the industry's attention, overshadowed by the splashy AI announcements from several tech giants' conferences.

OpenAI, however, has not been idle: the media keeps digging up scandals, putting the AI startup, valued at roughly $80 billion, at the center of the storm.

- BiaNews

AI PCs get Ma Liang's "magic brush", and NPCs become "cyber chatterboxes"

Predictably, AI PCs will see wider and wider use across industries, and their penetration of entertainment, office, social and other scenarios will gradually deepen. Unlike many previously launched technologies, however, the addition of generative AI is often imperceptible: high-quality images drawn by Stable Diffusion can pass for the real thing, NPCs built on ACE technology feel somewhat like real people, and AI can also enable capabilities such as voice cloning.

In entertainment scenarios, users who want high frame rates and more realistic visuals can enhance the experience by upgrading the GPU and turning on AI features; in productivity applications, the GPU looks set to become an "invisible force multiplier".

- Titanium Media

AI unicorns seek buyers en masse as a new round of reshuffling begins

The reshuffling of large-model companies has begun.

Suddenly, several star startups are reported, one after another, to be seeking acquisition. They are all familiar names, and their past track records are solid.

- AI Industry Chain Alliance

How is new economic momentum possible? Open source, openness, more patience, more rule of law

From the perspective of market competition, the open-source model has helped latecomers catch up and has limited the monopolies that economies of scale in the software industry might otherwise produce. At the same time, returns to scale in the era of general-purpose large models are even more pronounced than in the software industry, and emergence in large models confers capabilities that latecomers do not have. Jiang Xiaojuan believes it is hard to judge the future landscape from the current progress of closed source and open source, but whatever the model, she hopes to preserve the competitive market structure that has driven innovation for many years and supports enterprises in going further with open source.

- The Paper

User perspective

Software that ran for 28 years is about to shut down: at its peak it had over 100 million users, and even Tencent had to call it "daddy"

  • Viewpoint 1: ICQ, OICQ, QQ. It never really stopped; it just lives on in another form
  • Viewpoint 2: If I remember correctly, wasn't QQ made by "referencing" ICQ?
    • Viewpoint 3: You remember correctly. "Referencing."
  • Viewpoint 4: QQ used to be called OICQ
  • Viewpoint 5: "When I opened OICQ, the chat log stopped in the late autumn of last year, and the final plea to stay was never spoken."
  • Viewpoint 6: Back then I could chat in English, with friends from all over the world
  • Viewpoint 7: My family bought a computer in 1996 and got dial-up early on. I don't remember using ICQ much; I think it was OICQ from the start
    • Viewpoint 8: OICQ didn't exist yet in 1996
  • Viewpoint 9: Why doesn't someone acquire it?
  • Viewpoint 10: I thought it had died long ago
  • Viewpoint 11: I don't know what ICQ and MSN were thinking. They had no option to minimize to a small tray icon and insisted on occupying a long bar in the taskbar
  • Viewpoint 12: Now I understand why people recommend decentralized chat software: centralized chat software just isn't reliable
  • Viewpoint 13: Open source is better than closed source. Some companies will panic

This is the episode Linus would find most unbearable: Unreal Engine's coding standard bans profanity, "slave", and "master"

  • Viewpoint 1: A colleague at our company likes to name functions "fuck"
  • Viewpoint 2: Suggest changing it to master/servant
  • Viewpoint 3: Master/slave should not be used. It is too easy to misunderstand [Blush] The master/nigger is always used. The worker pool generally uses the cottonfield and passes the cotton
  • Viewpoint 4: I want to build a web framework called FuckAPI, and a Kubernetes tool called Fuck8s. What do you think?
  • Viewpoint 5: How about controller and worker, onlist and offlist? All this nonsense is so tedious!
  • Viewpoint 6: Maybe English should delete these words too; erasing them from the culture is the only real, thorough solution
  • Viewpoint 7: I once saw a discussion arguing over how "parent task" should be translated into Chinese. Personally, it smacked of political correctness to me
  • Viewpoint 8: There really is a piece of software called "thefuck" that automatically corrects the previous command after you mistype it in a Linux terminal
  • Viewpoint 9: Hobby projects also love the name "fuck"; whatever follows "fuck" is the problem it solves
  • Viewpoint 10: I've already switched to Godot; Unreal is too bloated. I remember the game site itch.io has a size limit on uploads. Someone complained their game was too big to upload, and the admin replied that the early Final Fantasy games were tiny yet packed with content, so how much actual content does your giant game have? The well-known Unreal-made game "Shrinking Adventures after School" has only 15 minutes of content but is several gigabytes in size.

RustRover officially released, free for personal non-commercial use

  • Viewpoint 1: Please extend this licensing model to the entire IDE family
    • Viewpoint 2: Does that mean all JetBrains products would be free for individual, non-commercial users?
  • Viewpoint 3: Connecting remotely still requires downloading an extra 1 GB backend

China Telecom Releases the First Large Speech Model Supporting 30 Dialects

  • Viewpoint 1: Can I take it that the telecom's biggest contribution here is speech recognition?
  • Viewpoint 2: The training data has to be the audio of users' phone calls. Was it used with their consent?
    • Viewpoint 3: Customer-service calls can be recorded, can't they? When you dial in, it says clearly that the call will be recorded.
    • Viewpoint 4: China Telecom has no open voice dataset, and it hasn't recruited or outsourced recording jobs for different dialects. So where did the data come from? Do they listen to users' call recordings, label them internally, and then train the model?
    • Viewpoint 5: Most likely it's customer-service call recordings, which are practical and less likely to cause disputes. Why would they monitor calls for training instead of using the customer-service recordings accumulated over the years?

Can Microsoft's Windows on ARM succeed? And it says it will take on Apple with a translation layer?

  • Viewpoint 1: Not optimistic [facepalm], because Microsoft never disappoints
  • Viewpoint 2: Focus on WSL and strive to be the best Linux distribution, or the best GUI desktop system; that has more promise than staying compatible with legacy interfaces.
    • Viewpoint 3: Feels like WSL will be shelved before long
    • Viewpoint 4: WSL won't die; the Azure team is behind it. First, it isn't short of money; second, it's used internally. It's WSA that's dying. Only Microsoft could run a whole department for killing products
  • Viewpoint 5: GPUs are clearly better, so why use an NPU?
    • Viewpoint 6: GPUs run neural-network workloads with high power consumption and high cost, while NPUs have the advantage of low power consumption. My guess is it's for developing on-device AI for mobile and embedded devices in the future.
  • Viewpoint 7: Good choice. Apart from pure x86 native code, .NET, Java, and other applications are not a problem. Pure x86 programs are basically old software, which is no problem at all for modern CPUs and virtual machines. So this is a good choice; Intel has squandered too much time
  • Viewpoint 8: Battery life has always been a problem for Windows laptops. If the ARM architecture brings power consumption down and they can match a MacBook's battery life, that would be great
  • Viewpoint 9: Three years from now: "For such-and-such reasons, we have to drop support for ARM." As for...... in their eyes existing users rank below dogs, so don't hold your breath
  • Viewpoint 10: It mainly depends on whether packed, anti-VM, anti-debugging programs can be translated. If they can, then the advantage is real
  • Viewpoint 11: I think it's time to brush up on the ARM architecture
  • Viewpoint 12: Not optimistic; ARM-native applications need a huge investment that only big companies are likely to make
    • Viewpoint 13: Depends on the kind of app. Ordinary application-level apps don't need much investment; just recompile them

---END---

Finally, welcome to scan the QR code to download the "Open Source China" app, read a wealth of technical reports, and share them with programmers and geeks!
