Study: 52% of ChatGPT's answers to programming questions contain errors

Source: OSCHINA

2024-05-26 10:09:16

Researchers from Purdue University recently conducted a comprehensive study of how ChatGPT answers programming questions. Through in-depth analysis of ChatGPT's answers to 517 programming questions from Stack Overflow, they examined the correctness, consistency, comprehensiveness, and conciseness of those answers. They also carried out a large-scale linguistic analysis and a user study to understand the characteristics of ChatGPT's answers from linguistic and human perspectives.

The results show that 52% of ChatGPT's answers contain incorrect information, 77% are verbose, and 78% are inconsistent with human answers to varying degrees. In-depth manual analysis also revealed a large number of conceptual and logical errors in ChatGPT's answers.

Even so, 35% of the study participants still preferred ChatGPT's answers because of their comprehensive content and clear language style, and 39% of respondents failed to spot the incorrect information in ChatGPT's answers. "This implies the need to counter misinformation in ChatGPT's answers to programming questions, and to raise awareness of the risks posed by seemingly correct answers."

A linguistic analysis of 2,000 randomly selected ChatGPT answers found them to be "more formal and analytical" while also showing "less negative emotion", the kind of bland, upbeat tone that AI typically produces.

The researchers pointed out that it is precisely the polite language, the assertive, textbook-style answers, and the comprehensiveness that make ChatGPT's answers seem more convincing, leading users to lower their guard and overlook some of the incorrect information in them.

"Although our user study shows that users rate human answers higher in both preference and quality, users occasionally erred by preferring incorrect ChatGPT answers, swayed by ChatGPT's language style and by seemingly correct logic presented with positive assertions. Because ChatGPT produces a large number of incorrect answers, our results emphasize the need for caution and vigilance when using ChatGPT's answers in programming tasks. This work also seeks to encourage further research on identifying and mitigating different types of conceptual and factual errors. Finally, we hope this work will foster more research on transparency and communication of incorrectness in machine-generated answers, especially in programming."

For details, see the full report.

Related reading: Stack Overflow and OpenAI announce partnership
