Research: ChatGPT has an error rate of 52% when answering programming questions

Source: OSCHINA
2024-05-26 10:09:16

Researchers from Purdue University recently conducted a comprehensive study of the characteristics of ChatGPT's answers to programming questions. Through in-depth analysis of ChatGPT's answers to 517 programming questions from Stack Overflow, they examined the correctness, consistency, comprehensiveness, and conciseness of those answers. They also conducted a large-scale linguistic analysis and a user study to understand the characteristics of ChatGPT answers from linguistic and human perspectives.

The results show that 52% of ChatGPT answers contain incorrect information, 77% are verbose, and 78% are inconsistent with human answers to varying degrees. The in-depth manual analysis also found a large number of conceptual and logical errors in ChatGPT answers.

However, 35% of the study participants still preferred ChatGPT answers because of their comprehensive content and clear language style, and 39% of the respondents did not notice the incorrect information in ChatGPT's answers. "This means that it is necessary to counter the misinformation in ChatGPT's answers to programming questions, and to raise people's awareness of the risks posed by seemingly correct answers."

A linguistic analysis of 2,000 randomly selected ChatGPT answers found that they are "more formal and analytical" and also express "less negative emotions", the kind of flat, pleasant tone that AI typically tends to produce.

The researchers pointed out that it is precisely the polite language, the forceful, textbook-style answers, and the comprehensiveness that make ChatGPT answers seem more convincing, leading users to lower their guard and overlook some of the incorrect information in them.

The researchers conclude: "Although our user study shows that users have a higher preference and give higher quality scores for human answers, users occasionally make mistakes, preferring incorrect ChatGPT answers because of ChatGPT's language style and its seemingly correct logic presented with positive assertions. Because ChatGPT produces a large number of wrong answers, our results emphasize that we must be cautious and vigilant when using ChatGPT answers in programming tasks. This work also aims to encourage further research on how to identify and reduce different types of conceptual and factual errors. Finally, we hope that this work can promote more research on transparency and communication of incorrect answers generated by machines, especially in programming."

For details, see the full report.

Related reading: Stack Overflow and OpenAI announce partnership
