News Warner

OpenAI gets caught vibe graphing

  • OpenAI’s GPT-5 livestream showcased impressive charts, but upon closer inspection, some graphs were found to be inaccurate.
  • The most egregious error was in a chart comparing deception rates between GPT-5 and OpenAI’s o3 model, where the scale was inconsistent and led to a misleading comparison.
  • CEO Sam Altman called out the mistake, labeling it a “mega chart screwup,” but noted that a corrected version of the chart is available in OpenAI’s blog post.
  • An OpenAI marketing staffer apologized for the error, stating that they had fixed the chart in the blog post and acknowledged it as an “unintentional chart crime.”
  • The incident has raised questions about whether OpenAI used GPT-5 to create the charts, which would be an awkward irony given the company’s claims of significant advances in reducing hallucinations with its new model.

Something’s off with that chart on the left.

During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model seem quite impressive — but if you look closely, some graphs were a little bit off.

In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, the chart shown onstage labels GPT-5 with thinking at a 50.0 percent deception rate, yet its bar is drawn smaller than the one for o3’s 47.4 percent score. OpenAI appears to have accurate numbers in its GPT-5 blog post, however, where GPT-5’s deception rate is listed as 16.5 percent.

The chart onstage contained other inconsistencies, too: one of GPT-5’s scores is lower than o3’s but is drawn with a bigger bar, and o3’s and GPT-4o’s scores differ but are drawn with equally sized bars. It was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup,” though he noted that a correct version is in OpenAI’s blog post.

An OpenAI marketing staffer also apologized, saying, “We fixed the chart in the blog guys, apologies for the unintentional chart crime.”

OpenAI didn’t immediately respond to a request for comment. And while it’s unclear whether OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day — especially when it’s touting its new model’s “significant advances in reducing hallucinations.”

Q. Who pointed out that OpenAI’s chart was incorrect?
A. Shrey Kothari (@shreyk0)

Q. What did CEO Sam Altman call the mistake in the chart?
A. A “mega chart screwup”

Q. Where can you find a correct version of the chart?
A. In OpenAI’s blog post

Q. Who apologized for the mistake on Twitter?
A. An OpenAI marketing staffer

Q. Why is it concerning that OpenAI didn’t immediately respond to a request for comment?
A. Without an official explanation, it remains unclear how the error happened — including whether GPT-5 itself was used to create the charts.

Q. What was one of the scores that was incorrectly represented in the chart?
A. GPT-5’s deception rate

Q. How did OpenAI’s smaller model (o3) compare to GPT-5 in terms of deception rates?
A. The chart onstage labeled o3 at 47.4% and GPT-5 with thinking at 50.0%, yet gave GPT-5 the smaller bar; OpenAI’s blog post lists GPT-5’s actual deception rate as 16.5%.

Q. What was the issue with how o3 and GPT-4o’s scores were represented in the chart?
A. They were shown with equally-sized bars despite being different.


Q. Why is this incident concerning for OpenAI, especially on its launch day?
A. It suggests that they may have made mistakes or misrepresentations with their new model, which could impact trust and credibility.

Q. What was one of the scores that GPT-5 showed as being lower than o3’s but was shown with a bigger bar?
A. One of GPT-5’s scores in the “deception evals across models” chart shown onstage.