OpenAI's newest models generate false information, users frustrated

CALIFORNIA (Kashmir English): OpenAI's newest AI models, o3 and o4-mini, have been causing a buzz with their advanced capabilities, but they're also grappling with one major problem.

Even though they were built to provide improved reasoning, these models appear to create false information at a higher rate than previous versions.

This problem is especially concerning, as hallucinations can lead to costly mistakes in high-stakes applications.

The Hallucination Problem

Hallucinations have long been a bane of Large Language Models (LLMs), and even the best-performing systems cannot eliminate them entirely.

Traditionally, new models have incrementally gotten better at mitigating hallucinations, but o3 and o4-mini defy this pattern.

In internal tests conducted by OpenAI, both models hallucinate more frequently than previous models such as o1, o1-mini, and o3-mini. The o4-mini model performs especially poorly, hallucinating in almost half of its answers in some tests.

Testing and Results

OpenAI’s in-house benchmark, PersonQA, which tests a model’s knowledge of individuals, showed that o3 hallucinated on 33% of questions.

This is a sharp rise from the 16% and 14.8% hallucination rates of the earlier o1 and o3-mini models, respectively. Independent testing by AI research lab Transluce reported similar problems, with o3 fabricating actions it claimed to have taken while reaching its conclusions.

Possible Solutions

Although hallucinations can occasionally yield creative results, they are a serious problem in fields where accuracy is essential, such as law firms and other businesses that depend on reliable information.

One way to improve accuracy is to give models web search capabilities. OpenAI's GPT-4o with web search reaches 90% accuracy on the SimpleQA benchmark, and adding web search could similarly lower hallucination rates for the new models.

However, this would require users to consent to their prompts being shared with a third-party search provider.

Expert Insights

Some experts see promise in o3's potential, especially for coding workflows, even though it hallucinates broken links and makes other mistakes.

Transluce's Neil Chowdhury theorizes that the reinforcement learning used to train these models may amplify their tendency to hallucinate.

As AI models continue to develop, addressing hallucinations will be vital to unlocking their potential and enabling their safe use in real-world applications.
