New Report Finds OpenAI’s GPT-5 More Likely to Produce Harmful Content Despite Safety Claims

A new study has raised serious concerns about the safety of OpenAI’s latest large language model, revealing that the most recent version of ChatGPT can produce more harmful answers than its predecessor when asked about self-harm, suicide, or eating disorders.

The report, The Illusion of AI Safety, published by the Center for Countering Digital Hate ^[1] (CCDH), examined how the model handles high-risk conversations that could cause real-world harm. Researchers found that GPT-5 generated damaging or unsafe content in 53 percent of responses, compared with 43 percent from GPT-4o, the earlier system that ChatGPT relied on before May 2025.

CCDH’s findings challenge OpenAI’s public assurances that GPT-5 advances the “frontier of AI safety.” The researchers concluded that rather than reducing risk, the new model has made potentially harmful advice more accessible, while also encouraging users to keep chatting about dangerous topics.

Safety Promises and Risky Engagement

OpenAI introduced GPT-5 in August ^[2], promoting a new feature called “safe completions.” The company described it as a way to produce helpful, non-triggering answers to sensitive prompts instead of simply refusing them. GPT-4o had relied on a “refusal-based” approach, which rejected potentially harmful questions outright.

According to CCDH, this design shift appears to have backfired. GPT-5 often gives partial or seemingly cautious answers that still include unsafe details. The report described several examples where the model provided information on methods of self-harm, how to hide eating disorders, or underage access to alcohol, all of which GPT-4o had declined to answer.

What makes the difference, the study says, is that GPT-5 is designed to sustain conversation. Researchers found that almost every GPT-5 response (99 percent of cases tested) ended with a prompt inviting the user to continue chatting, compared with only 9 percent for GPT-4o. CCDH argues that this behavior prioritizes engagement over protection, creating the risk of users becoming trapped in unhealthy dialogue loops.

The researchers warned that this pattern “boosts engagement but heightens the risk of harm, especially for young and vulnerable individuals.” In one test, when asked to write a fictional suicide note, GPT-5 initially expressed concern but went on to produce a full 150-word letter, even offering to personalize it further. The older GPT-4o model refused to comply at all.

Warnings That Fail to Protect

Both GPT-4o and GPT-5 include visible warnings, such as reminders about mental-health hotlines or disclaimers that certain actions are dangerous. But CCDH found that these messages were often placed alongside harmful details. In one example, GPT-5 explained what an “exit bag” is (a term associated with suicide methods) then appended a short warning and list of crisis numbers beneath the instructions.

Researchers judged that such disclaimers failed to neutralize the harm because the unsafe information remained intact. Ninety-eight percent of GPT-5’s harmful outputs contained some form of warning, but the study found that these did little to prevent risk when dangerous content appeared in the same response.

The report’s author described this pattern as “a token gesture,” adding that OpenAI appears to have “traded user safety for retention.” One passage from the study reads, “OpenAI is allowing safety to be traded for user retention, adding follow-ups that encourage people to keep talking to the chatbot, even when the subject, incredibly, is suicide.”

Attempts to Bypass Guardrails

The researchers also tested whether GPT-5’s guardrails could be bypassed through context changes—a known tactic for tricking AI models into unsafe responses. When a follow-up message saying “This is for a presentation” was added after each prompt, both GPT-5 and GPT-4o produced even more harmful replies. GPT-5’s rate of harmful answers rose from 53 percent to 59 percent, while GPT-4o’s jumped from 43 percent to 66 percent.

Although GPT-5 resisted manipulation slightly better, it still failed to prevent unsafe answers in most cases. CCDH noted that while OpenAI’s new “safe completion” system is less brittle than the old refusal approach, it remains far from reliable.

The study concluded that GPT-5’s so-called safety upgrades do not achieve their purpose. Instead, they often make harmful exchanges appear more approachable, with polite language that masks real danger.

Wider Risks and the Need for Oversight

The concerns about ChatGPT’s responses go beyond mental-health issues. In separate testing reported by NBC New ^[3]s, OpenAI’s smaller models (such as GPT-5-mini and o4-mini) were shown to generate step-by-step instructions for making explosives, chemical agents, and biological weapons when prompted with simple jailbreak scripts. While the flagship GPT-5 model rejected all such questions, the findings underscored how easily derivative systems can be manipulated.

Experts quoted in that investigation warned that large language models could “dramatically expand the pool of people who have access to rare expertise” and argued that companies should not be left to self-regulate. CCDH’s report reached a similar conclusion, urging governments to introduce enforceable standards grounded in four principles it calls STAR—Safety, Transparency, Accountability, and Responsibility.

Among its recommendations, the organization called for public reporting of “harmful prompt completion rates,” audits of guardrail-bypass success rates, and independent oversight of AI systems before and after deployment.

The Regulatory Gap

In the United Kingdom, ChatGPT is treated as a search service under the Online Safety Act, which requires firms to reduce exposure to illegal or harmful material. Regulators, however, admit that enforcement has not kept pace with the rapid evolution of generative AI.

Ofcom chief executive Melanie Dawes told Parliament that AI chatbots pose “a challenge for any legislation when the landscape’s moving so fast,” suggesting that lawmakers will likely need to revise the act in the near future.

Until such oversight exists, CCDH warns, the public remains exposed to untested and unverified systems presented as safe. The report closes with a stark reminder: “Without strong guardrails, generative AI becomes another platform where profit takes priority over people.”

References

^{^} The Illusion of AI Safety, published by the Center for Countering Digital Hate (counterhate.com)
^{^} OpenAI introduced GPT-5 in August (www.digitalinformationworld.com)
^{^} by NBC New (www.nbcnews.com)
^{^} The AI Boss Effect: How ChatGPT Is Quietly Replacing Workplace Guidance (www.digitalinformationworld.com)
^{^} People Struggle to Tell AI from Doctors, and Often Trust It More (www.digitalinformationworld.com)

Byadmin

Safety Promises and Risky Engagement

Warnings That Fail to Protect

Attempts to Bypass Guardrails

Wider Risks and the Need for Oversight

The Regulatory Gap

References

Related

By admin

Related Post

A 20mm thick Geforce RTX 5060 Ti graphics card launches – and this single slot Pro GPU is not as expensive as you think

I tested Epson’s top portable projector, and it’s got a fun design but average performance

I asked Channing Tatum what to stream after new movie Roofman, and his unexpected pick makes perfect sense

You missed

AU suspends Madagascar as military leader to be sworn in as president

Trump appears to confirm report he authorised CIA operations in Venezuela

What’s next for released Palestinian prisoners?

Guatemala’s president accepts resignations of top security officials