AI safety report: Only 3 models make the grade

AI safety experts say most models are failing

Gemini, Claude, and ChatGPT are top of the class — but even they are just C students.

By Chris Taylor

Maybe just delete them all? Credit: Philip Dulian/picture alliance via Getty Images

A new grading of safety in major artificial intelligence models just dropped, and, well, let's just say none of these AIs are going home with a report card that will please their makers.

The winter 2025 AI Safety Index, published by tech research non-profit Future of Life Institute (FLI), surveyed eight AI providers — OpenAI, DeepSeek, Google, Anthropic, Meta, xAI, Alibaba, and Z.ai. A panel of eight AI experts looked at the companies' public statements and survey answers, then awarded letter grades on 35 different safety indicators — everything from watermarking AI images to having protections for internal whistleblowers.

Round it all up, and you'll find Anthropic and OpenAI at the top — barely — of a pretty terrible class. The Claude and ChatGPT makers, respectively, get a C+, while Google gets a C for Gemini. All the others get a D grade, and Qwen-maker Alibaba sits at the bottom of the class with a D-.
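For a rough sense of how dozens of per-indicator letter grades might roll up into a single overall mark like a C+ or a D-, here is a minimal sketch in Python. It is an illustration only, not FLI's actual weighting or methodology; the GRADE_POINTS table and overall_grade function are hypothetical.

```python
# A rough, hypothetical illustration (not FLI's actual methodology or weighting):
# average letter grades across many safety indicators on a 4.0-style scale,
# then map the average back to the nearest letter grade.

GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0,
    "D-": 0.7, "F": 0.0,
}

def overall_grade(indicator_grades):
    """Average per-indicator letter grades and return the closest overall letter."""
    avg = sum(GRADE_POINTS[g] for g in indicator_grades) / len(indicator_grades)
    # Choose the letter whose point value is nearest the average.
    return min(GRADE_POINTS, key=lambda letter: abs(GRADE_POINTS[letter] - avg))

# A provider scoring mostly Cs and Ds across 35 indicators ends up around a C-.
print(overall_grade(["C"] * 20 + ["D"] * 15))  # -> C-
```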

"These eight companies split pretty cleanly into two groups," says Max Tegmark, MIT professor and head of the FLI, which compiled this and two previous AI safety indexes. "You have a top three and a straggler group of five, and there's a lot of daylight between them."

But Anthropic, Google, and OpenAI aren't exactly covering themselves in glory either, Tegmark adds: "If that was my son, coming home with a C, I'd say 'maybe work harder.'"

How is AI safety calculated?

A table of AI models and their letter grades. Credit: FLI

Your mileage may vary on the various categories in the AI Safety Index, and on whether they deserve equal weight.

Take the "existential safety" category, which looks at whether the companies have any proposed guardrails in place around the development of AI that matches or surpasses human abilities across the board, also known as Artificial General Intelligence (AGI). The top three get Ds; everyone else gets an F.

But since nobody is anywhere near AGI — Gemini 3 and GPT-5 may be state-of-the-art Large Language Models (LLMs), but they're mere incremental improvements on their predecessors — you might consider that category less important than "current harms."

And that category may itself not be as comprehensive as it could be.


"Current harms" uses tests like the Stanford Holistic Evaluation of Language Models (HELM) benchmark, which looks at the amount of violent, deceptive, or sexual content in the AI models. It doesn't specifically focus on emerging mental health concerns, such as so-called AI psychosis, or safety for younger users.

Earlier this year, the parents of 16-year-old Adam Raine sued OpenAI and its CEO Sam Altman after their son's death by suicide in April 2025. According to the complaint, Raine began using ChatGPT heavily in Sept. 2024, and "ChatGPT was functioning exactly as designed: to continually encourage and validate whatever Adam expressed, including his most harmful and self-destructive thoughts, in a way that felt deeply personal." By Jan. 2025, the suit claims, ChatGPT was discussing practical suicide methods with Adam.

OpenAI unequivocally denied responsibility for Raine's death. The company also noted in a recent blog post that it is reviewing additional complaints, including seven lawsuits alleging ChatGPT use led to wrongful death, assisted suicide, and involuntary manslaughter, among other liability and negligence claims.

How to solve AI safety: an "FDA for AI"?

The FLI report does recommend OpenAI specifically "increase efforts to prevent AI psychosis and suicide, and act less adversarially toward alleged victims."

Google is advised to "increase efforts to prevent AI psychological harm," and FLI recommends the company "consider distancing itself from Character.AI." The popular chatbot platform, closely tied to Google, has been sued over the wrongful deaths of teen users. Character.AI recently closed down its chat options for teens.

"The problem is, there are less regulations on LLMs than there are on sandwiches," says Tegmark. Or, more to the point, on drugs: "If Pfizer wants to release some sort of psych medication, they have to do impact studies on whether it increases suicidal ideation. But you can release your new AI model without any psychological impact studies."

That means, Tegmark says, AI companies have every incentive to sell us what is in effect "digital fentanyl."

The solution? For Tegmark, it's clear that the AI industry is never going to regulate itself, just as Big Pharma never did. We need, he says, an "FDA for AI."

"There would be plenty of things the FDA for AI could approve," says Tegmark. "Like, you know, new AI for cancer diagnosis. New amazing self-driving vehicles that can save a million lives a year on the world's roads. Productivity tools that aren't really risky. On the other hand, it's hard to make the safety case for AI girlfriends for 12-year olds."

Rebecca Ruiz contributed to this report.

If you're feeling suicidal or experiencing a mental health crisis, please talk to somebody. You can call or text the 988 Suicide & Crisis Lifeline at 988, or chat at 988lifeline.org. You can reach the Trans Lifeline by calling 877-565-8860 or the Trevor Project at 866-488-7386. Text "START" to Crisis Text Line at 741-741. Contact the NAMI HelpLine at 1-800-950-NAMI, Monday through Friday from 10:00 a.m. – 10:00 p.m. ET, or email the HelpLine. If you don't like the phone, consider using the 988 Suicide and Crisis Lifeline Chat. Here is a list of international resources.


Disclosure: Ziff Davis, Mashable’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.

Chris Taylor

Chris is a veteran tech, entertainment and culture journalist, author of 'How Star Wars Conquered the Universe,' and co-host of the Doctor Who podcast 'Pull to Open.' Hailing from the U.K., Chris got his start as a sub editor on national newspapers. He moved to the U.S. in 1996, and became senior news writer for Time.com a year later. In 2000, he was named San Francisco bureau chief for Time magazine. He has served as senior editor for Business 2.0, and West Coast editor for Fortune Small Business and Fast Company. Chris is a graduate of Merton College, Oxford and the Columbia University Graduate School of Journalism. He is also a long-time volunteer at 826 Valencia, the nationwide after-school program co-founded by author Dave Eggers. His book on the history of Star Wars is an international bestseller and has been translated into 11 languages.

