
Google finds AI chatbots are only 69% accurate… at best

AI chatbots still get one in three answers wrong

Solen Feyissa / Unsplash

Google has published a blunt assessment of how reliable today’s AI chatbots really are, and the numbers are not flattering. Using its newly introduced FACTS Benchmark Suite, the company found that even the best AI models struggle to break past a 70% factual accuracy rate. The top performer, Gemini 3 Pro, reached 69% overall accuracy, while other leading systems from OpenAI, Anthropic, and xAI scored even lower. The takeaway is simple and uncomfortable. These chatbots still get roughly one out of every three answers wrong, even when they sound confident doing it.

The benchmark matters because most existing AI tests focus on whether a model can complete a task, not whether the information it produces is actually true. For industries like finance, healthcare, and law, that gap can be costly. A fluent response that sounds confident but contains errors can do real damage, especially when users assume the chatbot knows what it is talking about.

What Google’s accuracy test reveals

The FACTS Benchmark Suite was built by Google’s FACTS team with Kaggle to directly test factual accuracy across four real-world use cases. One test measures parametric knowledge, which checks whether a model can answer fact-based questions using only what it learned during training. Another evaluates search performance, testing how well models use web tools to retrieve accurate information. A third focuses on grounding, meaning whether the model sticks to a provided document without adding false details. The fourth examines multimodal understanding, such as reading charts, diagrams, and images correctly.

The results show sharp differences between models. Gemini 3 Pro led the leaderboard with a 69% FACTS score, followed by Gemini 2.5 Pro and OpenAI’s ChatGPT-5, both near 62%. Claude 4.5 Opus landed at roughly 51%, while Grok 4 scored roughly 54%. Multimodal tasks were the weakest area across the board, with accuracy often below 50%. That matters because these tasks involve reading charts, diagrams, or images, where a chatbot could confidently misread a sales graph or pull the wrong number from a document, producing mistakes that are easy to miss but hard to undo.


The takeaway isn’t that chatbots are useless, but that blind trust is risky. Google’s own data suggests AI is improving, yet it still needs verification, guardrails, and human oversight before it can be treated as a reliable source of truth.

Manisha Priyadarshini