ChatGPT / OpenAI

(Image by Ryan DeBerardinis on Shutterstock)

In A Nutshell

  • ChatGPT surprised researchers by solving Plato’s ancient square puzzle with algebra instead of geometry.
  • The AI resisted wrong suggestions, explaining why certain shortcuts wouldn’t work.
  • Depending on the prompts, it showed both recall and problem-solving behaviors, resembling how students learn.
  • Researchers described a “Chat’s Zone of Proximal Development,” where guidance helps the AI go further.
  • The study is exploratory, based on one conversation, but raises new questions about AI in education.

CAMBRIDGE, England — When two researchers at the University of Cambridge challenged ChatGPT with a classic puzzle from ancient Greece, they found that the model sometimes behaved less like a search engine and more like a learner. It tested approaches, reconsidered when prompted, and even resisted wrong suggestions.

The study suggests that artificial intelligence may do more than retrieve memorized answers. In certain settings, it can appear to work through problems in a way that resembles student reasoning.

This finding does not mean ChatGPT “thinks” like a human. The authors emphasize their study is exploratory and based on a single conversation. Still, the results raise questions about how AI might support education if guided well.

How Researchers Gave ChatGPT Plato’s Famous Math Test

Nadav Marco, now at the Hebrew University of Jerusalem, and Andreas Stylianides revisited Plato’s dialogue “Meno.” In that text, Socrates shows an uneducated slave boy how to double the area of a square through guided questions. Socrates used this exchange to argue that knowledge already exists in the mind and can be drawn out through teaching.

The researchers posed the same 2,400-year-old puzzle to ChatGPT-4. Instead of repeating the well-known geometric solution from Plato’s dialogue, ChatGPT used algebra, which wasn’t invented until centuries later.
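In that algebraic framing, doubling a square of side s means finding a new side x with x² = 2s², giving x = s√2. The paper’s transcript is not reproduced here, so the sketch below is our illustration of that route, not ChatGPT’s verbatim working:

```python
import math

# A minimal sketch of the algebraic route described above: to double the
# area of a square with side s, solve x**2 == 2 * s**2 for the new side x.
s = 1.0                      # side of the original square (any value works)
original_area = s ** 2       # area of a square is its side squared
new_side = s * math.sqrt(2)  # x = s * sqrt(2)
new_area = new_side ** 2

print(new_area, 2 * original_area)  # equal up to floating-point rounding
```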

What made this notable is that the AI later showed it did know the geometric method. If it were simply recalling from training data, the obvious move would have been to cite Plato’s approach immediately. Instead, it appeared to construct a different solution pathway.

The researchers also tried to mislead ChatGPT into making the same mistake as the boy in Plato’s dialogue, who initially thought doubling the sides would double the area. But ChatGPT refused to accept this wrong answer, carefully explaining why doubling the sides creates four times the original area, not double.
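The arithmetic behind that resistance is easy to check; the numbers below are our own illustration of the point, not an example from the paper:

```python
# Doubling the side of a square quadruples its area: (2*s)**2 == 4 * s**2.
s = 3.0
area = s ** 2                     # 9.0
doubled_side_area = (2 * s) ** 2  # 36.0
print(doubled_side_area / area)   # 4.0 -- four times the area, not two
```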

ChatGPT solves an ancient geometry problem
Researchers tested ChatGPT with this classic geometry problem to see if it could reason like a student. (Image generated by ChatGPT/OpenAI)

When ChatGPT Faced Variations on the Problem

The researchers then changed the puzzle, asking ChatGPT how to double the area of a rectangle. Here, the model showed surprising awareness of the problem’s limitations. Rather than incorrectly applying the square’s diagonal method, ChatGPT explained that “the diagonal does not offer a straightforward new dimension” for rectangles.
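That claim is straightforward to verify numerically. In the sketch below (our illustration; the paper’s own example, if any, is not quoted above), the square built on a rectangle’s diagonal has area a² + b², which equals double the rectangle’s area, 2ab, only when a = b:

```python
import math

# Why the square's diagonal trick fails for rectangles.
a, b = 3.0, 1.0
area = a * b                        # 3.0
diagonal = math.hypot(a, b)         # sqrt(a**2 + b**2) = sqrt(10)
square_on_diagonal = diagonal ** 2  # ~10.0, not 2 * area == 6.0

# Agreement requires a**2 + b**2 == 2*a*b, i.e. (a - b)**2 == 0, i.e. a == b.
print(square_on_diagonal, 2 * area)
```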

This response demonstrated something resembling mathematical reasoning. The AI seemed to understand that techniques that work for one shape don’t automatically apply to others, a distinction that often challenges human students learning geometry.

When prompted for more practical solutions, ChatGPT initially focused on algebraic approaches, similar to its first response about squares. But the AI’s explanations of how it was reasoning were inconsistent. At times it described generating answers in real time; at other points it implied the responses were not spontaneous.

The authors noted that these reflections may not accurately represent how the system works. They cautioned against taking the AI’s own words at face value, since language models are not reliable guides to their inner processes.

The “Chat’s ZPD”: Where AI Learns with Guidance

Drawing on psychologist Lev Vygotsky, the researchers described a “Chat’s Zone of Proximal Development.” These are problems ChatGPT could not solve independently but managed when guided with timely prompts.

Vygotsky’s original concept describes the gap between what a child can do alone versus what they can accomplish with help from a teacher or more skilled peer. The researchers found a similar pattern with ChatGPT: certain problems remained out of reach until the right kind of guidance appeared.

Some answers looked like retrieval from training data. Others, especially those involving resistance to incorrect suggestions or adaptation to new prompts, resembled the problem-solving steps of students. While this does not prove that the model truly “understands,” it does suggest that, under the right conditions, AI output can mirror aspects of human learning.

When the researchers asked for an “elegant and exact” solution to the original square problem, ChatGPT provided the geometric construction method. The AI itself admitted that “there [was] indeed a more straightforward and mathematically precise approach … which [it] should have emphasised directly in response to [our] initial inquiry.”
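That construction, taking the diagonal of the original square as the side of the new one, is easy to check; the snippet below is our quick verification, not code from the study:

```python
import math

# The classical construction from the "Meno": the diagonal of a square with
# side s has length s * sqrt(2), so a square built on it has area 2 * s**2.
s = 1.0
diagonal = math.sqrt(s ** 2 + s ** 2)  # Pythagoras
print(diagonal ** 2, 2 * s ** 2)       # ~2.0 and 2.0: the area is doubled
```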

This self-correction suggested the model could reflect on and improve its responses when given appropriate prompts, much like a student who realizes they took a harder path than necessary.

What This Means for Students and Teachers

If AI tools can sometimes behave like learners, they could become useful educational partners. Instead of treating ChatGPT as an answer machine, students and teachers might experiment with prompts that invite collaboration and exploration.

The type of prompt matters significantly. The researchers found that asking for exploration and collaboration yielded different responses than requesting summaries based on reliable sources. Knowing how to phrase prompts could shape whether the model retrieves or attempts to generate new approaches.
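The study itself was a single conversation in the ChatGPT interface, not an API experiment, but the prompt-framing contrast can be made concrete. In the sketch below, the model name, prompt wording, and use of OpenAI’s Python SDK are all our assumptions rather than the paper’s setup:

```python
from openai import OpenAI  # assumes the official `openai` Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROBLEM = "Given a square, construct a square with exactly double its area."

# Two framings in the spirit of the distinction above: one inviting
# exploration, one requesting a sourced summary. Wording is hypothetical.
for framing in (
    "Let's explore this problem together, step by step: ",
    "Summarize the standard solution, citing reliable sources: ",
):
    response = client.chat.completions.create(
        model="gpt-4",  # the study used ChatGPT-4 (February 2024)
        messages=[{"role": "user", "content": framing + PROBLEM}],
    )
    print(response.choices[0].message.content)
```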

Teachers could use this approach to model problem-solving strategies. Rather than asking AI for the final answer, they might guide it through the same thinking process they want students to follow. This could help students see that even sophisticated systems sometimes struggle, reconsider approaches, and need guidance to reach better solutions.

Students, meanwhile, could practice their own reasoning by working alongside AI that shows its thinking process. When ChatGPT resists incorrect suggestions or explains why certain approaches won’t work, students get opportunities to understand mathematical reasoning rather than just memorize procedures.

The authors stress that their study, published in the International Journal of Mathematical Education in Science and Technology, involved only one conversation with one model (ChatGPT-4 in February 2024). Results may differ with newer versions or different systems. Still, the findings invite educators to consider how AI might support exploration, not just provide ready-made answers.

As the researchers put it, users should “pay attention to the type of knowledge they wish to get from an LLM and try to communicate it clearly in their prompts.” Guidance can help AI attempt solutions it would not manage on its own.

Building Mathematical Understanding Through AI Collaboration

The study reveals potential for AI to serve as more than an information source. When ChatGPT resisted incorrect suggestions and explained its reasoning, it demonstrated behaviors that could help students develop critical thinking skills.

Rather than simply accepting or rejecting AI outputs, students could learn to evaluate mathematical reasoning, whether from artificial systems or human sources. This skill becomes increasingly valuable as AI tools become more prevalent in academic and professional settings.

The researchers’ approach also highlights how questioning techniques can reveal different aspects of AI behavior. By varying their prompts and challenging the system’s responses, they uncovered evidence of both retrieval and generation processes within the same conversation.

A Tentative Step, Not a Final Word

The study opens questions about how we understand machine intelligence. If AI can engage in something resembling reasoning, complete with self-correction and resistance to errors, the line between retrieval and generation becomes blurred. This doesn’t mean AI has achieved consciousness, but it suggests these systems might be more sophisticated thinking partners than previously imagined.

For teachers and students, the lesson is not that machines replace human reasoning, but that they could help learners explore strategies, confront mistakes, and practice persistence in problem-solving. The key lies in knowing how to prompt and guide these systems effectively.

Paper Summary

Methodology

Marco and Stylianides conducted an exploratory case study using ChatGPT-4 in February 2024. They recreated Plato’s famous slave-boy experiment from the dialogue “Meno,” which involves solving the geometric problem of doubling the area of a square. The researchers followed four specific guidelines:

  • interact with ChatGPT about the problem similarly to the original Socratic dialogue;
  • try to make the AI commit mathematical errors by suggesting wrong solutions or posing harder problems;
  • provide minimal hints when the AI made mistakes; and
  • when the AI couldn’t solve problems independently, offer hints or demonstrate solutions to see if it could learn and apply strategies to new contexts.

After the mathematical conversation, they asked ChatGPT to reflect on its own knowledge processes during the interaction.

Results

ChatGPT initially solved the doubling-the-square problem using algebra rather than the classical geometric solution, despite being familiar with Plato’s dialogue. When researchers tried to mislead the AI into making the same mistake as Plato’s slave boy, ChatGPT resisted and provided correct explanations. When asked about doubling a rectangle’s area, ChatGPT correctly explained why the diagonal approach used for squares wouldn’t work for rectangles. With appropriate prompting, the AI was able to provide the geometric solution it had initially overlooked and reflect on why it should have offered this approach first. The researchers identified what they called the “Chat’s ZPD” – problems ChatGPT cannot solve alone but can accomplish with appropriate prompting.

Limitations

The study is based on a single conversation with one AI model (ChatGPT-4) from February 2024, severely limiting generalizability. The researchers acknowledge the rapid development of AI capabilities means their observations may not apply to newer versions or different language models. The exploratory nature of the study lacked a predefined analytical framework, relying instead on the researchers’ interpretive judgments about whether responses reflected “recollection” versus “generation” of knowledge. Additionally, when they asked ChatGPT to explain its own processes, the AI provided contradictory and unclear explanations, making it difficult to verify the researchers’ interpretations.

Funding and Disclosures

The authors reported no potential conflicts of interest. No funding sources were mentioned in the paper.

Publication Information

This study was published by Nadav Marco (Hebrew University of Jerusalem and David Yellin Academic College of Education) and Andreas J. Stylianides (University of Cambridge) in the International Journal of Mathematical Education in Science and Technology on September 17, 2025. The paper is titled “An exploration into the nature of ChatGPT’s mathematical knowledge” and is available online with DOI: 10.1080/0020739X.2025.2543817.

