(Image by Ryan DeBerardinis on Shutterstock)
In A Nutshell
- ChatGPT surprised researchers by solving Plato’s ancient square puzzle with algebra instead of geometry.
- The AI resisted wrong suggestions, explaining why certain shortcuts wouldn’t work.
- Depending on the prompts, it showed both recall-like and problem-solving behaviors, resembling how students learn.
- Researchers described a “Chat’s Zone of Proximal Development,” where guidance helps the AI go further.
- The study is exploratory, based on one conversation, but raises new questions about AI in education.
CAMBRIDGE, England — When two researchers at the University of Cambridge challenged ChatGPT with a classic puzzle from ancient Greece, they found that the model sometimes behaved less like a search engine and more like a learner. The platform took time testing approaches, reconsidering when prompted, and even resisting wrong suggestions.
The study suggests that artificial intelligence may do more than retrieve memorized answers. In certain settings, it can appear to work through problems in a way that resembles student reasoning.
This finding does not mean ChatGPT “thinks” like a human. The authors emphasize their study is exploratory and based on a single conversation. Still, the results raise questions about how AI might support education if guided well.
How Researchers Gave ChatGPT Plato’s Famous Math Test
Nadav Marco, who’s now at Hebrew University, and Andreas Stylianides revisited Plato’s dialogue “Meno.” In that text, Socrates shows an uneducated slave boy how to double the area of a square through guided questions. Socrates used this exchange to argue that knowledge already exists in the mind and can be drawn out through teaching.
The researchers posed the same 2,400-year-old puzzle to ChatGPT-4. Instead of repeating the well-known geometric solution from Plato’s dialogue, ChatGPT used algebra, which wasn’t invented until centuries later.
What made this notable is that the AI later showed it did know the geometric method. If it were simply recalling from training data, the obvious move would have been to cite Plato’s approach immediately. Instead, it appeared to construct a different solution pathway.
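For readers who want to see the mathematics, here is a brief sketch of both routes in our own notation (not a quotation from the paper). Call the original side length s, so the original area is s² and the target area is 2s²:

\[
\text{algebraic route:}\quad (s')^2 = 2s^2 \;\Rightarrow\; s' = \sqrt{2}\,s
\]
\[
\text{geometric route:}\quad d = \sqrt{2}\,s \;\Rightarrow\; d^2 = 2s^2 \quad\text{(the square built on the original square's diagonal)}
\]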
The researchers also tried to mislead ChatGPT into making the same mistake as Plato’s slave boy, who initially thought doubling the sides would double the area. But ChatGPT refused to accept this wrong answer, carefully explaining why doubling the sides actually quadruples the area rather than doubling it.
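The arithmetic behind that correction is easy to verify; the line below is our own illustration rather than a quotation from the study. Doubling the side length multiplies the area by four, not two:

\[
(2s)^2 = 4s^2 \;\neq\; 2s^2 .
\]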
When ChatGPT Faced Variations on the Problem
The researchers then changed the puzzle, asking ChatGPT how to double the area of a rectangle. Here, the model showed surprising awareness of the limits of its earlier method. Rather than incorrectly applying the square’s diagonal method, ChatGPT explained that “the diagonal does not offer a straightforward new dimension” for rectangles.
This response demonstrated something resembling mathematical reasoning. The AI seemed to understand that techniques working for one shape don’t automatically apply to others—a distinction that often challenges human students learning geometry.
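A quick calculation, ours rather than the paper’s, illustrates why the shortcut does not transfer. For a rectangle with sides a and b, the doubled area is 2ab, while the square built on the diagonal has area a² + b², and the two coincide only when the rectangle is already a square:

\[
a^2 + b^2 = 2ab \;\Longleftrightarrow\; (a - b)^2 = 0 \;\Longleftrightarrow\; a = b .
\]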
When prompted for more practical solutions, ChatGPT initially focused on algebraic approaches, similar to its first response about squares. But the AI’s explanations of how it was reasoning were inconsistent. At times it described generating answers in real time; at other points it implied the responses were not spontaneous.
The authors noted that these reflections may not accurately represent how the system works. They cautioned against taking the AI’s own words at face value, since language models are not reliable guides to their inner processes.
The “Chat’s ZPD”: Where AI Learns with Guidance
Drawing on the work of psychologist Lev Vygotsky, the researchers described a “Chat’s Zone of Proximal Development”: the set of problems ChatGPT could not solve independently but managed when guided with timely prompts.
Vygotsky’s original concept describes the gap between what a child can do alone versus what they can accomplish with help from a teacher or more skilled peer. The researchers found a similar pattern with ChatGPT: certain problems remained out of reach until the right kind of guidance appeared.
Some answers looked like retrieval from training data. Others, especially those involving resistance to incorrect suggestions or adaptation to new prompts, resembled the problem-solving steps of students. While this does not prove that the model truly “understands,” it does suggest that, under the right conditions, AI output can mirror aspects of human learning.
When the researchers asked for an “elegant and exact” solution to the original square problem, ChatGPT provided the geometric construction method. The AI itself admitted that “there [was] indeed a more straightforward and mathematically precise approach … which [it] should have emphasised directly in response to [our] initial inquiry.”
This self-correction suggested the model could reflect on and improve its responses when given appropriate prompts, much like a student who realizes they took a harder path than necessary.
What This Means for Students and Teachers
If AI tools can sometimes behave like learners, they could become useful educational partners. Instead of treating ChatGPT as an answer machine, students and teachers might experiment with prompts that invite collaboration and exploration.
The type of prompt matters significantly. The researchers found that asking for exploration and collaboration yielded different responses than requesting summaries based on reliable sources. Knowing how to phrase prompts could shape whether the model retrieves or attempts to generate new approaches.
Teachers could use this approach to model problem-solving strategies. Rather than asking AI for the final answer, they might guide it through the same thinking process they want students to follow. This could help students see that even sophisticated systems sometimes struggle, reconsider approaches, and need guidance to reach better solutions.
Students, meanwhile, could practice their own reasoning by working alongside AI that shows its thinking process. When ChatGPT resists incorrect suggestions or explains why certain approaches won’t work, students get opportunities to understand mathematical reasoning rather than just memorize procedures.
The authors stress that their study, published in the International Journal of Mathematical Education in Science and Technology, involved only one conversation with one model (ChatGPT-4 in February 2024). Results may differ with newer versions or different systems. Still, the findings invite educators to consider how AI might support exploration, not just provide ready-made answers.
As the researchers put it, users should “pay attention to the type of knowledge they wish to get from an LLM and try to communicate it clearly in their prompts.” Guidance can help AI attempt solutions it would not manage on its own.
Building Mathematical Understanding Through AI Collaboration
The study reveals potential for AI to serve as more than an information source. When ChatGPT resisted incorrect suggestions and explained its reasoning, it demonstrated behaviors that could help students develop critical thinking skills.
Rather than simply accepting or rejecting AI outputs, students could learn to evaluate mathematical reasoning, whether from artificial systems or human sources. This skill becomes increasingly valuable as AI tools become more prevalent in academic and professional settings.
The researchers’ approach also highlights how questioning techniques can reveal different aspects of AI behavior. By varying their prompts and challenging the system’s responses, they uncovered evidence of both retrieval and generation processes within the same conversation.
A Tentative Step, Not a Final Word
The study opens questions about how we understand machine intelligence. If AI can engage in something resembling reasoning, complete with self-correction and resistance to errors, the line between retrieval and generation becomes blurred. This doesn’t mean AI has achieved consciousness, but it suggests these systems might be more sophisticated thinking partners than previously imagined.
For teachers and students, the lesson is not that machines replace human reasoning, but that they could help learners explore strategies, confront mistakes, and practice persistence in problem-solving. The key lies in knowing how to prompt and guide these systems effectively.
Paper Summary
Methodology
Marco and Stylianides conducted an exploratory case study using ChatGPT-4 in February 2024. They recreated Plato’s famous slave-boy experiment from the dialogue “Meno,” which involves solving the geometric problem of doubling the area of a square. The researchers followed four specific guidelines: interact with ChatGPT about the problem similarly to the original Socratic dialogue; try to make the AI commit mathematical errors by suggesting wrong solutions or posing harder problems; provide minimal hints when the AI made mistakes; and when the AI couldn’t solve problems independently, offer hints or demonstrate solutions to see if it could learn and apply strategies to new contexts. After the mathematical conversation, they asked ChatGPT to reflect on its own knowledge processes during the interaction.
Results
ChatGPT initially solved the doubling-the-square problem using algebra rather than the classical geometric solution, despite being familiar with Plato’s dialogue. When researchers tried to mislead the AI into making the same mistake as Plato’s slave boy, ChatGPT resisted and provided correct explanations. When asked about doubling a rectangle’s area, ChatGPT correctly explained why the diagonal approach used for squares wouldn’t work for rectangles. With appropriate prompting, the AI was able to provide the geometric solution it had initially overlooked and reflect on why it should have offered this approach first. The researchers identified what they called the “Chat’s ZPD”: problems ChatGPT could not solve alone but could accomplish with appropriate prompting.
Limitations
The study is based on a single conversation with one AI model (ChatGPT-4) from February 2024, severely limiting generalizability. The researchers acknowledge the rapid development of AI capabilities means their observations may not apply to newer versions or different language models. The exploratory nature of the study lacked a predefined analytical framework, relying instead on the researchers’ interpretive judgments about whether responses reflected “recollection” versus “generation” of knowledge. Additionally, when they asked ChatGPT to explain its own processes, the AI provided contradictory and unclear explanations, making it difficult to verify the researchers’ interpretations.
Funding and Disclosures
The authors reported no potential conflicts of interest. No funding sources were mentioned in the paper.
Publication Information
This study was published by Nadav Marco (Hebrew University of Jerusalem and David Yellin Academic College of Education) and Andreas J. Stylianides (University of Cambridge) in the International Journal of Mathematical Education in Science and Technology on September 17, 2025. The paper is titled “An exploration into the nature of ChatGPT’s mathematical knowledge” and is available online with DOI: 10.1080/0020739X.2025.2543817.