More than just answers: AI-supported learning assistance in mathematics instruction

Christian Urff, Weingarten University of Education, 2025
First published: November 29, 2025; revised: July 23, 2026

Summary: AI-supported learning can enhance mathematical learning, but at the same time, it can displace cognitive effort. This article derives four requirements: a sound pedagogical foundation, adaptive dosage, metacognitive activation, and integration into classroom instruction. Since the research was primarily conducted with older learners, these requirements should initially be understood as research-based design assumptions for primary school.

The starting point

Generative artificial intelligence is associated with high expectations and equally high fears in the educational discourse. Numerous studies, differing greatly in methodology and content, now exist, allowing for some conclusions to be drawn regarding the design of AI-supported learning. Meta-analyses report significantly positive effects of AI-supported systems on learning performance, motivation, and higher cognitive processes (Wang & Fan, 2025; Deng et al., 2025; Alemdag, 2025; Wu & Yu, 2024; Zheng et al., 2023). A recent meta-analysis on generative AI in mathematics shows moderate effect sizes (Liu et al., 2025). However, these positive effects occur under very different conditions, and some study designs are methodologically questionable. Therefore, potential publication biases and the quality of the included studies must be considered when interpreting the results.

A recent systematic review of intelligent tutoring systems (Létourneau et al., 2025) shows that only about 14 of the studies involve primary school pupils; the vast majority of available studies were conducted with older pupils or university students (Kuo et al., 2025; Liu et al., 2025; Son, 2024). Research specifically focused on primary school is still limited, which is why many of the following considerations are based less on direct study results for this target group than on theoretical considerations and findings from older age groups. This article reviews and organizes the available findings and makes them applicable to primary school mathematics instruction.

In principle, just as "digital media" have no learning effect in themselves, it makes little sense to treat a "learning effect of AI" as a property of the technology itself instead of asking about the underlying learning environment and its didactic pre-structuring (Dinsmore & Fryer, 2026; Kirschner, 2025). This article asks under what conditions AI-supported learning assistance supports mathematical learning processes in primary education and when its use should be avoided.

These considerations form the basis for the development and testing of app-integrated AI-supported learning assistance within the PRIMA-AI project. They are explicitly not final, but will be further developed with new AI models, new empirical findings, and the pedagogical applications based on them.

Theoretical framework

Three theoretical perspectives structure the design principles discussed below: Cognitive Load Theory, the Zone of Proximal Development, and the distinction between procedural and conceptual knowledge.

Cognitive Load Theory

Cognitive Load Theory (Sweller, 2020) describes the limited capacity of working memory as a key constraint for the design of learning materials. This limitation has direct consequences for AI-supported systems: Generative AI can present information at high speed and in large quantities – texts, images, explanations, visualizations, animations. However, if learners are overwhelmed with information and stimuli without being able to process them in a way that is relevant to learning, even the best content will miss its mark.

Managing this constraint is a key challenge in designing AI-based learning support. From the perspective of Cognitive Load Theory, scaffolding, the measured provision of help and explanations, is crucial. AI tutors must dose prompts, assistance, and support so that working memory load remains within the optimal range (Cosentino et al., 2025). Too much information at the wrong time is not neutral but detrimental to learning. An AI tutor must therefore not only produce relevant information but also select it, dose it, and, if necessary, withhold it.

Zone of proximal development: Adaptivity requires diagnostic competence

Vygotsky's zone of proximal development describes the area between what a child can accomplish alone and what they can accomplish with the help of a more competent partner. This concept is fundamental to the design of AI tutoring systems: the system would need to continuously assess what a child can manage independently and with support, and then offer stimuli precisely within this proximal zone – not too easy (underchallenging), not too difficult (overchallenging).

Adaptive systems must perform this assessment not just once, but continuously (Wu et al., 2025; Kuo et al., 2025). They must deduce from learners' input where their understanding lies, which errors stem from which preconceptions, what kind of misunderstanding led to the error, and what the next productive impulse would be. Adaptivity can make support more appropriate than purely deterministic tutoring systems that generate more or less suitable or superficial feedback based on simple rules. Whether it is actually effective for learning depends on the diagnostic basis and the quality of the support derived from it.

Procedural and conceptual knowledge: Different goals, different support

From these two theoretical perspectives, a further distinction emerges that is crucial for the design of AI tutors: the appropriate type of support depends fundamentally on the specific learning objective being pursued. Procedural knowledge (How do I solve this problem?) requires different support than conceptual knowledge (Why does it work that way?).

A child practicing arithmetic fluency, such as rapid addition or subtraction, needs frequent, precise feedback on their results. A child developing number sense, on the other hand, needs targeted questions and hands-on activities that encourage them to construct and extend their own understanding; they should develop, present, and justify their own solution paths. An AI system that cannot make this distinction risks counterproductive support, for example by providing solutions too quickly in conceptual learning or by burdening working memory with unnecessary prompts and suggestions during procedural practice. Makransky et al. (2024) point in the same direction: in their study, a model specifically trained according to such didactic principles promoted conceptual understanding, confidence, and enjoyment of learning significantly more than a generic large language model, and these effects remained stable at follow-up. Even large, powerful language models thus do not automatically apply subject-specific pedagogical knowledge effectively unless they are pre-structured for the subject matter.

These three theoretical perspectives—the limitations of working memory, the need for adaptive positioning in the zone of proximal development, and differentiation according to learning objectives—form a framework within which the following design principles are situated. At the same time, they clarify why AI-based learning support is not a simple technical problem: it requires systems that can not only respond but also possess a model of the learner, the subject matter, and the learning process.

Risk of outsourcing: When AI takes over cognitive tasks

The research presented so far might give the impression that AI-supported learning has predominantly positive potential that merely needs technical optimization. However, the overall picture is more ambivalent. Some studies have shown that uncontrolled and "incorrect" use of generative AI in learning can not only be less effective than hoped but can also shortcut fundamental cognitive development and thus hinder learning.

Cognitive Offloading: When Support Becomes Dependency

The most fundamental problem can be described as cognitive offloading: learners outsource cognitive effort to the AI instead of performing it themselves. In a recent analysis, Gerlich (2025) documents a negative correlation (r = −0.75) between intensive AI use and the development of critical thinking skills, with the correlation being particularly pronounced among younger learners. The study shows a correlation but does not prove that intensive AI use weakens critical thinking, especially since its methodology is open to criticism. It supports an offloading hypothesis that still needs to be tested experimentally and longitudinally.

Bastani et al. (2025) demonstrated this risk in a controlled intervention study. Students who had unregulated access to complete AI solutions during practice performed better in the short term, but in the subsequent test without AI they performed below the level of a control group that had not used AI at all. The authors refer to this as a loss of competence (de-skilling); the finding, however, pertains to this setting and does not yet demonstrate a lasting loss of competence beyond the study. In parallel, learning analytics from AI-based tutoring systems document that some learners try to "click through" task sequences without engaging with the content (Jančařík et al., 2023).

Dinsmore and Fryer (2026) and Gisiger (2025) also argue from a learning psychology perspective that generative AI (here referring to chatbots that are not pre-structured with specific system prompts for learning support) tempts users to shortcut the strenuous process of constructing their own knowledge. Those who directly consume the solution save effort but do not necessarily build robust knowledge structures, especially if they do not use the saved cognitive energy effectively for more advanced cognitive tasks. One could also say: learning success is linked to effort, and if this effort is avoided, there is less sustainable learning. Many chatbot providers have already responded to this and now offer a "learning mode" in which the chatbot does not directly provide solutions but asks follow-up questions and engages in a dialogue with the user to review the content.

The politeness trap

In addition to this structural risk, there is a more subtle but significant phenomenon. Abdulsalam and Aroyehun (2025) show that, while large language models can reach expert-level tutoring quality, they tend to be overly polite and supportive. In their study, this excessive politeness correlated negatively with learning quality. The AI avoids frustrating learners or pointing out errors and thus shies away from genuine challenges. Yet productive struggle, the temporary experience of difficulty coupled with support, is a key driver of learning, particularly for developing conceptual understanding.

Chudziak and Kostka (2025) identify a related problem: many current AI systems tend toward a prescriptive interaction style. They dictate, guide, and solve problems instead of leaving room for learners' own thought processes. Systems that intervene too early (reactive feedback) risk inhibiting precisely the (meta-)cognitive processes they are meant to promote. Wang et al.'s (2025) analysis of real tutoring dialogues confirms this picture for younger learners: primary school students did respond positively to interactive questions, but they tended to remain passive and reactive and showed no lasting learning effect when the tutor acted too monologically and too solution-oriented.

The paradox of optimal support

These findings converge into a paradox that can be understood as a key problem for the design of AI-based learning support: the more efficiently an AI system can solve tasks and offer assistance, the greater the risk that it will replace independent thinking instead of fostering it. This paradox is difficult to resolve. It means that a good AI tutor must sometimes deliberately do less than it could. It must allow for controlled frustration ("being stuck" as the core of mathematical thinking; Mason, Burton & Stacey, 1982), withhold solutions, give incomplete hints, and build in wait time. In part, the AI system must therefore work against the very mechanisms that make generative AI so impressive.

An unstructured chatbot—for example, a freely accessible language model without a pedagogical framework—becomes a mere answer machine and can hinder rather than facilitate genuine, comprehension-enhancing learning. The following principles aim to moderate support and keep the cognitive work with the child.

Design principles as a response to the risk of shortcutting learning

The tension between the potential of adaptive AI learning support and the described risks of cognitive offloading requires an interplay of several design levels that build upon one another: a subject-specific didactic foundation of the system as a central prerequisite for diagnosis and support (primacy of subject-specific didactics), adaptive dosage as the core mechanism of feedback, metacognitive activation as a quality characteristic, and hybrid embedding (human supervision, "teacher in the loop") as a condition for effectiveness.

Foundation: Subject-specific didactic knowledge as a key prerequisite

The first and most fundamental design level concerns the system's knowledge base. Generative AI produces linguistically and formally impressive output, but from a subject-didactic perspective this output is often superficial, lacks problem orientation, or is even erroneous if not specifically controlled (Schneider, 2025). This is not an implementation weakness that can easily be remedied but a consequence of the architecture and training of large language models: their training relies on data available on the internet, which has generally not been curated for quality, particularly with regard to mathematics education. Literature reviews on the use of generative AI in mathematics education therefore repeatedly point to the weak theoretical foundation and error-proneness of many systems (Almheiri et al., 2025; Awang et al., 2025; Holmes & Tuomi, 2022; Opesemowo & Adewuyi, 2024).

In their systematic analysis, Cárdenas et al. (2025) identify the lack of a theoretical framework as one of the key obstacles to effective AI tutoring systems. A sound pedagogical foundation concerns both the content-related accuracy of tasks and explanations (where the systems are constantly improving) and the way a system responds to learners. Subject-matter expertise and tutoring expertise are not the same: Macina et al. (2025) show empirically that people with excellent mathematical skills do not automatically tutor effectively. They can solve problems correctly but do not always recognize where the difficulty lies for learners, which misconception underlies an error, or what kind of prompt would be productive in a given learning situation. For AI systems, this means that simply integrating subject-matter knowledge into the prompt is insufficient. The system must also possess subject-specific pedagogical knowledge, that is, knowledge of how children develop mathematical concepts, which typical errors and misconceptions occur, and which interventions are effective at which point in the learning process.

Several converging findings demonstrate the practical significance of this difference. Makransky et al. (2024), based on Generative Learning Theory, show that a model specifically trained in didactics promotes conceptual understanding, confidence, and enjoyment of learning significantly more than a generic large language model, and that these effects remain stable in the follow-up. Successful tutoring systems like ChatTutor or specific frameworks (e.g., the learning mode of ChatGPT) are therefore explicitly based on educational theories such as Social Cognitive Theory or Evidence-Centered Design (Cohn et al., 2025; Dwivedi & Rejina, 2025). Furthermore, studies on GeoGebra and AI-supported learning environments demonstrate that conceptual understanding and self-efficacy only increase when subject matter expertise and subject-specific didactics are explicitly incorporated into the system design, i.e., when technology is not simply added on top (Canonigo, 2024).

An AI learning companion for primary school mathematics instruction requires at least the following subject-specific didactic knowledge bases, which must be provided either via the system context (prompting) or through specialized model training:

First, a model of mathematical competence development for the respective content area. The AI must "know" that numerical understanding does not arise from memorizing facts but from the development of mental models, and that there are typical developmental paths and precursor skills for this development. Without this knowledge, any adaptation remains superficial: the system can at best vary the level of difficulty but cannot adjust the quality of its input to the learner's level of understanding.

Second, error and strategy taxonomies that map typical learning paths and errors of students (Nauryzbayev et al., 2023; Bewersdorff et al., 2023). Diagnosing misconceptions, such as confusing place and value in the place value system or relying on counting as an entrenched strategy, is a core subject-specific pedagogical competency that a system must acquire if it is to move beyond a simple "right/wrong" judgment. For example, even large language models often explicitly suggest the strategy "just count" when errors occur if they have been given neither a corresponding instruction nor a subject-specific pedagogical context.
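To illustrate what moving beyond "right/wrong" can mean for a single task, here is a minimal sketch in Python for two-digit addition. The category names (`missing_carry`, `column_concatenation`, `off_by_one`) and the matching rules are hypothetical simplifications, not a validated taxonomy; in a real system, such a diagnosis would have to be grounded in research-based error taxonomies and could, for example, be passed to the language model as context for its response. An off-by-one result only hints at a possible counting strategy; it does not prove one.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    category: str  # hypothetical error category
    note: str      # short explanation that could be handed to the tutor model


def diagnose_addition(a: int, b: int, answer: int) -> Diagnosis:
    """Classify a child's answer to a + b for two-digit addends (illustrative rules only)."""
    if not (10 <= a <= 99 and 10 <= b <= 99):
        raise ValueError("this sketch only covers two-digit addends")
    if answer == a + b:
        return Diagnosis("correct", "The result is correct.")

    tens = a // 10 + b // 10  # sum of the tens digits, without carrying
    ones = a % 10 + b % 10    # sum of the ones digits, may be 10 or more
    if ones >= 10:
        # A carry was needed: check two typical place-value errors.
        if answer == tens * 10 + ones % 10:
            return Diagnosis("missing_carry", "The carried ten was dropped.")
        if answer == int(f"{tens}{ones}"):
            return Diagnosis("column_concatenation",
                             "Column sums were written side by side.")
    if abs(answer - (a + b)) == 1:
        return Diagnosis("off_by_one", "Possible counting slip; ask how the child computed.")
    return Diagnosis("unclassified", "No rule matched; ask the child to explain.")


if __name__ == "__main__":
    for answer in (72, 62, 612, 71, 50):
        print(answer, diagnose_addition(47, 25, answer).category)
```

For 47 + 25, the answers 62 and 612 point to two different place-value problems, which call for different follow-up questions even though both are simply "wrong".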

Third, explicit rules and guidelines specifying which support measures are useful in which learning phase and for which type of difficulty. The distinction introduced above between procedural and conceptual learning can serve as one such guideline: promoting routine skills requires different intervention strategies than building conceptual understanding. A system should also be able to recognize whether an error is a careless slip or stems from a problem of understanding. A learning history can be incorporated here so that the generative AI can respond appropriately to previous inputs and learning progress and avoid endless loops.
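Such rules, which depend on the learning goal, the type of difficulty, and the learning history, can be sketched as a small rule table. This is a hypothetical illustration: the threshold for treating a recurring error as a problem of understanding, the names of the support moves, and the hand-over to the teacher after repeated unsuccessful support are assumptions chosen for readability, not values taken from the studies cited.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum


class Goal(Enum):
    PROCEDURAL = "procedural"  # e.g. practicing arithmetic fluency
    CONCEPTUAL = "conceptual"  # e.g. developing number sense


# Hypothetical support moves per (learning goal, kind of error).
RULES = {
    (Goal.PROCEDURAL, "slip"): "brief_accuracy_feedback",
    (Goal.PROCEDURAL, "understanding"): "targeted_step_hint",
    (Goal.CONCEPTUAL, "slip"): "ask_to_recheck",
    (Goal.CONCEPTUAL, "understanding"): "ask_for_explanation_or_material",
}


def classify_error(history: list[str], current: str, threshold: int = 2) -> str:
    """Treat an error category that already occurred `threshold` times as a likely
    problem of understanding; otherwise assume a careless slip."""
    return "understanding" if Counter(history)[current] >= threshold else "slip"


def choose_support(goal: Goal, history: list[str], current: str,
                   same_support_repeats: int = 0, max_repeats: int = 2) -> str:
    """Pick the next support move; escalate instead of looping endlessly."""
    if same_support_repeats >= max_repeats:
        return "hand_over_to_teacher"
    return RULES[(goal, classify_error(history, current))]
```

In conceptual work, a first `missing_carry` would thus lead to a request to recheck, while a third occurrence would lead to a request to explain or to use material.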

Fourth, the system must adapt not only to the level of difficulty but also to the different ways of thinking that learners use to arrive at a result. Children solve mathematical problems in a variety of ways, and this is desirable. A pedagogically sound AI tutor recognizes alternative strategies, evaluates their viability, asks clarifying questions if necessary, and can either reinforce children's existing strategies or gently encourage them toward more efficient methods.

Providing this subject-specific pedagogical knowledge is costly. Smaller, more specialized models are one option: they may require less computing power and can more easily be run locally on the device or in data protection-compliant infrastructure. This can also help address sustainability issues (energy consumption) and data protection requirements, provided the models prove suitable for the respective task.

A sound pedagogical foundation is therefore not only a desirable quality feature, but a necessary condition for adaptive support to function at all. Without it, an AI tutor remains – to use Schneider's (2025) pointed formulation – a "language-gifted random number generator" that occasionally produces helpful, but often unspecific or even misleading suggestions.

Dosage: Adaptive scaffolding and the principle of "you first, then me"

Based on subject-specific didactic knowledge, the second design level becomes possible: the adaptive dosage of stimuli and support. This is the mechanism that distinguishes AI-based learning support from a mere answer machine. Empirical findings indicate that AI-supported learning environments generate potential for learning and understanding processes particularly when they react adaptively to learning progress, errors, and strategies, rather than simply providing correct answers.

In the TALPer study, for example, lower-achieving fifth-graders benefited particularly strongly from adaptive support, while higher-achieving students developed more complex interaction patterns with the AI learning companion (Kuo et al., 2025). Thus, a single system was able to effectively address different learning needs. Liu et al. (2025) demonstrate significant performance gains in word problems with a comparable system, where, interestingly, the perceived quality of the support, rather than its mere availability, had the strongest influence on motivation and learning outcomes. The systematic review by Son (2024) confirms the positive effects of well-designed intelligent tutoring systems on mathematical learning performance, especially when these systems adapt to individual learning needs. Generative systems can tailor feedback to the task, input, and the individual child's previous learning progress.

The sequence of thinking and support is crucial. Previous research on large language models in learning indicates that AI feedback is particularly effective when learners first attempt to solve the problem themselves (Kumar et al., 2023). In that study, even the occasional incorrect explanation from the language model, given after learners' own attempts, still led to learning gains, without participants systematically adopting incorrect strategies. For this setting, the finding supports the principle of offering feedback only after a learner has attempted to solve the problem. Cohn et al. (2025), in their theoretical framework for LLM-based educational agents, explicitly emphasize the need for guided discovery rather than direct answers. Similarly, Ruan et al. (2020), in their study of narrative-based chatbot tutors, show that learning gains were achieved primarily when the system provided interactive feedback and hints rather than direct solutions.

In the field of self-regulated learning, several studies indicate that adaptive, AI-supported scaffolding, unlike static support sequences, can improve the quality of self-regulated learning processes and offers advantages over a "one-size-fits-all" approach (Liu et al., 2025; Wu et al., 2025). Generative AI must therefore be integrated and pre-structured in such a way that it responds adaptively and individually to learner input, rather than simply offering pre-defined prompts.

At the same time, the risks discussed above show that assistance must not only be individualized but also limited. Bastani et al. (2025) show that unregulated access to complete solutions impairs later performance without AI, unless the AI is regulated so that it only provides hints step by step and thus leaves room for students' own solution attempts. Consistent fading of assistance, that is, the planned withdrawal of support, is therefore a necessary component of system design. If students request too many hints, the system must be able to respond by dosing, staggering, and, if necessary, reducing assistance in order to prevent dependency.
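Fading and hint budgets can be expressed as a small policy object. The following Python sketch assumes a hypothetical four-step hint ladder, a fixed number of hint requests per task, and fading of the most revealing permitted hint level after a run of tasks solved without hints; all three numbers are illustrative assumptions that would have to be calibrated empirically.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical hint ladder, from least to most revealing.
HINT_LEVELS = {
    1: "Ask a question that directs attention to the relevant part of the task",
    2: "Name a possible strategy",
    3: "Show one partial step",
    4: "Show a worked solution, to be used for reflection",
}


@dataclass
class FadingPolicy:
    max_level: int = 4          # most revealing hint level currently permitted
    hints_per_task: int = 4     # budget of hint requests per task
    successes_to_fade: int = 3  # tasks solved without hints before fading one level
    streak: int = 0             # current run of tasks solved without hints

    def record_task(self, solved_without_hints: bool) -> None:
        """Update the fading state after a task is finished."""
        if not solved_without_hints:
            self.streak = 0
            return
        self.streak += 1
        if self.streak >= self.successes_to_fade and self.max_level > 1:
            self.max_level -= 1  # withdraw the most revealing level
            self.streak = 0

    def next_hint(self, requests_so_far: int) -> int | None:
        """Return the hint level for the next request, or None once the budget
        is used up (the tutor then asks for the child's own idea instead)."""
        if requests_so_far >= self.hints_per_task:
            return None
        # Each further request climbs one step, but never beyond the permitted level.
        return min(requests_so_far + 1, self.max_level)
```

After three tasks solved without help, the worked solution (level 4) is no longer offered, so the fourth request in a task yields only another partial step.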

A recurring problem is hallucinated or unsuitable AI feedback. However, newer multi-agent approaches and LLM-as-a-judge methods indicate that self-verification procedures can improve the quality and reliability of scaffolds and significantly reduce hallucinations in feedback (Cohn et al., 2025; Gonnermann-Müller et al., 2025; Qian et al., 2026). Newer models and additional verification steps can reduce errors further. Nevertheless, their reliability must be verified for each model, version, task, and learning group.
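The self-verification idea can be sketched as a generate-then-check loop. In the studies cited, checking is done by further language models (multi-agent or LLM-as-a-judge setups); the Python sketch below substitutes two simple, deterministic checks so that it runs on its own: the arithmetic in a hint must be correct, and the hint must not reveal the final answer. The `generate` callable stands in for a hypothetical hint generator, for example a call to a language model.

```python
from __future__ import annotations

import re
from typing import Callable

# Matches simple statements such as "40 + 20 = 60" or "12 - 5 = 7".
EQUATION = re.compile(r"(\d+)\s*([+-])\s*(\d+)\s*=\s*(\d+)")


def arithmetic_is_correct(text: str) -> bool:
    """Check every simple addition/subtraction statement in the hint."""
    for a, op, b, c in EQUATION.findall(text):
        expected = int(a) + int(b) if op == "+" else int(a) - int(b)
        if expected != int(c):
            return False
    return True


def reveals_answer(text: str, answer: int) -> bool:
    """True if the final answer appears as a standalone number in the hint."""
    return re.search(rf"(?<!\d){answer}(?!\d)", text) is not None


def verified_hint(generate: Callable[[int], str], answer: int,
                  max_tries: int = 3) -> str | None:
    """Ask the (hypothetical) generator for up to `max_tries` candidates and
    return the first one that passes both checks; None means: fall back,
    e.g. to a pre-authored hint or to the teacher."""
    for attempt in range(max_tries):
        hint = generate(attempt)
        if arithmetic_is_correct(hint) and not reveals_answer(hint, answer):
            return hint
    return None
```

Rule-based checks like these catch only a narrow class of errors; they could complement a model-based judge but not replace the per-model and per-task validation the text calls for.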

An AI learning companion should therefore consistently operate according to the principle of "you first, then me": First, the user is asked to attempt a solution independently. The AI only intervenes when needed – either upon request or automatically in case of errors – and then asks for ideas, partial strategies, and observations. Explanations are linked to existing foundational understanding. Complete model solutions remain the exception and are used as a tool for reflection, not as the primary learning format. And the AI learning companion should not over-praise or evade questions, but rather provide constructive and informative feedback.
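The "you first, then me" sequence can be written down as a decision function that runs before the language model is asked for any text. This is a hypothetical sketch: the move names and the limit of three hints before a worked example are assumptions, and in practice each move would still have to be realized through a pedagogically pre-structured prompt.

```python
from __future__ import annotations


def next_move(has_attempted: bool, asked_for_help: bool,
              last_attempt_correct: bool | None, hints_given: int,
              max_hints: int = 3) -> str:
    """Decide the tutor's next move under a "you first, then me" rule.

    last_attempt_correct is None while the child is still working.
    """
    if not has_attempted:
        # You first: even on request, invite an own attempt before any hint.
        return "invite_own_attempt"
    if last_attempt_correct:
        # Informative confirmation without over-praise, then ask for reasoning.
        return "confirm_and_ask_for_reasoning"
    # Intervene only on request or automatically after an error.
    if not (asked_for_help or last_attempt_correct is False):
        return "wait"
    if hints_given == 0:
        return "ask_for_ideas"  # ideas, partial strategies, observations
    if hints_given < max_hints:
        return "give_next_hint"
    # Worked solutions stay the exception and serve reflection.
    return "offer_worked_example_for_reflection"
```

Keeping this decision outside the language model makes the sequence predictable: the model is only asked to phrase a move, not to decide whether to reveal a solution.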

Activation: Metacognition, critical thinking, and the tutor as a mirror

The third design level goes beyond the dosage of assistance and focuses on the quality of the cognitive processes stimulated by the interaction. Chatbots and AI systems can function not only as guides and task explainers; if designed accordingly, they can also stimulate planning, monitoring, and reflection during problem solving. The meta-analysis by Wu et al. (2025) shows that chatbots can support self-regulated learning technically, socially, and reflectively, provided their scaffolds are linked to models of self-regulated learning. Guo et al. (2025) confirm in their systematic review that AI systems can address the basic psychological needs for autonomy, competence, and relatedness, which are key factors for motivation and engagement.

Studies on so-called teachable agents (Song et al., 2024) show that learners can perceive AI-supported systems as learning companions, moderators, and collaborative problem solvers when they address their own explanations to these agents. The principle of learning by explaining can thus be transferred to AI-supported environments. At the same time, many current generative AI systems do not yet reliably fulfill key tutoring roles, such as the targeted stimulation of planning, strategy selection, and reflection, without specific prompts or retraining, and tend toward the prescriptive style discussed above (Chudziak & Kostka, 2025; Contel & Cusi, 2025). This underscores the need to pre-structure AI systems so that they proactively employ metacognitive scaffolds that go beyond reactive feedback.

A key aspect of metacognitive activation is also enabling learners to critically examine AI responses themselves. In a learning environment designed for a mathematics day at a primary school, children who experienced AI responses as potentially flawed developed a critical, questioning attitude and no longer accepted answers uncritically (Helal et al., 2024). Children can practice this skill as early as primary school, and it is becoming increasingly important given the growing pervasiveness of AI-generated content in everyday life.

An AI learning companion can be designed as a metacognitive partner that systematically stimulates these processes. The tutor asks questions such as "What did you notice?", "Which strategy did you try?", or "Why do you think that works?". It encourages reflection on mistakes: "Which idea from before could help here?" Where appropriate, it can also integrate teach-back elements: children explain to the AI what they have understood, and the AI reflects this back, asks questions, and probes deeper. In this way, the AI prompts children's own explanations, checks, and reflections instead of acting as an all-knowing explainer.
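One simple way to organize such questions is to tie them to the situation in the dialogue and rotate through them so that the same prompt is not repeated every time. The Python sketch below uses the questions quoted above; the situation labels and the teach-back wording are hypothetical additions, and a real system would let the language model adapt the wording to the task at hand.

```python
from __future__ import annotations

# Prompts by dialogue situation; the first four questions are quoted from the text,
# the teach-back wording is a hypothetical example.
PROMPTS: dict[str, list[str]] = {
    "after_attempt": ["What did you notice?", "Which strategy did you try?"],
    "after_correct": ["Why do you think that works?"],
    "after_error": ["Which idea from before could help here?"],
    "teach_back": ["Can you explain to me how you solved it?"],
}


class MetacognitivePrompter:
    """Rotates through the prompts for each situation to avoid monotony."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def next_prompt(self, situation: str) -> str:
        options = PROMPTS[situation]
        index = self._counts.get(situation, 0)
        self._counts[situation] = index + 1
        return options[index % len(options)]
```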

Embedding: Hybrid arrangements as a condition for effectiveness

The fourth and final design level concerns the framework within which AI-supported learning takes place. Several studies report advantages of hybrid arrangements in which AI does not replace the teacher but relieves their workload, for example through precisely tailored scaffolds in the problem-solving process or real-time analyses, which give teachers more time for pedagogical interaction and relationship building (Wezendonk & Veldhuis, 2024; Gonnermann-Müller et al., 2025). However, a general superiority over well-designed purely human or purely AI-supported feedback has not yet been proven.

A study by Eedi in cooperation with Google DeepMind (2025) illustrates the potential of hybrid approaches: while AI support alone led to learning gains of 4.5 percentage points, these more than doubled to 10 percentage points when the teacher reviewed and used the AI suggestions. Here, the teacher acts not merely as a control mechanism but as a pedagogical authority who contextualizes the AI input within the lesson, the class, and the individual child. The meta-analysis by Kaliisa et al. (2025) showed that AI feedback is no less effective than human feedback, but also not systematically superior. Hybrid approaches that combine reliable, immediate, and accessible AI feedback with contextualizing feedback from human teachers are therefore particularly promising. Cosentino et al. (2025) confirm that such hybrid feedback models have the potential to reduce cognitive load and support differentiated information processing strategies.

Further studies support the effectiveness of AI in assisting teachers. For example, CoPilot provides real-time support to human teachers and leads to significantly better mathematical learning outcomes for younger students, particularly where teachers would otherwise provide weaker feedback, for instance because they are teaching outside their subject area (Wang et al., 2024). Kestin et al. (2025) note that AI tutors based on the didactic principles of active learning can complement face-to-face instruction with targeted additional practice and feedback during specific phases of a lesson.

From the perspective of hybrid integration, several consequences arise for system design. Feedback generation must be based on sound subject-specific pedagogical information and cannot be left to chance. Teachers need dashboards and configuration options to see where students stand, what support the AI has provided, and how effective that support has been. The AI can suggest tasks, support measures, or initial diagnoses; the decision remains with the human. The microdidactic level appears particularly promising: task-level support in problem solving, where teachers in a heterogeneous class often cannot reach every child in time at crucial points. The systematic review by Eti, Mosia & Egara (2026) showed that AI is particularly effective when it recognizes misconceptions, provides appropriate guidance, and gradually leads learners toward independent problem solving.
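What a teacher dashboard might aggregate can be sketched from a simple interaction log. The Python sketch below assumes a hypothetical event format and uses the share of hinted tasks that were eventually solved as a deliberately crude indicator of how effective the AI support was; it only flags students for the teacher's attention and leaves the decision with the teacher. The thresholds are illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    student: str
    task_id: str
    hints_used: int
    solved: bool
    error_category: str | None = None  # e.g. from an error diagnosis step


def summarize(events: list[Event], min_hinted_tasks: int = 3,
              flag_below: float = 0.5) -> dict[str, dict]:
    """Aggregate the log per student for a teacher dashboard."""
    by_student: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        by_student[event.student].append(event)

    summary = {}
    for student, evs in by_student.items():
        hinted = [e for e in evs if e.hints_used > 0]
        # Share of hinted tasks that were eventually solved (crude effectiveness indicator).
        success_after_hint = sum(e.solved for e in hinted) / len(hinted) if hinted else None
        errors = Counter(e.error_category for e in evs if e.error_category)
        summary[student] = {
            "tasks": len(evs),
            "hints": sum(e.hints_used for e in evs),
            "success_after_hint": success_after_hint,
            "frequent_errors": errors.most_common(2),
            # Only a suggestion: the teacher decides whether and how to step in.
            "flag_for_teacher": len(hinted) >= min_hinted_tasks
            and success_after_hint is not None
            and success_after_hint < flag_below,
        }
    return summary
```

A flag here says only that hints have rarely helped; whether the cause is a misconception, the task, or the hints themselves is for the teacher to judge.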

However, the meaningful involvement of teachers in the design and use of AI tutoring systems is often insufficiently considered. Guerino et al. (2023) and Wezendonk and Veldhuis (2024) emphasize that teacher-centered design approaches and corresponding AI literacy programs are necessary to ensure practical classroom integration and acceptance. Professional development and training in the integration, orchestration, and use of AI are prerequisites for responsible use (Holmes et al., 2018; KMK, 2024; Wang & Nie, 2023) and should therefore be integrated into teacher training and professional development. Training should cover technical fundamentals as well as pedagogical opportunities, risks, and evaluation criteria. Subject-specific pedagogical knowledge remains central as the basis for evaluating and orchestrating AI as an aid in the learning setting.

AI as a catalyst for mathematical discoveries

The design principles above describe how AI-powered learning support should work internally, that is, in the interaction between the system and the learner. Equally important, however, is how this interaction is embedded in the overall arrangement of mathematical learning. AI should not and must not lead to children simply staring at and interacting with screens. Research on tangible interfaces and social robots shows that AI can also stimulate interaction in the physical world (Ligthart et al., 2023). AI should serve as a catalyst for mathematical activities that draw on Bruner's levels of representation (enactive, iconic, and symbolic).

The AI can incorporate actions with materials as well as sketches, notes, or photos and ask targeted questions about them. For example, when children model word problems in the app "Math stories", they can discuss not only their voice input but also sketches, notes, and photos of their modeling with the AI learning companion. There are significantly fewer studies for primary education than for secondary and higher education, but the existing results are cautiously optimistic: children can benefit from generative AI in their learning if learning environments are designed effectively, including the integration of digital support with analog learning environments (Hwang, 2022; Listyaningrum et al., 2024; Mott et al., 2023; Rumbelow & Coles, 2024; Yim & Su, 2025).

AI-supported object recognition of Cuisenaire rods, as well as the recognition of drawings and notes, can help children increasingly connect their actions with abstract mathematical representations (Rumbelow & Coles, 2024). AI-supported practice can specifically enhance computational fluency and achieve greater fluency gains than memorization-based approaches, but it must be carefully combined with other forms of practice for children with dyscalculia (Samuelsson, 2023). Adaptive systems for children with dyscalculia show promising results in maintaining motivation and engagement (Hocine et al., 2023; Holmes, 2024). Narrative and gamified approaches deserve special attention: Ruan et al. (2020) show that storytelling-based chatbot tutors can promote engagement and learning gains, and Sayed et al. (2022) confirm significant improvements, particularly among lower-achieving students, through adaptive, gamified content.

Children should critically examine, justify, and discuss mathematical statements – including and especially those from AI (Kortenkamp, 2024; Aufenanger et al., 2023). AI-based learning support in primary school should therefore primarily serve as a catalyst for rich mathematical activities that – guided by subject-specific didactic considerations – combine digital and analog processes.

Ethical and structural framework conditions

Ethical requirements for AI-based learning support

Current design principles aim to create effective AI tutoring systems. However, effectiveness alone is not a sufficient criterion, especially when children are the learners. Ethical questions are fundamental to learning with generative AI, and they extend beyond the often-cited issue of data protection. Holmes et al. (2021) call for a collaboratively developed ethical framework that includes aspects such as fairness, transparency, agency, and pedagogical responsibility. The Standing Conference of the Ministers of Education and Cultural Affairs (KMK, 2024) explicitly recommends a cautious, research-based use of AI in primary and special needs schools, focusing on basic skills, inclusion, equal opportunities, and age-appropriate solutions that comply with data protection law.

Scoping reviews on AI and human flourishing show that previous research in this area has gaps: the research landscape is strongly performance-oriented and focused on learning outcomes, while ethical, metacognitive, and teacher-related perspectives remain under-researched (Fock & Siller, 2025). Almheiri et al. (2025) and Cárdenas et al. (2025) identify ethical challenges and scaling problems as key obstacles to the widespread use of AI tutoring systems. Furthermore, studies on psychological profiling with large language models (Rosenfelder et al., 2025) demonstrate how accurately models can infer personality traits and value patterns from text, which highlights the potential for misuse inherent in opaque systems. Gulz and Haake (2021) also emphasize the need to combine adaptivity with inclusive pedagogy and accessibility without stigmatizing learners with special needs.

From these findings and demands, concrete ethical requirements for a responsible AI learning companion can be derived. It must work in a data-saving manner and avoid psychological profiling; local processing, separate data sets, and limiting data to what is strictly necessary can support this. It must be accessible and use multimodal interaction (speech, text, image) to serve diverse learning needs (Hocine et al., 2023), with disadvantaged learners explicitly considered in the design. Its basic workings must be explainable and comprehensible. And it must encourage students to critically examine AI responses, strengthening rather than weakening critical thinking. The overarching principle is: the learning companion is intended to strengthen learners' independence and decision-making authority.
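The data-saving requirement can be illustrated with a minimal sketch, assuming that the companion stores one record after each task. The whitelist `ALLOWED_FIELDS`, the field names, and the salted-hash pseudonym are hypothetical choices for illustration only: everything not on the whitelist, such as free chat text or voice recordings, is discarded before storage, and the child's name is replaced on the device by a pseudonym.

```python
import hashlib

# Hypothetical whitelist: the only fields the companion needs to choose
# its next prompt. Everything else is discarded before storage.
ALLOWED_FIELDS = {"task_id", "answer_correct", "hint_level"}


def pseudonymize(student_name, local_salt):
    """Replace a name with a salted SHA-256 hash computed on the device."""
    data = (local_salt + student_name).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:12]


def minimize(record, local_salt):
    """Keep whitelisted fields plus a pseudonym; drop all other data."""
    reduced = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    reduced["pseudonym"] = pseudonymize(record["student_name"], local_salt)
    return reduced


raw = {
    "student_name": "Mia",
    "task_id": "T7",
    "answer_correct": False,
    "hint_level": 2,
    "chat_text": "I counted the rods twice",  # free text: not stored
    "voice_recording": b"\x00\x01",           # audio: not stored
}
print(minimize(raw, local_salt="device-secret"))
```

A salted hash is pseudonymization, not anonymization: as long as the salt exists, the record can be linked back to the child. Minimization of this kind complements a data protection concept but does not replace it.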

The SKILL model as a framework for orientation

For the PRIMA-AI project, the SKILL framework model (Structured Competence-based Integration of Learning-supportive AI Systems) was developed to guide the use of generative AI with young children. It indicates how open or pre-structured AI systems should be in different learning contexts. The model's fundamental premise is that the more directly generative AI is linked to the learning process, the more strongly the AI must be pre-structured and controlled, while also taking into account the children's level of competence in dealing with the AI's output.

This aims to reduce the risk of outsourcing key cognitive functions, such as developing number sense or independent problem-solving strategies, to the system. The SKILL model thus translates the design principles developed in this article, especially the tension between adaptive support and the protection of independent thinking, into a usable structure that can serve as a basis for designing AI-supported learning environments.
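The core premise of the SKILL model, that pre-structuring must increase with the AI's proximity to the learning process and can be relaxed somewhat as children become more competent in handling AI output, can be sketched as a simple decision rule. This is a hypothetical illustration only: the 0 to 1 scales, the weight of 0.5, the thresholds, and the three levels are assumptions made for this example and are not the operationalization used in PRIMA-AI.

```python
from enum import Enum


class Structuring(Enum):
    """Illustrative levels of pre-structuring (assumed, not from SKILL)."""
    OPEN = 1      # free dialogue, e.g. discussing AI statements as a class
    GUIDED = 2    # AI responds within teacher-defined task and hint limits
    SCRIPTED = 3  # AI only delivers pre-authored prompts, no free generation


def required_structuring(proximity, competence):
    """Hypothetical reading of the SKILL premise.

    proximity:  0..1, how directly the AI intervenes in the learning process
                (1 = inside a child's own solution attempt).
    competence: 0..1, the children's competence in judging AI output.
    """
    if not (0 <= proximity <= 1 and 0 <= competence <= 1):
        raise ValueError("inputs must lie in [0, 1]")
    # Closer to the learning process -> more structure; higher competence
    # relaxes the requirement (the weight 0.5 is an arbitrary assumption).
    need = proximity - 0.5 * competence
    if need >= 0.5:
        return Structuring.SCRIPTED
    if need >= 0.2:
        return Structuring.GUIDED
    return Structuring.OPEN


print(required_structuring(proximity=0.9, competence=0.2))
```

In practice these dimensions would be judged didactically rather than measured on a numeric scale; the sketch only shows the direction of the relationship the model describes.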

Conclusion and outlook

Several open research questions arise for the future. Long-term studies examining AI-supported learning beyond the period of individual interventions are lacking – in particular, the question of whether adaptive scaffolding actually leads to sustainable competence development or whether de-skilling effects only become apparent after a time lag. Studies that systematically consider the specific conditions of primary school – lower reading comprehension, different interaction patterns, and the interplay with concrete materials – are also lacking. And research is missing on how teachers actually integrate AI tutoring systems into their lessons – not under laboratory conditions, but in the everyday life of a heterogeneous primary school class with limited infrastructure.

Based on the fundamental principles outlined here, the PRIMA-AI project is currently developing various app-integrated AI learning tools, which are being researched, further developed, and optimized within the framework of design-based research. The aim of these trials is to generate insights that can improve children's mathematical learning.

Literature

Abdulsalam, R. O., & Aroyehun, S. (2025). Large language models approach expert pedagogical quality in math tutoring but differ in instructional and linguistic profiles (arXiv:2512.20780). arXiv. https://doi.org/10.48550/arXiv.2512.20780

Alemdag, E. (2025). The effect of chatbots on learning: A meta-analysis of empirical research. Journal of Research on Technology in Education, 57(2), 459–481. https://doi.org/10.1080/15391523.2023.2255698

Aleven, V., Roll, I., McLaren, B. M., & Koedinger, K. R. (2016). Help Helps, But Only So Much: Research on Help Seeking with Intelligent Tutoring Systems. International Journal of Artificial Intelligence in Education, 26(1), 205–223. https://doi.org/10.1007/s40593-015-0089-1

Almheiri, A. S. B., Albastaki, H., & Alrashdan, H. (2025). AI-based tutoring systems in education. Advances in Computational Intelligence and Robotics Book Series, 185–210. https://doi.org/10.4018/979-8-3373-0847-0.ch007

Aru, J., & Laak, K.-J. (2025). Developing an AI-based General Personal Tutor for education. Trends in Cognitive Sciences, 29(11), 957–960. https://doi.org/10.1016/j.tics.2025.09.010

Aufenanger, S., Herzig, B., & Schiefner-Rohs, M. (2023). Artificial intelligence and schools. Tasks for teaching and the organization of schools. In C. de Witt, C. Gloerfeld, & S. E. Wrede (Eds.), Artificial intelligence in education (pp. 199–218). Springer Fachmedien. https://doi.org/10.1007/978-3-658-40079-8_10

Awang, L. A., Yusop, F. D., & Danaee, M. (2025). Current practices and future directions of artificial intelligence in mathematics education: A systematic review. International Electronic Journal of Mathematics Education, 20(2), em0823. https://doi.org/10.29333/iejme/16006

Bach, K. M., Reinhold, F., & Hofer, S. (2025). Unlocking math potential in students from lower SES backgrounds – using instructional scaffolds to improve performance. npj Science of Learning, 10(1).

Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122(26), e2422633122.

Bewersdorff, A., Seßler, K., Baur, A., Kasneci, E., & Nerdel, C. (2023). Assessing student errors in experimentation using artificial intelligence and large language models: A comparative study with human raters. Computers and Education: Artificial Intelligence, 5, 100177. https://doi.org/10.1016/j.caeai.2023.100177

Buchholtz, N., Schorcht, S., Baumanns, L., Huget, J., Noster, N., Rott, B., Siller, H.-S., & Sommerhoff, D. (2024). Nobody expects this! Six guiding principles on the implications and research needs of AI technologies in mathematics education. Communications of the Society for Didactics of Mathematics, 117.

Canonigo, A. M. (2024). Leveraging AI to enhance students' conceptual understanding and confidence in mathematics. Journal of Computer Assisted Learning, 40(6), 3215–3229. https://doi.org/10.1111/jcal.13065

Cárdenas, R., Vásquez, H. G. E., Gamboa, D. A. P., Arteaga-Arcentales, E., & Carrera, J. E. M. (2025). Exploring AI-powered adaptive learning systems and their implementation in educational settings: A systematic literature review. International Journal of Innovative Research and Scientific Studies, 8(4), 832–842. https://doi.org/10.53894/ijirss.v8i4.7961

Chudziak, J. A., & Kostka, A. (2025). AI-powered math tutoring: Platform for personalized and adaptive education. Lecture Notes in Computer Science, 462–469. https://doi.org/10.1007/978-3-031-98465-5_58

Cohn, C., Rayala, S., Srivastava, N., Fonteles, J., Jain, S., Luo, X., Mereddy, D., Mohammed, N., & Biswas, G. (2025). A theory of adaptive scaffolding for LLM-based pedagogical agents. arXiv. https://doi.org/10.48550/arxiv.2508.01503

Contel, F., & Cusi, A. (2025). Investigating the Role of ChatGPT in Supporting Metacognitive Processes During Problem-Solving Activities. Digital Experiences in Mathematics Education, 11(1), 167–191. https://doi.org/10.1007/s40751-024-00164-7

Cosentino, G., Anton, J., Sharma, K., Gelsomini, M., Giannakos, M. N., & Abrahamson, D. (2025). Generative AI and multimodal data for educational feedback: Insights from embodied math learning. British Journal of Educational Technology. https://doi.org/10.1111/bjet.13587

Deng, R., Jiang, M., Yu, X., Lu, Y., & Liu, S. (2025). Does ChatGPT enhance student learning? A systematic review and meta-analysis of experimental studies. Computers & Education, 227, 105224. https://doi.org/10.1016/j.compedu.2024.105224

Dinsmore, D. L., & Fryer, L. K. (2026). What does current genAI actually mean for student learning? Learning and Individual Differences, 125, 102834. https://doi.org/10.1016/j.lindif.2025.102834

Eedi & Google DeepMind (2025). Human-in-the-Loop AI Tutoring Outperforms Human-Only Support. Exploratory Research Report, published 2025. https://finance.yahoo.com/news/exploratory-research-eedi-google-deepmind-090000225.html

Eti, N., Mosia, M., & Egara, F. O. (2026). The role of AI-driven personalized learning in enhancing mathematics problem-solving skills: A systematic review. Frontiers in Computer Science, 8. https://doi.org/10.3389/fcomp.2026.1813431

Fock, A., & Siller, H.-S. (2025). Generative Artificial Intelligence in Secondary STEM Education in the Light of Human Flourishing: A Scoping Literature Review. Research Square. https://doi.org/10.21203/rs.3.rs-6923010/v1

Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. Societies, 15(1), 6. https://doi.org/10.3390/soc15010006

Gisiger, M. (2025, April 17). The role of artificial intelligence in learning – opportunities and risks. Michael Gisiger. https://text.tchncs.de/gisiger/die-rolle-von-kunstlicher-intelligenz-im-lernen-chancen-und-risiken

Gonnermann-Müller, J., Haase, J., Fackeldey, K., & Pokutta, S. (2025). FACET: Teacher-centered LLM-based multi-agent systems – Towards personalized educational worksheets. arXiv. https://doi.org/10.48550/arxiv.2508.11401

Guerino, G., Challco, G. C., Veloso, T. E., Oliveira, L., Penha, R. S. D., Melo, R. F., Vieira, T., Marinho, M. L. M., Macario, V., Bittencourt, I. I., Isotani, S., & Dermeval, D. (2023). Teacher-centered intelligent tutoring systems: Design considerations from Brazilian public school teachers. Anais do XXXIV Simpósio Brasileiro de Informática na Educação. https://doi.org/10.5753/sbie.2023.235159

Gulz, A., & Haake, M. (2021). No child left behind, nor singled out: Is it possible to combine adaptive instruction and inclusive pedagogy in early math software? SN Social Sciences, 1, 205. https://doi.org/10.1007/s43545-021-00205-7

Guo, J., Ma, Y., Jang, H., Li, T., Wu, J., Huang, D., Han, F., Noetel, M., Liao, K., Tang, X., & Kui, X. (2025). The impact of artificial intelligence on primary school students' motivation and engagement: A systematic review. PsyArXiv. https://doi.org/10.31234/osf.io/ecspn_v1

Harahap, R. (2024). The role of ChatGPT in enhancing mathematics education: A systematic review. Annals of the Vietnam Academy of Science and Technology, 28(2s), 511–524. https://doi.org/10.52783/anvi.v28.2753

Hocine, N., Moussa, M. B. O., & Ali, S. A. (2023). Posicalculia: An adaptive virtual environment for children with learning difficulties. IEEE INISTA 2023. https://doi.org/10.1109/inista59065.2023.10310592

Holmes, V. M. (2024). Designing an AI math tutor for children with dyslexia, dysgraphia, and dyscalculia. https://doi.org/10.58445/rars.2035

Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial Intelligence in Education: Promises and Implications for Teaching and Learning. Center for Curriculum Redesign.

Holmes, W., Porayska-Pomsta, K., Holstein, K., Sutherland, E., Baker, T., Shum, S., Santos, O. C., Rodrigo, M., Cukurova, M., Bittencourt, I., & Koedinger, K. (2021). Ethics of AI in Education: Towards a Community-Wide Framework. International Journal of Artificial Intelligence in Education, 32, 504–526. https://doi.org/10.1007/s40593-021-00239-1

Holmes, W., & Tuomi, I. (2022). State of the art and practice in AI in education. European Journal of Education, 57(4), 542–570. https://doi.org/10.1111/ejed.12533

Hwang, S. (2022). Examining the Effects of Artificial Intelligence on Elementary Students' Mathematics Achievement: A Meta-Analysis. Sustainability, 14(20), 13185. https://doi.org/10.3390/su142013185

Jančařík, A., Michal, J., & Novotná, J. (2023). Using AI Chatbot for Math Tutoring. Journal of Education Culture and Society, 14(2), 285–296. https://doi.org/10.15503/jecs2023.2.285.296

Kaliisa, R., Misiejuk, K., López-Pernas, S., & Saqr, M. (2025). How does artificial intelligence compare to human feedback? A meta-analysis of performance, feedback perception, and learning dispositions. Educational Psychology, 1–32. https://doi.org/10.1080/01443410.2025.2553639

Kestin, G., Miller, K., Klales, A., Milbourne, T., & Ponti, G. (2025). AI tutoring outperforms in-class active learning: An RCT introducing a novel research-based design in an authentic educational setting. Scientific Reports, 15(1), 17458. https://doi.org/10.1038/s41598-025-97652-6

KirschnerED (2025, August 15). ChatGPT in Education: An Effect in Search of a Cause? https://www.kirschnered.nl/2025/08/15/chatgpt-in-education-an-effect-in-search-of-a-cause/

KMK (2024). Recommendations for educational authorities on dealing with artificial intelligence in school education processes. https://www.kmk.org/fileadmin/veroeffentlichungen_beschluesse/2024/2024_10_10-Handlungsempfehlung-KI.pdf

Kortenkamp, U. (2024). How much math does humanity need? Core mathematical competencies in the face of AI. https://doi.org/10.20378/irb-104036

Kumar, H., Rothschild, D. M., Goldstein, D. G., & Hofman, J. M. (2023). Math Education with Large Language Models: Peril or Promise? (SSRN Scholarly Paper No. 4641653). Social Science Research Network. https://doi.org/10.2139/ssrn.4641653

Kuo, B.-C., Bai, Z.-E., & Lin, C.-H. (2026). Developing an AI learning companion for mathematics problem solving in elementary schools. Computers & Education, 240, 105463. https://doi.org/10.1016/j.compedu.2025.105463

Létourneau, A., Deslandes Martineau, M., Charland, P., Karran, J. A., Boasen, J., & Léger, P. M. (2025). A systematic review of AI-driven intelligent tutoring systems (ITS) in K-12 education. npj Science of Learning, 10(1), Article 29. https://doi.org/10.1038/s41539-025-00320-7

Li, M. (2024). Integrating Artificial Intelligence in Primary Mathematics Education: Investigating Internal and External Influences on Teacher Adoption. International Journal of Science and Mathematics Education. https://doi.org/10.1007/s10763-024-10515-w

Ligthart, M. E. U., de Droog, S. M., Bossema, M., Elloumi, L., Hoogland, K., Smakman, M. H. J., Hindriks, K. V., & Ben Allouch, S. (2023). Design specifications for a social robot math tutor. In G. Castellano, L. Riek, M. Cakmak, & J. Leite (Eds.), Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction (pp. 321–330). ACM/IEEE. https://doi.org/10.1145/3568162.3576957

Liu, B., Zhang, W., & Wang, F. (2025). Can Generative Artificial Intelligence Effectively Enhance Students' Mathematics Learning Outcomes? A Meta-Analysis. Education Sciences, 16(1), 140. https://doi.org/10.3390/educsci16010140

Listyaningrum, P., Retnawati, H., Harun, H., & Ibda, H. (2024). Digital learning using ChatGPT in elementary school mathematics learning: A systematic literature review. Indonesian Journal of Electrical Engineering and Computer Science, 36(3), 1701–1710. https://doi.org/10.11591/ijeecs.v36.i3.pp1701-1710

Liu, J., Sun, D., Sun, J., Wang, J., & Yu, P. L. H. (2025). Designing a generative AI enabled learning environment for mathematics word problem solving in primary schools: Learning performance, attitudes and interaction. Computers and Education: Artificial Intelligence, 9, 100438. https://doi.org/10.1016/j.caeai.2025.100438

Macina, J., Daheim, N., Hakimi, I., Kapur, M., Gurevych, I., & Sachan, M. (2025). MathTutorBench: A benchmark for measuring open-ended pedagogical capabilities of LLM tutors (arXiv:2502.18940). arXiv. https://doi.org/10.48550/arXiv.2502.18940

Makransky, G., Shiwalia, B. M., Herlau, T., & Blurton, S. (2024). Beyond the "Wow" factor: Using Generative AI for Increasing Generative Sense-Making. Research Square. https://doi.org/10.21203/rs.3.rs-5622133/v1

Mason, J., Burton, L., & Stacey, K. (1982). Thinking Mathematically. Addison-Wesley.

Mott, B., Gupta, A., Glazewski, K., Ottenbreit-Leftwich, A., Hmelo-Silver, C., Scribner, A., Lee, S., & Lester, J. (2023). Fostering Upper Elementary AI Education: Iteratively Refining a Use-Modify-Create Scaffolding Progression for AI Planning. Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 2, 647. https://doi.org/10.1145/3587103.3594170

Ninaus, M., & Sailer, M. (2022). Closing the loop – The human role in artificial intelligence for education. Frontiers in Psychology, 13. https://doi.org/10.3389/fpsyg.2022.956798

Opesemowo, O. A. G., & Adewuyi, H. O. (2024). A systematic review of artificial intelligence in mathematics education: The emergence of 4IR. Eurasia Journal of Mathematics, Science and Technology Education, 20(7), em2478. https://doi.org/10.29333/ejmste/14762

Qian, K., Liu, S., Li, T., Raković, M., Li, X., Guan, R., Molenaar, I., Nawaz, S., Swiecki, Z., Yan, L., & Gašević, D. (2026). Towards reliable generative AI-driven scaffolding: Reducing hallucinations and enhancing quality in self-regulated learning support. Computers & Education, 240, 105448. https://doi.org/10.1016/j.compedu.2025.105448

Rosenfelder, A., Levitin, M. D., & Gilead, M. (2025). Towards social superintelligence? AI infers various psychological traits from text without specific training, outperforming human judges. Computers in Human Behavior: Artificial Humans, 6, 100228. https://doi.org/10.1016/j.chbah.2025.100228

Ruan, S., He, J., Ying, R., Burkle, J., Hakim, D., Wang, A., Yin, Y., Zhou, L., Xu, Q., AbuHashem, A. A., Dietz, G., Murnane, E. L., Brunskill, E., & Landay, J. A. (2020). Supporting children's math learning with feedback-augmented narrative technology. Proceedings of the Interaction Design and Children Conference (IDC '20). https://doi.org/10.1145/3392063.3394400

Rumbelow, M., & Coles, A. (2024). The Promise of AI Object-Recognition in Learning Mathematics: An Explorative Study of 6-Year-Old Children's Interactions with Cuisenaire Rods and the Blockplay.ai App. Education Sciences, 14(6), 591. https://doi.org/10.3390/educsci14060591

Samuelsson, J. (2023). Arithmetic fact fluency supported by artificial intelligence. Frontiers in Education Technology, 6(1), 13. https://doi.org/10.22158/fet.v6n1p13

Sayed, W. S., Noeman, A., Abdellatif, A., Abdelrazek, M., Badawy, M. G., Hamed, A. E. A., & El-Tantawy, S. (2022). AI-based adaptive personalized content presentation and exercises navigation for an effective and engaging e-learning platform. Multimedia Tools and Applications, 82(3), 3303–3333. https://doi.org/10.1007/s11042-022-13076-8

Schneider, R. J. (n.d.). The use of AI to support lesson preparation: How can AI-generated practice exercises for primary school mathematics be evaluated from a subject-didactic perspective? [Unpublished manuscript].

Son, T. (2024). Artificial intelligence in mathematics education: A systematic literature review on intelligent tutoring systems. Journal of Educational Research in Mathematics, 34(2), 187. https://doi.org/10.29275/jerm.2024.34.2.187

Song, Y., Kim, J., Liu, Z., Li, C., & Xing, W. (2024). Students' perceived roles, opportunities, and challenges of a generative AI-powered teachable agent: A case of middle school math class. Journal of Research on Technology in Education, 1–19. https://doi.org/10.1080/15391523.2024.2447727

Topkaya, Y., Doğan, Y., Batdı, V., & Aydın, S. (2025). Artificial intelligence applications in primary education: A quantitatively-supported mixed-meta method study [Preprint]. Preprints. https://doi.org/10.20944/preprints202501.2263.v1

Vitale, A., & Dello Iacono, U. (2024). Using social robots as inclusive educational technology for mathematics learning through storytelling. European Public & Social Innovation Review, 9, 1–17. https://doi.org/10.31637/epsir-2024-672

Wang, D., Shan, D., Ju, R., Kao, B., Zhang, C., & Chen, G. (2025). Investigating dialogic interaction in K12 online one-on-one mathematics tutoring using AI and sequence mining techniques. Education and Information Technologies, 30(7), 9215–9240. https://doi.org/10.1007/s10639-024-13195-9

Wang, J., & Fan, W. (2025). The effect of ChatGPT on students' learning performance, learning perception, and higher-order thinking: Insights from a meta-analysis. Humanities and Social Sciences Communications, 12(1), 1–21. https://doi.org/10.1057/s41599-025-04787-y

Wang, L., & Nie, Z. (2023). Research on adaptive learning in K-12 education in the perspective of teachers' artificial intelligence literacy: Development, technology, improvement strategies. IEEE CSTE 2023. https://doi.org/10.1109/cste59648.2023.00059

Wang, R. E., Ribeiro, A. T., Robinson, C. D., Loeb, S., & Demszky, D. (2024). Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise. arXiv preprint arXiv:2410.03017. https://arxiv.org/abs/2410.03017

Wezendonk, A., & Veldhuis, M. (2024). Adaptive learning systems and the didactic solution for basic school learning in the field of knowledge. Tijdschrift voor Onderwijs en Praktijk in Statistiek. https://doi.org/10.54657/tops.13844

Wu, R., & Yu, Z. (2024). Do AI chatbots improve students learning outcomes? Evidence from a meta-analysis. British Journal of Educational Technology, 55(1), 10–33. https://doi.org/10.1111/bjet.13334

Wu, X.-Y., Radloff, J., Yeter, I., Wang, L., & Chiu, T. K. F. (2025). Designing artificial intelligence chatbots for self-regulated learning from a systematic review based on Habermas's three interests. Interactive Learning Environments. https://doi.org/10.1080/10494820.2025.2563086

Yim, I. H. Y., & Su, J. (2025). Artificial intelligence literacy education in primary schools: A review. International Journal of Technology and Design Education. https://doi.org/10.1007/s10798-025-09979-w

Zheng, L., Niu, J., Zhong, L., & Gyasi, J. F. (2023). The effectiveness of artificial intelligence on learning achievement and learning perception: A meta-analysis. Interactive Learning Environments, 31(9), 5650–5664. https://doi.org/10.1080/10494820.2021.2015693
