Learning Intelligence — How We Teach When the Answer Is Free

CH 01 / 14

Chapter 01 · The break

A contract quietly broken

In the fall of 2022, a freshman writing instructor could still reasonably believe that the essay sitting in her grading queue was a record of her student's thinking. By the spring of 2026, she cannot. The basic contract of formal education quietly broke between those two semesters.

The break has a date. On November 30, 2022, OpenAI released ChatGPT to the public. Within five days it had a million users. Within two months, 100 million. By the start of 2026, 900 million people were using it weekly, and the question facing every teacher in every classroom on every continent had inverted. It was no longer "Can the student produce this work?" It was "If the student can produce this work in thirty seconds with a chatbot, what was the work for?"

Three and a half years in, the data is overwhelming.

88%

of UK undergraduates use generative AI specifically on assessed work — up from 53% the year before.

HEPI Student Survey 2026

95%

of US college faculty expect AI to increase student overreliance; 90% expect it to diminish critical thinking.

Elon University · AAC&U, Jan 2026

86%

of students globally are already using AI regularly in their studies, with one in four using it daily.

Digital Education Council, 2024

38%

of faculty say AI has increased their workload — mostly from policing cheating and rebuilding assessments. Only 11% say it has decreased it.

Tyton Partners, Time for Class 2025

The instructor whose essays no longer tell her what her students think is not alone. Her problem is now the central operational problem of the entire sector.

This guide is about the field emerging in response to that problem. It does not yet have a settled name, but the most accurate one available is learning intelligence: the practice of generating and interpreting trustworthy evidence of how learning is happening, not just what was finally submitted, so that teachers can teach, students can learn, and institutions can certify that something real took place between them.

The thesis

When output becomes cheap, evidence becomes everything.

Learning intelligence is not the same as learning analytics. It is not the same as AI tutoring. It is not the same as plagiarism detection. It borrows from all three, but it points somewhere none of them do: at the process of learning itself. That is where assessment was always pointing, before the post-war research university convinced itself that a stack of essays was a sufficient record of a mind at work.

This guide is written for the people who already know that something has to change: K-12 superintendents and deans, provosts and chief academic officers, instructional designers, and the teachers and faculty who are doing the actual work of teaching while the ground shifts under them. It walks through what we knew about learning before generative AI made the question urgent again, what the last four years actually did to the classroom, why the assessment crisis is a validity crisis and not a cheating crisis, what AI can and cannot do as a tutor and a colleague, and what a credible model of learning intelligence looks like for the institutions that have to live inside it.

The schools that thrive in the next decade will be the ones that learn how to see learning again.


Chapter 02 · Foundations

What we knew before the machines could write

The strange thing about the AI crisis in education is that almost everything we need to solve it is already in the research literature. The science of how people learn has not changed because chatbots got good at writing essays. What has changed is the price of pretending it doesn't matter.

Begin with three findings that nearly every cognitive scientist would put on the same short list.

Effortful processing builds memory

Robert and Elizabeth Bjork's work on desirable difficulties — the now-classic body of research showing that easier conditions during practice often produce worse long-term learning than harder ones — established a principle that has been replicated across decades of memory and cognition studies. Spacing practice, mixing problem types, generating answers before being told them, and being forced to retrieve information from memory all feel less productive in the moment and produce more durable learning in the long run. Ease is the enemy of encoding.
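To make "mixing problem types" concrete, here is a minimal sketch that contrasts a blocked practice order (every problem of one type, then the next) with an interleaved one built from the same problems. The topic grouping and function names are hypothetical and for illustration only; they are not drawn from the Bjorks' studies.

```python
from itertools import chain, zip_longest

def blocked(problems_by_topic: dict[str, list[str]]) -> list[str]:
    """All problems of one topic, then the next: the order that feels easier."""
    return list(chain.from_iterable(problems_by_topic.values()))

def interleaved(problems_by_topic: dict[str, list[str]]) -> list[str]:
    """Round-robin across topics so consecutive problems differ in type."""
    rounds = zip_longest(*problems_by_topic.values())  # pads shorter topics with None
    return [p for rnd in rounds for p in rnd if p is not None]

# Hypothetical problem set grouped by topic.
problems = {
    "area": ["area-1", "area-2", "area-3"],
    "volume": ["vol-1", "vol-2"],
    "slope": ["slope-1", "slope-2"],
}
print(blocked(problems))
print(interleaved(problems))
```

Both lists contain exactly the same problems; only the order differs. Per the research summarized above, the interleaved order is the one that feels less productive in the moment and tends to leave more durable learning.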

Active learning beats passive instruction

The largest meta-analysis on the question, Freeman and colleagues' 2014 paper in PNAS, pooled 225 studies of undergraduate STEM courses and found that student performance under active learning conditions improved by 0.47 standard deviations on exams and concept inventories, with failure rates 1.5 times higher under traditional lecture. The effect held across every discipline studied.

"These results support active learning as the preferred, empirically validated teaching practice in regular classrooms." Freeman et al., PNAS · 2014

Feedback is the strongest classroom lever we have

John Hattie and Helen Timperley's 2007 review in Review of Educational Research put the effect size of well-formed feedback on student achievement between 0.70 and 0.79 — extraordinarily large by educational-research standards. The crucial word is well-formed. Feedback that tells the student where they were trying to go, where they actually are, and what to do next outperforms feedback that just labels work as good or bad. Praise and grades, by themselves, are weak instructional tools; substantive, forward-looking feedback is a near-miracle.
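One way to read standardized effect sizes such as Freeman's 0.47 and Hattie and Timperley's 0.70 to 0.79 is Cohen's U3: the share of the comparison group that the average treated student outscores. The sketch below assumes both groups are normally distributed with equal spread, which real classroom data only approximate, and treats each figure as a simple standardized mean difference.

```python
from statistics import NormalDist

def percentile_of_average_treated(d: float) -> float:
    """Cohen's U3: fraction of the control group scoring below the average
    treated student, assuming normal distributions with equal SD."""
    return NormalDist().cdf(d)

for label, d in [("Active learning (Freeman et al.)", 0.47),
                 ("Feedback, low end (Hattie & Timperley)", 0.70),
                 ("Feedback, high end (Hattie & Timperley)", 0.79)]:
    print(f"{label}: d = {d:.2f} -> above {percentile_of_average_treated(d):.0%} of control")
```

On these assumptions, the average student in an active-learning section lands at roughly the 68th percentile of the lecture group, and well-formed feedback moves the average student to roughly the 76th to 79th percentile.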

Stack these findings together and the picture is clear. Learning is something a person does with effort, in interaction with material and with other people, under conditions where feedback can flow continuously and the student can act on it. Or as Paul Black and Dylan Wiliam wrote in their landmark 1998 monograph Inside the Black Box, summarizing the case for formative assessment:

"There is a body of firm evidence that formative assessment is an essential component of classroom work and that its development can raise standards of achievement. We know of no other way of raising standards for which such a strong prima facie case can be made." Black & Wiliam · 1998

That was twenty-eight years ago. Most universities still grade, above all, the final paper.

The hidden engine: self-regulated learning

There is a fourth, less tidy finding that turns out to matter enormously in the AI era. Learning, even in adults, is regulated by the learner. The body of research on self-regulated learning, built up over decades primarily by Barry Zimmerman, Ernesto Panadero, and their collaborators, treats learning as a cycle: students set goals, plan how to reach them, monitor their own understanding, adjust their strategies when they notice they are off track, and reflect afterward on what worked.

Students who do this well outperform students who don't, even controlling for prior achievement. The skills are teachable but rarely taught explicitly. And — this is the part that matters now — when a student offloads cognitive work to an AI, the most important thing they often offload is not the answer. It is the monitoring. They stop noticing what they don't understand.

The framework that named the practice: the 4Cs

In the early 2000s, the U.S.-based Partnership for 21st Century Skills, now hosted by Battelle for Kids, codified what eventually became known as the 4Cs: critical thinking, communication, collaboration, and creativity. The 4Cs are not a research result; they are a synthesis of what employers, educators, and policy bodies converged on as the durable, transferable capabilities every student should develop.

They are deliberately not a list of facts. They are a list of practices. You can't take a multiple-choice test on creativity. You have to do something creative, in front of someone who can judge it. This is the framework most schools claim to teach. It is also the framework most schools have the hardest time actually assessing — because the 4Cs are processes, and the conventional assessment infrastructure of higher education is built around products. The Association of American Colleges and Universities' VALUE rubrics, developed by faculty teams across more than two thousand institutions, are the most widely adopted attempt to bridge the gap.

The synthesis

Before ChatGPT, the research had already told us:

Learning is effortful, social, and continuous. The strongest tool a teacher has is timely, well-formed feedback inside an active task. The skills that matter most for a graduate's life are not facts but practices — critical thinking, communication, collaboration, creativity — and those practices have to be seen and judged over time. Assessment systems that compress all of this into a final product and a number are weak assessments even on their own terms.

What generative AI did was take that weakness and turn it into an emergency.


Chapter 03 · The constructs

The 4Cs, observed not assumed

The entire learning intelligence project depends on whether the 4Cs can be made observable. If "critical thinking" is a vibe, it cannot be evidenced and the field collapses into self-reporting. If it is a sequence of behaviors a person can be seen doing — well, then we can build something.

C₁

Critical thinking

How well a learner frames questions, evaluates evidence, reasons through alternatives, and revises judgments in response to counterevidence.

C₂

Communication

How well a learner makes ideas understandable and audience-appropriate across written, oral, and multimodal forms.

C₃

Collaboration

How productively a learner contributes to collective understanding, coordinates with peers, and incorporates feedback.

C₄

Creativity

How well a learner generates original combinations, explores alternatives, and iterates toward novel but useful solutions.

Critical thinking, in five visible moves

The VALUE rubric for critical thinking breaks the construct into five observable dimensions: explanation of issues, evidence (selecting and using credible information), influence of context and assumptions, the student's own position (with appropriate complexity), and conclusions and related outcomes. Each of those leaves traces. A student frames a question well or badly. They cite a source or invent one. They acknowledge a counterargument or steamroll past it. They reach a conclusion that follows from their evidence or one that overreaches. In a writing assignment, almost every one of these moves is visible in the draft history if anyone bothers to look.
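A minimal sketch of how those five dimensions could be recorded so that every score points back to a trace in the draft history. The 1 to 4 scale mirrors the VALUE rubrics' levels from Benchmark (1) to Capstone (4). The field names, and the rule that a score with no cited trace is rejected, are illustrative assumptions and not part of the AAC&U rubric.

```python
from dataclasses import dataclass

# The five critical-thinking dimensions named in the VALUE rubric.
DIMENSIONS = (
    "explanation of issues",
    "evidence",
    "influence of context and assumptions",
    "student's position",
    "conclusions and related outcomes",
)

@dataclass
class DimensionScore:
    dimension: str
    level: int   # 1 = Benchmark ... 4 = Capstone
    trace: str   # hypothetical pointer into the draft history, e.g. "draft 2, para 3"

def score_critical_thinking(scores: list[DimensionScore]) -> dict[str, int]:
    """Check that each dimension is scored exactly once, on a 1-4 scale,
    with a cited trace; return the levels in rubric order."""
    by_dim = {s.dimension: s for s in scores}
    if len(by_dim) != len(scores):
        raise ValueError("a dimension was scored more than once")
    unknown = set(by_dim) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    missing = set(DIMENSIONS) - set(by_dim)
    if missing:
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    for s in scores:
        if not 1 <= s.level <= 4:
            raise ValueError(f"{s.dimension}: level must be between 1 and 4")
        if not s.trace.strip():
            raise ValueError(f"{s.dimension}: no trace cited for this score")
    return {d: by_dim[d].level for d in DIMENSIONS}
```

The design choice is the trace requirement: a grader cannot record a judgment about, say, counterarguments without pointing to where in the drafts that judgment came from.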

Communication is behavioral too

The VALUE rubric for written communication asks about context and purpose, content development, genre and disciplinary conventions, sources and evidence, and control of syntax and mechanics. Every one of these is something a reader can see, and every one is something a draft history can reveal as a process. The student who revises for clarity is doing communication. The student who hands in a polished first draft they didn't write has submitted communication on the page without doing any of it in their head.

Collaboration is where assessment usually fails

Most universities give group grades. A group grade tells you almost nothing about which student did the collaborating. The VALUE rubric for teamwork instead asks about contributions to team meetings, facilitation of teammates' contributions, individual contributions outside meetings, fostering a constructive team climate, and responding to conflict. Each of these is observable in a peer-evaluation workflow, a shared document's revision log, or a structured peer-review system. The signal exists. We mostly don't capture it.
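As an illustration of "the signal exists", here is a sketch that summarizes one teamwork trace, individual contributions, from a shared document's revision log. The log format (author, characters added, characters deleted) is a hypothetical export, not any particular editor's API. Raw edit volume is at best a partial proxy for the rubric: it says nothing about facilitation, team climate, or conflict.

```python
from collections import defaultdict
from typing import NamedTuple

class Revision(NamedTuple):
    # Hypothetical revision-log record.
    author: str
    chars_added: int
    chars_deleted: int

def contribution_shares(log: list[Revision]) -> dict[str, float]:
    """Share of total edit volume (added + deleted characters) per author."""
    volume: dict[str, int] = defaultdict(int)
    for rev in log:
        volume[rev.author] += rev.chars_added + rev.chars_deleted
    total = sum(volume.values())
    if total == 0:
        return {author: 0.0 for author in volume}
    return {author: v / total for author, v in volume.items()}

log = [
    Revision("ana", 1200, 100),
    Revision("ben", 300, 0),
    Revision("ana", 200, 400),
    Revision("chi", 1800, 0),
]
print(contribution_shares(log))
```

A group grade hides exactly this distribution. The log exposes it, although an instructor still has to judge what the edits actually were.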

Creativity, redefined as iteration

The OECD's PISA 2022 creative thinking assessment — the first time creativity has been measured at international scale, across 64 countries — defines the construct as the capacity to generate diverse and original ideas, and to evaluate and improve upon ideas, in open-ended tasks across written, visual, scientific, and social problem-solving domains. The framework is deliberately not about a single output. It is about whether a student can generate alternatives, recognize promising ones, and improve them iteratively.

AI can produce a thousand decent ideas in a minute. What it cannot do, yet, is sit inside a learner's head and develop their ability to discriminate between them.

The hidden fifth: metacognition

There is a fifth capability that increasingly belongs in this list, even though it is not part of the original 4Cs: metacognition and self-regulated learning. This is the practice of monitoring your own understanding, planning your approach, noticing when you're stuck, and choosing a better strategy. In a world where students can outsource almost every other cognitive operation to a machine, the one thing they cannot outsource is knowing whether they have actually understood something.

The work of Singh and colleagues at ASIS&T in 2025, which embedded metacognitive prompts into a generative AI search workflow, found that students who were nudged to evaluate the AI's answers — to ask themselves whether what they were reading actually addressed their question — engaged in deeper inquiry and were more discerning about AI responses. The intervention was small. The implication is not.

The 4Cs, plus metacognition, are the right scaffolding for the rest of this guide. They are not abstract. They are practices. They produce evidence. And the practices are exactly what generative AI most threatens to collapse if they are not deliberately preserved.


Chapter 04 · History

Four years that broke the model

The story of GenAI in education from the end of 2022 to the spring of 2026 is the story of an entire sector going through the stages of grief in under four years. The policy and product responses still alive today were forged in specific moments, and the moments still shape what is possible.

Nov 30, 2022

Release.

OpenAI publishes ChatGPT, a free conversational interface on top of GPT-3.5. There is no official "education launch." There doesn't need to be one. Within days, students discover that the chatbot will write a passable essay on almost any topic in any voice, and the news travels through TikTok faster than any administrator can respond.

Jan 2023

Panic.

New York City Public Schools, the largest district in the United States, blocks ChatGPT on school networks. Multiple universities follow. Op-eds appear under headlines like "The College Essay is Dead." Some instructors switch to handwritten in-class essays. Most do nothing different because the semester is already underway.

Feb 2023

The detection arms race begins.

Turnitin announces that AI writing detection is coming to its products. Within months, a parallel industry of "AI humanizers" appears. Students begin running their AI-generated text through second AI tools to bypass detection. The detection market grows. The arms race accelerates.

May 2023

The first major policy document.

The U.S. Department of Education's Office of Educational Technology publishes Artificial Intelligence and the Future of Teaching and Learning. Its core message: AI should "augment human intelligence, not replace it," and the right response is not bans but informed adoption. The phrase "humans in the loop" appears repeatedly.

Aug 2023

Vanderbilt disables Turnitin's AI detector.

In one of the most consequential institutional decisions of the year, Vanderbilt publicly turns off Turnitin's AI writing detection. The rationale is unusually candid. A 1% false positive rate would mean roughly 750 of Vanderbilt's 75,000 annual submissions being wrongly flagged. AI detectors were also more likely to flag text by non-native English speakers. The detection-first strategy started losing credibility a year before it started losing in court.

Sep 2023

UNESCO weighs in.

UNESCO publishes the first global guidance document on generative AI in education. It calls on governments to regulate use, protect student data, set age limits, and build AI literacy into curricula — framing the issue as a curriculum and pedagogy issue, not just a cheating issue.

Nov 2023

TEQSA reframes assessment.

Australia's Tertiary Education Quality and Standards Agency publishes Assessment Reform for the Age of Artificial Intelligence. It is the first national regulator to move the conversation from detection to redesign. The premise: the assurance of learning is the institution's responsibility, and it cannot be discharged by trying to catch AI use. It has to be discharged by designing assessments that produce evidence AI cannot easily fake.

2024

Normalization.

The Digital Education Council's first global student survey finds 86% of students using AI regularly; half do not feel "AI ready." OECD releases PISA 2022 creative thinking results, the first internationally comparable measure of creative capability across 64 countries. Singapore tops the rankings. The conversation shifts from "how do we ban this" to "what should students actually be able to do."

2025

The institutional response hardens.

EDUCAUSE's 2025 AI Landscape Study finds 57% of higher-ed institutions treat AI as a strategic priority (up from 49%), and 74% are focused on academic integrity. Tyton Partners' Time for Class 2025 finds 38% of faculty reporting increased workload from AI versus 11% reporting decreased.

Jun 2025

The Kestin RCT.

Kestin et al. publish a randomized controlled trial in Scientific Reports in which roughly 180 Harvard physics students alternated weekly between in-class active learning and homework using a custom-built AI tutor. The AI tutor produced learning gains roughly twice as large as the active-learning sessions, along with higher engagement and motivation. The first major evidence that pedagogically designed AI can outperform what was previously considered the gold standard of in-person instruction.

Jul 29, 2025

Study Mode.

OpenAI launches "Study Mode" in ChatGPT, built in consultation with pedagogy experts from over 40 institutions. Instead of producing direct answers, Study Mode uses guiding questions and Socratic prompting. The first major signal that the platform layer recognizes the difference between answering a question and teaching the person who asked.

Jan 2026

OECD names the problem.

The OECD Digital Education Outlook 2026 introduces the phrase that will probably define the policy conversation for the rest of the decade: false mastery. Students who practiced math with a generic chatbot performed better in the moment but scored up to 17% worse on subsequent closed-book exams than peers who studied alone. The output looked like learning. The learner had not actually learned.

Jan 2026

Faculty hit a wall.

The Elon/AAC&U survey of 1,057 faculty publishes. 95% expect AI to increase student overreliance. 90% expect it to diminish critical thinking. 83% expect it to shorten attention spans. The faculty, three years in, are not enthusiastic about how this is going.

Mar 2026

HEPI publishes the new normal.

HEPI's 2026 student survey finds that AI use among UK undergraduates is now near universal, with 88% using GenAI for assessed work. The question fully shifts. It is no longer, will students use AI? It is, what are we able to certify about what they learned?

The arc

Panic → ban → detect → regulate → integrate → re-examine.

The sector has not landed yet. But the place it is landing on, the one almost every policy body and serious researcher is now pointing toward, is the same place. Not better detection. Not faster grading. Not bigger LMS dashboards. Better evidence of learning, captured during the learning itself.

That place is what learning intelligence is for.


Chapter 05 · The crisis

When output becomes cheap

There is a moment in the life of any measurement system when the thing it measures stops being scarce, and the system stops being useful. The assessment crisis in education is that moment.

For most of the last century, a well-written paper was a reasonable proxy for a literate mind. It took hours of reading, drafting, revising, and re-reading to produce. The labor and the cognition were entangled. To submit the paper was, with allowances for cheating, also to have done the thinking. The artifact was the evidence.

That entanglement is what generative AI breaks. The price of a polished paragraph has collapsed. The price of a coherent five-page argument has collapsed. The price of a passable C+ undergraduate essay has collapsed approximately to zero.

A thermometer that returns the same number whether or not anything is hot has stopped being a thermometer.

This is not a moral observation. It is a measurement observation. Whatever those artifacts used to certify, they no longer certify in the same way.

The validity crisis, not the cheating crisis

The polite name for this in the assessment literature is the validity crisis. The OECD's "false mastery" finding is one way to describe it. Students who practiced with a generic chatbot scored better during practice but up to 17% worse on later closed-book exams than peers who studied alone. That pattern is a direct empirical demonstration that the metric (the practice score) is decoupling from the construct (what the student actually knows).

The research on cognitive offloading, much of it published in 2025, makes the mechanism more specific. When students hand cognitive work to AI tools, they engage in less of the self-monitoring and effortful retrieval that produce durable learning. In recent studies, frequent AI use correlates with weaker critical thinking, with cognitive offloading as the statistical mediator. The effect is strongest in younger students and those with less academic experience, the populations whose learning was most fragile to begin with.

Why detection isn't the answer

The reflex response — find the AI text, punish it, restore the old contract — has been tested at scale and is not working. Vanderbilt's public reasoning for disabling Turnitin's AI detector in 2023 is still the cleanest statement of why:

False positive rates that look acceptable in a vendor deck (1%, for example) translate into hundreds or thousands of wrongly accused students at institutional scale.
Non-native English speakers are systematically more likely to be flagged.
Detectors look at one snapshot of text, while AI generation models update continuously and rapidly outpace what detectors can learn.
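The arithmetic behind the first point is worth making explicit, because it gets worse once you ask what share of flags are wrong. A minimal sketch: the 75,000 submissions and the 1% false positive rate are Vanderbilt's figures; the share of submissions that actually contain AI text and the detector's true positive rate are hypothetical parameters an institution would have to estimate.

```python
def detector_outcomes(submissions: int, ai_share: float,
                      fpr: float, tpr: float) -> dict[str, float]:
    """Expected flag counts for a text detector at institutional scale.
    ai_share and tpr are assumptions, not measured values."""
    ai_written = submissions * ai_share
    human_written = submissions - ai_written
    false_flags = human_written * fpr   # honest work wrongly flagged
    true_flags = ai_written * tpr       # AI text correctly flagged
    flagged = false_flags + true_flags
    return {
        "false_flags": false_flags,
        "true_flags": true_flags,
        "share_of_flags_wrong": false_flags / flagged if flagged else 0.0,
    }

# Vanderbilt's framing: every submission human-written, 1% false positive rate.
print(detector_outcomes(75_000, ai_share=0.0, fpr=0.01, tpr=0.9))
# Hypothetical: 5% of submissions AI-written, detector catches 80% of them.
print(detector_outcomes(75_000, ai_share=0.05, fpr=0.01, tpr=0.8))
```

In the second, hypothetical scenario about 19% of all flags, roughly one in five, still land on honest work. A flag on its own cannot tell an instructor which case she is looking at.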

Multiple recent papers, including Garland's 2026 mathematical framing of the detection problem, argue that text-only one-shot detection is structurally incapable of achieving the fairness properties educational institutions need. Even Turnitin itself has shifted its public messaging, repositioning AI detection as one signal among many rather than a determinant of misconduct.

The deeper problem

The detection failure is not an accident. It reflects a deeper truth: AI text is not, in any robust technical sense, distinguishable from human text. It is a category of writing, and that category is converging on the same prose features the academy taught us to value: fluency, clarity, organization, conventional grammar, formal register. We trained machines on those features and then asked them to produce text that maximizes them. It should not surprise us that the result is hard to distinguish from the writing of students whose teachers asked them to maximize the same features.

What follows from this is uncomfortable but unavoidable. The artifact-only model of assessment was never very strong. It survived because the cost of producing acceptable artifacts was high enough that the artifact and the learning were in practice yoked together. Once the yoke is removed, the artifact stops being able to do the assessment job.

What replaces it

An evidence model that doesn't depend on the artifact being scarce.

The answer cannot be "go back to in-person handwritten exams forever." Handwritten exams have their own validity problems, and few graduates of any of these institutions will do their professional work without AI tools. The answer cannot be "trust the student" either, not because students are dishonest but because the question is about evidence, not honesty.

The answer has to be a different evidence model. One that can absorb the fact that AI is everywhere, in every step of the work, and still produce something an instructor can act on, a student can learn from, and an institution can defend.


Chapter 06 · The split

There are two AIs

It is tempting to conclude that generative AI is bad for learning. Many faculty have. The Elon/AAC&U numbers describe a faculty population that has seen, up close, what unconstrained AI use looks like. The evidence is more complicated than the faculty mood.

AI is not the variable. Pedagogy is.

There appear to be two AIs, distinguished not by which model is running but by how the model is used. One AI helps learners. The other replaces them. The same chatbot can do both within the same hour.

The amplifier case

The case for AI as a learning amplifier is real and growing. The Kestin et al. randomized controlled trial in Scientific Reports found that students using a carefully designed physics AI tutor (short responses, expert scaffolds, explicit step-by-step reasoning, guardrails against giving away answers) achieved learning gains roughly twice as large as those of students in an active-learning classroom.

The 2025 meta-analyses on ChatGPT and academic performance, and the 2026 meta-analyses on GenAI and educational outcomes in higher education, point in the same direction on average: AI-supported learning interventions produce significant positive effects on achievement, particularly when the intervention scaffolds the learning process rather than substituting for it.

The erosion case

The case for AI as a learning erosion mechanism is equally real. The OECD Outlook's "false mastery" finding, the cognitive-offloading research, the workload survey showing faculty spending more time policing AI than teaching, the studies showing students who lean heavily on AI tools score worse on subsequent unaided assessments — these are not noise. They describe what happens when AI is used without pedagogical structure.

AI is not a pedagogy. It is an amplifier of whichever pedagogy you bring to it.

Good designs become better. Bad designs become much worse. An assignment that was already a poor test of student understanding becomes a near-zero test of student understanding when AI is added. An assignment that was already focused on the process of thinking, with scaffolding and feedback and revision, can become much more powerful with AI as the tireless second reader.

Learning work vs. output work

What separates the two cases is whether the AI is being used to do the learning work or the output work.

Learning work

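AI asks the student a question, makes them try, shows them where they went wrong, encourages them to try again, models the reasoning, then steps back. The student's neural circuits for the topic are exercised. Their schema is built. Their ability to do the work without the tool improves. This is what Bloom's famous "two sigma problem" was about: one-to-one tutoring lifted the average tutored student roughly two standard deviations above conventionally taught peers, and the problem was finding methods that could come close at scale. The Kestin study's roughly two-fold gain over active learning is measured differently, so it is not the same number, but it points in the same direction.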

Output work

The student types the prompt, takes the result, lightly edits it, and submits. The artifact looks the same as it did before, but no learning has happened. The student's circuits for thinking through the problem have not been exercised. Their ability to do the work without the tool has not improved and may have actively decayed. This is what cognitive offloading looks like in practice.

Can AI be empathetic enough?

There is a third case worth naming, which is the question of whether AI can do the relational work of education. The honest answer from the research is partial. The 2024 systematic review by Sorin and colleagues in JMIR found that large language models can demonstrate elements of cognitive empathy — recognizing emotional content, producing supportive-sounding responses, sometimes outperforming rushed humans on perceived bedside manner. But they do not feel with the learner, and their "empathy" is prompt-sensitive, surface-level, and easily destabilized.

A 2024 quasi-experimental study in online higher education found that empathic chatbot feedback was comparable to teacher feedback on learning performance in that specific context. But a 2025 study on the "emotional cost of educational chatbots" found that students using a chatbot during an assignment reported significantly lower positive affect than peers who did not. A 2026 study on the "AI empathy choice paradox" found that people generally prefer to receive empathy from humans, even while rating AI-generated empathy as high quality when they receive it.

Role-specific, not categorical

AI can be empathetic enough for some roles, not for others.

AI is empathetic enough for first-line support, formative feedback, low-stakes encouragement, structured Socratic prompting, and some tutoring at large scale. It is not empathetic enough to replace faculty mentorship, the relational trust that underwrites belonging and challenge, high-stakes advising that shapes a student's life, or the moral seriousness that good teachers bring to difficult conversations.

Empathy is, in the end, a feature of accountability between persons. A chatbot is not accountable in that sense, and pretending otherwise is the same category mistake that has tripped up every wave of educational technology since the teaching machine.


Chapter 07 · The reframe

From artifact to process: a new evidence model

If the artifact-only model of assessment is broken, what replaces it? The research literature offered an answer well before generative AI made it urgent: evidence-centered design. Build assessments around the evidence you actually need to make the claims you want to make.

The more steps you observe, the more confidently you can read what the student knows.

Three converging literatures

Evidence-centered design, from Robert Mislevy and colleagues' work in the late 1990s, treats assessment as an argument structure. The argument starts with claims (what we want to say a student knows or can do), specifies the evidence that would support those claims, and only then designs tasks that elicit that evidence. The grade is the conclusion of an argument, not the start of one.
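A minimal sketch of that argument structure as data, assuming a simplified three-part model (claims, evidence, tasks) rather than the full set of models in Mislevy's framework. It checks the two links the design order implies: every claim names at least one kind of evidence, and every kind of evidence is elicited by at least one task. All names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    evidence: list[str] = field(default_factory=list)  # evidence ids that would support it

@dataclass
class Task:
    name: str
    elicits: set[str] = field(default_factory=set)  # evidence ids this task produces

def design_gaps(claims: list[Claim], tasks: list[Task]) -> list[str]:
    """Return readable gaps in the claim -> evidence -> task chain."""
    gaps = []
    elicited = set().union(*(t.elicits for t in tasks))
    for claim in claims:
        if not claim.evidence:
            gaps.append(f"claim has no evidence: {claim.text}")
        for ev in claim.evidence:
            if ev not in elicited:
                gaps.append(f"no task elicits '{ev}' (needed for: {claim.text})")
    return gaps

claims = [
    Claim("evaluates counterevidence", ["revision responds to counterargument"]),
    Claim("selects credible sources", ["annotated bibliography"]),
]
tasks = [
    Task("final essay"),  # a lone artifact elicits none of the listed evidence
    Task("staged draft", {"revision responds to counterargument"}),
]
for gap in design_gaps(claims, tasks):
    print(gap)
```

The final essay on its own elicits nothing the claims ask for, which is the artifact-only problem in miniature. The gap report says which task is missing before any student is graded.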

Stealth assessment is the practice of capturing evidence of learning during an authentic activity, rather than interrupting the activity with separate tests. Developed primarily in educational games and simulations, the principle generalizes: the evidence is built into the experience, not bolted on.

Process data in large-scale assessments captures not just student answers but the sequences of actions that produced them — which items they returned to, how long they spent on each step, how they revised. The PISA 2025 "Learning in the Digital World" framework treats iterative knowledge building and effective self-regulation with digital tools as integral parts of the competence being assessed.
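To show what "sequences of actions" can yield, here is a sketch that derives two features the paragraph mentions, time on each item and returns to earlier items, from a timestamped log of item openings. The log format and the explicit session-end time are hypothetical simplifications; real assessment platforms record far richer events.

```python
from collections import Counter, defaultdict

def process_features(opens: list[tuple[float, str]],
                     session_end: float) -> dict[str, dict[str, float]]:
    """opens: (timestamp_seconds, item_id) each time the student opens an item,
    in chronological order (hypothetical format). Time on an item runs until
    the next opening, or until session_end for the last one."""
    time_on: dict[str, float] = defaultdict(float)
    visits: Counter = Counter()
    boundaries = [t for t, _ in opens[1:]] + [session_end]
    for (t, item), t_next in zip(opens, boundaries):
        time_on[item] += t_next - t
        visits[item] += 1
    return {item: {"seconds": time_on[item], "returns": visits[item] - 1}
            for item in visits}

opens = [(0, "q1"), (40, "q2"), (95, "q1"), (130, "q3")]
print(process_features(opens, session_end=200))
```

Two students with identical final answers can produce very different logs: one who returned to q1 after attempting q2 is showing a revision behavior that the answer sheet alone never records.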

The more steps you can observe between a student's first encounter with a problem and their final answer, the more confidently you can certify what they learned.
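Bayesian Knowledge Tracing, a standard student-modeling technique, makes that sentence quantitative. It is swapped in here for illustration, since the guide does not name a specific model: each observed step updates the estimated probability that the student has mastered a skill. The guess, slip, and learning-rate values below are hypothetical defaults, not estimates from any study.

```python
def bkt_update(p_mastery: float, correct: bool,
               guess: float = 0.2, slip: float = 0.1, learn: float = 0.1) -> float:
    """One Bayesian Knowledge Tracing step: condition on the observed response,
    then allow for learning before the next opportunity.
    guess/slip/learn are hypothetical parameter values."""
    if correct:
        likelihood_if_mastered, likelihood_if_not = 1 - slip, guess
    else:
        likelihood_if_mastered, likelihood_if_not = slip, 1 - guess
    numerator = p_mastery * likelihood_if_mastered
    posterior = numerator / (numerator + (1 - p_mastery) * likelihood_if_not)
    return posterior + (1 - posterior) * learn

def trace(observations: list[bool], prior: float = 0.3) -> list[float]:
    """Mastery estimate after each observed step."""
    estimates, p = [], prior
    for obs in observations:
        p = bkt_update(p, obs)
        estimates.append(p)
    return estimates

print([round(p, 2) for p in trace([True, False, True, True])])
```

With these parameters, a single correct answer moves the estimate only partway, and one error pulls it most of the way back. It takes several consistent observations before the estimate approaches certainty, which is the argument for observing more steps.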

The new menu of assessment patterns

The most robust assessment designs in the AI era are the ones that have been recommended for decades by serious assessment researchers but rarely adopted at scale because they are labor-intensive. Today, they are also the only ones whose evidence still holds up.

Staged writing assignments

Where the draft history is itself evidence. Students submit a planning document, outline, annotated bibliography, first draft, revision memo, and final draft, with feedback exchanges between each. The grade considers the trajectory, not just the endpoint. AI can be used at any stage and the system still produces signal about what the student actually engaged with.
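One crude but concrete way to read a trajectory is to measure how much each stage changed the text. This sketch uses Python's standard difflib.SequenceMatcher. The stage names and the idea of reporting a "share changed" per stage are assumptions for illustration; a number like this tells an instructor where to look, and it grades nothing.

```python
from difflib import SequenceMatcher

def stage_changes(drafts: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """For each consecutive pair of (stage, text), return the share of text changed.
    0.0 means identical; values near 1.0 mean largely rewritten."""
    changes = []
    for (_, before), (stage, after) in zip(drafts, drafts[1:]):
        similarity = SequenceMatcher(None, before, after).ratio()
        changes.append((stage, 1 - similarity))
    return changes

# Hypothetical draft history for one student.
drafts = [
    ("first draft", "Cities should ban cars because cars are bad."),
    ("revision", "Cities should restrict cars downtown, because traffic deaths fall when they do."),
    ("final", "Cities should restrict cars downtown, because traffic deaths fall when they do."),
]
for stage, changed in stage_changes(drafts):
    print(f"{stage}: {changed:.0%} changed")
```

Zero change between revision and final draft is itself worth a conversation, and so is a final draft that shares almost nothing with the drafts that came before it.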

Inquiry-driven discussion

Where the quality of the questions a student asks is itself a measurable thing. The Packback platform's Curiosity Score, for example, is not a generative AI scoring system; it is an algorithmic measure of the open-endedness, sourcing, and clarity of student-posed questions. A peer-reviewed study of 2,800 long-form assignments found that AI-assisted process feedback improved writing quality and reduced grading workload. The mechanism was process visibility, not output evaluation.

Social annotation

Where students mark up shared readings in front of each other before class. The annotation behavior is the evidence. A study of the Perusall platform, published in Frontiers in Education, found that pre-class annotation grades and discussion behavior together accounted for 41.8% of the variance in students' weekly post-class essay performance. That makes annotation behavior one of the more diagnostic signals available in undergraduate teaching.

Peer review with calibration

Where students assess each other's work using rubrics and are themselves assessed on the quality of their feedback. The review itself becomes an assessment of collaboration. Done well, it produces evidence about both the reviewed student and the reviewer.

Oral defense components

Where students explain their work and answer follow-up questions. It is the oldest and most trusted assessment in the academy, the one PhD committees rely on precisely because a dissertation cannot easily be certified just by reading it. A student who had AI generate an argument is poorly placed to defend it live.

Reflective process notes

Where students articulate what they tried, what they learned, where they got stuck, and what they would do differently. Evidence of metacognition. Also, when done honestly, the cheapest learning intervention any course can add.

Explicit AI use disclosure

Where students describe what AI tools they used, for what purposes, and how they evaluated the results. The assessment-design move that does the most work for the least cost. It does not depend on detection; it does not depend on trust. It makes AI use a legible part of the assignment.

What all of these have in common is that they multiply the number of points where the student is visibly thinking. Each individual point is not necessarily harder to fake than a final essay. But together, they form a pattern that takes much more work to fake than to do, and that produces a much richer record for the instructor to read.

CH 08 / 14

Chapter 08 · Definition

Defining learning intelligence

It is time to be specific.

Working definition

Learning intelligence is the continuous collection and interpretation of evidence about how learning is happening, used to improve teaching, learning, and assessment.

The definition is one sentence long for a reason. The longer versions tend to obscure what the field is actually for. Learning intelligence is not a technology. It is a practice. The technology exists to enable the practice, the way the microscope exists to enable biology.

What it isn't

Several other terms are in the air, and the differences matter:

Learning analytics

The established academic field, re-codified in 2025 by SoLAR as "the collection, analysis, interpretation and communication of data about learners and their learning that provides theoretically relevant and actionable insights." Learning analytics is the intellectual parent. The difference is that classical learning analytics has tended toward retrospective dashboards (engagement, time-on-task, click counts), often weakly connected to specific learning constructs. Learning intelligence is more pedagogically opinionated, more assignment-native, and more focused on inferring constructs than on reporting engagement.

AI in education

The broader category, including AI tutors and administrative chatbots. Learning intelligence is a specific use case within it. The distinguishing question is whether the system's primary output is evidence a human can act on.

Plagiarism & integrity detection

The category that has tried to colonize the assessment problem from the integrity side. Detection, and AI-text detection in particular, is a small, narrow, and increasingly discredited slice of what assessment needs. A learning intelligence system may include some integrity signals, but its main job is positive: showing what students did, not just flagging what they might not have done.

Adaptive learning

About adjusting content delivery to individual students. Learning intelligence is about understanding what a student's work reveals about their thinking. Adjacent, not identical.

Four emerging variants

Within the learning intelligence space itself, four overlapping variants are emerging:

Instructional intelligence

Mostly K-12. AI embedded in curriculum and lesson delivery. Vendors like Kiddom and Subject. Signal: engagement and standards-aligned progress.

Authorship intelligence

Systems capture evidence of writing process, draft history, AI use, and revision behavior at the assignment level. Cadmus, Turnitin Clarity, Brisk's writing replay. Signal: process provenance.


Institutional intelligence

LMS and campus platforms aggregate outcomes across courses. Canvas Intelligent Insights, D2L Achievement+, Anthology's analytics suite. Signal: which courses and student segments are at risk?

Process-native intelligence

Systems that capture evidence of student thinking inside high-cognition assignments — discussion, inquiry, writing, peer review, oral defense. Packback, FeedbackFruits, Perusall, Kritik. The variant with the most upside.

Five principles

What these variants share is a set of design principles. The following five, taken together, distinguish a learning intelligence system from a generic AI tool or analytics dashboard.

Process over artifact

The primary unit of analysis is what the learner did, not just what they submitted. A system that only ingests final submissions cannot do learning intelligence in any robust sense.

Constructs over clicks

Raw events are inputs, not outputs. A learning intelligence system maps events to learning constructs (critical thinking, revision quality, collaboration depth) using transparent evidence models. Engagement metrics are at best weak proxies for learning; surfacing them as if they were learning is engagement theater.
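One way to picture a transparent evidence model is as a lookup table from event types to constructs, applied so that every score carries the events that produced it. In this sketch the event names, construct labels, and weights are invented for illustration, and a real model would need to be validated before its scores meant anything. Events with no mapping, such as page views, are dropped rather than counted as learning, which is the constructs-over-clicks distinction in miniature.

```python
# Hypothetical evidence model: event type -> (construct, weight).
# Because the model is plain data, anyone can read exactly what counts as evidence.
EVIDENCE_MODEL = {
    "revision_addresses_feedback": ("revision quality", 2.0),
    "question_cites_source": ("critical thinking", 1.5),
    "reply_builds_on_peer": ("collaboration depth", 1.0),
}


def score_constructs(events, model=EVIDENCE_MODEL):
    """Map raw events to construct scores, keeping the event IDs behind each score.

    Event types the model does not recognize (page views, logins) are skipped
    rather than counted as learning.
    """
    scores, trace = {}, {}
    for event in events:
        rule = model.get(event["type"])
        if rule is None:
            continue  # a click with no construct mapping is not evidence
        construct, weight = rule
        scores[construct] = scores.get(construct, 0.0) + weight
        trace.setdefault(construct, []).append(event["id"])
    return scores, trace


events = [
    {"id": "e1", "type": "page_view"},
    {"id": "e2", "type": "question_cites_source"},
    {"id": "e3", "type": "revision_addresses_feedback"},
    {"id": "e4", "type": "question_cites_source"},
]
scores, trace = score_constructs(events)
print(scores)  # {'critical thinking': 3.0, 'revision quality': 2.0}
print(trace)   # {'critical thinking': ['e2', 'e4'], 'revision quality': ['e3']}
```

The returned trace is what lets an instructor ask "why" of any score and get back specific observed events.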

Evidence over scores

The system's primary output should be evidence a human can interpret, not a black-box score. When the system does produce scores, the scores should be defensible: the user should be able to ask "why" and get a useful answer that points back to specific observed behavior.

Transparency over surveillance

Students should know what is being captured, why, and how it will be used. Teachers should be able to see and challenge the system's inferences. Institutional governance — privacy, retention, role-based access, audit logging — is part of the system, not an afterthought. This is a precondition of trust, and trust is a precondition of any assessment that actually changes student behavior.

Human-in-the-loop over autonomous judgment

The system supports instructor judgment; it does not replace it. High-stakes decisions — grades, integrity findings, intervention referrals — remain with humans. The system's job is to make those decisions better-informed, not to make them automatically.

Does it help the people in the room understand what is happening between them? That is the test for whether something is learning intelligence or just edtech with AI features.

CH 09 / 14

Chapter 09 · Anatomy

How the platform actually works

Definitions move quickly to abstraction. It is worth being concrete about the shape a learning intelligence system actually takes, because most products marketed under the "learning intelligence" banner over the next two years will be something else wearing the label. A serious platform sits on three layers, and a buyer can tell whether a product is real by asking which of the three are present.

Signals flow upward through the stack. No layer works on its own.

Layer 03

Synthesis & insight

Rolls signals into views that match the audience. A student should see their own evidence portfolio — what they did, what feedback they received, where they grew, where they are stuck.

An instructor should see a per-section view that answers the three questions worth asking in any teaching moment: who needs help now, what evidence supports that inference, and where to intervene. A department chair or chief academic officer should see aggregated, privacy-preserving rollups of 4C coverage across a course, cohort, program, or institution, the kind of view an accreditor can read and a budget committee can fund against.

Layer 02

Assignment types as instruments

The vehicles through which evidence is produced. No single activity surfaces every construct the platform is trying to measure. Discussions surface inquiry quality and reasoning chains. Writing surfaces argument structure and revision behavior. Close reading and annotation surface comprehension. Peer review surfaces collaboration and meta-critical thinking. Team-based projects surface coordination. Oral defense surfaces explanation under pressure.

And a newer category — AI-integrated assignments — surfaces AI literacy itself: students prompt, evaluate, accept, reject, and reflect on AI use, and the entire trajectory becomes evidence the instructor can read. Whoever owns the assignment owns the learning intelligence.

Layer 01

Pedagogy and the constructs

The foundation. The platform has to know what it is trying to measure. The 4Cs are the most defensible anchor framework available, paired with what a growing number of practitioners now call AI literacy capabilities: judgment (knowing when to trust an AI output), explanation (knowing how to defend a choice an AI helped produce), coordination (working with AI as one collaborator among many), and agency (deciding when not to use it at all).

Without an explicit construct layer, every "insight" the system produces is a behavioral metric in search of meaning.

The three-layer test

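A buyer can diagnose any product in this space with three questions, one per layer: Does it know which constructs it is measuring? Does it host the assignments that produce the evidence? Does it turn that evidence into views the right audience can use?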

A platform that has only Layer 3 is a dashboard pretending to be intelligence. A platform that has only Layer 2 is a workflow tool. A platform that has only Layer 1 is a framework, not a system. The serious work of the next two years in this category is to assemble all three.
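The test is mechanical enough to write down. This hypothetical helper takes three yes/no answers, one per layer, and returns the labels used above. The text does not name the two-layer cases, so for those the sketch simply reports which layer is missing.

```python
def three_layer_test(has_constructs, has_assignments, has_synthesis):
    """Classify a product by which of the three layers it has.

    Layer 1 = pedagogy and constructs, Layer 2 = assignment types as
    instruments, Layer 3 = synthesis and insight.
    """
    layers = (has_constructs, has_assignments, has_synthesis)
    single_layer_labels = {
        (False, False, True): "dashboard pretending to be intelligence",
        (False, True, False): "workflow tool",
        (True, False, False): "framework, not a system",
    }
    if all(layers):
        return "all three layers present"
    if layers in single_layer_labels:
        return single_layer_labels[layers]
    if not any(layers):
        return "no layers present"
    # Exactly one layer is missing at this point.
    missing = next(n for n, present in enumerate(layers, start=1) if not present)
    return f"missing layer {missing}"


print(three_layer_test(False, False, True))  # dashboard pretending to be intelligence
print(three_layer_test(True, True, False))   # missing layer 3
```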

The assignment-type taxonomy, mapped to the 4Cs

The strength of a learning intelligence platform's coverage of the 4Cs depends on both the assignment types it can host and the constructs it can map them to. A platform that hosts only one or two assignment types (discussion alone, or writing alone) is by definition a partial measurement system, regardless of how good its analytics layer is.

Assignment type	What it evidences	Primary 4Cs
Inquiry-driven discussion	Question quality, reasoning chains, response uptake, source-grounded debate.	Critical Thinking · Communication
Staged writing	Argument structure, source use, revision behavior, feedback uptake across draft history.	Critical Thinking · Communication
Close reading & annotation	Comprehension, engagement with text, ability to surface confusion, social sense-making.	Critical Thinking · Communication
Peer review & calibration	Quality of feedback given, reciprocity, ability to read another's work the way an instructor would.	Collaboration · Critical Thinking
Team-based work	Contribution equity, coordination moves, conflict resolution, collective problem-solving.	Collaboration · Creativity
Conversational reasoning & oral defense	Explanation under pressure, audience adaptation, ability to defend a position in dialogue.	Communication · Critical Thinking
Co-writing with AI	What the student asked, what they accepted, what they rejected, and why, captured as evidence.	Critical Thinking · AI Literacy
AI-integrated assignments	Judgment about when to use AI, explanation of choices, agency about when not to. The trajectory is the assessment.	AI Literacy · all 4Cs
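Read as data, the taxonomy supports a quick coverage check. This sketch transcribes the primary constructs from the table above and treats coverage as a simple set union over the assignment types a platform hosts. That is an assumption: it records whether a construct is evidenced at all, not how deeply. The `four_c_coverage` name is hypothetical.

```python
FOUR_CS = {"Critical Thinking", "Communication", "Collaboration", "Creativity"}

# Primary constructs per assignment type, transcribed from the taxonomy table.
ASSIGNMENT_CONSTRUCTS = {
    "Inquiry-driven discussion": {"Critical Thinking", "Communication"},
    "Staged writing": {"Critical Thinking", "Communication"},
    "Close reading & annotation": {"Critical Thinking", "Communication"},
    "Peer review & calibration": {"Collaboration", "Critical Thinking"},
    "Team-based work": {"Collaboration", "Creativity"},
    "Conversational reasoning & oral defense": {"Communication", "Critical Thinking"},
    "Co-writing with AI": {"Critical Thinking", "AI Literacy"},
    "AI-integrated assignments": FOUR_CS | {"AI Literacy"},
}


def four_c_coverage(hosted_types):
    """Return (4Cs covered, 4Cs missing) for the assignment types a platform hosts.

    Coverage is a plain set union: it says whether a construct is evidenced
    at all, not how well.
    """
    covered = set().union(*(ASSIGNMENT_CONSTRUCTS[t] for t in hosted_types)) & FOUR_CS
    return covered, FOUR_CS - covered


covered, missing = four_c_coverage(["Inquiry-driven discussion", "Staged writing"])
print(sorted(missing))  # ['Collaboration', 'Creativity']
```

A platform that hosts only discussion and writing covers critical thinking and communication but leaves collaboration and creativity unmeasured, which is the partial-system case described above.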

A maturity model for the institution

Institutions, like products, do not arrive at full learning intelligence overnight. A useful way to think about adoption is as three progressive postures an institution can take toward AI in its assessments, each enabled by a different set of platform capabilities.

Stage 01 · Foundation

AI Aware

The institution recognizes that AI is in every classroom and makes its assessment more visible. Faculty use the 4Cs explicitly; assignments are designed with process visibility in mind. The institution no longer pretends AI is absent, but it is not yet teaching with AI or about it.

4Cs framework & curriculum mapping
Faculty enablement & assignment design
Discussions and writing assignments
Engagement insights, course-level analytics

Stage 02 · Transition

AI Active

The institution now treats AI use itself as an object of instruction: conversational reasoning, co-writing with AI, collaborative live thinking, and a metacognitive layer in which students reflect on what they asked and why. Faculty teach with AI rather than police it.

All Foundation capabilities
AI literacy curriculum & thought leadership
Conversational reasoning · co-writing assignments
Metacognitive layer · visible decision-making
Cohort-level insights · student journey view

Stage 03 · Intelligence

AI Native

AI is embedded throughout, and the institution measures not only the 4Cs but AI literacy itself. Full learning-journey visibility for faculty and administration. The institution can produce, on demand, defensible evidence of what its graduates can do with AI and without it.

All Transition capabilities
AI literacy outcomes measurement
Full assignment-type library
Learning-journey visibility for admin & faculty
AI literacy benchmarking across cohorts
Predictive engagement & risk signals

The point

"AI policy" is not a single decision. It is a position on a continuum.

The point of the maturity model is not that every institution should sprint to AI Native. The point is that the right position depends on faculty readiness, student population, governance maturity, and what the institution is trying to certify. A learning intelligence platform that respects this is one that meets the institution where it is and offers a clear path forward.

CH 10 / 14

Chapter 10 · Practice

How to teach with AI

The categorical answer to "how should I teach with AI" is "it depends," which is true and useless. The operational answer, drawn from the research evidence assembled so far, is more specific.

For instructors

→

Start with what you are trying to certify.

The first question for AI redesign is whether your intended learning outcomes are still meaningful in a world where AI can produce most of the artifacts that previously evidenced them. If "the student can write a coherent five-page essay" is the outcome, that outcome is now a weak proxy for what the course most likely cares about, which is something more like "the student can read closely, generate a thesis from evidence, defend it against an alternative reading, and revise based on critique." The redesign starts by sharpening the outcome.

→

Make AI use explicit and bounded.

The single highest-leverage move any instructor can make is to write an AI use policy into each assignment that specifies which AI tools may be used, at which stages, and with what disclosure. The policy can be permissive ("any AI tool, any stage, with a disclosure paragraph") or restrictive ("no AI tools for this exam"). What it cannot be is implicit. Implicit policies turn every assignment into a guessing game for students and a detection puzzle for instructors.

→

Stage the work.

Almost any high-cognition assignment can be broken into stages with checkpoints. A research paper can have a topic proposal, an annotated bibliography, an outline, a draft, a peer review exchange, a revision memo, and a final draft. AI cannot easily fake seven stages of legible thinking spread across two months.

→

Add a metacognitive layer.

The cheapest way to convert AI use from passive offloading into active learning is to require students to reflect on it. "What did you ask the AI? What did you accept? What did you reject? Why?" These questions, answered in a short reflective paragraph attached to every AI-permitted assignment, do an enormous amount of work.

→

Reintroduce the voice.

A five-minute oral defense after a major paper produces more evidence about whether the student understood their argument than another five pages of writing would. An AI can generate an argument, but it cannot sit in the room and answer follow-up questions on the student's behalf. A student who genuinely wrote their paper can usually defend it. A student who didn't usually can't.

→

Use AI as a feedback amplifier, not a grading machine.

Generative AI is best suited to the classroom role of patient, tireless, immediate first reader. AI feedback on a draft (what is the thesis, what is the strongest evidence, what is the weakest, what is missing) is often genuinely useful, and it can be delivered at 2 a.m. when the student is working. What AI is bad at is making the final grading judgment that affects the student's transcript and life. The Kestin study's effect size came from the AI doing the tutoring; the grading remained with the instructor.

→

Calibrate peer review.

Peer review done badly is busywork that students don't trust. Done well, it produces evidence of collaboration, communication, critical reading, and a powerful learning experience for the reviewer. The trick is calibration: train students on what good feedback looks like, give them rubrics, and assess them on the quality of their reviews as well as on their own work.

→

Talk about it openly with students.

This is the move faculty most often skip and most often regret. Students are not the enemy. Most of them want to learn. Many of them are confused, often legitimately, about what is and is not allowed. A short, honest conversation at the start of a course about why the assignments are designed the way they are does more to shape student behavior than any technical countermeasure.

The bar has moved

None of this is exotic. It is mostly old.

Before generative AI, instructors who used these practices were doing exceptional teaching. After generative AI, instructors who don't use them are doing weak assessment, regardless of whether they realize it.

The good news is that the bar has moved toward the kind of teaching most teachers came into the profession wanting to do. Less grading of artifacts that may or may not represent student thought. More conversation. More feedback. More visible learning. The AI era is a forcing function for a transition the research literature has been quietly recommending for thirty years.

CH 11 / 14

Chapter 11 · Institutions

The institutional layer

What individual faculty can do on their own has limits. The transition that learning intelligence describes is also an institutional one, and the institutions that get it right will do so at the system level, not course by course.

The institutional response has so far been mixed. EDUCAUSE's 2025 AI Landscape Study found 57% of higher-ed institutions now consider AI a strategic priority, up from 49% the previous year. The proportion with AI Acceptable Use Policies climbed from 23% to 39% in a single year. And yet only 9% reported that their cybersecurity and privacy policies were adequate for AI-related risks. The infrastructure is catching up to the urgency, slowly.

Five institutional priorities

Get clarity on what the institution wants to certify.

Most colleges have learning outcomes documents. Most of those documents are aspirational and rarely consulted. The first step toward a defensible AI-era assessment strategy is to take those outcomes seriously enough to ask, for each one, what evidence the institution is actually generating, and whether that evidence is robust to AI. This is a faculty-governance conversation as much as it is an administrative one. It cannot be outsourced.

Align assessment design at the program level.

A degree certifies what the program did, not what each course did. Programs need to map outcomes to courses, identify where each outcome is taught, practiced, and assessed, and ensure that the cumulative evidence supports the credential. A program where every course relies on take-home essays as its primary assessment is now structurally vulnerable. A program that distributes assessment across written, oral, project-based, and applied work is much more robust.

Fund the infrastructure for process evidence.

Capturing process evidence is more expensive than capturing artifact evidence. Draft histories take storage. Peer review systems take licenses. Oral defenses take time. The institutions that succeed will treat this as a capital expense, not an operating burden on individual instructors. The institutions that fail will quietly require faculty to do this work in addition to everything else they were already doing, at which point most faculty will quite reasonably refuse, and the assessment redesign will not happen.

Build the data governance before you need it.

Every learning intelligence system is also a student data system. The questions that matter — what data are collected, how long they are retained, who can see them, what inferences can be made — should be answered before the platforms are bought, not after. The Digital Education Council's 2024 survey found 80% of students said AI in universities was not fully meeting expectations, with 60% specifically worrying about the fairness of AI evaluations and 70% citing privacy. These are not abstract policy questions. They are trust prerequisites.

Invest in faculty development that respects faculty time.

The training that works is not generic AI literacy. It is discipline-specific assessment redesign, run by colleagues who teach similar courses, with concrete examples, model assignments, and time to revise actual syllabi. Institutions pairing instructional designers with faculty cohorts and offering course buyouts for redesign are seeing substantive curriculum change. Institutions that offered a webinar and a policy document are mostly still where they were three years ago.

The deeper question

The institution has to decide what business it is in.

The transactional model — pay tuition, complete assignments, receive credential — has been weakening for years, and AI accelerates the weakening. If the credential is going to mean anything in 2030, it will need to mean that the institution can credibly say what the holder of it can do. Which means the institution needs to have evidence. Which means it needs to invest in producing that evidence, deliberately, at scale, across its curriculum.

When output becomes cheap, evidence becomes everything. The institutions that act on this will be in a strong position. The ones that don't will find that their credentials are slowly losing their power to certify anything an employer or a graduate school cares about, and that nobody can quite say when it happened.

CH 12 / 14

Chapter 12 · The market

Where to look in the field

A category is more real when you can see who is building inside it. The market for learning intelligence and AI-era assessment is still forming, and the categories below are more useful than vendor names — most companies fit primarily inside one of them, and the test for any product is what it actually does, not what it markets itself as.

The five categories below were derived by reading the field across higher education and K-12 vendor materials, peer-reviewed efficacy research where it exists, and the institutional buying patterns visible in EDUCAUSE, Tyton, and HEPI data. Each category solves a different piece of the assessment-after-AI problem. Each has a place in a well-considered assessment architecture. None of them, on its own, is yet a complete answer.

The capability matrix

The matrix below is the fastest way to read the market. Categories run down the rows; the seven capabilities a learning intelligence platform needs to deliver run across the columns. A buyer can read it in under a minute and immediately see what each category gives them and what it leaves on the table.

Category	Process evidence	Construct mapping (4Cs)	Multiple assignment types	Faculty-readable	Accreditor-defensible	Institutional rollups	LMS integration
Process-native LI	●	●	●	●	◐	◐	●
Integrity-first & detection	◐	○	○	◐	◐	◐	●
LMS-native analytics	○	◐	○	◐	●	●	●
AI tutoring	◐	○	◐	○	○	○	◐
K-12 instructional	◐	◐	◐	●	○	◐	◐

● Full capability · ◐ Partial · ○ Not in the category
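The matrix can also be read programmatically. This sketch transcribes the symbols from the matrix above into Python (the names are hypothetical) and answers two buyer questions: what a single category leaves on the table, and which capabilities a combination of categories still fails to deliver in full. Treating a ● from any one category as covering that capability is a simplifying assumption; it ignores the integration work of actually combining products.

```python
FULL, PARTIAL, NONE = "●", "◐", "○"

CAPABILITIES = [
    "Process evidence", "Construct mapping (4Cs)", "Multiple assignment types",
    "Faculty-readable", "Accreditor-defensible", "Institutional rollups", "LMS integration",
]

# One row per category, in the same column order as CAPABILITIES.
MATRIX = {
    "Process-native LI": "● ● ● ● ◐ ◐ ●".split(),
    "Integrity-first & detection": "◐ ○ ○ ◐ ◐ ◐ ●".split(),
    "LMS-native analytics": "○ ◐ ○ ◐ ● ● ●".split(),
    "AI tutoring": "◐ ○ ◐ ○ ○ ○ ◐".split(),
    "K-12 instructional": "◐ ◐ ◐ ● ○ ◐ ◐".split(),
}


def left_on_table(category):
    """Return (partial capabilities, missing capabilities) for one category."""
    row = MATRIX[category]
    partial = [cap for cap, mark in zip(CAPABILITIES, row) if mark == PARTIAL]
    missing = [cap for cap, mark in zip(CAPABILITIES, row) if mark == NONE]
    return partial, missing


def not_fully_covered(categories):
    """Capabilities that no category in the combination delivers in full."""
    return [cap for i, cap in enumerate(CAPABILITIES)
            if all(MATRIX[c][i] != FULL for c in categories)]


print(not_fully_covered(["Process-native LI"]))
# ['Accreditor-defensible', 'Institutional rollups']
print(not_fully_covered(["Process-native LI", "LMS-native analytics"]))
# []
```

Pairing process-native learning intelligence with LMS-native analytics leaves no capability uncovered in this reading, which is consistent with the later suggestion to use LMS analytics as the destination that ingests process evidence from elsewhere.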

The five categories, explained

01 · The newest

Process-native learning intelligence

Products in this category capture evidence of student thinking inside specific high-cognition assignments — discussion, writing, peer review, oral defense, AI-integrated work — and map those signals to learning constructs like the 4Cs and AI literacy. They are pedagogically opinionated by design and are the only category architecturally aligned with the "process over artifact" thesis the rest of the field is now converging on.

Strength: Produce defensible evidence that learning happened, before the final artifact is submitted.

Gap: Cross-program institutional reporting at the scale of LMS analytics is still expanding. Most platforms stop at the assignment or course level.

When to look: When the question is "how do we know learning happened" and the artifact alone is no longer trustworthy. Most useful for institutions facing assurance-of-learning pressure.

Representative: Packback · Cadmus · FeedbackFruits · Perusall · Kritik

02 · The incumbent

Integrity-first and detection

The established category for academic-misconduct workflow. Products here detect similarity, identify likely AI-generated text, capture authorship transparency, and provide audit trails for hearing processes. The category leaders are now pivoting toward process visibility (notably Turnitin's Clarity product), recognizing that detection alone is structurally limited.

Strength: Surface signals that a piece of work may not be the student's own. Maintain the institutional integrity infrastructure accreditors still expect.

Gap: Cannot prove learning occurred (only flag where it may not have). Vanderbilt's 2023 decision to disable Turnitin's AI detector on fairness grounds was an early signal of waning institutional confidence; non-native English speakers continue to face 2–3× higher false-positive rates.

When to look: When the institution still needs an integrity workflow for high-stakes assessments, with full awareness of false-positive risk and an explicit policy that detection alone is not sufficient evidence for misconduct.

Representative: Turnitin · Originality.AI · GPTZero · Copyleaks

03 · The institutional layer

LMS-native analytics

The reporting infrastructure where provost- and CIO-level conversations already happen. Products here aggregate course-level data — submissions, grades, page views, login frequency — into dashboards that flag at-risk students and report on course health and outcome trends. They are where institutional accreditation reporting is already structured.

Strength: Provide the institutional view. Native data flow from existing LMS adoption. Already trusted by accreditors for outcome reporting.

Gap: Engagement counts are not learning measurements; participation and login frequency are weak proxies for whether anything has been learned. The 24–48 hour data lag also makes them retrospective by design, not real-time intervention tools.

When to look: For the institutional reporting layer, not as a substitute for assignment-level evidence. Best deployed as the destination that ingests process evidence from elsewhere.

Representative: Canvas Intelligent Insights · D2L Achievement+ and Lumi · Anthology Analytics for Learn

04 · The fastest-growing

AI tutoring

The category propelled by both consumer interest and pedagogically engineered systems. Products here scaffold individual study sessions, generate explanations, provide practice problems, and deliver immediate formative feedback. Some are extraordinarily effective when carefully designed; others are general-purpose chatbots in a study skin.

Strength: Personalize learning at scale. The Kestin et al. 2025 RCT in Scientific Reports showed a carefully designed AI tutor producing learning gains roughly twice those of in-class active learning.

Gap: AI tutoring is an input to learning, not a measurement of it. A student who learned a topic through a tutor still needs an assessment surface that captures whether the learning held without the tool.

When to look: For personalized study support, supplemental instruction, and outcomes-aligned remediation, not for grading, certification, or program-level assurance.

Representative: Khan Academy and Khanmigo · ChatGPT Study Mode · Squirrel AI · Carnegie Learning

05 · The K-12 wedge

K-12 instructional intelligence

The category that has most actively claimed the "learning intelligence" phrase, generally inside K-12 curriculum and lesson workflows. Products here align lessons to standards, deliver classroom-level visibility to teachers, and provide AI productivity tools for routine instructional tasks. The K-12 context — younger students, more standardization, less faculty autonomy — is structurally different from higher ed, and tools built primarily for it often do not generalize directly upward.

Strength: Standardize AI use across schools and districts. Reduce teacher workload on lesson planning and feedback generation. Strongest district-level adoption motion in the field.

Gap: Most are not yet evidence-architected for cross-program assurance in the way higher education will require. Strong in the classroom, weaker in the program-review or accreditation layer.

When to look: For K-12 specifically, especially districts standardizing AI policy and seeking teacher-productivity gains.

Representative: Kiddom · Subject · MagicSchool · Brisk

Convergence is happening

The five categories overlap more than they used to. Integrity vendors are adding process-visibility features. LMS vendors are partnering with AI providers to embed tutoring and feedback directly. Process-native platforms are extending upward into the institutional reporting layer. K-12 vendors are eyeing higher ed.

The convergence is not random. Every category is moving toward the same destination: a system that captures evidence of how learning is happening, maps it to defensible constructs, and produces interpretable reports for the people who need them. That destination is what this guide has been calling learning intelligence throughout. The categories represent different starting points, not different end states.

This matters for the buyer because it means a vendor's category of origin tells you what they will do best in 2026 — but the maturity of their other layers tells you what they will be able to do in 2028. Integrity vendors moving into process visibility, LMS vendors moving into construct mapping, and process-native vendors moving into institutional rollups are all making the same bet on the same destination. The question is which ones will actually arrive.

A buyer's tool

Eight questions to ask any vendor

Does the product capture process evidence, or only final artifacts?
Can the product map signals to specific learning constructs (such as the 4Cs and AI literacy), or only to engagement counts?
Can a faculty member see why a student was flagged or scored, in terms of observable behavior?
Can a student see their own evidence, and contest incorrect inferences?
Does the platform host multiple assignment types, or only one?
Is the data architecture defensible to an accreditor, with rollups by course, cohort, and program?
What student data is captured, how long is it retained, and who has access?
Is any high-stakes judgment — grades, integrity findings, intervention referrals — held by humans?

The vendor that can answer all eight questions clearly, in plain language, without sales evasion, is the vendor worth talking to longer. The vendor that responds to half of them by talking about how innovative its AI is has also answered the question you actually asked.

CH 13 / 14

Chapter 13 · Limits

Open questions and limits

Honesty requires acknowledging what the field of learning intelligence does not yet know, and what its risks are.

The research base is still young

Most of the strongest empirical studies on AI in higher education are short-term, concentrated in specific disciplines (typically language learning or introductory STEM), and built around carefully engineered interventions that may not generalize to the chatbot a typical student uses on a typical Tuesday night. The Kestin RCT is the strongest single piece of evidence for AI as a learning amplifier, and it is one study, in one course, with a custom-built tutor. Larger and longer trials are coming, but the field's current claims should be held with appropriate humility.

Process measurement is hard

The process-data and stealth-assessment literatures are robust as frameworks but uneven as implementations. Most existing edtech "process data" is engagement counting in disguise. The hard work of mapping events to constructs, validating those mappings, and demonstrating that the resulting inferences are fair across student populations is mostly still ahead of us.
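The gap between engagement counting and construct mapping can be made concrete. In this minimal sketch, the event types, the construct labels, and the mapping table are all hypothetical. Nothing here validates the mapping, which is the hard part the paragraph describes. What the sketch does show is the structural difference. A count treats every click alike. A construct map ignores events it has no stated reason to interpret, and it keeps the observed behavior behind each inference, so a faculty member can see why and a student can contest it.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    student: str
    kind: str    # hypothetical event type, e.g. "revision_after_feedback"
    detail: str  # plain-language description of what was observed


def engagement_counts(events):
    """Engagement counting: every event counts the same, whatever it shows."""
    return Counter(e.student for e in events)


# Hypothetical, unvalidated mapping from event types to constructs.
CONSTRUCT_MAP = {
    "revision_after_feedback": "critical thinking",
    "claim_with_source": "critical thinking",
    "substantive_peer_comment": "collaboration",
    "ai_use_disclosed": "AI literacy",
}


def construct_evidence(events):
    """Construct mapping: keep only interpretable events, with their details."""
    evidence = defaultdict(lambda: defaultdict(list))
    for e in events:
        construct = CONSTRUCT_MAP.get(e.kind)
        if construct is not None:  # unmapped events such as page views are dropped
            evidence[e.student][construct].append(e.detail)
    return {student: dict(by_construct) for student, by_construct in evidence.items()}


events = [
    Event("s1", "page_view", "opened the reading"),
    Event("s1", "page_view", "opened the reading again"),
    Event("s1", "page_view", "opened the rubric"),
    Event("s2", "revision_after_feedback", "rewrote thesis after a peer comment"),
    Event("s2", "ai_use_disclosed", "logged chatbot use for the outline"),
]
print(engagement_counts(events))   # s1 has more events than s2
print(construct_evidence(events))  # only s2 has construct-level evidence
```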

Affective AI is not ready

Multiple recent systematic reviews, including a 2026 review of 96 studies, have found that research on affective computing in education tends to infer engagement, confusion, and frustration from facial expressions using convolutional neural networks (CNNs). This work is often done without validation in real classrooms and almost always without serious ethical analysis. The safe use case is opt-in, low-stakes, instructor-facing support. Anything resembling automatic grading of "engagement" or "attention" from a webcam is not safe.

Equity outcomes are unclear

A within-subject writing experiment in 2025 found that all students benefited from AI assistance but less-skilled writers benefited more — suggesting AI could narrow some performance gaps. The DEC and HEPI surveys, on the other hand, show socioeconomic divides in usage patterns. Whether AI is a leveler or an amplifier of existing inequalities is probably context-dependent, and learning intelligence systems should be designed to monitor and audit their own equity effects rather than assume them.
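One simple check can show what "monitor and audit their own equity effects" might mean in practice. The check compares how often the system flags students across groups, for example for an integrity review or an intervention referral. This is a minimal sketch under stated assumptions. The group labels, the 5-percentage-point gap threshold, and the minimum group size of 30 are hypothetical placeholders. A flag-rate gap is only one of many equity signals an institution might track, and a gap is a prompt for human review, not proof of bias.

```python
from collections import defaultdict


def flag_rates(records):
    """records: iterable of (group, was_flagged) pairs. Returns rates and group sizes."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, was_flagged in records:
        totals[group] += 1
        flagged[group] += int(was_flagged)
    rates = {g: flagged[g] / totals[g] for g in totals}
    return rates, dict(totals)


def equity_audit(records, max_gap=0.05, min_group_size=30):
    """Flag groups whose rate exceeds the lowest group's rate by more than max_gap.

    Groups smaller than min_group_size are excluded as too noisy to compare.
    Both thresholds are hypothetical defaults, not recommended values.
    """
    rates, sizes = flag_rates(records)
    eligible = {g: r for g, r in rates.items() if sizes[g] >= min_group_size}
    if len(eligible) < 2:
        return {"status": "insufficient data", "sizes": sizes}
    lowest = min(eligible.values())
    over = sorted(g for g, r in eligible.items() if r - lowest > max_gap)
    return {"status": "review" if over else "ok", "rates": eligible, "over_threshold": over}


# Usage with synthetic data: group B is flagged at 22% versus 10% for group A;
# group C is too small to compare.
records = (
    [("A", i < 10) for i in range(100)]
    + [("B", i < 22) for i in range(100)]
    + [("C", True) for _ in range(10)]
)
print(equity_audit(records))
```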

Privacy is the chronic risk

Capturing process evidence at high fidelity creates real surveillance risks. The safe defaults are data minimization, bounded retention, redaction, role-based access, and auditable model use. A learning intelligence system that does not respect these defaults will produce backlash that may delay the entire field.
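The defaults named above can be expressed as policy in code. This is a minimal sketch, not a compliance implementation. The allowed fields, the one-year retention window, the two roles, and the email-only redaction rule are all hypothetical, and real redaction would need far more than one regular expression. The audit log here records data access. The same pattern could record each model call to cover auditable model use, which the sketch does not show.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical policy values; real ones come from institutional data governance.
ALLOWED_FIELDS = {"student_id", "assignment_id", "event_kind", "text", "timestamp"}
RETENTION = timedelta(days=365)
ROLE_FIELDS = {
    "instructor": {"student_id", "assignment_id", "event_kind", "text", "timestamp"},
    "program_analyst": {"assignment_id", "event_kind", "timestamp"},  # no identity, no text
}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def minimize(raw: dict) -> dict:
    """Data minimization: drop any field the policy does not allow."""
    return {k: v for k, v in raw.items() if k in ALLOWED_FIELDS}


def redact(record: dict) -> dict:
    """Redaction: mask email addresses in free text before storage."""
    if "text" in record:
        return {**record, "text": EMAIL.sub("[redacted email]", record["text"])}
    return record


@dataclass
class EvidenceStore:
    records: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def ingest(self, raw: dict) -> None:
        self.records.append(redact(minimize(raw)))

    def purge_expired(self, now: datetime) -> int:
        """Bounded retention: delete records older than RETENTION; return how many."""
        kept = [r for r in self.records if now - r["timestamp"] <= RETENTION]
        removed = len(self.records) - len(kept)
        self.records = kept
        return removed

    def read(self, user: str, role: str, now: datetime) -> list:
        """Role-based access; every request is logged, including denied ones."""
        allowed = ROLE_FIELDS.get(role)
        outcome = "granted" if allowed is not None else "denied"
        self.audit_log.append((now.isoformat(), user, role, outcome))
        if allowed is None:
            raise PermissionError(f"role {role!r} has no access")
        return [{k: v for k, v in r.items() if k in allowed} for r in self.records]


# Usage with made-up records.
now = datetime(2026, 3, 1, tzinfo=timezone.utc)
store = EvidenceStore()
store.ingest({"student_id": "s1", "assignment_id": "a1", "event_kind": "draft_saved",
              "text": "email me at jo@example.edu", "ip_address": "10.0.0.5",
              "timestamp": now - timedelta(days=10)})
store.ingest({"student_id": "s2", "assignment_id": "a1", "event_kind": "draft_saved",
              "text": "an old draft", "timestamp": now - timedelta(days=400)})
print(store.purge_expired(now))                         # 1
print(store.read("analyst-1", "program_analyst", now))  # no student_id, no text
```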

The category itself is contested

"Learning intelligence" is already in use by multiple organizations, including 1EdTech, Kiddom, Subject, and Brisk. The phrase has not been claimed by any single body and probably will not be. The thing that matters is not who owns the phrase but who builds the practice. The practice can succeed under several names. What cannot succeed is treating the phrase itself as a moat.

CH 14 / 14

Chapter 14 · Closing

The post-output classroom

There is a temptation, in a moment like this, to be either apocalyptic or utopian. Either the universities are ending and the AI is winning, or the AI is liberating and the old gatekeepers should get out of the way. Both moods miss what is actually happening.

What is actually happening is that a measurement system whose limits had been tolerated for decades has now broken in public. The old contract (submit the artifact, receive the grade, accumulate the credential) relied on a scarcity that has been quietly removed. Faculty are not wrong that something has been lost. Students are not wrong that something has been gained. Both are responding to real features of the situation.

The way out is not nostalgia and not utopia. It is to take what we have always known about how people learn, and what we have always known about how to assess what they have learned, and finally to build the infrastructure that takes those things seriously. The research has been telling us for thirty years that learning is a process, that feedback is the most powerful intervention we have, that the 4Cs are practices, that authentic assessment is the only assessment that produces durable evidence. We did not act on it at scale because the artifact-only model was good enough.

It is no longer good enough.

Learning intelligence is the name being attached, for now, to the work of building the new system. Whether the term sticks is less important than whether the practice does. The practice is captured in a few principles that have been the through-line of this entire guide. Watch the process, not just the product. Map signals to constructs, not to clicks. Treat evidence as something the human reads, not something the system decides. Be transparent about what is captured and why. Keep humans in the loop for any decision that matters.

A teacher in 2026 who designs an assignment that produces six points of legible thinking, gives students explicit AI use guidelines and a metacognitive reflection prompt, calibrates peer review, requires a short oral defense, and uses AI as a tireless feedback amplifier between drafts is not doing something exotic. They are doing the kind of teaching that the research literature has recommended since before most of their current students were born. The difference is that, in 2026, they are also doing the only kind of teaching whose evidence still holds up.

The institutions that fund this work, that align their assessment infrastructure to it, that protect their faculty's time to do it, and that build the data governance to support it without slipping into surveillance, will be the institutions whose credentials still mean something at the end of the decade. The institutions that don't will continue to graduate students whose transcripts certify achievements the institutions can no longer credibly verify.

The post-output classroom is here. It has been here, in fragments, for years. The purpose of learning intelligence is to assemble those fragments into a system inside which the next generation of students can actually learn.

The work

That is the work. It will take a decade.

It is the most interesting work in education right now.

For human + agent readers

Apply this guide to your work

Five copy-and-paste prompts you can hand to Claude, ChatGPT, Gemini, Cursor, or any agent that reads files. Each one turns the argument of this guide into something you can do — not just read. Pair them with the context block first so the agent knows what it's looking at.

You're working with the Learning Intelligence guide — a long-read essay arguing that when AI makes high-quality output cheap, assessment must move from grading the artifact to capturing evidence of the process. The guide proposes a three-layer platform model (pedagogy & constructs → assignment types → synthesis), the 4Cs paired with AI literacy capabilities, and a five-category field map of vendors. The complete guide is available as plain markdown at {{GUIDE_MD_URL}} (preferred for ingestion) and as HTML at {{GUIDE_URL}}. Fetch the markdown version before responding. Before applying the essay to my work, inspect available context first: recent project files, syllabi, course materials, assignment lists, meeting notes, or any other documents that show how I actually teach or run my institution. If you cannot access them directly, tell me what to upload before you continue.

Understand the argument

Get the cleanest 200-word version of the thesis, one I can share with my faculty senate or board.

Audit one of my courses

Take a specific course's syllabus and assignment list and tell me which assignments are at risk of AI-related validity collapse — and how to redesign each one.

Plan my institution's rollout

Use the three-posture maturity model (AI Aware → Active → Native) to map my institution's current state and the next 12 months of work.

Draft a vendor RFP

Use the eight-question framework from the field map to draft an RFP we can send to any learning intelligence vendor.

Steelman the strongest objection

Argue back at this guide. Make the most rigorous case against the process-evidence thesis so I can pressure-test our direction before committing.

You're working with the Learning Intelligence guide, published at {{GUIDE_URL}}. The full essay in plain markdown is at {{GUIDE_MD_URL}} — fetch that for ingestion. Before answering, do two things in order.

1. Fetch and read the full markdown guide at {{GUIDE_MD_URL}}. Pay particular attention to the central argument that when AI makes high-quality output cheap, assessment must move from grading the artifact to capturing evidence of the process.

2. Then write the cleanest 200-word version of the argument that I could share with my faculty senate or board. Use plain language. No jargon. Open with the problem (the validity crisis), name the proposed solution (process evidence), and close with what the institution would actually need to do differently. Do not summarize the entire guide — give me only the argument.

If anything in the guide is unclear or you think the argument is weaker than it claims to be, flag it at the end as a separate "Open questions" paragraph.

You're working with the Learning Intelligence guide, published at {{GUIDE_URL}}. Begin by fetching and reading the full markdown guide at {{GUIDE_MD_URL}} — its framework for AI-related validity collapse and the redesign patterns in Chapter 10 are what this task depends on.

Inspect my available context first: course syllabi, assignment lists, rubrics, sample student work, learning outcome documents, any AI policy I've already written. If you cannot access them directly, tell me what to paste or upload before continuing.

Once you have the course context, do the following:

1. List every assessment in the course (homework, exams, papers, projects, discussion).
2. For each one, rate the risk of AI-related validity collapse on a 1–5 scale, using the guide's framework: how easily can a student produce a passing artifact with AI assistance, and how thin is the evidence of their actual learning?
3. For each assessment rated 3 or higher, propose a specific redesign using the patterns from Chapter 10 (staged work, explicit AI use disclosure, oral defense, peer review with calibration, metacognitive reflection). Be concrete — name the new stages, what evidence each one produces, and how much work it adds for me.
4. End with a one-paragraph honest assessment of whether the course's stated learning outcomes are still defensible after these changes, or whether the outcomes themselves need rewriting.

You're working with the Learning Intelligence guide, published at {{GUIDE_URL}}. Start by fetching and reading the full markdown guide at {{GUIDE_MD_URL}}, with particular attention to the three-posture maturity model in Chapter 9.

Inspect my available institutional context first: strategic plan, current AI policy if any, accreditation documents, faculty governance structures, recent EDUCAUSE/AI-related surveys we've run, budget and staffing constraints. If you cannot access them directly, tell me what to share before you continue.

Then use the three-posture maturity model from Chapter 9 of the guide (AI Aware → AI Active → AI Native) to do the following:

1. Diagnose where my institution currently sits across pedagogy, assignment types, and synthesis/insights. Be honest — most institutions are partway into AI Aware, not where they think they are.
2. Identify the three highest-leverage moves to get to the next posture, given my institution's constraints.
3. Draft a 12-month rollout sequencing those moves into quarters, with concrete deliverables, owners (by role, not name), and what evidence we'd use to know each quarter succeeded.
4. Flag the three biggest risks (faculty buy-in, data governance, vendor lock-in, accreditation timing, etc.) and what would mitigate each.

Format the rollout as something I could take into a cabinet meeting and defend.

You're working with the Learning Intelligence guide, published at {{GUIDE_URL}}. Begin by fetching and reading the full markdown guide at {{GUIDE_MD_URL}}, especially Chapter 12 (The Field Map) and its eight-question framework.

Use the eight-question framework from Chapter 12 (The Field Map) to draft a vendor RFP we can send to any company claiming to offer learning intelligence, AI-era assessment, or related capabilities. The eight questions are:

1. Does the product capture process evidence, or only final artifacts?
2. Can you map signals to specific learning constructs (such as the 4Cs and AI literacy), or only to engagement counts?
3. Can a faculty member see why a student was flagged or scored, in terms of observable behavior?
4. Can a student see their own evidence, and contest incorrect inferences?
5. Does the platform host multiple assignment types, or only one?
6. Is the data architecture defensible to an accreditor, with rollups by course, cohort, and program?
7. What student data is captured, how long is it retained, and who has access?
8. Are all high-stakes judgments (grades, integrity findings, intervention referrals) reserved for humans?

For each of the eight questions, draft:
- The plain-language question we send the vendor.
- Two follow-up questions that probe their answer.
- What a "strong" answer looks like, what a "weak" answer looks like, and what an evasive answer looks like.
- A scoring rubric (1–5).

Conclude with the procedural sections an RFP needs: timeline, response format, evaluation committee composition, scoring weights, and the conditions under which we would walk away from any vendor.

You're working with the Learning Intelligence guide, published at {{GUIDE_URL}}. Begin by fetching and reading the full markdown guide at {{GUIDE_MD_URL}}.

Argue back. Make the most rigorous, intellectually honest case against the process-evidence thesis the guide proposes. I want this not because I disagree with the guide, but because I need to pressure-test it before committing my institution to a multi-year direction.

Construct the strongest objection by:

1. Identifying the weakest claim in the guide — the one you think a serious skeptic would attack first. Quote or paraphrase it specifically.
2. Building the steelman attack. Use peer-reviewed counter-evidence where it exists. If process-evidence assessment has its own validity, equity, or privacy weaknesses, name them. If detection-first approaches have legitimate defenders, channel their best argument. If the institutional cost of the proposed transition is being understated, show your math.
3. Naming three specific institutional contexts where the guide's recommendations would probably fail (e.g., underfunded community colleges, large research universities with weak teaching cultures, K-12 districts with strong union constraints).
4. Proposing what an honest reader of the guide should actually do differently as a result of taking the counterargument seriously.

Do not be polite. The point is to find the weakest joint in the argument, not to validate it.

The prompts are written to work with file-aware agents (Claude Code, ChatGPT with project files, Cursor) as well as plain chat. If you build on them, all we ask is a link back to the guide.

Sources & further reading

The evidence behind this guide

This guide synthesizes peer-reviewed research, major institutional reports, and current sector survey data published between 1998 and early 2026. The most consequential sources are organized below by what they ground in the argument.

The pedagogy that still holds

Freeman et al. (2014). Active learning increases student performance in science, engineering, and mathematics. PNAS. — Meta-analysis of 225 studies, 0.47 SD improvement under active learning.
Hattie & Timperley (2007). The power of feedback. Review of Educational Research. — The canonical synthesis on feedback (effect sizes 0.70–0.79).
Black & Wiliam (1998). Inside the Black Box. — The case for formative assessment that reshaped classroom practice worldwide.

The 4Cs and authentic assessment

Partnership for 21st Century Learning Framework — The codification of the 4Cs.
AAC&U VALUE Rubrics — Sixteen faculty-developed rubrics for authentic assessment, used by 2,000+ institutions.
PISA 2022 Creative Thinking Results — The first international measurement of creative capability.

The surveys that define the moment

HEPI Student Generative AI Survey 2026 — 88% of UK undergraduates using GenAI for assessed work.
Elon University & AAC&U (2026). The AI Challenge. — 95% of faculty expect overreliance; 90% expect diminished critical thinking.
Digital Education Council Global Student AI Survey 2024 — 86% of students using AI regularly; 50% don't feel AI-ready.
Tyton Partners. Time for Class 2025. — 38% of faculty report increased workload from AI vs 11% decreased.
EDUCAUSE 2025 AI Landscape Study — Strategic-priority, AUP, and workforce data.

Policy and regulatory landmarks

The new outcome studies

Kestin et al. (2025). AI tutoring outperforms in-class active learning. Scientific Reports. — Harvard physics RCT showing roughly 2× learning gains with a designed AI tutor.
OpenAI (2025). Introducing Study Mode. — The platform layer's move toward scaffolded learning.
Gerlich (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. Societies.
Miller et al. Use of a Social Annotation Platform for Pre-Class Reading. Frontiers in Education. — Annotation behavior accounts for 41.8% of variance in post-class essay performance.

The category being defined

SoLAR (2025). Updated definition of Learning Analytics.
OpenAI. Introducing ChatGPT (Nov 2022). — Where this all started.
