AI-assisted plagiarism in the humanities

Like many universities, we have had a number of students this summer assessment season using AI-generated text in their essays in ways which constitute academic misconduct under our existing rules. In my discipline, and I believe in other humanities disciplines, these have been pretty easy to spot. Of course, it is possible that we missed at least as many as we caught, but when I think about why they were so easy to spot, I am inclined to think we are pretty reliable.

There are some really obvious tells, like made-up references[1] or - my favourite example, shared with me from another institution - a sentence starting "In my experience as an artificially-intelligent chatbot...", but we generally spot that something is fishy way before that.

Current Practice

Academics in my department read >1,000,000 words of student work every year, most of it in their specialist research field. They also spend hundreds of hours every year talking to students about that same material. As a result, they are very familiar with the ways students understand, misunderstand, explore and question. Yes, we are sometimes delightfully surprised by original and innovative work, but even that follows a pattern because the innovation is a matter of building upon the same work that the other students have grasped.

This extensive experience of how students think - not in general but specifically about the things we are teaching them - makes us pretty good pattern detectors for the property 'not written by a student' when it appears in an assessment that we have designed. We can usually spot this at the sentence level, identifying candidates for unattributed quotations (the most common form of plagiarism being cutting and pasting bits and pieces from different internet sources and stitching them together with a few student-constructed sentences).

AI plagiarism detectors

I imagine there are some AI developers who might read the last paragraph and think: Great! We could train an LLM to do that and sell it to universities as a plagiarism detection tool! And I am sure they could train an LLM and dupe university learning technologists into paying for the service, but it would be futile. We have to read the work anyway in order to mark it, so we already have our 'wetware' pattern detectors running. Perhaps, like traditional plagiarism detection software, it would serve as a way of drawing attention to possible cases we have missed, but the added value is small. And if you can train an LLM to detect that, you could also presumably train one to 'write like a student'[2] and so the arms race begins.
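For the technically minded, a detector of this kind would be a fairly routine text-classification exercise. Here is a minimal sketch, assuming a labelled corpus of genuine student essays and AI-generated ones; the corpus file names and the choice of DistilBERT are illustrative placeholders, not a description of any real product:

```python
# Illustrative sketch only: fine-tune a small pretrained model as a binary
# 'written by a student' vs 'AI-generated' classifier.
# The corpus files below are hypothetical placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def read_texts(path):
    # One essay per line, plain text.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

student = read_texts("student_essays.txt")      # label 0: written by students
generated = read_texts("generated_essays.txt")  # label 1: produced by an LLM

dataset = Dataset.from_dict({
    "text": student + generated,
    "label": [0] * len(student) + [1] * len(generated),
}).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=256),
    batched=True,
    remove_columns=["text"],
)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="detector", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
).train()
```

Even granting that such a model could be trained well, it would only flag candidates for a human to check, which is exactly what the marker's 'wetware' detector is already doing while reading the work.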

However, thinking more about this, I concluded that no general-purpose generative AI is ever going to be an effective cheat - as in one that produces work which does not arouse suspicion - and that what would be needed to create a special-purpose one for the job is too expensive relative to its usefulness. The tl;dr is:

General-purpose LLMs are trained on too big a dataset.

How does humanities teaching work in real life?

In the humanities, teaching consists of supervised learning with multiple iterations ('seminars', 'tutorials') on a highly curated dataset ('lectures', 'reading lists'), with the intention of teaching some very specific concepts and ideas in an equally specific intellectual context: that university, that degree programme and those students' prior attainment.

Suppose I am teaching a module on X. Because in the UK formal learning outcomes are published well in advance and are tedious to change, they tend to be quite general, along the lines of 'Critically evaluate debates about the nature of X'. So when I sit down to plan my teaching for the X-module, I have a great deal of choice. I will be familiar with a lot of the X-literature and may start by reading the most recent work. There will be two or three orders of magnitude more such literature than any student could possibly be expected to read, so I have to select which bits of it I want to focus on: which key texts, which recommended texts, and so on. When making that selection, I will consider a few things:

  1. What I think is intellectually interesting/fruitful in the X-literature
  2. What I can presuppose the students already know about X
  3. Which aspects of X it will benefit them most (in their studies and careers) to have learned about.

I then create a series of lectures, seminars and readings to achieve these goals. But they are not tightly scripted, and I will adapt as I deliver the module across the semester: digressing in lectures, cutting things that turn out to be unhelpful, adding new recommended readings, and so on. That adaptation is constrained, though, because at the point of designing the syllabus I will also have designed the assessment, which I cannot later change.

AI cheats?

We need to distinguish the question of whether an AI could replace the academic in the teaching process[3] from the question I am interested in here: could an AI produce outputs which would do well in the assessments for such a module? That is, could an AI provide an effective method for my students to cheat?

To that question, I suspect the answer is affirmative. If the AI were trained first on the typical student's prior attainment and then on the restricted dataset for my X-module (lecture recordings, tagged seminar recordings, material in the reading lists, etc.), then it would likely produce something not too dissimilar to the work of a typical student on the module. But the cost of that training would be far too high to make it a remotely sensible proposition for cheating on a single iteration of one module, in a degree which requires performance on a dozen or more modules.
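To make that concrete, the bespoke cheat would amount to something like the following two-stage fine-tuning pipeline. This is only a sketch of the idea in the paragraph above: the base model and corpus files are illustrative assumptions, and assembling those corpora and paying for the compute is precisely the cost at issue:

```python
# Illustrative sketch of the two-stage training imagined above: first on a
# corpus approximating the typical student's prior attainment, then on the
# restricted X-module dataset. All file and model names are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "gpt2"  # stand-in for whichever general-purpose LLM is used

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

def finetune(text_file: str, output_dir: str) -> None:
    """One causal-LM fine-tuning pass over a plain-text corpus."""
    data = load_dataset("text", data_files=text_file)["train"].map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=["text"],
    )
    Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                               per_device_train_batch_size=2),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()

# Stage 1: approximate the typical student's prior attainment.
finetune("prior_attainment_corpus.txt", "stage1")
# Stage 2: specialise on the X-module's restricted dataset
# (lecture transcripts, seminar transcripts, reading-list material).
finetune("x_module_corpus.txt", "stage2")
```

Even this toy version presupposes corpora that do not exist off the shelf; scaled up to a model good enough to fool a specialist marker, the data-gathering and training costs dwarf anything a student would pay to cheat on one module.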

The whole point of cheating is that it has to be easier than actually doing the work. Buying an essay from an essay mill is obviously easier for the student, but only because the price is low (assuming normal student finances). The price is low because writing the essay took the mill's writer less effort than doing the work would have taken the student. And the more customised the essay mill's output is, the more expensive it will be.

What makes cheating one way or another feasible, and not merely possible, is as much economic as technical. The reason I think we will continue to be able to detect AI-generated plagiarism by having experienced academics read the work is that it will never[4] be economically feasible to train an AI which will produce what we are expecting. Consequently, cheating students will only have access to general-purpose LLMs, and our 'not produced by a student on this module' detectors will kick in with reasonable reliability. Cheats will continue to get caught.


  1. Often very 'cleverly' done, such as the one that made up an entry in an online encyclopaedia of the discipline and also gave a URL in the correct format for that encyclopaedia. 

  2. Or write like an AI thinks a student writes. 

  3. If you were expecting me to drop in a subtle but surprisingly good reason why this couldn't happen, you will now be disappointed. However, I have started to think through what would be needed for it to happen - but only just started. 

  4. The actual costs, including energy and externalities, of training existing LLMs are not public knowledge, but they are obviously huge. The companies making these investments are convinced they will recoup the money and make large profits at some unspecified future point, but you can be pretty sure that income from cheating students is not even worth a mention in the business plans. 

