ChatGPT and the AI Writing Arms Race
Upon witnessing the streams of dead, sick, and wounded returning from Civil War battlefields, inventor Richard Gatling wondered if there might be a way to “enable one man to do as much battle duty as a hundred.” If aspects of a war campaign could be automated, perhaps the human toll would be dramatically reduced. His inspiration led to the invention of the Gatling gun. Mounted on wheels, the gun had multiple barrels turned by a manual hand crank, rapidly loading and firing each new cartridge. The Gatling gun was a massive force multiplier on the battlefield, capable of a firing rate a hundred times that of regular infantry. Previously, even a disciplined unit, with soldiers holding formation and firing and reloading in alternating ranks, ran the risk of being overwhelmed by a cavalry or infantry charge. Now, a couple of Gatling guns were more than capable of defeating forces orders of magnitude larger. Although it saw only limited use in the Civil War, the Gatling gun was later used to devastating effect in colonial campaigns across the world, forever changing the way wars are fought.
Now imagine for a moment if an opposing soldier had approached his general and said: “We know we are out-gunned, but I happen to have a special falcon that I have trained to spot any Gatling guns within ten miles of our camp. So long as we remain separated by rough terrain that cannot be traversed by wheels, we’ll be safe.” Our imagined falcon would undoubtedly be an amazing military intelligence tool, changing the course of wars. However, its usefulness would be short-lived. Within years, entrepreneurs improved upon Gatling’s invention. They developed fully automatic weapons that no longer needed a hand crank, and shrank the gun until it could be carried by soldiers, freeing it from the mobility restrictions of a wheeled cart. Our Gatling-scouting falcon would be outstanding at countering the specific threat of the Gatling gun, but useless at countering the wave of tactical and technical innovation that would follow.
The Threat of ChatGPT
Launched in late 2022, ChatGPT acquired its first million users within five days. An AI-powered text-generation tool, it can respond to a range of prompts, both general and specific. It maintains context, allowing subsequent prompts to reflect upon and refine previous responses. It can generate, explain, and even debug code. In ten seconds, one can generate an essay on any topic, drawing on training data that, for now, extends to 2021. ChatGPT immediately raised alarm bells about the future of the college research essay: how can we maintain academic integrity if we can’t determine whether an essay was written by a student or by an AI?
To understand the threat posed by ChatGPT and the available countermeasures, we might start by examining why students currently cheat, and what prevents more students from cheating on essay assignments. Some students cheat because they want a better grade than they could achieve by themselves. Right now, ChatGPT isn’t going to churn out an A+ essay if you are currently a B student. The quality is not yet there, but it might be soon. ChatGPT is built upon the GPT-3 large language model. Presumably, sometime in 2023, we’ll see the release of GPT-4, a larger, more sophisticated AI model. We don’t know for sure just how much better it will be, but it’s reasonable to assume that it will be noticeably better at fooling us.

Alternatively, many students cheat not to boost their grades, but simply as a shortcut. It takes significant time and effort to research and write an essay. If a student is looking for an essay that is ‘good enough’ without having to put in the effort, there are numerous online marketplaces where essay writing services may be obtained. This is already a multimillion-dollar business, and the providers usually guarantee that the final product will pass college plagiarism detectors. AI tooling will change the cheater’s market in multiple ways. Using an essay marketplace feels like cheating: any students using so-called ‘contract cheating’ services are under no illusions as to what they are doing. But using an AI tool to generate an essay feels different. The student inputs and refines the prompts. It’s a shortcut, sure. But to many, it won’t carry the same stigma as traditional cheating. It will also be much faster and cheaper than traditional cheating. Additionally, current providers of essay writing services will themselves use ChatGPT to augment their work, making even traditional contract cheating more affordable and responsive.
We don’t need to have read many economics papers, or to have had ChatGPT summarize them for us, to speculate that as the time, financial, and even emotional costs of cheating decline, we should expect to see an increase in overall cheating.
As soon as the threat to academic integrity became obvious, concerned individuals offered potential solutions. One brute-force method would be to mandate that any content generated by ChatGPT be posted to a publicly available database. That way, anyone could check whether what they are reading was generated by ChatGPT. While technically feasible, this is a non-starter for economic reasons. First, the prompts and responses are valuable data to OpenAI, the company that developed ChatGPT; it’s the same reason you can’t download a database of Google search terms. Second, the lack of privacy would make ChatGPT less appealing to users. Even if the database were anonymous, users don’t want their content made public.
So what about a more sophisticated method that preserves privacy but still leaves open a way to determine whether text was AI-generated? Suppose that instead of recording all content, ChatGPT recorded a hash of that content. This is similar to the way well-designed systems avoid storing our passwords directly: instead, a hash is stored. No one can decode the hash back to the original password, but when a user enters a password, the system can check whether the hashes match and thus authenticate the user. Perhaps OpenAI could provide a way to take any content, apply their hash, and see whether it matches the hash of any ChatGPT-generated content. Unfortunately, the nature of such hashes is that the output changes completely when even a single character of the input is altered. Cheaters would quickly learn that they need only slightly alter what they receive from ChatGPT before submitting their assignments.
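A few lines of Python make the weakness concrete. This is only an illustrative sketch of the hypothetical scheme, using the standard SHA-256 hash (the `fingerprint` function and the sample strings are our own invention, not anything OpenAI has published): changing one character of the input produces an entirely unrelated digest, so a lightly edited essay would never match the registry.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Hex digest of the text: a stand-in for the hypothetical content hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = "The ChatGPT response exactly as the student received it."
altered = "The ChatGPT response exactly as the student received it!"  # one character changed

# The two digests share essentially nothing, so a lookup against a
# registry of hashes of ChatGPT output would fail to flag the altered copy.
print(fingerprint(original) == fingerprint(altered))  # False
```

This avalanche property is exactly what makes hashes good for password storage and useless for detecting near-duplicates of AI output.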
This all pushes us toward the conclusion that in order to determine whether text was generated by AI, we need to understand something fundamental about the content that goes beyond an individual word choice or paragraph construction: that hard-to-define characteristic we recognize when reading something that just sounds a little too robotic. GPTZero is an early attempt at identifying AI-generated content in a programmatic way. It’s a program that uses measures of complexity and language patterns, known as ‘perplexity’ and ‘burstiness,’ to estimate the likelihood that text was written by a human. It’s knowingly imperfect, but as a first attempt, pretty clever. It’s also important to point out that OpenAI is aware that they have unleashed this problem, and will soon release what they are describing as a ‘fingerprinting’ tool, whereby text generated by ChatGPT will carry a certain algorithmic fingerprint. At some point in the future, for a small fee no doubt, institutions will be able to check whether essays carry such a fingerprint.
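To give a flavor of how such statistical detection works: GPTZero’s actual metrics are proprietary and involve a language model, but ‘burstiness’ is commonly described as the variation in sentence length and structure, with human prose tending to vary more than AI prose. Below is a deliberately crude toy proxy of our own devising (the function names and sample sentences are illustrative assumptions, not GPTZero’s code): measure the spread of sentence lengths.

```python
import re
import statistics

def sentence_word_counts(text: str) -> list[int]:
    # Naive sentence split on terminal punctuation; real tools are far more careful.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text: str) -> float:
    """Population std dev of sentence lengths; more varied prose scores higher."""
    counts = sentence_word_counts(text)
    return statistics.pstdev(counts) if len(counts) > 1 else 0.0

uniform = "The cat sat down. The dog sat down. The bird sat down."
varied = "Short. This second sentence runs considerably longer than the first one did. Done."

print(burstiness(uniform))  # 0.0: every sentence is the same length
print(burstiness(varied) > burstiness(uniform))  # True
```

A real detector layers many such signals, including model-based perplexity, which is precisely why it can be clever and still produce false positives.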
Let us assume for a moment that OpenAI fingerprinting and tools like GPTZero are successful, meaning that they reliably flag unaltered ChatGPT output. We would find ourselves in the position of the general with the Gatling-spotting falcon: well defended against one specific threat, and defenseless against the innovations that follow. Cheaters can paraphrase the output, route it through a competing model that carries no fingerprint, or adopt whichever new tool appears next. Detection built around today’s models will always be chasing yesterday’s threat.
The Future of Academic Integrity
If AI detection software is destined to lose the arms race to cheaters, what then becomes of the college essay? The college essay can live on, but it requires a shift away from examining the essay’s content for clues to its authenticity, and toward verifying the process that produced it. Colleges have wanted students to learn specific content, such as “Events contributing to the French Revolution,” but also to learn how to learn. A submitted essay, and its accompanying citations, served as proof that the student had learned the material and knew how to learn. Cheating has always existed, with its methods and countermeasures evolving over hundreds of years. The internet made it much easier to find someone to write an essay for you. Plagiarism detection services then made it much harder for essay content to be reused. What has remained unquestioned is the assumption that the essay itself is evidence that the student absorbed both the content and the methodology. A consequence of AI writing is that the essay document is no longer sufficient proof that the student did the work. To borrow a phrase from another discipline, we need a different kind of ‘proof of work.’
The solution to this problem is observation that goes beyond examining the final artifact of the submitted essay. This does not mean the clandestine spyware, NSA snooping, and social media data gathering that have rightly earned surveillance its negative connotation. What we mean here is simply observation that is systematic. We already do this in academic settings without thinking about it. Proctored exams are observed by a neutral proctor who authenticates the identity of test takers and supervises the integrity of the exam environment. Covid-19 saw an increase in the number of exams moving to an online format, which then necessitated further supervision in the form of webcams, screen locking, and even microphone recordings. Of course, we don’t want to record a student’s screen for a whole semester, but we do want some kind of proof of work when they submit their term paper.
Supervision mechanisms can feel intrusive. They presume distrust and are designed to prevent students from doing the wrong thing. On the other hand, tools like PowerNotes provide the kind of observation needed to verify authorship, all while helping students learn and work the right way. Students use PowerNotes to quote, annotate, and organize the research that forms the basis of their essay. The work that precedes the production of the essay is documented by the system, but in a way that streamlines rather than intrudes on the learning process. This yields a number of benefits for both students and teachers, beyond simply improving a student’s workflow. First, by expanding the method of authentication to include the research process, rather than the essay alone, we reduce the danger of false positives when identifying cheaters based on content. There will no doubt be systems that are very good at identifying AI-generated content. However, there will always be false positives, either due to system limits or because some students happen to write in a style that has AI-like characteristics. If we are able to examine the essay writing process, rather than just the final artifact, then both students and teachers are protected.
A larger benefit of adding observability to the essay writing process is that it provides the opportunity for students and teachers to make the process more responsive. Teachers gain visibility into a student’s progress well before submission time. A simple dashboard enables a professor to identify students who have yet to start an assignment, despite a looming deadline, and proactively reach out. To be sure, this also happens to be a mechanism for preserving academic integrity, as students who are struggling or leaving term papers to the last minute are the most likely to consider cheating. But first and foremost, it's a method for promoting better educational outcomes by identifying students in need of support.
AI writing is here to stay, and institutions are right to be concerned about the risks of cheating. By shifting the focus of our tooling from merely analyzing the final product to streamlining and observing the research and writing process itself, we can both improve learning outcomes and preserve academic integrity.