Journal and Publisher AI Policy Statements
James P. Purdy
Understanding Generative AI
Understanding the technology of generative AI like ChatGPT helps us recognize its affordances and constraints and the resultant ways in which it shapes writing process and product. ChatGPT, whose “GPT” stands for Generative Pre-trained Transformer, is a large language model (LLM), a type of generative AI that predicts and generates language based on its reading of a massive corpus of texts. An LLM can be used to summarize content in its corpus, generate new content based on its corpus, and translate content in its corpus into other languages. Chatbot LLMs like ChatGPT can “converse” with users by responding to their questions or instructions (Kerner). ChatGPT-3, the free version available at the time of this writing, was trained on a corpus comprising 45 terabytes of data, including multiple datasets as outlined in Table 1. The exact composition of these datasets is unknown.
Dataset | Definition | Percentage of Training Corpus |
---|---|---|
Common Crawl | Lightly filtered raw web page data, metadata, and text gathered over 8 years of web crawling | 60% |
WebText2 | Text from all outbound Reddit links from posts with 3 or more upvotes | 22% |
Books1 | An online corpus of books | 8% |
Books2 | An online corpus of books | 8% |
Wikipedia | Articles from the English language version | 3% |
This expansive corpus results in responses that are sometimes remarkably cogent, informative, and even well written. However, ChatGPT is only as good as the texts in its corpus. Using this corpus as the basis for training can cause several writing problems. First, some of this text is offensive, biased, and flat-out inaccurate. Think about the outcry over offensive Reddit content (Hussain, n.d.; Mak, 2018) and the skewed demographic representation of Wikipedia contributors (Bear and Collier, 2016; Gruwell, 2015; Lam et al., 2011). Second, ChatGPT heightens DEI (diversity, equity, and inclusion) concerns that have been made more visible in higher education in the last several years. ChatGPT privileges language that follows the rules of its corpus, that is, Westernized English and homogenized language without “accents.” Moreover, the most sophisticated version of ChatGPT as of the time of this writing, ChatGPT-4, requires payment, so it is available only to those who can afford to buy it. Those who pay more can get better (i.e., more human-sounding) writing. Third, ChatGPT contributes to the “fake news” misinformation culture increasingly prevalent in the last 10 years. It confidently circulates misinformation and replicates biased views. This is not to say that ChatGPT cannot be used productively for writing or writing instruction. It is to say that ChatGPT, like all writing tools, is best used with careful consideration of its limitations. The journal and publisher policies examined for this study seek to outline those limitations and ask authors to work within them.
Because computers are built to handle math, generative AI tools like ChatGPT work by turning language into math. In particular, they attend to probability. As such, generative AI treats humans as pattern-producing machines. In the corpus from which ChatGPT learned, for instance, certain words are more likely to follow from and be grouped with other words. Thus, the answers it returns reflect these associations. Words that appear less frequently in the corpus yield poorer predictions than words used more often. ChatGPT generates language; it does not generate new knowledge. At its most fundamental level, it (re)arranges words that appear in its corpus. But it can do so very well.
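To make the idea of word-to-word probability concrete, the following is a deliberately simplified sketch in Python. It is not how ChatGPT is implemented: actual LLMs use neural networks trained on tokens rather than simple counts of word pairs. The three-sentence corpus and the function names (`train_bigram_model`, `next_word_probabilities`) are hypothetical and chosen only for illustration. The sketch counts which words follow which in a tiny corpus and converts those counts into probabilities.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count how often each word is followed by each other word."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair every word with the word that comes directly after it.
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def next_word_probabilities(model, word):
    """Convert the raw follow counts for `word` into probabilities."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    if total == 0:
        # The word never appeared with a successor: no basis for prediction.
        return {}
    return {candidate: count / total for candidate, count in counts.items()}


# Hypothetical toy corpus standing in for the training data.
toy_corpus = [
    "the writer drafts the essay",
    "the writer revises the essay",
    "the editor revises the draft",
]

model = train_bigram_model(toy_corpus)
print(next_word_probabilities(model, "the"))
print(next_word_probabilities(model, "chatbot"))
```

In this toy corpus, “writer” and “essay” each follow “the” twice, while “editor” and “draft” each follow it once, so the first two are predicted as twice as likely. A word that never appears in the corpus, such as “chatbot,” yields no prediction at all, a small-scale analogue of the point that less frequent words yield poorer predictions.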
AI’s role in writing is nothing new. Even before chatbots like ChatGPT, AI began to play an increasingly prevalent role in our writing activities. During drafting, for instance, predictive text in word processing programs like Microsoft Word, email programs like Outlook, and texting software and apps suggests what words come next. As I write this chapter, Word predicts what word I am typing and what words could follow. It gives me the opportunity to push Tab to accept these suggestions. GenText AI also now offers an advanced Microsoft Word add-in that automatically generates, summarizes, and proofreads text. Moreover, once a text is drafted, programs like Grammarly offer automated writing corrections and suggestions based on acontextual grammatical rules and word associations. And for years the squiggly colored lines of Word and other word processing programs have alerted writers to potential spelling and grammar errors and offered corrections at the click of a mouse. In this way, proofreading and editing have increasingly become the purview of AI. But with generative AI like ChatGPT, this influence has expanded to shape activities of writing processes prior to proofreading and editing, including invention, research, and drafting. This earlier intervention is what fuels much of the concern about ChatGPT. While textual arrangement and delivery have for years been outsourced, invention rarely has, until now.