VOICES: The trouble with ChatGPT

I’m a cognitive psychologist at Wittenberg. People like me study perception, memory, language, and reasoning, and whether computers are good models of human thought. In my teaching, I hope that a clear understanding of how ChatGPT and other large language models (LLMs) work will help students decide whether they want to use them.

When a student types a prompt such as “Write a short essay summarizing themes in Moby Dick” into ChatGPT, the output is generated by examining statistical connections among the words used to describe ideas, places and people in the millions of articles, books and pieces of internet writing that serve as training data for LLMs. Because so much has been written about Moby Dick, the writing an LLM has available and “chooses” from contains many common, and simplistic, observations.
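For readers who want a concrete picture of what “examining connections between words” means, here is a deliberately toy sketch in Python. It only counts, in a tiny made-up sample of text, which word most often follows another and uses that count to “choose” the next word; it is a hypothetical illustration of the statistical idea, not how a real LLM is built, which involves neural networks trained on billions of words.

    # Toy sketch only: real LLMs are far larger and more sophisticated,
    # but the core idea of "choosing" a likely next word from training text is similar.
    from collections import Counter, defaultdict

    training_text = "the whale is a symbol the whale is white the sea is vast"
    words = training_text.split()

    # Count how often each word follows every other word in the sample text.
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

    def next_word(word):
        # Return the most common continuation seen in the sample, if any.
        candidates = follows.get(word)
        return candidates.most_common(1)[0][0] if candidates else None

    print(next_word("whale"))  # prints "is", the most frequent word after "whale" here

The point of the sketch is simply that the “choice” is driven by what is most frequent in the training text, which is why the most common, and most simplistic, observations are the most likely output.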

LLM-generated outputs are thus vague and easy to spot.

Good writers can create highly specific prompts and edit outputs to make them better. But because few students come to college as good writers, ChatGPT-written work often doesn’t meet requirements for college-level writing. How can students use these tools well when they don’t have good writing skills at the start? Using LLMs improves the work of those who already have some ability and does little for those whose writing is less developed; the haves get better, and the have-nots don’t.

LLMs make connections that are probable given their training data. Because those probabilistic connections link concepts that authors may have related correctly or incorrectly, errors are inevitable; LLMs hallucinate. I assign students to ask ChatGPT for a plot summary of their favorite movie, and they find mistakes easily. If students use a tool that can be wrong, in an area they don’t know well, how can they correct their work? Students with more initial knowledge will be better able to find and correct errors. The haves will get better, and the have-nots will get worse.

Psychologists have found in many studies that when people write about women scientists, they use terms emphasizing effort and teamwork and de-emphasize intellectual brilliance. Conversely, descriptions of men highlight brilliance, often despite a lack of teamwork. LLMs reproduce these stereotypes even when the original authors did not intend them. The outputs show algorithmic bias, one reason some argue that LLMs are unethical to use.

Part of making LLM products ready for market is having humans check the outputs for problems.

Because training data sometimes contains vulgar or explicitly violent content, LLM outputs necessarily reflect these tendencies. In many so-called developing countries, human output-checkers evaluate utterly revolting LLM outputs for a few dollars a day and suffer mental health consequences as a result (a situation extensively documented in Kenya). This is another reason some argue that LLMs are unethical to use.

I hope readers will consider that LLMs are not like existing tools such as calculators. Mathematics may be applied in racist or sexist ways, but math calculations don’t contribute to these divisions. LLMs do. Spelling may be used to label someone as “smart” or not, but the way a spell-checker works doesn’t reinforce societal biases. LLMs do. When we learn that LLMs are different from other tools, produce poor-quality output that can be wrong, and are unethical to use, it seems inappropriate to label someone with such concerns as under the influence of a moral panic. I don’t ban the use of LLMs in my classes. I teach students how they work, and we discuss the many reasons why they most likely don’t want to use them.

Michael D. Anes, Ph.D., is an Associate Professor and Chair of Psychology at Wittenberg University.
