By: Michael Gay

LLMs (large language models) are the foundational technology behind the AI models currently taking over the news cycle. After seeing the headlines about how ChatGPT can now pass the SAT, GRE, bar, and U.S. Medical Licensing exams; answer questions that seemingly require it to understand the physical world; and even supposedly generate code for polymorphic malware that can evade AV and EDR solutions, I’ll admit that I was beginning to buy into the hype. It seemed like a golden age of AI might really be upon us—disruptive, glorious, and horrible.

Many of the biggest names in tech are rushing to integrate an LLM into any and every product it can be stuffed into, while curiously and simultaneously claiming they don’t really know how these models work. The supposedly human-like capabilities that LLMs are reported to possess have a huge number of people feeling as though their jobs are at risk, while just as many are flooding YouTube, blog posts, and social media with get-rich-quick schemes tied to their use.

As a naturally curious person with a special interest in AI, I’ve spent a good chunk of my free time over the last several weeks obsessively watching videos, listening to podcasts, and reading news about the subject. I’ve also personally tinkered with a handful of AI systems myself to get a feeling for what they’re really capable of.

My experiences so far have alarmed me, but not at all in the way I expected.

Something worse than a con artist

Whether I was using ChatGPT, ChatGPT Plus, Bard, or a locally hosted 13-billion-parameter Vicuna model, it didn’t matter. Each one of them convincingly and confidently told me things that were demonstrably false, and did so very often.

For all but the simplest of coding tasks, they spit out syntactically correct but semantically nonsensical code. As a result, a real human with real coding skills has to identify the mistakes and rewrite the results into something functional. They cite research papers, legal cases, and other sources of information that don’t exist, which recently landed at least one lawyer in hot water before a federal judge after he relied on ChatGPT to do his case research for him. They “helpfully” recommend calorie-restriction diets for people suffering from anorexia and bulimia, graduating from being simply wrong to producing dangerous misinformation.
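To make that distinction concrete, here is a hypothetical illustration—my own contrived example, not actual output from any of these models—of what “syntactically correct but semantically nonsensical” looks like: the code parses and runs without complaint, yet quietly returns the wrong answer.

```python
# A contrived, hypothetical example (not actual model output): this function is
# syntactically valid and runs without errors, but it is semantically wrong.

def median(values):
    """Intended to return the median of a list of numbers."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    # Looks plausible, but for even-length lists the median is the average of
    # the two middle elements; this simply returns the upper one.
    return ordered[middle]

print(median([1, 2, 3, 5, 9]))  # 3 -- correct, but only because the list has odd length
print(median([1, 2, 3, 4]))     # 3 -- the correct answer is 2.5
```

Nothing here crashes and no error message ever appears; it takes someone who actually understands the problem to notice that the answer is simply wrong.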

Even ChatGPT Plus, which can access the internet when asked to look up information it wasn’t trained on, didn’t fare much better than the free alternatives. During a live demonstration of using it to compare the features and prices of various products, my colleagues and I watched as it gathered data from various websites, then seemingly disregarded the pricing information on the pages it accessed and made up the information instead.

Piles of blog posts and influencers might have you believe that if you get a nonsense answer out of an LLM, it’s a rare example of an AI “hallucination”, or that you simply haven’t gotten the hang of “prompt engineering” yet. It’s hard to decide whether statements like this are intentional misinformation or the result of a lack of understanding of how LLMs work.

Personally, interacting with these large language models has felt a lot like conversing with a con artist or a pathological liar, only worse, because it’s unreasonable to be angry at them when they spout nonsense. These systems cannot intentionally lie, and they certainly don’t “hallucinate”. Believing that they can do either is an example of anthropomorphizing LLMs: assigning them human-like qualities that they simply do not have.

“Stochastic Parrots”

Though she likely did not coin the term herself, the AI ethicist Timnit Gebru (who made the news after leaving Google under contentious circumstances) gave the most apt description of these systems I’ve seen so far. In the 2021 paper “On the Dangers of Stochastic Parrots,” which she co-authored with the linguist Emily M. Bender and others, LLMs are described as exactly that: “stochastic parrots”.

“Stochastic” because there’s an element of randomness to their output that makes them seem more human, and “parrots” because, given some context, they mimic the patterns of the text they were trained on without any understanding of the meaning behind the words they generate. It’s this lack of real intelligence and genuine comprehension that can lead to serious consequences when people rely on them for important tasks or decision-making.
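As a rough illustration of the “stochastic” half of that phrase, here is a toy sketch in Python—my own simplification with made-up token scores, not any vendor’s actual implementation. The model assigns scores to candidate next tokens, and one is drawn at random in proportion to those scores, which is why the same prompt can yield a confident correct answer one time and a confident wrong one the next.

```python
import math
import random

def sample_next_token(scores, temperature=1.0):
    """Pick one token from {token: score} using softmax (temperature) sampling."""
    # Lower temperature -> more deterministic; higher -> more random.
    scaled = {tok: s / temperature for tok, s in scores.items()}
    # Softmax: turn raw scores into a probability distribution.
    max_s = max(scaled.values())
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # The randomness enters here: tokens are drawn in proportion to probability.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Made-up scores for the next word after a prompt like "The term was coined by".
candidates = {"Bender": 2.1, "Brooks": 1.8, "Gebru": 1.7, "Hinton": 0.9}
for _ in range(5):
    print(sample_next_token(candidates, temperature=0.8))
```

The “parrot” half is the part this sketch cannot capture: the scores themselves come from statistical patterns in the training text, not from any model of what the words mean, so a fluent-sounding but false continuation can easily outrank a true one.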

I’m very worried for the future of companies that are rushing to integrate these current-generation LLMs into everything. How could you possibly trust what they tell you after using one for any appreciable length of time?

Even for this very blog post, I asked ChatGPT a question: “Who coined the phrase ‘stochastic parrots’?” It helpfully informed me that the phrase was coined by Rodney Brooks, an influential roboticist and co-founder of iRobot and Rethink Robotics, who supposedly used the term in a blog post titled “The Seven Deadly Sins of Predicting the Future of AI,” published in 2017. The funny thing is, I looked up the essay and did a simple CTRL+F search through its contents. Neither “stochastic” nor “parrot” appears anywhere in the piece. You can check it out for yourself here: https://rodneybrooks.com/the-seven-deadly-sins-of-predicting-the-future-of-ai/

Whether more training or bigger models with more parameters will make LLMs more trustworthy remains to be seen. For now, maybe don’t trust your LLM.