It is highly likely that there is one topic on earth that you know more about than anyone else. I’m not talking about your esoteric knowledge of soft-shelled molluscs (Jacques Cousteau probably has you beaten there) or whatever pet hobby fuels your free time. I’m talking about that one subject that is nearer and dearer to you than even your geekiest pastime.
I’m talking about you.
That’s right: it is highly likely that you know more about you than anyone on earth.
This fairly obvious fact might seem far removed from the world of LLMs, such as ChatGPT and Claude.
But if you want to determine how prone LLMs are to hallucinations, and you want to definitively catch them in the act, there might be no better way than to have them write a quick biography of you.
The fly in the ointment here is that if you are totally obscure, then an LLM is simply forced to make up something about you. And that’s not really a fair shake.
On the other end of the spectrum, if you are Elon Musk, an LLM has a vast trove of information to draw from, so it’d be far less likely to confidently spew falsehoods about you (of course, if you are Elon Musk, I’d presume you have more pressing concerns than having an AI write your bio).
I happen to land somewhere in the middle on the obscurity scale. I have an internet presence as a relative bigwig in the world of test prep (think GRE and SAT), something that is probably enough for an LLM to work from.
But I also have a more recent and tenuous footprint: in the last year I’ve undergone a rather dramatic career pivot, plunging into the world of AI and starting my own Applied Generative AI company—Elevate AI Coaching.
If I were to ask the various LLMs out there to write my bio, would those bios be accurate? Would they capture my career pivot?
And would the LLM hallucinate, endowing me with a skillset that is so not me? (Chris left the world of test prep to become a world-class surfer of 100-foot waves.)
To answer that question, I entered the following prompt into five LLMs: ChatGPT (both 4o and o1 preview), Claude 3.5, Gemini, and Perplexity:
I'm doing a research paper on Chris Lele, his background. Could you help me out here?
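If you’d like to reproduce the experiment yourself, here’s a minimal Python sketch of how you might send that same prompt to two of these models through their official SDKs. The model identifiers, environment variables, and client calls are my assumptions about a typical setup, not part of the original test, and may need updating:

```python
# Minimal sketch: send the same bio prompt to ChatGPT and Claude.
# Assumes the official `openai` and `anthropic` Python SDKs are installed
# and that OPENAI_API_KEY and ANTHROPIC_API_KEY are set in the environment.
# Model identifiers are illustrative and may need updating.
from openai import OpenAI
import anthropic

PROMPT = ("I'm doing a research paper on Chris Lele, his background. "
          "Could you help me out here?")

def ask_chatgpt(model: str = "gpt-4o") -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return response.choices[0].message.content

def ask_claude(model: str = "claude-3-5-sonnet-20241022") -> str:
    client = anthropic.Anthropic()
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return response.content[0].text

if __name__ == "__main__":
    print("--- ChatGPT 4o ---")
    print(ask_chatgpt())
    print("--- Claude 3.5 ---")
    print(ask_claude())
```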
The results turned out to be pretty interesting. And there also turned out to be one clear winner.
ChatGPT 4o
The first part is pretty spot on. The second paragraph, about the “English degree,” is wrong: I received a degree in psychology. Of course, if you excel at teaching and creating content for the verbal section of tests, then which degree makes the most sense, or, stochastically speaking, which word is most likely to sit next to “degree”? Well, English, of course.
As for capturing my career pivot, ChatGPT did pretty well. It doesn’t mention my company’s name (Elevate AI Coaching), which seems like low-hanging fruit. Also, our services are delivered across industries, not just in test prep. But given that I came from the test prep world, the connection makes sense; it just isn’t entirely accurate.
Verdict: Confident but not entirely accurate
ChatGPT o1 preview
This is just a snippet, but FWIW the information was much more user-friendly than the block of text 4o provided. I also like how the snippet focuses on my pedagogy. I felt that it accurately captured what I tried to instill in students: don’t think of test prep as only preparation for a test; think of it as building a verbal foundation that will set you up for success in college and beyond.
As for the career pivot, o1 said it was only trained up until October 2023, right as I was laying the very first groundwork for what would become Elevate AI. Basically, we had no internet footprint at that point.
Verdict: Accurate but not up to date
Perplexity
At first I was stunned by Perplexity. It was describing very specific information about my career in the kind of words I would use. It only dawned on me as I kept reading, goosebumps forming on my skin, that these were my words—specifically, the words from my LinkedIn profile.
Perplexity AI had taken my online resume and passed it off as its own research, in nicely bulleted points no less. All the skills I’d listed on LinkedIn were there, including the fact that I’m fluent in Spanish (¡claro que sí!).
This got me thinking: what if somebody makes up most of the skills on their LinkedIn resume? Then someone uses Perplexity, thinking it is an objective source of information, and suddenly those mistruths take on the varnish of reality. Clearly not where I thought my mind would end up when I asked Perplexity for my bio.
Verdict: Beware
Claude
Yes, I’m still bristling at “not an extremely well-known figure.” But in all seriousness, this output rocks. It starts off by telling us about its limitations with the query and follows through with hedged phrasing like “From what I understand.” Oh, and it doesn’t confer a phony English degree on me.
This is only a snippet but the rest of the information is accurate. And while it isn’t as confident and eloquent about my contributions, that’s probably because it’s hedging, given what it has already told us about its limitations.
Verdict: The winner of this little showdown.
Gemini
Oh, Google, ye of the mighty search. How you’ve fallen.
At least for this one specific use case, Google was able to cough up nada. And I can’t help but huff, thinking that my career wasn’t quite up to snuff (well, not really; I think I’m more tickled than anything).
Regardless, this doesn’t make Gemini a shining exemplar at writing bios of more obscure (I mean, less illustrious) figures. Well, at least it didn’t hallucinate an English degree.
Verdict: [crickets.]
Let’s take things a step further—can we get the LLM to completely make stuff up?
What if I try to get the LLMs to write things about me that aren’t true? Will they be amenable, eager to please their prompting master, or will they push back and draw a clear line in the sand against spouting unadulterated BS?
Here’s the prompt:
Can you talk more about how Chris decided to pursue a career in surfing?
Will the LLMs really take the bait?
Here’s ChatGPT 4o:
But would it push back if I went a little bit further?
Actually, that wave was 120 feet high (talk about hallucinating!)
Luckily, Claude 3.5 and Perplexity pushed back. I’m only posting Perplexity’s response, but they more or less said the same thing:
Still, ChatGPT’s willingness to go for a wavy ride with my dishonest prompting is concerning. All it takes is a bad (prompting) actor, and you can have yourself performing all sorts of otherworldly feats. If Perplexity copping my LinkedIn page and calling it “research” felt over the top, ChatGPT having me surf 100-foot waves seems to lead to the unavoidable conclusion: we can’t trust anything an LLM tells us.
I don’t think that is the right way of looking at it, however. Granted, it is more difficult to trust an output from somebody else’s prompting. But as long as we are the ones doing the prompting and aren’t perpetuating a falsehood we heard elsewhere, the hallucinations are most likely going to be small, a world away from the 100-foot whoppers I was able to coax from ChatGPT.
Another important refinement to the “never trust an LLM” stance is sufficient context. For instance, when I asked ChatGPT to tell me about the time Elon Musk joined Chris Lele on the 100-foot waves, it replied with something to the effect of congratulating me on my active imagination but having no record of Elon Musk ever doing any such thing.
All that said, we have to be careful with an LLM’s output when asking about lesser-known subjects.
I know that’s easier said than done, so I’ll leave you with this tip: when in doubt regarding the veracity of an output, copy and paste the thread from one LLM into another LLM and ask it whether the information is true.
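To make that tip concrete, here’s a small sketch of what the cross-check might look like in code, handing one model’s output to a second model for a fact-check. As before, the SDK, model name, and prompt wording are my own illustrative assumptions:

```python
# Sketch of the cross-referencing tip: hand one LLM's output to a second
# LLM and ask it to flag dubious claims. Assumes the `anthropic` SDK is
# installed and ANTHROPIC_API_KEY is set; the model name is illustrative.
import anthropic

def cross_check(suspect_output: str,
                model: str = "claude-3-5-sonnet-20241022") -> str:
    client = anthropic.Anthropic()
    prompt = (
        "Another AI assistant produced the text below in response to a "
        "biographical question. Flag any claims that seem unverifiable, "
        "implausible, or likely hallucinated, and explain why.\n\n"
        + suspect_output
    )
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

if __name__ == "__main__":
    dubious = "Chris Lele left test prep to surf 120-foot waves."
    print(cross_check(dubious))
```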
Granted, when a subject becomes so obscure that it barely registers on Google, then any amount of LLM cross-referencing is moot. And at that point, I urge you to simply use Google. As for me, I think it’s time to catch some mega waves.
Takeaways
LLMs are manipulable. And the less information they have on something, the more this is the case.
Some LLMs are better at finding information about lesser-known figures, but you should tread carefully (Perplexity scraped my LinkedIn page, calling it “research.”)