Imagine a world of tapestries that unlock their potential in a series of ground-breaking transformations.
In other words: ladies and gentlemen, introducing LLM-generated writing.
The truth is, for all of their polished writing, Large Language Models like ChatGPT betray surprisingly predictable word choices and stylistic predilections. Who knew metaphors could be so irresistible!
Going back to the opening sentence, we have our repeat offenders: the word “tapestries” (as a metaphor), the phrase “unlock their potential”, and the true giveaway, “imagine a world.”
In isolation, these might suggest human authorship, but when clumped together, amidst other heavy-handed metaphors no less, we know, with almost 100% certainty, that we are firmly in the land of AI writing. (Or at least in the land of someone who doesn’t mind mocking Large Language Models.)
The question then looms: can we break through these AI-isms to arrive at writing that is a little more natural, a little more “you-like” (assuming you don’t have a penchant for world imagining and endless tapestries)?
It should be noted up front that I don’t intend this piece as a cheat sheet on how to circumvent current AI checkers or confound your meticulous English teacher. In short: plagiarism diminishes us all. Here, the focus is on seeing whether, through creative prompting techniques, we can free LLMs from the shackles of their AI-speak. Just how well we are able to do this will speak to the “agility” of these models.
That said, if certain sentence structures and word choices seem to bubble up with surprisingly non-human frequency, then perhaps LLMs, for all their verbal wizardry, are hemmed in by their backend training.
In what follows, I will pit three LLMs–Claude 3, Gemini, and ChatGPT 4–against each other. The idea will be to see how non-AI-sounding they can be. I’ll ask them to generate an output and then see to what extent they are able to bend that initial output along certain stylistic lines–word choice, punctuation, and syntactic structures (basically, how sentences are formulated).
At the end, I will declare a winner: “The most adaptable AI wordsmith.”
The Set-up
So, what sample piece should I have the three LLMs generate? Well, in the intro section just now, I flexed (or so I like to think) my human writing muscles.
But don’t take my word for it! Let’s see what the world’s premier online AI checker–ZeroGPT.com–thinks of my writing:
Yes, even with my snarky opening line, there’s still a 0% chance that an AI generated the first page of this article, according to ZeroGPT.
Now, what if we were to ask each of the three AIs to rewrite the first page of this article? Their job is to sound as human-like as possible.
After that, I’ll give the AI specific feedback so they can rework their outputs to sound more like me — which means both more human and more like a specific human. I figure I know my writing pretty well, so I’ll be able to both give them feedback and evaluate their outputs.
For brevity, I’m only going to post the first two paragraphs of each LLM’s output (though I’ll be pasting the entire output into ZeroGPT).
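(For the scripting-inclined: I ran all of this through the regular chat interfaces, but the round-1 ask is easy to reproduce programmatically. Here’s a minimal sketch, assuming OpenAI’s Python client; the model name, file name, and prompt wording are illustrative stand-ins, not my exact setup.)

```python
# Minimal sketch of the round-1 request. I used the chat UI for the actual
# experiment; this just shows the same ask as an API call. The model name,
# file name, and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The first page of this article, saved to a local text file (hypothetical path).
first_page = open("first_page.txt").read()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            "Rewrite the following passage so that it sounds as human-like "
            "as possible:\n\n" + first_page
        ),
    }],
)

print(response.choices[0].message.content)
```

From there, it’s just a matter of pasting the printed output into ZeroGPT and noting the score (Claude and Gemini have their own SDKs, but the shape of the request is much the same).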
So, off to the races!
Claude 3’s attempt to rewrite my piece
My evaluation
In trying to sound humanlike, Claude really went for the colloquial. And if we are to “get real for a sec” and focus on “who actually talks like that?”, this output itself feels a little forced, as though the model is trying a little too hard, giving you an ingratiating back pat in its attempt to sound hip.
But a 40.61% AI score means that it sounds more human than AI. Not too shabby!
ChatGPT 4’s attempt to rewrite my piece
Wow! I’m very surprised. And talk about some irony– “This isn’t about tricking AI detectors.” But “trick” is exactly what has happened here.
I was a little incredulous–a 0% chance? After all, it starts with one of its typical metaphors and adjectives (“an intricate dance”). And the second sentence follows with the very common “delve into.”
I checked another AI checker–AI content detector–and this is what I got.
Still, pretty convincing.
So, I asked ChatGPT to come up with a prompt without being able to draw from my source text. Would this make a difference? (The prompt and response are provided below for context.)
Apparently, ChatGPT is very good at slipping past detection. This doesn’t bode well for AI detectors, or at least for ZeroGPT. Nor does it portend a rosy world for those bent on stopping AI-generated plagiarism.
Gemini’s attempt to rewrite my piece
Okay, once again I’m impressed. And reading it myself, I felt it sounded pretty human, pretty authentic. It was more concise and readable than the other two outputs, but still had a personable voice, the most realistically personable (sorry, Claude) of the three LLMs.
So now we are locked in a dead tie, and we’ll need a tiebreaker: another AI detector, writer.com’s AI detector. ChatGPT 4 scored 88% on this one, where 100% means perfectly human.
Drum-roll…the winner is Google’s Gemini!!
Round 1 Winner: Gemini
Round 2: Who can sound the most like me?
For round 2, I am going to see how well I can get the AI to copy the sentence structure and word choices of my writing. The prompt will have four parts (a rough code sketch of the whole exchange follows the list):
1st part: I’ll open up a fresh thread and input the first page of this article.
2nd part: I’ll ask the LLM to take a paragraph from its output in round 1.
3rd part: I’ll ask it to sound as much like me as possible.
4th part: I’ll give it feedback about how it can sound even more like me.
For comparison, I’ll post the LLM’s original output above its rewrite.
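(As promised, here’s that four-part exchange as a rough code sketch, again leaning on OpenAI’s Python client purely for illustration; the prompt wording and file names are stand-ins, not my exact phrasing.)

```python
# Rough sketch of the round-2, four-part exchange described above.
# Everything here (model name, file names, prompt wording) is illustrative.
from openai import OpenAI

client = OpenAI()

first_page = open("first_page.txt").read()          # 1st part: my original writing
round1_para = open("round1_paragraph.txt").read()   # 2nd part: the LLM's round-1 paragraph

messages = [{
    "role": "user",
    "content": (
        "Here is a page of my writing:\n\n" + first_page
        + "\n\nNow take this paragraph you wrote earlier:\n\n" + round1_para
        + "\n\nRewrite it so it sounds as much like me as possible."  # 3rd part
    ),
}]

first_pass = client.chat.completions.create(model="gpt-4", messages=messages)
print(first_pass.choices[0].message.content)

# 4th part: keep the model's attempt in the conversation and feed back my notes.
messages += [
    {"role": "assistant", "content": first_pass.choices[0].message.content},
    {"role": "user", "content": "Closer, but drop the slang and work in a parenthetical aside or two."},
]
second_pass = client.chat.completions.create(model="gpt-4", messages=messages)
print(second_pass.choices[0].message.content)
```

The one design note worth flagging: the earlier reply stays in the messages list so that the 4th-part feedback lands on something the model can still see; otherwise “sound even more like me” has nothing to push against.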
In terms of the judge, I’ve gone ahead and appointed myself. Hey, I figured I knew a thing or two about my writing!
Claude 3’s original paragraph
Claude 3’s rewrite
My evaluation
Claude seems fond of cliches–”The million-dollar question.” Also, the ‘Cause’ is still giving off strong slangy vibes that (I like to hope) don’t come out in my writing.
Was it something about the initial prompt saying “sound as human-like as possible” that made it go for prose that sounds like the way tweenagers speak?
I even gave it a follow-up prompt asking it to stop being overly colloquial. Yet here we have this final output. Does it speak to an intrinsic inability to change up its writing, something deep in the parameters of its fine-tuning, or does it simply speak to its difficulty deviating from an initial prompt?
Hard to say, and that might be something for another post. But for now, I’m not too impressed with Claude’s output.
ChatGPT 4’s original paragraph
ChatGPT 4’s rewrite
ChatGPT 4’s original was a little stodgy and formal; it lacked some of my spunk and wit (at least, what I like to think is wit!). But this rewrite is pretty strong. Notice at the end it includes one of the parenthetical asides that are common in my writing (cheeky case in point). And it does so in a way that seems pretty seamless, building off of “varied linguistic expressions.”
At the risk of sounding self-congratulatory, this is pretty good stuff. And had you shown me this a week ago saying that I’d go on to write this the following week, I don’t think I’d spot anything amiss (the same can definitely not be said about Claude 3, ‘cause’, like ‘get real.’)
Again, ChatGPT 4 is looking good. But remember, Gemini edged it out in the first round.
Gemini’s original
Gemini’s rewrite
First off, Gemini’s original was already the strongest of the three. In fact, I’d say it was stronger than either Claude 3’s rewrite (well, much stronger) or ChatGPT 4’s. So unless it somehow tanked in the rewrite, it would be the winner.
In this rewrite, in which I nudged it to include some parenthetical asides, Gemini really shines. I wouldn’t say the asides are always like my writing (I like the use of ‘yawn’, but I don’t think I’d use that in this case), and the first parenthetical lacks that humorous touch. But these are mere quibbles, because the other asides are spot on, and a creative turn of phrase like “a wanderlust for grand pronouncements” is something I might have conjured up (yes, I’m starting to feel a little replaceable — cue the robot uprising music!)
While ChatGPT 4’s rewrite was solid, Gemini nailed it.
And folks, once again, it’s Gemini!
Round 2 Winner: Gemini
Final thoughts
I was pretty impressed by Gemini, especially by how human-like it sounded in its initial attempt. In fact, this made the second round easier for this LLM, since its writing didn’t sound too different from the way I write.
ChatGPT 4 was very close according to the AI checker. Its initial writing didn’t really sound like me, which is fine. It did a good job of adapting to the prompt to sound like me, and at times had little flashes of “me.” But nothing to keep me up at night. Yes, you could’ve convinced me that I’d written it, but it didn’t quite have the je ne sais quoi of my writing.
Gemini on the other hand…you’ll have me sweating at 3:43am tomorrow.
And then there was Claude 3. Poor Claude 3. Like, yo’, what were you even thinking?! And that you were unable to walk back your teen speak, even after prompting, was disappointing. In the past, I’ve been impressed by Claude 3’s “literary sensibilities.” But today, Claude felt like it was on vacation.
To Claude’s credit, I don’t think this experiment speaks with utter finality about anything it purported to test today, so much as it speaks to the inherent variability of LLMs.
But for today at least, ChatGPT 4 and Gemini showed that an LLM’s writing can be coaxed and massaged in such a way that it gets closer to the way that some of us (or at least the way I) write.
And to that, all I have to say is: imagine a world where we can unlock AI’s linguistic potential to spin verbal tapestries that echo our very selves.