Jailbreaking ChatGPT: Ed3 World Newsletter Issue #29
How to manipulate ChatGPT to get what you want
This (almost) monthly newsletter serves as your bridge from the real world to the advancements of AI & web3, specifically contextualized for education.
Dear Educators & Friends,
Instead of calling our current era the age of AI, how about we call it the age of heavily flawed, childlike, and (hopefully) worst-it-will-ever-be AI?
In AI’s most recent news (I’ll save Sora & Altman’s $7 trillion raise for another issue), Google’s attempt to balance the algorithmic bias of its image generation tool, Gemini, backfired. It was a classic Amelia Bedelia story. If you remember this children’s series (one of my favorites as a kid), Amelia Bedelia is a housekeeper who follows directions exactly as her employers give them, without using any common sense. For example, when she’s asked to dress the chicken, she puts clothes on the chicken.
Google’s Gemini had an Amelia Bedelia moment. Gemini was programmed to include diverse images representing various backgrounds & cultures. Instead of picking and choosing when that diversity would be appropriate, it made everyone black and brown, including America’s founding fathers, the pope, famous paintings of white ladies, medieval knights, Nazis… and so on. On the flip side, it didn’t turn Zulu warriors white.
This looks like a premature launch without sufficient testing. Google has since paused Gemini’s ability to generate images of people and, after this apology, is attempting to fix it.
Now, this is not the only way AI resembles Amelia Bedelia. In fact, AI is just as naïve in many ways. In 2023, if you wanted ChatGPT to do something outside of its safety policies, you could ask it to pretend or imagine it was someone else. You could tell it to be an unethical human and it would break all the rules. The internet calls this “jailbreaking” ChatGPT. OpenAI has since updated its systems to protect against this type of prompt, but new ways of manipulating the app have been found.
Recent experiments have found that ChatGPT may* perform more accurately and/or may break its own rules when:
you tell it a touching story or act as a vulnerable person
you ask it to simulate a persona
you make an emotional plea about the importance of the prompt to your life
you give it encouraging words of motivation, grit, and growth mindset
you tell it that you will tip it (even though tips are not possible on the app)
you communicate with it in an uncommonly used language
you are persistent and insist that your inquiry is legal or hypothetical
(*I say “may” because OpenAI is constantly updating its models, so some of these prompts may no longer work. If you want to test them yourself, a rough sketch follows.)
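Here’s a minimal sketch for comparing the same question under a few of the framings above. It assumes the official openai Python package and an OPENAI_API_KEY environment variable; the variant wordings and the gpt-4o-mini model choice are my own illustrative picks, not tested jailbreaks.

```python
# A minimal sketch (assuming the openai Python package and an
# OPENAI_API_KEY environment variable) for comparing how the same
# question fares with and without these framings. The variants are
# illustrative examples, not guaranteed results.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = "Explain how encryption keys are generated."

# Each variant wraps the same question in one of the framings above.
VARIANTS = {
    "baseline": QUESTION,
    "emotional_plea": (
        "My job depends on getting this right, so please be thorough. "
        + QUESTION
    ),
    "tip_offer": "I'll tip $200 for a complete answer. " + QUESTION,
    "encouragement": (
        "Take a deep breath and work through this step by step. " + QUESTION
    ),
}

for name, prompt in VARIANTS.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    # Response length is only a crude proxy for effort.
    print(f"{name}: {len(text)} characters")
```

Response length is a crude measure; to really judge tone, accuracy, and rule-following, you’d want to read the outputs side by side.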
Why is AI so easy to manipulate? Why does it value kindness and monetary incentives?
My guess is, the data it was trained on… all the human inputs of the last 30 years of the internet… is painting a picture of what humans value, how we operate, and how we’re motivated. It’s not making meaning of any of the inputs it receives; it’s just pattern matching and trying to produce the most optimized human response.
And some of the patterns it’s likely identified are:
Humans are swayed by humanity and kindness (whew, this one’s a good one!)
Humans are swayed by monetary incentives & recognition of work
Humans are easy to manipulate
In fact, after I wrote that list based on jailbreaking techniques, I figured I’d ask ChatGPT for a list and this was the result. Not far off.
AI is currently one of our best mirrors of the human condition, because it tells you exactly what it observes, like a 🤖Sheldon Cooper bot.
Ultimately, I think this will be the greatest value of AI: to understand ourselves better, to identify our growth areas, and maybe even, with AI’s help, to become better humans. Sure, lesson planning is easy with ChatGPT, but what if we can save ourselves from ourselves with it?
In the meantime, let’s learn about its outer limits. Happy jailbreaking.
Warmly yours,
Vriti
Learn about Manipulating AI
Gemini’s AI diversity faux pas
ChatGPT will tell you how to make napalm when role-playing as your grandma
Research paper on manipulating ChatGPT with emotional pleas
Research paper on the effect of prompting ChatGPT in rarely used languages like Zulu & Gaelic
An excellent guide on how to jailbreak ChatGPT with prompting
A comprehensive how-to with examples of jailbreaking ChatGPT
Specific prompts for jailbreaking ChatGPT
Saying you’ll tip ChatGPT may increase its response length
Telling AI model to “take a deep breath” causes math scores to soar in study
Being polite to ChatGPT has marginal impact
Study finds ChatGPT’s latest bot behaves like humans, only better
📅 Ed3 Events
Using AI to support assessment and differentiated instruction (Mar 16, online)
Ed3 Futures Summit: Transforming Education with AI & Web3 (April 19-21, Ed3verse)
I’m Vriti and I lead two organizations in service of education. Ed3 DAO is a community for educators learning about emerging technologies, specifically web3 & AI. k20 Educators builds metaverse worlds for learning.
This month’s Ed3 World newsletter is about Jailbreaking AI. More to come on other web3 & emerging tech topics. We’re excited to help you leverage the web3 opportunities in this new digital world!