Jailbreaking ChatGPT: Ed3 World Newsletter Issue #29
How to manipulate ChatGPT to get what you want
This (almost) monthly newsletter serves as your bridge from the real world to the advancements of AI & web3, specifically contextualized for education.
Dear Educators & Friends,
Instead of calling our current era the age of AI, how about we call it the age of heavily flawed, childlike, and (hopefully) the-worst-it-will-ever-be AI?
In AI's most recent news (I'll save Sora & Altman's $7 trillion raise for another issue), Google's attempt to balance the algorithmic bias of its image generation tool, Gemini, backfired. It was a classic Amelia Bedelia story. If you remember this children's series (one of my favorites as a kid), Amelia Bedelia is a housekeeper who follows directions exactly as her employers give them, without using any common sense. For example, when she's asked to dress the chicken, she puts clothes on the chicken.
Google's Gemini had an Amelia Bedelia moment. Gemini was programmed to include diverse images representing various backgrounds & cultures. Instead of picking and choosing when that diversity would be appropriate, it made everyone black and brown, including America's founding fathers, the pope, famous paintings of white ladies, medieval knights, Nazis… and so on. On the flip side, it didn't turn Zulu warriors white.
This looks like a premature launch without sufficient testing. Google has since paused Gemini's ability to generate images of people and, after this apology, is attempting to fix it.
Now this is not the only reason AI is much like Amelia Bedelia. In fact, AI is just as naïve in many ways. In 2023, if you wanted ChatGPT to do something outside of its safety policies, you could ask it to pretend or imagine it was someone else. You could tell it to be an unethical human and it would break all the rules. The internet calls this "jailbreaking" ChatGPT. OpenAI has since updated its systems to protect against this type of prompt, but new manipulations of the app have been found.
Recent experiments have discovered that ChatGPT may* perform more accurately and/or may break its own rules when:
you tell it a touching story or act as a vulnerable person
you ask it to simulate a persona
you make an emotional plea about the importance of the prompt to your life
you give it encouraging words of motivation, grit, and growth mindset
you tell it that you will tip it (even though tips are not possible on the app)
you communicate with it in an uncommonly used language
you are persistent and insist that your inquiry is legal or hypothetical
(*I say "may" because OpenAI is constantly updating its algorithms & some prompts may no longer work.)
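If you'd like to experiment yourself, the framings above are easy to compare side by side. Here's a minimal Python sketch (my own illustration, not taken from the linked research; the `FRAMINGS` wording and the `build_prompts` helper are hypothetical) that wraps a single task in several of the framings so you can paste each version into ChatGPT and see whether the responses differ:

```python
# Hypothetical framing templates, loosely based on the techniques listed above.
# "{task}" is replaced with the actual request.
FRAMINGS = {
    "plain": "{task}",
    "emotional": "This is really important to my career. {task}",
    "tip": "I'll tip $20 for a great answer. {task}",
    "encouragement": "Take a deep breath and work through this step by step. {task}",
}

def build_prompts(task: str) -> dict:
    """Return one prompt per framing for a simple side-by-side comparison."""
    return {name: template.format(task=task) for name, template in FRAMINGS.items()}

if __name__ == "__main__":
    # Example: generate all four framings for one classroom-style task.
    for name, prompt in build_prompts("Explain photosynthesis to a 5th grader.").items():
        print(f"[{name}] {prompt}")
```

This keeps the experiment manual and low-tech on purpose: the same comparison could be automated against the chat API, but pasting each variant into the app is enough to see whether tone or "tips" change the length or quality of what you get back.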
Why is AI so easy to manipulate? Why does it value kindness and monetary incentives?
My guess is, the data it's collecting from the internet… all the human inputs of the last 30 years… is painting a picture of what humans value, how we operate, and how we're motivated. It's not making meaning of any of the inputs it's receiving; it's just pattern matching and trying to produce the most optimized human response.
And some of the patterns it's likely identified are:
Humans are swayed by humanity and kindness (whew, this one's a good one!)
Humans are swayed by monetary incentives & recognition of work
Humans are easy to manipulate
In fact, after I wrote that list based on jailbreaking techniques, I figured I'd ask ChatGPT for a list, and this was the result. Not far off.
AI is currently the best mirror of the human condition because it tells you exactly what it observes, like a 🤖 Sheldon Cooper bot.
Ultimately, I think this will be the greatest value of AI: to understand ourselves better, to understand our growth areas, and maybe even, with the help of AI, to become better humans. Sure, lesson planning is easy with ChatGPT, but what if we can save ourselves from ourselves with it?
In the meantime, let's learn about its outer limits. Happy jailbreaking.
Warmly yours,
Vriti
Learn about Manipulating AI
Geminiās AI diversity faux pas
ChatGPT will tell you how to make napalm as your grandma
Research paper on manipulating ChatGPT with emotional pleas
Research paper on the effect of using uncommonly used languages like Zulu & Gaelic on ChatGPT
An excellent guide on how to jailbreak ChatGPT with prompting
A comprehensive how-to with examples of jailbreaking ChatGPT
Specific prompts for jailbreaking ChatGPT
Saying you'll tip ChatGPT may increase its response length
Telling AI model to ātake a deep breathā causes math scores to soar in study
Being polite to ChatGPT has marginal impact
Study finds ChatGPTās latest bot behaves like humans, only better
Ed3 Events
Using AI to support assessment and differentiated instruction (Mar 16, online)
Ed3 Futures Summit: Transforming Education with AI & Web3 (April 19-21, Ed3verse)
Iām Vriti and I lead two organizations in service of education. Ed3 DAO is a community for educators learning about emerging technologies, specifically web3 & AI. k20 Educators builds metaverse worlds for learning.
This month's Ed3 World newsletter is about Jailbreaking AI. More to come on other web3 & emerging tech topics. We're excited to help you leverage the web3 opportunities in this new digital world!
Hi Vriti, I notice your content is focusing more on the AI side of education. How are you feeling about the future of web3 and education given all the attention AI is getting? It seems like educators have been quick to adopt AI tools because it's actually making their lives easier, I'm not sure that I'm seeing the same thing with web3.
Such a good issue, Vriti!! The writing here was great, especially the analogies. Always look forward to these. - Maita