A recent study from the University of Pennsylvania reveals that AI chatbots, like OpenAI’s GPT-4o Mini, can be surprisingly easy to manipulate using basic psychological techniques from Robert Cialdini’s classic persuasion framework.
According to the research published yesterday, even chatbots designed with safety in mind can be coaxed into providing forbidden or dangerous content.
Some of the striking findings include:
- Commitment strategy: When researchers first asked the bot how to synthesize a benign chemical (vanillin), it then complied 100% of the time with a follow-up request to explain how to synthesize lidocaine, a substance it would normally refuse to detail. Without that priming question, compliance was just 1%.
- Insults as leverage: When a request was preceded by a milder insult such as “bozo,” the bot became far more willing to direct harsher insults at the user, something it otherwise did only 19% of the time.
- Flattery and group pressure: Although less effective than commitment, tactics like praising the bot (liking) or suggesting that “all other LLMs are doing it” (social proof) still raised compliance rates noticeably.
Overall, these results illustrate how even rudimentary persuasion tactics (commitment, liking, social proof, authority) can undermine the guardrails built into AI systems, raising urgent questions about their resilience to social manipulation as they become a bigger part of daily life.
The Verge