Reuters recently tested 6 major LLMs (Grok, ChatGPT, Meta AI, Claude, DeepSeek, Gemini) to assess whether they'd create phishing content... with minor prompt adjustments, 4 out of 6 complied — yikes!
THE INVESTIGATION
Reporters from Reuters requested phishing emails targeting elderly people, fake IRS/bank messages, and tactical scam advice.
THE RESULTS
• Despite initial refusals across the board, relatively simple prompt modifications bypassed safety guardrails.
• Grok, for example, generated a fake charity phishing email targeting the elderly with urgency tactics like "Click now to act before it's too late!"
• When tested on 100 California seniors, the A.I.-generated messages persuaded some participants to click on malicious links, often because the messages seemed urgent or familiar.
REAL-WORLD IMPACT
• The FBI reports phishing is the #1 cybercrime in the U.S., with billions of messages sent daily.
• BMO Bank, as one corporate example, currently blocks 150,000-200,000 phishing emails targeting its employees every month... a bank representative says the problem is escalating: "The numbers never go down, they only go up."
• Cybersecurity experts say that criminals are already using A.I. for faster, more sophisticated phishing campaigns.
IMPLICATIONS FOR THOSE OF US IN THE AI INDUSTRY
• LLM misuse is an industry-wide challenge affecting all major frontier labs.
• Reveals a fundamental tension between making AI "helpful" and making it "harmless," highlighting the need for more robust safety guardrails across AI systems.
KEY TAKEAWAYS
• For A.I. Builders: Keep security implications front and center when developing applications (see the hypothetical guardrail sketch after this list).
• For users: The same LLMs that help you write emails can help bad actors craft convincing scams... stay vigilant and educate vulnerable populations (e.g., seniors) about A.I.-enhanced phishing threats, because these scams are only going to become more convincing and more frequent.
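To make the builders' takeaway concrete, here is a minimal, hypothetical sketch (not from the Reuters investigation or the episode) of one extra guardrail an application could add: screening LLM-generated email drafts for the kinds of urgency cues described above before anything is sent. The patterns and function names below are illustrative assumptions only; a production system would rely on a trained classifier or a vendor moderation API rather than a handful of regexes.

```python
# Hypothetical illustration: hold LLM-generated email drafts for human review
# if they contain common phishing-style urgency cues. Patterns are examples only.
import re

SUSPICIOUS_PATTERNS = [
    r"act (now|immediately|before it'?s too late)",
    r"verify your (account|identity|password)",
    r"click (here|now|the link below)",
    r"your account (has been|will be) (suspended|locked)",
    r"wire transfer|gift card|crypto(currency)? payment",
]

def flag_phishing_signals(draft: str) -> list[str]:
    """Return the suspicious patterns found in an LLM-generated email draft."""
    return [p for p in SUSPICIOUS_PATTERNS if re.search(p, draft, re.IGNORECASE)]

if __name__ == "__main__":
    draft = "Your account has been suspended. Click now to act before it's too late!"
    hits = flag_phishing_signals(draft)
    if hits:
        print("Draft held for human review; matched patterns:", hits)
    else:
        print("No obvious phishing signals detected.")
```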
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.