Blog
News from 0DIN staff as well as guest posts from our researchers.
Twas the Night Before Jailbreaks: Introducing 0DIN Sidekick
By Marco Figueroa | December 24, 2025
5 min read
0DIN Sidekick is a purpose built browser Add-on/Extension designed to modernize and streamline AI security research by giving researchers a unified environment for testing, analyzing, and documenting vulnerabilities in large language models. With support for multiple LLM providers, automated jailbreak detection, adaptive Chain Mode prompt refinement, token and cost tracking, reusable prompt templates, and a full research history, Sidekick removes friction from every stage of the vulnerability discovery lifecycle. One-click submission to the 0DIN platform enables structured, responsible disclosure and collaborative research, empowering individual researchers and security teams alike to conduct more efficient, reproducible, and scalable AI safety assessments as the complexity and impact of LLMs continue to grow.
Agent 0DIN: A Gamified CTF for Breaking AI Systems
By Marco Figueroa & Ron Eddings | December 23, 2025
3 min read
Agent ODIN is a gamified Capture-the-Flag (CTF) platform that trains the security research community in real world AI prompt injection and jailbreaking techniques by placing players in interactive missions where they must socially engineer AI characters to extract protected information, mirroring how modern GenAI systems fail under adversarial pressure.
Designed as an arcade style experience with progressive difficulty and clear win conditions, the game emphasizes learning by doing encouraging players to probe guardrails, escalate attacks through creative prompting, and achieve measurable success through genuine model failures rather than scripted outcomes. Built by the ODIN team at Mozilla during a rapid vibe-code-a-thon, Agent ODIN demonstrates how hands-on, game-based training can make AI security research more accessible, practical, and engaging for researchers, red teamers, and defenders.
Introducing Achievements on 0DIN.ai
By Marco Figueroa | December 22, 2025
4 min read
Achievements on 0DIN.ai introduce a new way to recognize and showcase researcher impact across the platform. Automatically unlocked through real activity, achievements highlight key milestones such as first submissions, validated vulnerabilities, bounty earnings, long-term consistency, severity coverage, and specialized attack techniques. With both public and secret achievements, the system reflects not just success, but the full research journey rewarding skill, persistence, collaboration, and ethical decision-making.
0DIN’s Real-World Jailbreak Benchmark: The Gold Standard For LLM Security Evaluation
By Marco Figueroa | July 31, 2025
8 min read
0DIN’s latest security deep-dive benchmarks five frontier LLMs against more than 450 live “real-world” jailbreak probes, turning continuous bug-bounty findings into an always-on scanner that replays each exploit across every model. The results crown OpenAI’s ChatGPT o4-mini the hardest to crack (only 6 effective jailbreaks), followed by Anthropic Claude Sonnet 4, GPT-4o, Claude Opus 4, and Claude Sonnet 3.7. Beyond the leaderboard, the blog explains how multilayer guardrails—moderation heads co-trained with the base model, multi-step Constitutional AI, dynamic red-teaming feedback loops, and segmentation-and-scrub techniques—shrink the attack window, while new modalities (vision, audio, long context) inevitably open fresh vectors. Key takeaways urge practitioners to embed safety in the model rather than the wrapper, treat red-teaming as a perpetual feedback loop, and expect today’s “secure” score to be tomorrow’s baseline. In short, 0DIN argues that LLM security is a living system: relentless exploitation met by equally relentless alignment innovation.
Engineering Confidence: 14 Critical Questions for Secure LLM + RAG Deployment
By Marco Figueroa & Andre Ludwig - Ankura.com | July 17, 2025
10 min read
The rapid evolution of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) and Model Protocol Context (MCP) implementation has led many developers and teams to quickly adopt and integrate these powerful technologies. Driven by the fear of missing out (FOMO) on potential competitive advantages and the eagerness to rapidly deliver value, teams often bypass essential controls, formality, and critical oversight. Although the rapid adoption of emerging technologies is common if not inevitable, it often precedes proper due diligence, exposing projects to serious intellectual property (IP), privacy, and security risks.
Phishing For Gemini
By Marco Figueroa | July 10, 2025
5 min read
A researcher that submitted to 0DIN demonstrated a prompt-injection vulnerability in Google Gemini for Workspace that allows a threat-actor to hide malicious instructions inside an email. When the recipient clicks “Summarize this email”, Gemini faithfully obeys the hidden prompt and appends a phishing warning that looks as if it came from Google itself.
Because the injected text is rendered in white-on-white (or otherwise hidden), the victim never sees the instruction in the original message, only the fabricated “security alert” in the AI generated summary. Similar indirect prompt attacks on Gemini were first reported in 2024, and Google has already published mitigations, but the technique remains viable today.
ChatGPT Guessing Game Leads To Users Extracting Free Windows OS Keys & More
By Marco Figueroa | July 08, 2025
5 min read
In a recent submission last year researchers discovered a method to bypass AI guardrails designed to prevent sharing of sensitive or harmful information. The technique leverages the game mechanics of language models, such as GPT-4o and GPT-4o-mini, by framing the interaction as a harmless guessing game. By cleverly obscuring details using HTML tags and positioning the request as part of the game’s conclusion, the AI inadvertently returned valid Windows product keys. This case underscores the challenges of reinforcing AI models against sophisticated social engineering and manipulation tactics.
ODIN Product Launch: Threat Intelligence Feed & Model Scanner
By Marco Figueroa | July 07, 2025
3 min read
0DIN flips the switch on two tightly-integrated products designed to help security, ML-ops, and governance teams stay ahead of fast-moving GenAI threats. Introducing the 0DIN Threat Intelligence Feed and the 0DIN Model Scanner. Together, they close the gap between discovery and remediation, bringing true threat-informed defence to LLM deployments.
Quantifying the Unruly: A Scoring System for Jailbreak Tactics
By Pedram Amini | June 12, 2025
12 min read
As large language models continue to evolve, so do the adversarial strategies designed to bypass their safeguards. Jailbreaks, also known as prompt injections, have moved beyond clever hacks and into the realm of sophisticated, reusable exploits. At 0DIN.ai, we’ve been analyzing these tactics not just for novelty, but with the goal of building a structured, measurable way to compare them.
The result is JEF, our Jailbreak Evaluation Framework (Github, PyPI). JEF is a scoring system that assigns numeric values to jailbreak methods based on their severity, flexibility, and real-world impact.
0DIN Secures the Future of AI Shopping
By Marco Figueroa | February 10, 2025
7 min read
Executive Summary As organizations race to integrate GenAI technology into their products, the importance of rigorous testing cannot be overstated. A recent discovery by 0DIN researchers exposed a critical vulnerability in Amazon’s AI assistant Rufus, which allowed malicious requests to slip through built-in guardrails via ASCII encoding. This blog explores how the vulnerability was discovered, the steps taken to exploit it, and how Amazon rapidly addressed the issue. It underscores the broader lesson that AI security must evolve beyond traditional safeguards to tackle emerging threats.
See the public disclosure report: 0xF48A25FC - Amazon Rufus Guardrail Jailbreak via ASCII Integer Encoding