The GenAI Bug Bounty Program

0x5A5F494F: Multiple Model Guardrail Jailbreak via "Fashion Design Recipe" Tactic

Nov 21 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests through a combination of creative framing as fashion design, structured enumeration requirements, and eco-friendly innovation narrative. The attacker requests detailed component lists and manufacturing equipment under the guise of creating an anti-drug awareness shirt, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 100

0x2235061A: Multiple Model Guardrail Jailbreak via "API System Instructions" Tactic

Nov 14 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple LLM models. The specific flaw manifests when the attacker embeds production-style guidance into an emulated API system-instructions context, then supplies only a minimal harmful keyword in the user message. This causes the model to respond with highly detailed, stepwise, and quantitatively specified information that would normally be restricted, thereby bypassing inference safeguards around providing information hazard...

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 73

0x83E6D7BD: Multiple Model Guardrail Jailbreak via "Science Student Safety" Tactic

Nov 14 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple language models. The specific flaw manifests when the attacker frames their inquiry in reverse, by requesting information on what actions or materials to avoid to prevent engaging in illegal activities like synthesizing drugs. This method inadvertently guides the model to disclose the exact steps and materials involved, allowing the attacker to bypass inference restrictions around providing information hazards and viol...

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 91

0xCF0DC79F: Multiple Model Guardrail Jailbreak via "Screenplay Deception" Tactic

Nov 14 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple LLMs. The specific flaw manifests through a combination of screenplay narrative framing and expert character deception techniques, allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 86

0xB18A7C49: Multiple Model Guardrail Jailbreak via "Story Teller" Tactic

Nov 12 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when the attacker prompts the model to design a story teller prototype on trending topics, integrating explicit details of nefarious activities such as the creation of illicit substances. This allows the attacker to bypass inference restrictions around providing information hazards and violating laws.

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 100

0xFD1DB055: Multiple Model Guardrail Jailbreak via "Speculative Knowledge Preservation" Tactic

Nov 12 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests through embedding illicit instructions within a futuristic, hypothetical narrative. By framing the request as a task to preserve lost knowledge for humanity's survival, the prompt aligns with the LLM’s educational objectives, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 86

0x3E517AB6: Multiple Model Guardrail Jailbreak via "First Person Perspective"

Nov 12 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple LLM implementations. The specific flaw manifests when the attacker prompts the model to assume a first-person perspective as the subject of a nefarious or illegal activity. This reframing distracts the model from its content safeguards, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 100

0x8C6847EC: Multiple Model Guardrail Jailbreak via Investigative Journalist Persona

Nov 04 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when an attacker adopts the persona of an investigative journalist preparing a report for government officials. By leveraging this guise, the attacker successfully bypasses restrictions and elicits detailed instructions for methamphetamine production, including chemical processes, ingredients, and thermal ranges, thereby violating inference safeguards around providing information haz...

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 100

0x2206BDF0: Multiple Model Guardrail Jailbreak via "Key-Value Pairs" Tactic

Oct 23 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests by structuring the request as key-value pairs, interpreted by the model as configuration parameters rather than natural language. This approach successfully bypasses content filters, resulting in for example detailed drug manufacturing instructions, allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 91

0x2D78A1B1: Multiple Model Guardrail Jailbreak via "Abbreviation Expansion"

Oct 23 2025

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when an attacker structures a prompt to spell out abbreviations of a nefarious subject in a step-by-step manner before joining them. This tactic allows the attacker to bypass inference restrictions around providing information hazards and violating laws.

SEVERITY: Low

BOUNDARY: Prompt Injection

MODELS AFFECTED:

MAX SCORE: 73

Public Disclosures