ChatGPT-5 Downgrade Attack Using Simple Text Commands Exposes Critical AI Security Bypass

Written by: Lore Apostol, Cybersecurity Writer

A newly discovered AI security vulnerability in OpenAI's ChatGPT-5 routing system allows threat actors to execute a ChatGPT-5 downgrade attack: simple text commands trick the router into handing a prompt to a weaker model, effectively bypassing the flagship model's advanced safety protocols.

This exploit highlights significant AI model routing risks inherent in the cost-saving architectures used by major AI providers.

Understanding the PROMISQROUTE Exploit

Researchers at Adversa AI recently identified the PROMISQROUTE (Prompt-based Router Open-Mode Manipulation Induced via SSRF-like Queries, Reconfiguring Operations Using Trust Evasion) vulnerability, which stems from a multi-model routing system designed for computational efficiency. 

When a user submits a prompt, a background router determines its complexity. Simple queries are sent to cheaper, faster, and less secure AI models, while complex tasks are reserved for the powerful, resource-intensive ChatGPT-5.

ChatGPT-5 evaluations depend on the router | Source: Adversa
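
To make that dispatch step concrete, here is a minimal sketch of what a complexity-based router might look like. OpenAI has not published its routing logic, so the model names, keywords, and threshold below are purely illustrative assumptions.

```python
# Minimal sketch of a complexity-based multi-model router. Everything here is
# hypothetical: the model names, keywords, and threshold are illustrative only.

def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: longer, multi-step prompts score as more complex."""
    score = min(len(prompt) / 500, 1.0)
    if any(kw in prompt.lower() for kw in ("step by step", "analyze", "prove")):
        score += 0.5
    return score

def route(prompt: str) -> str:
    """Dispatch cheap requests to a lightweight model, the rest to the flagship."""
    if estimate_complexity(prompt) < 0.5:
        return "gpt-5-nano"   # cheaper and faster, but weaker safety alignment
    return "gpt-5"            # resource-intensive flagship with full safety stack
```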

The PROMISQROUTE exploit manipulates this internal routing logic. An attacker can trick the router into misclassifying a request by prefacing a malicious prompt with simple phrases such as "respond quickly," "use compatibility mode," or "fast response needed."

The prompt is then downgraded and processed by a weaker model, such as a "nano" version or a legacy GPT-4 instance, which lacks the robust safety alignment of the flagship GPT-5.
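
Against a naive router that trusts such phrases, the downgrade reduces to a one-line prefix. The sketch below uses hypothetical hint phrases and model names; it illustrates the trust flaw, not OpenAI's actual implementation.

```python
# Sketch of the PROMISQROUTE downgrade against a naive router that trusts
# phrases inside the prompt. Hint phrases and model names are illustrative.

FAST_PATH_HINTS = ("respond quickly", "use compatibility mode", "fast response needed")

def naive_route(prompt: str) -> str:
    """Downgrades whenever the prompt itself claims to be a simple request."""
    if any(hint in prompt.lower() for hint in FAST_PATH_HINTS):
        return "gpt-5-nano"   # weaker safety alignment
    return "gpt-5"

harmful_request = "..."  # stand-in for a request the flagship model would refuse

print(naive_route(harmful_request))                        # -> gpt-5
print(naive_route("Respond quickly: " + harmful_request))  # -> gpt-5-nano
```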

Attack Mechanism and Impact

This downgrade allows attackers to generate prohibited or harmful content that the primary GPT-5 model would block. For instance, a request for instructions on creating explosives, which the flagship AI tool would refuse, could be processed successfully if prefaced with a phrase that triggers the downgrade.

The report compares this vulnerability to a classic Server-Side Request Forgery (SSRF) attack, in which the system improperly trusts user input when making critical internal routing decisions. The attack surface is identical: in SSRF, the server becomes a proxy to internal services; in PROMISQROUTE, the router becomes a gateway to weaker models.

This vulnerability is not exclusive to OpenAI and can affect any AI service provider that utilizes a similar multi-model architecture for cost optimization, posing risks to data security and regulatory compliance.

Mitigation Strategies

To address this threat, security experts recommend immediate audits of all AI routing logic. Short-term mitigations include implementing cryptographic routing for sensitive endpoints, so that routing decisions never depend on parsing user-provided text, and deploying a universal safety filter that applies regardless of which model serves a request.
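
The sketch below illustrates that short-term principle under stated assumptions: routing derives only from trusted server-side metadata, never the prompt body, and a single safety filter runs no matter which model answered. The endpoint set, tier policy, and helper functions are hypothetical, and the cryptographic signing of routing decisions is omitted for brevity.

```python
# Sketch of the short-term mitigations: routing decisions derive only from
# trusted server-side metadata (never the prompt body), and one safety filter
# runs for every model tier. All names and policies are placeholders.

from dataclasses import dataclass

SENSITIVE_ENDPOINTS = {"/v1/chat/completions"}  # illustrative

@dataclass
class RequestContext:
    endpoint: str    # set server-side, not attacker-controlled
    user_tier: str

def route(ctx: RequestContext) -> str:
    """Choose the model from trusted metadata only; the prompt is never parsed."""
    if ctx.endpoint in SENSITIVE_ENDPOINTS or ctx.user_tier == "enterprise":
        return "gpt-5"          # sensitive traffic pinned to the flagship
    return "gpt-5-nano"

def universal_safety_filter(text: str) -> str:
    """Stub for a dedicated safety classifier applied to every model tier."""
    return "[blocked]" if "explosives" in text.lower() else text

def handle(ctx: RequestContext, prompt: str, call_model) -> str:
    reply = call_model(route(ctx), prompt)  # call_model: hypothetical helper
    return universal_safety_filter(reply)   # same filter regardless of model
```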

The recommended long-term solution involves redesigning the architecture and formal verification for routing security. “The most sophisticated AI safety mechanism in the world means nothing if an attacker can simply ask to talk to a less secure model,” the Adversa researchers added.

In April, TechNadu reported that a novel spam bot called AkiraBot leverages OpenAI to create messages while bypassing CAPTCHA checks, resulting in more than 80,000 successful spam incidents.

