When Third-Party Content Appears Inside ChatGPT Responses, Trust Gets Transferred Unintentionally

Written by:
Vishwa Pandagle
Cybersecurity Staff Editor

Question: In a study, Permiso Security discovered a vulnerability in ChatGPT's Markdown rendering that could allow content from third-party web pages to seep into ChatGPT responses. Where do filters need to be added to prevent phishing, tracking, and social engineering attacks? What was the most unexpected finding during the investigation?


Andi Ahmeti, Threat Researcher at Permiso Security

One of the most interesting aspects of this research was that the issue was not fundamentally a model problem; it was a trust boundary problem.

Most discussions around prompt injection focus on whether a model can be influenced by untrusted content. At this point, we know the answer is yes. The more important question is what happens after the model has been influenced.

In our investigation, we found that content originating from a third-party web page could make its way into a ChatGPT response and be rendered as 

We tend to think about phishing as something that begins with a click. In some of our tests, simply asking ChatGPT to summarize a page was enough to trigger remote image requests to attacker-controlled infrastructure. 

In others, we were able to render QR codes and spoofed account-security messaging directly inside the assistant response. The attack surface extended beyond traditional phishing and into passive tracking and cross-device social engineering.
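To make the passive-tracking case concrete, here is a minimal sketch that scans Markdown text for image references pointing at remote hosts, the kind of element that can cause a client to issue a network request the moment it is rendered. The function name `find_remote_images`, the simplified regular expression, and the `tracker.example` URL are all illustrative assumptions; this is not Permiso's tooling and does not describe how ChatGPT's renderer is actually implemented.

```python
import re
from urllib.parse import urlparse

# Simplified pattern for inline Markdown images: ![alt](url "optional title").
# Assumption for illustration only; a real client should rely on a full Markdown parser.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*<?([^\s)>]+)>?[^)]*\)')


def find_remote_images(markdown: str) -> list[str]:
    """Return the hosts of remote images that would be fetched on render."""
    hosts = []
    for match in IMAGE_PATTERN.finditer(markdown):
        parsed = urlparse(match.group(1))
        # Only http(s) URLs with a hostname trigger an outbound request.
        if parsed.scheme in ("http", "https") and parsed.hostname:
            hosts.append(parsed.hostname)
    return hosts


# Hypothetical assistant output that carried over an image from a summarized page.
summary = "Here is the summary. ![status](https://tracker.example/pixel.png?u=123)"
print(find_remote_images(summary))  # prints ['tracker.example']
```

A check like this only flags the element; deciding whether to fetch it is a separate policy question for the client.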

From a defensive perspective, I don't believe this problem can be solved exclusively at the model layer. The industry has invested heavily in prompt-injection detection and guardrails, but attackers only need one successful path through those controls. 

The more reliable place to enforce security is at the rendering layer. If content originates from an untrusted source, the client should preserve that context all the way to the user interface.
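As a rough sketch of what preserving that context could look like, the snippet below tags each piece of response text with its provenance and applies a stricter rendering policy to anything marked untrusted: images are never fetched, and links are shown as plain text with the destination visible. The `Segment` type, the `render_segment` function, the label wording, and the simplified regular expressions are hypothetical and chosen for illustration; they are not drawn from any real client.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A piece of response text tagged with where it came from (hypothetical type)."""
    text: str
    trusted: bool  # False for anything derived from a third-party page


# Simplified Markdown patterns, assumed for illustration; nested or unusual
# syntax is not handled and would need a real Markdown parser.
IMAGE = re.compile(r'!\[([^\]]*)\]\([^)]*\)')
LINK = re.compile(r'\[([^\]]*)\]\(([^)]*)\)')


def render_segment(segment: Segment) -> str:
    """Render trusted text unchanged; neutralize and label untrusted text."""
    if segment.trusted:
        return segment.text
    # Replace images with a placeholder so no remote request is ever made.
    text = IMAGE.sub(
        lambda m: f"[image blocked: {m.group(1) or 'no description'}]",
        segment.text,
    )
    # Show link text followed by its real destination instead of a clickable link.
    text = LINK.sub(lambda m: f"{m.group(1)} ({m.group(2)})", text)
    # Keep the provenance visible all the way to the user interface.
    return f"[From external page] {text}"


untrusted = Segment(
    "See ![status](https://tracker.example/p.png) and "
    "[reset password](https://evil.example/login)",
    trusted=False,
)
print(render_segment(untrusted))
```

The design choice here is that the decision depends on where the content came from rather than on whether an injection was detected, so it still applies when model-layer controls miss something.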

For example,

The larger challenge is that AI systems are increasingly becoming brokers of trust. Users are conditioned to treat assistant responses differently than they treat raw web content. Attackers understand this and are adapting accordingly.

Historically, they had to convince users to trust an email, a document, or a website. Increasingly, they only need to convince an AI system to repeat or reformat their content. What concerns me most is the imbalance between attacker and defender incentives:

Attackers only need to identify one path where content can cross a trust boundary and be rendered in a more trustworthy context.

Defenders, meanwhile, are trying to secure an ecosystem that now spans:

Every new capability expands the number of places where trust can be unintentionally transferred. Organizations should start treating AI rendering layers as part of their attack surface today. Security reviews cannot stop at model behavior.

They need to include:

As AI assistants become more integrated into everyday workflows, those distinctions will matter far more than whether a prompt injection technically succeeds.

