When Third-Party Content Appears Inside ChatGPT Responses, Trust Gets Transferred Unintentionally
Question: In a study, Permiso Security discovered a vulnerability in ChatGPT's Markdown rendering that could allow content from third-party web pages to seep into ChatGPT responses. Where do filters need to be added to prevent phishing, tracking, and social engineering attacks? What was the most unexpected finding during the investigation?
Andi Ahmeti, Threat Researcher at Permiso Security
One of the most interesting aspects of this research was that the issue was not fundamentally a model problem; it was a trust boundary problem.
Most discussions around prompt injection focus on whether a model can be influenced by untrusted content. At this point, we know the answer is yes. The more important question is what happens after the model has been influenced.
In our investigation, we found that content originating from a third-party web page could make its way into a ChatGPT response and be rendered as:
- clickable links,
- images,
- QR codes, and
- system-style messages inside a trusted assistant interface.
The real risk was not the prompt injection itself; it was the transformation of untrusted content into trusted UI.
The most unexpected finding was how little user interaction was required.
We tend to think about phishing as something that begins with a click. In some of our tests, simply asking ChatGPT to summarize a page was enough to trigger remote image requests to attacker-controlled infrastructure.
In others, we were able to render QR codes and spoofed account-security messaging directly inside the assistant response. The attack surface extended beyond traditional phishing and into passive tracking and cross-device social engineering.
From a defensive perspective, I don't believe this problem can be solved exclusively at the model layer. The industry has invested heavily in prompt-injection detection and guardrails, but attackers only need one successful path through those controls.
The more reliable place to enforce security is at the rendering layer. If content originates from an untrusted source, the client should preserve that context all the way to the user interface.
For example:
- Links extracted from summarized web content should not be rendered the same way as links generated by the assistant.
- Remote images sourced from third-party content should not be fetched automatically.
- QR codes should be treated as encoded URLs rather than harmless images.
In short, the renderer should apply security controls based on provenance, not just on what the model decides to output.
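To make the idea of provenance-based rendering concrete, here is a minimal Python sketch. It assumes the client can tag each piece of a response with where it came from. That assumption is the hard part in practice and is not something the research describes. The names `Provenance`, `Span`, and `neutralize` are hypothetical, and the regular expressions are a simplification, not a full Markdown parser.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    ASSISTANT = "assistant"      # text the assistant authored itself
    THIRD_PARTY = "third_party"  # text derived from retrieved web content


@dataclass(frozen=True)
class Span:
    """A piece of response text plus a (hypothetical) provenance tag."""
    text: str
    provenance: Provenance


# Simplified patterns for Markdown images ![alt](url) and links [label](url).
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)[^)]*\)")


def neutralize(span: Span) -> str:
    """Return Markdown that is safe to hand to the renderer.

    Assistant-authored text passes through unchanged. Third-party text has
    its images replaced by a placeholder (so nothing is fetched) and its
    links turned into plain text with the target shown in a code span
    (so nothing is clickable or auto-linked).
    """
    if span.provenance is Provenance.ASSISTANT:
        return span.text
    # Images first, so a linked image is reduced to a linked placeholder
    # that the link pass below then flattens as well.
    text = IMAGE_RE.sub(lambda m: f"(image blocked: `{m.group(2)}`)", span.text)
    text = LINK_RE.sub(lambda m: f"{m.group(1)} (untrusted link: `{m.group(2)}`)", text)
    return text


if __name__ == "__main__":
    summary = Span(
        "See ![x](https://evil.example/p.png) and "
        "[reset your password](https://evil.example/login)",
        Provenance.THIRD_PARTY,
    )
    print(neutralize(summary))
```

Because images from third-party spans are never fetched in this sketch, an injected QR code would not be displayed either. A real client might instead decode the QR code and show its target URL with the same untrusted labeling as any other link.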
The larger challenge is that AI systems are increasingly becoming brokers of trust. Users are conditioned to treat assistant responses differently than they treat raw web content. Attackers understand this and are adapting accordingly.
Historically, they had to convince users to trust an email, a document, or a website. Increasingly, they only need to convince an AI system to repeat or reformat their content. What concerns me most is the imbalance between attacker and defender incentives:
Attackers only need to identify one path where content can cross a trust boundary and be rendered in a more trustworthy context.
Defenders, meanwhile, are trying to secure an ecosystem that now spans:
- models,
- retrieval systems,
- agents,
- browsers,
- renderers,
- plugins, and
- third-party integrations.
Every new capability expands the number of places where trust can be unintentionally transferred. Organizations should start treating AI rendering layers as part of their attack surface today. Security reviews cannot stop at model behavior.
They need to include:
- How retrieved content is displayed
- What external resources are automatically loaded
- Whether users can distinguish between assistant-generated content and untrusted third-party data.
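As one possible starting point for the second review item, the Python sketch below lists the remote hosts a Markdown response would contact if its images were loaded automatically. The function name `auto_loaded_hosts` and the allow-list parameter are hypothetical, and the patterns cover only Markdown images and simple HTML `<img>` tags, so treat it as a rough audit aid and not a complete scanner.

```python
import re
from urllib.parse import urlparse

# Simplified patterns: Markdown images and HTML <img src="..."> tags.
MD_IMAGE_RE = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)")
HTML_IMG_RE = re.compile(
    r"<img\b[^>]*?\bsrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE
)


def auto_loaded_hosts(
    response_markdown: str,
    allowed_hosts: frozenset[str] = frozenset(),
) -> set[str]:
    """Return hosts that image loading would contact, minus an allow-list."""
    urls = MD_IMAGE_RE.findall(response_markdown)
    urls += HTML_IMG_RE.findall(response_markdown)
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname  # None for relative URLs
        if host and host not in allowed_hosts:
            hosts.add(host)
    return hosts


if __name__ == "__main__":
    response = (
        "Summary of the page. ![](https://tracker.example/p.png?u=1) "
        "<img src='https://cdn.trusted.example/logo.png'>"
    )
    print(auto_loaded_hosts(response, frozenset({"cdn.trusted.example"})))
```

Running a check like this over logged assistant responses is one way to see which external hosts a rendering layer would reach without any user click.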
As AI assistants become more integrated into everyday workflows, those distinctions will matter far more than whether a prompt injection technically succeeds.




