How to Track AI Traffic from ChatGPT, Perplexity, and Gemini

Track AI traffic from LLMs in GA4: referrer signatures, custom channel groups, UTM rules, and the 2026 measurement stack used by Prompt Architect.

Abel Ko · May 20, 2026 · 8 min read

ChatGPT, Perplexity, and Google Gemini already send referral traffic to most B2B sites, but the default GA4 setup routes almost all of it into "Direct" or "Unassigned." This guide shows how Prompt Architect tracks AI traffic from large language models (LLMs) in 2026: the referrer signatures to watch, the GA4 custom channel group that captures them, the UTM hygiene that survives the round-trip through chat, and the citation-side measurement that fills the gaps GA4 cannot see.

Why AI traffic looks like "Direct" by default

LLM referral behavior breaks the assumptions GA4 was built on. ChatGPT search opens citation links in a new tab and sometimes strips the referrer. Perplexity preserves referrers but uses a hostname (www.perplexity.ai) that GA4's default channel group does not classify as a known source. Gemini's behavior differs again depending on whether the link came from a sidebar citation or an inline Overview link.

SparkToro's referral source breakdown for AI traffic found that a meaningful share of what GA4 calls "Direct" on B2B sites is actually attributable to LLM referrals or shared-link traffic. Until you classify the referrers explicitly, you cannot answer the only question that matters: how much of your pipeline is sourced from AI surfaces?

6.2 average citations per Perplexity answer across a 1,000-prompt commercial sample (Prompt Architect internal panel, Q1 2026)

Step 1: know the referrer signatures

Below is the working list of hostnames and signatures we classify as "AI Assistants" in 2026. Update quarterly as new engines ship.

Engine	Hostnames	Notes
ChatGPT	`chat.openai.com`, `chatgpt.com`	Sometimes referrer-stripped; check landing path for `/?ref=chatgpt.com` patterns.
Perplexity	`www.perplexity.ai`, `perplexity.ai`	Preserves referrer consistently. UTM-friendly.
Google Gemini	`gemini.google.com`	Less common as an outbound source; appears mostly from Workspace integrations.
Microsoft Copilot	`copilot.microsoft.com`, `www.bing.com/chat`	Bing-backed; `/chat` is a path on `www.bing.com`, not a separate hostname, so some traffic still arrives with a plain `bing.com` referrer.
Claude (Anthropic)	`claude.ai`	Limited outbound link surfacing today; growing fast.
You.com	`you.com`	Long-tail volume, worth classifying.
Phind	`www.phind.com`	Developer-focused; long-tail but high-intent.

Treat this as a starting set, not a closed list. Add new engines (Grok web, Kagi Assistant, etc.) as they enter your referrer logs.
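As a rough illustration of how the table can be applied outside GA4 (for example, against raw server logs), here is a minimal Python sketch. The function name `classify_referrer` and the `AI_HOSTS` map are hypothetical names for this example, not part of any analytics product. The map is copied from the table above, minus `www.bing.com/chat`, which is a path rather than a hostname.

```python
from typing import Optional
from urllib.parse import urlparse

# Hostname-to-engine map copied from the table above (hypothetical helper data).
# `www.bing.com/chat` is left out because it is a path, not a hostname.
AI_HOSTS = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "www.perplexity.ai": "Perplexity",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Google Gemini",
    "copilot.microsoft.com": "Microsoft Copilot",
    "claude.ai": "Claude",
    "you.com": "You.com",
    "www.phind.com": "Phind",
}


def classify_referrer(referrer: str) -> Optional[str]:
    """Return the engine name for a full referrer URL, or None if unlisted."""
    if not referrer:
        return None  # stripped referrer: nothing to classify here
    host = (urlparse(referrer).hostname or "").lower()
    return AI_HOSTS.get(host)
```

The lookup is an exact hostname match, so a new engine only needs one more entry in the map when it shows up in your referrer logs.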

Step 2: build the GA4 custom channel group

GA4 ships with a "Default channel group" that does not include an "AI Assistants" channel. You have to build one. Google's own documentation on custom channel groups walks through the UI; the rule we use at Prompt Architect is:

Channel name: AI Assistants

Conditions (OR):

Source matches regex: ^(chatgpt\.com|chat\.openai\.com|www\.perplexity\.ai|perplexity\.ai|gemini\.google\.com|copilot\.microsoft\.com|claude\.ai|you\.com|www\.phind\.com)$
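Before pasting a pattern like this into GA4, it can help to sanity-check it locally. The sketch below runs the same regex through Python's `re` module, which is not the engine GA4 uses, so treat it as a rough check of the anchoring and escaping rather than a guarantee of GA4 behavior. The sample source values are made up for illustration.

```python
import re

# Same pattern as the channel rule above, split across lines for readability.
AI_SOURCE_RE = re.compile(
    r"^(chatgpt\.com|chat\.openai\.com|www\.perplexity\.ai|perplexity\.ai|"
    r"gemini\.google\.com|copilot\.microsoft\.com|claude\.ai|you\.com|"
    r"www\.phind\.com)$"
)


def is_ai_source(source: str) -> bool:
    """True when the whole source value matches one of the listed hostnames."""
    return AI_SOURCE_RE.match(source) is not None


# Made-up source values: two that should match, three that should not.
samples = ["chatgpt.com", "www.perplexity.ai", "google", "notyou.com", "you.com.evil.example"]
results = {s: is_ai_source(s) for s in samples}
```

The `^` and `$` anchors are what keep look-alike values such as `notyou.com` out of the channel.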

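Place the rule above "Organic Search" in your channel group ordering so AI traffic is not misclassified as Organic Search. The order matters because GA4 evaluates rules top-down and assigns the first match. One caveat: the source is a hostname, so the regex above cannot catch Copilot sessions that arrive with a plain bing.com referrer (the bing.com/chat case); those will most likely still be counted as Organic Search.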
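To make the first-match behavior concrete, here is a small Python simulation. It is a sketch only: `assign_channel` is a hypothetical function, the AI pattern is shortened, and `SEARCH_RULE` is a deliberately broad stand-in, not GA4's actual Organic Search definition.

```python
import re

# Shortened version of the AI Assistants rule, for illustration.
AI_RULE = ("AI Assistants", re.compile(r"^(chatgpt\.com|www\.perplexity\.ai|copilot\.microsoft\.com)$"))
# Hypothetical, overly broad search rule used only to show why ordering matters.
SEARCH_RULE = ("Organic Search", re.compile(r"(google|bing|microsoft)"))


def assign_channel(source, rules):
    """Walk the rules top-down and return the first channel whose pattern matches."""
    for name, pattern in rules:
        if pattern.search(source):
            return name
    return "Unassigned"


ai_first = assign_channel("copilot.microsoft.com", [AI_RULE, SEARCH_RULE])
search_first = assign_channel("copilot.microsoft.com", [SEARCH_RULE, AI_RULE])
```

With the AI rule first, the Copilot hostname is labeled "AI Assistants"; with the broad search rule first, the same hostname is claimed by "Organic Search" before the AI rule is ever evaluated.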

Once the channel exists, you get the metric that actually answers the boardroom question: "What share of our sessions came from AI assistants this week, this month, this quarter?" Most B2B sites we onboard see this number sit between 2 and 8 percent in mid-2026, growing 3 to 6 points per quarter.
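The share itself is simple arithmetic once sessions are grouped by channel. A minimal sketch, assuming you have exported session counts per channel into a dictionary (the numbers below are invented for illustration and are not Prompt Architect data):

```python
def ai_session_share(sessions_by_channel: dict) -> float:
    """Percentage of all sessions that fall in the 'AI Assistants' channel."""
    total = sum(sessions_by_channel.values())
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty export
    return 100.0 * sessions_by_channel.get("AI Assistants", 0) / total


# Invented weekly export: 10,000 sessions in total.
week = {"Organic Search": 5200, "Direct": 3100, "AI Assistants": 450, "Referral": 1250}
share = ai_session_share(week)  # 450 of 10,000 sessions
```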

Step 3: enforce UTM hygiene on outbound links

LLM citations are unpredictable. You cannot make ChatGPT add a UTM. But you can make every link the LLM eventually points to (your blog posts, your landing pages, your docs) self-describe its inbound channel via UTM defaults so that even when the referrer is stripped, the landing URL carries enough signal.

These are the rules we apply to every link from a Prompt Architect (PA) surface to a PA-owned page:

Internal nav links: no UTMs (they pollute internal attribution).
Outbound links from email or LinkedIn: tagged with utm_source and utm_campaign we own.
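Canonical citations from external surfaces: rely on the referrer. On the pages we control, we also set Referrer-Policy: no-referrer-when-downgrade as a fallback so HTTPS-to-HTTPS navigation keeps the source. The policy is set by the linking page, so it cannot restore a referrer that an external surface chooses to strip.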
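The email and LinkedIn rule is easy to automate. Below is a minimal sketch of a tagging helper built on Python's standard `urllib.parse`; the function name `add_utm` and the example values are hypothetical. It leaves any UTM parameters that are already present untouched.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_utm(url: str, source: str, campaign: str) -> str:
    """Append utm_source and utm_campaign unless the URL already carries them."""
    parts = urlparse(url)
    # Note: dict() keeps only the last value if a query key is repeated.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.setdefault("utm_source", source)
    query.setdefault("utm_campaign", campaign)
    return urlunparse(parts._replace(query=urlencode(query)))


tagged = add_utm("https://example.com/blog/post?x=1", "linkedin", "geo-launch")
```

Internal navigation links should simply never pass through a helper like this, which keeps the "no UTMs on internal nav" rule intact.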

When the referrer is stripped (ChatGPT search occasionally does this), the only signals you have left are the landing URL and the user-agent. Capture both in GA4: page_location is collected automatically, while the user-agent string has to be sent as a custom event parameter (for example, user_agent) and registered as a custom dimension. Several B2B brands report that the user-agent string for ChatGPT Operator and similar agents is now distinct enough to classify directly.
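Here is one way the landing-URL fallback could look when post-processing exported hits. This is a sketch under stated assumptions: `infer_ai_source` is a hypothetical helper, the `ref` parameter comes from the `/?ref=chatgpt.com` pattern in the Step 1 table, and checking `utm_source` as well is an added assumption you should verify against your own landing URLs.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Query parameters that may carry the engine name (see assumptions above).
SIGNAL_PARAMS = ("ref", "utm_source")
# Illustrative substrings only; extend after checking your own logs.
KNOWN_AI_VALUES = ("chatgpt.com", "perplexity")


def infer_ai_source(page_location: str, referrer: str = "") -> Optional[str]:
    """Guess the AI source from the landing URL when no referrer is available."""
    if referrer:
        return None  # referrer present: leave classification to the channel group
    params = parse_qs(urlparse(page_location).query)
    for key in SIGNAL_PARAMS:
        for value in params.get(key, []):
            lowered = value.lower()
            for known in KNOWN_AI_VALUES:
                if known in lowered:
                    return known
    return None
```

The function deliberately does nothing when a referrer exists, so it only fills the gap rather than overriding the channel rule.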

Step 4: measure the citation side, not just the click side

GA4 can only see traffic that actually landed on your site. It cannot tell you that ChatGPT cited your competitor's page instead, or that Perplexity mentioned your brand in an answer that nobody clicked through. That gap is the difference between traffic measurement and visibility measurement, and it is the reason most teams pair GA4 with a citation tracker like Prompt Architect.

A complete AI traffic measurement stack in 2026 has two halves:

Click side (GA4): sessions, conversions, downstream behavior of users who landed from an AI assistant referrer.

Citation side (PA or similar): weekly sample of 50 to 200 priority prompts across ChatGPT, Perplexity, and Gemini. What share of answers cite your brand? What share cite a competitor? Which pages of yours get cited, and which do not? Run a free citation audit at /diagnosis to see the citation side for your domain in one click.

Without the citation side, you optimize for the 5 percent of users who click and ignore the 95 percent who never leave the chat. With both halves wired together, you can connect citation lift to traffic lift and measure GEO ROI honestly.
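The citation-side questions reduce to a share calculation over the sampled answers. The sketch below assumes each sampled answer has already been reduced to the list of domains it cites; the function name and the `.example` domains are hypothetical, and this is not a description of how Prompt Architect computes its own metrics.

```python
from typing import List


def citation_share(answers: List[List[str]], domain: str) -> float:
    """Fraction of sampled answers (0.0 to 1.0) that cite `domain` at least once."""
    if not answers:
        return 0.0
    hits = sum(1 for cited in answers if domain in cited)
    return hits / len(answers)


# Five invented answers, each represented by the domains it cited.
sample = [
    ["ourbrand.example", "competitor.example"],
    ["competitor.example"],
    ["ourbrand.example"],
    ["other.example"],
    ["competitor.example", "other.example"],
]
our_share = citation_share(sample, "ourbrand.example")           # 2 of 5 answers
competitor_share = citation_share(sample, "competitor.example")  # 3 of 5 answers
```

Tracking the same calculation week over week, per engine, is what lets you line citation lift up against the session share from GA4.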

Common mistakes

Five mistakes we see repeatedly when teams instrument AI traffic for the first time:

Trusting "Direct" as a baseline. A meaningful share of "Direct" on most B2B sites is misclassified AI referrer traffic. Build the custom channel group before you draw conclusions.
Counting sessions, not citations. AI answers can mention your brand without sending a click. Citation share, not session share, is the upstream metric.
Mixing AI and SEO into one bucket. AI Assistants and Organic Search behave differently. Conversion rates, page paths, and dwell time all diverge. Report them separately.
Ignoring user-agent. Crawler user-agents from OpenAI, Anthropic, Perplexity, and Google's AI surface (separate from Googlebot) are increasingly distinct. Filter them out of session counts or classify them explicitly. Google's Search Central documentation on bot traffic explains the canonical approach for non-AI crawlers and applies cleanly here.
One-time setup, then forgetting. New engines ship every quarter. Audit your channel group rules every 90 days.
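For the user-agent mistake in particular, a simple filter over raw hits can look like the sketch below. The token list is an assumption based on crawler names commonly reported for these vendors; verify it against each vendor's current documentation before relying on it. The hit dictionaries and function names are hypothetical.

```python
# Assumed crawler tokens; check vendor documentation before relying on them.
AI_CRAWLER_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")


def is_ai_crawler(user_agent: str) -> bool:
    """True when the user-agent string contains one of the assumed crawler tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def human_hits(hits):
    """Drop hits whose user-agent looks like an AI crawler."""
    return [h for h in hits if not is_ai_crawler(h.get("user_agent", ""))]


# Illustrative hits; the user-agent strings are simplified examples.
hits = [
    {"path": "/pricing", "user_agent": "Mozilla/5.0 (Macintosh) Safari/605.1.15"},
    {"path": "/blog", "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"},
]
kept = human_hits(hits)
```

Whether you drop these hits or route them into their own explicit bucket is a reporting choice; the point is that they should not inflate session counts.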

Where AI traffic measurement is going

GA4 will eventually ship an AI Assistants channel by default; until then, this is a build-your-own job. Server-side analytics platforms (PostHog, Plausible, Fathom) already classify Perplexity and ChatGPT cleanly via custom rules. The longer-term shift, though, is away from session-counting and toward citation-counting as the primary KPI. The session is the consequence; the citation is the cause.
