How to Track AI Traffic from ChatGPT, Perplexity, and Gemini
Track AI traffic from LLMs in GA4: referrer signatures, custom channel groups, UTM rules, and the 2026 measurement stack used by Prompt Architect.
ChatGPT, Perplexity, and Google Gemini already send referral traffic to most B2B sites, but the default GA4 setup routes almost all of it into "Direct" or "Unassigned." This guide shows how Prompt Architect tracks AI traffic from large language models (LLMs) in 2026: the referrer signatures to watch, the GA4 custom channel group that captures them, the UTM hygiene that survives the round-trip through chat, and the citation-side measurement that fills the gaps GA4 cannot see.
Why AI traffic looks like "Direct" by default
LLM referral behavior breaks the assumptions GA4 was built on. ChatGPT search opens citation links in a new tab and sometimes strips the referrer. Perplexity preserves referrers but uses a hostname (www.perplexity.ai) that GA4's default channel group does not classify as a known source. Gemini's behavior differs again depending on whether the link came from a sidebar citation or an inline Overview link.
SparkToro's referral source breakdown for AI traffic found that a meaningful share of what GA4 calls "Direct" on B2B sites is actually attributable to LLM referrals or shared-link traffic. Until you classify the referrers explicitly, you cannot answer the only question that matters: how much of your pipeline is sourced from AI surfaces?
Step 1: know the referrer signatures
Below is the working list of hostnames and signatures we classify as "AI Assistants" in 2026. Update quarterly as new engines ship.
| Engine | Hostnames | Notes |
|---|---|---|
| ChatGPT | chat.openai.com, chatgpt.com | Sometimes referrer-stripped; check landing URLs for chatgpt.com in the query string (e.g. /?ref=chatgpt.com or ?utm_source=chatgpt.com). |
| Perplexity | www.perplexity.ai, perplexity.ai | Preserves referrer consistently. UTM-friendly. |
| Google Gemini | gemini.google.com | Less common as an outbound source; appears mostly from Workspace integrations. |
| Microsoft Copilot | copilot.microsoft.com, www.bing.com (/chat path) | Bing-backed; some traffic still arrives with a bing.com referrer, which hostname-level rules cannot separate from Bing search. |
| Claude (Anthropic) | claude.ai | Limited outbound link surfacing today; growing fast. |
| You.com | you.com | Long-tail volume, worth classifying. |
| Phind | www.phind.com | Developer-focused; long-tail but high-intent. |
Treat this as a starting set, not a closed list. Add new engines (Grok web, Kagi Assistant, etc.) as they enter your referrer logs.
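Before touching GA4, it can help to run raw referrer URLs (from server logs or a tag manager variable) through the same list. The sketch below assumes you have the full referrer URL; the function name classify_referrer is hypothetical, and the bing.com/chat path check mirrors the Copilot row in the table.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hostname -> engine map taken from the table above. A starting set:
# extend it as new engines show up in your referrer logs.
AI_REFERRER_HOSTS = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "www.perplexity.ai": "Perplexity",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Google Gemini",
    "copilot.microsoft.com": "Microsoft Copilot",
    "claude.ai": "Claude",
    "you.com": "You.com",
    "www.phind.com": "Phind",
}


def classify_referrer(referrer: str) -> Optional[str]:
    """Return the AI engine for a raw referrer URL, or None if it is not one."""
    if not referrer:
        return None  # stripped referrer: fall back to landing URL / user agent
    parts = urlsplit(referrer)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    # Legacy Copilot lived under a path on bing.com, so the hostname alone
    # cannot tell it apart from ordinary Bing search; check the path too.
    if host == "www.bing.com" and parts.path.startswith("/chat"):
        return "Microsoft Copilot"
    return AI_REFERRER_HOSTS.get(host)
```

For example, classify_referrer("https://www.perplexity.ai/search/abc") returns "Perplexity", while a plain Bing search referrer returns None.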
Step 2: build the GA4 custom channel group
GA4 ships with a "Default channel group" that does not include an "AI Assistants" channel. You have to build one. Google's own documentation on custom channel groups walks through the UI; the rule we use at Prompt Architect is:
Channel name: AI Assistants
Conditions (OR):
- Source matches regex:
^(chatgpt\.com|chat\.openai\.com|www\.perplexity\.ai|perplexity\.ai|gemini\.google\.com|copilot\.microsoft\.com|claude\.ai|you\.com|www\.phind\.com)$
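Place the rule above "Referral" and "Organic Search" in your channel group ordering so AI traffic is not absorbed by those broader channels. The order matters because GA4 evaluates channel rules top-down and assigns the first match. One limit: Copilot clicks that arrive with a bing.com referrer (the old bing.com/chat path) show up with a Bing source, and a source-level rule cannot separate them from Bing search, whatever its position.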
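To see why ordering matters, here is a minimal sketch of first-match-wins evaluation using the source regex above verbatim. The three rules are hypothetical, heavily simplified stand-ins for GA4 channels (the real default definitions use source categories and more mediums); only the top-down behavior is the point.

```python
import re

# The AI Assistants source regex from the rule above, unchanged.
AI_SOURCE_RE = re.compile(
    r"^(chatgpt\.com|chat\.openai\.com|www\.perplexity\.ai|perplexity\.ai"
    r"|gemini\.google\.com|copilot\.microsoft\.com|claude\.ai|you\.com"
    r"|www\.phind\.com)$"
)

# Hypothetical simplified channel rules, in evaluation order.
RULES = [
    ("AI Assistants", lambda source, medium: AI_SOURCE_RE.fullmatch(source) is not None),
    ("Organic Search", lambda source, medium: medium == "organic"),
    ("Referral", lambda source, medium: medium == "referral"),
]


def assign_channel(source: str, medium: str, rules=RULES) -> str:
    """Walk the rules top-down and return the first channel that matches."""
    for name, matches in rules:
        if matches(source, medium):
            return name
    return "Unassigned"
```

With this ordering, assign_channel("chatgpt.com", "referral") returns "AI Assistants"; move the AI rule below "Referral" and the same session lands in "Referral". A bing / organic session lands in "Organic Search" either way, which is the Copilot limit described above.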
Once the channel exists, you get the metric that actually answers the boardroom question: "What share of our sessions came from AI assistants this week, this month, this quarter?" Most B2B sites we onboard see this number sit between 2 and 8 percent in mid-2026, growing 3 to 6 points per quarter.
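If you export sessions by period and channel (from the GA4 UI, the Data API, or BigQuery), the share is a simple ratio per period. The row shape and function name below are assumptions for illustration, not a GA4 export format.

```python
from collections import defaultdict


def ai_share_by_period(rows, channel="AI Assistants"):
    """Percent of sessions per period that fall into `channel`.

    rows: iterable of (period, channel, sessions) tuples (hypothetical shape).
    """
    totals = defaultdict(int)
    ai = defaultdict(int)
    for period, ch, sessions in rows:
        totals[period] += sessions
        if ch == channel:
            ai[period] += sessions
    return {p: (100.0 * ai[p] / totals[p] if totals[p] else 0.0) for p in sorted(totals)}


# Made-up numbers, for illustration only.
rows = [
    ("2026-W23", "AI Assistants", 120),
    ("2026-W23", "Organic Search", 2880),
    ("2026-W24", "AI Assistants", 150),
    ("2026-W24", "Direct", 2850),
]
print(ai_share_by_period(rows))  # {'2026-W23': 4.0, '2026-W24': 5.0}
```

The same function works for weeks, months, or quarters; only the period label in the rows changes.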
Step 3: enforce UTM hygiene on outbound links
LLM citations are unpredictable. You cannot make ChatGPT add a UTM. But you can make every link the LLM eventually points to (your blog posts, your landing pages, your docs) self-describe its inbound channel via UTM defaults so that even when the referrer is stripped, the landing URL carries enough signal.
This is the rule we apply on every internal link from PA-owned surfaces to PA-owned pages:
- Internal nav links: no UTMs (they pollute internal attribution).
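- Outbound links from email or LinkedIn: tagged with utm_source and utm_campaign values we own.
- Canonical citations from external surfaces: rely on the referrer. The referrer policy that governs these clicks is set by the linking page, not by your site, so you cannot force it; the modern browser default (strict-origin-when-cross-origin) still sends the origin on HTTPS-to-HTTPS navigation, which is all the hostname rules in Step 2 need.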
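The tagging rules above can be sketched as a small helper for links you publish on owned surfaces. The function name tag_link and its parameters are hypothetical; the sketch leaves existing UTMs untouched and never tags internal navigation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(url: str, surface: str, campaign: str, *, internal_nav: bool = False) -> str:
    """Add utm_source/utm_campaign to an outbound link from an owned surface."""
    if internal_nav:
        return url  # internal nav: never tag, UTMs pollute internal attribution
    parts = urlsplit(url)
    # Assumes no repeated query keys (dict() keeps only the last value).
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    # Respect tags that are already present instead of overwriting them.
    query.setdefault("utm_source", surface)
    query.setdefault("utm_campaign", campaign)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, tag_link("https://example.com/blog/post", "linkedin", "geo-launch") returns "https://example.com/blog/post?utm_source=linkedin&utm_campaign=geo-launch".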
When the referrer is stripped (ChatGPT search occasionally does this), the only signals left are the landing URL and the user agent. GA4 collects page_location by default but does not expose the raw user-agent string, so send it as a custom event parameter (for example user_agent) and register it as a custom dimension. Several B2B brands report that the user-agent strings of ChatGPT Operator and similar agents are now distinct enough to classify directly.
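A fallback classifier for referrer-stripped sessions might check the landing URL first and the user agent second. This is a sketch under stated assumptions: ChatGPT search has been observed appending utm_source=chatgpt.com to links, ref=chatgpt.com is the pattern from the table, and the user-agent tokens are placeholders to verify against vendor documentation and your own logs. The user_agent input is whatever custom parameter you chose to send.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hostnames from the Step 1 table, used as query-string markers.
AI_HOSTS = {
    "chatgpt.com", "chat.openai.com", "perplexity.ai", "www.perplexity.ai",
    "gemini.google.com", "copilot.microsoft.com", "claude.ai", "you.com",
    "www.phind.com",
}

# Assumption: substrings that identify AI agents acting for a user.
# Verify against each vendor's docs and your own logs before relying on them.
AGENT_UA_TOKENS = ("ChatGPT-User", "Perplexity-User")


def fallback_ai_signal(page_location: str, user_agent: str) -> Optional[str]:
    """Classify a session whose referrer was stripped, using URL then user agent."""
    params = parse_qs(urlsplit(page_location).query)
    for key in ("utm_source", "ref"):
        for value in params.get(key, []):
            if value.lower() in AI_HOSTS:
                return f"landing-url:{value.lower()}"
    for token in AGENT_UA_TOKENS:
        if token in (user_agent or ""):
            return f"user-agent:{token}"
    return None
```

The landing URL is checked first because it survives even when the visit comes from an ordinary browser, where the user agent carries no AI signal at all.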
Step 4: measure the citation side, not just the click side
GA4 can only see traffic that actually landed on your site. It cannot tell you that ChatGPT cited your competitor's page instead, or that Perplexity mentioned your brand in an answer that nobody clicked through. That gap is the difference between traffic measurement and visibility measurement, and it is the reason most teams pair GA4 with a citation tracker like Prompt Architect.
A complete AI traffic measurement stack in 2026 has two halves:
Click side (GA4): sessions, conversions, downstream behavior of users who landed from an AI assistant referrer.
Citation side (PA or similar): weekly sample of 50 to 200 priority prompts across ChatGPT, Perplexity, and Gemini. What share of answers cite your brand? What share cite a competitor? Which pages of yours get cited, and which do not? Run a free citation audit at /diagnosis to see the citation side for your domain in one click.
Without the citation side, you optimize for the 5 percent of users who click and ignore the 95 percent who never leave the chat. With both halves wired together, you can connect citation lift to traffic lift and measure GEO ROI honestly.
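The citation-side metric reduces to a per-engine ratio over the sampled answers. The sketch below assumes a hypothetical record shape (engine, prompt, set of cited brands) and hypothetical brand names; it is not how Prompt Architect computes the number internally.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PromptResult:
    """One sampled answer (hypothetical schema)."""
    engine: str
    prompt: str
    cited_brands: frozenset


def citation_share(results, brand):
    """Percent of sampled answers, per engine, that cite `brand`."""
    answers = Counter()
    cited = Counter()
    for r in results:
        answers[r.engine] += 1
        if brand in r.cited_brands:
            cited[r.engine] += 1
    return {engine: 100.0 * cited[engine] / n for engine, n in answers.items()}
```

Running it once for your brand and once per competitor over the same weekly sample gives the "share cite your brand / share cite a competitor" split, which you can then line up against the AI Assistants session share from GA4.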
Common mistakes
Five mistakes we see repeatedly when teams instrument AI traffic for the first time:
- Trusting "Direct" as a baseline. A meaningful share of "Direct" on most B2B sites is misclassified AI referrer traffic. Build the custom channel group before you draw conclusions.
- Counting sessions, not citations. AI answers can mention your brand without sending a click. Citation share, not session share, is the upstream metric.
- Mixing AI and SEO into one bucket. AI Assistants and Organic Search behave differently. Conversion rates, page paths, and dwell time all diverge. Report them separately.
- Ignoring user-agent. Crawler user-agents from OpenAI, Anthropic, Perplexity, and Google's AI surface (separate from Googlebot) are increasingly distinct. Filter them out of session counts or classify them explicitly. Google's Search Central documentation on bot traffic explains the canonical approach for non-AI crawlers and applies cleanly here.
- One-time setup, then forgetting. New engines ship every quarter. Audit your channel group rules every 90 days.
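For pipelines that see raw user agents (server logs, a custom user_agent parameter), a crawler filter can be as simple as a token lookup. The tokens below are crawler names the vendors have published (GPTBot and OAI-SearchBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot for Perplexity); treat the list as an assumption to re-check at each 90-day audit. Google-Extended is omitted because it is a robots.txt control token rather than a distinct user-agent string.

```python
import re
from typing import Optional

# Published crawler tokens -> vendor (re-verify at every audit).
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "OAI-SearchBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
}
_CRAWLER_RE = re.compile("|".join(re.escape(t) for t in AI_CRAWLER_TOKENS))


def crawler_vendor(user_agent: str) -> Optional[str]:
    """Return the vendor if the user agent contains a known AI crawler token."""
    m = _CRAWLER_RE.search(user_agent or "")
    return AI_CRAWLER_TOKENS[m.group(0)] if m else None


def human_sessions(rows):
    """Drop rows whose user agent belongs to a known AI crawler.

    rows: iterable of (user_agent, session_id) tuples (hypothetical shape).
    """
    return [row for row in rows if crawler_vendor(row[0]) is None]
```

Keeping the vendor mapping, rather than a bare yes/no, lets you report crawler volume by vendor instead of silently discarding it.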
Where AI traffic measurement is going
GA4 may eventually ship an AI Assistants channel by default; until then, this is a build-your-own job. Other analytics platforms (PostHog, Plausible, Fathom) can already classify Perplexity and ChatGPT cleanly via custom rules. The longer-term shift, though, is away from session-counting and toward citation-counting as the primary KPI. The session is the consequence; the citation is the cause.