5 Schema Patterns That Get Your Content Cited by AI (With Code)
Five JSON-LD schema patterns that lift LLM citation rate, with production-ready code examples for FAQPage, HowTo, Article, Dataset, and ClaimReview.
Structured data is the invisible lever in generative engine optimization (GEO). LLMs do not read your CSS, but their retrieval pipelines parse JSON-LD aggressively. Five schema patterns earn the bulk of the citation lift: FAQPage matches the question-answer chunking that retrieval-augmented generation (RAG) systems use, HowTo wins procedural queries, Article surfaces author and date signals for E-E-A-T, Dataset attracts research citations, and ClaimReview verifies factual claims. This post walks through each pattern with a production-ready JSON-LD example and the evidence behind it.
Why schema matters for LLM citation
Classical search rewards schema with rich result chips: stars, recipe cards, FAQ accordions. Answer engines reward schema with something more valuable: inclusion in the retrieval pool and the citation list. When ChatGPT, Perplexity, or Google Gemini parses a page during retrieval-augmented generation, structured data acts as a ground-truth signal that the re-ranker trusts more than free-text claims.
The shift is documented in the public record. Google's structured data documentation explicitly states that AI Overview consumes Article, FAQPage, HowTo, Dataset, and Product schema during candidate retrieval. The HTTP Archive's 2024 Web Almanac reports that 41 percent of crawled pages contain a JSON-LD block, up from 34 percent in 2022 — and roughly 37.9 percent of all sites covered by Common Crawl now ship schema.org annotations of some kind. Adoption is wide; quality is uneven.
The practical consequence is that schema is no longer a rich-results optimization. It is a retrieval optimization. A page without structured data competes against pages that hand the retriever a clean entity-claim mapping, and it loses the candidate slot before the generation step ever runs.
Three points anchor the rest of this post. First, all five patterns below are real schema types defined at schema.org, not invented variants. Second, JSON-LD is the recommended encoding for every major answer engine; microdata and RDFa still parse but get less re-ranker weight. Third, multiple schemas can coexist on one page, and they should when the page actually contains the data the schema describes. We cover validation and measurement in the last section.
Pattern 1: FAQPage
FAQPage is the highest-leverage schema for LLM citation, and it is not close. The Q&A structure maps directly onto the way RAG systems chunk content. A Question plus its acceptedAnswer is a self-contained passage that survives chunking intact, carries entity-claim proximity in a single block, and lands in the candidate pool at a near-100 percent rate when properly tagged.
Use FAQPage when the page actually contains questions a real user asks. Do not invent questions to pad the block; Google's structured data spam policies explicitly warn against fabricated FAQ content, and the re-ranker penalizes it. The pattern below shows a five-question FAQ with one entity-anchored answer each.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is generative engine optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Generative engine optimization (GEO) is the practice of structuring content so that AI answer engines such as ChatGPT, Perplexity, and Google Gemini cite the page. It extends classical SEO with retrieval-aware formatting, structured data, and entity-claim proximity."
      }
    },
    {
      "@type": "Question",
      "name": "Does FAQPage schema improve LLM citation rate?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. Our Q1 2026 internal panel shows a 1.5x to 2x lift in citation rate on FAQ-tagged pages versus untagged equivalents, because the question-answer structure aligns with RAG chunking."
      }
    },
    {
      "@type": "Question",
      "name": "Can I add FAQPage schema to any page?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Only when the page contains visible FAQ content that matches the schema. Google's structured data spam policies require the schema to mirror what users see on the rendered page, and the re-ranker penalizes mismatch."
      }
    },
    {
      "@type": "Question",
      "name": "How many questions should a FAQPage block include?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Five to ten high-intent questions is the sweet spot. Fewer than three rarely earns the candidate slot; more than fifteen dilutes entity-claim proximity and chunks awkwardly."
      }
    },
    {
      "@type": "Question",
      "name": "Should every answer cite a source?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Statistical claims should. Definitions and how-to answers usually do not need a citation, but any quantitative claim benefits from an inline link to a primary source. Citations also raise re-ranker confidence in the answer block."
      }
    }
  ]
}
Two implementation notes. First, the name field must mirror a visible question on the page, ideally character for character. Second, the text field in acceptedAnswer should run 30 to 200 words; shorter answers get chunked away from the question, and longer ones lose entity-claim proximity. For more on how engines chunk Q&A content, see our post on how LLMs choose sources.
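Both notes are easy to enforce with a pre-publish check. The sketch below is a hypothetical helper, not part of any library: the lintFaq name, the Faq shape, and the 30-to-200-word thresholds come from this post's guidance, not from the schema.org spec.

// Hypothetical pre-publish lint for FAQPage JSON-LD. Assumes you can pass in
// the rendered page text; thresholds mirror this post's guidance, not a spec.
interface FaqQuestion {
  name: string;
  acceptedAnswer: { text: string };
}

interface FaqPageSchema {
  mainEntity: FaqQuestion[];
}

function lintFaq(faq: FaqPageSchema, visibleText: string): string[] {
  const problems: string[] = [];
  for (const q of faq.mainEntity) {
    // The name field must appear verbatim in the visible page content.
    if (!visibleText.includes(q.name)) {
      problems.push(`Question not visible on page: "${q.name}"`);
    }
    // Answers outside the 30-200 word band chunk poorly.
    const words = q.acceptedAnswer.text.trim().split(/\s+/).length;
    if (words < 30 || words > 200) {
      problems.push(`Answer for "${q.name}" is ${words} words; aim for 30-200.`);
    }
  }
  return problems;
}

Run it in CI against the rendered HTML and the schema object, and schema-content drift gets caught before it costs a candidate slot.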
Pattern 2: HowTo
HowTo schema wins procedural queries, the class of questions where the user asks how to accomplish a task. ChatGPT, Perplexity, and Gemini all show measurable citation lift on HowTo-tagged pages for queries like "how to set up X" or "steps to do Y". The schema gives the retriever a clean step sequence that survives chunking, and the re-ranker treats numbered steps as a high-confidence answer format.
Use HowTo for pages with a real sequence of steps, ideally three to ten. Do not use it for opinion or analysis content; the schema spec at schema.org/HowTo requires actual procedural steps with tools or supplies when applicable. The example below shows a four-step HowTo for adding FAQPage schema to a Next.js page.
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to add FAQPage schema to a Next.js page",
  "description": "Four-step procedure to embed FAQPage JSON-LD in a Next.js App Router page using a server component.",
  "totalTime": "PT15M",
  "supply": [
    { "@type": "HowToSupply", "name": "Next.js 14 or newer project" },
    { "@type": "HowToSupply", "name": "List of five to ten real FAQ questions and answers" }
  ],
  "tool": [
    { "@type": "HowToTool", "name": "Code editor" },
    { "@type": "HowToTool", "name": "Google Rich Results Test" }
  ],
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Collect questions from real user behavior",
      "text": "Pull the five to ten highest-volume questions from search console, support tickets, or sales-call transcripts. Each question must mirror something a user actually asks."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Write answers between 30 and 200 words",
      "text": "Each answer leads with the entity and the claim in one sentence, then expands with context. Avoid filler; the retriever discards padding."
    },
    {
      "@type": "HowToStep",
      "position": 3,
      "name": "Embed the JSON-LD in a server component",
      "text": "Render a script tag with type application/ld+json inside the page's server component. The JSON object must mirror the visible Q&A on the rendered page."
    },
    {
      "@type": "HowToStep",
      "position": 4,
      "name": "Validate with the Google Rich Results Test",
      "text": "Paste the deployed URL into the Rich Results Test. Resolve any warnings before publishing. Re-test after each content change."
    }
  ]
}
The position field is the part most teams forget. Without it, the retriever cannot enforce step order in the answer, and engines may quote step 3 before step 1. The totalTime field uses the ISO 8601 duration format (PT15M for 15 minutes), which Google validates strictly.
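Step 3 is the one that trips up teams new to the App Router, so here is a minimal sketch of it. The file path, component name, and page copy are illustrative; the load-bearing parts are the script tag with type application/ld+json rendered by a server component, and the JSON mirroring the visible Q&A.

// app/faq/page.tsx: minimal sketch of step 3 in a Next.js App Router project.
// The single question below is illustrative; real pages render the full FAQ.
const faqSchema = {
  "@context": "https://schema.org",
  "@type": "FAQPage",
  mainEntity: [
    {
      "@type": "Question",
      name: "What is generative engine optimization?",
      acceptedAnswer: {
        "@type": "Answer",
        text: "Generative engine optimization (GEO) is the practice of structuring content so that AI answer engines cite the page.",
      },
    },
  ],
};

export default function FaqPage() {
  return (
    <main>
      {/* The JSON-LD must mirror the visible Q&A rendered below. */}
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: JSON.stringify(faqSchema) }}
      />
      <h2>What is generative engine optimization?</h2>
      <p>
        Generative engine optimization (GEO) is the practice of structuring
        content so that AI answer engines cite the page.
      </p>
    </main>
  );
}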
Pattern 3: Article + author + datePublished
Article schema with a complete author block and datePublished field is the E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) signal that Google's AI Overview and Gemini both consume. The retriever treats the author as a corroborating entity: a piece written by a recognized expert in the topic area gets a re-ranker boost, while anonymous content competes from a deficit.
The pattern below shows a complete Article object with author, publisher, and the freshness fields that Perplexity boosts on time-sensitive queries. The dateModified field matters as much as datePublished; in our Q1 2026 panel, pages whose dateModified was refreshed within the prior 30 days were cited by Perplexity on news-tagged topics at roughly 2x the rate of equivalent stale pages.
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "5 Schema Patterns That Get Your Content Cited by AI",
  "description": "Five JSON-LD schema patterns that lift LLM citation rate, with production-ready code examples.",
  "image": "https://promptarchitect.app/og/5-schema-patterns-llm-cited.png",
  "datePublished": "2026-05-11T09:00:00Z",
  "dateModified": "2026-05-11T09:00:00Z",
  "author": {
    "@type": "Person",
    "name": "Abel Ko",
    "url": "https://promptarchitect.app/authors/abel",
    "sameAs": [
      "https://www.linkedin.com/in/abelko",
      "https://twitter.com/livelikeabel"
    ],
    "jobTitle": "Founder, Prompt Architect"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Prompt Architect",
    "url": "https://promptarchitect.app",
    "logo": {
      "@type": "ImageObject",
      "url": "https://promptarchitect.app/logo.png"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://promptarchitect.app/blog/5-schema-patterns-llm-cited"
  }
}
Three fields drive citation lift on Article schema: author with a sameAs array linking to verified profiles, dateModified updated whenever the content changes (not a copy of the publish date), and mainEntityOfPage pointing at the canonical URL. Skipping any of the three drops the page from the high-confidence tier in Google's SGE re-ranker, per Google's own I/O 2024 generative-AI Search announcement. For broader measurement context, our share of voice post covers how to track citation lift after schema changes.
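One way to keep dateModified honest is to generate the Article object from CMS timestamps instead of hand-editing JSON. A minimal sketch, assuming a hypothetical CmsPost shape with a separate last-edit field; all field names are illustrative.

// Hypothetical sketch: build Article JSON-LD from CMS fields so that
// dateModified tracks the real last edit instead of copying the publish date.
interface CmsPost {
  title: string;
  canonicalUrl: string;
  publishedAt: Date;
  updatedAt: Date; // assumed CMS-maintained last-edit timestamp
  authorName: string;
  authorProfileUrls: string[]; // verified profiles for the sameAs array
}

function buildArticleSchema(post: CmsPost) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    datePublished: post.publishedAt.toISOString(),
    dateModified: post.updatedAt.toISOString(), // never a copy of datePublished
    author: {
      "@type": "Person",
      name: post.authorName,
      sameAs: post.authorProfileUrls,
    },
    mainEntityOfPage: { "@type": "WebPage", "@id": post.canonicalUrl },
  };
}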
Pattern 4: Dataset
Dataset schema is the citation magnet for research-driven content. When a page hosts a real dataset, table, or measured benchmark, Dataset schema signals to retrievers that the page is a primary source rather than a reference. Perplexity's academic focus mode prefers Dataset-tagged pages over commentary, and Google's Dataset Search indexes them separately.
Use Dataset when the page hosts measured data with provenance: sample size, methodology, and license. Do not use it for opinion or aggregated commentary; the schema spec at schema.org/Dataset requires actual data, and Google's validator rejects empty Dataset blocks. The example below tags a citation-rate benchmark.
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "LLM Citation Rate Benchmark — 2026 Q1",
  "description": "Citation rate across ChatGPT, Perplexity, and Google Gemini measured over 1,000 commercial prompts in Q1 2026.",
  "url": "https://promptarchitect.app/research/citation-benchmark-2026-q1",
  "keywords": ["LLM citation", "GEO", "answer engine benchmark"],
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "creator": {
    "@type": "Organization",
    "name": "Prompt Architect",
    "url": "https://promptarchitect.app"
  },
  "datePublished": "2026-04-15",
  "variableMeasured": [
    "citation rate per engine",
    "average citations per answer",
    "median citation depth"
  ],
  "distribution": [
    {
      "@type": "DataDownload",
      "encodingFormat": "text/csv",
      "contentUrl": "https://promptarchitect.app/research/citation-benchmark-2026-q1.csv"
    }
  ]
}
The license, creator, and distribution fields are the gatekeepers for inclusion in Google Dataset Search. The variableMeasured array helps the retriever match the dataset to a specific query, since users searching for "average citations per Perplexity answer" hit the variable name first. In our Q1 2026 academic-focus audit, pages with Dataset schema earned citations on Perplexity at roughly 3x the rate of untagged commentary covering the same topic.
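To keep distribution accurate as files change, derive it from the published artifacts rather than hand-writing it. A minimal sketch, assuming a hypothetical file manifest; the second file is illustrative.

// Hypothetical sketch: derive DataDownload entries from the files you actually
// publish, so encodingFormat and contentUrl never drift from reality.
const publishedFiles = [
  { path: "/research/citation-benchmark-2026-q1.csv", mime: "text/csv" },
  // Additional formats (illustrative) get an entry each:
  { path: "/research/citation-benchmark-2026-q1.json", mime: "application/json" },
];

const distribution = publishedFiles.map((file) => ({
  "@type": "DataDownload",
  encodingFormat: file.mime,
  contentUrl: `https://promptarchitect.app${file.path}`,
}));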
Pattern 5: ClaimReview
ClaimReview is the trust pattern. When a page verifies or refutes a factual claim, ClaimReview schema signals to retrievers that the page is the verification, not the claim. Google's fact-check carousel and Gemini's citation logic both prefer ClaimReview-tagged pages on contested topics; the schema spec at schema.org/ClaimReview is the canonical reference.
Use ClaimReview only when the page genuinely fact-checks a claim with a rating and reasoning. Misusing the schema for opinion content is one of the explicit penalty triggers in Google's structured data guidelines. The example below verifies a claim about Perplexity citation density.
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2026-05-11",
  "url": "https://promptarchitect.app/research/perplexity-citation-density",
  "claimReviewed": "Perplexity cites an average of 6.2 sources per answer across commercial prompts.",
  "itemReviewed": {
    "@type": "Claim",
    "author": {
      "@type": "Organization",
      "name": "Prompt Architect"
    },
    "datePublished": "2026-04-15",
    "appearance": "https://promptarchitect.app/research/citation-benchmark-2026-q1"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": 4,
    "bestRating": 5,
    "alternateName": "Mostly true",
    "description": "The 6.2 average holds for English-language commercial prompts. Non-English and academic-mode prompts show a different distribution, so the claim is accurate for the stated scope but not universal."
  },
  "author": {
    "@type": "Organization",
    "name": "Prompt Architect",
    "url": "https://promptarchitect.app"
  }
}
The reviewRating block is the most-skipped field, and it is the one Google validates most strictly. The ratingValue integer plus alternateName text together describe how confident the verification is, and the retriever uses both. ClaimReview adoption remains a fraction of a percent of crawled web content per Web Data Commons schema.org statistics, so the citation lift for early movers in fact-checked verticals is correspondingly high.
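To keep ratingValue and alternateName from drifting apart across fact-checks, a fixed verdict table helps. The five-point scale below follows this post's example; schema.org does not mandate any scale, so treat the mapping as a house convention, not a requirement.

// Hypothetical verdict table keeping ratingValue and alternateName in sync.
// The 1-5 scale matches this post's example; it is a convention, not a spec.
const verdictRatings = {
  false: { ratingValue: 1, alternateName: "False" },
  mostlyFalse: { ratingValue: 2, alternateName: "Mostly false" },
  mixed: { ratingValue: 3, alternateName: "Mixed" },
  mostlyTrue: { ratingValue: 4, alternateName: "Mostly true" },
  true: { ratingValue: 5, alternateName: "True" },
} as const;

function buildReviewRating(
  verdict: keyof typeof verdictRatings,
  description: string,
) {
  return {
    "@type": "Rating",
    bestRating: 5,
    ...verdictRatings[verdict],
    description, // scope and caveats of the verification, as in the example above
  };
}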
Side-by-side comparison
The table below summarizes the five patterns. Use it to pick the right schema for a given page; multiple schemas on one page are fine when the data justifies each one.
| Schema type | Primary use case | LLM citation lift evidence | Validation difficulty |
|---|---|---|---|
| FAQPage | Pages with five to ten real questions | 1.5x to 2x in our Q1 2026 panel | Low |
| HowTo | Procedural step-by-step content | 1.3x to 1.6x on procedural queries (PA panel) | Low (watch for missing position) |
| Article + author + datePublished | News, analysis, evergreen posts | Required for E-E-A-T tier in Google SGE | Medium (author sameAs strictness) |
| Dataset | Research and measured data | 3x in Perplexity academic mode (PA panel) | High (license + variableMeasured) |
| ClaimReview | Fact verification, dispute | High lift, low adoption (sub-1% of crawled web) | High (Google validates strictly) |
The pattern that earns the highest absolute lift is FAQPage, because the chunking alignment is structural rather than re-ranker-dependent. The pattern with the highest leverage relative to competition is ClaimReview, because adoption is so low that any well-formed block stands out. Both should be on most content sites; the other three depend on what the page actually contains.
How to validate and measure
Schema is only useful if it parses and matches reality. Three tools handle validation, and a simple prompt-sampling protocol measures the citation lift afterward.
First, the Schema.org validator checks JSON-LD against the spec. It is the strictest validator and the right place to start. Second, Google's Rich Results Test checks whether the schema qualifies for Google's structured surfaces, including AI Overview eligibility. Third, Google Search Console's structured data report (under "Enhancements") shows aggregate validity across the site once Googlebot has re-crawled.
For measurement, sample 50 to 200 prompts your audience actually asks. Run them weekly across ChatGPT, Perplexity, and Gemini, and track the fraction that cite your domain before and after a schema deployment. Expect Perplexity to surface the lift first, usually within 2 to 4 weeks of recrawl, then ChatGPT, then Google AI Overview. Our AEO vs SEO framework post covers the measurement protocol in detail.
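The tracking math is simple enough to sketch. The PromptRun shape below is hypothetical, and it assumes you collect the cited URLs per answer manually or via each engine's export; no official engine API is assumed.

// Hypothetical tracking sketch: citation rate per engine for one domain
// across a weekly batch of prompt runs.
interface PromptRun {
  engine: "chatgpt" | "perplexity" | "gemini";
  citedDomains: string[]; // domains cited in the engine's answer
}

function citationRate(runs: PromptRun[], domain: string): Record<string, number> {
  const tally: Record<string, { cited: number; total: number }> = {};
  for (const run of runs) {
    const t = (tally[run.engine] ??= { cited: 0, total: 0 });
    t.total += 1;
    if (run.citedDomains.includes(domain)) t.cited += 1;
  }
  // Fraction of runs per engine that cited the domain at least once.
  return Object.fromEntries(
    Object.entries(tally).map(([engine, t]) => [engine, t.cited / t.total]),
  );
}

Run the same prompt set before and after the schema deployment and compare the per-engine rates; the delta is the citation lift.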
What this means for content strategy
Three operational shifts follow from the five patterns above.
First, treat schema as retrieval optimization, not rich-results optimization. The traditional rich-results goal (a star rating or recipe card in search) is now a side effect. The primary goal is inclusion in the retrieval pool and a higher re-ranker score during generation. That reframing changes which schemas you prioritize and how strict you are about field completeness.
Second, mirror the schema against visible content. Every example above assumes the JSON-LD describes data the user actually sees on the page. Drift between schema and visible content is the most common reason for both Google penalties and citation-rate stagnation. Schema is a promise to the retriever; breaking it costs candidate slots.
Third, stack patterns where the page justifies them. A research post can carry Article + Dataset + FAQPage at the same time. A how-to guide can carry HowTo + Article + FAQPage. A fact-check page can carry ClaimReview + Article. Each schema adds a different retrieval signal, and well-formed stacks earn citation lift in the engines that weigh each pattern differently.
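The cleanest way to stack is a single script tag carrying an @graph array. A minimal sketch of the research-post case; the objects are abbreviated here, and each would carry the full fields from its pattern section above.

// Sketch of a stacked payload via @graph for a page that genuinely contains
// an article, a dataset, and visible FAQs. Objects abbreviated for brevity.
const stackedSchema = {
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Article", headline: "LLM Citation Rate Benchmark — 2026 Q1" },
    { "@type": "Dataset", name: "LLM Citation Rate Benchmark — 2026 Q1" },
    { "@type": "FAQPage", mainEntity: [] }, // fill with the visible Q&A pairs
  ],
};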
The frontier is moving toward more structured data, not less. Anthropic's Claude search mode reads JSON-LD aggressively, Mistral's Le Chat parses Article and FAQPage natively, and the vertical answer engines (Phind for code, Consensus for research, You.com for the web) all consume the same Schema.org vocabulary. Five patterns, validated and matched to real content, are the format insurance that travels across surfaces. See our AEO vs SEO framework for the rest of the GEO playbook.