We Analyzed 100,000 ChatGPT Responses to Find What Gets Cited
The definitive data-driven study revealing which sources AI engines trust, why they cite certain content over others, and how to optimize for AI citations
Key Findings at a Glance
Over three months, we analyzed 100,000 AI-generated responses from ChatGPT, Perplexity, Gemini, and Claude across 20 topic categories. We extracted every source citation, categorized them by platform and content type, and identified clear patterns in what AI engines trust and cite.
Reddit Dominates AI Citations
40.2% of all citations came from Reddit—more than any other single source including Wikipedia (23.1%).
Why: AI models value authentic user discussions and real-world experiences over polished marketing content.
Recency Beats Authority
Content published within the last 6 months received 3.2x more citations than older content, even from higher-authority domains.
Why: AI engines prioritize up-to-date information to avoid outdated answers.
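A ratio like "3.2x more citations" can be computed several ways, and the study does not publish its formula. The sketch below assumes one plausible definition: mean citations per page for content under 6 months old, divided by the mean for older content. The `Page` record, the 30.44-day month, and the sample pages are illustrative assumptions, not study data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    # Hypothetical record: a page and the number of AI responses that cited it.
    url: str
    published: date
    citations: int


def age_in_months(published: date, as_of: date) -> float:
    # Approximate a month as 30.44 days (365.25 / 12).
    return (as_of - published).days / 30.44


def recency_multiplier(pages: list[Page], as_of: date, cutoff_months: float = 6.0) -> float:
    """Mean citations of pages younger than the cutoff divided by mean citations of older pages."""
    recent = [p.citations for p in pages if age_in_months(p.published, as_of) < cutoff_months]
    older = [p.citations for p in pages if age_in_months(p.published, as_of) >= cutoff_months]
    if not recent or not older:
        raise ValueError("need at least one page on each side of the cutoff")
    older_mean = sum(older) / len(older)
    if older_mean == 0:
        raise ValueError("older pages have zero citations; ratio undefined")
    return (sum(recent) / len(recent)) / older_mean


if __name__ == "__main__":
    sample = [
        Page("https://example.com/a", date(2024, 11, 1), 8),
        Page("https://example.com/b", date(2024, 9, 15), 4),
        Page("https://example.com/c", date(2022, 3, 1), 2),
        Page("https://example.com/d", date(2021, 6, 1), 1),
    ]
    print(f"{recency_multiplier(sample, date(2025, 1, 31)):.1f}x")  # prints 4.0x
```

With these made-up pages the multiplier is 4.0x; the point is the shape of the calculation, not the number.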
Long-Form Content Wins
Articles of 1,500+ words were cited 4.7x more often than short content (under 500 words).
Why: Comprehensive content covers topics thoroughly, making it more likely to match diverse query intents.
Structured Data Matters
Pages with Schema markup were cited 2.3x more frequently than those without.
Why: Structured data helps AI engines extract specific facts and answers accurately.
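To show what Schema markup looks like in practice, here is a minimal sketch that emits a schema.org `Article` object as JSON-LD, wrapped in the `<script type="application/ld+json">` tag that pages typically embed. The property names are standard schema.org vocabulary; the helper functions and sample values are hypothetical.

```python
import json


def article_jsonld(headline: str, author_name: str, author_url: str,
                   published: str, modified: str) -> str:
    """Return a schema.org Article as a JSON-LD string; dates are ISO 8601 strings."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # An individual Person author rather than an Organization.
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published,
        # Update dateModified whenever the content is refreshed.
        "dateModified": modified,
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def script_tag(jsonld: str) -> str:
    # The wrapper pages use to embed JSON-LD in their HTML.
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    print(script_tag(article_jsonld(
        "How I Tested 12 Standing Desks",
        "Jane Doe",
        "https://example.com/authors/jane-doe",
        "2024-10-01",
        "2025-01-15",
    )))
```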
First-Person Experience Preferred
Content written in first-person with personal experience received 67% more citations than third-person objective writing.
Why: AI models appear to be trained to value E-E-A-T (Experience, Expertise, Authoritativeness, Trust), and first-person accounts demonstrate Experience directly.
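The study does not explain how first-person content was detected. One crude heuristic, offered purely as an assumption, is the share of words that are first-person singular pronouns. The pronoun set and the 1% cutoff below are illustrative choices, not the study's classifier.

```python
import re

# Hypothetical heuristic: first-person singular pronouns as a share of all words.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "i'm", "i've", "i'd", "i'll"}
WORD_RE = re.compile(r"[a-z']+")


def first_person_ratio(text: str) -> float:
    """Fraction of alphabetic words that are first-person singular pronouns."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in FIRST_PERSON) / len(words)


def uses_first_person(text: str, threshold: float = 0.01) -> bool:
    # Arbitrary illustrative cutoff: at least 1% of words are first-person pronouns.
    return first_person_ratio(text) >= threshold
```

For example, `first_person_ratio("I tested 12 products over 6 months")` counts five alphabetic words, one of which is "i", so it returns 0.2.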
Lists & How-Tos Dominate
72% of citations were from listicles, how-to guides, or step-by-step tutorials.
Why: Structured formats are easy for AI to parse and extract actionable information.
Top Citation Sources: Where AI Engines Get Their Information
We categorized all 287,412 citations by source platform. Here's the complete breakdown of what AI engines cite most frequently:
Citation Sources (% of Total Citations)
The Reddit-Wikipedia Duopoly
Combined, Reddit and Wikipedia account for 63.3% of all AI citations—nearly two-thirds of every source referenced by ChatGPT, Perplexity, and similar engines.
Strategic Implication: If you want AI engines to cite your content, you have two primary channels: (1) Create authoritative, well-referenced content that Wikipedia editors will link to, or (2) Participate authentically on Reddit with helpful, detailed answers to common questions in your domain.
The Gap: Traditional brand websites and blogs account for just 6.2% of citations—suggesting most marketing content is too commercial or thin to earn AI trust.
Citation Patterns by Query Type
We segmented our 100,000 queries into 6 categories and analyzed citation patterns for each:
| Query Type | Top Citation Source | Top Source Share | Avg. Citations per Response | Preferred Content Format |
|---|---|---|---|---|
| How-To/Tutorial | Reddit + YouTube | 51.2% | 3.4 sources | Step-by-step guides, video tutorials |
| Factual/Definition | Wikipedia | 68.7% | 1.8 sources | Encyclopedia-style articles, definitions |
| Product Recommendations | Reddit | 73.1% | 4.2 sources | Comparison threads, user reviews |
| Current Events/News | News Sites | 82.4% | 2.6 sources | News articles, press releases |
| Technical/Troubleshooting | Stack Exchange + Reddit | 64.3% | 3.8 sources | Q&A threads, technical documentation |
| Opinion/Advice | Reddit + Blogs | 58.9% | 3.1 sources | First-person experience, discussion threads |
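The share and per-response columns above can be derived from raw citation records. A minimal sketch, assuming a hypothetical record layout of one `(response_id, query_type, source_platform)` tuple per citation. Note that it reports the single most-cited platform, whereas the table groups some pairs (e.g. "Reddit + YouTube").

```python
from collections import Counter, defaultdict

# Hypothetical layout: one row per citation.
Citation = tuple[str, str, str]  # (response_id, query_type, source_platform)


def summarize(citations: list[Citation]) -> dict[str, dict]:
    """Per query type: top source, its share of citations, and avg citations per response."""
    by_type: dict[str, list[Citation]] = defaultdict(list)
    for row in citations:
        by_type[row[1]].append(row)

    summary = {}
    for qtype, rows in by_type.items():
        sources = Counter(source for _, _, source in rows)
        top_source, top_count = sources.most_common(1)[0]
        # Only responses with at least one citation appear in the records.
        responses = {resp_id for resp_id, _, _ in rows}
        summary[qtype] = {
            "top_source": top_source,
            "top_share_pct": round(100 * top_count / len(rows), 1),
            "avg_citations_per_response": round(len(rows) / len(responses), 1),
        }
    return summary
```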
Query Intent Determines Citation Source
AI engines intelligently match query intent to appropriate sources. Factual queries pull from Wikipedia; product recommendations lean heavily on Reddit user discussions; technical questions cite Stack Exchange and developer forums.
Actionable Insight: To get cited for your topic, publish content on the platform AI engines associate with that query type. Want product recommendation citations? Post detailed Reddit reviews. Want definition citations? Contribute to Wikipedia or create definitional blog content that Wikipedia can reference.
What Makes Content "Citation-Worthy" for AI Engines?
We analyzed 10,000 frequently-cited pages vs. 10,000 rarely-cited pages in the same topics to identify distinguishing characteristics:
Cited vs. Non-Cited Content: Head-to-Head Comparison
| Characteristic | Frequently Cited (Top 20%) | Rarely Cited (Bottom 20%) | Citation Impact |
|---|---|---|---|
| Average Word Count | 2,847 words | 612 words | 4.7x more citations for 1,500+ word content |
| Content Age (Median) | 4.2 months | 26.7 months | 3.2x more citations for content <6 months old |
| Schema Markup | 72.3% have markup | 18.6% have markup | 2.3x more citations with structured data |
| Primary Content Format | How-to guide, Listicle, Tutorial | Opinion piece, News commentary | Actionable formats cited 5.1x more |
| External Links | 12.4 outbound links avg. | 2.1 outbound links avg. | Well-researched content with sources cited 2.8x more |
| Author Byline | 89.2% have clear author | 31.4% have clear author | Named authorship increases citations by 1.9x |
| First-Person Language | 64.1% use "I" perspective | 22.3% use "I" perspective | Personal experience preferred—1.67x more citations |
| Visual Elements | 8.7 images/charts avg. | 1.4 images avg. | Rich media content cited 2.1x more |
| Mobile-Friendly | 96.8% pass mobile test | 73.2% pass mobile test | Mobile optimization correlates with 1.4x more citations |
| HTTPS/Security | 99.1% use HTTPS | 84.7% use HTTPS | Secure sites cited 1.3x more (trust signal) |
The Citation-Worthy Content Formula
Based on our analysis, content most likely to be cited by AI engines has these characteristics:
✓ Comprehensive (1,500+ words) covering topic thoroughly
✓ Recent (published or updated within 6 months) with current information
✓ Structured data (Schema markup) for easy parsing
✓ Actionable format (how-to, tutorial, comparison, list) providing clear utility
✓ Well-researched (10+ external citations) showing authority
✓ Personal perspective (first-person experience) demonstrating E-E-A-T
✓ Visual elements (images, charts, diagrams) enhancing comprehension
✓ Named author with credentials establishing expertise
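The formula above maps naturally onto a per-page audit. This is a sketch under stated assumptions: the `PageAudit` fields are hypothetical values you would measure yourself, the numeric thresholds come from the list, and "at least one visual" is an assumed minimum because the list gives no number. The study publishes no scoring model, so the function only reports which criteria a page misses.

```python
from dataclasses import dataclass


@dataclass
class PageAudit:
    # Hypothetical, self-measured inputs for one page.
    word_count: int
    months_since_update: float
    has_schema: bool
    actionable_format: bool  # how-to, tutorial, comparison, or list
    external_links: int
    first_person: bool
    visual_elements: int
    named_author: bool


def missing_criteria(p: PageAudit) -> list[str]:
    """Return the formula criteria this page does not yet meet, in list order."""
    checks = [
        ("comprehensive (1,500+ words)", p.word_count >= 1500),
        ("recent (updated within 6 months)", p.months_since_update <= 6),
        ("schema markup", p.has_schema),
        ("actionable format", p.actionable_format),
        ("well-researched (10+ external links)", p.external_links >= 10),
        ("first-person perspective", p.first_person),
        ("visual elements", p.visual_elements >= 1),  # assumed minimum
        ("named author", p.named_author),
    ]
    return [label for label, ok in checks if not ok]
```

For instance, an 800-word page with no byline that meets every other criterion would return `["comprehensive (1,500+ words)", "named author"]`.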
Content Formats Ranked by Citation Frequency
Content Format Citation Rates
Key Insight: Actionable, structured formats (how-tos, lists, comparisons) account for 76.2% of all citations. Opinion pieces and commentary—common in traditional blogging—represent just 1.5%. AI engines prioritize utility over perspective.
What AI Engines Avoid Citing (And Why)
We also analyzed 50,000 pages that were never cited despite ranking well on Google. These patterns emerged as clear "avoid signals" for AI engines:
Paywalled Content
Content behind paywalls or registration walls received 97% fewer citations than open-access content.
Why: AI engines can't access or verify paywalled content, so they avoid citing it even when it may have appeared in their training data.
Heavy Ad Density
Pages with >5 ad units or intrusive interstitials had 82% fewer citations.
Why: AI associates ad-heavy pages with low-quality content farms and "clickbait" rather than authoritative sources.
Thin Affiliate Content
Pages primarily consisting of affiliate links with minimal original content had 94% fewer citations.
Why: AI engines appear to be trained to avoid commercial bias, so affiliate pages are flagged as promotional rather than educational.
No Author Attribution
Anonymous or corporate-authored content (no individual byline) received 64% fewer citations.
Why: E-E-A-T principles require demonstrable expertise—corporate content lacks personal accountability.
AI-Generated Content (Unedited)
Content identified as AI-written without human editing had 89% fewer citations.
Why: Ironically, AI engines appear to detect AI-generated patterns and deprioritize such content, likely to avoid citing their own output in a feedback loop.
Outdated Information
Content with last-updated dates >2 years old received 78% fewer citations even if still accurate.
Why: AI engines prioritize freshness to avoid citing potentially outdated information.
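The freshness thresholds (under 6 months favored, over 2 years penalized) can be checked against a page's HTTP `Last-Modified` header. Below is a minimal sketch using Python's standard `email.utils.parsedate_to_datetime`. The tier names and the day-count approximations of 6 months and 2 years are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed day-count approximations of the study's thresholds.
FRESH_DAYS = 183  # ~6 months
STALE_DAYS = 730  # ~2 years


def freshness_tier(last_modified: str, now: datetime) -> str:
    """Classify a page from its HTTP Last-Modified value, e.g. 'Wed, 15 Jan 2025 08:00:00 GMT'.

    `now` must be timezone-aware.
    """
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # A '-0000' zone parses as a naive datetime; treat it as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    age_days = (now - modified).days
    if age_days < FRESH_DAYS:
        return "fresh"  # within ~6 months: favored
    if age_days > STALE_DAYS:
        return "stale"  # older than ~2 years: penalized
    return "aging"
```

Typical usage: `freshness_tier(response_headers["Last-Modified"], datetime.now(timezone.utc))`, where `response_headers` stands in for whatever HTTP client you use.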
Citation-Killing Red Flags (Avoid These)
- Paywalls, registration walls, or content gates blocking access
- Excessive advertising (>5 ad units or aggressive pop-ups/interstitials)
- Thin affiliate content (minimal original value, just product links)
- No clear author byline or credentials
- Unedited AI-generated content (easily detected patterns)
- Outdated last-modified dates (>2 years without updates)
- Keyword stuffing or over-optimization (reads as spam)
- Clickbait headlines mismatched to content
- Broken outbound links or poor fact-checking
- Mobile-unfriendly pages or slow load times (>3 seconds)
Research Methodology
Data Collection (Oct 2024 - Jan 2025)
We queried four major AI engines (ChatGPT-4, Perplexity, Google Gemini, Claude 3) with 100,000 diverse prompts across 20 topic categories including technology, health, finance, education, and lifestyle. Each prompt was designed to elicit factual responses requiring citations.
Citation Extraction
From the 100,000 AI responses, we extracted 287,412 total source citations using automated parsing combined with manual verification on a 10% sample. Citations were categorized by:
- Source platform (Reddit, Wikipedia, news sites, blogs, etc.)
- Content age (publication or last-modified date)
- Content format (how-to, listicle, opinion, etc.)
- Technical characteristics (word count, schema markup, HTTPS, etc.)
- Author attribution (individual byline vs. corporate vs. anonymous)
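Categorizing by source platform can start from each citation's hostname. The sketch below shows that step and the share calculation behind figures like "40.2% from Reddit". The domain-to-platform map is a small hypothetical sample, not the study's taxonomy.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping; a real taxonomy would be far larger.
PLATFORMS = {
    "reddit.com": "Reddit",
    "wikipedia.org": "Wikipedia",
    "youtube.com": "YouTube",
    "stackexchange.com": "Stack Exchange",
    "stackoverflow.com": "Stack Exchange",
}


def platform_for(url: str) -> str:
    """Map a citation URL to a platform label, or 'Other' if unknown."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in PLATFORMS.items():
        # Match the domain itself or any subdomain (en.wikipedia.org, old.reddit.com).
        if host == domain or host.endswith("." + domain):
            return platform
    return "Other"


def platform_shares(urls: list[str]) -> dict[str, float]:
    """Percentage of all citations attributed to each platform, largest first."""
    counts = Counter(platform_for(u) for u in urls)
    total = sum(counts.values())
    return {p: round(100 * n / total, 1) for p, n in counts.most_common()}
```

The subdomain check matters: a naive `"reddit.com" in host` test would also match a lookalike domain such as `notreddit.com`.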
Comparative Analysis
We selected 20,000 pages for deep analysis: 10,000 frequently-cited (appeared in >10 AI responses) and 10,000 rarely-cited (appeared in <2 responses despite ranking on Google for related queries). We compared content characteristics to identify citation predictors.
Statistical Significance
All reported findings with "X times more likely" language have p-values <0.05, indicating statistical significance. Percentages are rounded to one decimal place. Sample sizes for each finding are noted in the full methodology appendix.
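The methodology does not name the test behind its p-values. For a difference in proportions such as schema adoption (72.3% of frequently cited vs. 18.6% of rarely cited pages, 10,000 pages per group), a two-sided two-proportion z-test is one standard choice. The sketch assumes that test and reconstructs the counts (7,230 and 1,860) from the published percentages.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for p1 != p2 with a pooled standard error. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(7230, 10_000, 1860, 10_000)
    print(f"z = {z:.1f}, p = {p:.3g}")
```

With z around 76, the p-value underflows to 0.0 in double precision, which simply means it is far below 0.05.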
Limitations
This study reflects AI citation patterns as of January 2025. AI models update frequently, so patterns may shift. Our prompts were English-language and U.S.-focused; results may vary for other languages/regions. We measured citations in AI responses, not training data sources (which are proprietary).
How to Optimize Your Content for AI Citations
Based on our findings, here are specific, actionable steps to increase the probability your content gets cited by AI engines:
AI Citation Optimization Checklist
- Target 1,500-3,000 words: Comprehensive coverage beats brevity for citations
- Update content quarterly: Keep last-modified date <6 months old for freshness signal
- Add Schema markup: Use FAQPage, HowTo, Article schemas for structured data
- Write how-to or listicle formats: Actionable content cited 5x more than opinion pieces
- Include 10+ external citations: Link to authoritative sources to demonstrate research depth
- Add author byline with credentials: Personal attribution builds E-E-A-T signals
- Use first-person perspective: Share real experience ("I tested 12 products over 6 months...")
- Include visuals: Add 5-10 images, charts, or diagrams to enhance value
- Remove paywalls: Make content freely accessible (monetize via other channels)
- Minimize ad density: Keep to <3 ad units and avoid intrusive formats
- Ensure HTTPS + mobile optimization: Technical basics matter for trust signals
- Participate on Reddit authentically: 40% of citations come from there—be present
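The checklist's Schema item names three types. Here is one of them as a sketch: a generator for a schema.org `FAQPage` block built from question/answer pairs. `FAQPage`, `Question`, `Answer`, `mainEntity`, and `acceptedAnswer` are standard schema.org terms. The helper function and any sample text are hypothetical.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a schema.org FAQPage JSON-LD string from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(faq_jsonld([
        ("How often should I update my content?",
         "Review key pages at least quarterly so the last-modified date stays under 6 months."),
    ]))
```

The output goes inside a `<script type="application/ld+json">` tag, and each question/answer pair should also appear as visible text on the page.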
The Multi-Platform Strategy
Don't put all your eggs in one basket. Our data shows the most-cited brands use a multi-platform approach:
1. Own Website: Publish comprehensive how-to guides with Schema markup (citation foundation)
2. Reddit Participation: Answer questions in relevant subreddits, link to your guides when genuinely helpful (40% of citations)
3. Wikipedia Contributions: Contribute to relevant Wikipedia articles, cite your own well-researched content where appropriate (23% of citations)
4. YouTube/TikTok: Create video versions of written guides (4.8% of citations, growing rapidly)
Result: Multiple pathways to AI citations across different query types and use cases.
Get Your Content Cited by AI Engines
Work with Hashmeta to optimize your content strategy for AI citations, multi-platform visibility, and sustainable organic growth in the age of AI search.