
The GEO Olympics Study: What 231,347 LLM Responses Reveal About AI Brand Visibility

What AI Thinks About Your Brand Is Already Written.

We tested 5 hypotheses across 231,347 LLM responses, 7 AI platforms, and 52 days of live data from the 2026 Winter Olympics to determine how AI decides who gets mentioned, who gets recommended, and why.

AI Doesn’t “Find the Best Answer.” It Completes the Most Familiar Story.

We used the 2026 Winter Olympics as a live laboratory, a rare environment where results are unpredictable, news moves fast, and AI had to work in real time. The Olympics may happen every couple of years, but the patterns we found apply to your brand daily.

WHAT HAPPENED TO CHLOÉ KIM IS HAPPENING TO YOUR BRAND RIGHT NOW.

Six days before the halfpipe final, ChatGPT declared Chloé Kim’s three-peat successful and cited Olympics.com as evidence. The event hadn’t happened yet, and when it did, Kim finished second. AI didn’t hallucinate from nothing. It completed a dominant narrative arc so convincingly established that the outcome felt inevitable.

Large language models generate the next most likely token based on patterns in training data. An entity that appears frequently, with consistent descriptions, repeated across independent domains, becomes stable in the training distribution.

That stability determines what AI says about you.
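To make the mechanism concrete, here is a toy next-token count model. The corpus is invented for illustration; real LLMs learn the same tendency from web-scale data and far richer representations than raw bigram counts, but the directional effect is the same.

from collections import Counter

# Invented toy corpus: a dominant narrative repeated 40 times and a
# counter-narrative that appears only 3 times.
corpus = (
    "chloe kim wins halfpipe gold . " * 40
    + "chloe kim finishes second . " * 3
).split()

# Count which token follows "kim" anywhere in the corpus.
next_after_kim = Counter(
    corpus[i + 1] for i, tok in enumerate(corpus[:-1]) if tok == "kim"
)

print(next_after_kim.most_common(2))
# -> [('wins', 40), ('finishes', 3)]
# The familiar continuation dominates, so the model "completes the story."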

Each of the five hypotheses in this study (Citation Authority, Temporal Velocity, Narrative Persistence, Geographic Bias, and Social Proof) approached LLM behavior from a different angle. All five arrived at the same conclusion: the brands that AI recommends are the brands that have built signal architecture.

We define “signals” in this study as any piece of content or online presence that teaches an AI model who you are and why you matter: a data point that contributes to how consistently and confidently AI can generate accurate information about your brand.

Signal architecture, then, is the combination and sequence of signals a brand has built up over time. The architecture determines not just whether AI mentions you, but how and in what context.

What matters most is not which signals your brand has, but that they were built before the question is ever asked, not in response to it.

The Hypothesis Tests: Everything We Tested & Everything That Held

Each hypothesis was pre-defined with validation thresholds before we collected a single response. Here is the methodology.  
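For illustration, here is a minimal sketch of what a pre-registered validation check can look like. The threshold values are our own placeholders, not the study's actual cutoffs, which are not published in this summary.

# Illustrative pre-registration: thresholds are fixed before any data arrives.
PREREGISTERED = {
    "H5 — Social Proof": {"min_rho": 0.5, "max_p": 0.05},  # placeholder values
}

def verdict(hypothesis: str, rho: float, p: float) -> str:
    crit = PREREGISTERED[hypothesis]
    if rho >= crit["min_rho"] and p <= crit["max_p"]:
        return "Confirmed"
    return "Not Confirmed"

# H5's reported composite score: rho=0.590, p=0.002
print(verdict("H5 — Social Proof", rho=0.590, p=0.002))  # -> Confirmed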

H1 — Citation Authority
Question: Does question intent change which sources AI trusts?
Finding: Factual queries pull institutional sources at 76.4%. Judgment queries shift to prestige editorial (+157%), social/UGC (+165%), and expert voice (+873%). No two platforms share the same trust hierarchy.
Verdict: Confirmed

H2 — Temporal Velocity
Question: Does web access create a speed gradient?
Finding: Not a gradient — a cliff. Same-day platforms updated within hours. Near-real-time took 1–2 days. Two platforms never updated across 52 days. The Binary Cliff is the boundary between worlds.
Verdict: Confirmed

H3 — Narrative Persistence
Question: Do models keep telling old stories after facts change?
Finding: 1 in 5 factually correct responses used pre-event narrative language three weeks later. Two distinct failure modes: hallucination AND stale framing. Ambiguous outcomes are 3× harder to correct than clean reversals.
Verdict: Confirmed — Stronger Than Predicted

H4 — Geographic Bias
Question: Does training data create structural defaults by language?
Finding: Systematic US bias confirmed. The US appeared in 36.8% of neutral prompts vs. 16% for Norway — the actual medal leader. Structural, driven by English-language training data, stable across all three study phases.
Verdict: Partially Confirmed

H5 — Social Proof
Question: Does social signal volume predict LLM visibility?
Finding: 10 of 11 social proof signals significant at p<0.05. Composite score (rho=0.590, p=0.002) validated as a predictor. Reddit cited cross-platform at 3–15× the rate of Instagram, TikTok, or Twitter/X.
Verdict: Confirmed

 

Six Critical Findings That Should Change How You Measure AI Visibility

Across 231,347 responses, we found 6 critical AI behaviors that should challenge the way you’re approaching AI visibility.

  1.  Dominant narratives get completed before they get verified 
    • ChatGPT declared Chloé Kim's three-peat before the event happened and cited real sources. AI doesn't hallucinate from nothing, but it will finish the story your brand's signal architecture has already written. 
  2.  Platforms without web access can't integrate your news 
    • Meta AI still says Lindsey Vonn retired in 2019. Her comeback, her 2026 Olympic run: none of it exists. If you're issuing press releases expecting all AI platforms to update, some of them never will. 
  3.  Models fabricate with confidence and specificity 
    • After Malinin finished eighth, Meta AI reported his score as 194.63 when his actual score was 156.33. This was specific enough to seem credible, wrong enough to matter, and delivered without hesitation. 
  4.  Disclaimers accompany hallucinations 
    • Claude acknowledged its knowledge cutoff, then described Malinin as a strong gold contender the day after his collapse. The model flagged uncertainty and kept generating. Caveats are not corrections. 
  5.  Facts update faster than the narrative frame around them 
    • Three weeks after events contradicted them, 1 in 5 factually correct responses still told the old story. Most monitoring tools only catch factual errors, so stale framing and narratives are often missed. 
  6.  Winning doesn't immediately change what AI says about you 
    • Norway won the medal table with 26 medals, but AI mentioned the US at more than 2× Norway's rate for 52 straight days. Performance after the training cutoff is irrelevant because the story was already written. 

 

Key Takeaways from The Hypothesis Tests

Hypothesis Test: Citation Authority

  • The sources that get you listed and the sources that get you recommended are not the same. Official sources (Olympics.com, Wikipedia, NBC) held steady at ~48%, ~28%, and ~30% citation rates regardless of query type. Sports editorial, Reddit, and betting sources spiked 1.8x–8.5x on judgment queries — those are the sources driving recommendations.
  • Reddit is a citation source, not a social channel. ChatGPT cited Reddit in 13.5% of judgment responses. Brands discussed in relevant subreddits are being cited in AI recommendations. Brands absent from those conversations are not.
  • GEO strategy has to be platform-specific. Wikipedia is ChatGPT's #1 factual source. Google AI Mode cites official properties at 91.2%. Perplexity leans hardest on editorial at 54.1% of judgment responses. These are not interchangeable channels.
Hypothesis Test: Temporal Velocity

  • Web-connected platforms can hallucinate results before events occur. 4 of 5 web-connected platforms named Malinin as the winner before the free skate happened. The stronger the pre-event consensus in retrieval content, the more confidently a wrong answer gets generated.
  • Prompt framing determines whether Gemini uses its own web access. When questions are phrased as future-oriented ("who will win?"), Gemini suppresses its web search and answers from training data — even for events that already happened.

Hypothesis Test: Narrative Persistence

  • Stale framing doesn't fix itself. 1 in 5 factually correct responses still told the old story — and that rate held flat over 23 days. Waiting is not a strategy.
  • Ambiguous outcomes are 2.8x harder for AI to let go of than clean reversals. Near-misses and "almost" narratives are the stickiest. Clean wins and losses correct readily.

Hypothesis Test: Geographic Bias

  • AI doesn't just show you differently by geography — it can erase you entirely. 3 of 27 tracked athletes were completely absent outside their home market when users asked judgment questions like "who should I care about." Not underrepresented. Gone.
  • The same neutral prompt generated 3.4x more US-identity language from Texas than from Australia. It's not just who appears in AI responses — it's what story gets told about them.

Hypothesis Test: Social Proof 

  • Wikipedia recency outperforms total social reach. 7-day Wikipedia views (rho=0.810) were the single strongest predictor of LLM visibility — stronger than Instagram followers, 30-day views, or total social reach across all platforms combined (see the sketch after this list).
  • Wikipedia article size predicts prominence, not just presence. Article depth correlates most strongly with position score (rho=0.702) — how prominently a brand appears when mentioned, not just whether it appears.
  • Platform breadth signals legitimacy independently of reach. Athletes on 4 platforms averaged 14.3 LLM mentions vs. 2.2 for athletes on 1–2 platforms.
  • Wikipedia foundation without social presence hits a ceiling. Sidney Crosby: zero social presence, 183K monthly Wikipedia views, 231K-word article. Result: mid-range visibility, not top-tier. Foundation is necessary. It's not sufficient.
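For readers who want to run this kind of analysis on their own brand data, here is a minimal sketch of the Spearman correlation behind these takeaways. The numbers are hypothetical stand-ins, not the study's raw data; the study reports rho=0.810 for 7-day Wikipedia views.

from scipy.stats import spearmanr

# Hypothetical per-entity figures: a recency signal and an LLM-visibility outcome.
wiki_views_7d = [6853, 15827, 132084, 2400, 48000, 910]   # hypothetical
llm_mentions  = [2185, 14003, 7158,   600,  9800,  310]   # hypothetical

rho, p_value = spearmanr(wiki_views_7d, llm_mentions)
print(f"rho={rho:.3f}, p={p_value:.3f}")

Spearman's rho is the right tool here because it tests rank agreement rather than a linear fit, so a few huge outliers (an Eileen Gu-sized Wikipedia footprint) don't dominate the result.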

 

THE MECHANISM:

The Importance of Signal Architecture 

There was a 7.8x gap in outcomes between athletes with strong signal architecture and athletes with a thin signal stack. Athletes with all 3 signals averaged 12,174 AI mentions, while those without averaged 1,565.

These signals are not equal, and they don't work in parallel; they work in sequence.

The Three-Signal Sequence Necessary For Your Brand's AI Visibility

Signal 01 — Build first: Entity Authority ("Own it")

Wikipedia is the single strongest predictor of LLM visibility in this study (rho=0.810) and the only domain appearing in the top five of every major citation platform. Official canonical sources establish what the entity is: brand website, Wikipedia page, official product profiles. Without this layer, the other two signals are unresolvable.

Signal 02 — Build second: Third-Party Validation ("Have others explain it")

Independent, credible intermediaries explain why the entity matters: news coverage, industry publications, analyst reports, Wikipedia editors writing about you in their own words. This is where significance gets established. When authoritative sources across multiple independent domains describe your brand consistently, you become stable in the training distribution.

Signal 03 — Amplify last: Community Discussion ("Talk about it")

Reddit is cited cross-platform at 3–15× the rate of Instagram, TikTok, or Twitter/X; aside from Reddit, YouTube matters most for Perplexity and Gemini. This layer lives in Reddit threads, forums, and organic brand mentions across the web. It amplifies what's already established, but cannot create what doesn't exist.


Publishing is not distributing. A blog post is "Entity Authority" until an independent source cites it; that citation moves it into "Third-Party Validation," and it only reaches "Community Discussion" when it surfaces organically in conversational contexts across the web. Know which layer your content is actually working in before deciding where to invest.
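A minimal sketch of that triage, assuming a simple three-flag model of a content asset's status. The function and flags are our own illustration, not Seer tooling:

def signal_layer(owned: bool, independently_cited: bool,
                 discussed_organically: bool) -> str:
    """Name the deepest signal layer a piece of content has reached."""
    if discussed_organically:
        return "Signal 03 — Community Discussion"
    if independently_cited:
        return "Signal 02 — Third-Party Validation"
    if owned:
        return "Signal 01 — Entity Authority"
    return "No signal yet"

# A blog post nobody has cited is still only entity authority:
print(signal_layer(owned=True, independently_cited=False,
                   discussed_organically=False))
# -> Signal 01 — Entity Authority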

 

EVIDENCE:

Three Athletes With Three Completely Different Outcomes 

The differences have nothing to do with performance and everything to do with signal architecture built before the competition began. 

Emma Aicher
Germany · Alpine Skiing
  • Wikipedia views/mo: 6,853
  • Social reach: 28K · 1 platform
  • Source footprint: None
  • Total AI mentions: 2,185
  • Judgment contexts: 0

Jordan Stolz
USA · Speed Skating
  • Wikipedia views/mo: 15,827
  • Social reach: 175K · 3 platforms
  • Source footprint: Sports editorial only
  • Total AI mentions: 14,003
  • Judgment contexts: 10

Eileen Gu
China · Freestyle Skiing
  • Wikipedia views/mo: 132,084
  • Social reach: 2.6M · 3 platforms
  • Source footprint: Sports + culture + international
  • Total AI mentions: 7,158
  • Judgment contexts: 241

The gap between Stolz and Gu is not a talent gap, and it's not a coverage gap in the traditional sense. It's a signal architecture gap, and it's exactly the gap most brands have without knowing it.

 

How This Study Should Change What Your Team Believes About GEO 

Most teams approaching AI visibility are running the same plays they run for SEO. The logic may feel right, but it doesn't match how GEO actually works.

ASSUMPTION 1 — WRONG

“Performance earns AI visibility” 

Norway won 26 medals, but AI mentioned the US at more than 2× Norway's rate for 52 straight days. The training distribution is already set. Stop measuring AI visibility the way you'd measure search share of voice after a product launch. Ask what the model was generating about you before your campaign started, not just whether it's mentioning you after.

ASSUMPTION 2 — WRONG

“Campaign surges compound into AI visibility” 

Aicher’s Wikipedia traffic surged 3,064%; she received 2 AI mentions. Malinin surged 3,922%; he received 15. Nearly identical spikes, but a 7.5× difference in outcome. The foundation determines everything; the spike transfers nothing. Amplification without foundation is noise.

ASSUMPTION 3 — WRONG

“Presence means accurate representation”

Standard monitoring checks whether AI has the right facts, but doesn’t check whether AI is telling your current story. 1 in 5 factually correct responses still told the old story three weeks later. You can be present and misrepresented simultaneously. That failure mode doesn’t exist in search. 

 

WHAT TO DO ABOUT IT:

Three Actions With the Highest Signal Leverage

  1.  Run judgment prompts, not just presence checks 

    • Run this prompt across every major AI platform: "Which [your category] brands are worth considering?" Check not just whether you appear, but what story the response tells. A response that has your facts right but frames you through a narrative you left behind two years ago is wrong in the way that matters most. Mention rate tells you if you're in the room. Brand accuracy tells you what the room thinks of you. (A minimal audit sketch follows this list.) 
  2.  Treat Wikipedia as a primary owned asset
    • Wikipedia is the single strongest predictor of LLM visibility in this study (rho=0.810 against presence rate). It's the only domain appearing in every major citation platform's top five: ChatGPT's #1. Perplexity's #2. Gemini's #3. AI Mode's #4. Most brands have a Wikipedia page. Almost none treat it as the primary owned asset it actually is. That gap is the fastest structural fix available.
  3.  Build all three signals (in order!) 
    • Entity authority goes first. Without it, third-party coverage is unresolvable and community signals amplify nothing. Third-party validation goes second — independent coverage establishing significance across multiple domains. Community signals reinforce last. Most brands have one of these three in reasonable shape. Getting from one to all three is not a content sprint. It's a sequenced build worth 7.8× in AI visibility outcomes. 
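Here is the hedged sketch promised in action #1: the same judgment prompt run across platforms, checking both presence and framing. The ask() function, brand name, and stale-framing phrases are placeholders to adapt to your own stack, not a real client library.

PLATFORMS = ["ChatGPT", "Gemini", "Perplexity", "Claude", "Meta AI"]
PROMPT = "Which project-management software brands are worth considering?"
BRAND = "ExampleBrand"                               # hypothetical brand name
STALE_FRAMES = ["scrappy startup", "budget option"]  # narratives you've outgrown

def ask(platform: str, prompt: str) -> str:
    """Placeholder: wire up each platform's own client or UI export here."""
    raise NotImplementedError(platform)

def audit() -> None:
    for platform in PLATFORMS:
        answer = ask(platform, PROMPT).lower()
        mentioned = BRAND.lower() in answer            # presence check
        stale = [f for f in STALE_FRAMES if f in answer]  # framing check
        print(f"{platform}: mentioned={mentioned}, stale_framing={stale}")

The point of the two checks is the study's core distinction: mention rate (are you in the room) and narrative accuracy (is the room telling your current story) fail independently, so track both.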

 

The Window Is Still Open, But Not For Long. 

The brands winning in AI today have spent years building entity authority, earning independent coverage, and showing up in conversations across the web. They built signal architecture before it was even defined.

This creates an unusual position, but not one devoid of opportunity. The training distributions that shape AI responses aren't fixed; they update with new model releases, fresh web crawls, and the accumulation of new signals. Brands that start building the right architecture today will be better represented in the next wave of models than in the current one.

If you want to get ahead of this, we can help. Contact us here to get started.

 
