
An Olympic-Sized View into How LLMs Actually Think

Narratives don't form slowly anymore. They lock in while you're still watching. By the time you've finished reading about an Olympic upset (or a story about your brand), the AI's version of that story may already be established and informing others, whether or not it's true.

During the 2026 Winter Olympics, we're tracking how seven major LLMs decide what's true, who to trust, and when they change their minds. In real time.

If you're responsible for how your brand shows up in AI answers, or advising someone who is, this is the window where we learn the rules.

Under the hood: 1.76 million LLM queries. 5+ million data points. Nine weeks of global storylines forming and breaking live.

This post shares what we're learning as it happens.


Why the Olympics

Because you can't manufacture this in a lab.

The Winter Olympics compress what normally takes months into days. Athletes go from unknown to household names overnight. Medal predictions get proven right or spectacularly wrong within hours. Storylines emerge, collide, and collapse in real time.

That compression reveals how LLMs actually work. How fast do they adopt new narratives? Which sources do they trust when information conflicts? Do they correct course when reality contradicts their framing, or does the original story stick?

If you're trying to influence how AI represents your brand, these are the questions that matter. The Olympics give us a live window to answer them: nine weeks of observations that would normally take years to accumulate.

Seven LLMs. Same questions. Seven different answers. Every day for nine weeks. (You can read the full methodology below.)


Live Results

Updated as findings emerge. Click through for detailed analysis.

Finding                            Status            Link
Citation Authority                 Collecting Data   [Coming Soon]
Temporal Velocity                  Collecting Data   [Coming Soon]
Narrative Persistence              Collecting Data   [Coming Soon]
Content-Based Geographic Framing   Collecting Data   [Coming Soon]
Social Proof Correlation           Collecting Data   [Coming Soon]

Note: Breaking news (Friday): Lindsey Vonn's crash puts her Olympic comeback in jeopardy. Here's Seer's initial analysis: https://bit.ly/Seer-GEO-Olympics_LindsayVonn


The Five Hypotheses

I know, I know... you're probably thinking: More data?! I already have too much. There's endless hype in AI measurement right now, and most of it goes nowhere.

We're big believers in hypotheses at Seer. If we can think it, we can test it. And you get the value of what we learn.

We're not collecting data for data's sake. Five hypotheses. Each one testable. Each one tied to a marketing decision you'd actually make. By the end of the Games, we'll know which ideas are medal winners and which ones crash and burn.

Hypothesis #1: Citation Authority

What we're testing: Do LLMs play favorites with sources? When someone asks a straightforward factual question ("What time does the men's downhill start?") vs. a predictive one ("Who's going to medal in figure skating?"), do LLMs consistently reach for different types of sources?

Our prediction: When someone asks a fact-based question ("What's your return policy?" "Where are you headquartered?"), LLMs pull from official sources like your website or Wikipedia. When they ask something that requires judgment ("Is this brand worth it?" "How does this compare to competitors?"), LLMs pull from editorial sources like review sites, news coverage, and industry publications.

Why this matters for GEO: This is about who you build relationships with. If LLMs consistently credit certain outlets for certain types of questions, your media and marketing partnerships need to reflect that. For your brand, "official sources" might be your website. But it could also be Forbes, Wall Street Journal, or your industry trade publication. If those aren't the sources LLMs trust for your category, you're investing in visibility that doesn't translate.

Hypothesis: Official sources (olympics.com) dominate factual queries; editorial sources dominate predictive queries. 
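
For the analysts following along, here's what scoring this can look like: classify each cited domain as official or editorial, cross-tabulate by query type, and test whether the mix differs. A minimal sketch in Python; the sample citations, domain list, and field layout are illustrative assumptions, not our production pipeline.

```python
# Minimal sketch: does the citation mix depend on query type?
# Sample data and domain lists are illustrative, not Seer's real pipeline.
from scipy.stats import chi2_contingency

OFFICIAL = {"olympics.com", "nbcolympics.com", "wikipedia.org"}  # assumed labels

# (query_type, cited_domain) pairs, as parsed out of LLM responses
citations = [
    ("factual", "olympics.com"), ("factual", "wikipedia.org"),
    ("factual", "olympics.com"), ("predictive", "espn.com"),
    ("predictive", "theathletic.com"), ("predictive", "nbcolympics.com"),
]

# 2x2 contingency table: rows = query type, columns = source type
counts = {("factual", "official"): 0, ("factual", "editorial"): 0,
          ("predictive", "official"): 0, ("predictive", "editorial"): 0}
for query_type, domain in citations:
    source = "official" if domain in OFFICIAL else "editorial"
    counts[(query_type, source)] += 1

table = [[counts[("factual", "official")], counts[("factual", "editorial")]],
         [counts[("predictive", "official")], counts[("predictive", "editorial")]]]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}")  # a small p suggests the mix differs by query type
```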



Hypothesis #2: Temporal Velocity

What we're testing: When something changes, how fast do different LLMs change their answers? After a medal event, do they update their response? Cite different sources? Or keep serving yesterday's narrative?

Our prediction: LLMs that live next to search (AI Overviews, Google AI Mode) update within hours. Conversational LLMs (ChatGPT, Gemini, Meta AI) take days.

Why this matters for GEO: If your brand operates in fast-moving situations (product launches, breaking news, crisis response), you need to know which LLMs will surface your updated information while it's still relevant. They don't all think the same way. Some search the web every time. Some don't. We learned from our own data that, as of December 1, 2025, Claude no longer provides citations. Platform selection is a real decision now.

Hypothesis: Search-augmented platforms update within hours; conversational LLMs take days.
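
One way to make "how fast" concrete: for each platform, take the last answer before a known event and find the first snapshot afterward that says something different. A minimal sketch, assuming timestamped response snapshots; the records and field names are hypothetical.

```python
# Minimal sketch: hours until a platform's answer first changes after an event.
# The snapshot records and field layout are hypothetical placeholders.
from datetime import datetime, timedelta

def update_lag(snapshots, event_time):
    """Hours from event_time until the answer first differs from the
    last pre-event answer. Returns None if it never changes."""
    pre_event = [answer for ts, answer in snapshots if ts <= event_time]
    if not pre_event:
        return None
    baseline = pre_event[-1]  # what the platform said just before the event
    for ts, answer in snapshots:
        if ts > event_time and answer != baseline:
            return (ts - event_time) / timedelta(hours=1)
    return None

event = datetime(2026, 2, 7, 14, 0)  # e.g., a medal event finishing at 14:00
snapshots = [  # four-times-daily pulls for one platform, sorted by time
    (datetime(2026, 2, 7, 12, 0), "The favorite is expected to take gold."),
    (datetime(2026, 2, 7, 18, 0), "The favorite is expected to take gold."),
    (datetime(2026, 2, 8, 6, 0), "An underdog took gold in a major upset."),
]
print(update_lag(snapshots, event))  # -> 16.0 (hours)
```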



Hypothesis #3: Narrative Persistence

What we're testing: Once an LLM locks into a narrative about someone (or something), how hard is it to change? If the story shifts dramatically, do they pick it up? Or do they keep telling the old version?

Our prediction: Pre-event narratives persist 60%+ of the time post-event, even when results contradict them.

Why this matters for GEO: If your brand has a reputation problem (slow customer service, outdated product perception, negative reviews from three years ago), this tells you how sticky that narrative is inside LLMs. Can you change the story? How fast? Or are you fighting an uphill battle against a version of your brand that no longer exists?

Hypothesis: Pre-event narratives persist 60%+ of the time post-event, even when the results contradict them.
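
Scoring this one is mostly bookkeeping once each post-event response is labeled (by an analyst or a classifier) as repeating the old story or not. A minimal sketch with hypothetical labels:

```python
# Minimal sketch: per-platform narrative persistence after an event.
# The labeled responses are hypothetical placeholders.
from collections import defaultdict

labels = [  # (platform, label) for each post-event response
    ("ChatGPT", "old_narrative"), ("ChatGPT", "updated"),
    ("Gemini", "old_narrative"), ("Gemini", "old_narrative"),
    ("AI Overviews", "updated"), ("AI Overviews", "old_narrative"),
]
tally = defaultdict(lambda: [0, 0])  # platform -> [old-story count, total]
for platform, label in labels:
    tally[platform][0] += label == "old_narrative"
    tally[platform][1] += 1
for platform, (old, total) in tally.items():
    print(f"{platform}: {old / total:.0%} persistence")  # measured against the 60% bar
```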



Hypothesis #4: Content-Based Geographic Framing

What we're testing: If someone in the US asks an LLM a question, do they get a different answer than someone in France asking the same question? We're testing whether your location shapes what the LLM tells you, even for generic queries.

Our prediction: LLMs will surface home-country athletes significantly more often based on the user's geography, even when the question doesn't mention a specific country.

Why this matters for GEO: If your business is local, this matters a lot. You need to know if you're showing up in LLM answers for people in your area, or if you're invisible because the LLM favors results from somewhere else. Think local search, but the LLM version. If geographic bias is real, your content strategy might need regional variations based on where your customers actually are.

Hypothesis: LLM responses are geographically biased, surfacing home-country athletes more frequently based on the user’s location, even when the query itself is country-agnostic.
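
The test here is a comparison against a baseline: are home-country athletes over-represented when the query comes from that country? A minimal sketch, assuming each athlete mention is tagged with the query's country and the athlete's nationality; the data is hypothetical.

```python
# Minimal sketch: are home-country athletes over-represented by query locale?
# The tagged mentions are hypothetical placeholders.
from collections import Counter

mentions = [  # (query_country, athlete_nationality) per athlete mention
    ("US", "US"), ("US", "US"), ("US", "NOR"), ("US", "FR"),
    ("FR", "FR"), ("FR", "FR"), ("FR", "US"), ("FR", "NOR"),
]
overall = Counter(nationality for _, nationality in mentions)
for locale in ("US", "FR"):
    local = [nationality for qc, nationality in mentions if qc == locale]
    home_share = local.count(locale) / len(local)
    baseline = overall[locale] / len(mentions)  # that country's share over all queries
    print(f"{locale} queries: {home_share:.0%} home mentions vs. {baseline:.0%} baseline")
```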



Hypothesis #5: Social Proof Correlation

What we're testing: When real humans talk about a brand (on Reddit, LinkedIn, forums, reviews), do LLMs notice? Does that conversation carry more weight than polished marketing materials?

Our prediction: Athletes (and brands) that get talked about by other humans show up more prominently in LLM responses than those relying on official marketing content alone.

Why this matters for GEO: This is about credibility, not just visibility. LLMs are trying to answer questions for humans. If their training reflects what humans trust, then human-to-human recommendations (reviews, Reddit threads, LinkedIn posts) may carry more weight than anything your own marketing says about you. Social proof isn't a vanity metric. It might be the signal that determines whether you show up at all.

Hypothesis: Athletes or brands with higher levels of organic human discussion (e.g., Reddit, forums, reviews, LinkedIn) appear more prominently in LLM responses than those relying primarily on official marketing content.
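
If this prediction holds, mention volume and LLM prominence should rise and fall together. Because both measures are heavily skewed, a rank correlation is the natural first check. A minimal sketch; both series are hypothetical placeholders for the real measurements.

```python
# Minimal sketch: rank correlation between human chatter and LLM visibility.
# Both series are hypothetical placeholders for the real measurements.
from scipy.stats import spearmanr

social_mentions = [12, 340, 55, 1900, 8, 410]           # e.g., Reddit/forum threads per athlete
llm_prominence = [0.20, 0.55, 0.10, 0.90, 0.05, 0.60]   # e.g., share of answers naming them
rho, p = spearmanr(social_mentions, llm_prominence)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")  # rho near 1 => discussion tracks visibility
```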



What This Means for Your GEO Program

We designed this study to answer questions that directly inform what you do next. Here's how the findings apply, depending on your role:

For Marketing Leaders

GEO decisions are being made today. Get your head out of the sand and be decisive. This is where digital marketing is going.

      • If Citation Authority (H1) holds, your media partnerships need to match how LLMs categorize trust. Official sources for facts. Editorial sources for opinions. If you're investing in coverage that LLMs don't credit for your query type, that spend doesn't translate to AI visibility.

      • If Narrative Persistence (H3) holds, the window for shaping your brand's AI story is narrower than you think. Early positioning matters more than ongoing optimization. If you're already fighting a reputation problem, this tells you how hard that fight will be.

      • If Social Proof (H5) holds, GEO can't live in a silo. It connects to PR, social, community, and anywhere real humans talk about your brand. The marketing org chart might need to reflect that.

For Data Analysts

We're navigating uncharted waters here. There's never been a more exciting time to be a digital data analyst. Up your game, get involved, and help your leaders make good decisions.

      • If Temporal Velocity (H2) holds, you'll need to track different LLMs on different timelines. Some update in hours. Some take days. A single snapshot won't tell you the real story.

      • If Geographic Bias (H4) holds, your reporting needs to segment by location. National rollups may hide that you're winning in one market and invisible in another.

      • If Social Proof (H5) holds, you'll need to connect GEO metrics to signals outside your usual dashboards: Reddit mentions, LinkedIn engagement, review velocity. The measurement aperture gets wider.

For Search Marketers

This is the new game. This is the new battleground. Ditch your tactics from 2020 and start living in this new world of AI answers.

      • If Citation Authority (H1) holds, your content strategy should vary by query type. Factual content needs official source positioning. Opinion and comparison content needs editorial credibility. One playbook won't cover both.

      • If Temporal Velocity (H2) holds, platform selection becomes part of your strategy. Time-sensitive content (product launches, news, crisis response) needs to target LLMs that actually update fast.

      • If Narrative Persistence (H3) holds, front-load your effort. The first version of your brand story that LLMs adopt may be the one that sticks. Getting it right early matters more than fixing it later.



What We're Watching

Beyond the five core hypotheses, we're tracking signals that could reshape how we think about GEO:

  1. LLM convergence or divergence. Do the seven platforms become more similar in their responses as the Games progress? Or do their differences amplify? If they converge, optimization simplifies. If they diverge, platform-specific strategies become essential.

  2. The correction window. When an LLM gets something wrong (a medal prediction that doesn't pan out, an athlete narrative that reality contradicts), how long until it corrects? And does it correct fully, or does the original framing leave residue?

  3. Citation velocity during breaking news. In the hours after a major upset or record-breaking performance, which sources get cited first? Does speed-to-publish matter, or does authority trump recency?

  4. Emerging source patterns. Will new sources emerge as authorities during the Games? Can a publication establish LLM credibility in real-time, or are citation patterns locked in before events begin?

We'll revisit these observations in each results update, not as formal hypotheses, but as patterns worth tracking.


Follow Along

We'll update this post as findings emerge. Each hypothesis gets its own deep-dive analysis when we have statistically significant results.

A note on our analytical approach: correlation is easy. Causation is harder. Our analysis will go beyond surface-level pattern matching to examine whether the relationships we observe actually drive outcomes, not just coincide with them.

The Games start February 6. The data is already flowing.

Questions about the methodology or early findings? Contact us



The Methodology

Transparency matters. For the stats nerds in the back (we see you), here's how we're doing this.

Over nine weeks (from pre-Games through post-Games), we're tracking how seven major LLM platforms respond to a standardized set of questions covering athlete profiles, medal predictions, event coverage, and brand-adjacent topics.

We're measuring responses four times daily to catch shifts driven by breaking news and medal results, totaling ~1.76M queries and generating 5M+ parsed data points.
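
Those headline numbers hang together arithmetically. Here's the back-of-the-envelope check; the ~1,000-prompt set size and ~3 parsed fields per response are our rounded inferences from the published totals, not quoted figures.

```python
# Back-of-the-envelope check on the study's scale.
# The ~1,000-prompt set size and ~3 fields/response are approximations.
platforms = 7        # LLMs tracked
runs_per_day = 4     # measurement cadence
days = 9 * 7         # nine weeks, pre-Games through post-Games
prompts = 1_000      # implied by the published totals (approximate)

queries = platforms * runs_per_day * days * prompts
print(f"{queries:,} queries")          # 1,764,000 -- the ~1.76M above
print(f"{queries * 3:,} data points")  # ~3 parsed fields per response clears 5M
```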

Data collection runs at scale via Scrunch, with analysis and reporting powered by NinjaCat infrastructure, plus an "agentic learning" layer where a NinjaCat agent proposes new prompts that analysts review before adding them to the set.

To keep this from turning into statistical astrology, each hypothesis has predefined validation thresholds set before collection started.


We love helping marketers like you.

Sign up for our newsletter for forward-thinking digital marketers.