AI tools like ChatGPT and Perplexity are influencing decisions and driving leads, all without providing usage or search data. So while everyone’s still waiting for impression data or rankings in AI Overviews, we asked ourselves a different question: can we use log files to see if these GenAI bots are actually hitting our content?
In a world without search volume or impression data, analyzing hits from AI bots is currently the best proxy for your website’s actual visibility in LLMs. Log files give us a sneak peek into how large language models (LLMs) interact with our sites.
AI bots from companies like OpenAI crawl websites differently than search engine bots do. OpenAI alone operates three web crawlers, each serving a different purpose:
- OAI-SearchBot - Used for search functionality and indexing
- ChatGPT-User - For real-time user requests
- GPTBot - Used for ChatGPT model training
By analyzing their activity in log files, we can uncover:
- Which pages LLMs are and are not visiting
- Which pages LLMs are surfacing in their results
- How AI bots prioritize, discover, and index content
- What content, technical, or structural elements might be influencing bot behavior
This is the foundation for testing visibility in LLM-driven experiences.
Here’s how we’re using log files to analyze performance from LLMs.
1. AI Bots ≠ Googlebot: Why Crawl Behavior Matters
For years, Googlebot’s job was to index your website content and send visitors back to you. With the rise of AI Overviews impacting CTR, it’s more important than ever to look at how other crawlers are interacting with your site.
Googlebot is a well-studied, well-documented crawler. Its crawl patterns are familiar, consistent, systematic, and thorough.
AI bots don’t necessarily prioritize the same elements that GoogleBot does (which means there is still some uncertainty in how AI bots crawl our websites).
OpenAI crawlers historically don’t execute JavaScript well, likely due to the processing power that requires. Google has built up that capability over many years, and it’s not easily replicated in short order.
Why does that matter? Websites that rely heavily on JavaScript to render their content, rather than serving it in the source HTML, might not get the same crawl coverage as more HTML-focused websites would.
And if AI bots can’t render JS as well as Google, and they don’t use XML sitemaps, then those websites could have less visibility in certain AI results.
Line up the status codes for OpenAI’s crawlers against Googlebot’s. Do the patterns match? We know they crawl differently, so comparing status codes can help surface differences in crawl patterns.
For example: if 4xxs are popping up more than expected, that’s a signal something might be broken, inaccessible, or deprioritized by AI bots.
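One way to run that comparison is to tally status codes per crawler straight from your access logs. The sketch below assumes the Apache/Nginx “combined” log format; the regex is ours and should be adjusted to your server’s actual log configuration:

```python
import re
from collections import Counter, defaultdict

# Parser for the common Apache/Nginx "combined" log format (an assumption --
# adjust the pattern to match your server's LogFormat).
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

BOTS = ["Googlebot", "OAI-SearchBot", "ChatGPT-User", "GPTBot"]

def status_breakdown(log_lines):
    """Tally HTTP status codes per crawler so their distributions can be compared."""
    counts = defaultdict(Counter)
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        for bot in BOTS:
            if bot in m.group("ua"):
                counts[bot][m.group("status")] += 1
                break
    return counts
```

If GPTBot’s 4xx share is noticeably higher than Googlebot’s on the same site, that’s exactly the kind of crawl-pattern mismatch worth investigating.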
2. Where AI Bots Hit High-Friction Zones
Once you’ve mapped the crawl patterns, dig into the “why.”
Log file analysis can show you where AI bots hit friction: redirects and errors can stall crawl paths and limit what content actually gets “seen.” If bots are running into dead ends, that’s a potentially major visibility gap, especially as LLMs gain steam as a channel for content discovery.
Focus on those high-friction zones.
These zones create friction because they interrupt or prevent the bot from reaching and understanding the content, which can result in:
- Missed or incomplete coverage of your site in LLMs
- Important pages being skipped entirely
- A weaker presence in AI-generated experiences
Testing how you can fix these friction areas helps ensure your most important content is accessible — not just for search engines, but for AI systems that are rewriting the rules of content exposure.
Pro-tip: To reduce friction and improve crawlability for AI bots, fixes can include:
- Strengthening internal linking
- Minimizing redirect chains
- Ensuring priority pages aren’t JavaScript heavy
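On the redirect-chain point, a quick audit script can flag multi-hop chains worth collapsing. This is a minimal sketch assuming you’ve already extracted observed redirect pairs (source URL → target URL) from your logs or a crawl export; the function name and hop threshold are our own choices:

```python
def redirect_chains(redirects: dict[str, str], max_hops: int = 10):
    """Return {start_url: [hop1, hop2, ...]} for every chain of 2+ hops."""
    sources = set(redirects)
    targets = set(redirects.values())
    chains = {}
    for start in sources - targets:  # only start from chain heads
        path, url = [], start
        while url in redirects and len(path) < max_hops:  # cap guards loops
            url = redirects[url]
            path.append(url)
        if len(path) >= 2:  # more than one hop = a chain worth flattening
            chains[start] = path
    return chains
```

Any chain this surfaces is a candidate for a single direct 301, which spares both search engine and AI crawlers the extra hops.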
3. Which Pages Win AI Attention (and Why)
Once you have a baseline of what AI bots are crawling, dig into volume: which pages are getting hit the most?
Some saw thousands of requests. Others? Nothin’.
We layer that data with site structure and content types to spot patterns:
- Are bots favoring top-level URLs?
- Are product pages getting prioritized over blog content?
- Are entire sections being skipped?
What we’re seeing isn’t one-size-fits-all; it varies by industry.
In some verticals, top-of-funnel, thought-leadership-style content gets the most visibility and hits from OpenAI’s crawlers. In others, it’s product pages that dominate. Where your site fits into that spectrum helps shape your priorities—what content to double down on, and what may need technical or structural adjustments to get discovered.
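A minimal sketch of that kind of aggregation, assuming you’ve already filtered your logs down to the request paths hit by OpenAI crawlers (the helper name and section-grouping choice are ours):

```python
from collections import Counter

def hits_by_section(paths):
    """Count AI-bot hits per top-level site section (first path segment)."""
    counts = Counter()
    for path in paths:
        segments = [s for s in path.split("/") if s]
        # Group by the first path segment; bare "/" hits go under "(root)".
        counts[segments[0] if segments else "(root)"] += 1
    return counts
```

Running this per crawler (and again grouped by template or content type) makes the “product pages vs. blog content” question answerable with your own numbers instead of anecdotes.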
Pro-tip: Don’t underestimate LLM traffic. In some cases, users arriving via AI-generated experiences are more engaged and even convert better than those from organic or paid channels.
4. Using Schema to Guide LLM Crawlers
Schema has always played a role in how search engines understand content. Now we’re testing its impact on AI bot activity.
In initial analyses, we’re starting to see signs that structured data may play a role in how AI bots prioritize and interpret pages.
In more recent analyses, pages using certain types of structured data are being crawled more frequently than pages without it. In some cases the differences in layout and content were minimal, so schema markup may be helping the nascent OpenAI crawlers understand page content.
We’re pairing crawl data with schema presence to identify gaps, similar to what’s historically been done for Google’s crawlers, then recommending tests where schema might help close them. And we’re not stopping at one type: depending on the site, we’re evaluating everything from pricing and product schema to location, event, and search-related markup.
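To join crawl-frequency data against schema presence, you first need an inventory of which schema types each page carries. Here’s a rough sketch that pulls schema.org `@type` values out of a page’s JSON-LD blocks; a production pipeline would use a real HTML parser, but a regex is enough for a quick audit:

```python
import json
import re

# Grab the contents of <script type="application/ld+json"> blocks.
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def schema_types(html: str) -> set[str]:
    """Return the set of schema.org @type values declared on a page."""
    types = set()
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than crash the audit
        items = data if isinstance(data, list) else [data]
        for item in items:
            t = item.get("@type")
            if isinstance(t, str):
                types.add(t)
    return types
```

With this inventory in hand, you can bucket URLs by schema type and compare average AI-bot hit counts across buckets to spot the gaps worth testing.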
This isn’t just good SEO hygiene; it’s about sending clearer signals to LLMs, especially on pages where accuracy and context matter most.
5. llms.txt: Testing a New Tool for AI Crawl Control
We’re currently testing llms.txt files as a way to get more granular control over what AI crawlers see and potentially prioritize.
There’s debate over how effective it is. Notably, Google’s John Mueller likened it to the keywords meta tag: something you can add, but not necessarily something bots listen to.
OpenAI hasn’t confirmed support for llms.txt, though their crawlers do honor robots.txt, which gives us some baseline confidence.
There’s no guarantee this file influences what shows up in AI-generated search responses, but it's a forward-looking experiment — meant to guide AI systems toward the right content and set clearer boundaries for usage.
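For context, the proposed llms.txt format is a plain markdown file at the site root: an H1 with the site name, a blockquote summary, then H2 sections of annotated links. A hypothetical sketch (all names and URLs invented for illustration):

```markdown
# Example Co.

> Example Co. makes log-analysis tooling for SEO teams.

## Products
- [Log Analyzer](https://example.com/products/log-analyzer.md): parse and segment crawler hits

## Guides
- [AI crawler guide](https://example.com/guides/ai-crawlers.md): how AI bots access our content
```

The idea is to hand LLMs a curated, low-noise map of your most important content, whether or not any given crawler currently reads it.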
This is why it’s important to use your own data and test these factors.
Why Your Visibility Matters, Right Now
As AI continues to reshape how users discover content, we’re entering a new era of visibility: Do you show up in AI answers?
Log files are quickly becoming our best tool to understand how LLMs interact with our sites: what they crawl, what they skip, and what might be influencing that behavior.
Whether it’s testing schema, spotting crawl friction, or experimenting with llms.txt, the strategies are evolving fast.
The takeaway? Don’t wait for definitive answers. Use your own data, run tests, and stay proactive.
Plus, the hygiene work outlined above will help your site in SEO and will likely help your visibility in AI answers too. No-brainer.
Ready to get started evaluating your AI visibility? Let’s talk.