A couple of weeks ago, OpenAI, the creators of ChatGPT, released information about their web crawler GPTBot.
“Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies.
Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety.”
Webmasters can manage GPTBot’s access to their websites through robots.txt, and have the option to block it entirely. Many sites have moved quickly to keep GPTBot’s little robot hands off of their data. But should they?
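For those who do want to restrict access, the mechanism is the standard Robots Exclusion Protocol. As a rough sketch (the directory paths here are illustrative, not prescriptive), a site's robots.txt might look like this:

```
# Block GPTBot from the entire site
User-agent: GPTBot
Disallow: /

# Or, alternatively, allow only a specific section
# (replace /public/ with whatever paths make sense for your site)
User-agent: GPTBot
Allow: /public/
Disallow: /
```

Like all robots.txt rules, this relies on the crawler voluntarily honoring the file - OpenAI has stated that GPTBot respects it.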
Let’s start with the concerns.
There are plenty of understandable concerns about allowing GPTBot unrestricted access to your site.
Most commonly, people are concerned about copyright infringement - and rightfully so: OpenAI didn’t exactly lead with trust when they began collecting data and training their models without any of us knowing about it in 2021.
Sarah Silverman and others are actively suing OpenAI for leveraging their work in its models and not crediting or compensating them.
But I have to admit that this entire conversation feels like déjà vu, circa 2014 when Google first began placing featured snippets in SERPs.
Many in the SEO community were upset by the idea that Google would “steal” content and feature it directly on the internet’s homepage. Publishers worried that people would find the answers they sought directly on Google, and never click through to learn more from the site itself.
That did happen in some cases, but for the most part, featured snippets continued to drive traffic to websites.
The cost of traffic lost to AI is very real for advertisers and publishers, and I don’t mean to diminish that; but isn’t the entire purpose of a search engine to provide people with answers to their questions in the easiest way possible?
It’s not exactly an apples-to-apples comparison because ChatGPT doesn’t cite its sources (yet) - but it’s not hard to squint and see the similarities.
Now let’s talk about the upsides.
Brands risk harming their reputation if they opt out of participating in LLMs.
Yes, there is certainly a risk to having data and information misrepresented or miscredited within ChatGPT - but I would argue that the risk of not being present where your audience is searching is far greater.
Imagine a website blocking Googlebot in the early 2000’s. That’s a decision any business owner today would have come to regret!
The push and pull between publishers and the general public has always been there, and will continue to persist. I encourage all publishers, SEOs, and webmasters to step out of their respective roles and put on the hat of an everyday person using the internet: at the end of the day, people just want answers and information as fast and as easy as possible.
And our job as marketers is to remove friction between those users and their answers.
ChatGPT provides an easy solution to getting those answers and that information. It’s a matter of time before people start to take full advantage of it.
Factual vs. fluid searches
Dr. Pete shared some thoughts on the pros and cons of AI and search engines at August 2023’s MozCon.
He argues that search engines still provide the best answers for factual searches today, but AI excels at creatively solving problems when people bring it pieces of information they’re looking to string together.
He calls these “fluid” searches - instances where people kind of know what they want, but need help gathering information to draw conclusions.
(sourced from Dr. Pete’s MozCon slides)
I published a post about six months ago saying that ChatGPT won’t kill Google. I still stand behind that statement, especially with SGE's recent developments. It took Google longer than I thought to catch up with their response to ChatGPT, but not surprisingly, they're right back in the game.
ChatGPT's usage is slipping, but I still believe it will be a staple in many people's tech toolkit. I don't know if the general public will ever shift to it as their primary search engine, but it will only continue to become more useful as it begins to ingest data closer to real time through its web crawler.
My recommendation? Don’t fight it.
I, for one, welcome our new robot overlords.
In the marketing and search industry, we see time and time again that search engines evolve by removing more and more friction. That’s what we’re seeing here - creating better experiences for users.
My general rule of thumb: if you don’t want search engines (or now, LLMs like GPT) to use your information, don’t put it publicly on the internet.