Who said you couldn’t control Google?
Googlebot is one of those ethical and friendly bots that we want to come around often, but some webmasters running several large sites may experience server problems when it uses way too much bandwidth. How do you tell Google to decrease the amount of bandwidth it uses and possibly save yourself some money on monthly server costs?
First, tell Googlebot to stop re-crawling very old pages of your site that haven't been updated in over a year and won't be updated any time soon, but that must remain live on the site. Then, try adjusting the speed at which Googlebot crawls your entire site.
The folks over at Google have recently dropped some interesting knowledge on how to prevent Google from re-crawling very old pages of your site. While 200, 301 and 404 are the widely popular HTTP status codes, webmasters should start paying a bit more attention to 304 (Not Modified). When Googlebot revisits a page, it can send an If-Modified-Since request header; if the page hasn't changed since that date, your server can answer with a bare 304 instead of the full page.
“You should configure your server to return this response (called the If-Modified-Since HTTP header) when a page hasn’t changed since the last time the requestor asked for it. This saves you bandwidth and overhead because your server can tell Googlebot that a page hasn’t changed since the last time it was crawled.”
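To make the handshake concrete, here is a minimal sketch of the server-side decision in Python. The `respond` helper is hypothetical (a real server framework would wire this into its request handling), but the logic is the standard conditional-GET check Google describes: compare the crawler's If-Modified-Since date against the page's last-modified date, and skip the page body when nothing has changed.

```python
# Sketch of the If-Modified-Since / 304 handshake. `respond` is a
# hypothetical helper, not any particular server's API.
from email.utils import parsedate_to_datetime

def respond(if_modified_since, last_modified):
    """Return (status, body) for a conditional GET.

    if_modified_since: raw request header value from the crawler (or None).
    last_modified: datetime when the page content last actually changed.
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full 200
        # Page unchanged since Googlebot's last visit: send an empty
        # 304 Not Modified instead of the full page body.
        if since is not None and last_modified <= since:
            return 304, b""
    # Otherwise serve the page as usual; a Last-Modified response header
    # lets the next conditional request be answered cheaply.
    return 200, b"<html>...full page...</html>"
```

The bandwidth savings come from the empty body on the 304 branch: Googlebot still learns the page is alive, but your server ships a few header bytes instead of the whole document.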
This is most useful for larger sites with hundreds of pages; small sites with only a few pages aren't subject to the bandwidth usage of a hyperactive Googlebot.
In Google Webmaster Tools, you can also control the speed at which Googlebot crawls your site. In the Dashboard, under Settings, you will see a section for Crawl Rate. If you select “Set Custom Crawl Rate,” you can set Googlebot’s rate faster or slower:
To see granular details of Googlebot’s activity, Google now gives us access to the last 90 days of data in Crawl Stats under Statistics. There, you can view the number of pages crawled per day, the number of kilobytes downloaded per day, and the time spent downloading a page:
If you find that a slower Googlebot is helpful and lessens your bandwidth usage, remember to re-adjust the settings after 90 days; otherwise Google will return to its automatic crawl rate. Again, only webmasters with serious bandwidth issues will find this most helpful. For everyone else, unnecessary adjustments may hamper the number of pages indexed in Google.
Now, if only controlling all the other different web-bots was this easy!