Blog

  • http://www.seoptimise.com/blog matt_seo

    Thanks for the heads up, I was about to make a tool based on the importXML function. You have saved me a load of time.

    I think you can build a lot of cool time-saving stuff on Google Docs that is really useful. I had managed to make a sheet that extracts up to 5,000 indexed pages so you can check them against sitemaps, but I guess that won’t work anymore.

  • http://www.search-engine-magick.co.uk Neil_SEO

    Thanks for this. Very useful.

    Anything similar for Excel?

  • John J Curtis

    Thanks. I thought there was an issue with the API. We noticed this at the office yesterday.

  • http://www.imod.co.za Chris M

    Wicked!

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @matt_seo: That sucks, dude! Unfortunately, with any scraper that’s the danger you run into. ImportXML works perfectly fine, but not if you were planning to use the older XPath to get SERPs.

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @neil_seo: I don’t personally have anything for Excel. Have you checked out http://nielsbosma.se/projects/seotools/download/ ? It doesn’t scrape Google, but there are functions in there that might help you.

  • http://www.seoptimise.com/blog matt_seo

    @Chris Le – sadly that’s exactly what I was going to use it for.

    I may be able to achieve the same with the other thing you suggested and some other APIs. I was going to spend last weekend making stuff, glad I didn’t now :)

  • http://www.tomjepson.co.uk/ Bandertron

    Nice! I’m going to be trying this one when I get a chance. Thanks for the share.

  • http://nugeknows.com Matt

    This is great Chris! I was using importXML to scrape Google results and thought I was going crazy that the results weren’t matching.

    Thanks so much!

  • http://www.optimising.co.uk Gareth Brown

    Thanks for sharing Chris.

    I ran it against a keyword that fired a universal search, which meant an extra (non-URL) row was added. I tweaked your code a little at line 70:

    // Only keep rows that are actual URLs
    if (url && url.indexOf('http') == 0) {
      organicData.push(url);
    }

    It’s a bit hacky, but does the job!

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @gareth brown: Thank you so much! I will see if there’s some less hacky way, update the code later today, and ping you when I do!

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    I incorporated the fix from Gareth Brown. Thanks!

  • http://zoomspring.com Jordan Godbey

    Chris you are the importXML master! thanks for putting this together :)

  • joe p

    Thanks Chris,
    I had my spreadsheet set up to scrape the top 50 results, but not the typical way (&num=50): rather num=10&start=10 for the second page, num=10&start=20 for the third page, etc. The reason is that you get different results in the SERPs, and the order is crucial for me when I do ORM. I am wondering if you have a fix for that.

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    Thanks for the heads up. I don’t have a fix for that yet, but I think that would be useful in a future version.
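
    In the meantime, here is a rough, untested sketch of the page-by-page approach joe p describes. It assumes the googleScraper(keyword, results, tld, start) signature used in the formulas later in this thread, and that the function returns a plain array of rows:

    function googleScraperPaged(kw, pages, optTld) {
      // Fetch results one page of 10 at a time (start=0, 10, 20, ...) so the
      // ordering matches what the browser shows, then stack the pages together.
      var all = [];
      for (var page = 0; page < pages; page++) {
        var results = googleScraper(kw, 10, optTld || '.com', page * 10);
        all = all.concat(results);
      }
      return all;
    }

    Note that each page is a separate fetch, so the 30-45 second execution limit mentioned further down this thread caps how many pages fit in one call.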

  • http://www.sardegne.com Damiano

    Dear Chris,
    thank you for this script. I can’t get more than 10 results on the spreadsheet. I put =googleScraper("my keyword here", 20). Do I also have to change this number in the code?
    Regards,
    Damiano

  • Pancho Panchev

    Hi,
    Can you add functionality for scraping the sponsored results, not only the organic ones?

  • Danni Fudge

    Hi Chris,

    Firstly, thanks for this updated tool. Really useful since we ran into issues with the ImportXML.

    One question: do you have any indication of how these scraped results are being influenced, if at all, by personalisation?

    I’ve run a few tests using all of the normal parameters and it appears that your scraper is producing pure results. Would you agree?

    Thanks
    Danni

  • http://www.tmesolutions.co.uk Andy

    Thanks for the tool, very useful.

    I realise this might be a big ask, but I’ve been using GDocs to do some keyword reporting recently, where the doc pulls in the position of a URL, looking at pages individually (start=10, 20, etc) to get the proper position.

    Do you have any ideas how to do this? My own script has been broken since this update and I need some help to fix it! :(

  • Łukasz

    Thank you Chris for this script, very useful :) One question: is it possible to also scrape the number of search results into another cell? I can’t figure out how a function returning that count should look :<
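
    A rough sketch of what such a function could look like (hypothetical and untested; it assumes the count still appears as “About N results” in the fetched HTML):

    function googleResultCount(kw) {
      // Fetch the first results page and pull the total out of the
      // "About N results" text.
      var html = UrlFetchApp.fetch('http://www.google.com/search?q=' + encodeURIComponent(kw)).getContentText();
      var match = html.match(/About ([\d,]+) results/);
      return match ? parseInt(match[1].replace(/,/g, ''), 10) : 'count not found';
    }

    Call it from its own cell, e.g. =googleResultCount("keyword here").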

  • http://instantwebupdate.com TracyHall

    I tried it and it’s working. Chris, you did a good job!

  • http://www.niallflynn.com Niall Flynn

    Any idea how to change the search locale, say if I want to search for Irish results only? Great tool :)

  • http://anyrandomnews.wordpress.com joe p

    @Niall Flynn you can edit the script on line 38 to read "&gl=ie&num=" instead of "&num="
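
    Applied to the URL-building line quoted later in this thread, that edit would look something like the sketch below (line numbers and the exact line may differ between copies of the script):

    // gl=ie asks Google to geotarget the query to Ireland
    var url = 'http://www.google' + optTld + '/search?q=' + kw + '&start=' + optStart + '&gl=ie&num=' + optResults;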

  • Pepit

    Hi guys,

    Is it possible with a spreadsheet to save number of results of a google search?

  • http://linki.cz Linki

    Thanks for an interesting tool. I can see your spreadsheet working, however a copy that I made produced the following error: “error: ReferenceError: ‘setOptions’ is not defined. (line 217)”. Am I making a mistake anywhere?

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @linki Huh. I never saw that one before! :) Can you try to make another copy and see if you get the same results?

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @pepit Not with this script. No.

  • http://teddymarie.com Teddy

    Nice upgrade ;) I tried it with intitle:Kw1+Kw2 and now it works ^^

    You rock, Chris!

  • Mark

    Hi,

    Nice job, but did any of you notice that the top 10 seem to be accurate, yet once you expand to the top 20/50 the first 10 rankings are different?
    Anybody have an idea why that might happen?

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @mark: It happens because Google presents different results when you ask it for more than 10 results. This is because of how queries are spread out over several servers. To return results quickly, a search engine will only sort a section of its index. Larger queries get a larger pool of data, which results in some things shifting around.

    If you’re trying to get what the browser would give you in chunks of 10, then you can try adding "&start=10" to the end of your keyword to get results 11-20.
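
    For example (untested; anything appended to the keyword ends up in the query string, the same suffix trick used for &tbs later in this thread):

    First page:  =googleScraper("keyword here", 10)
    Second page: =googleScraper("keyword here&start=10", 10)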

  • https://twitter.com/#!/hogg_rob Richard James

    Hi All,

    Has anyone noticed that this scraper isn’t working now? It is returning error messages.

    I should say that I have put 55 keywords (so 550 results) into one sheet so this may be what is causing the problems.

    Anyone else getting this or just me?

    Cheers, Rich

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    Everything seems to be working fine for me, but I’m only doing a few at a time. Does it still work with just one or two?

  • https://twitter.com/#!/hogg_rob Richard James

    Hmm, it is working fine again now!

    A related question: if this (or any Google Docs scraper) does return an error, is there any way to force-refresh the document? And if not, how long will that error message be cached for?

    From experience it seems that if you get an error you need to wait over a day (to be on the safe side) before opening the doc again, or else you will continually get the same error messages.

    Thanks for all these scraper tools too – this and the new ImportXML post you put up a few days ago – they are immense.

  • http://www.niallflynn.com Niall Flynn

    I had this issue. You can a) unplug your router to get a new IP, b) wait a few hours, or c) try from a new machine.

    The same goes for any API: if you hit it too hard you will get blocked, and adding 1,000 URLs 100 deep is a lot. You could also code in a delay, or increase one if it already exists. This code rocks, saved me 100s of hours of work.

    I was trying to modify it to scrape Baidu, any ideas?

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @Niall Flynn:

    Unfortunately, setting a delay will not work. Here’s why:

    Google Spreadsheet has a hard limit of around 30-45 seconds for any function to execute. If a function doesn’t complete in less than that, Google automatically cuts it off. That means adding a delay will push it over the limit. When that happens, the spreadsheet returns an error and all the work the function did is lost in space.

    To do what you’re asking (1000 URLs, 100 deep) will probably require some level of programming. Google Docs isn’t the best tool for that scale. If you have the resources, I would suggest looking at the Authority Labs Partner API (http://authoritylabs.com/api/partner-api/). While it’s targeted at developers, it does exactly that: you give it 1000 keywords and it will get you the top 100 SERPs (and a lot more). I don’t know of any publicly available tool that simply gives you a report of all 100 SERPs for 1000 keywords.

    This Google sheet is pretty good at saving hours of work but it has limitations imposed on it by Google.

    Oh, and as far as Baidu goes? I don’t actually know a lot about them or tools for it. Even though I’m Asian, my grasp of the Chinese language is non-existent :)

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @richard (@hogg_rob): Thanks for the compliments.

    To answer your question: the cache sticks for about 30-45 minutes. If you want to “force refresh the document” you can make a copy of the whole spreadsheet, although that’s not what I would do. Google Docs caches each cell individually, so you’ll find it easier if you just copy and paste the =googleScraper formula into an unused cell on the same sheet.

    Another tip I’ve taught our folks at SEER is to have two spreadsheets open. Call one spreadsheet “playground” and type formulas in there until you get them to work. Once a formula works, copy it into the second document, which I call “my report.” That way, if you run into issues with the cache, trash the “playground” sheet and open another one. That will keep your report document from having cache problems.

  • Michael

    @Chris: thanks for this great scraper.
    But what I need now is to get the paid Google results, not the organic ones, and I need their URL, title, and description in separate cells. Can you help me with this?

  • http://www.niallflynn.com Niall Flynn

    Thanks Chris, you should put a donate/free coffee button here, I would be glad to give you guys a dig out. If I get any further on the Baidu scraper I will post the code.

    Oh and I have a similar PHP scraper for Bing, if anyone needs it.

    Thanks,
    Niall

  • https://twitter.com/#!/hogg_rob Richard James

    Thanks Chris,

    I’ll give the “playground” idea a go.

  • Luke

    Hi, thank you very much for this one. I’d like to ask if there is a way to archive those results on an automatic daily basis.

  • http://www.aaronmoskowitz.net aaron

    How do I export the results? It only allows me to copy the code in each cell.

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @aaron: If you’re looking to export the results from a Google Spreadsheet, you can simply click on File > Download As… and then choose the format you want. You can choose CSV, although I usually use the Excel format.

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    @luke Sorry for the late reply. The query re-runs every time you open the spreadsheet. If you’re looking for a way to archive the results on a daily basis, this Google Docs tool isn’t the best tool for that. I personally recommend checking out Authority Labs (http://authoritylabs.com/). Without programming anything yourself, I think their product might be better suited for daily ranking or SERP checking.

  • Monti.P

    Thanks Chris, this really helped me out. Just one thing: how can I put the array in a row instead of a column? Also, instead of having the first 10 domains, how can I have just the first 3?
    Thanks for your help!

  • joshuaguffey

    I’m getting a 503 server error in the results..

    “Exception: Request failed for http://www.google.com/sorry/?continue=http://www.google.com/search%3Fq%3Dtest%26num%3D10 returned code 503. Server response: http://www.google.com/search?q=test&num=10 To continue, please type the characters below:

    About this page: Our systems have detected unusual traffic from your computer network. This page checks to see if it’s really you sending the requests, and not a robot. Why did this happen? This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the Terms of Service. The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services. This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. If you share your network connection, ask your administrator for help; a different computer using the same IP address may be responsible. Learn more. Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.

    IP address: xxx.xxx.xxx.xxx Time: 2012-06-29T23:14:06Z URL: http://www.google.com/search?q=test&num=10”

  • ArtemBeer

    Same thing.

  • https://plus.google.com/u/0/111044299943603359137?rel=author Chris Le

    To put the array in a row, simply use the transpose function built into Google Docs. Try something like: =transpose(googleScraper("keyword here"))

    If you want the first three, try: =googleScraper("keyword here", 3)

  • iamchrisle

    Looks like you guys hit the CAPTCHA. Google has detected that you’re trying to scrape their results too quickly in too short of a period of time. So Google stopped sending results. There’s not a lot I can do about that, unfortunately. You’ll have to wait it out and try again.

  • iamchrisle

    Looks like we installed a new comment system. This one got lost somewhere in that transition:

    “Thanks Chris this really helped me out, just one thing, how can i put the array in a row instead of a column ? also instead of having the first 10 domains how can i have just the first 3 ?Thanks for your help !”

    My reply:

    To put the array in a row, simply use the transpose function built into Google Docs. Try something like: =transpose(googleScraper("keyword here"))
    If you want the first three, try: =googleScraper("keyword here", 3)

  • http://blog.conmergence.com/ ed_dodds

    Sweet!

  • http://www.muhid.com/ Where my journey begins

    Excellent stuff mate, I will check this out. Will let you know if I face any issues.

  • Trent

    This is great, thank you. Out of curiosity, which tool did you use to scrape the info?

  • Salt

    Sorry for the late reply :s
    Just to say thanks for putting my question back in the thread… works great now!
    Keep up the good work!

  • Taylor Sweet

    Hi Chris, the tool was working fine, but for the past few days I can no longer “Make a copy” of it. It doesn’t give me an option to do so. Can you help me out, puhlease?

  • Taylor Sweet

    Or am I just being “blonde”, always making a copy of it whenever I use it? Jeez, I’m so fawkin confused!

  • Taylor Sweet

    Let the rope end my suffering :’(

  • John

    For some reason you can’t conduct queries using quotes to find exact matches. For example, you can’t do a search for “example” in quotes because the quotes set off an error every time. If there is any way you could add this functionality, or tell us how this can be done, it would be highly appreciated.

    This is an awesome tool by the way.

    Thanks! :)

  • John

    You have to be logged into Google Docs to make a copy. Hope that helps.

  • Matt

    Hey Chris,

    Thanks for creating this tool, really helpful.

    I’ve noticed that the search results can vary a little depending on the number of results I specify. Any idea why?

  • http://www.probabiliformazioni.org/ Daniel Peiser

    Edit line 47 to search on Google Blog Search:

    var url = 'http://www.google' + optTld + '/search?tbm=blg&q=' + kw + '&start=' + optStart + '&num=' + optResults;

  • Alf

    Is the result limit 100?

  • Dhvanil

    Hi, very useful tool.

    Can I add multiple keywords in one row and get the search results in different columns, following the keywords?

  • Sean Johnson

    Great tool! One quick question: how would I be able to get it to expand the SERPs to crawl “omitted” results? Say, for example, I wanted to know all of the duplicate pages for a specific “site: inurl:” query, but due to the duplication it only returns 4 or so and “omits” the other results.

    Thanks again for making such a cool open google doc tool, you guys SEERiously rock!

  • Jason

    This tool no longer seems to be working

  • http://twitter.com/ItsHogg Jon

    This is untested but to see the dupes you should be able to click Tools > Script Manager > Edit

    and change line 47 to the following:

    var url = 'http://www.google' + optTld + '/search?q=' + kw + '&start=' + optStart + '&num=' + optResults + '&filter=0';

  • http://twitter.com/SeanJohnSaid Sean Johnson

    Thanks Jon, that definitely did it.

    Cheers,
    Sean

  • Sarah Kim

    Since I am logged in to Google Drive in order to use this tool, does this mean that the SERP results are personalized to my account?

  • http://withnoble.com/ Bryant Lack Jaquez

    THANK YOU SO MUCH! I just spent an hour trying to figure out why my ImportXML wouldn’t work. I wish I’d read this blog post yesterday ;)

  • Brad_Zomick

    How would I extract the indexed page volume, i.e. “About 800,000 results” but with just the number, and for many terms at a time?

  • John

    Yes, please fix this problem.

  • http://newsbreak50.blogspot.com/ Rod Napeda

    This question makes sense, and should be a subject for further investigation.

  • Dario

    YOU ARE A GENIUS! :) Thank you so much, very useful trick

  • Kevin Melillo

    Question: Is there any way to remove the default of .com? I would like results not based on the .com or .net or .org.

  • Kerill

    For security reasons, the " symbol is escaped. You need to use the URL-escaped version, which is %22.

    For example:
    “This is a test”

    would need to be:
    %22This is a test%22
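
    If you’d rather not type the escape code every time, a tiny helper (hypothetical name) added to the script can do the wrapping, since custom functions can be nested inside one another in a cell:

    // Wraps a keyword in URL-escaped quotes (%22) for exact-match queries,
    // because literal quote characters break the formula.
    function exactMatch(kw) {
      return '%22' + kw + '%22';
    }

    Then: =googleScraper(exactMatch("this is a test"))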

  • http://www.shovan.co.uk/ Shovan

    Nice one. Thanks

  • Lorenzo

    Hi all,
    Is it normal to get #ERROR! when the offset parameter is set to 901 or above?

  • Rachel Fleming

    Is there some way to get just the email addresses on the pages I am searching?

    And maybe the company name as well?

    :)

  • Spook SEO

    I’ve been using Google Scraper since the time it was introduced. It’s really helpful and it has given me a lot of SEO boosts with my own techniques. According to Matt Cutts, Google’s head of search spam, scrapers don’t get any benefit from copying other people’s work, but this can be beneficial for other SEO techniques. Anyway, thanks for this information.

  • JBS

    Yeah, it would be awesome if we could get just the webpage title in column D…

    For the email collecting, you could take the URLs and feed them into a tool like Citation Labs Contact Finder (http://citationlabs.com/tools/)

  • Michael

    I am using google scraper to build a gsheet that monitors various search terms for our customer service department.

    I have found that I can add directives to the ‘keyword’ argument to filter the output.

    In my case, I append the commands below to the first argument passed to Google Scraper:
    &tbs=qdr:d5,sbd:1

    Resulting in:
    - results from the last 5 days (&tbs=qdr:d5)
    - results sorted by date (,sbd:1)

    The other time-filter directives I’ve identified are shown below. For each directive you can append a number as shown above.

    qdr:h (hour; append a number to say how many hours)
    qdr:d (days)
    qdr:w (weeks)
    qdr:m (months)
    qdr:y (years)
    qdr:s (seconds)

    Example 1 – up to 99 results from the last 15 minutes (900 seconds):
    =googleScraper("%22richard nixon%22"&"&tbs=qdr:s900",99,"",0)

    Example 2 – same results as example 1, sorted from newest to oldest:
    =googleScraper("%22richard nixon%22"&"&tbs=qdr:s900,sbd:1",99,"",0)
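
    These directives can also be tucked behind a small helper (hypothetical name) so the formulas stay readable:

    // Appends Google's tbs time-filter/sort directives to a keyword before
    // it is handed to googleScraper.
    function withTbs(kw, directives) {
      return kw + '&tbs=' + directives;
    }

    Example 2 above then becomes: =googleScraper(withTbs("%22richard nixon%22", "qdr:s900,sbd:1"), 99, "", 0)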

  • Michael

    Some of my results were displaying as fully escaped URLs which did not work when clicked.

    I corrected this by changing the return value of the extractUrl() function on line 63 from this:
    return (url != '') ? url : false;

    to this:
    return (url != '') ? unescape(url) : false;
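
    As a side note, unescape has long been deprecated in JavaScript; decodeURIComponent is the usual modern replacement (though, unlike unescape, it throws on malformed percent-sequences):

    return (url != '') ? decodeURIComponent(url) : false;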

  • Alex

    This is genius :) thanks so much!

  • Johnathon Deere

    That’d be great, how can I reach you?

  • Ehmz

    What does this mean? Exception: Request failed for http://www.google.com/search?q=guest%20post&start=0&num=100 returned code 503. Truncated server response:

    <me… (use muteHttpExceptions option to examine full response)

  • Mike Moeller

    What do I need to change in the importXML function if I want the organic and the local results for a SERP?

  • http://www.intergreater.com/ David Gross

    I absolutely love this tool. Just used it for an advanced query and it saved me so much time. Thanks!!!

  • http://www.ripenecommerce.com David Rekuc

    Before I spend a lot of time trying to figure this out, has anyone tried to do this with Google’s search by image function? I’m imagining using this to automate looking for link reclamation opportunities with images. Pull the list of results for one of your images and match it up against a backlink profile to find uncredited uses of your images.

    Any ideas on how to modify this to pull results for a search for an image?
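
    One untested starting point: Google exposes a searchbyimage endpoint that takes an image_url parameter, so swapping the script’s URL-building line for something like the line below might be worth experimenting with (hypothetical; the endpoint is undocumented and the result markup may not match what the scraper’s parser expects):

    // Hypothetical: query Google's search-by-image endpoint instead of web search
    var url = 'http://www.google.com/searchbyimage?image_url=' + encodeURIComponent(imageUrl) + '&start=' + optStart + '&num=' + optResults;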