SEER Blog

  • http://www.tomanthony.co.uk Tom Anthony

    Awesome post – loads of great stuff here! Will be going through this carefully tomorrow in the Distilled office. :)

  • http://www.paultyler.co.uk/ Paul Tyler

    Such a great list and will be very useful, thanks for sharing this and saving the rest of us so much time!

  • Paul Rogers

    Awesome post – Bookmarked!

    This is more digestible/actionable than a lot of the other xmlimport blog posts I’ve read, making it more suitable for amateurs.

  • http://www.seerinteractive.com Chris Le

    Thanks Paul! I learned how to use XPath functions too. I included a few tricks there so this might even be a good refresher for the seasoned folks too…

  • http://www.seerinteractive.com Chris Le

    You’re very welcome! You guys at Distilled started me down this path — so thank you!

  • http://ryanboots.com Ryan

    Great post. FYI: I built an online builder to help put together these strings quickly at xpathbuilder.com.

    I’ll work on getting these incorporated into the tool over time. :)

  • http://www.seerinteractive.com Chris Le

    xpathbuilder.com is an awesome tool Ryan! Thanks for Sharing!

  • http://www.evolvingseo.com Dan Shure

    Hey Chris

    Yes, this is a great reference (as you know from my tweet). I was just working on something myself and finally posted it here: http://www.evolvingseo.com/2011/09/19/schizophrenic-local-serp-import-xml-screencast/

    I gave a shout to your article. Love to hear what you think!

    -Dan

  • http://www.joydeepdeb.com/ Joydeep

    Great post Chris,I will try all these one at a time.

    One question ImportXML only works with Google Docs and not with MS Excel?

  • http://www.seerinteractive.com Chris Le

    That’s correct. ImportXML function is only available in Google Docs. With Excel, this is the closest I could find so far: http://bit.ly/po4GTH . But it’s not like ImportXML in Google Docs. I think the best option would be to use Google Docs, then save to Excel.

  • Brian Greenberg

    Very useful content… I can’t wait to try all your suggestions.

  • http://www.analyticsseo.com Steve Lock

    Outrageously awesome post!

    The way agile tools in Excel are being harnessed is making me rethink my own learnings with programming languages etc.

    Amazing what you can now do with little to no knowledge.

    :-)

  • http://www.soloeyewear.com craig

    Thank you for this. I know how I am going to use this, but It would be great to see how someone as savvy as yourself is using this info. If you follow up on this, please tweet me.

    -@cdstern

  • http://www.poweredbysearch.com Dev Basu

    This is just brilliant. I’ll be playing with these over the weekend.

  • http://www.tomjepson.co.uk/ Bandertron

    Gotta love ImportXML. Awesome contribution to the community Chris. Thanks for sharing!

  • Sebastian

    Hi Chris,
    great post! thank you for sharing. I had some problems when trying to scrape from blekko I would appreciate any help you could provide me. Cheers

  • Chris Le

    Sure. Would you mind putting your function here or on Twitter (@djchrisle)? That way, the whole community can benefit from any troubleshooting they might also run into.

  • http://laughinglemonpie.com Lacy

    Wow. This is such amazing info (and a little over my head!).

    I’m looking to return search results from websites that don’t have their own search function, but use a google custom search. Is there a way to write the search string query to search just a particular site?

    On google itself, I would type “Keyword site:http://domain.com” to get that. Can I do something similar in the importxml code?

    Thanks in advance for any help!

  • Howard

    Although I was originally quite skeptical of Google Spreadsheets having worked for so long with Excel, I’ve recently begun to understand the incredible power and convenience of the importxml function. Chris, your posts have been an amazingly helpful jumping-off point for those who are new to the tool — thank you very much.

    A site that I regularly work with has undergone some major (somewhat rushed) design changes within the last few weeks. I’ve been using Google Spreadsheets to audit every page to ensure that all links are directing to the appropriate pages (this site has three versions of each page in three different languages – so you can imagine that the task is a little daunting) and I’ve run into a bit of a snag.

    I’ve had no problem pulling the anchor text of every page link with the following formula:

    =importxml($A$1,”//a”) (Where cell A1 contains the domain)

    and I’m able to pull the link for each specific anchor text with the formula

    =importxml($A$1,”//a[contains(.,'About Us')]/@href”) (using the “About Us” page as an example)

    What I’m attempting to do at this point is dynamically replace the anchor text, so I can theoretically place the formula

    =importxml($A$1,”//a”)

    in cell B1 – which will populate column B with all of the anchor text on a given page – and place the formula:

    =importxml($A$1,”//a[contains(.,$B1)]/@href”) in cell C1 – so I can quickly drag this function down and see each link that corresponds to a specific anchor text.

    I’ve found, however, that when you put this function into place, it returns every link on the page, rather than just those that have the text specified in cell B1. I’m hoping that this is simply a syntax issue, and not something that Spreadsheets is incapable of doing. If anyone can offer any advice on possible workarounds, I would be eternally grateful (or at least grateful for a week or so — an eternity is a long time). Thanks so much, and Chris, thanks again for your guidance!

  • http://twitter.com/MarketerGraham Graham Hunter

    I love it! This could totally turn into a great forum or whitepaper.
    I just finished a formula for the top search result on youtube. I figured I would share it…


    =CONCATENATE("http://youtube.com",(index(ImportXML((CONCATENATE("http://youtube.com/results?search_query=",{KEYWORD})), "//ol[@id='search-results']//h3/a/@href"),1)))

    - @MarketerGraham
    http://marketergraham.com

  • http://twitter.com/Archiref Archiref Images

    good ressources, some xpath has changes thought.
    And in google doc, it seemed that it is a semi colon that separate argument and not comma.

  • Fred

    This is really help!

    Would it be possible to extract the “job title” for the linkedin profile. I have a database of 1000s and it would be a lifesaver.

    Much appreciated!

  • Joe Schembri

    =ImportXML{“the-url-here”}, “//p[@class='headline-title title']“)
    For example:
    =ImportXML(“http://www.linkedin.com/in/joeschembri”, “//p[@class='headline-title title']“)

  • http://LawFirmSearchEngine.com/ LawFirmSearchEngine

    Thank you for posting the Alexa recipes, but they don’t seem to be working now. Any guidance you have would be appreciated!

  • Sarfraz

    Hi. How to add custom range for google Serp scraper

  • Siva

    HI,
    Is there anyone tell me how to scrape a linkedin account search result using google Sheet..

    Thanks,

  • Raj

    Wish to extract following data fields from links similar to http://www.tripadvisor.in/Attraction_Review-g255060-d3571665-Reviews-Bailey_s_Sydney_Private_Tours-Sydney_New_South_Wales.html

    Email ID

    Type: Sightseeing Tours

    http://www.tripadvisor.in/Attraction_Review-g255060-d2141993-Reviews-Blue_Sky_Helicopters-Sydney_New_South_Wales.html

    Type: Tours, Adventure

    I am not able to gather both values of “Type” sepatated by comma; similarly email id fields.

    Anyone having idea?