
More ImportXML Cookbooks

5 scrapers for the price of 1 blog post! Here's a new collection of 64 ImportXML formulas and 5 spreadsheets to jump-start your scrape-fest.

I wanted to update my original ImportXML Cookbook post, so I recently asked Twitter for requests (DJ Chris Le... get it?! Haha... ok, nevermind). The majority of requests were for social. No surprise, right? Well, here they all are! Let's get cooking!

Preparations

1. You'll need a new Google Docs Spreadsheet

2. In cell A1, type in the URL you want to scrape.

3. In any other cell, copy and paste the recipe, then press Enter.

Noob tip: Use the example sheets as your starting point.  Pro tip: Use these recipes on any existing Google Doc! No code. Made with 100% pure formulas.
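
For example, if you wanted to run the first Quora recipe from later in this post, a bare-bones sheet would look something like this (B1 is just a suggestion; any cell works):

Cell A1: http://www.quora.com/Search-Engine-Optimization-SEO/followers
Cell B1: =ImportXML(A1, "//a[@class='user']")

Hit Enter and the results spill down the column on their own.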

Quora: Followers of a Topic

Try the Quora starter spreadsheet

First up, by request (@dan_shure): Quora! The best way to get this URL is to find the topic on Quora, click through to its followers, and then copy and paste that URL into A1.

Example URL for cell A1: http://www.quora.com/Search-Engine-Optimization-SEO/followers

User's name =ImportXML(A1, "//a[@class='user']")

URL to user's profile =ImportXML(A1, "//a[@class='user']/@href")

Number of answers =ImportXML(A1, "//a[contains(@href, 'answers')]")

(Pro tip): To get around Google Docs' 50-ImportXML limit, convert cells into values. First, select a bunch of cells, hit copy, and then "paste as values."

This permanently records the results into the cells and frees up some ImportXMLs for you. I usually leave one row untouched so I can use it later to pull more data before converting those cells into values as well.

Quora: User Profiles

Example URL for cell A1: http://www.quora.com/Ian-Lurie

Photo =ImportXML(A1, "//meta[@property='og:image']/@content")

Name =ImportXML(A1, "//meta[@property='og:title']/@content")

Twitter profile =ImportXML(A1, "//a[@class='sn_icon' and contains(@href, 'twitter.com')]/@href")

Facebook profile =ImportXML(A1, "//a[@class='sn_icon' and contains(@href, 'facebook.com')]/@href")

Self description =ImportXML(A1, "//meta[@property='og:description']/@content")

Canonical URL on Quora =ImportXML(A1, "//link[@rel='canonical']/@href")

RSS feed on Quora =ImportXML(A1, "//link[@rel='alternate']/@href")

Topics =ImportXML(A1, "//a[@class='topic_name']//span[@class='name_text']/span")

Number of followers =ImportXML(A1, "//a[@class='link_label' and contains(@href, 'followers')]/span")

Number of topics =ImportXML(A1, "//a[@class='link_label' and contains(@href, 'topics')]/span")

Number of questions =ImportXML(A1, "//a[@class='link_label' and contains(@href, 'questions')]/span")

Number of answers =ImportXML(A1, "//a[@class='link_label' and contains(@href, 'answers')]/span")

Number of edits =ImportXML(A1, "//a[@class='link_label' and contains(@href, 'log')]/span")

Quora: User's Followers

Example URL for cell A1: http://www.quora.com/Ian-Lurie/followers

User's name =ImportXML(A1, "//a[@class='user']")

URL to user's profile =ImportXML(A1, "//a[@class='user']/@href")

Number of answers =ImportXML(A1, "//a[contains(@href, 'answers')]")

Reddit

Reddit scraper starter spreadsheet

The URL for Reddit is best typed in manually instead of copying what you see in your browser.  Replace "importxml" in the URL with your own search term.

Example URL for cell A1: http://www.reddit.com/search?q=importxml&sort=new
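
If you'd rather build that search URL with a formula than type it by hand, here's a quick sketch: put your search term in another cell (say, B1) and this formula in A1 so the recipes below pick it up.

=CONCATENATE("http://www.reddit.com/search?q=", B1, "&sort=new")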

Title =ImportXML(A1, "//p[@class='title']/a")

URL =ImportXML(A1, "//p[@class='title']/a/@href")

Submitted on (as UTC) =ImportXML(A1, "//p[@class='title']/../p[@class='tagline']/time/@title")

Author URL =ImportXML(A1, "//p[@class='title']/../p[@class='tagline']/a[@href][1]/@href")

Posted to =ImportXML(A1, "//p[@class='title']/../p[@class='tagline']/a[@href][2]/@href")

Topsy

Topsy scraper starter spreadsheet

Example URL for cell A1: http://topsy.com/s?q=importxml

Title =ImportXML(A1, "//h3[@class='title']")

URL =ImportXML(A1, "//h3[@class='title']/a/@href")

Tweeted by name =ImportXML(A1, "//div[@class='author-bar']/a")

Tweet =ImportXML(A1, "//span[contains(@class, 'twitter-post-text')]")

Original tweet URL =ImportXML(A1, "//div[@class='actions']/a[contains(@class, 'date-link')]/@href[1]")

Trackback count =ImportXML(A1, "//div[@class='actions']/a[contains(@class, 'trackback-link')]")

Topsy: Twitter Trackback

Stats on Influential people who RT'ed you spreadsheet

This is where the fun really is.  Combine lots of these ImportXMLs to make some really great spreadsheets.  First, find the page you want trackbacks for, copy that URL into A1, and get going!

Example URL for cell A1: http://topsy.com/www.seerinteractive.com/blog/importxml-cookbook
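
If you'd rather build that Topsy URL than paste it by hand, here's a rough sketch, assuming the page you care about sits in, say, cell B1 (judging by the example above, Topsy just wants the URL with the "http://" chopped off):

=CONCATENATE("http://topsy.com/", SUBSTITUTE(B1, "http://", ""))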

Posts count =ImportXML(A1, "//div[@class='retweet-button-box']/span[@class='count']")

Tags list =ImportXML(A1, "//ul[@class='tags-list']/li[@class='tag']")

Related (influential): twitter name =ImportXML(A1, "//span[@class='author-bar']/*[contains(text(), 'Influential')]/../a")

Related (influential): tweet =ImportXML(A1, "//span[@class='author-bar']/*[contains(text(), 'Influential')]/../../span[@class='twitter-post-text']")

Related (influential): tweet date =ImportXML(A1, "//span[@class='author-bar']/*[contains(text(), 'Influential')]/../../../div[@class='actions']/a[contains(@class, 'date-link')]")

Related (influential): tweet URL =ImportXML(A1, "//span[@class='author-bar']/*[contains(text(), 'Influential')]/../../../div[@class='actions']/a[contains(@class, 'date-link')]/@href")

Related posts: twitter name =ImportXML(A1, "//div[@class='twitter-post-big']//a[contains(@class, 'author-link')]")

Related posts: tweet =ImportXML(A1, "//div[@class='twitter-post-big']//span[contains(@class, 'twitter-post-text')]")

Related posts: tweet date =ImportXML(A1, "//div[@class='twitter-post-big']//div[contains(@class, 'actions')]/a[contains(@class, 'date-link')]")

Related posts: tweet URL =ImportXML(A1, "//div[@class='twitter-post-big']//div[contains(@class, 'actions')]/a[contains(@class, 'date-link')]/@href")

(Pro tip): Interested in stalking people for links? (Links to a slide deck by @wilreynolds.)

Combine these with ImportXMLs from Twitter to devastating effect. Use Google Docs to automate your hit list. You can be creepy and efficient!
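
For example, once a Twitter username from one of the Topsy recipes lands in, say, cell B2, a quick sketch for turning it into a profile URL is:

=CONCATENATE("http://twitter.com/", B2)

Point the Twitter recipes further down at that URL and you've got bios, follower counts, and websites for everyone who tweeted your post.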

LinkedIn: User's profile

Here are some updates to the original LinkedIn ImportXML formulas from the first ImportXML Cookbook.

Example URL for cell A1: http://www.linkedin.com/in/wilreynolds

Current =ImportXML(A1, "//dd[@class='summary-current']//ul[@class='current']/*")

Number of connections =ImportXML(A1, "//dd[@class='overview-connections']//strong")

Profile summary =ImportXML(A1, "//p[@class=' description summary']")

Twitter

Here, again, are some more updates and additions to the original formulas from the first ImportXML Cookbook.

Example URL for cell A1: http://www.twitter.com/djchrisle

Bio =ImportXML(A1, "//span[@class='bio']")

Location =ImportXML(A1, "//span[@class='adr']")

Website =ImportXML(A1, "//a[@class='url' and @rel='me nofollow']/@href")

Following =ImportXML(A1, "//span[@id='following_count']")

Following =ImportXML(A1, "//span[@id='follower_count']")

Lists count =ImportXML(A1, "//span[@id='lists_count']")

Number of Tweets =ImportXML(A1, "//span[@id='update_count']")

Randomly selected followed names =ImportXML(A1, "//div[@id='following_list']/span/a/@title")

Randomly selected followed URLs (add http://twitter.com; see the sketch after this list) =ImportXML(A1, "//div[@id='following_list']/span/a/@href")

RSS Feed of tweets =ImportXML(A1, "//div[@id='rssfeed']/*[contains(@href, 'statuses')]/@href")

RSS Feed of favorites =ImportXML(A1, "//div[@id='rssfeed']/*[contains(@href, 'favorites')]/@href")

Tweets (recent) =ImportXML(A1, "//span[@class='status-body']//span[@class='status-content']")

Tweets (timestamp) =ImportXML(A1, "//span[@class='status-body']//span[@class='published timestamp']/@data")

Users recently mentioned in tweets =ImportXML(A1, "//span[@class='entry-content']/a[@class='tweet-url username']")
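
As promised above, here's a rough sketch for tacking http://twitter.com onto those followed-URL paths in one go, assuming the recipe works for you as written (ARRAYFORMULA just applies the concatenation to every result ImportXML returns):

=ARRAYFORMULA("http://twitter.com" & ImportXML(A1, "//div[@id='following_list']/span/a/@href"))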

Google Insights

Google Insights scraper starter spreadsheet (say that 3 times fast)

And the last song for tonight goes out to @dsottimano and @dohertyjf: Google Insights!

Example URL for cell A1: http://www.google.com/insights/search/overviewReport?q=seo%20company&content=1

Headline Trends =ImportXML(A1, "//td[@class='trends-headline-headline']/a")

Headline Trends URLs =ImportXML(A1, "//td[@class='trends-headline-headline']/a/@href")

Rising search terms =ImportXML(A1, "//div[@id='overviewRisingSearchesDiv']/table/tr[2]//table[@class='trends-table-data']/tr/td[2]/a[1]")

Rising search percent =ImportXML(A1, "//div[@id='overviewRisingSearchesDiv']/table/tr[2]//table[@class='trends-table-data']/tr/td[3]")

Top search terms =ImportXML(A1, "//div[@id='overviewTopSearchesDiv']/table/tr[2]//table[@class='trends-table-data']/tr/td[2]/a[1]")

Trivia: The trick to finding this one is that the content only loads when you specify the query string "content=1". By turning on "Log XMLHttpRequests" in Chrome's developer tools console, you can watch JavaScript make its AJAX calls and start figuring out the right URLs to request.
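
If you'd rather generate that overview-report URL from a keyword cell than hand-edit it, here's a rough sketch (keyword in, say, cell B1, spaces encoded as %20, and the all-important content=1 tacked on the end):

=CONCATENATE("http://www.google.com/insights/search/overviewReport?q=", SUBSTITUTE(B1, " ", "%20"), "&content=1")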

OMG! Epic! Here's more Google Docs insanity from SEER:

Visualize Your Backlinks with Google Fusion Tables (May 2012)
Google Search Scraper (February 2012)
SEOmoz Data for Google Docs (February 2012)
(Not Provided) Report in Google Docs (November 2011)
ImportXML Cookbook 1 (September 2011)

Want more?

(Shameless plug) Talk to me on Twitter @djchrisle. I take requests and birthday shout-outs :-D
