
Screaming Frog Guide to Doing Almost Anything: 55+ Ways of Looking at a Tool

So, I admit it: I love technical SEO audits. Some of you may cringe at the thought of combing through a site for potential architecture issues, but it’s one of my favorite activities—an SEO treasure hunt, if you will.

For normal people, the overall site audit process can be daunting and time-consuming, but with tools like the Screaming Frog SEO Spider, the task can be made easier for newbs and pros alike. With a very user-friendly interface, Screaming Frog can be a breeze to work with, but the breadth of configuration options and functionality can make it hard to know where to begin.

With that in mind, I put together this comprehensive guide to Screaming Frog to showcase the various ways that SEO, PPC and other marketing folks can use the tool for site audits, keyword research, competitive analysis, link building and more!

To get started, simply select what it is that you are looking to do:

Basic Crawling

Internal Links

Site Content

Meta Data and Directives

Sitemap

General Troubleshooting

PPC & Analytics

Scraping

URL Rewriting

Keyword Research

Link Building

Bonus Round

Basic Crawling

How to crawl an entire site

By default, Screaming Frog only crawls the subdomain that you enter. Any additional subdomains that the spider encounters will be viewed as external links. In order to crawl additional subdomains, you must change the settings in the Spider Configuration menu. By checking ‘Crawl All Subdomains’, you will ensure that the spider crawls any links that it encounters to other subdomains on your site.

Step 1:

Step 2:

To make your crawl go faster, don’t check images, CSS, JavaScript, SWF, or external links.

How to crawl a single subdirectory

If you wish to limit your crawl to a single folder, simply enter the URL and press 'Start' without changing any of the default settings. If you've overwritten the original default settings, reset the default configuration within the 'File' menu.

If you wish to start your crawl in a specific folder, but want to continue crawling to the rest of the subdomain, be sure to select ‘Crawl Outside Of Start Folder’ in the Spider Configuration settings before entering your specific starting URL.

How to crawl a specific set of subdomains or subdirectories

If you wish to limit your crawl to a specific set of subdomains or subdirectories, you can use RegEx to set those rules in the Include or Exclude settings in the Configuration menu.

Exclusion:

In this example, we crawled every page on havaianas.com excluding the ‘about’ pages on every subdomain.

Step 1:

Step 2:
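For reference, the exclude rule is a regular expression matched against the full URL. A hypothetical pattern like the one below would exclude any URL containing an /about/ folder on any subdomain (adjust it to the site's actual URL structure):

.*/about/.*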

Inclusion:

In the example below, we only wanted to crawl the English-language subdomains on havaianas.com.
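The include rule works the same way: a regular expression matched against the full URL. A hypothetical pattern for limiting the crawl to a couple of English-language subdomains might look like the following (the subdomain names here are an assumption):

http://(us|www)\.havaianas\.com/.*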

I want a list of all of the pages on my site

By default, Screaming Frog is set to crawl all images, JavaScript, CSS and flash files that the spider encounters. To crawl HTML only, you’ll have to deselect ‘Check Images’, ‘Check CSS’, ‘Check JavaScript’ and ‘Check SWF’ in the Spider Configuration menu. Running the spider with these settings unchecked will, in effect, provide you with a list of all of the pages on your site that have internal links pointing to them. Once the crawl is finished, go to the ‘Internal’ tab and filter your results by ‘HTML’. Click ‘Export’, and you’ll have the full list in CSV format.

PRO Tip:

If you tend to use the same settings for each crawl, Screaming Frog now allows you to save your configuration settings:

I want a list of all of the pages in a specific subdirectory

In addition to de-selecting ‘Check Images’, ‘Check CSS’, ‘Check JavaScript’ and ‘Check SWF’, you’ll also want to de-select ‘Check Links Outside Folder’ in the Spider Configuration settings. Running the spider with these settings unchecked will, in effect, give you a list of all of the pages in your starting folder (as long as they are not orphaned pages).

How to find a list of domains that my client is currently redirecting to their money site

Enter the money site URL into ReverseInternet, then click the links in the top table to find sites that share the same IP address, nameservers, or GA code.

From here, you can gather your list of URLs using the Google Chrome extension Scraper to find all of the links with the anchor text ‘visit site.’ If Scraper is already installed, you can access it by right-clicking anywhere on the page and selecting ‘Scrape similar…’. In the pop-up window, you’ll need to change your XPath query to:

//a[text()='visit site']/@href

Next, press ‘Scrape’ and then ‘Export to Google Docs.’ From the Google Doc, you can then download the list as a .csv file.

Upload the .csv file to Screaming Frog, then use ‘List’ mode to check the list of URLs.

When the spider is finished, you’ll see the status codes in the ‘Internal’ tab, or you can look in the ‘Response Codes’ tab and filter by ‘Redirection’ to view all of the domains that are being redirected to your money site or elsewhere.

NB: When uploading the .csv into Screaming Frog, you must select 'CSV' as the filetype, otherwise the program will close with an error.

PRO Tip:

You can also use this method to identify domains that your competitors own, and how they are being used. Check out what else you can learn about competitor sites below.

How to find all of the subdomains on a site and verify internal links

Enter the root domain URL into ReverseInternet, then click on the ‘Subdomains’ tab to view a list of subdomains.

Then, use Scrape Similar to gather the list of URLs, using the XPath query:

//a[text()='visit site']/@href

Export your results into a CSV, then load the CSV into Screaming Frog using ‘List’ mode. Once the spider has finished running, you’ll be able to see status codes, as well as any links on the subdomain homepages, anchor text and duplicate page titles among other things.

How to crawl an e-commerce site or other large site

Screaming Frog is not built to crawl hundreds of thousands of pages, but there are a couple of things that you can do to avoid breaking the program when crawling large sites. First, you can increase the memory allocation of the spider. Second, you can break down the crawl by subdirectory or only crawl certain parts of the site using your Include/Exclude settings. Third, you can choose not to crawl images, JavaScript, CSS and flash. By deselecting these options in the Configuration menu, you can save memory by crawling HTML only.

PRO Tip:

Until recently, you might have found that your crawls timed out on large sites; however, as of Screaming Frog version 2.11, you can tell the program to pause on high memory usage. This fail-safe setting helps to keep the program from crashing before you have the opportunity to save the data or increase the memory allocation. This is currently a default setting, but if you are planning on crawling a large site, be sure that 'Pause On High Memory Usage' is checked in the 'Advanced' tab of the Spider Configuration menu.

How to crawl a site hosted on an older server

In some cases, older servers may not be able to handle the default number of URL requests per second. To change your crawl speed, choose ‘Speed’ in the Configuration menu, and in the pop-up window, select the maximum number of threads that should run concurrently. From this menu, you can also choose the maximum number of URLs requested per second.

PRO Tip:

If you find that your crawl is resulting in a lot of server errors, go to the 'Advanced' tab in the Spider Configuration menu and increase the 'Response Timeout' value and the number of '5xx Response Retries' to get better results.

How to crawl a site that requires cookies

Although search bots don’t accept cookies, if you are crawling a site and need to allow cookies, simply select ‘Allow Cookies’ in the ‘Advanced’ tab of the Spider Configuration menu.

How to crawl using a proxy or a different user-agent

To crawl using a proxy, select ‘Proxy’ in the ‘Configuration’ menu, and enter your proxy information.

To crawl using a different user agent, select 'User Agent' in the 'Configuration' menu, then select a search bot from the drop-down or type in your desired user-agent string.

How to crawl pages that require authentication

When the Screaming Frog spider comes across a page that is password-protected, a pop-up box will appear, in which you can enter the required username and password.

In order to turn off authentication requests, deselect ‘Request Authentication’ in the ‘Advanced’ tab of the Spider Configuration menu.

Internal Links

I want information about all of the internal and external links on my site (anchor text, directives, links per page etc.)

If you do not need to check the images, JavaScript, flash or CSS on the site, de-select these options in the Spider Configuration menu to save processing time and memory.

Once the spider has finished crawling, use the Advanced Export menu to export a CSV of ‘All Links’. This will provide you with all of the link locations, as well as the corresponding anchor text, directives, etc.

For a quick tally of the number of links on each page, go to the 'Internal' tab and sort by 'Outlinks'. Anything over 100 might need to be reviewed.

Need something a little more processed? Check out this tutorial on visualizing internal link data with pivot tables by @JoshuaTitsworth and this one about using NodeXL with Screaming Frog to visualize your internal link graph by @aleyda.

How to find broken internal links on a page or site

If you do not need to check the images, JavaScript, flash or CSS of the site, de-select these options in the Spider Configuration menu to save processing time and memory.

Once the spider has finished crawling, sort the 'Internal' tab results by 'Status Code'. Any 404s, 301s or other status codes will be easily viewable.

Upon clicking on any individual URL in the crawl results, you’ll see information change in the bottom window of the program. By clicking on the ‘In Links’ tab in the bottom window, you’ll find a list of pages that are linking to the selected URL, as well as anchor text and directives used on those links. You can use this feature to identify pages where internal links need to be updated.

To export the full list of pages that include broken or redirected links, choose ‘Redirection (3xx) In Links’ or ‘Client Error (4xx) In Links’ or ‘Server Error (5xx) In Links’ in the ‘Advanced Export’ menu, and you’ll get a CSV export of the data.

How to find broken outbound links on a page or site (or all outbound links in general)

After de-selecting ‘Check Images’, ‘Check CSS’, ‘Check JavaScript’ and ‘Check SWF’ in the Spider Configuration settings, make sure that ‘Check External Links’ remains selected.

After the spider is finished crawling, click on the ‘External’ tab in the top window, sort by ‘Status Code’ and you’ll easily be able to find URLs with status codes other than 200. Upon clicking on any individual URL in the crawl results and then clicking on the ‘In Links’ tab in the bottom window, you’ll find a list of pages that are pointing to the selected URL. You can use this feature to identify pages where outbound links need to be updated.

To export your full list of outbound links, click 'Export' in the 'External' tab. You can also set the filter to export links to external image files, external JavaScript, external CSS, external Flash files, and external PDFs. To limit your export to pages, filter by 'HTML'.

For a complete listing of all the locations and anchor text of outbound links, select ‘All Out Links’ in the ‘Advanced Export’ menu, then filter the ‘Destination’ column in the exported CSV to exclude your domain.

How to find links that are being redirected

After the spider has finished crawling, select the ‘Response Codes’ tab in the top window, then filter by ‘Redirection (3xx)’. This will provide you with a list of any internal links and outbound links that are redirecting. Sort by ‘Status Code’, and you’ll be able to break the results down by type. Click on the ‘In Links’ tab in the bottom window to view all of the pages where the redirecting link is used.

If you export directly from this tab, you will only see the data that is shown in the top window (original URL, status code, and where it redirects to).

To export the full list of pages that include redirected links, you will have to choose ‘Redirection (3xx) In Links’ in the ‘Advanced Export’ menu. This will return a CSV that includes the location of all your redirected links. To show internal redirects only, filter the ‘Destination’ column in the CSV to include only your domain.

PRO Tip:

Use a VLOOKUP between the 2 export files above to match the Source and Destination columns with the final URL location.

Sample formula:

=VLOOKUP([@Destination],'response_codes_redirection_(3xx).csv'!$A$3:$F$50,6,FALSE)

(Where ‘response_codes_redirection_(3xx).csv’ is the CSV file that contains the redirect URLs and ‘50’ is the number of rows in that file.)

Need to find and fix redirect chains? @dan_shure gives the breakdown on how to do it here.

I am looking for internal linking opportunities

Scaling Internal Link Building with Screaming Frog & Majestic by @JHTScherck. ‘Nuff said.

Site Content

How to identify pages with thin content

After the spider has finished crawling, go to the ‘Internal’ tab, filter by HTML, then scroll to the right to the ‘Word Count’ column. Sort the ‘Word Count’ column from low to high to find pages with low text content. You can drag and drop the ‘Word Count’ column to the left to better match the low word count values to the appropriate URLs. Click ‘Export’ in the ‘Internal’ tab if you prefer to manipulate the data in a CSV instead.

PRO Tip for E-commerce Sites:

While the word count method above will quantify the actual text on the page, there's still no way to tell if the text found is just product names or if the text is in a keyword-optimized copy block. To figure out the word count of your text blocks, use ImportXML2 by @iamchrisle to scrape the text blocks on any list of pages, then count the characters from there. If XPath queries aren't your strong suit, the XPath Helper Chrome extension does a pretty solid job of figuring out the XPath for you. Obviously, you can also use these scraped text blocks to begin to understand the overall word usage on the site in question, but that, my friends, is another post…
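As a rough illustration, an XPath query for pulling a product-description block might look like the one below (the class name is hypothetical and would need to match the site's actual markup):

//div[@class='product-description']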

I want a list of the image links on a particular page

If you've already crawled a whole site or subfolder, simply select the page in the top window, then click on the 'Image Info' tab in the bottom window to view all of the images that were found on that page. The images will be listed in the 'To' column.

PRO Tip:

Right click on any entry in the bottom window to copy or open a URL.

Alternately, you can also view the images on a single page by crawling just that URL. Make sure that your crawl depth is set to ‘1’ in the Spider Configuration settings, then once the page is crawled, click on the ‘Images’ tab, and you’ll see any images that the spider found.

Finally, if you prefer a CSV, use the ‘Advanced Export’ menu to export ‘All Image Alt Text’ to see the full list of images, where they are located and any associated alt text.

How to find images that are missing alt text or images that have lengthy alt text

First, you’ll want to make sure that ‘Check Images’ is selected in the Spider Configuration menu. After the spider has finished crawling, go to the ‘Images’ tab and filter by ‘Missing Alt Text’ or ‘Alt Text Over 100 Characters’. You can find the pages where any image is located by clicking on the ‘Image Info’ tab in the bottom window. The pages will be listed in the ‘From’ column.

Alternately, in the 'Advanced Export' menu, you can save time and export 'All Image Alt Text' or 'Images Missing Alt Text' into a CSV. The resulting file will show you all of the pages where each image is used on the site.

How to find every CSS file on my site

In the Spider Configuration menu, select ‘Check CSS’ before crawling, then when the crawl is finished, filter the results in the ‘Internal’ tab by ‘CSS’.

How to find every JavaScript file on my site

In the Spider Configuration menu, select ‘Check JavaScript’ before crawling, then when the crawl is finished, filter the results in the ‘Internal’ tab by ‘JavaScript’.

How to identify all of the jQuery plugins used on the site and what pages they are being used on

First, make sure that ‘Check JavaScript’ is selected in the Spider Configuration menu. After the spider has finished crawling, filter the ‘Internal’ tab by ‘JavaScript’, then search for ‘jquery’. This will provide you with a list of plugin files. Sort the list by the ‘Address’ for easier viewing if needed, then view ‘InLinks’ in the bottom window or export the data into a CSV to find the pages where the file is used. These will be in the ‘From’ column.

Alternately, you can use the ‘Advanced Export’ menu to export a CSV of ‘All Links’ and filter the ‘Destination’ column to show only URLs with ‘jquery’.

PRO Tip:

Not all jQuery plugins are bad for SEO. If you see that a site uses jQuery, the best practice is to make sure that the content that you want indexed is included in the page source and is served when the page is loaded, not afterward. If you are still unsure, Google the plugin for more information on how it works.

How to find where flash is embedded on-site

In the Spider Configuration menu, select ‘Check SWF’ before crawling, then when the crawl is finished, filter the results in the ‘Internal’ tab by ‘Flash’.

NB: This method will only find .SWF files that are linked on a page. If the flash is pulled in through JavaScript, you’ll need to use a custom filter.

How to find any internal PDFs that are linked on-site

After the spider has finished crawling, filter the results in the ‘Internal’ tab by ‘PDF’.

How to understand content segmentation within a site or group of pages

If you want to find pages on your site that contain a specific type of content, set a custom filter for an HTML footprint that is unique to that page. This needs to be set *before* running the spider. @stephpchang has a great tutorial on segmenting syndicated content from original content using custom filters.

How to find pages that have social sharing buttons

To find pages that contain social sharing buttons, you’ll need to set a custom filter before running the spider. To set a custom filter, go into the Configuration menu and click ‘Custom’. From there, enter any snippet of code from the page source.

In the example above, I wanted to find pages that contain a Facebook ‘like’ button, so I created a filter for http://www.facebook.com/plugins/like.php.

How to find pages that are using iframes

To find pages that use iframes, set a custom filter for <iframe before running the spider.

How to find pages that contain embedded video or audio content

To find pages that contain embedded video or audio content, set a custom filter for a snippet of the embed code for YouTube, or any other media player that is used on the site.
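For example, if the site uses the standard YouTube iframe embed, a custom filter containing the snippet below should catch those pages (assuming the embed code hasn't been customized):

youtube.com/embed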

Meta Data and Directives

How to identify pages with lengthy page titles, meta descriptions, or URLs

After the spider has finished crawling, go to the ‘Page Titles’ tab and filter by ‘Over 70 Characters’ to see the page titles that are too long. You can do the same in the ‘Meta Description’ tab or in the ‘URI’ tab.

How to find duplicate page titles, meta descriptions, or URLs

After the spider has finished crawling, go to the ‘Page Titles’ tab, then filter by ‘Duplicate’. You can do the same thing in the ‘Meta Description’ or ‘URI’ tabs.

How to find duplicate content and/or URLs that need to be rewritten/redirected/canonicalized

After the spider has finished crawling, go to the ‘URI’ tab, then filter by ‘Underscores’, ‘Uppercase’ or ‘Non ASCII Characters’ to view URLs that could potentially be rewritten to a more standard structure. Filter by ‘Duplicate’ and you’ll see all pages that have multiple URL versions. Filter by ‘Dynamic’ and you’ll see URLs that include parameters.

Additionally, if you go to the 'Internal' tab, filter by 'HTML' and scroll to the 'Hash' column on the far right, you'll see a unique series of letters and numbers for every page. If you click 'Export', you can use conditional formatting in Excel to highlight the duplicated values in this column, ultimately showing you pages that are identical and need to be addressed.
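If you'd rather flag the duplicates with a formula, a conditional formatting rule like the one below will highlight any hash value that appears more than once (this assumes the hash values sit in column A of your export; adjust the range to fit your file):

=COUNTIF($A$2:$A$10000,A2)>1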

How to identify all of the pages that include meta directives e.g.: nofollow/noindex/noodp/canonical etc.

After the spider has finished crawling, click on the ‘Directives’ tab. To see the type of directive, simply scroll to the right to see which columns are filled, or use the filter to find any of the following tags:

  • index
  • noindex
  • follow
  • nofollow
  • noarchive
  • nosnippet
  • noodp
  • noydir
  • noimageindex
  • notranslate
  • unavailable_after
  • refresh
  • canonical

How to verify that my robots.txt file is functioning as desired

By default, Screaming Frog will comply with robots.txt. As a priority, it will follow directives made specifically for the Screaming Frog user agent. If there are no directives specifically for the Screaming Frog user agent, then the spider will follow any directives for Googlebot, and if there are no specific directives for Googlebot, the spider will follow global directives for all user agents. The spider will only follow one set of directives, so if there are rules set specifically for Screaming Frog it will only follow those rules, and not the rules for Googlebot or any global rules. If you wish to block certain parts of the site from the spider, use the regular robots.txt syntax with the user agent ‘Screaming Frog SEO Spider’. If you wish to ignore robots.txt, simply select that option in the Spider Configuration settings.
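For example, a hypothetical robots.txt block like the one below would keep the Screaming Frog spider out of a /private/ folder without affecting any other user agents:

User-agent: Screaming Frog SEO Spider
Disallow: /private/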

How to find or verify Schema markup or other microdata on my site

To find every page that contains Schema markup or any other microdata, you need to use custom filters. Simply click on ‘Custom’ in the Configuration Menu and enter the footprint that you are looking for.

To find every page that contains Schema markup, simply add the following snippet of code to a custom filter: itemtype=http://schema.org

To find a specific type of markup, you'll have to be more specific. For example, using a custom filter for <span itemprop="ratingValue"> will get you all of the pages that contain Schema markup for ratings.

You can enter up to 5 different filters per crawl. Finally, press OK and proceed with crawling the site or list of pages.

When the spider has finished crawling, select the ‘Custom’ tab in the top window to view all of the pages that contain your footprint. If you entered more than one custom filter, you can view each one by changing the filter on the results.

Sitemap

How to create an XML Sitemap

After the spider has finished crawling your site, click on the ‘Advanced Export’ menu and select ‘XML Sitemap’.

Save your sitemap, then open it with Excel. Select ‘Read Only’ and open the file ‘As an XML table’. You may receive an alert that certain schema cannot be mapped to a worksheet. Just press ‘Yes’.

Now that your Sitemap is in table form, you can easily edit the change frequency, priority and other values. Be sure to double-check that the Sitemap only includes a single, preferred (canonical) version of each URL, without parameters or other duplicating factors. Once any changes have been made, re-save your file as an XML file.

How to check my existing XML Sitemap

First, you’ll need to have a copy of the Sitemap saved on your computer. You can save any live Sitemap by visiting the URL and saving the file, or by importing it into Excel. @RichardBaxter actually has great instructions for importing your Sitemap into Excel and checking it using SEOTools, but since we are talking about Screaming Frog, read on:

Once you have the XML file saved to your computer, go to the ‘Mode’ menu in Screaming Frog and select ‘List’. Then, click on ‘Select File’ at the top of the screen, choose your file and start the crawl. Once the spider has finished crawling, you’ll be able to find any redirects, 404 errors, duplicated URLs and more “Sitemap dirt” in the ‘Internal’ tab.

General Troubleshooting

How to identify why certain sections of my site aren’t being indexed or aren’t ranking

Wondering why certain pages aren't being indexed? First, make sure that they weren't accidentally blocked in robots.txt or tagged as noindex. Next, you'll want to make sure that spiders can reach the pages by checking your internal links. Once the spider has crawled your site, simply export the list of internal URLs as a .CSV file, using the 'HTML' filter in the 'Internal' tab.

Open up the CSV file, and in a second sheet, paste the list of URLs that aren’t being indexed or aren’t ranking well. Use a VLOOKUP to see if the URLs in your list on the second sheet were found in the crawl.
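A minimal version of that lookup, assuming the crawled URLs are in column A of a sheet named 'crawl' (both the sheet name and column are illustrative), would be:

=VLOOKUP(A2,crawl!$A:$A,1,FALSE)

Any row that returns #N/A is a URL that the spider never found, which usually points to an internal linking or orphaned-page problem.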

PRO tip:

If you really want to be fancy, try using my Pages Not Indexed Google Doc/Excel tool, which, in a couple of minutes, can provide you with the possible reasons why particular pages aren’t indexed or ranking.

How to check if my site migration/redesign was successful

@ipullrank has an excellent Whiteboard Friday on this topic, but the general idea is that you can use Screaming Frog to check whether or not old URLs are being redirected by using the 'List' mode to check status codes. If the old URLs are throwing 404s, then you'll know which URLs still need to be redirected.

How to find slow loading pages on my site

After the spider has finished crawling, go to the ‘Response Codes’ tab and sort by the ‘Response Time’ column from high to low to find pages that may be suffering from a slow loading speed.

How to find malware or spam on my site

First, you’ll need to identify the footprint of the malware or the spam. Next, in the Configuration menu, click on ‘Custom’ and enter the footprint that you are looking for.

You can enter up to 5 different footprints per crawl. Finally, press OK and proceed with crawling the site or list of pages.

When the spider has finished crawling, select the ‘Custom’ tab in the top window to view all of the pages that contain your footprint. If you entered more than one custom filter, you can view each one by changing the filter on the results.

PPC & Analytics

How to verify that my Google Analytics code is on every page, or on a specific set of pages on my site

SEER Analytics star @RachaelGerson wrote a killer post on this subject: Use Screaming Frog to Verify Google Analytics Code. Check it out!
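The short version: set a custom filter to 'Does Not Contain' with a snippet of your tracking code, and any pages that show up under that filter are missing your Analytics snippet. A hypothetical example (swap in your own tracking ID) would be a filter for:

UA-1234567-1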

How to validate a list of PPC URLs in bulk

Save your list in .txt or .csv format, then change your ‘Mode’ settings to ‘List’.

Next, select your file to upload, and press ‘Start’. See the status code of each page by looking at the ‘Internal’ tab.

To check if your pages contain your GA code, check out this post on using custom filters to verify Google Analytics code by @RachaelGerson.

Scraping

How to scrape the meta data for a list of pages

So, you’ve harvested a bunch of URLs, but you need more information about them? Set your mode to ‘List’, then upload your list of URLs in .txt or .csv format. After the spider is done, you’ll be able to see status codes, outbound links, word counts, and of course, meta data for each page in your list.

How to scrape a site for all of the pages that contain a specific footprint

First, you’ll need to identify the footprint. Next, in the Configuration menu, click on ‘Custom’ and enter the footprint that you are looking for.

You can enter up to 5 different footprints per crawl. Finally, press OK and proceed with crawling the site or list of pages. In the example below, I wanted to find all of the pages that say ‘Please Call’ in the pricing section, so I found and copied the HTML code from the page source.
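The screenshot isn't reproduced here, but the filter is simply the relevant chunk of source code; a hypothetical footprint might look like:

<span class="price">Please Call</span>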

When the spider has finished crawling, select the ‘Custom’ tab in the top window to view all of the pages that contain your footprint. If you entered more than one custom filter, you can view each one by changing the filter on the results.

PRO Tip:

If you are pulling product data from a client site, you could save yourself some time by asking the client to pull the data directly from their database. The method above is meant for sites that you don’t have direct access to.

URL Rewriting

How to find and remove session id or other parameters from my crawled URLs

To identify URLs with session ids or other parameters, simply crawl your site with the default settings. When the spider is finished, click on the ‘URI’ tab and filter to ‘Dynamic’ to view all of the URLs that include parameters.

To remove parameters from being shown for the URLs that you crawl, select ‘URL Rewriting’ in the configuration menu, then in the ‘Remove Parameters’ tab, click ‘Add’ to add any parameters that you want removed from the URLs, and press ‘OK.’ You’ll have to run the spider again with these settings in order for the rewriting to occur.

How to rewrite the crawled URLs (e.g: replace .com with .co.uk, or write all URLs in lowercase)

To rewrite any URL that you crawl, select ‘URL Rewriting’ in the Configuration menu, then in the ‘Regex Replace’ tab, click ‘Add’ to add the RegEx for what you want to replace.

Once you’ve added all of the desired rules, you can test your rules in the ‘Test’ tab by entering a test URL in the space labeled ‘URL before rewriting’. The ‘URL after rewriting’ will be updated automatically according to your rules.
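To take the example from the heading above, replacing .com with .co.uk would use a regular expression of \.com with a replacement value of .co.uk (the dot is escaped so that it isn't treated as a wildcard; a stricter pattern may be needed if '.com' can appear elsewhere in your URLs).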

If you wish to set a rule that all URLs are returned in lowercase, simply select ‘Lowercase discovered URLs’ in the ‘Options’ tab. This will remove any duplication by capitalized URLs in the crawl.

Remember that you’ll have to actually run the spider with these settings in order for the URL rewriting to occur.

Keyword Research

How to know which pages my competitors value most

Generally speaking, competitors will try to spread link popularity and drive traffic to their most valuable pages by linking to them internally. Any SEO-minded competitor will probably also link to important pages from their company blog. Find your competitor’s prized pages by crawling their site, then sorting the ‘Internal’ tab by the ‘Inlinks’ column from highest to lowest, to see which pages have the most internal links.

To view pages linked from your competitor’s blog, deselect ‘Check links outside folder’ in the Spider Configuration menu and crawl the blog folder/subdomain. Then, in the ‘External’ tab, filter your results using a search for the URL of the main domain. Scroll to the far right and sort the list by the ‘Inlinks’ column to see which pages are linked most often.

PRO Tip:

Drag and drop columns to the left or right to improve your view of the data.

How to know what anchor text my competitors are using for internal linking

In the ‘Advanced Export’ menu, select ‘All Anchor Text’ to export a CSV containing all of the anchor text on the site, where it is used and what it’s linked to.

How to know which meta keywords (if any) my competitors have added to their pages

After the spider has finished running, look at the ‘Meta Keywords’ tab to see any meta keywords found for each page. Sort by the ‘Meta Keyword 1’ column to alphabetize the list and visually separate the blank entries, or simply export the whole list.

Link Building

How to analyze a list of prospective link locations

If you’ve scraped or otherwise come up with a list of URLs that needs to be vetted, you can upload and crawl them in ‘List’ mode to gather more information about the pages. When the spider is finished crawling, check for status codes in the ‘Response Codes’ tab, and review outbound links, link types, anchor text and nofollow directives in the ‘Out Links’ tab in the bottom window. This will give you an idea of what kinds of sites those pages link to and how. To review the ‘Out Links’ tab, be sure that your URL of interest is selected in the top window.

Of course you’ll want to use a custom filter to determine whether or not those pages are linking to you already.

You can also export the full list of out links by clicking on ‘All Out Links’ in the ‘Advanced Export Menu’. This will not only provide you with the links going to external sites, but it will also show all internal links on the individual pages in your list.

For more great ideas for link building, check out these two awesome posts on link reclamation and using Link Prospector with Screaming Frog by SEER’s own @EthanLyon and @JHTScherck.

How to find broken links for outreach opportunities

So, you found a site that you would like a link from? Use Screaming Frog to find broken links on the desired page or on the site as a whole, then contact the site owner, suggesting your site as a replacement for the broken link where applicable, or just offer the broken link as a token of good will.

How to verify my backlinks and view the anchor text

Upload your list of backlinks and run the spider in ‘List’ mode. Then, export the full list of outbound links by clicking on ‘All Out Links’ in the ‘Advanced Export Menu’. This will provide you with the URLs and anchor text/alt text for all links on those pages. You can then use a filter on the ‘Destination’ column of the CSV to determine if your site is linked and what anchor text/alt text is included.

@JustinRBriggs has a nice tidbit on checking infographic backlinks with Screaming Frog. Check out the other 17 link building tools that he mentioned, too.

How to make sure that I’m not part of a link network

Want to figure out if a group of sites are linking to each other? Check out this tutorial on visualizing link networks using Screaming Frog and Fusion Tables by @EthanLyon.

I am in the process of cleaning up my backlinks and need to verify that links are being removed as requested

Set a custom filter that contains your root domain URL, then upload your list of backlinks and run the spider in ‘List’ mode. When the spider has finished crawling, select the ‘Custom’ tab to view all of the pages that are still linking to you.

Bonus Round


Did you know that by right-clicking on any URL in the top window of your results, you could do any of the following?

  • Copy or open the URL
  • Re-crawl the URL or remove it from your crawl
  • Export URL Info, In Links, Out Links, or Image Info for that page
  • Check indexation of the page in Google, Bing and Yahoo
  • Check backlinks of the page in Majestic, OSE, Ahrefs and Blekko
  • Look at the cached version/cache date of the page
  • See older versions of the page
  • Validate the HTML of the page
  • Open robots.txt for the domain where the page is located
  • Search for other domains on the same IP

Likewise, in the bottom window, with a right-click, you can:

  • Copy or open the URL in the 'To' or 'From' column for the selected row

Tell us what else you’ve discovered!

Final Remarks

In closing, I hope that this guide gives you a better idea of what Screaming Frog can do for you. It has saved me countless hours, so I hope that it helps you, too!

By the way, I am not affiliated with Screaming Frog; I just think that it’s an awesome tool.

More about me:

Aichlee Bushnell is an SEO Associate at SEER Interactive. Follow her on Twitter!
Check out our open positions!