At Pubcon Vegas, Matt Cutts announced the link disavow tool. Some say to use it & have proof it works; others say you’re providing the enemy with information & doing the grunt work for them. It seems like a last resort for many sites, but I wanted to look at the disavow tool from a different angle: as just another way to significantly hurt your site.
With the disavow tool in mind, I’ll touch on some of the top ways a site can effectively be killed off, with ease or by accident.
INCORRECT USE OF THE CANONICAL
At one point, SEER relaunched our blog, and during the relaunch we made a mistake with the canonical tag. The SEER blog has over 500 posts, seven to a page, which makes for over 70 “older post” (paginated) pages. When we relaunched the blog, every paginated page, from page/2, page/3 and page/4 all the way to page/70, had a canonical pointing back to just /blog.
This, combined with the relaunch & new URL structure, effectively started deindexing any post that wasn’t on the blog homepage. Yikes. Luckily this was caught within the first few days of the relaunch and all is well now.
Summary: If you’re implementing a canonical tag, make sure there’s a good reason for it and that it’s being used properly. (What is the canonical?)
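A check like the one below could have caught our pagination mistake before launch. It's only a minimal sketch: the example.com URLs and the HTML snippets are made up for illustration, and a canonical that points somewhere else isn't always wrong, so treat the output as a list of pages to review rather than a list of errors.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def canonical_of(page_url, html):
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    return urljoin(page_url, finder.canonical)


def mismatched_canonicals(pages):
    """Return (url, canonical) pairs where a page's canonical points elsewhere."""
    problems = []
    for url, html in pages.items():
        canonical = canonical_of(url, html)
        if canonical is not None and canonical.rstrip("/") != url.rstrip("/"):
            problems.append((url, canonical))
    return problems


# Hypothetical pages mimicking the relaunch mistake described above.
pages = {
    "https://example.com/blog": '<link rel="canonical" href="/blog">',
    "https://example.com/blog/page/2": '<link rel="canonical" href="/blog">',
    "https://example.com/blog/page/3": '<link rel="canonical" href="/blog/page/3">',
}

for url, canonical in mismatched_canonicals(pages):
    print(f"{url} -> canonical points to {canonical}")
```

Here only page/2 gets flagged, since it's the one paginated page telling the engines it is really /blog.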
DISAVOW ALL THE LINKS
Don’t use the tool unless you are sure you need to use it. “use caution with this tool….please start slow.” is the exact quote from Cutts at Pubcon.
Disavowing links just because you think they might be hurting your site is dangerous, as those may be the exact links that are driving value to your site. Even if they look spammy, do a quick check before disavowing any.
Adam’s Quick Check:
1. Do a site: search for the site you’re evaluating. Do any pages show up? If not, that’s a HUGE indicator this is a bad place to be.
2. How many pages are indexed? I wrote about article directories being significantly devalued. If you’re only seeing the homepage indexed for a site you know has mountains of pages, another indicator it could be a bad place to be.
3. Are there significant ads above the fold? Blatant “Paid Links” section? Does the site have hundreds of exact match backlinks pointing to it? All indicators that compound into problems.
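If you're reviewing a long list of linking sites, it can help to record the quick check the same way for each one. The sketch below is just one way to do that: the LinkingSite fields, the example domains, and the cutoff of 100 exact match backlinks are all my own assumptions, and the numbers are things you'd note down by hand after looking at the site. It tallies indicators; it doesn't decide what to disavow.

```python
from dataclasses import dataclass


@dataclass
class LinkingSite:
    """Manual observations about a site that links to you (hypothetical fields)."""
    domain: str
    indexed_pages: int           # rough count from a site: search
    expected_pages: int          # how many pages you believe the site has
    heavy_ads_above_fold: bool
    has_paid_links_section: bool
    exact_match_backlinks: int   # exact match backlinks pointing at the site


def red_flags(site, exact_match_limit=100):
    """List which quick-check indicators a linking site trips."""
    flags = []
    if site.indexed_pages == 0:
        flags.append("no pages indexed")
    elif site.indexed_pages == 1 and site.expected_pages > 1:
        flags.append("only the homepage indexed")
    if site.heavy_ads_above_fold:
        flags.append("significant ads above the fold")
    if site.has_paid_links_section:
        flags.append("blatant paid links section")
    if site.exact_match_backlinks >= exact_match_limit:  # assumed cutoff
        flags.append("hundreds of exact match backlinks")
    return flags


# Made-up examples: one article directory, one ordinary blog.
directory = LinkingSite("spammy-directory.example", 1, 5000, True, True, 400)
blog = LinkingSite("niche-blog.example", 240, 250, False, False, 3)

for site in (directory, blog):
    print(site.domain, red_flags(site))
```

A site with an empty list isn't automatically a good link, and a site with one flag isn't automatically a bad one. The indicators compound, which is why the list matters more than any single entry.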
302 REDIRECT (OR NO REDIRECT)
When you’re launching a new site on a brand new URL, your SEO company should play a critical role in the redirect strategy. It’s less common, but once in a while we see sites that launch without redirecting any of the previous site’s pages. More commonly, we’ll see some 302 redirects implemented, which, simply put, don’t pass value. No value = no rankings = you ain’t makin any money OR you’re overspending to compensate through PPC.
If you’re rebranding & launching a new site, it’s a necessity to have your old site 301 redirect to the appropriate pages. If you’ve purchased another company, merged, etc., you need to make sure that sites that aren’t going to be used but still have value are 301 redirected to the appropriate place.
Summary: A brand new site is just that, brand new to Google. There is very little, if any, trust from Google. Implementing 301 redirects from a site that has history & related content will help that new site have a much more fluid & valuable launch.
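One way to keep a launch honest is to compare what the old URLs actually return against the redirect plan. The sketch below assumes you've already collected the status code and Location header for each old URL (with any HTTP client that doesn't follow redirects); the domains and paths are placeholders, not real sites.

```python
def audit_redirects(redirect_map, observed):
    """Compare observed responses for old URLs against the planned 301s.

    redirect_map: {old_url: new_url} from the redirect plan.
    observed: {old_url: (status_code, location_header_or_None)}.
    Returns a list of (old_url, problem) tuples; empty means all good.
    """
    problems = []
    for old_url, new_url in redirect_map.items():
        status, location = observed.get(old_url, (None, None))
        if status is None:
            problems.append((old_url, "not checked"))
        elif status == 302:
            problems.append((old_url, "302 (temporary) instead of 301"))
        elif status != 301:
            problems.append((old_url, f"status {status} instead of 301"))
        elif location != new_url:
            problems.append((old_url, f"301 to {location}, expected {new_url}"))
    return problems


# Hypothetical redirect plan for a rebrand.
redirect_map = {
    "http://old-brand.example/": "http://new-brand.example/",
    "http://old-brand.example/services": "http://new-brand.example/services",
    "http://old-brand.example/about": "http://new-brand.example/about-us",
}

# Hypothetical responses gathered after launch.
observed = {
    "http://old-brand.example/": (301, "http://new-brand.example/"),
    "http://old-brand.example/services": (302, "http://new-brand.example/services"),
    "http://old-brand.example/about": (200, None),
}

for old_url, problem in audit_redirects(redirect_map, observed):
    print(old_url, "->", problem)
```

In this made-up launch the homepage is fine, /services went out as a 302, and /about was never redirected at all, which are the two situations described above.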
ROBOTS.TXT
SEER has a robots.txt file for our site, found at seerinteractive.com/robots.txt. In it, we disallow our WordPress login pages. We don’t need the engines to index those, so we use this to let the engines know.
There are some great reasons why robots.txt is used and why the engines abide by the disallow command it provides. One of the biggest that comes to mind is if you’re relaunching your site and have a dev site for testing. It could be password protected so the engines couldn’t index it, but many times we’ll see dev sites use the robots.txt disallow all command to make sure the dev site isn’t leaked. Another use would be to save crawl budget if the engines are crawling search pages or filter pages that provide no extra value if found in results. (Edit: Tom Conte had a great comment below that a page can still show up in the engines, poorly, when it’s only excluded through robots.txt. Just using the meta noindex should remove the result from SERPs.)
If we were to just have disallow: /, this would let the engines know that we do not want them to index our site. There is so much power in just that little slash, and it can completely wipe away your presence on the interwebs. Be mindful of which folders you disallow and of how you use the wildcard, too, as this has the potential to remove all pages OR block spiders from ever accessing a page.
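To see how much that one slash changes, you can run a robots.txt through a parser before it ever goes live. Here's a minimal sketch using Python's standard urllib.robotparser. The two rule sets and the example.com URLs are assumptions for illustration (the /wp-admin/ path is a guess at a typical WordPress setup, not a copy of SEER's file). Note that this parser does simple prefix matching and doesn't understand the wildcard patterns the engines support, so it's only useful for checking plain folder rules.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed rules: block only the WordPress admin/login area.
targeted = """\
User-agent: *
Disallow: /wp-admin/
"""

# One slash: block the whole site.
block_everything = """\
User-agent: *
Disallow: /
"""

for path in ("/", "/blog/", "/wp-admin/"):
    url = "https://example.com" + path
    print(path, allowed(targeted, url), allowed(block_everything, url))
```

With the targeted rules only /wp-admin/ is blocked; with the single slash every URL on the site comes back as blocked.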
1. Robots.txt excludes the brand pages for a site, and those brand pages contain the individual products.
2. In product categories, paginated pages like p=2, p=3, contain the canonical pointing back to the main category page.
3. Product pages may now only be indexed if they’re in a sitemap OR linked to externally.
Summary: In the hands of someone with a little bit of knowledge, this can be extremely dangerous. Through robots.txt, you can easily tell the engines to stop indexing parts or all of your site. Use this wisely. (What is robots.txt?)
There are probably 20 different ways to kill your site by accident, but I wanted to keep a few of these front of mind. Oftentimes we look into spammy links, content quality, internal linking, and other issues when it’s actually one of these site-killing mistakes that is at the heart of the problem.
Please add more ways that come to mind or read more about our case studies at seerinteractive.com to understand how SEER has helped companies avoid these mistakes.