100% Panda Recovery: What We Learned to Identify Issues & Get Your Traffic Back
February 24th, 2011: A day that won’t soon be forgotten by me and, I’m sure, the majority of the SEO community. On the same day the late Steve Jobs celebrated his birthday and I finally remembered to get a Valentine’s Day gift (kidding!), Google rolled out an algorithm update that was reported to affect 12% of US searches. Let’s take a moment to put that into perspective: in July 2011 comScore reported that Americans conducted 12.5 billion searches on Google alone (source). Spread over the 31 days of July, that comes out to about 400 million searches per day. So in one day, this algorithm update affected roughly 48 million searches in the US.
We certainly weren’t the only company to get calls asking for help. After a grueling 2 weeks of intense digging, gallons of coffee, and an all-nighter in the SEERplex that reminded me of my college days, we were able to produce a report that we were confident would address these issues. After 8 months of patience and persistence, we learned last week that the client experienced a 100% recovery of the traffic lost as a result of Panda.
The nice part about this strategy is that, on the surface, it’s pretty straightforward! If you produce unique content that provides a value-add for users, Google will reward you for it. The strategies outlined below are intended to help you identify if you’ve been hit by Panda and present opportunities to improve the inherent quality of your site, thus making it more relevant and useful for users, which in turn should be reflected in the Google SERPs.
Duplicate Content
Many SEOs (myself included) believe that Panda was a ratio-based penalty. This essentially means that even if you have unique, valuable content on your top pages, if the majority of your pages are low value or duplicate content the overall domain will still suffer as a result. A good way to see the ratio of your duplicate pages is a simple site: search. For example, let’s say your site:website-hit-by-panda.com results bring back 15,800 indexed pages. This would be fine, until you drill down to some deeper SERP pages and find Google’s notice that it has omitted entries very similar to those already displayed:
If Google is telling you they are omitting all but 980 pages because they are “very similar” to what has already been included, that’s a pretty good indication that the majority of your content is duplicate or low quality in comparison.
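To put a number on that ratio, a quick back-of-the-envelope calculation helps. The sketch below simply plugs in the two figures from the example above; the function name is made up for illustration, and keep in mind that site: counts are rough estimates rather than exact index totals.

```python
def displayed_share(indexed_pages: int, displayed_pages: int) -> float:
    """Share of indexed pages Google displays before omitting 'very similar' results."""
    if indexed_pages <= 0:
        raise ValueError("indexed_pages must be positive")
    return displayed_pages / indexed_pages


# Figures from the example: 15,800 pages reported, 980 actually displayed.
indexed = 15_800
displayed = 980
share = displayed_share(indexed, displayed)

# Prints: 6.2% of indexed pages displayed, 93.8% omitted as very similar
print(f"{share:.1%} of indexed pages displayed, {1 - share:.1%} omitted as very similar")
```

If the displayed share is that small, the bulk of the site is likely being treated as redundant.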
Template Content
I’ve researched quite a few Panda-stricken sites, and one of the biggest issues I’m seeing is duplicate content. For example, let’s say you run a business that sells ATM machines. Your ATM machines are available in all 50 states and over 25,000 cities and towns. You know that Google likes to reward the most relevant page for a query, so in order to capitalize on local queries you’ve created a page for each city and town you do business in. To save time and resources, you templated out your local content so that it includes the same details on every page and just swaps out the city/town. Your content might look like this:
Looking for ATM machines in Philadelphia? ATM Machine Depot is your #1 source for Philadelphia ATM machines and services. We charge an industry-best 1% on withdrawal transactions and have local service professionals available to restock your machines within 24 hours. ATM Machine Depot is licensed by the FDIC as an insured money teller, so your cash is protected in the event of machine malfunction, theft, or fraud. Whether you own a Philadelphia small business, restaurant, nightclub, or even a concert hall, ATM Machine Depot is here to help. Turn to ATM Machine Depot for all your portable bank telling needs.
Want proof that Google picks this up? Perform a search query for the following:
site:atmmachinedepot.com “ATM Machine Depot is your #1 source for * ATM machines and services. We charge an industry-best 1% on withdrawal transactions and have local service professionals available to restock your machines within 24 hours”
By using the * wildcard, you’re asking Google to find any block of content that matches the above regardless of what city name you’ve injected. If Google is finding 25,000 pages on your site with this same block of content, they are not going to reward you with rankings.
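If you have more than one template to check, building these queries by hand gets tedious. Here is a minimal sketch of a helper that swaps the injected variable for the * wildcard and wraps the result in an exact-match site: query. The function name and arguments are my own invention for illustration, and you would still paste the resulting string into Google yourself.

```python
def build_wildcard_query(domain: str, template_text: str, variable: str) -> str:
    """Replace the swapped-in variable (e.g. the city name) with * and
    wrap the text in an exact-match site: query."""
    if variable not in template_text:
        raise ValueError(f"{variable!r} does not appear in the template text")
    pattern = template_text.replace(variable, "*")
    return f'site:{domain} "{pattern}"'


sample = "ATM Machine Depot is your #1 source for Philadelphia ATM machines and services."
query = build_wildcard_query("atmmachinedepot.com", sample, "Philadelphia")
print(query)
```

Run it against one sentence from each template on your site and compare the result counts to your total indexed pages.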
The Solution
To put it bluntly, write unique content! That isn’t always possible, however, and there are ways to work around it. I recently wrote a post about using Excel to generate pseudo-unique content by combining various sentences with unique variables and randomizing the order. While this is not an ideal situation, it does provide your pages with content that is, for the most part, unique to that page. Furthermore, the more unique qualifiers you can inject into the page, such as local venues your ATMs are currently installed at, the more you reinforce the local theme and improve the perception of those pages in the eyes of the engines.
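For anyone who would rather script this than build it in a spreadsheet, the same idea can be sketched in a few lines of Python. This is not the Excel workbook from that post, just an illustration of the technique: the sentence variants, the venue, and the function name are all hypothetical placeholders you would replace with your own copy and data.

```python
import random

# Hypothetical sentence variants: each slot holds several phrasings of one point.
SENTENCE_SLOTS = [
    ["Looking for ATM machines in {city}?",
     "Need an ATM installed in {city}?"],
    ["Our machines are already installed at {venue}.",
     "You can find one of our ATMs at {venue}."],
    ["Local technicians restock {city} machines within 24 hours.",
     "Restocking in {city} takes 24 hours or less."],
]


def generate_copy(city: str, venue: str) -> str:
    """Pick one variant per slot, fill in the variables, and shuffle the order."""
    rng = random.Random(city)  # seed on the city so a page's copy stays stable between runs
    sentences = [
        rng.choice(variants).format(city=city, venue=venue)
        for variants in SENTENCE_SLOTS
    ]
    rng.shuffle(sentences)
    return " ".join(sentences)


print(generate_copy("Philadelphia", "the Main Street Diner"))
```

Seeding the random generator on the city name is a design choice: it keeps each page's text from changing every time the pages are regenerated, while different cities still get different combinations.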
Disclaimer: Google is constantly working to detect this type of content strategy. As the semantic algorithms evolve, I’m certain they will (eventually) be able to identify duplicate content holistically on a website. The best solution is always to create unique, value-adding content for the pages.
If you cannot produce unique content for these pages, it is still important to make sure that Google does not continue to penalize you for them. Some steps for addressing this:
- Identify the least valuable pages on your site, presumably based on conversion rate or traffic.
- Block these pages from Google via the noindex meta tag or the robots.txt file.
- Concentrate on the most valuable pages first and produce content that genuinely adds to the user experience.
- As you create unique content for the blocked pages, reinclude them in the index through Sitemaps, the “Submit to Index” option in Webmaster Tools (WMT), and internal linking.
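Before relying on the blocking step above, it is worth confirming that your robots.txt rules actually cover the pages you meant to block and nothing else. The sketch below uses Python's standard urllib.robotparser against a made-up rule set; the /city/ directory and the URLs are assumptions borrowed from the ATM example, not a recommendation for your site. One thing to watch: a page blocked in robots.txt is not crawled, so a noindex tag on that same page will generally not be seen, which is why it usually makes sense to pick one method per page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the low-value templated pages under /city/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /city/
"""

# The noindex alternative is a tag placed in the <head> of each page instead.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """True if the parsed robots.txt rules disallow this URL for the given crawler."""
    return not parser.can_fetch(user_agent, url)


for url in (
    "http://www.atmmachinedepot.com/city/philadelphia",
    "http://www.atmmachinedepot.com/philadelphia-atm-machines",
):
    print(url, "blocked" if is_blocked(url) else "crawlable")
```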
Organizational Structure
Along the same lines as duplicate content, make sure your URL structure is such that duplicate pages are not being generated by your CMS. For example, it is (unfortunately) not uncommon to see a site generate duplicate pages as a result of the navigation path. In keeping with our ATM Machines example, you might see pages generated as follows:
- http://www.atmmachinedepot.com/pa/philadelphia
- http://www.atmmachinedepot.com/philadelphia-atm-machines
- http://www.atmmachinedepot.com/philadelphia-pa-atms
- http://www.atmmachinedepot.com/city/philadelphia
Logically, when Google is searching your site for the best page about Philadelphia ATMs, you want to make sure that all the signals you’re giving point to one page. The more pages we have that cater to a particular theme, the harder it is for Google to identify the best page. Ideally, you would implement a 301 redirect for all duplicate pages back to the preferred URL. If this does not work, you can also consider the rel=canonical tag.
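As a rough illustration, here is how you might map the duplicate URLs from the example onto a single preferred page. Which URL is "preferred" is purely my assumption for the sketch, and the function names are hypothetical. The resulting map is what you would hand to whoever configures the 301 redirects on your server, and the tag is what goes in the head of any duplicate you cannot redirect.

```python
from html import escape

# Assumption for illustration: this is the URL we want Google to rank.
PREFERRED_URL = "http://www.atmmachinedepot.com/philadelphia-atm-machines"

DUPLICATE_URLS = [
    "http://www.atmmachinedepot.com/pa/philadelphia",
    "http://www.atmmachinedepot.com/philadelphia-pa-atms",
    "http://www.atmmachinedepot.com/city/philadelphia",
]


def build_redirect_map(preferred: str, duplicates: list[str]) -> dict[str, str]:
    """Map every duplicate URL to the preferred URL (the target of its 301)."""
    return {dup: preferred for dup in duplicates if dup != preferred}


def canonical_tag(preferred: str) -> str:
    """Fallback when a redirect is not possible: the rel=canonical link element."""
    return f'<link rel="canonical" href="{escape(preferred, quote=True)}">'


redirects = build_redirect_map(PREFERRED_URL, DUPLICATE_URLS)
for source, target in redirects.items():
    print(f"301: {source} -> {target}")
print(canonical_tag(PREFERRED_URL))
```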
Sitemaps
Particularly with XML sitemaps, many websites will auto-generate them based on the pages live on your site. If you have duplicate pages being created and included in your XML sitemap, you’re not only hosting duplicate content on your site but you’re actively promoting this content to Google! Make sure that your XML sitemaps include only those pages that you want Google to visit regularly based on their value to your site as a whole.
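If your CMS can't be told which pages to leave out, one option is to generate the sitemap yourself from a filtered list. The following is a minimal sketch using Python's standard library; the URL lists are placeholders from the ATM example, and a real sitemap generator would pull them from your own page inventory.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(all_urls: list[str], excluded: set[str]) -> str:
    """Return sitemap XML containing only the URLs not marked as duplicates."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in all_urls:
        if url in excluded:
            continue  # keep duplicate or low-value pages out of the sitemap
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder data: every live page, and the ones we do not want to promote.
live_pages = [
    "http://www.atmmachinedepot.com/philadelphia-atm-machines",
    "http://www.atmmachinedepot.com/city/philadelphia",
    "http://www.atmmachinedepot.com/pa/philadelphia",
]
duplicates = {
    "http://www.atmmachinedepot.com/city/philadelphia",
    "http://www.atmmachinedepot.com/pa/philadelphia",
}

sitemap_xml = build_sitemap(live_pages, duplicates)
print(sitemap_xml)
```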
Linking
Personally, I’m a believer that the Panda update was not a linking-based change in the algorithm. However, there are definitely ways to leverage linking to help ease your Panda struggles.
- Concentrate on building links to deep pages on your site, which shows Google that these hyper-specific pages are valuable to the end user.
- Audit your internal link structure to make sure that you’re passing adequate value throughout the site from your top-level pages.
- Limit the number of internal links on any given internal page so that more link juice is transferred to the destination page.
At the end of the day…help yourself by helping your users!
Essentially, attack Panda problems the same way you would any SEO project:
- Clean site architecture that appropriately highlights the pages providing the most value-add to the user.
- Quality, unique, keyword-rich content that provides a value-add to the user.
- Links to deeper pages that indicate an endorsement of those pages as a value-add to the user.
If you’ve seen any Panda clues or suggestions I may have missed, let me know in the comments or on Twitter.
Posted: 10.25.11


CPC_Andrew:
Great post. Definitely worth sharing. Thanks
Brett Snyder:
Thanks Andrew!
Darren Shaw:
Working on a Panda issue for a client right now, and hot damn, you have just laid out a detailed plan of attack for me. Thank you! I love SEER. You guys are the brightest in the biz in my opinion.
Julian:
You think blocking with robots.txt is still a good way? Do you know some test cases with big pages?
Peter:
If you want to definitely block SE’s from any specific URL, by far the most effective method is .htaccess with a password (imho). If another site links to a URL blocked in robots.txt, Google will still find it because of that site’s external link. If anybody has a different idea, please share. Great article!
Brett Snyder:
@Darren Thanks for the flattering words, appreciate it!
@Julian Robots.txt is definitely not a perfect solution, but it is an indicator to the engines that this content should not be included. The idea here is just to provide almost a “We’re working on it!” signal while the new content is created. As both you and Peter alluded to, Google can still find and index this content, but at the same time we’re providing indicators that this content should not be included, which should help stave off some of the Panda penalties.
@Peter Haven’t actually used the htaccess w/ password for blocking pages…but if I’m understanding correctly that would still block users from accessing the page? My thoughts here were to still allow the pages to exist in the overall site structure (as users might use internal search or internal navigation to still reach these pages) while still buying some time to create new content. Liking the idea though, will work on drawing up some tests with it in the future!
Thanks all!
nobody:
Very interesting what you say.
My website was blocked on 28 September 2011 and hasn’t recovered yet. I am doing my best to improve it and to analyze the websites that have been hit and the ones that haven’t been hit yet. My life was changed that day. I tried everything and did everything I knew, and I have learned a lot of things since.
First question, if you have the time to answer – do you think that websites that were hit on 28 September could recover this year? I have examples of websites that were hit on 28 September and recovered a month after, so it must be possible.
What if this Panda (September – October one) is more about users blocking the entire website when they don’t like a page. But how many of them actually do that? I don’t really know.
What if I delete/noindex those pages that are blocked and then my website came back? Do you have any experience with this?
More ideas about this Panda to share?
Thanks!
Peter:
Hi,
Please correct me if I’m in error but I think you might be confusing the goal of the Panda algo and the discussion that follows regarding duplicate content.
The overarching goal of Panda was to increase the end-user experience when using Google’s search engine. Nobody, including myself, likes to invest time and effort when conducting research and end up clicking on a spam site. Google has not yet monetized search, it’s still free. Google needs to deliver a top-shelf user experience so the searcher keeps coming back to Google. More Google users equates to more Ad Revenue for Google.
How can Google weed out spammy sites? It would be impossible to click on each site one-by-one. Hence, the Panda algo. By far, one of the easiest forensic footprints left by a spammy site is Duplicate Content. The catch-22 is that some sites that are not SPAM also have duplicate content. Sadly, some really good sites got hit by Panda because the site was not properly optimized to eliminate duplicate content.
If you follow the above guidelines your site should come back in short order. More importantly, do everything possible to keep your visitors returning. Give them a reason to come back to your site. Don’t just concentrate on SEO, the human element is by far the most important. Good solid unique interesting content will drive them back in hordes.
Plus, it’s what Google wants lol…
Good Luck!
Brett Snyder:
@nobody Unfortunately there’s no way to say with any real certainty when a site will come back. In the example here, it took over 6 months before we saw recovery, but to your point if you saw others come back then I’d certainly say it’s possible. The best thing you can do is identify the causes for the penalty and work to correct them.
I’m not sure I understand the question about users blocking the website? When I wrote about blocking pages, it was to prevent the engines from indexing content that could be causing your penalty while you fix it. We did see this as something that was helpful in recovering from Panda, but honestly just blocking the pages may not be enough. I would suggest you block any duplicate pages but at the same time work to improve the content on those pages to be unique and provide a value-add.
@Peter My apologies if the above was confusing! I agree completely that the bottom line here is improving the user experience and (as I mentioned in my comment above) to provide a value-add to the user. We all know that Google is a relevancy-based algorithm so they want to return pages that address the user’s needs (and to your point…that is not done with duplicate content!)
Thanks for the thoughts!
Darko:
Great post for a beginner like me :)
Ryan:
My SEO philosophy is the same as my girl philosophy, quality over quantity.
Anthony Baker:
Good article. I got slapped hard by the Panda in April of 2011 and I have fully recovered as of April 19th, 2012. I already had unique quality content that was written for the visitor, not Google. However, my problem was the package. I had to repackage my content for Google.
One year of hell. But in the end I am left with a much better website that is friends with Panda and Penguin :-) Hopefully this will help others who have gone through the same hell.
This is what I did to recover: (All of it was necessary)
1.) I condensed 500 pages down to 70 content packed pages!
2.) I don’t have a single page on my site with thin content.
3.) I shaved off a whole bunch of advertising, no above the fold advertising. This has definitely cost me income from the site, but better to work on the traffic, then the income will slowly come back I hope.
4.) I’ve used these tools extensively for every single page on my site and made every single page as fast as I possibly could.
http://developers.google.com/pagespeed/
http://gtmetrix.com/
http://tools.pingdom.com > (this one I used the most, very helpful data!)
5.) I’ve cleaned up my old html and used css formatting for all my text and menus.
6.) I added 404 and 301 redirects. (THIS WAS EXTREMELY IMPORTANT!)
404 redirect sends a visitor to a custom “not found page” if they are trying to get to a page that no longer exists on the site; this also helps Google quickly figure out which pages I have done away with. This was something I was lacking and was a major help! This was also extremely important because I have deleted so many pages. 100’s of deleted pages! I needed to make sure Google figured that out!
301 redirect points everything to the non-www version of my domain.
This was a huge fix as well. I did not even know this was an issue until much studying up on the subject.
Most shared webhosting packages keep both the www and non-www version of the domain live, which produces duplicate content in the eyes of Google. Extremely important to get this fixed! You also want to 301 redirect any index.html page, or whatever your homepage format is, to just http://domain.com.
Again this was a problem with duplicate content: the www, non-www, and index.html versions of the homepage were all live as separate pages.
The 301 redirect cures this problem.
7.) I optimized every single image on the site. (Still have more work to do on this)
8.) I found all the websites/blogs I could find that copied my content and had them remove it. This took months of hard work!
9.) I have created a Facebook and Google + page for the website and have been active on both, heavily active on the Facebook page.
10.) Added just the right amount of social media activity on the site. Too much of this slows down your page load times.
11.) Added no-index to many pages that I felt Google did not need to index.
12.) I have also worked hard on de-optimizing content where I had over optimized in the past.