February 24th, 2011: A day that won't soon be forgotten by me and, I'm sure, the majority of the SEO community. On the same day the late Steve Jobs celebrated his birthday and I finally remembered to get a Valentine's Day gift (kidding!), Google rolled out an algorithm update that was reported to affect 12% of US searches. Let's take a moment to put that into perspective: in July 2011 comScore reported that Americans conducted 12.5 billion searches on Google alone (source). That comes out to roughly 400 million searches per day. So in one day, this algorithm update affected on the order of 48 million searches in the US.
We certainly weren't the only company to get calls asking for help. After a grueling 2 weeks of intense digging, gallons of coffee, and an all-nighter in the SEERplex that reminded me of my college days, we were able to produce a report that we were confident would address these issues. After 8 months of patience and persistence, we learned last week that the client experienced a 100% recovery of the traffic lost as a result of Panda.
The nice part about this strategy is that, on the surface, it's pretty straightforward! If you produce unique content that provides a value-add for users, Google will reward you for it. The strategies outlined below are intended to help you identify if you've been hit by Panda and present opportunities to improve the inherent quality of your site, thus making it more relevant and useful for users, which in turn should be reflected in the Google SERPs.
Many SEOs (myself included) believe that Panda was a ratio-based penalty. This essentially means that even if you have unique, valuable content on your top pages, if the majority of your pages are low-value or duplicate content, the overall domain will still suffer as a result. A good way to see the ratio of your duplicate pages is a simple site: search. For example, let's say your site:website-hit-by-panda.com results bring back 15,800 indexed pages. That would seem fine, until you dig into some deeper SERP pages and find this:
If Google is telling you they are omitting all but 980 pages because they are "very similar" to what has already been included, that's a pretty good indication that the majority of your content is duplicate or low quality in comparison.
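The back-of-the-envelope math here is simple; using the hypothetical numbers above (15,800 indexed, 980 kept as distinct), the share of pages Google considers "very similar" works out like this:

```python
# Rough duplicate-content ratio from a site: search, using the
# hypothetical numbers above (these are illustrative, not real data).
indexed_pages = 15_800
distinct_pages = 980  # pages Google kept after omitting "very similar" results

duplicate_ratio = 1 - distinct_pages / indexed_pages
print(f"~{duplicate_ratio:.0%} of indexed pages flagged as very similar")
```

In this example roughly 94% of the indexed pages are being treated as near-duplicates, which under a ratio-based penalty would drag the whole domain down.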
I've researched quite a few Panda-stricken sites, and one of the biggest issues I'm seeing is duplicate content. For example, let's say you run a business that sells ATM machines. Your ATM machines are available in all 50 states and over 25,000 cities and towns. You know that Google likes to reward the most relevant page for a query, so in order to capitalize on local queries you've created a page for each city and town you do business in. To save time and resources, you templated out your local content so that it includes the same details on every page and just swaps out the city/town. Your content might look like this:
Looking for ATM machines in Philadelphia? ATM Machine Depot is your #1 source for Philadelphia ATM machines and services. We charge an industry-best 1% on withdrawal transactions and have local service professionals available to restock your machines within 24 hours. ATM Machine Depot is licensed by the FDIC as an insured money teller, so your cash is protected in the event of machine malfunction, theft, or fraud. Whether you own a Philadelphia small business, restaurant, nightclub, or even a concert hall, ATM Machine Depot is here to help. Turn to ATM Machine Depot for all your portable bank telling needs.
Want proof that Google picks this up? Perform a search query for the following:
site:atmmachinedepot.com "ATM Machine Depot is your #1 source for * ATM machines and services. We charge an industry-best 1% on withdrawal transactions and have local service professionals available to restock your machines within 24 hours"
By using the * wildcard, you're asking Google to find any block of content that matches the above regardless of what city name you've injected. If Google is finding 25,000 pages on your site with this same block of content, they are not going to reward you with rankings.
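You can run the same check against your own crawl data before Google does it for you. The sketch below mimics the wildcard query: it strips the injected city name from each page's copy and counts how many pages collapse to the same template. All page data here is made up for illustration.

```python
import re
from collections import Counter

# Hypothetical pages: (city injected into the template, page body text).
template = ("Looking for ATM machines in {city}? ATM Machine Depot is your "
            "#1 source for {city} ATM machines and services.")
pages = {
    "/philadelphia-atms": ("Philadelphia", template.format(city="Philadelphia")),
    "/boston-atms": ("Boston", template.format(city="Boston")),
    "/about-us": ("", "ATM Machine Depot was founded in 1998 in Philadelphia."),
}

def template_key(city, text):
    # Replace the injected city with a wildcard so identical boilerplate
    # collapses to one key -- roughly what the site: query with * surfaces.
    return re.sub(re.escape(city), "*", text) if city else text

counts = Counter(template_key(city, text) for city, text in pages.values())
duplicated_templates = [key for key, n in counts.items() if n > 1]
print(f"{len(duplicated_templates)} duplicated template(s) found")
```

If the same key comes back for thousands of URLs, you have found the block of content Google is omitting from its results.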
To put it bluntly, write unique content! That isn't always possible, however, and there are ways to work around it. I recently wrote a post about using Excel to generate pseudo-unique content by combining various sentences with unique variables and randomizing the order. While this is not an ideal situation, it does provide your pages with content that is, for the most part, unique to that page. Furthermore, injecting more unique qualifiers into the page, such as the local venues where your ATMs are currently installed, reinforces the local theme and improves the perception of those pages in the eyes of the engines.
Disclaimer: Google is constantly working to detect this type of content strategy. As the semantic algorithms evolve, I'm certain they will (eventually) be able to identify duplicate content holistically on a website. The best solution is always to create unique, value-adding content for the pages.
If you cannot produce unique content for these pages, it is still important to make sure that Google does not continue to penalize you for them. Some steps for addressing this:
- Identify the least valuable pages on your site, presumably based on conversion rate or traffic.
- Block these pages from Google via the noindex meta tag or the robots.txt file.
- Concentrate on the most valuable pages first and produce content that genuinely adds to the user experience.
- As you create unique content for the blocked pages, reinclude them in the index through sitemaps, the "Submit to Index" option in Webmaster Tools, and internal linking.
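The first two steps above can be automated against an analytics export. This is a hypothetical sketch; the URLs, metrics, and the `MIN_VISITS` cutoff are all placeholders you would replace with your own data and judgment.

```python
# Rank pages by traffic and conversions, then emit robots.txt Disallow
# lines for the bottom tier. Page data and thresholds are illustrative.
pages = [
    {"url": "/philadelphia-atms", "visits": 5400, "conversions": 120},
    {"url": "/smalltown-atms", "visits": 3, "conversions": 0},
    {"url": "/tinyville-atms", "visits": 1, "conversions": 0},
]

MIN_VISITS = 10  # assumed cutoff for "least valuable"

to_block = [p["url"] for p in pages
            if p["visits"] < MIN_VISITS and p["conversions"] == 0]

robots_txt = "User-agent: Googlebot\n" + "\n".join(
    f"Disallow: {url}" for url in to_block)
print(robots_txt)
```

Keep the output under version control so that when a page earns unique content, removing its Disallow line (and resubmitting it) is a one-line change.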
In the same vein as duplicate content, make sure your URL structure is such that duplicate pages are not being generated by your CMS. For example, it is (unfortunately) not uncommon to see a site generate duplicate pages as a result of the navigation path. In keeping with our ATM machines example, you might see pages generated as follows:
Logically, when Google is searching your site for the best page about Philadelphia ATMs, you want to make sure that all the signals you're giving point to one page. The more pages you have that cater to a particular theme, the harder it is for Google to identify the best page. Ideally, you would implement a 301 redirect for all duplicate pages back to the preferred URL. If this is not feasible, you can also consider the rel=canonical tag.
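However your server implements the redirects, the underlying data structure is just a mapping from every CMS-generated variant to the one preferred URL. A minimal sketch, with hypothetical URL patterns standing in for whatever paths your CMS actually produces:

```python
# Map every navigation-path variant to the single preferred URL. This is
# the table you would back with 301 redirects (or rel=canonical tags).
canonical_map = {
    "/locations/pennsylvania/philadelphia-atms": "/philadelphia-atms",
    "/services/atms/philadelphia-atms": "/philadelphia-atms",
}

def resolve(url):
    # Return the preferred URL; unknown paths pass through unchanged.
    return canonical_map.get(url, url)

print(resolve("/services/atms/philadelphia-atms"))
```

Maintaining one explicit table like this also makes it easy to audit that no theme has more than one preferred destination.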
Particularly with XML sitemaps, many websites will auto-generate them based on the pages live on your site. If you have duplicate pages being created and included in your XML sitemap, you're not only hosting duplicate content on your site but you're actively promoting this content to Google! Make sure that your XML sitemaps include only those pages that you want Google to visit regularly based on their value to your site as a whole.
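Instead of auto-generating the sitemap from everything live on the site, build it from a curated list of pages you actually want recrawled. A minimal sketch using the standard sitemaps.org schema; the domain and page list are placeholders:

```python
import xml.etree.ElementTree as ET

# Curated list: only the pages worth promoting to Google. Illustrative.
valuable_pages = ["/", "/philadelphia-atms", "/boston-atms"]

urlset = ET.Element(
    "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for path in valuable_pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = "https://www.example.com" + path

sitemap_xml = ET.tostring(urlset, encoding="unicode")
print(sitemap_xml)
```

The point is the source of truth: a hand-curated (or metrics-filtered) list, not a dump of every URL the CMS happens to serve.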
Personally, I'm a believer that the Panda update was not a linking-based change in the algorithm. However, there are definitely ways to leverage linking to help dig out of your Panda struggles.
- Concentrate on building links to deep pages on your site, which shows Google that these hyper-specific pages are valuable to the end user.
- Audit your internal link structure to make sure that you're passing adequate value throughout the site from your top-level pages.
- Limit the number of internal links on any given internal page so that more link juice is transferred to the destination page.
At the end of the day...help yourself by helping your users!
Essentially, attack Panda problems the same way you would any SEO project:
- Clean site architecture that appropriately highlights the pages providing the most value-add to the user.
- Quality, unique, keyword-rich content that provides a value-add to the user.
- Generate links to deeper pages that indicate an endorsement of those pages as a value-add to the user.
If you've seen any Panda clues or suggestions I may have missed, let me know in the comments or on Twitter.