Throughout the life cycle of a growing company, you’re undoubtedly going to go through a few site migrations and redesigns. As technologies change or you have more capital to invest in a customized design, you’ll want to update and upgrade your website with the latest and greatest. Unfortunately, this often causes SEO problems, because there’s frequently a disconnect between the project team assigned to the redesign and the SEO team.
Whether your site is big or small, if it’s older than a few years, chances are you’ve “misplaced” some pages along the way. We’re often so focused on earning fresh, new links that we forget to make sure we’re keeping the old ones. I’m not talking about making sure your old links are still live on external sites (though that’s still important). What we’re talking about is far easier: making sure the pages on your site that those links point to are still active. It’s often overlooked, but it can create a pretty big “leaky bucket” problem if you don’t address it. Whatever the reason for the lost value (deleted pages, broken redirects, etc.), below we’ll look at a few ways to plug the holes in your website and reclaim that value.
Let’s talk big impact first. The first thing to check is that all of your “Top Pages” are still active and passing value to your site. I use Open Site Explorer, but Ahrefs has a great tool as well. The Top Pages report lists the pages on your domain ranked in order of importance.
- Type your domain into Open Site Explorer
- Click on the “Top Pages” tab
- Sort by Page Authority (highest to lowest), or sort by inbound links; the two metrics often correlate
- Look in the HTTP status column for any 404 errors. If you see a lot of these near the top of your sorted list, you have a big opportunity to reclaim that value.
- You can export a CSV containing all of your top pages that you can slice and dice in Excel.
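If you’d rather not filter the export by hand in Excel, a few lines of Python can pull out the broken pages and sort them by value. This is a minimal sketch that assumes the CSV has columns named `URL`, `HTTP Status Code` and `Page Authority`; those names are assumptions, so check the header row of your own export and adjust the constants.

```python
import csv
import io

# Assumed column names; adjust to match the header row of your own export.
URL_COL = "URL"
STATUS_COL = "HTTP Status Code"
PA_COL = "Page Authority"


def broken_top_pages(csv_text: str) -> list[tuple[str, float]]:
    """Return (url, page_authority) for rows with a 404 status, highest PA first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    broken = []
    for row in reader:
        status = (row.get(STATUS_COL) or "").strip()
        # Accept both "404" and "404 Not Found" style values.
        if status.startswith("404"):
            pa = float(row.get(PA_COL) or 0)
            broken.append((row[URL_COL].strip(), pa))
    # Highest Page Authority first, mirroring the sort order in the report.
    broken.sort(key=lambda item: item[1], reverse=True)
    return broken


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a real export.
    sample = (
        "URL,HTTP Status Code,Page Authority\n"
        "www.example.com/,200 OK,55\n"
        "www.example.com/old-guide,404 Not Found,41\n"
        "www.example.com/retired-tool,404 Not Found,47\n"
    )
    for url, pa in broken_top_pages(sample):
        print(f"{pa:>5.0f}  {url}")
```

In practice you would pass it the contents of your exported file, e.g. `open("top_pages.csv").read()` (file name hypothetical).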
From here you can take a number of actions to reclaim the value. If the page is still relevant, you can reinstate it; sometimes pages get deleted by accident during a migration. Another route is to simply apply a 301 redirect to the next most relevant page. This passes the value (how much is often debated) to another page on your site while keeping it on an active URL on your domain.
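Once you’ve decided where each dead page should point, the redirect rules themselves can be generated rather than typed. The sketch below assumes an Apache server with mod_alias (other servers such as nginx use a different syntax), and the paths and URLs in it are hypothetical placeholders.

```python
def apache_redirects(mapping: dict[str, str]) -> list[str]:
    """Turn {old_path: new_url} pairs into Apache mod_alias 301 directives."""
    lines = []
    for old_path, new_url in mapping.items():
        # mod_alias expects the old location as a URL path beginning with "/".
        if not old_path.startswith("/"):
            old_path = "/" + old_path
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines


if __name__ == "__main__":
    # Hypothetical dead pages mapped to their closest live replacements.
    redirect_map = {
        "/old-guide": "https://www.example.com/guide",
        "/retired-tool": "https://www.example.com/tools",
    }
    print("\n".join(apache_redirects(redirect_map)))
```

Keep in mind that mod_alias’s `Redirect` matches by path prefix, so `/old-guide` would also redirect `/old-guide/part-2`; review overlapping paths before deploying the generated lines.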
OSE is great for finding valuable pages, but it isn’t as comprehensive as a tool like Google Analytics. If you want to dig a bit deeper, pull a landing page report from GA going as far back as your data goes.
- Set the date range as far back as your GA data goes; for bigger sites, this can take some time to load.
- Access the “Landing Pages” report by drilling down into Behavior > Site Content > Landing Pages
- Now you’ll see a list of all the landing pages on your site that have received traffic in the time period you set. Before we export this, we’ll want to grab all of the URLs in GA. In the lower-right corner, click “Show Rows” and set it to the maximum, 5,000.
- There’s a trick here if you have more than 5,000 pages in GA. Once the 5,000-row page loads, go to the address bar and scroll to the end of the URL string. There you’ll find the number 5000; simply change it to the total number of landing pages GA has stored (in our example, 9971).
- Now we’ll want to export this master list to Excel.
- The next step is to crawl these URLs with Screaming Frog to find any pages that might be broken (returning a 404 error). Before we do that, however, we need a clean list of URLs to work with. Since GA exports only the URL paths, we need to open the file in Excel and add the domain in front of each path. You can do this by concatenating your domain with column “A” in column “B” (for example, =CONCATENATE("https://www.example.com", A2), with your own domain in place of example.com) and copying this formula down.
- Now simply copy the URLs in column “B” into a .txt file (with Notepad or your favorite text editor) and you’re ready to import them into Screaming Frog in list mode and crawl the heck out of them! You can export all of the 404 pages from the “Response Codes” tab. For a complete guide to Screaming Frog, check out Aichlee’s EXCELLENT all-in-one guide.
- Finally, once you have your list of 404’d pages that once drove traffic, you can decide whether it’s worth reinstating these pages or simply 301 redirecting them to the next most relevant page.
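The Excel concatenation step above can also be scripted if you repeat this audit regularly. This is a minimal sketch, assuming the landing page paths sit in the first column of the GA CSV export; since exports may also contain comment lines, a header row and summary rows, it simply keeps values that start with “/”. The domain is a hypothetical placeholder.

```python
import csv
import io

DOMAIN = "https://www.example.com"  # hypothetical; replace with your own domain


def build_url_list(csv_text: str, domain: str = DOMAIN) -> list[str]:
    """Prefix each landing page path in column A with the domain, dropping duplicates."""
    base = domain.rstrip("/")
    urls: list[str] = []
    seen: set[str] = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # blank line
        path = row[0].strip()
        # Real paths start with "/"; this skips comments, headers and "(not set)".
        if not path.startswith("/"):
            continue
        url = base + path
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Hypothetical sample standing in for a real GA export.
    sample = "Landing Page,Sessions\n/,120\n/blog/post,30\n(not set),4\n"
    print("\n".join(build_url_list(sample)))
```

To produce the .txt file for Screaming Frog’s list mode, you could write the result out with something like `pathlib.Path("urls.txt").write_text("\n".join(urls) + "\n")` (file name hypothetical).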
Okay, now we’re getting kind of crazy. But if you want to be extra sure you’ve covered all of your bases and you have some older pages that maybe didn’t get traffic (or you forgot to install your GA code on them), there’s a little trick with Archive.org’s Wayback Machine. Warning: Technical Jargon Ahead.
- Archive.org’s Wayback Machine shows you older, cached versions of web pages, but that’s not all. By typing in your domain followed by a “/” and an asterisk, you can see a list of all of the pages the Wayback Machine has cached for your site. (Note: there’s also a shortcut if you go to this URL: https://web.archive.org/web/*/http://www.seerinteractive.com/*)
- I found this page a little hard to scrape, but Archive.org has a JSON API that you can pull the same data from. The URL for this is: http://web.archive.org/cdx/search/cdx?url=seerinteractive.com/*&output=json&limit=6635. You can set the limit parameter to the number of URLs captured for your domain.
- From here, if you want a CSV file, there are a number of JSON-to-CSV converters out there, or you can work the JSON feed directly into your workflow if that makes sense. Follow the same advice as above regarding reinstating pages or redirecting.
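If you’d rather convert the feed yourself, the sketch below turns the CDX JSON output into a CSV. It assumes the default JSON layout, where the first row is a header (`urlkey`, `timestamp`, `original`, `mimetype`, `statuscode`, …) and each following row is one capture. Because the same page is usually captured many times, it keeps one row per original URL along with its most recent capture and status.

```python
import csv
import io
import json


def cdx_json_to_csv(json_text: str) -> str:
    """Convert Wayback CDX JSON output to CSV with one row per unique original URL."""
    rows = json.loads(json_text)
    if not rows:
        return ""  # the API returns [] when nothing matches
    header, captures = rows[0], rows[1:]
    url_idx = header.index("original")
    ts_idx = header.index("timestamp")
    status_idx = header.index("statuscode")

    latest: dict[str, list[str]] = {}
    for capture in captures:
        url = capture[url_idx]
        # Timestamps are fixed-width YYYYMMDDhhmmss strings, so they sort as text.
        if url not in latest or capture[ts_idx] > latest[url][ts_idx]:
            latest[url] = capture

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "last_captured", "last_status"])
    for url in sorted(latest):
        cap = latest[url]
        writer.writerow([url, cap[ts_idx], cap[status_idx]])
    return out.getvalue()
```

You could fetch the JSON with `urllib.request.urlopen(...)` on the API URL above and pass the decoded response text to this function. Note that `http://` and `https://` (or `www` and non-`www`) versions of a page count as separate URLs here.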
Perhaps the best advice for plugging these holes is having a redirect policy going forward. As your site grows and more and more people are tasked with maintaining portions of it, some pages will inevitably get lost, redirects will break, and value will start leaking. Having a firm policy in place ahead of time keeps everyone on the same page and ensures you proactively redirect and repurpose pages from day one.