Crawling it all
If you’re in SEO and haven’t heard of or used Screaming Frog, do you even lift, bro?
Back in 2011, a laptop with a moderate amount of RAM was all you needed. Like everything else, websites have evolved, and so have the size and complexity of the web.
I’ve also got a severe case of Fear Of Missing Out (FOMO), and when I crawl a website, I have an overwhelming desire to crawl the entire thing; business laptops just weren’t cutting it.
The Downside of Sampling
Screaming Frog is a Java program – it’ll eat as much RAM as you’ll allow it.
As the application evolved, it let you choose between RAM (memory) storage and database storage mode; the latter keeps crawl data in a database on your hard drive instead of holding it all in RAM.
Screaming Frog themselves go into much more detail in How To Crawl Large Websites Using The SEO Spider.
However, most posts still lean towards limiting, cutting down, or excluding whole sections of the site (sampling), which, for various reasons, takes time and feeds my FOMO.
I Built My Solution
Most Screaming Frog enthusiasts have heard of running Screaming Frog in the cloud. If you don’t have a powerful machine with gobs of RAM and a solid-state drive, it’s a viable option. Posts and guides detailing how to get a Linux instance with Screaming Frog running have existed since 2014.
- Here’s Mike King’s from January 2016: How To Run Screaming Frog And URL Profiler On Amazon Web Services
- Fili Wiese wrote this way back in 2014 and updated it in 2016: HOW TO RUN SCREAMING FROG ON GOOGLE CLOUD [UPDATED]
- Most recently, Screaming Frog has posted about running it in the cloud.
Years ago, I did a cost-benefit analysis on a napkin and decided to front-load my costs rather than pay for a subscription or pay-per-use model. I chose old server hardware to keep costs low, got to work, and started purchasing parts to build “The Phoenix.” Memory was the most significant expense, and that cost has dropped dramatically!
- Motherboard (X8DTN+-F): $79.97
- Various parts from a previous build iteration (The Beast, R.I.P.) plus upgrades: ~$242.83
- Memory (12x16GB DDR3): $380.65
Total hardware cost: $703.45
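For anyone who wants to redo the napkin math, here is a minimal Python sketch that adds up the parts list above. The prices are the ones listed; the dictionary keys and variable names are just labels I made up for illustration, and the reused-parts figure is approximate in the original list.

```python
from decimal import Decimal

# Prices copied from the parts list above; the keys are illustrative labels.
parts = {
    "motherboard": Decimal("79.97"),
    "reused parts and upgrades": Decimal("242.83"),  # listed as approximate (~)
    "memory (12x16GB DDR3)": Decimal("380.65"),
}

# Decimal avoids floating-point rounding noise when adding currency.
total = sum(parts.values(), Decimal("0"))

# Capacity of the memory purchased in this line item: 12 sticks of 16GB each.
memory_gb = 12 * 16

# Fraction of the total spend that went to memory.
memory_share = parts["memory (12x16GB DDR3)"] / total

print(f"Total hardware cost: ${total}")
print(f"Memory in this purchase: {memory_gb}GB")
print(f"Memory share of total: {memory_share:.0%}")
```

Memory comes out at a little over half of the total spend, which is why it was the line item worth watching.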
Don’t judge the wiring.
Screaming Frog Cloud vs. Running Your Server
For my usage, scaling isn’t a critical limiting factor. A cloud environment can easily be scaled to whatever you need, but The Phoenix, with dual CPUs and roughly 200GB of RAM, is already overkill for crawling. And because it’s a Windows machine, I can quickly do other things on it.
The Phoenix had an upfront cost, but you’ll also need to factor in ongoing costs, especially power consumption and internet usage. With cloud-based services, you may only need to pay for the resources you use while you use them, with none of the physical maintenance and upkeep costs, and the price scales with how much or how little you use. However, forgetting to spin an instance down, or letting a crawl get out of control, could be costly.
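The trade-off above can be roughed out as a break-even calculation. This is only a sketch of the idea, not the analysis I actually ran: the function name `break_even_months` is made up, and the cloud hourly rate, crawl hours, and power cost below are hypothetical placeholders, not real pricing. Only the $703.45 hardware figure comes from this post.

```python
def break_even_months(hardware_cost, cloud_cost_per_hour,
                      crawl_hours_per_month, power_cost_per_month=0.0):
    """Months until owning the hardware beats renting, or None if it never does."""
    cloud_monthly = cloud_cost_per_hour * crawl_hours_per_month
    monthly_saving = cloud_monthly - power_cost_per_month
    if monthly_saving <= 0:
        # Running your own box costs at least as much per month as the cloud,
        # so the upfront spend is never recovered.
        return None
    return hardware_cost / monthly_saving


HARDWARE_COST = 703.45  # total from the parts list in this post

# Hypothetical inputs: swap in your own provider rate, usage, and power bill.
months = break_even_months(
    HARDWARE_COST,
    cloud_cost_per_hour=1.50,    # assumed rate, not a quoted price
    crawl_hours_per_month=40,    # assumed usage
    power_cost_per_month=15.00,  # assumed electricity cost
)

if months is None:
    print("Cloud stays cheaper at this usage level.")
else:
    print(f"Break-even after about {months:.1f} months.")
```

The heavier your monthly crawling, the faster the hardware pays for itself; with light usage the function returns None, which is the case where pay-per-use wins. An instance left running by accident is just extra hours in `crawl_hours_per_month`.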
I’ve been around computers, and building them, for as long as I can remember, so this came naturally to me. Physically running your own “server” and using a cloud setup both have pros and cons. Ultimately, remoting into an existing Windows desktop and working directly in Screaming Frog was more advantageous for my needs.