Bot Farms, Fraudulent Websites, and Display Advertising

The most profitable campaigns in digital marketing are not managed by advertisers, whether in-house teams, freelancers, or global agencies. They are managed by black hat hackers committing ad fraud.

In 2016, the security firm White Ops exposed Methbot, a suspected Russian criminal syndicate making an estimated $3-$5 million a day from display advertising fraud. The criminals did this by creating hundreds of thousands of distinct URLs designed to trick display advertising algorithms into buying their fraudulent web space. They then employed over 500,000 bots to click on those ads and ultimately profit from exploiting the pay-per-click system.

Today, digital marketers have stepped up against ad fraud with initiatives such as ads.txt. However, digital ad fraud may be worse than ever, as ad fraud syndicates have also innovated. According to eMarketer’s Digital Ad Fraud Report, estimates of money lost to digital ad fraud “vary from $6.5 billion to as high as $19 billion, a range that points to the difficulty in measuring fraud’s true impact.”

While display advertising platforms are constantly scrubbing poor quality sites from the placements they offer, ad fraudsters are also creating thousands of new domains aimed at tricking the platforms’ algorithms.

What you can do

If you have ever asked your display advertising vendor for a placement performance report, you may have found them evasive. When I recently asked a programmatic display representative for one, they responded, “Sure, we can pull that for you, but the data set is very large and will be hard to comb through. Are you sure?”

Here at Seer Interactive, however, we love working with big data. Considering how widespread ad fraud is, we asked: how can we analyze display advertising placement data at scale in order to provide valuable insights to our clients and save them as much money as possible? So we developed a dashboard in Power BI to do just that.

We theorized that the sites ad fraudsters are creating are probably poor quality. Even a cursory examination can reveal poor quality sites such as these fine examples pulled from a Google Display Network (GDN) campaign:

However, placement performance alone gives no indication of site quality. Thus, we decided to synthesize PPC and SEO data to give us a comprehensive idea of site quality. Working with our SEO division, we used SEMrush’s authority score and trust score to gauge the quality of a site.

Methodology overview:

  • First we downloaded one year’s worth of placement performance data from one of our larger clients advertising on the GDN. This data set had over 80,000 unique URLs, which we felt was representative of the sites available on the GDN for our clients.
  • Next, we used the SEMrush API backlinks_domain report to pull authority score and trust score for those 80,000 URLs.
  • Finally, we bridged the data in Power BI and built data visualizations and various filters.

According to SEMrush, “In general, somewhere over a score of 20 (for both metrics) can be considered healthy.” When we filtered for placements below that “quality site” threshold, we found $17k in spend!
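
For readers without Power BI, the join-and-filter step can be sketched in a few lines of Python. This is only an illustration, not the dashboard we built: the field names (`domain`, `cost`, `authority_score`, `trust_score`), the helper `low_quality_spend`, and the sample rows are all hypothetical, and the sketch assumes a placement counts as "below the threshold" when either score is under 20.

```python
from typing import Dict, List, Tuple

QUALITY_THRESHOLD = 20  # the rough "healthy" cutoff quoted from SEMrush


def low_quality_spend(
    placements: List[Dict],
    scores: Dict[str, Dict[str, int]],
    threshold: int = QUALITY_THRESHOLD,
) -> Tuple[float, List[str]]:
    """Return (total cost, domains) for placements scoring below threshold.

    Assumption: a placement is flagged when EITHER score is below the
    threshold. Domains with no score data are skipped, not flagged.
    """
    total = 0.0
    flagged = []
    for row in placements:
        score = scores.get(row["domain"])
        if score is None:
            continue  # no SEO data available for this placement
        if (score["authority_score"] < threshold
                or score["trust_score"] < threshold):
            total += row["cost"]
            flagged.append(row["domain"])
    return total, flagged


# Made-up sample data, for illustration only
placements = [
    {"domain": "example-news.com", "cost": 120.0},
    {"domain": "free-games-now.xyz", "cost": 45.5},
    {"domain": "unknown-site.net", "cost": 10.0},
]
scores = {
    "example-news.com": {"authority_score": 55, "trust_score": 48},
    "free-games-now.xyz": {"authority_score": 3, "trust_score": 1},
}

wasted, domains = low_quality_spend(placements, scores)
print(wasted, domains)
```

Whether unscored domains should be ignored or treated as suspicious is a judgment call; this sketch skips them so the total only reflects placements with known scores.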

Export the results, negate the placements, and job done. Right?

Not quite. If ad fraudsters are constantly creating new domains, what good is a static exclusion list? Even if we ran this analysis on a monthly basis, there may still be new poor quality placements popping up each month and thus more wasted spend.

We then asked: what if these poor quality domains share consistent themes, and how can we identify them? To answer this question with the data available, we adjusted the Power BI dashboard to segment performance by top-level domain (TLD) and N-Gram (a contiguous sequence of n items from text). Sure enough, we found TLDs and N-Grams that consistently had poor trust scores.
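
The segmentation idea can be illustrated in Python as well. The sketch below is not how the Power BI dashboard computes it; it assumes a naive TLD (the text after the last dot, so it mishandles suffixes like .co.uk) and word-level N-Grams obtained by splitting the domain name on hyphens, dots, and other non-alphanumeric characters. The function names and sample scores are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def tld_of(domain: str) -> str:
    """Naive TLD: the text after the last dot (ignores cases like .co.uk)."""
    return domain.rsplit(".", 1)[-1].lower()


def word_ngrams(domain: str, n: int = 1) -> List[str]:
    """Split the part before the TLD into words and return word n-grams."""
    name = domain.rsplit(".", 1)[0].lower()
    tokens = [t for t in re.split(r"[^a-z0-9]+", name) if t]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def average_score_by_segment(
    trust_scores: Dict[str, int], n: int = 1
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Average trust score per TLD and per n-gram."""
    by_tld = defaultdict(list)
    by_ngram = defaultdict(list)
    for domain, score in trust_scores.items():
        by_tld[tld_of(domain)].append(score)
        # set() so a word repeated within one domain is counted once
        for gram in set(word_ngrams(domain, n)):
            by_ngram[gram].append(score)
    return (
        {k: mean(v) for k, v in by_tld.items()},
        {k: mean(v) for k, v in by_ngram.items()},
    )


# Made-up trust scores, for illustration only
trust_scores = {
    "free-games-now.xyz": 1,
    "best-free-apps.xyz": 3,
    "example-news.com": 48,
}
tld_avg, ngram_avg = average_score_by_segment(trust_scores)
print(tld_avg)
print(ngram_avg["free"])
```

Many real domains run their words together with no separator, so a character-level N-Gram or a dictionary-based word splitter may surface more patterns than this hyphen-based split.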

We then used the TLD and N-Gram data with an AdWords (aka Google Ads) script by Frederick Vallaeys that automatically excludes GDN placements based on text criteria. If you are conducting this analysis with a programmatic display vendor, ask them if they have similar capabilities to blacklist certain TLDs or N-Grams from your placements.
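
To make "excludes placements based on text criteria" concrete, here is a small Python sketch of the matching logic only. It is not Frederick Vallaeys's script and does not call any Google Ads API; the function names, flagged TLDs, and flagged terms are hypothetical, and it assumes simple case-insensitive substring matching.

```python
from typing import Iterable, List, Set


def should_exclude(domain: str, bad_tlds: Set[str], bad_terms: List[str]) -> bool:
    """True if the domain ends in a flagged TLD or contains a flagged term.

    Assumes bad_tlds and bad_terms are already lowercase.
    """
    domain = domain.lower()
    tld = domain.rsplit(".", 1)[-1]
    if tld in bad_tlds:
        return True
    return any(term in domain for term in bad_terms)


def build_exclusion_list(
    domains: Iterable[str], bad_tlds: Set[str], bad_terms: Iterable[str]
) -> List[str]:
    """Return a sorted, de-duplicated list of lowercase domains to exclude."""
    tlds = {t.lower() for t in bad_tlds}
    terms = [t.lower() for t in bad_terms]
    return sorted({d.lower() for d in domains if should_exclude(d, tlds, terms)})


# Made-up inputs, for illustration only
seen_placements = [
    "Free-Games-Now.xyz",
    "example-news.com",
    "bestfreeapps.net",
    "example-news.com",
]
exclusions = build_exclusion_list(
    seen_placements, bad_tlds={"xyz"}, bad_terms=["free"]
)
print(exclusions)
```

Substring matching is deliberately broad, so short terms can catch legitimate sites; reviewing the resulting list before applying it as exclusions is a sensible precaution.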

Moving forward

This is by no means a complete solution in the fight against ad fraud. For the foreseeable future, hackers will continue to innovate their tactics, while display advertisers do their best to react. If you are concerned about where your ads are showing after running this analysis, consider running your ads only on a vetted list of high quality placements (aka managed placements campaigns). Additionally, adding a reCAPTCHA to your landing page can be an effective tactic to reduce bot conversions (this is particularly useful for lead generation campaigns).

Although advertisers have seen many benefits from the algorithmic machine learning innovations in audience targeting, you cannot always trust that display advertising platforms (even those as seemingly trustworthy as Google) will show your ads on trustworthy sites.

If you’re looking for more Power BI tips and tricks for digital marketing, check out our guide or hit us up!

