If you’re a client at Seer, or have been following any of our data analysis work, you’ve definitely heard or seen the term “n-gram” before. You can find n-grams in almost all of the big data work that we do. In this post we’ll break down what n-grams are, why they’re important for you as a business, and some potential use cases.
N-grams are words, or sequences of consecutive words, grouped by the number of words they contain. In outline:
- Unigrams: one word
- Bigrams: two words
- Trigrams: three words
- And so forth
To further explore n-grams, we can break down the sentence below:
“Hi there everyone, we’re exploring n-grams today.”
- Unigram: hi | there | everyone, etc…
- Bigram: hi there | exploring n-grams | etc…
- Trigram: hi there everyone | exploring n-grams today | etc…
Note that the words must appear consecutively, in their original order, to form an n-gram.
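To make this concrete, here is a minimal Python sketch that splits the example sentence into words and lists every unigram, bigram, and trigram. The tokenizing rule (lowercase everything, keep apostrophes and hyphens inside words, drop other punctuation) is an assumption for illustration, not necessarily how Seer’s own tooling tokenizes text.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Apostrophes (straight or curly) and hyphens stay inside words, so "we're"
    and "n-grams" each remain a single token; other punctuation is dropped.
    """
    return re.findall("[a-z0-9'\u2019-]+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "Hi there everyone, we're exploring n-grams today."
tokens = tokenize(sentence)

# Print each n-gram level in the same "a | b | c" style used above.
for n, label in [(1, "Unigrams"), (2, "Bigrams"), (3, "Trigrams")]:
    print(f"{label}: {' | '.join(ngrams(tokens, n))}")
```

The seven words produce seven unigrams, six bigrams, and five trigrams: a sentence of k words always yields k − n + 1 n-grams of length n, which is why each list is one item shorter than the last.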
N-grams are useful for turning written language into data and for breaking large volumes of search data into meaningful segments that help identify the root causes behind trends. That was a mouthful, so we’ll dig a bit deeper below.
N-grams cut out the noise from the data in your analyses.
The search industry is centered on search term data, so n-grams play a valuable role in cutting through the noise of thousands of rows of individual search terms. By aggregating data at the n-gram level, we can instantly pull out themes that would otherwise be impossible to identify when analyzing whole search terms one by one.
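Aggregating at the n-gram level amounts to splitting every search term into n-grams and adding that term’s metrics to each n-gram it contains. The sketch below assumes a search term report reduced to (search term, metrics) pairs; the rows and numbers are invented for illustration, and `aggregate_by_ngram` is a hypothetical helper, not a Seer or ad-platform API.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def aggregate_by_ngram(rows, n):
    """Sum each metric across all search terms that contain a given n-gram.

    rows: iterable of (search_term, metrics_dict) pairs.
    Each n-gram is counted once per search term, so a term that repeats a
    word does not have its cost added to that word twice.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for term, metrics in rows:
        for gram in set(ngrams(term.lower().split(), n)):
            totals[gram]["terms"] += 1  # how many search terms contain this n-gram
            for name, value in metrics.items():
                totals[gram][name] += value
    return {gram: dict(m) for gram, m in totals.items()}


# Hypothetical search term report rows (invented numbers, for illustration only).
rows = [
    ("running shoes", {"cost": 120.0, "conversions": 6}),
    ("trail running shoes", {"cost": 80.0, "conversions": 2}),
    ("cheap running shoes", {"cost": 50.0, "conversions": 0}),
    ("trail shoes sale", {"cost": 30.0, "conversions": 1}),
]

# Unigram roll-up, highest spend first.
unigrams = aggregate_by_ngram(rows, 1)
for gram, m in sorted(unigrams.items(), key=lambda kv: -kv[1]["cost"]):
    print(f"{gram:<10} terms={m['terms']:.0f} cost={m['cost']:.2f} conversions={m['conversions']:.0f}")
```

With these made-up rows, "shoes" rolls up all four search terms ($280 of spend), a theme no single row shows on its own. Passing `n=2` instead groups the same rows by two-word phrases such as "running shoes".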
From a business perspective, this means we can easily slice through the millions of data points you have to answer questions about how you operate and how your audience talks about your brand. Some questions n-grams can help answer are:
- What topics are our competitors speaking about in digital video content?
- Are we more likely to convert on a certain product organically in one location vs another?
- Where can we increase our paid spend confidently without losing efficiency?
- How did our recent TV spot affect our paid campaign targeting users in the Southeast US?
- What topics convert well for our paid users that we haven’t yet capitalized on for our organic users?
- What’s the difference between how users search for our brand vs how we talk about the brand on our website?
- How can I improve profitability and efficiency by eliminating as much waste from my paid search campaigns as possible?
Let’s take this last question and break down some real data from Seer’s paid search efforts to walk through an example:
A raw search term report looks something like this. Can you easily find inefficient searches or new investment opportunities in this screenshot?
There are 15 search terms in this screenshot. Would you be able to find inefficiency as easily if there were 100? What about 1,000? Or 100,000? Sounds like a job for n-grams.
We can see the same words, like premium and salesforce, occurring multiple times across separate search terms. This is easy enough to gather from the screenshot, but much harder when a typical analysis of ours contains at least 30,000 unique search terms. When we run this analysis for Seer (and our clients), we want to know:
- Which individual words does Seer’s target audience use the most?
- Which individual words are being used by people who are not Seer’s target audience?
- How do we know what content topics are driving more conversions?
- How do we know what topics drive conversions most efficiently?
- How do we know whether we should invest more resources in premium or salesforce keywords?
Answering these questions by hand means either spending significant time tediously counting each search term and aggregating its data, or making a guess about what your data might be telling you. At Seer, we don’t guess.
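Once spend and conversions are rolled up per word, questions like the ones above become a sort and a filter. Here is a minimal sketch, assuming a report of (search term, cost, conversions) rows with invented numbers: it ranks words by CPA and flags words that spent money without converting as candidate negative keywords. Treating zero-conversion spend as a proxy for searchers outside the target audience, and the `min_waste_cost` cutoff, are assumptions for illustration; the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (search term, cost, conversions) rows; invented for illustration.
ROWS = [
    ("running shoes", 120.0, 6),
    ("trail running shoes", 80.0, 2),
    ("cheap running shoes", 50.0, 0),
    ("free running shoes", 45.0, 0),
    ("trail shoes sale", 30.0, 1),
]


def unigram_totals(rows):
    """Sum cost and conversions for every word, counting each word once per term."""
    totals = defaultdict(lambda: {"cost": 0.0, "conversions": 0})
    for term, cost, conversions in rows:
        for word in set(term.lower().split()):
            totals[word]["cost"] += cost
            totals[word]["conversions"] += conversions
    return dict(totals)


def review(totals, min_waste_cost=40.0):
    """Return (words ranked by CPA, best first) and (zero-conversion words by spend).

    min_waste_cost is a hypothetical cutoff: a word must have spent at least this
    much with no conversions before it is flagged as a negative keyword candidate.
    """
    ranked = sorted(
        ((w, t["cost"] / t["conversions"]) for w, t in totals.items() if t["conversions"] > 0),
        key=lambda pair: pair[1],
    )
    waste = sorted(
        (w for w, t in totals.items() if t["conversions"] == 0 and t["cost"] >= min_waste_cost),
        key=lambda w: totals[w]["cost"],
        reverse=True,
    )
    return ranked, waste


ranked, waste = review(unigram_totals(ROWS))
for word, cpa in ranked:
    print(f"{word:<8} CPA ${cpa:.2f}")
print("Negative keyword candidates:", waste)
```

In this toy data "sale" tops the CPA ranking on a single conversion, which is a reminder that low-volume n-grams are noisy; a minimum-volume filter before acting on a ranking like this is usually worth considering.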
When we compare the n-gram premium with the n-gram salesforce, we see that salesforce costs us almost double the spend that premium does; however, our CPA (cost per acquisition) is almost 5x better for premium than for salesforce.
We’re getting conversions for premium at ⅕ the cost of salesforce. Maybe we should invest more heavily in content and ads related to premium. Maybe we should improve our landing pages and ad copy for salesforce.
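Those two ratios also say something about conversion volume. Since conversions equal spend divided by CPA, we can take "almost double" and "almost 5x" as exactly 2 and 5 (a simplifying assumption) and let S and C stand for premium’s spend and CPA:

```latex
\text{conv}_{\text{premium}} = \frac{S}{C}, \qquad
\text{conv}_{\text{salesforce}} = \frac{2S}{5C} = \frac{2}{5}\cdot\frac{S}{C}
\quad\Longrightarrow\quad
\frac{\text{conv}_{\text{premium}}}{\text{conv}_{\text{salesforce}}} = \frac{5}{2} = 2.5
```

Under those rounded ratios, premium would be producing roughly 2.5 times as many conversions as salesforce on about half the spend.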
With the right context, n-grams can help us better decide what delivers the most ROI and where to invest our dollars at a highly strategic level.
Without n-grams, you have to rely on cherry-picked terms to guess which content topics might yield conversions efficiently, and you risk allocating your budget on a hunch. You can miss the opportunity to capture hundreds of leads, or thousands of dollars in revenue, when you don’t aggregate your data and pull out insights quickly and efficiently using n-grams.
The beautiful thing about n-grams is that they are a tool, not an answer. If the data you’re dealing with is text, you have n-grams, which means everyone has them. What not everyone has is a team of experienced data strategists who can effectively leverage n-grams to slice your data and deliver insights that shape strategy and move the performance needle.
At Seer, we’re coming up with new ways to use n-grams every day to find value and drive strategy for our clients. If you’ve already got an overwhelming amount of text data, or are looking to leverage your search data in new ways both within and outside of PPC/SEO, Seer and our use of n-grams just might be able to help.
Asking the same questions at your organization? Find out if Seer can help.