Posted by PJ_Howland

This post was originally published on the STAT blog.



Even if it’s not making People’s Sexiest Person of the Year, the benefits of TF-IDF for SEO are too unreal not to share. Which is why I’m sharing a breakdown of how to apply this under-used, underappreciated SEO tactic to help you win major traffic.

Read on to learn how to leverage TF-IDF and tools like STAT to get insight into what your competitors are up to and what high-quality, relevant content you need to create for searchers.

But first…

TF-ID-What?

TF-IDF stands for term frequency-inverse document frequency. It’s a text analysis technique that signifies how important a word or phrase is to a document in a corpus (i.e. a blog post within a collection of pages on the internet), and Google’s algorithm includes a TF-IDF-like analysis when judging relevance. When used for SEO purposes, it helps you look beyond keywords and into relevant content that can reach your audience.

It does this by doing two different things very well.

First, it tells you how often a word appears in a document: this is the “term frequency” portion of TF-IDF. Then, it tells you how important that term is, and it does this with “inverse document frequency,” which weights down the words that appear in nearly every document (such as “the” or “a”) and scales up the rarer, more distinctive words. This adjusts for the fact that some words appear more often than others and contribute little relevance.
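If you like to see the math, here is a minimal sketch of one common textbook variant of the calculation, using nothing but the Python standard library. The function names, the whitespace tokenizer, and the three sample documents are my own assumptions for illustration; this is not how Google or any particular SEO tool computes its scores.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real tools do far more cleanup."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term in the document / total terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical three-document corpus, for illustration only.
docs = [
    "the coconut oil is good for the hair",
    "the olive oil is good for cooking",
    "the diaper rash cream",
]
scores = tf_idf(docs)

# "the" appears in every document, so its score collapses to zero,
# while "coconut" (found in only one document) outscores "oil" (found in two).
for term in ("the", "oil", "coconut"):
    print(term, round(scores[0][term], 3))
```

Other variants smooth the inverse document frequency so that words found in every document keep a small positive score; the ordering of terms is usually what matters, not the exact numbers.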

This weighting score tells us how relevant our keywords are, which is cool in and of itself, but especially handy when you apply it to SEO.

Why SEOs should care about TF-IDF

Google is smart. It knows when you’re trying to find the name of the song you can’t get out of your head and also when content isn’t valuable or is teetering on the verge of spammy. It does the latter through its all-powerful algorithm, which actually includes a TF-IDF-like analysis to ensure that content is relevant to the topic being searched.

This is key for SEOs, because while Google has gotten pretty good at thinking like a human, it’s still just an algorithm that is constantly refining its metrics. We know some of those metrics, but not all of them, and the ones we don’t know can include things like whether the right words are being used in an article.

So, whether you’re hoping to get a wider reach with your content, increase traffic without getting penalized by Google, or know what Google qualifies for the top SERP spot, you need to do what Google does — you need to get your TF-IDF on.

How to conduct a TF-IDF analysis

Basically, you need something to uncover the semantic relevance of words. TF-IDF will help you get an idea of what content Google values in sites that are performing really well.

Google understands the exact metrics that drive user engagement and does a pretty good job of indicating whether or not searchers are pleased with a result. And the good news? Leveraging TF-IDF can give you insight into those metrics. You’ll see what your competitors are up to, but also get an idea of what high-quality, relevant content you need to create for your searchers.

Let’s say I have a client in the health and wellness space who has content they want to rank for the keyword “coconut oil.” I know that traditional keyword research will reveal phrases like “coconut oil uses,” “benefits of coconut oil,” and “coconut oil for hair.” But I also want to know what topics are being commonly discussed by other high-ranking articles.

To find out, I simply take a curated keyword list and plug it into STAT to see the top ten pages that rank for my client’s keywords.

Once I have the top ten sites, I can export all that data from STAT into any tool of choice. I prefer to use Ryte, which analyses those sites’ pages for my keyword of choice (in this instance, “coconut oil”) and then calculates the TF-IDF value. The results will provide me with a way to compare the content on my client’s page with their competitors’ pages. From there, I can choose which keywords have a higher search volume and lower competition.

Standard keyword research shows us what people are searching for when they’re looking for coconut oil. What it won’t reveal are the related keywords and themes that your competitors are using in their articles. Which means your content, no matter how well-articulated, risks going unseen.

A TF-IDF analysis for “coconut oil,” on the other hand, will reveal words that are semantically related to your keyword.

Don’t be surprised if what it surfaces is a bit of a head-scratcher. Remember: TF-IDF won’t reveal words like your keyword. It’s going to reveal words associated with your keyword. For instance, my keyword, “coconut oil,” surfaced “diaper rash.” Who’da thunk?
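To make that idea concrete, here is a rough sketch of how associated terms could be surfaced by hand. It scores words that show up often on the top-ranking pages but rarely on unrelated pages, and it skips the words in the keyword itself. Everything here is an assumption for illustration: the `associated_terms` function, the background pages used for comparison, and the sample text are all made up, and Ryte’s actual method is not something I can vouch for.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def associated_terms(top_pages, background_pages, keyword, limit=4):
    """Rank terms that are frequent on top-ranking pages but rare elsewhere."""
    keyword_words = set(tokenize(keyword))
    all_pages = [tokenize(page) for page in top_pages + background_pages]
    n_docs = len(all_pages)

    # Document frequency across top-ranking and background pages together.
    doc_freq = Counter()
    for tokens in all_pages:
        doc_freq.update(set(tokens))

    # Sum each term's TF-IDF score over the top-ranking pages only.
    totals = Counter()
    for page in top_pages:
        tokens = tokenize(page)
        for term, count in Counter(tokens).items():
            if term in keyword_words:
                continue  # skip the keyword's own words
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            totals[term] += tf * idf

    return [term for term, _ in totals.most_common(limit)]


# Hypothetical page text, invented for this example.
top_pages = [
    "coconut oil soothes diaper rash on the skin",
    "use coconut oil for diaper rash and for hair",
    "the benefits of coconut oil for skin and hair",
]
background_pages = [
    "the best hiking trails for the summer",
    "how to use the new phone and the camera",
    "the history of the bicycle",
    "recipes for the weekend and for the holidays",
]

terms = associated_terms(top_pages, background_pages, "coconut oil")
print(terms)
```

With real pages you would want proper tokenization, multi-word phrases, and a much larger background corpus, but the shape of the calculation stays the same.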

But that’s what’s great about TF-IDF: it’s giving you the topics that Google has deemed important, so you get that extra layer of clarity to create better content and rank higher on the SERPs.

Next steps: Comparing results

Now that I know that “diaper rash” and “coconut oil” are (apparently) hot topics, my next step is to understand how this, and other highly scored TF-IDF results, compare against the top 10 URLs that my health and wellness client is competing against.

To figure this out, I simply plug my client’s competitor URLs into Ryte and with one click of a button, a detailed breakdown populates, depicting which of my highly-scored keywords my client’s competitors are using and how frequently they’re appearing in their content. This will help me determine my next steps.
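If you don’t have a tool handy, the same kind of breakdown can be roughed out in a few lines. This sketch counts how often each high-scoring term appears on each page and flags the terms that competitors use but the client doesn’t. The function names, page labels, and page text are hypothetical, and the plain substring counting is an assumption on my part, not a description of what Ryte does.

```python
def phrase_count(text, phrase):
    """Count non-overlapping occurrences of a phrase, ignoring case.

    This is plain substring matching, so "rash" would also match "trash".
    """
    return text.lower().count(phrase.lower())


def usage_breakdown(pages, terms):
    """Map each term to {page_name: number of occurrences}."""
    return {
        term: {name: phrase_count(text, term) for name, text in pages.items()}
        for term in terms
    }


def content_gaps(breakdown, client_name):
    """Terms the client never uses but at least one competitor does."""
    gaps = []
    for term, counts in breakdown.items():
        competitor_uses = sum(
            count for name, count in counts.items() if name != client_name
        )
        if counts.get(client_name, 0) == 0 and competitor_uses > 0:
            gaps.append(term)
    return gaps


# Hypothetical page text, invented for this example.
pages = {
    "client": "Coconut oil is great for hair and skin.",
    "competitor-1": "Coconut oil can soothe diaper rash. Diaper rash is common.",
    "competitor-2": "Try coconut oil for hair or for diaper rash.",
}

breakdown = usage_breakdown(pages, ["diaper rash", "hair"])
gaps = content_gaps(breakdown, "client")
print(breakdown)
print(gaps)
```

The gap list is a starting point for the content conversation, not a to-do list: a term only belongs on the page if it genuinely fits.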

For instance, if my client wants to up their chances of claiming the top spot on the SERPs, then they best be thinking about how to incorporate “diaper rash” in their content, since its high search volume clearly indicates that this is what the searchers are after.

Or, I can also suggest that my client goes after the low-hanging fruit, snagging only the high search volume keywords that have low competition. This would help them jump the ranks faster since they wouldn’t be battling it out with big brands over those high-ranking keywords. Either strategy is viable and will help my client improve their topic relevancy.

Remember, though: if it doesn’t make sense for your page, don’t use it. Once you understand why TF-IDF is surfacing a given set of words, find a content strategy that will authentically place those words into your article.

The benefits of TF-IDF

The cool thing is that once you’ve optimized your content, your keyword rankings usually go up very quickly. Better keyword rankings tend to mean more traffic, more traffic means more time-on-page and conversions, and that means more dollars in the pocket, which is something that I think we’re all here for.

Another added bonus of TF-IDF analysis is that it drastically levels up your featured snippet game. Like, a lot.

That’s because optimizing content with a TF-IDF analysis helps prime pages for featured snippets: the content ends up including the words and phrases that Google already wants to see before it will qualify a page. So, if you were also hoping to cross grabbing snippets off your bucket list this year, then TF-IDF may be your new bestie.

Take all that data and track it

Once I’m done optimizing my content with keywords I know that Google (and searchers) want to see, I can track their performance over time in STAT. This lets me know if my strategy has been successful. 

For instance, I can look at new pages to see how they are performing. If I want to do some additional A/B testing, I just segment my keywords into meaningful tags, such as keywords from my core keyword research and related terms (from my TF-IDF analysis). This helps me easily compare how the ranks differ between the pages I’ve optimized with TF-IDF results and the pages that I haven’t.
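As a rough illustration of that tag comparison, here is a small sketch that averages rank per tag from exported rows. The row layout, tag names, and rank numbers are all invented for the example and do not reflect STAT’s actual export format or any real results.

```python
from collections import defaultdict
from statistics import mean


def average_rank_by_tag(rows):
    """rows: iterable of (keyword, tag, rank). Returns {tag: mean rank}."""
    ranks = defaultdict(list)
    for _keyword, tag, rank in rows:
        ranks[tag].append(rank)
    return {tag: mean(values) for tag, values in ranks.items()}


# Hypothetical rows with made-up ranks, for illustration only.
rows = [
    ("coconut oil uses", "core", 8),
    ("benefits of coconut oil", "core", 12),
    ("coconut oil diaper rash", "tf-idf", 4),
    ("coconut oil for skin", "tf-idf", 6),
]

averages = average_rank_by_tag(rows)
print(averages)
```

A lower average means a better position, so comparing the two tags over time gives a quick read on whether the TF-IDF-driven terms are pulling their weight.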

And that’s it! 

Do you have any neat-o TF-IDF strategies? Share them with me in the comments, below!
