What is Topic Modeling?

Infographic: Topic Modeling for SEO (provided by MarketMuse.com)

The days of building content around specific keywords are long gone. Google has steadily updated its search algorithms with Hummingbird, the Vince change, and RankBrain. You can no longer rely on content that isn't relevant to its topic.

If you follow who ranks on page one of Google, you'll notice that the exact keyword you searched for often doesn't appear in the results Google displays.

To give you a better understanding, here is an article I came across on MarketMuse about topic modeling. The author did a fantastic job explaining what it means. Thumbs up on this detailed article.

Topic Modeling for SEO Explained

Posted by Rebecca Bakken at MarketMuse

Search engines like Google have a vested interest in concealing exactly how they rank content. But there's only so much you can hide in the information age. It's known that search algorithms use topic models to sort and prioritize the 130 trillion pages on the web. While we may never eliminate the unknowns of SEO, we can use what we do know to our advantage.

Search algorithms are getting increasingly intelligent, and the introduction of Hummingbird made that abundantly clear. Writing high-ranking content is no longer a matter of using as many keywords as possible. Instead, the algorithm employs models that measure the topical comprehensiveness of a page and then matches that page to a search query.

As a result, comprehensiveness has become a proxy that search engines use to measure content quality. Hummingbird also made it easier to study how Google ranks content. Fortunately for us, it provided a baseline for experimentation: comparing rankings before and after the update has proven insightful.

How Do We Know Search Engines Use Topic Modeling?

Using a nascent version of MarketMuse, Neil Patel's data science team assessed the rankings of nearly 10 million words of content. Their goal was to see how Hummingbird was prioritizing pages.

They found that the No. 1 factor for predicting high rankings is topic comprehensiveness. It’s even more important than page authority and backlinks.

Topic modeling is an integral part of search algorithms. We’re not the only ones who think so.

If you’ve got some time on your hands, you can read this extensive research paper by the University of Maryland. It details the many applications of topic models. These include query expansion, information retrieval, and search personalization.

It's difficult to envision an efficient way to produce SERPs without topic modeling. There are too many pages on the web. The ways people enter queries are vast and complex. And various on-page SEO factors are taken into account for each search.

So it’s safe to assume that topic modeling is a requirement for providing fast, relevant results.

That means content marketers should care. Here's why.

Developing a content strategy that produces results begins with understanding search engines. But you don’t need to be a data scientist to crack the code.

Later on, we'll discuss the history of topic modeling and explore the different types of algorithms for data-curious content marketers.

What SEOs Need to Know About Topic Models

Google’s algorithm utilizes topic modeling to prioritize pages that have deep coverage of a given subject. So the best way to rank is to:

  • make your content easily readable by the algorithm
  • create broad, in-depth coverage of your focus topics

Enter topic clusters. These are groups of content built around pillar pages that broadly cover your focus topics. Each pillar is, in turn, supported by and linked to pages that cover related subtopics in depth. Topic clusters give you breadth and depth in a way that both humans and search algos can easily navigate.
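
To make the pillar-and-cluster structure concrete, here is a minimal Python sketch that treats one cluster as a link graph and lists the links still missing for full interlinking. It assumes a simple rule: every cluster page links to its pillar, and the pillar links back to every cluster page. The page paths and the `missing_cluster_links` helper are hypothetical illustrations, not part of any SEO tool.

```python
def missing_cluster_links(pillar, cluster_pages, links):
    """Return (source, target) pairs a fully interlinked cluster still lacks.

    links maps each page path to the set of page paths it links to.
    Assumed rule: cluster page -> pillar and pillar -> cluster page.
    """
    missing = []
    for page in cluster_pages:
        if pillar not in links.get(page, set()):
            missing.append((page, pillar))   # cluster page does not link up
        if page not in links.get(pillar, set()):
            missing.append((pillar, page))   # pillar does not link down
    return missing


# Hypothetical site: one pillar page and three supporting pages.
links = {
    "/topic-modeling": {"/tf-idf", "/lsa"},
    "/tf-idf": {"/topic-modeling"},
    "/lsa": set(),
    "/lda": {"/topic-modeling"},
}
print(missing_cluster_links("/topic-modeling", ["/tf-idf", "/lsa", "/lda"], links))
# [('/lsa', '/topic-modeling'), ('/topic-modeling', '/lda')]
```

A real audit would pull the link graph from a crawl. The point here is only that "interlinked" can be checked mechanically.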

HubSpot did an experiment showing how interlinked topic clusters resulted in better SERP rankings. It’s likely that the clusters made HubSpot’s content easier to crawl. That allowed the algorithm to quickly find the pages relevant to a query.

The interlinked clusters signal both the breadth and depth of your coverage. They can lead users through a seamless journey that answers their questions; after all, that's the whole point of search. Getting those questions answered is called searcher task accomplishment. It contributes to higher rankings by increasing the authority of your pages: every time a user visits and doesn't bounce, that sends a positive signal to Google.

Topic Clusters and User Intent

Searcher task accomplishment is a relatively new industry term, but the concept itself is not new. It's what happens when you focus on satisfying user intent. You aim to provide as many answers as possible with your content in an easily navigable way. In other words, you create topic clusters.

Optimizing content around user intent involves some critical thinking. You need to determine the potential questions a person may ask. However, throwing stuff at the wall to see what sticks isn’t a great way to strategize. It’s a lesson many content marketers have learned the hard way.

Creating topic clusters is best done with a solution that thinks like a search algorithm. MarketMuse takes a page's keyword (what we prefer to call its focus topic) and analyzes tens of thousands of related pages. In doing so, it identifies subtopics, questions to answer, and user personas to address with your content. It does all this by using artificial intelligence to generate detailed content suggestions.

The software helps produce an outline of what your content should look like. It removes much of the guesswork for your writers. We’re not the only company that provides this value, but we do it better than the competition. For that, we have an ensemble of natural language processing algorithms, information theory, neural networks, and semantic analysis to thank.

Like Google, we’re not about to give away our trade secrets. But we can break down for you how more rudimentary topic modeling algorithms work. This should illuminate the differences between simpler tools and sophisticated software platforms.

Term Frequency-Inverse Document Frequency

Introduced in 1972, term frequency-inverse document frequency (TF-IDF) analyzes how often a keyword appears in one document compared to a set of documents. It counts the number of times a word or combination of words appears in a body of text. Then it determines how relevant the text is to that term by comparing it to a collection of other documents. But its greatest downfall is that it can't account for relationships, semantics, or syntax. That's why it's not very useful in today's complex world of SEO.
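
Here is a minimal Python sketch of the idea. It assumes one common weighting: term count divided by document length, multiplied by the natural log of the number of documents over the number of documents containing the term. Real tools use many variants of this formula. The `tf_idf` function and the sample documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score every term in every tokenized document.

    tf  = count of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "topic modeling for seo".split(),
    "keyword research for seo".split(),
    "topic clusters and pillar pages".split(),
]
scores = tf_idf(docs)
best = max(scores[0], key=scores[0].get)
print(best, round(scores[0][best], 3))  # modeling 0.275
```

"modeling" outscores "seo" in the first document because no other document uses it. The sketch also shows the weakness described above: every string is scored independently, so "seo" and "search engine optimization" would never be connected.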

Latent Semantic Analysis

Developed in 1988, latent semantic analysis (LSA) looks at the relationship between a set of documents and the terms they contain. Specifically, it produces a set of concepts related to the documents and their terms. LSA gets us closer to discovering synonyms and semantically related words. But it still can't identify relationships between topics.
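
The sketch below, in plain Python, shows the core LSA step: a truncated singular value decomposition (SVD) of a term-document matrix. To stay dependency-free, it finds the top-k right singular vectors as eigenvectors of AᵀA using orthogonal iteration. A real system would call a library routine such as `numpy.linalg.svd` instead. The tiny matrix, the function names, and the iteration count are assumptions chosen for illustration.

```python
import math
import random


def orthonormalize(vectors):
    """Gram-Schmidt: turn (assumed independent) vectors into an orthonormal set."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lsa_doc_vectors(term_doc, k, iterations=200, seed=0):
    """Map each document (column of term_doc) into a k-dimensional concept space.

    The top-k right singular vectors of A are the top-k eigenvectors of A^T A,
    found here by orthogonal iteration; sigma_i = sqrt(eigenvalue_i).
    Returns one row of V_k * Sigma_k per document.
    """
    n_docs = len(term_doc[0])
    gram = [[sum(row[i] * row[j] for row in term_doc) for j in range(n_docs)]
            for i in range(n_docs)]
    rng = random.Random(seed)
    basis = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(n_docs)]
                            for _ in range(k)])
    for _ in range(iterations):
        basis = orthonormalize([mat_vec(gram, v) for v in basis])
    sigmas = [math.sqrt(max(sum(a * b for a, b in zip(v, mat_vec(gram, v))), 0.0))
              for v in basis]
    return [[sigmas[i] * basis[i][j] for i in range(k)] for j in range(n_docs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


# Rows are terms, columns are documents (raw counts).
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 0],  # recipe
    [0, 0, 1, 1],  # flour
    [0, 0, 1, 1],  # oven
]
vecs = lsa_doc_vectors(A, k=2)
raw = [[row[j] for row in A] for j in range(4)]
print(round(cosine(raw[0], raw[1]), 3))    # 0.5
print(round(cosine(vecs[0], vecs[1]), 3))  # 1.0
```

The first two documents share only the word "engine", so their raw cosine similarity is 0.5. In the two-dimensional concept space they land on the same concept, because "car" and "automobile" both co-occur with "engine". That is the synonym-like behavior described above.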

Latent Dirichlet Allocation

This topic model, created in 2003, is commonly used to identify topic probabilities and the relationships between topics and subtopics. Latent Dirichlet Allocation (LDA) analyzes the connections between words in a corpus of documents. It's able to cluster words with similar meanings. As a result, you get a more in-depth semantic analysis than earlier topic models provide. LDA also uses Bayesian inference to identify terms related to a topic within a document, and it refines those estimates each time a new document is analyzed. Using LDA, you can get a reasonably precise assessment of the topics discussed in a document.
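
To show how LDA infers topics, here is a minimal collapsed Gibbs sampling sketch in plain Python. This is one common way to fit LDA; the original 2003 paper used variational inference instead. It is also a batch method rather than one that updates as each new document arrives. The hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] estimates P(topic k | doc d),
    topic_word[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    topics = range(n_topics)
    n_dk = [[0] * n_topics for _ in docs]            # tokens in doc d with topic k
    n_kw = [[0] * n_vocab for _ in topics]           # times word w has topic k
    n_k = [0] * n_topics                             # tokens with topic k
    z = []                                           # topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)              # random initial topic
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                # Full conditional P(z = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][wi] + beta)
                           / (n_k[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1
    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (n_kw[k][index[w]] + beta) / (n_k[k] + n_vocab * beta)
                   for w in vocab} for k in topics]
    return doc_topic, topic_word


corpus = [
    "seo keyword ranking seo search".split(),
    "search ranking keyword google seo".split(),
    "flour oven recipe bake flour".split(),
    "recipe bake oven flour sugar".split(),
]
doc_topic, topic_word = lda_gibbs(corpus, n_topics=2)
for k, dist in enumerate(topic_word):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

With a corpus this small, the sampler's output depends on the seed. Treat the printed top words as a rough illustration of how LDA groups co-occurring words, not as a stable result.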

Article curated; the original article is here.

This article supports the evidence we see with Authority Snooper. If you don't already have Authority Snooper, head over to Authority Snooper and pick up the software that shows the evidence behind topic-driven content.

Ever since LSI, the Authority Model, and the implementation of RankBrain (artificial intelligence), content created with the topic in mind has dominated the results.

The curated article above focuses on the technology Google implemented before RankBrain. The evidence is in the search results: focusing your topics around what Google expects from page-one candidates is critical.

The bottom line: find what is working in the search results and model it.

This entry was posted in Slick Time Savers.