{"id":821,"date":"2018-10-08T12:57:42","date_gmt":"2018-10-08T12:57:42","guid":{"rendered":"http:\/\/slicktimesavers.com\/blog\/?p=821"},"modified":"2019-08-11T11:11:49","modified_gmt":"2019-08-11T11:11:49","slug":"what-is-topic-modeling","status":"publish","type":"post","link":"https:\/\/slicktimesavers.com\/blog\/what-is-topic-modeling\/","title":{"rendered":"What is Topic Modeling?"},"content":{"rendered":"<p><a href=\"http:\/\/blog.marketmuse.com\/topic-modeling-for-seo-explained\"><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/cdn2.hubspot.net\/hubfs\/2942375\/Blog\/Topic%20Modeling%20for%20SEO%20Explained.jpg\" alt=\"Topic Modeling For SEO Infographic\" width=\"600px\" border=\"0\" \/><\/a><strong>The infographic provided by <a href=\"https:\/\/blog.marketmuse.com\/topic-modeling-for-seo-explained\">Marketmuse.com<\/a><br \/>\n<\/strong><\/p>\n<h3>Long has creating content around specific keywords gone.\u00a0 Google has increasingly updated their search engine algorithms with Hummingbird, Vincent&#8217;s Change and RankBrain.\u00a0 You can no longer depend on creating content without it being topic relevant.<\/h3>\n<h3>If you are following who ranks on page one of Google many times the specific keyword your searching for is not contained in the results being displayed in the Google search results.<\/h3>\n<h3>To give you a better understanding I ran across this article on MarketMuse about topic modeling.\u00a0 The author did a fantastic job addressing what this means. 
A thumbs up on this detailed article.<\/h3>\n<h1><span id=\"hs_cos_wrapper_name\" class=\"hs_cos_wrapper hs_cos_wrapper_meta_field hs_cos_wrapper_type_text\" data-hs-cos-general-type=\"meta_field\" data-hs-cos-type=\"text\">Topic Modeling for SEO Explained<\/span><\/h1>\n<div id=\"hubspot-author_data\" class=\"hubspot-editable\" data-hubspot-form-id=\"author_data\" data-hubspot-name=\"Blog Author\"><span class=\"hs-author-label\">Posted by<\/span> <a class=\"author-link\" href=\"https:\/\/blog.marketmuse.com\/author\/rebecca\">Rebecca Bakken<\/a> at Market Muse<\/div>\n<p>Search engines like Google have a vested interest in concealing exactly how they rank content. But there\u2019s only so much you can hide in the information age. It\u2019s known that search algorithms use topic models to sort and prioritize the <a href=\"https:\/\/searchengineland.com\/googles-search-indexes-hits-130-trillion-pages-documents-263378\" target=\"_blank\" rel=\"noopener noreferrer\">130 trillion pages<\/a> on the web. While we may never eliminate the unknowns of SEO, we can use what we do know to our advantage.<\/p>\n<p><span data-offset-key=\"6ks71-0-0\">Search algorithms are getting <\/span><span class=\"adverb\"><span data-offset-key=\"6ks71-1-0\">increasingly<\/span><\/span><span data-offset-key=\"6ks71-2-0\"> intelligent. The introduction of Hummingbird made that <\/span><span class=\"adverb\"><span data-offset-key=\"6ks71-3-0\">abundantly<\/span><\/span><span data-offset-key=\"6ks71-4-0\"> clear. <\/span> Writing <a href=\"https:\/\/blog.marketmuse.com\/relevant-terms-are-the-2-ranking-factor-on-google\" target=\"_blank\" rel=\"noopener noreferrer\">high-ranking content<\/a> is no longer a matter of using as many keywords as possible. Instead, the algorithm employs\u00a0models that measure the topical comprehensiveness of a page. 
It then matches the page to a <a href=\"https:\/\/blog.marketmuse.com\/the-hidden-success-metric-in-seo-and-content-marketing\">search query<\/a>.<\/p>\n<p>As a result, comprehensiveness has become a proxy by which search engines measure content quality. Moreover, Hummingbird made it easier to determine how Google ranks\u00a0content.\u00a0<span class=\"adverb\"><span data-offset-key=\"ak720-0-0\">Fortunately<\/span><\/span><span data-offset-key=\"ak720-1-0\"> for us, it provided a baseline for experimentation. Comparing rankings before and after the update has proven to be insightful.<\/span><\/p>\n<h2>How Do We Know Search Engines Use Topic Modeling?<\/h2>\n<p><span class=\"hardreadability\"><span data-offset-key=\"b9ve0-0-0\">Using a nascent version of MarketMuse, Neil Patel&#8217;s data science team assessed the rankings of <\/span><\/span><span class=\"adverb\"><span data-offset-key=\"b9ve0-1-0\">nearly<\/span><\/span><span class=\"hardreadability\"><span data-offset-key=\"b9ve0-2-0\"> 10 million words of content<\/span><\/span><span data-offset-key=\"b9ve0-3-0\">. Their goal was to see <\/span><a href=\"https:\/\/neilpatel.com\/blog\/how-google-hummingbird-really-works-what-we-learned-by-analyzing-9-93-million-words-of-content\/\"><span data-offset-key=\"b9ve0-4-0\">how Hummingbird was prioritizing pages<\/span><\/a><span data-offset-key=\"b9ve0-5-0\">.\u00a0<\/span><\/p>\n<p><span data-offset-key=\"b9ve0-5-0\">They found that the No. 1 factor for predicting high rankings is topic comprehensiveness. It&#8217;s even more important than page authority and backlinks.<\/span><\/p>\n<p>Topic modeling is an integral part of search algorithms. We\u2019re not the only ones who think so.<\/p>\n<p>If you\u2019ve got some time on your hands, you can read this extensive <a href=\"https:\/\/mimno.infosci.cornell.edu\/papers\/2017_fntir_tm_applications.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">research paper by the University of Maryland<\/a>. 
It details the many applications of topic models. These include query expansion, information retrieval, and search personalization.<\/p>\n<p><span data-offset-key=\"btms2-0-0\">It\u2019s difficult to envision a way to <\/span><span class=\"adverb\"><span data-offset-key=\"btms2-1-0\">efficiently<\/span><\/span><span data-offset-key=\"btms2-2-0\"> produce SERPs without topic modeling. There are too many pages on the web. The ways in which queries <\/span><span class=\"passivevoice\"><span data-offset-key=\"btms2-3-0\">are entered<\/span><\/span><span data-offset-key=\"btms2-4-0\"> are varied and complex. There are various <\/span><a href=\"https:\/\/blog.marketmuse.com\/how-to-plan-your-editorial-calendar-with-marketmuse\"><span data-offset-key=\"btms2-5-0\">on-page SEO<\/span><\/a><span data-offset-key=\"btms2-6-0\"> factors taken into account for each search.<\/span><\/p>\n<div class=\"\" data-block=\"true\" data-editor=\"17kf4\" data-offset-key=\"4tp27-0-0\">\n<p class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"4tp27-0-0\"><span class=\"hardreadability\"><span data-offset-key=\"4tp27-0-0\">So it&#8217;s safe to assume that topic modeling is a <\/span><\/span><span class=\"complexword\"><span data-offset-key=\"4tp27-1-0\">requirement<\/span><\/span><span class=\"hardreadability\"><span data-offset-key=\"4tp27-2-0\"> for providing fast, <\/span><\/span><a href=\"https:\/\/blog.marketmuse.com\/why-measure-keyword-relevance\"><span data-offset-key=\"4tp27-3-0\">relevant results<\/span><\/a><span data-offset-key=\"4tp27-4-0\">.<\/span><\/p>\n<p class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"4tp27-0-0\"><span data-offset-key=\"83jbr-0-0\">That means content marketers should care. 
Here&#8217;s why.<\/span><span data-offset-key=\"fk7cn-0-0\"><br \/>\n<\/span><\/p>\n<\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"bih2d-0-0\"><span data-offset-key=\"bih2d-0-0\">Developing a content strategy that produces results begins with understanding search engines. But you don\u2019t need to be a data scientist to crack the code.<br \/>\n<\/span><\/div>\n<div class=\"\" data-block=\"true\" data-editor=\"17kf4\" data-offset-key=\"1fbbr-0-0\">\n<p class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"1fbbr-0-0\"><span data-offset-key=\"1fbbr-0-0\">Later on, we\u2019ll discuss the history of topic modeling. Then we&#8217;ll explore the different types of algorithms for data-curious content marketers.<\/span><\/p>\n<\/div>\n<h2>What SEOs Need to Know About Topic Models<\/h2>\n<div class=\"\" data-block=\"true\" data-editor=\"17kf4\" data-offset-key=\"7r629-0-0\">\n<p class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"7r629-0-0\"><span class=\"hardreadability\"><span data-offset-key=\"7r629-0-0\">Google\u2019s algorithm utilizes <\/span><\/span><a href=\"https:\/\/blog.marketmuse.com\/why-topical-authority-is-the-new-seo-optimization-strategy\"><span data-offset-key=\"7r629-1-0\">topic modeling to prioritize pages<\/span><\/a><span class=\"hardreadability\"><span data-offset-key=\"7r629-2-0\"> that have deep coverage of a given subject<\/span><\/span><span data-offset-key=\"7r629-3-0\">. 
So the best way to rank is to:<\/span><\/p>\n<\/div>\n<div class=\"\" data-block=\"true\" data-editor=\"17kf4\" data-offset-key=\"eesgs-0-0\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"eesgs-0-0\"><span data-offset-key=\"eesgs-0-0\">make your content <\/span><span class=\"adverb\"><span data-offset-key=\"eesgs-1-0\">easily<\/span><\/span><span data-offset-key=\"eesgs-2-0\"> readable by the algorithm<\/span><\/li>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" data-offset-key=\"eesgs-0-0\">create in-depth, broad coverage of your focus topics<\/li>\n<\/ul>\n<\/div>\n<p><span data-offset-key=\"9rtmk-0-0\">Enter topic clusters. <\/span><span class=\"hardreadability\"><span data-offset-key=\"9rtmk-1-0\">These are groups of content that contain pillar pages that <\/span><\/span><span class=\"adverb\"><span data-offset-key=\"9rtmk-2-0\">broadly<\/span><\/span><span class=\"hardreadability\"><span data-offset-key=\"9rtmk-3-0\"> cover your focus topics<\/span><\/span><span data-offset-key=\"9rtmk-4-0\">. They are, in turn, supported and linked to by pages that <\/span><span class=\"adverb\"><span data-offset-key=\"9rtmk-5-0\">deeply<\/span><\/span><span data-offset-key=\"9rtmk-6-0\"> cover topics related to your pillars. Topic clusters give you breadth and depth in a way that\u2019s <\/span><span class=\"adverb\"><span data-offset-key=\"9rtmk-7-0\">easily<\/span><\/span><span data-offset-key=\"9rtmk-8-0\"> navigated by both humans and search algos. <\/span><\/p>\n<p><span class=\"veryhardreadability\"><span data-offset-key=\"9rtmk-9-0\">HubSpot did an experiment showing how interlinked <\/span><\/span><a href=\"https:\/\/research.hubspot.com\/topic-clusters-seo\"><span data-offset-key=\"9rtmk-10-0\">topic clusters resulted in better SERP rankings<\/span><\/a><span data-offset-key=\"9rtmk-11-0\">. 
It&#8217;s likely that the clusters made HubSpot\u2019s content easier to crawl<\/span><span data-offset-key=\"9rtmk-13-0\">. That allowed the algorithm to <\/span><span class=\"adverb\"><span data-offset-key=\"9rtmk-14-0\">quickly<\/span><\/span><span data-offset-key=\"9rtmk-15-0\"> find the pages relevant to a query. <\/span><\/p>\n<p><span data-offset-key=\"9rtmk-15-0\">The interlinked clusters signal the breadth and depth of a topic. They can lead users through a seamless journey that answers their questions. After all, that\u2019s the whole point of <\/span><a href=\"https:\/\/blog.marketmuse.com\/how-to-optimize-your-content-for-local-searcher-intent\"><span data-offset-key=\"9rtmk-16-0\">search<\/span><\/a><span data-offset-key=\"9rtmk-17-0\">. Getting those questions answered <\/span><span class=\"passivevoice\"><span data-offset-key=\"9rtmk-18-0\">is called<\/span><\/span> <a href=\"https:\/\/www.searchenginejournal.com\/goal-first-searcher-task-accomplishment\/240777\/\"><span data-offset-key=\"9rtmk-20-0\">searcher task accomplishment<\/span><\/a><span data-offset-key=\"9rtmk-21-0\">.\u00a0It contributes to higher ranking by increasing the authority of your pages. Every time a user visits and doesn\u2019t bounce, that sends a positive signal to Google.<\/span><\/p>\n<h2>Topic Clusters and User Intent<\/h2>\n<p><span data-offset-key=\"1kqkm-0-0\">Searcher task accomplishment is a <\/span><span class=\"adverb\"><span data-offset-key=\"1kqkm-1-0\">relatively<\/span><\/span><span data-offset-key=\"1kqkm-2-0\"> new industry term. But the concept itself is not new. It\u2019s what happens when you focus on satisfying user intent. You aim to provide as many answers as possible with your content in an <\/span><span class=\"adverb\"><span data-offset-key=\"1kqkm-3-0\">easily<\/span><\/span><span data-offset-key=\"1kqkm-4-0\"> navigable way. In other words, creating topic clusters. 
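The pillar-and-cluster structure described above is, at bottom, just a link graph, so it can be sketched in a few lines of Python. The page URLs and the well-formedness rule below are illustrative assumptions for this post, not a HubSpot or Google specification.

```python
# A topic cluster as a tiny link graph (hypothetical URLs, for illustration).
# The pillar page broadly covers the focus topic; cluster pages cover
# subtopics in depth and all link back to the pillar.
cluster = {
    "/topic-modeling": [            # pillar page
        "/what-is-tf-idf",
        "/what-is-lsa",
        "/what-is-lda",
    ],
    "/what-is-tf-idf": ["/topic-modeling"],
    "/what-is-lsa": ["/topic-modeling"],
    "/what-is-lda": ["/topic-modeling"],
}

def is_well_formed_cluster(graph, pillar):
    """Check the interlinking rule: every page the pillar links to
    links back to the pillar."""
    return all(pillar in graph.get(page, []) for page in graph[pillar])

ok = is_well_formed_cluster(cluster, "/topic-modeling")
```

A cluster page that never links back to its pillar breaks the structure the HubSpot experiment relied on, and this check would flag it.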
<\/span><\/p>\n<p><span data-offset-key=\"6om2v-0-0\">Optimizing content around user intent involves some critical thinking. You need to determine the potential questions a person may ask. <\/span><span class=\"complexword\"><span data-offset-key=\"6om2v-1-0\">However<\/span><\/span><span data-offset-key=\"6om2v-2-0\">, throwing stuff at the wall to see what sticks isn\u2019t a great way to <\/span><span class=\"complexword\"><span data-offset-key=\"6om2v-3-0\">strategize<\/span><\/span><span data-offset-key=\"6om2v-4-0\">. It&#8217;s a lesson many content marketers have learned the hard way.<\/span><\/p>\n<p><span data-offset-key=\"3munc-0-0\">Creating topic clusters is best done with a solution that thinks like a search algorithm. MarketMuse takes a keyword, which we prefer to call a focus topic, for one page. Then it analyzes tens of thousands of other related pages. <\/span><span class=\"hardreadability\"><span data-offset-key=\"3munc-1-0\">In doing so, it identifies subtopics, questions to answer, and user personas to address with your content<\/span><\/span><span data-offset-key=\"3munc-2-0\">. It does all this by using <\/span><a href=\"https:\/\/blog.marketmuse.com\/how-ai-is-changing-seo-and-content-strategy\" target=\"_blank\" rel=\"noopener noreferrer\"><span data-offset-key=\"3munc-3-0\">artificial intelligence<\/span><\/a><span data-offset-key=\"3munc-4-0\"> to generate detailed content suggestions. <\/span><\/p>\n<p><span data-offset-key=\"507gj-0-0\">The software helps produce an outline of what your content should look like. It removes much of the guesswork for your writers. We\u2019re not the only company that provides this value, but we do it better than the competition. 
<\/span><span class=\"veryhardreadability\"><span data-offset-key=\"507gj-1-0\">For that, we have an ensemble of natural language processing algorithms, information theory, neural networks, and semantic analysis to thank<\/span><\/span><span data-offset-key=\"507gj-2-0\">. <\/span><\/p>\n<p>Like Google, we\u2019re not about to give away our trade secrets. But we can break down for you how more rudimentary topic modeling algorithms work. This should illuminate the differences between simpler tools and sophisticated software platforms.<\/p>\n<h2>Term Frequency-Inverse Document Frequency<\/h2>\n<p><span class=\"hs_cos_wrapper hs_cos_wrapper_meta_field hs_cos_wrapper_type_rich_text\" data-hs-cos-general-type=\"meta_field\" data-hs-cos-type=\"rich_text\"><span class=\"hardreadability\"><span data-offset-key=\"9fcr5-0-0\">Introduced in 1972, <\/span><\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Tf%E2%80%93idf\"><span data-offset-key=\"9fcr5-1-0\">TF-IDF<\/span><\/a><span class=\"hardreadability\"><span data-offset-key=\"9fcr5-2-0\"> analyzes keyword frequency in a document compared to a set of documents<\/span><\/span><span data-offset-key=\"9fcr5-3-0\">. It measures the number of times a word or combination of words appears in a body of text. <\/span><span class=\"hardreadability\"><span data-offset-key=\"9fcr5-4-0\">Then it determines the degree of relevance the text has to that term by comparing it to a collection of other documents<\/span><\/span><span data-offset-key=\"9fcr5-5-0\">.\u00a0<span class=\"hardreadability\">But its greatest shortcoming is that it can\u2019t account for relationships, semantics, or syntax<\/span>. 
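To make the mechanics concrete, the scoring just described can be sketched in a few lines of Python. The tiny corpus, whitespace tokenization, and the particular logarithmic IDF variant below are illustrative assumptions; real ranking systems use far more elaborate weighting.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Score `term` in `doc` against `corpus` (a list of token lists).

    TF rewards terms that are frequent in this document; IDF discounts
    terms that appear in many documents across the corpus.
    """
    docs_with_term = sum(1 for d in corpus if term in d)
    if docs_with_term == 0:
        return 0.0
    tf = Counter(doc)[term] / len(doc)
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

# Toy corpus: three tiny "documents" as token lists (illustrative only).
corpus = [
    "topic models rank pages by topic coverage".split(),
    "keywords alone no longer rank pages".split(),
    "search engines crawl pages".split(),
]

# "topic" is frequent in doc 0 and rare corpus-wide, so it scores high
# there; "pages" appears in every document, so its IDF drives it to zero.
score_topic = tf_idf("topic", corpus[0], corpus)
score_pages = tf_idf("pages", corpus[0], corpus)
```

Note how the ubiquitous word scores zero while the distinctive one stands out; that is the entire intuition, and also the limitation, since nothing about meaning or word relationships is modeled.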
That&#8217;s why it&#8217;s not very useful in today&#8217;s complex world of SEO.<\/span><br \/>\n<\/span><\/p>\n<h2>Latent Semantic Analysis<\/h2>\n<p><span class=\"hs_cos_wrapper hs_cos_wrapper_meta_field hs_cos_wrapper_type_rich_text\" data-hs-cos-general-type=\"meta_field\" data-hs-cos-type=\"rich_text\"><span class=\"hardreadability\"><span data-offset-key=\"ej6of-0-0\">Developed in 1988, <\/span><\/span><a href=\"http:\/\/lsa.colorado.edu\/whatis.html\"><span data-offset-key=\"ej6of-1-0\">latent semantic analysis<\/span><\/a><span class=\"hardreadability\"><span data-offset-key=\"ej6of-2-0\">\u00a0(LSA) looks at the relationship between a set of documents and the terms they contain<\/span><\/span><span data-offset-key=\"ej6of-3-0\">. <\/span><span class=\"adverb\"><span data-offset-key=\"ej6of-4-0\">Specifically<\/span><\/span><span data-offset-key=\"ej6of-5-0\">, it produces a set of concepts related to the document and terms. LSA gets us closer to discovering synonyms and <\/span><span class=\"adverb\"><span data-offset-key=\"ej6of-6-0\">semantically<\/span><\/span><span data-offset-key=\"ej6of-7-0\"> related words. But it still can\u2019t identify relationships between topics.<\/span><br \/>\n<\/span><\/p>\n<h2>Latent Dirichlet Allocation<\/h2>\n<p><span id=\"hs_cos_wrapper_post_body\" class=\"hs_cos_wrapper hs_cos_wrapper_meta_field hs_cos_wrapper_type_rich_text\" data-hs-cos-general-type=\"meta_field\" data-hs-cos-type=\"rich_text\"><span class=\"veryhardreadability\"><span data-offset-key=\"4cnlr-0-0\">This topic model, created in 2003, is <\/span><\/span><span class=\"adverb\"><span data-offset-key=\"4cnlr-1-0\">commonly<\/span><\/span><span class=\"veryhardreadability\"><span data-offset-key=\"4cnlr-2-0\"> used to identify topical probability and relationships between topic and subtopics<\/span><\/span><span data-offset-key=\"4cnlr-3-0\">. 
<\/span><a href=\"http:\/\/www.jmlr.org\/papers\/volume3\/blei03a\/blei03a.pdf\"><span data-offset-key=\"4cnlr-4-0\">Latent Dirichlet Allocation <\/span><\/a><span class=\"hardreadability\"><span data-offset-key=\"4cnlr-5-0\">(LDA) analyzes the connections between words in a corpus of documents<\/span><\/span><span data-offset-key=\"4cnlr-6-0\">. It&#8217;s able to cluster words with similar meaning. As a result, you get a more in-depth semantic analysis than earlier topic models provide. <\/span><span class=\"hardreadability\"><span data-offset-key=\"4cnlr-7-0\">LDA also utilizes a Bayesian inference model to identify terms related to a topic within a document<\/span><\/span><span data-offset-key=\"4cnlr-8-0\">. It refines those estimates each time a new document <\/span><span class=\"passivevoice\"><span data-offset-key=\"4cnlr-9-0\">is analyzed<\/span><\/span><span data-offset-key=\"4cnlr-10-0\">. Using LDA, you can get a <\/span><span class=\"adverb\"><span data-offset-key=\"4cnlr-11-0\">reasonably<\/span><\/span><span data-offset-key=\"4cnlr-12-0\"> precise assessment of the topics discussed in a document. 
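For the data-curious, the procedure described above can be sketched end-to-end as a toy collapsed Gibbs sampler in plain Python. This is an educational approximation, not Google's or MarketMuse's implementation; the four-document corpus, hyperparameters, and iteration count are arbitrary assumptions.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics=2, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on `docs` (lists of word tokens).

    Returns a per-topic probability distribution for each document.
    Educational sketch only; real implementations add hyperparameter
    optimization and far better scaling.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size

    # Count tables: random initial topic per word, then tally.
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]
            doc_topic[di][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1

    for _ in range(n_iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]
                # Remove this word's current assignment from the counts...
                doc_topic[di][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # ...then resample its topic from the conditional posterior.
                weights = [
                    (doc_topic[di][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = t
                doc_topic[di][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1

    # Smooth and normalize document-topic counts into distributions.
    return [[(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
            for row in doc_topic]

# Two tiny hypothetical themes: SEO words vs. baking words.
docs = [
    "seo rank pages keywords".split(),
    "rank pages seo links".split(),
    "bake flour oven bread".split(),
    "bread oven flour recipe".split(),
]
theta = lda_gibbs(docs)
```

On this corpus the two disjoint vocabularies usually collapse into separate topics, which is exactly the clustering-by-co-occurrence behavior described above.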
<\/span><\/span><\/p>\n<h3>Article curated; the original article is <a href=\"https:\/\/blog.marketmuse.com\/topic-modeling-for-seo-explained\">here<\/a><\/h3>\n<h3>This article supports the evidence we see with Authority Snooper.\u00a0 If you don&#8217;t already have Authority Snooper, head over to <a href=\"https:\/\/www.slicktimesavers.com\/authoritysnooper\/\">Authority Snooper<\/a> and pick up the software that shows the evidence backing up topic-driven content.<\/h3>\n<h3>For years, from LSI through the authority model and the implementation of RankBrain (artificial intelligence), content created with the topic in mind has dominated the results.<\/h3>\n<h3>The curated article above focuses on the technology Google was implementing prior to RankBrain.\u00a0 The evidence is in the search results, and focusing your topics around what Google expects from page-one candidates is critical.<\/h3>\n<h3>The bottom line: find what is working in the search results and model it.<\/h3>\n","protected":false},"excerpt":{"rendered":"<p>The infographic provided by Marketmuse.com The days of creating content around specific keywords alone are long gone.\u00a0 Google has steadily updated its search algorithms with Hummingbird, the Vince update, and RankBrain.\u00a0 You can no longer count on content ranking unless it is topically relevant. 
If you are following who ranks on page one of<span class=\"more-link\"><a href=\"https:\/\/slicktimesavers.com\/blog\/what-is-topic-modeling\/\">Continue Reading<\/a><\/span><\/p>\n","protected":false},"author":2,"featured_media":812,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["entry","author-slick","post-821","post","type-post","status-publish","format-standard","has-post-thumbnail","category-slick-time-savers"],"_links":{"self":[{"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/posts\/821","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/comments?post=821"}],"version-history":[{"count":4,"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/posts\/821\/revisions"}],"predecessor-version":[{"id":878,"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/posts\/821\/revisions\/878"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/media\/812"}],"wp:attachment":[{"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/media?parent=821"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/categories?post=821"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/slicktimesavers.com\/blog\/wp-json\/wp\/v2\/tags?post=821"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}