How To Create A Bag of Words Cloud to Optimize Landing Pages Ranking on 2nd Page

    What is Bag of Words

    Bag of Words (BoW) is a technique commonly used in natural language processing and information retrieval. In this model, a text (such as a sentence or a document) is represented as an unordered collection (a multiset, or “bag”) of its words: grammar and word order are discarded, but the number of times each word occurs is kept.

    Features of Bag of Words

    Tokenization: Breaks the text into individual words, or tokens.

    Vocabulary Building: Builds a vocabulary of unique words from the entire set of documents.

    Vectorization: Each document is represented as a vector where each dimension corresponds to a word in the vocabulary. The value in each dimension can be:

    • Binary (0 or 1), indicating the absence or presence of the word.
    • The count of the number of times the word appears in the document.
    • A weighted value using methods like TF-IDF (Term Frequency-Inverse Document Frequency).
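
    The three kinds of vector values listed above can be sketched with the Python standard library alone. The toy corpus and the function names are illustrative assumptions, and the TF-IDF shown is the plain unsmoothed form; libraries such as scikit-learn apply their own smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus; each document is already tokenized.
docs = [
    ["seo", "tips", "for", "seo", "beginners"],
    ["content", "tips", "for", "writers"],
]

# Vocabulary: the sorted set of unique words across all documents.
vocab = sorted({word for doc in docs for word in doc})


def binary_vector(doc):
    """1 if the vocabulary word occurs in the document, else 0."""
    present = set(doc)
    return [1 if word in present else 0 for word in vocab]


def count_vector(doc):
    """Number of times each vocabulary word occurs in the document."""
    counts = Counter(doc)
    return [counts[word] for word in vocab]


def tfidf_vector(doc):
    """Term frequency times plain (unsmoothed) inverse document frequency."""
    counts = Counter(doc)
    vector = []
    for word in vocab:
        tf = counts[word] / len(doc)
        df = sum(1 for d in docs if word in d)  # documents containing the word
        idf = math.log(len(docs) / df)
        vector.append(tf * idf)
    return vector


print(vocab)
print(binary_vector(docs[0]))
print(count_vector(docs[0]))
print(tfidf_vector(docs[0]))
```

    Words that occur in every document (here “for” and “tips”) get a TF-IDF weight of zero, which is why TF-IDF is often preferred over raw counts for spotting distinctive keywords.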

    How does it help with SEO?

    Keyword Analysis: The BoW model can help in analyzing the keyword density in a document or a set of documents. It helps in identifying the most important and relevant keywords that a website should focus on to improve its visibility on search engines.
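
    As a rough illustration of keyword density, the sketch below reports each word's share of all tokens as a percentage. The function name `keyword_density` and the sample tokens are assumptions made for this example.

```python
from collections import Counter


def keyword_density(tokens, top_n=5):
    """Return the top_n words with their share of all tokens, in percent."""
    total = len(tokens)
    if total == 0:
        return []
    counts = Counter(tokens)
    return [
        (word, round(100 * count / total, 1))
        for word, count in counts.most_common(top_n)
    ]


# Hypothetical, already-cleaned tokens from a page.
tokens = "landing page seo guide landing page tips".split()
print(keyword_density(tokens, top_n=2))
# → [('landing', 28.6), ('page', 28.6)]
```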

    Content Optimization: By analyzing the content through the BoW model, one can identify the gaps in the content and optimize it by incorporating the relevant keywords, thereby improving the search engine rankings.

    Competitor Analysis: One can analyze the content of competitors to identify the keywords they are targeting. This information can be used to modify the content strategy to compete better in the search engine rankings.

    Topic Modeling: The BoW model is used in topic modeling, which helps in identifying the main topics discussed in a set of documents. This information can be used to create content that is relevant and interesting to the target audience.

    Content Recommendation: The BoW model can be used to develop content recommendation systems. By analyzing the content through the BoW model, one can recommend similar content to the users, enhancing the user experience and increasing user engagement.
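
    One common way to find “similar content” from BoW vectors is cosine similarity between word-count vectors. The following is a minimal sketch under that assumption; the page titles, tokens, and function names are made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the word-count vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_tokens, pages):
    """Return the title of the page whose tokens are closest to the query."""
    return max(pages, key=lambda title: cosine_similarity(query_tokens, pages[title]))


# Hypothetical pages, keyed by title.
pages = {
    "seo basics": ["seo", "keyword", "ranking"],
    "baking bread": ["flour", "yeast", "oven"],
}
print(most_similar(["keyword", "ranking", "tips"], pages))
# → seo basics
```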

    Meta Data Optimization: Using BoW, it is possible to optimize the meta data (like meta descriptions and tags) of web pages to include relevant keywords, which can help in improving the search engine rankings.

    Main Objective

    The main objective of creating a Bag of Words (BoW) cloud is to enhance the organic visibility of a web page through a refined keyword and content strategy.

    The Major SEO Tasks that can Utilize a Bag of Words Cloud:

    • Keyword Optimization
    • Content Quality Enhancement
    • Improved Meta Data
    • Semantic Analysis
    • Competitor Analysis
    • Enhanced User Experience


    1. Data Collection

    Web Scraping: Extract content from the target URL and competitor URLs. Tools like BeautifulSoup or Scrapy in Python can be helpful.
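
    To show what “extracting content” means once a page's HTML has been downloaded, here is a dependency-free sketch that pulls the text out of `<p>` elements. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra installs; the class and function names, and the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []

    def _flush(self):
        # Join the buffered pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # also covers pages that omit </p>
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraph_text(html):
    """Return the text of all paragraphs in the HTML, joined by spaces."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return " ".join(parser.paragraphs)


sample_html = (
    "<html><head><style>p{color:red}</style></head><body><h1>Title</h1>"
    "<p>First <b>bold</b> paragraph.</p><script>var x = 1;</script>"
    "<p>Second one.</p></body></html>"
)
print(extract_paragraph_text(sample_html))
# → First bold paragraph. Second one.
```

    Because only paragraph text is collected, headings, scripts, and styles are ignored; whether that is the right scope depends on where a page keeps its main copy.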

    2. Text Processing

    Preprocessing: Clean the content by removing HTML tags, JavaScript, CSS, and other non-textual data. Convert all words to lowercase, and remove punctuation and stopwords (common words like “and”, “the”, etc. that don’t contribute much to the content’s meaning).

    Tokenization: Convert the cleaned content into individual words or tokens.
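
    The cleaning and tokenization steps can be sketched with a regular expression and a small stopword set. The stopword list here is a short illustrative assumption (NLTK's English list is much longer), and the regex keeps only runs of ASCII letters, so contractions and accented words would need extra handling.

```python
import re

# Small illustrative stopword list (an assumption, not NLTK's full list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens with stopwords removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOPWORDS]


print(tokenize("The Best SEO Tips, and Tools, for Landing Pages!"))
# → ['best', 'seo', 'tips', 'tools', 'landing', 'pages']
```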

    3. Bag of Words Representation

    Vocabulary Building: Create a vocabulary of unique words from both the target URL and competitor URLs.

    Vectorization: Represent each URL’s content as a vector based on the vocabulary.
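
    Building one shared vocabulary is what makes the target and competitor vectors comparable position by position. A minimal sketch, with hypothetical page labels and tokens standing in for real URLs:

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Sorted list of unique words across all token lists."""
    return sorted({token for tokens in token_lists for token in tokens})


def vectorize(tokens, vocabulary):
    """Word counts for the tokens, ordered by the shared vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical tokens; the keys stand in for the target and competitor URLs.
pages = {
    "target": ["seo", "landing", "page"],
    "competitor": ["seo", "seo", "keyword", "page"],
}
vocabulary = build_vocabulary(pages.values())
vectors = {name: vectorize(tokens, vocabulary) for name, tokens in pages.items()}

print(vocabulary)  # → ['keyword', 'landing', 'page', 'seo']
print(vectors)     # → {'target': [0, 1, 1, 1], 'competitor': [1, 0, 1, 2]}
```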

    4. Visualization: Word Cloud

    Use the word frequencies from the BoW representation to generate a word cloud for each URL. Python libraries like wordcloud can be used for this.
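
    A word cloud can be driven directly by the BoW counts through the `generate_from_frequencies` method of the `wordcloud` package, rather than by passing it raw text. The helper names and the sample tokens below are assumptions; the import sits inside `save_wordcloud` so the frequency helper can be tried without the package installed.

```python
from collections import Counter


def word_frequencies(tokens, top_n=100):
    """Return the top_n most frequent tokens as a {word: count} dict."""
    return dict(Counter(tokens).most_common(top_n))


def save_wordcloud(frequencies, path):
    """Render the frequencies as a word cloud image and save it to path."""
    # Imported here so word_frequencies works without the package installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)


freqs = word_frequencies(["seo", "seo", "page", "keyword", "seo", "page"], top_n=2)
print(freqs)
# → {'seo': 3, 'page': 2}
```

    Calling `save_wordcloud(freqs, "target.png")` would then write the image. Feeding counts in directly keeps the picture consistent with the BoW numbers, since `WordCloud.generate` on raw text applies its own tokenization and stopword handling.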

    5. Analysis and Recommendations

    Keyword Comparison: Compare the most frequent words in the target URL with those in the competitor URLs. Identify gaps or potential opportunities.

    Recommendation: Suggest words that are prominent in competitor URLs but are lacking or underrepresented in the target URL.
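
    The sketch below covers both cases named above: words missing from the target entirely, and words that are present but underrepresented. Treating a word as “underrepresented” when its share of competitor text is at least `min_ratio` times its share of target text is an assumption made for this example, as are the function name and sample tokens.

```python
from collections import Counter


def find_gaps(target_tokens, competitor_tokens, limit=10, min_ratio=2.0):
    """List competitor words that the target lacks or uses relatively rarely."""
    target, competitor = Counter(target_tokens), Counter(competitor_tokens)
    target_total = sum(target.values()) or 1
    competitor_total = sum(competitor.values()) or 1

    gaps = []
    for word, count in competitor.most_common():
        if len(gaps) >= limit:
            break
        competitor_share = count / competitor_total
        target_share = target[word] / target_total
        if target_share == 0:
            gaps.append((word, "missing"))
        elif competitor_share / target_share >= min_ratio:
            gaps.append((word, "underrepresented"))
    return gaps


# Hypothetical tokens for a target page and its combined competitors.
target_tokens = ["seo", "page", "page", "page"]
competitor_tokens = ["seo", "seo", "seo", "keyword", "page", "page"]
print(find_gaps(target_tokens, competitor_tokens))
# → [('seo', 'underrepresented'), ('keyword', 'missing')]
```

    Comparing shares rather than raw counts keeps the result fair when the competitor text is much longer than the target page.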

    Run the Below Code

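    import requests
    from bs4 import BeautifulSoup
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    from collections import Counter
    import string
    import numpy as np

    # One-time download of the NLTK data used below. Newer NLTK releases look
    # for 'punkt_tab' and older ones for 'punkt', so both are requested here.
    for resource in ('punkt', 'punkt_tab', 'stopwords'):
        nltk.download(resource, quiet=True)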

    def fetch_content_from_url(url):
        """Fetch the paragraph text from the given URL."""
        try:
            response = requests.get(url, timeout=10)
            # Treat HTTP error codes (4xx/5xx) as failures too.
            response.raise_for_status()
            soup = BeautifulSoup(response.text, 'html.parser')
            return ' '.join(p.get_text() for p in soup.find_all('p'))
        except requests.RequestException as e:
            print(f"Error fetching content from {url}. Error: {e}")
            return ""

    def preprocess_text(text):
        """Preprocess the content: remove punctuation, lowercase, remove stopwords."""
        # Build the stopword set once instead of reloading the list per token.
        stop_words = set(stopwords.words('english'))
        tokens = word_tokenize(text)
        tokens = [word.lower() for word in tokens if word.isalpha()]
        tokens = [word for word in tokens if word not in stop_words and word not in string.punctuation]
        return tokens

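    def generate_wordcloud_from_tokens(tokens, title):
        """Generate and display a word cloud from the given tokens."""
        if not tokens:
            # WordCloud raises an error on empty input, e.g. after a failed fetch.
            print(f"No text available for '{title}'; skipping word cloud.")
            return
        wordcloud = WordCloud(width=800, height=400, background_color="white").generate(" ".join(tokens))
        plt.figure(figsize=(10, 5))
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis('off')
        plt.title(title)
        plt.show()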

    def suggest_keywords(target_counter, competitor_counter, limit=20):
        """Suggest top 'limit' keywords that are in competitor's content but not in target's content."""
        suggestions = []
        for word, count in competitor_counter.most_common():
            if word not in target_counter:
                suggestions.append((word, count))
            if len(suggestions) == limit:
                break
        return suggestions

    def visualize_suggestions(suggestions):
        """Visualize the suggested keywords using a bar graph."""
        words = [word[0] for word in suggestions]
        frequencies = [word[1] for word in suggestions]
        # Sort ascending so the most frequent keyword ends up at the top of the chart.
        sorted_indices = np.argsort(frequencies)
        words = np.array(words)[sorted_indices]
        frequencies = np.array(frequencies)[sorted_indices]
        plt.figure(figsize=(10, 7))
        plt.barh(words, frequencies, color='skyblue')
        plt.xlabel('Frequency in Competitor Content')
        plt.ylabel('Suggested Keywords')
        plt.title('Top Suggested Keywords to Optimize Target URL Content')
        plt.tight_layout()
        plt.show()

    def main():
        target_url = input("Enter the target URL: ")
        print("Enter all competitor URLs. Type 'done' when finished.")
        competitor_urls = []
        while True:
            url = input("Enter a competitor URL: ").strip()
            if url.lower() == 'done':
                break
            competitor_urls.append(url)

        target_content = fetch_content_from_url(target_url)
        competitor_contents = [fetch_content_from_url(url) for url in competitor_urls]

        target_tokens = preprocess_text(target_content)
        # Pool the tokens of all competitor pages into one list.
        competitor_tokens = []
        for content in competitor_contents:
            competitor_tokens.extend(preprocess_text(content))

        generate_wordcloud_from_tokens(target_tokens, "Target URL WordCloud")
        generate_wordcloud_from_tokens(competitor_tokens, "Competitors WordCloud")

        target_counter = Counter(target_tokens)
        competitor_counter = Counter(competitor_tokens)
        suggestions = suggest_keywords(target_counter, competitor_counter)

        print("\nSuggested keywords to optimize target URL content:")
        for word, freq in suggestions:
            print(f"{word}: {freq}")
        visualize_suggestions(suggestions)

    if __name__ == "__main__":
        main()

    Run the Following Command in Terminal to Install the Required Libraries

    pip install beautifulsoup4 requests wordcloud matplotlib nltk numpy


    Sample Test:

    Enter the target URL:

    Enter all competitor URLs. Type ‘done’ when finished.

    Enter a competitor URL:

    Enter a competitor URL:

    Enter a competitor URL:

    Enter a competitor URL: done



    By working the suggested terms from the Bag of Words analysis into our content, we can help improve the SEO ranking and organic visibility of landing pages that are within striking distance of the first page, that is, pages currently ranking on the second page.
