Deep Crawl: The Definitive Guide

1. Duplicate Pages including Primary:

Possible issues:

Pages that share an identical title and description, and nearly identical content, with other pages may harm your rankings.

Recommendation:

Each page's title, description, and content should be unique.

2. Duplicate Page Set:

Possible issues:

A set of duplicate pages that share an identical title and description and nearly identical content, which may affect those pages' rankings.

Recommendation:

Each page's title, description, and content should be unique.

3. Paginated 2+ Pages:

Possible issues:

The pagination breakdown shows all pages that were found in paginated sets during the crawl. First pages are those without a rel="prev" link; all other pages in the set are paginated pages.

Recommendation:

Implement pagination carefully, keeping the links between pages in a paginated set consistent.

4. Canonicalized Pages:

Possible issues:

Pages whose URL differs from the canonical URL specified in the canonical tag, in either the HTML head or the HTTP header.

Recommendation:

The canonical version of the page should be specified in the HTML head or the HTTP header.

5. 301 Redirects:

Possible issues:

URLs that redirect to another URL with a 301 HTTP status code, which indicates a permanent redirect.

Recommendation:

Redirects should be kept to a minimum. Too many redirects can slow down the site and create crawling problems.

6. 5xx Errors:

Possible issues:

URLs that return a 5xx server error HTTP status code, such as a 503, often caused by a temporary server performance problem or a permanent configuration issue.

Recommendation:

These errors indicate server-side problems, which can only be fixed internally or by reconfiguring the host.

7. Broken Pages (4xx Errors):

Possible issues:

URLs that return a 4xx status code, such as a 404, indicating a valid page could not be returned by the server because it doesn’t exist.

Recommendation:

Remove pages that return a 4xx error (and any links to them), or redirect those URLs to a relevant live page.
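
As a rough illustration (not a Deep Crawl feature), a few lines of Python using the third-party requests package can flag URLs that come back with a 4xx status; the URL list below is a placeholder for your own pages.

    import requests

    urls = [
        "https://example.com/",          # placeholder URLs to check
        "https://example.com/old-page",
    ]

    for url in urls:
        try:
            # HEAD keeps the check light; some servers only answer GET
            response = requests.head(url, allow_redirects=True, timeout=10)
        except requests.RequestException as exc:
            print(f"{url} -> request failed: {exc}")
            continue
        if 400 <= response.status_code < 500:
            print(f"{url} -> broken ({response.status_code})")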

8. Unauthorized Pages:

Possible issues:

Pages which return a 401, 403, or 407 HTTP response code, indicating that the content could not be served, and therefore will not be indexed in search engines.

Recommendation:

Audit your site so that no linked page returns a 401, 403, or 407 error, and remove or redirect the pages that do. These pages may harm your rankings in the search results.

9. Failed URLs:

Possible issues:

URLs which were crawled, but did not respond within the Deep Crawl timeout period of 9 seconds.

Recommendation:

Check server performance and response times for these URLs. Pages blocked by robots.txt, pages with noindex, and a sitemap priority set to zero are other common causes of crawling issues.

10. Disallowed URLs (Uncrawled):

Possible issues:

All URLs which were disallowed by the robots.txt file on the live site, or by a custom robots.txt file used for the crawl.

Recommendation:

Only URLs that genuinely should not be crawled ought to be disallowed in the robots.txt file, because disallowed URLs will not be crawled at all.
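
To test whether a particular URL is blocked, you can check it against the live robots.txt. A minimal sketch using Python's standard-library urllib.robotparser (the site and URLs are placeholders):

    from urllib import robotparser

    parser = robotparser.RobotFileParser()
    parser.set_url("https://example.com/robots.txt")  # placeholder site
    parser.read()

    for url in ["https://example.com/", "https://example.com/private/page"]:
        allowed = parser.can_fetch("Googlebot", url)
        print(url, "allowed" if allowed else "disallowed")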

11. Missing Titles:

Possible issues:

Pages with missing title tags may rank poorly in the SERPs, because the title tag is an important ranking factor and usually carries the primary (money) keyword.

Recommendation:

A title tag must be present on every page.

12. Short Titles:

Possible issues:

Page titles shorter than the required minimum number of characters may affect rankings.

Recommendation:

The minimum recommended title length is 35 characters.

13. Max Title Length:

Possible issues:

Page titles longer than the required maximum number of characters may affect rankings and may be truncated in search results.

Recommendation:

The maximum recommended title length is 65 characters.
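
A quick way to check titles against the 35-65 character guideline above is to parse each page's title element. A minimal sketch using the third-party requests and beautifulsoup4 packages (the URL is a placeholder):

    import requests
    from bs4 import BeautifulSoup

    MIN_TITLE, MAX_TITLE = 35, 65  # guideline lengths from this guide

    url = "https://example.com/"   # placeholder page to audit
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    title_tag = soup.find("title")
    if title_tag is None:
        print("Missing title tag")
    else:
        title = title_tag.get_text(strip=True)
        if len(title) < MIN_TITLE:
            print(f"Title too short ({len(title)} chars): {title}")
        elif len(title) > MAX_TITLE:
            print(f"Title too long ({len(title)} chars): {title}")
        else:
            print(f"Title OK ({len(title)} chars)")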

14. Pages with Duplicate Titles:

Possible issues:

The page's title tag matches another page's title tag.

Recommendation:

Give every page a unique title.

15. Missing Descriptions:

Possible issues:

The page's meta description tag is missing. The meta description is one of the most important tags and usually carries the primary (money) keywords.

Recommendation:

Add a meta description tag and include the keywords you want the page to rank for.

16. Short Descriptions:

Possible issues:

The page's meta description is too short. The meta description is one of the most important tags and usually carries the primary (money) keywords; keeping it too short may affect your ranking.

Recommendation:

Keep the meta description within the recommended length. Minimum length: 70 characters.

17. Max Description Length:

Possible issues:

The page's meta description is longer than the recommended length. The meta description is one of the most important tags and usually carries the primary (money) keywords; making it too long may cause it to be truncated in search results.

Recommendation:

Keep the meta description within the recommended length. Maximum length: 155 characters.

18. Duplicate Meta Descriptions:

Possible issues:

Duplicate meta descriptions can lead Google to treat the pages as the same, or as competing with each other, which may create problems for your site's rankings. Duplicate meta descriptions should be avoided.

Recommendation:

Every page on your site should have a unique meta description.
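
To spot missing, short, long, or duplicate meta descriptions across a set of pages, a small script can collect and compare them. A minimal sketch (requests and beautifulsoup4 assumed installed; the URLs are placeholders and 70/155 are the guideline lengths above):

    import requests
    from bs4 import BeautifulSoup
    from collections import defaultdict

    MIN_DESC, MAX_DESC = 70, 155
    urls = ["https://example.com/", "https://example.com/about"]  # placeholders

    seen = defaultdict(list)  # description text -> pages using it

    for url in urls:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        tag = soup.find("meta", attrs={"name": "description"})
        desc = (tag.get("content") or "").strip() if tag else ""
        if not desc:
            print(f"{url}: missing meta description")
        elif len(desc) < MIN_DESC:
            print(f"{url}: description too short ({len(desc)} chars)")
        elif len(desc) > MAX_DESC:
            print(f"{url}: description too long ({len(desc)} chars)")
        seen[desc].append(url)

    for desc, pages in seen.items():
        if desc and len(pages) > 1:
            print("Duplicate description on:", ", ".join(pages))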

19. Thin pages:

Possible issues:

Thin content is content that has little or no value to the user, and it can discourage visitors. Google considers these kinds of pages low quality.

Recommendation:

Every page on your site should contain at least 300 characters of content.
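
One rough way to flag thin pages is to measure the visible text after stripping scripts and styles. A minimal sketch (requests and beautifulsoup4; the 300-character threshold mirrors the guideline above, and the URL is a placeholder):

    import requests
    from bs4 import BeautifulSoup

    MIN_CONTENT_CHARS = 300                # guideline threshold from this guide
    url = "https://example.com/thin-page"  # placeholder

    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    # Remove non-visible elements before measuring text length
    for element in soup(["script", "style", "noscript"]):
        element.decompose()

    text = " ".join(soup.get_text(separator=" ").split())
    if len(text) < MIN_CONTENT_CHARS:
        print(f"Thin page: only {len(text)} characters of visible text")
    else:
        print(f"OK: {len(text)} characters of visible text")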

20. Missing h1 tag:

Possible issues:

The <h1> tag is usually the first heading visible on a page. One of the hardest tasks for a search engine is understanding the meaning (context) of a page, and a crawler may struggle to work out what the page is about if the h1 is missing. The h1 is one of the most valuable tags and can directly affect your ranking.

Recommendation:

Every page must have an h1 tag.

21. Multiple h1 tags in a page:

Possible issues:

The <h1> tag is usually the first heading visible on a page. One of the hardest tasks for a search engine is understanding the meaning (context) of a page, and a crawler may be confused about what the page is about when it contains multiple h1 tags. The h1 is one of the most valuable tags and can directly affect your ranking.

Recommendation:

Every page should have exactly one h1 tag.
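
Counting <h1> elements is a quick check for both of the h1 issues above. A minimal sketch (requests and beautifulsoup4; placeholder URL):

    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/"  # placeholder
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    h1_tags = soup.find_all("h1")
    if len(h1_tags) == 0:
        print("Missing h1 tag")
    elif len(h1_tags) > 1:
        print(f"Multiple h1 tags found ({len(h1_tags)})")
    else:
        print("Exactly one h1 tag:", h1_tags[0].get_text(strip=True))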

22. Broken JS/CSS:

Possible issues:

CSS and JavaScript files that return a non-200 status code are considered broken. Broken resources can affect how your pages render and how quickly they load.

Recommendation:

Review the referenced files and fix or remove any broken URLs; deferring the parsing of JavaScript may also help with loading speed.

23. Broken Images:

Possible issues:

Images that return a 4xx status code are considered broken. Broken images harm the user experience on your site.

Recommendation:

Find the broken image link. If it is internal, it can be fixed easily by pointing to the correct URL; if it is external, it may be harder to fix because a working URL may not be available, so replace or remove the image.

24. Pages without valid Canonical Tag:

Possible issues:

Pages which are missing a canonical tag, or which have conflicting canonical URLs in the HTTP header and the HTML head.

Recommendation:

A canonical tag should be included on every page to prevent duplication issues.
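
A canonical URL can be declared in the HTML head (a link rel="canonical" element) or in an HTTP Link header, and missing or conflicting declarations are what this report surfaces. A minimal Python sketch that reads both and compares them (requests and beautifulsoup4; placeholder URL):

    import re
    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/"  # placeholder
    response = requests.get(url, timeout=10)

    # Canonical from the HTML head
    soup = BeautifulSoup(response.text, "html.parser")
    link_tag = soup.find("link", rel="canonical")
    html_canonical = link_tag.get("href") if link_tag else None

    # Canonical from the HTTP Link header, e.g. <https://example.com/>; rel="canonical"
    link_header = response.headers.get("Link", "")
    match = re.search(r'<([^>]+)>\s*;\s*rel="?canonical"?', link_header)
    header_canonical = match.group(1) if match else None

    if not html_canonical and not header_canonical:
        print("No canonical declared")
    elif html_canonical and header_canonical and html_canonical != header_canonical:
        print("Conflicting canonicals:", html_canonical, "vs", header_canonical)
    else:
        print("Canonical:", html_canonical or header_canonical)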

25. Orphaned Canonical Pages:

Possible issues:

All pages referenced as the canonical page by another page, but which are not linked from any other page.

Recommendation:

Pages which are marked as a canonical page should probably be linked from at least one other linked page.

26. Multiple Canonical Links:

Possible issues:

Pages which have more than five other pages referencing them as their canonical URL.

Recommendation:

This can be normal if there is a high amount of duplication on a website, but it can also be a sign of misconfiguration and accidental canonicalization.

Review these pages to ensure that there are no potential issues.

27. Conflicting Canonical tags:

Possible issues:

Pages with multiple canonical tags containing different URLs.

Recommendation:

Google will ignore all canonical tags if they are not consistent, so these pages are effectively missing a valid canonical tag.

28. Canonical to Non-200:

Possible issues:

Canonical URLs that do not return a 200 status code.

Recommendation:

Find these canonical URLs and either point them to a URL that returns 200 or remove the canonical tag.

29. Redirect Chains:

Possible issues:

When there is more than one redirect between the initial URL and the destination URL. Unnecessary redirects also make it more difficult for Google to crawl the site, which can affect how well pages are indexed. This may also lead to slower site speed.

Recommendation:

When a URL is redirected, a single 301 redirect should be in place. Remove any unnecessary intermediate redirects.
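
The requests package records every hop of a redirect chain in response.history, which makes chains (and loops) easy to inspect, as in this minimal sketch (placeholder URL):

    import requests

    start_url = "https://example.com/old-path"  # placeholder

    try:
        response = requests.get(start_url, allow_redirects=True, timeout=10)
    except requests.TooManyRedirects:
        print("Redirect loop detected (too many redirects)")
    else:
        hops = [r.url for r in response.history] + [response.url]
        if len(response.history) > 1:
            print(f"Redirect chain with {len(response.history)} hops:")
            for step in hops:
                print("  ", step)
        elif response.history:
            print("Single redirect:", hops[0], "->", hops[1])
        else:
            print("No redirect")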

30. Excessive Redirects:

Possible issues:

Pages which have more than 30 redirects pointing to them from other URLs. Too many URLs redirecting to a single URL may cause the destination URL to be treated as a soft 404 by search engines.

Recommendation:

Try to keep the redirects to a minimum.

31. All broken Redirects:

Possible issues:

URLs which return a redirect to a URL with a 4xx/5xx HTTP status code.  These redirects will result in poor user experience and waste crawl budget, so the redirects should be changed to a target URL which returns a 200 status.

Recommendation:

Find these broken redirects and change them to target a URL that returns a 200 status, or remove the links.

32. Redirect Loops:

Possible issues:

A redirect loop occurs when a page ends up redirecting back to itself. This can lead to a longer time being taken for a full crawl of the site, which will reduce the freshness of the website’s pages in a search engine’s index.

Recommendation:

Identify and eliminate the redirect loops, and remove any unnecessary redirects.

33. Mixed content:

Possible issues:

Mixed content refers to a mix of secure and non-secure resources found on a webpage.

Recommendation:

Serve all resources over HTTPS; on WordPress sites, plugins are available that rewrite resource URLs to HTTPS.
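
Insecure (http://) resources referenced from an HTTPS page can be found by scanning the src/href attributes of scripts, stylesheets, images, and iframes. A minimal sketch (requests and beautifulsoup4; placeholder URL):

    import requests
    from bs4 import BeautifulSoup

    page_url = "https://example.com/"  # placeholder HTTPS page
    soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")

    checks = [("img", "src"), ("script", "src"), ("link", "href"), ("iframe", "src")]
    for tag_name, attr in checks:
        for tag in soup.find_all(tag_name):
            value = tag.get(attr, "")
            if value.startswith("http://"):
                print(f"Mixed content: <{tag_name}> loads {value}")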

34. Non Secure Form Fields:

Possible issues:

The Chrome browser will flag any pages containing an input form field on HTTP pages as insecure.

Recommendation:

These pages should be moved to HTTPS, or the fields should be removed from the page to avoid the error.

36. Pages with HSTS:

Possible issues:

HTTP Strict Transport Security (HSTS) is a web security policy mechanism that helps protect websites against protocol downgrade attacks and cookie hijacking. The HSTS header instructs browsers and crawlers to use the HTTPS version of the URL.

Recommendation:

Include a Strict-Transport-Security header field in the HTTPS response headers.
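
Whether a site already sends the header can be verified with a one-off request, as in this minimal sketch (requests; placeholder URL):

    import requests

    url = "https://example.com/"  # placeholder HTTPS URL
    response = requests.get(url, timeout=10)

    hsts = response.headers.get("Strict-Transport-Security")
    if hsts:
        print("HSTS enabled:", hsts)  # e.g. "max-age=31536000; includeSubDomains"
    else:
        print("No Strict-Transport-Security header found")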

37. Pages without Hreflang Tags:

Possible issues:

hreflang tags are a method to mark up pages that are similar in meaning but aimed at different languages and/or regions.

These pages will not show any alternate URL in search results that might be more appropriate for a user in a different region with a different language setting. These can be reviewed to find pages you are expecting to have an hreflang alternative page.

Recommendation:

Implement hreflang on every page that has language or region alternates.

Hreflang can be implemented in 3 different ways.

  1. link element in HTML <head>

  2. HTTP header

  3. XML sitemaps

In each case the annotation takes the form rel="alternate" hreflang="x".
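
As an illustration, the hreflang annotations declared in a page's head can be listed with a short script, which is a starting point for checking that the expected alternates exist and reciprocate. A minimal sketch (requests and beautifulsoup4; placeholder URL):

    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/"  # placeholder
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    for link in soup.find_all("link", rel="alternate"):
        hreflang = link.get("hreflang")
        if hreflang:
            print(f"{hreflang}: {link.get('href')}")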

38. Not Supported Hreflang Links:

Possible issues:

Hreflang tags with unrecognized language and region codes are not valid and can’t be shown as alternates in search results, so these can be reviewed for a valid alternative.

Recommendation:

Implement hreflang tags using valid language and region codes for each alternate.

39. Broken Hreflang Links:

Possible issues:

All hreflang links which point to a URL that returns a broken status code (4xx, 5xx). These hreflang tags will be ignored by search engines, so the correct URL should be identified, or the hreflang tag removed.

Recommendation:

Identify the hreflang links that point to a broken URL, and either replace the URL or remove the tag.

40. All Hreflang Combinations:

Possible issues:

All unique combinations of language and region codes used in hreflang tags across all pages.

Recommendation:

By examining the combinations of languages and regions used, you can understand the patterns of connected pages, and possible issues with incorrect, or inconsistent hreflang tags.

41. URLs with a double slash:

Possible issues:

URL paths which contain two or more slashes next to each other.

Recommendation:

Some web servers treat two adjacent slashes as equivalent to a single slash, which can result in duplicate URLs when search engines crawl the page. Normalize or redirect the double-slash versions.
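
Double slashes in a path can be collapsed before links or redirects are generated, as in this minimal standard-library sketch (the URL is a placeholder):

    import re
    from urllib.parse import urlsplit, urlunsplit

    def normalize_path_slashes(url: str) -> str:
        # Collapse repeated slashes in the path only, leaving the scheme intact
        parts = urlsplit(url)
        clean_path = re.sub(r"/{2,}", "/", parts.path)
        return urlunsplit((parts.scheme, parts.netloc, clean_path,
                           parts.query, parts.fragment))

    print(normalize_path_slashes("https://example.com//blog//post/"))
    # -> https://example.com/blog/post/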

42. Max URL Length:

Possible issues:

URLs that exceed the maximum URL length specified. URLs exceeding 1024 characters may not work in web browsers or be indexed by search engines.

Recommendation:

The maximum length can be configured under Advanced Settings > Report Settings (default: 1024 characters).

For reference, approximate maximum URL lengths by browser:

Browser    Address bar    document.location or anchor tag
Chrome     32779          >64k
Android    8192           >64k
Firefox    >64k           >64k
Safari     >64k           >64k
IE11       2047           5120
Edge 16    2047           10240

43. All Broken Internal Links:

Possible issues:

All instances of links where the target URL returns a 4xx status code. These links result in a poor user experience and waste crawl budget, so they should be updated to point to a new target page or removed from the source page.

Recommendation:

Replace or remove those broken internal links.

44. Missing Image Link alt tags:

Possible issues:

Image alt attribute is blank.

Linked images can pass relevancy to the target page via the alt tag contents, so any linked images with a blank or missing alt tag can be optimized by adding relevant text.

Recommendation:

Add relevant, descriptive text (including keywords where appropriate) to the alt attribute.
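
Images with a missing or empty alt attribute can be listed with a short script, as in this minimal sketch (requests and beautifulsoup4; placeholder URL):

    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/"  # placeholder
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    for img in soup.find_all("img"):
        alt = (img.get("alt") or "").strip()
        if not alt:
            print("Image missing alt text:", img.get("src"))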

45. Unique External Links:

Possible issues:

All external links should be relevant to your site. Links to external sites affect the user experience and are potentially analyzed by Google, so they should be reviewed for quality and suitability.

Recommendation:

Build external links which are relevant to your site.

46. External Broken Links:

Possible issues:

External links on your site that return a 4xx status code create a poor user experience and may affect your rankings in the SERPs.

Recommendation:

Identify and remove those links.

47. External Redirects:

Possible issues:

All internal URLs included in the crawl, which redirect to an external URL. These URLs will pass PageRank to the external pages.

Recommendation:

This may be intentional, for example when redirects are used to track outbound clicks; review these URLs to confirm they are expected.

48. Primary Indexable Pages Not in Sitemaps:

Possible issues:

Primary indexable pages that are not included in your sitemaps may have been missed or removed from the sitemaps unintentionally.

Recommendation:

Review to ensure that all important pages are discoverable via your sitemaps.

49. Broken Sitemap Links:

Possible issues:

Broken pages which were found in the sitemaps.

Recommendation:

Including broken pages in the sitemaps wastes crawl budget and should be avoided; fix or remove these URLs.

50. Mobile Alternates in Sitemaps:

Possible issues:

URLs which were found within the mobile/AMP rel alt tag of another page, which was found in a sitemap during the crawl. These mobile alternates have correct reciprocation set up, so are likely to be discovered by search engines, who may follow the canonical tag to find the desktop pages.

Recommendation:

Avoid having rel=alternate mobile and rel=canonical tags that point to URLs that in turn redirect to other pages. This is confusing for search engines.

51. Disallowed/Malformed URLs in Sitemaps:

Possible issues:

URLs which were found in sitemaps, but could not be crawled because they were disallowed. This may lead to unnecessary use of crawl budget.

Recommendation:

Disallowed URLs should generally not be included within the sitemap.
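
To cross-check a sitemap against robots.txt, the sitemap's loc entries can be parsed and each one tested with the robots parser. A minimal sketch using the standard library plus requests (both URLs are placeholders):

    import requests
    import xml.etree.ElementTree as ET
    from urllib import robotparser

    sitemap_url = "https://example.com/sitemap.xml"    # placeholder
    robots = robotparser.RobotFileParser()
    robots.set_url("https://example.com/robots.txt")   # placeholder
    robots.read()

    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)

    for loc in root.findall(".//sm:loc", ns):
        page_url = (loc.text or "").strip()
        if page_url and not robots.can_fetch("Googlebot", page_url):
            print("Disallowed URL listed in sitemap:", page_url)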

52. Orphaned Sitemaps Pages:

Possible issues:

Pages discovered in an XML sitemap which are not linked from any page included in the web crawl. These pages will still be crawled by Google, taking up crawl budget, but the lack of internal links may indicate they are low-value pages.

Recommendation:

Remove these pages from the sitemaps, or link to them from other valuable pages.

53. Indexable Pages without Search Impressions:

Possible issues:

Indexable pages which did not appear in Search Analytics over a long enough period.  This could indicate that this page targets a rarely searched topic, or that search engines do not rank it well for relevant topics.

Recommendation:

Improve the content of each page with low search impressions, targeting a specific keyword for that page.

54. Non-indexable Pages with Search Impressions:

Possible issues:

These pages are non-indexable but appeared in search results during the reporting period. This signals that Google is ranking pages which are canonicalized, noindexed, or duplicates.

Recommendation:

Review why each page is non-indexable and fix any canonicalization, noindex, or duplication issues.

55. Mobile/AMP Pages with Search Impressions:

Possible issues:

Mobile alternate and AMP pages which had impressions in Google Search Console Search Analytics, meaning they appeared in Google search during the reporting period. AMP provides a relatively easy way for publishers to improve the speed of mobile websites, which may help with rankings.

Recommendation:

Consider adding AMP to your site; on WordPress, there is an official plugin called AMP.

56. MOBILE/AMP PAGES WITH DESKTOP SEARCH CLICKS:

Possible issues:

These URLs are mobile or AMP alternates but received desktop clicks during the reporting period. This may indicate that there is an issue with the mobile configuration or the desktop version of these pages.

Recommendation:

Reconfigure the mobile/AMP pages so they are set up properly; on a WordPress site, the AMP plugin can handle this.

57. BROKEN PAGES WITH TRAFFIC:

Possible issues:

URLs which had Google Search Console clicks or Analytics visits during the reporting period, but which return a broken status code (4xx, 5xx). These pages are at high risk of losing rankings: landing on an error page is a poor user experience, so search engines attempt to remove such pages from their index as soon as they detect the error.

Recommendation:

Fix the pages returning 4xx/5xx status codes, or redirect them to relevant important pages if the broken URLs are still receiving traffic.

58. REDIRECTING PAGES WITH TRAFFIC:

Possible issues:

Pages which previously had traffic but now redirect may indicate an issue. If they do not redirect to relevant pages, rankings may drop and the website will lose the traffic those pages were driving.

Recommendation:

If there is no reason for the redirect, remove it; if the redirect is needed, make sure it points to a relevant page.

59. DISALLOWED URLS WITH TRAFFIC:

Possible issues:

URLs which previously had traffic but are no longer crawlable because they are disallowed by your robots.txt. This may cause a drop in rankings, or modified search results (leading to a lower clickthrough rate), because search engines are unable to crawl the destination URLs. The site owner may not want these URLs appearing in search results, but if a URL is still receiving search traffic, that implies it is still indexed.

Recommendation:

First check:

Do any other URLs specify the disallowed URL as their canonical?

Is the disallowed URL included in any XML sitemaps?

If so, remove the URL from the sitemap and remove any canonical tags that point to it. If the page should continue to rank, allow it again in robots.txt.

60. BROKEN PAGES WITH BACKLINKS:

Possible issues:

URLs that have backlinks but return a 4xx/5xx status code. This means that the PageRank or link equity those backlinks would pass is lost.

Recommendation:

These URLs should be restored to a 200 status code or redirected to an appropriate alternative. Removal of those backlinks may also be an option.

61. DISALLOWED URLS WITH BACKLINKS:

Possible issues:

URLs which were found in your backlink list, but were disallowed by your robots.txt. The PageRank or link equity that these backlinks have may not be passing to your site because search engines are unable to crawl the destination URLs.

Recommendation:

If those backlinks are irrelevant or unnecessary, the best option is to remove or disavow them; if the links are valuable, allow the target URLs in your robots.txt.

62. PAGES WITH META NOFOLLOW AND BACKLINKS:

Possible issues:

URLs which have backlinks but which have a meta or header nofollow directive. This means that the PageRank or link equity that these URLs have is unable to flow through to the rest of the website.

Recommendation:

These nofollow directives should be investigated and removed where appropriate so that link equity can flow through to the rest of the site. Search engines such as Google may also take a dim view of unnatural-looking link profiles, so a natural mix of followed and nofollowed links is preferable.

63. MOBILE ALTERNATES WITH BACKLINKS:

Possible issues:

URLs which were found within the mobile/AMP rel alt tag of another page, and which have backlinks from external websites. These backlinked mobile alternates have correct reciprocation, so are likely to be contributing to the authority of your website.

Recommendation:

Add backlinks to your AMP/mobile pages.

64. PAGES WITHOUT BACKLINKS:

Possible issues:

Pages which have no backlinks from external websites, based on the uploaded backlink data. Pages without backlinks receive no external link equity and may be harder to rank.

Recommendation:

Consider building relevant backlinks to important pages that currently have none, and make sure those pages are well linked internally.

66. ORPHANED PAGES WITH BACKLINKS:

Possible issues:

Pages with external backlinks, based on the uploaded backlink data, which are not linked from any other page. These pages are not linked from other pages, but may still be passing PageRank through their own links.

Recommendation:

These pages should be reviewed to see if they should be linked into the site.

67. DISCOURAGED VIEWPORT TYPES:

Possible issues:

All pages with a viewport value that is discouraged due to poor mobile compatibility, namely the minimum-scale, maximum-scale, and user-scalable attributes. These pages may have usability issues and are likely not to be categorized as ‘mobile friendly’ by Google, which may affect the ranking in mobile search results.

Recommendation:

You should confirm whether these viewport values are correct.

68. MOBILE CONTENT MISMATCH:

Possible issues:

Alternate mobile and desktop pages should have matching page content. Mismatching page content between mobile and desktop versions of the website could prevent pages from being correctly indexed or served within search result pages.

Recommendation:

Identify whether there are separate mobile and desktop versions of the site and whether their content matches. If it does not, configure the site so the content is consistent across versions.

69. RESPONSIVE DESIGN:

Possible issues:

All mobile responsive pages with a valid viewport value. Pages which return specific viewport values may be responsive design pages.

Recommendation:

The viewport meta tag should be configured correctly, for example with width=device-width, initial-scale=1.

70. DYNAMICALLY SERVED:

Possible issues:

All pages with a Vary: User-Agent value in the HTTP headers, which indicates to search engines that different HTML might be returned for different user agents on the same URL. If Googlebot sees the Vary: User-Agent value, it is likely to crawl the same URL with both its desktop and mobile user agents.

Recommendation:

Configure dynamic serving properly so that the mobile-specific content is returned on the same URL as the desktop version, with the Vary: User-Agent header set.
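
Whether a URL signals dynamic serving can be checked by looking at the Vary response header, as in this minimal sketch (requests; placeholder URL):

    import requests

    url = "https://example.com/"  # placeholder
    response = requests.get(url, timeout=10)

    vary = response.headers.get("Vary", "")
    if "user-agent" in vary.lower():
        print("Dynamic serving signalled, Vary:", vary)
    else:
        print("Vary: User-Agent is not set on this URL")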

71. NO BOT HITS:

Possible issues:

If search engines do not crawl pages, they may not be indexed and may not be considered important by search engines.

Recommendation:

Investigate whether a crawling issue, such as poor internal linking or blocked URLs, is preventing search engines from reaching these pages.

72. HIGH BOT HITS:

Possible issues:

These pages are crawled frequently by search engines. This can be a sign that these pages are considered important and/or likely to change often.

Recommendation:

Heavy bot traffic (5% of sessions or more) can skew your data and pollute your analytics. The easiest way to keep bot traffic out of your Analytics reports is to use Google's automatic filter: go to your view settings and check the box that says "Exclude all hits from known bots and spiders."

73. FAST FETCH TIME (<1SEC):

Possible issues:

All URLs with a fetch time of 1 second or less. Slow pages can negatively impact crawl efficiency and user experience.

Recommendation:

Pages that load in one second or less are ideal (Google recommends that pages be usable in one second or less).

74. MAX FETCH TIME:

Possible issues:

All URLs exceeding the maximum fetch time specified (default: 2s). Slow pages can negatively impact crawl efficiency and user experience.

Recommendation:

Identify slow pages that can be optimized for improved performance.

75. MAX HTML SIZE:

Possible issues:

All pages that exceed the maximum HTML size specified (default: 204,800 bytes). Large HTML pages can be slow to load and may not be fully indexed by search engines, negatively impacting both usability and SEO performance.

Recommendation:

Identify HTML pages that exceed the Max HTML size and review if the size of HTML can be reduced.
