Do you have pages that have the potential for ranking and organic search traffic but aren’t part of your site structure? Or pages that aren’t supposed to be in your site structure, but Google finds them anyway?
The answer is most likely yes. At least, it is for the majority of websites!
These pages are known as orphan pages, and re-associating the excellent ones with your website structure helps you to fully utilize their potential (as does banning search engine bots from your low-value ones!).
So, Just What Are Orphan Pages?
Orphan pages are pages that have no links pointing to them from anywhere on your website. Because no internal links lead to them, neither visitors nor a crawler that follows your site’s links will find them.
So, how do you go about finding orphan pages?
You’ll need to employ a web crawler as well as a log file analyzer. Go for Screaming Frog!
How To Identify Orphaned Pages
To detect orphan pages, compare the URLs that a crawl of your website finds with the URLs that appear in your log files. First, use a program to crawl your site and generate a list of its URLs. Then extract the list of URLs that search engine bots requested from your log files. Once you have both lists, look for URLs that appear in the log files but not in the crawl. These are your orphan pages: search engines are requesting them, yet nothing in your site structure links to them.
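The comparison can be sketched as a set difference: the paths Googlebot requested in the logs, minus the paths the crawler reached. This is a minimal sketch, not a definitive implementation. It assumes an Apache/Nginx "combined" log format and a plain list of crawled URLs; the function names are illustrative and not part of any tool.

```python
import re
from urllib.parse import urlsplit

# Assumption: logs use the Apache/Nginx "combined" format.
# Adjust the pattern if your server logs a different layout.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_paths(log_lines):
    """Return the set of URL paths requested by a Googlebot user agent."""
    paths = set()
    for line in log_lines:
        match = LOG_LINE.search(line.strip())
        if match and "Googlebot" in match.group("agent"):
            # Simplification: query strings are dropped, only the path is kept.
            paths.add(urlsplit(match.group("path")).path)
    return paths


def crawled_paths(urls):
    """Reduce full URLs from a crawler export to their paths."""
    return {urlsplit(url).path for url in urls}


def find_orphans(log_lines, crawl_urls):
    """Paths Googlebot requested that the site crawl never reached."""
    return googlebot_paths(log_lines) - crawled_paths(crawl_urls)
```

Matching on the user-agent string alone is a simplification, since any client can claim to be Googlebot.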
This method can be time-consuming and error-prone (it is easy to miss orphan pages), especially on large websites. Fortunately, there is a faster and more convenient way to locate orphan pages: crawl your site with a program like Screaming Frog and give it access to your log files. The tool then displays a list of your orphan pages with a single click. You also get pertinent information about each page, such as its status code or the number of Googlebot visits.
Why Are Orphan Pages Detrimental To SEO?
Orphan pages generate two major SEO issues:
- Low traffic and rankings: Even if they have exceptional content, orphan pages seldom rank well in SERPs or receive much organic search traffic.
- Crawl waste: Low-value orphan pages (such as duplicate pages) can divert crawl budget away from important pages.
When orphan pages account for a sizable portion of the pages Google crawls on your website (more than 70%, for instance), you get a clear picture of how serious the situation is.
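As a rough illustration, that share can be computed from the same two sets of paths: those Google requested and those found in the site crawl. The function name and inputs below are hypothetical, and the sketch counts unique paths rather than individual bot hits.

```python
def orphan_share(googlebot_paths, crawl_paths):
    """Fraction of Google-crawled paths that the site crawl did not reach."""
    if not googlebot_paths:
        return 0.0
    orphans = googlebot_paths - crawl_paths
    return len(orphans) / len(googlebot_paths)


# Example: three of the four paths Google requested are not in the site crawl.
share = orphan_share({"/a", "/b", "/c", "/d"}, {"/a", "/home"})
print(f"{share:.0%} of Google-crawled pages are orphans")
```

Weighting by the number of bot visits per page, instead of counting each path once, would show how much crawl activity the orphan pages actually absorb.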
How Do I Resolve Orphan Pages?
Orphan pages are classified into two types:
- Expected orphan pages, which you usually don’t need to worry about
- Unexpected orphan pages, which you should be concerned about
The type determines how you fix your orphan pages. So, when we notice a significant number of orphan pages, the first thing we do is examine them and decide whether they are expected or not.
Expected Orphan Pages: Usually Not A Reason For Worry
After crawling your site and comparing the crawl to your server log files to identify pages that Google is finding but that aren’t in your site structure, you can click on “found by Google” to receive a list of all your orphan pages.
Many of these orphan pages will be generated by:
- Pages that no longer exist on your site but are still linked from another site. It is common to receive an external link to a page that you later delete or redirect. Google will still follow the old link because it still exists on the other website.
How to solve: Because you have no control over the links on other websites, the only way to remedy this sort of orphan page is to contact the site owner and ask them to update the link to point to the correct new URL.
- Pages that return status codes other than 200. Google may continue to crawl pages that return 4xx status codes even after you have removed the links to them from your site.
How to resolve: Google will eventually stop crawling these pages. Nothing to be concerned about.
- Pages that have expired. This is common on websites with a large number of short-lived pages, such as classified ads that expire quickly.
How to fix: You should only be concerned about expired pages discovered by Google if they have been orphaned for an extended period. Otherwise, the number of orphan pages simply reflects the website’s page rotation rate.
Unexpected Orphan Pages: Cause For Concern?
- Expired pages that continue to return content: Some websites stop linking to expired material (such as goods withdrawn from the catalogue) but fail to return a status code (such as HTTP 404 or 410) that indicates the content is no longer available. As a result, the old page is still accessible.
How to Repair: In addition to removing links to expired content, make sure the expired page returns the correct status code: HTTP 404 or 410 if the content is no longer available.
- Pages left out of a prior site migration: These pages were not redirected, so the old material may still be visible.
How to fix: If your new website has equivalent content, redirect these old URLs to it. If it doesn’t, these outdated or omitted pages should return a 404 or 410 status code.
- Syntax errors introduced when generating sitemaps: These produce erroneous URLs, which can return content, duplicates, or HTTP errors.
How to fix: If you discover erroneous URLs caused by a syntax problem, work with your development team to find a solution.
- Syntax errors introduced when generating canonical tags: These also produce erroneous URLs, which might return either a 200 OK status code or an error code.
How to Repair: If you discover erroneous URLs caused by a syntax problem, work with your development team to find a solution.
- Important, high-quality pages that aren’t linked in your website structure: Some websites use navigation pages (content lists such as category pages or internal search result pages) that are only linked when one or more criteria are satisfied. Sub-categories, for example, may appear in a menu only if the list is not empty or exceeds a certain number of items. There are several ways to end up with high-value pages that nothing links to, whether due to an error in automation or not.
How to Repair: The correct approach is to decide when a page no longer meets business requirements for organic traffic and then remove it once and for all: remove the links to it and return HTTP 404 or 410. Until then, keep it linked from other pages on the website.
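The fixes above boil down to one decision per orphan URL: redirect it, keep serving it, or mark it as gone. Here is a minimal sketch of that decision. The `redirect_map` and `live_paths` inputs are hypothetical names, and the choice of a permanent (301) redirect and of 410 over 404 is an assumption rather than something the fixes above prescribe.

```python
from http import HTTPStatus


def resolve_orphan(path, redirect_map, live_paths):
    """Return (status, location) describing how an orphan path should respond.

    redirect_map: hypothetical dict of old path -> equivalent new path
    live_paths:   hypothetical set of paths whose content is still valid
    """
    if path in redirect_map:
        # Equivalent content exists elsewhere: send a permanent redirect.
        return HTTPStatus.MOVED_PERMANENTLY, redirect_map[path]
    if path in live_paths:
        # Content is still valid: keep serving it (and link to it again).
        return HTTPStatus.OK, None
    # Content has expired and has no equivalent: report it as gone.
    return HTTPStatus.GONE, None


redirects = {"/old-shop/shoes": "/shop/shoes"}
live = {"/guides/sizing"}
for orphan in ["/old-shop/shoes", "/guides/sizing", "/ads/12345"]:
    status, location = resolve_orphan(orphan, redirects, live)
    print(orphan, int(status), location)
```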
Orphan Pages With Expired Content
When pages expire, it might result in the creation of orphan pages. This is sometimes natural and anticipated. In other circumstances, it is abnormal, and it requires corrective action.
The HTTP status code distinguishes expected from unexpected orphan pages with expired content. In both cases, the page was linked when Google first crawled it but was no longer linked when Screaming Frog crawled the site. Once the content expires, the normal orphan page reports that it is no longer available (it returns HTTP 404 or 410), whereas the abnormal one keeps serving content (it returns HTTP 200).
Here’s how to tell them apart in Screaming Frog:
- Normal orphan pages: The number of HTTP 404 pages will rapidly increase, while the number of HTTP 200 pages will remain relatively consistent.
- Abnormal orphan pages: The number of HTTP 200 orphan pages will continue to rise over time.
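If you export the orphan list at regular intervals, the same check can be sketched in a few lines. This is only an illustration under stated assumptions: each snapshot is a list of `(path, status)` pairs, the function names are made up, and "continues to rise" is interpreted strictly as an increase between every pair of consecutive snapshots.

```python
from collections import Counter


def orphan_status_counts(orphans):
    """Count orphan pages per HTTP status code.

    orphans: iterable of (path, status) pairs from one snapshot.
    """
    return Counter(status for _, status in orphans)


def http_200_orphans_growing(snapshots):
    """True if the HTTP 200 orphan count rises between every pair of snapshots.

    snapshots: list of Counters from orphan_status_counts, oldest first.
    """
    counts = [snapshot.get(200, 0) for snapshot in snapshots]
    return len(counts) > 1 and all(
        later > earlier for earlier, later in zip(counts, counts[1:])
    )


week_1 = orphan_status_counts([("/ad/1", 200), ("/ad/2", 404)])
week_2 = orphan_status_counts([("/ad/1", 200), ("/ad/3", 200), ("/ad/2", 404)])
print(http_200_orphans_growing([week_1, week_2]))
```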
Finding and Repairing Orphan Pages
Search engines are unlikely to discover orphan pages unless they appear in your sitemap or are linked from other websites, and even when they are discovered, they can cause additional SEO problems.
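Because the sitemap is one route by which search engines can reach pages that have no internal links, it can be useful to list the URLs that appear in the sitemap but not in the site crawl. The sketch below assumes a standard XML sitemap (not a sitemap index) and a plain list of crawled URLs; the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def sitemap_only_urls(xml_text, crawl_urls):
    """URLs listed in the sitemap that the site crawl never reached."""
    return sitemap_urls(xml_text) - set(crawl_urls)
```

URLs returned by `sitemap_only_urls` are orphan candidates: they are submitted to search engines, but nothing on the site links to them.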
After you’ve completed these procedures and located your orphan pages, ask yourself the following questions:
- Is this page important? If it is, figure out how to include it in your site structure. If it isn’t, remove it.
- Despite being an orphan page, does this page rank for any keywords? If it does, figure out how to include it. If it doesn’t, remove it.
- Where should the page be located in the taxonomy of your website?
- Is this a duplicate or a near duplicate of another page? Consider merging the material into a related page that isn’t an orphan.
- Is the page optimized? Can you improve or link it?
- Does the page have links from other websites?
Use the procedures mentioned in this post to locate your orphan pages and remedy this issue.