Find Unusual URL Patterns Using Regex URL Validation Guide

Find Unusual URL Patterns Using Regex URL Validation Guide

As SEOs, we are quite familiar with seo friendly urls and how to optimize and maintain them. Typically SEO friendly URLs are of the following formats.

url validation guide

https://www.domain.com

https://domain.com

http://domain.com

http://www.domain.com

https://domain.com/category/subcategory/page

And so on…

You get a general idea.

However actually maintaining SEO-friendly URLs can be a painstaking task, especially for giant commercial websites like ecommerce. Sometimes parameter-based urls are almost unavoidable.

However, as SEOs, we must always have the means to identify these unusual URLs in order to minimize their presence.

In this exercise, we will be learning how to identify these Unusual URLs in your website using Google search console and Regex.

What is Regex?

In its most basic definition, a Regex is a string of text that allows you to create patterns that help match, locate, and manage text. This helps allows us to search for various URLs, and queries in the Search Console and other Regex-supported platforms based on various custom-made parameters.

For example, we will be finding specific URLs which are not SEO friendly. Such kinds of urls usually contain parameters with useful information like pricing, paginations, values, etc.

Here’s an Amazon product page url. You will get the idea.

Example URL:

How to Find Unusual URL patterns?

Step 1: Open Search Console and go to the Performance Tab

It is often referred to as the search results tab.

Step 2: Click on New > Page. A dialog box will open.

Step 3: Select “Custom (regex)” as the filter. And select “Doesn’t match regex” in the next drop-down list.

 Now enter the following Regex code: 

^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\’\/\\\+&%\$#_]*)?$

Step 4: Hit Apply and you will obtain all unusual URLs currently present on your website.

Here’s a Sample Result:

Explanation

So it’s actually to define a Regex for an unusual url since an unusual url can have various patterns. So the best way is to define a regex for all defined and valid url patterns.

The following regex serves that exact purpose:

^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\’\/\\\+&%\$#_]*)?$

We can test this in a regex tester tool. As you can see in the picture below different formats of the valid urls are matched by this regex pattern.

But as you can see in the pics below, all parameterized URLs are not getting matched.

Hence, we simply used the above expression a filtered all URLs that don’t satisfy it. That way all invalid URLs are shown in the results.

Leave a Reply

Your email address will not be published.