Extensive XPath SEO Guide For Data Scraping 2021


What is XPath?

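XPath (XML Path Language) is a query language developed by the W3C (World Wide Web Consortium) for navigating XML documents and selecting specific nodes within them. HTML pages can also be treated as a tree of nodes, so crawlers can use XPath to pull specific data out of web pages. This guide explains how to use XPath for SEO data scraping.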
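To make the idea concrete, here is a minimal sketch using Python's built-in xml.etree.ElementTree module, which supports a limited subset of XPath. The XML document and its page, title and url names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, used only to illustrate path-based selection.
XML = """<site>
  <page url="/">
    <title>Home</title>
  </page>
  <page url="/blog/">
    <title>Blog</title>
  </page>
</site>"""

root = ET.fromstring(XML)

# ".//page/title": every <title> element that is a child of a <page> element.
titles = [t.text for t in root.findall(".//page/title")]

# "[@url='/blog/']": a predicate that keeps only nodes whose url attribute matches.
blog = root.find(".//page[@url='/blog/']")

print(titles)                   # ['Home', 'Blog']
print(blog.find("title").text)  # Blog
```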


Use of XPath in SEO

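In SEO, XPath is most often used with a crawler's custom extraction feature, such as the one in Screaming Frog, to scrape specific data from pages: both the text of elements and the values of their attributes.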
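A minimal sketch of scraping both element text and attribute values, again with Python's standard-library ElementTree on a hypothetical, well-formed page. ElementTree cannot return attribute nodes from a path such as //a/@href, so the sketch reads attributes with .get() instead; a full XPath 1.0 engine (for example lxml's xpath() method) could evaluate that expression directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML snippet (ElementTree requires well-formed markup).
PAGE = """<html><body>
  <h1>Red Running Shoes</h1>
  <a href="/shoes/red/">Red shoes</a>
  <a href="/shoes/blue/">Blue shoes</a>
</body></html>"""

root = ET.fromstring(PAGE)

# Element text: roughly what the XPath //h1 with "Extract Text" returns.
h1 = root.find(".//h1").text

# Attribute values: the equivalent of the XPath //a/@href,
# read with .get() because ElementTree paths cannot select attributes.
hrefs = [a.get("href") for a in root.findall(".//a")]

print(h1)     # Red Running Shoes
print(hrefs)  # ['/shoes/red/', '/shoes/blue/']
```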

How to Find XPath For a Website

The easiest way to find an element's XPath is with Chrome’s Inspect tool. Here’s how:

Right-click the part of the page you want the XPath for and select Inspect.

In the Elements panel that opens, right-click the highlighted element and select Copy > Copy XPath.
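Chrome's Copy XPath usually returns a path anchored on the nearest element that has an id, for example //*[@id="main"]/div[2]/h2. The sketch below evaluates a path of that shape against a hypothetical page; the id main and the markup are assumptions. Positional steps such as div[2] count from 1 and can break if the page layout changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the second <div> inside #main holds the heading we want.
PAGE = """<html><body>
  <div id="main">
    <div><h2>Sidebar heading</h2></div>
    <div><h2>Article heading</h2></div>
  </div>
</body></html>"""

# An XPath in the shape Chrome's "Copy XPath" typically produces (hypothetical id).
copied = '//*[@id="main"]/div[2]/h2'

root = ET.fromstring(PAGE)
# ElementTree only accepts relative paths on an element, so prefix "." to the copied XPath.
heading = root.find("." + copied)
print(heading.text)  # Article heading
```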

🔶 Then run the Screaming Frog SEO Spider.

From the top menu, select Configuration > Custom > Extraction.

🔶 Then paste the copied XPath into the XPath field, as shown in the screenshot above, and make sure Extract Text is selected.

🔶 Next, crawl the website in Screaming Frog.

After the crawl, view the scraped data in the Custom Extraction tab, under the Extractor 1 column we set up in the previous step. In this example, we picked the site's <H2> headings as the data to scrape.
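Screaming Frog's Extract Text option returns the text content of each match. The sketch below is only a rough approximation of that behaviour: it applies .//h2 to two hypothetical pages and joins each heading's nested text, so a heading like <h2>Contact <em>us</em></h2> comes out as "Contact us". The URLs and markup are made up, and this is not how Screaming Frog works internally.

```python
import xml.etree.ElementTree as ET

# Two hypothetical crawled pages, keyed by URL (sample markup, not real pages).
PAGES = {
    "https://example.com/": "<html><body><h2>Our services</h2><h2>Contact <em>us</em></h2></body></html>",
    "https://example.com/about/": "<html><body><h2>About the team</h2></body></html>",
}


def extract_text(html: str, path: str = ".//h2") -> list[str]:
    """Approximate 'Extract Text': the text of every match, nested tags included."""
    root = ET.fromstring(html)
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


# One list of values per URL, similar in spirit to an "Extractor 1" column.
extractor_1 = {url: extract_text(html) for url, html in PAGES.items()}
for url, values in extractor_1.items():
    print(url, values)
```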

XPath Cheat Sheet

Basic XPaths

| Element | XPath for Screaming Frog | Extraction |
| --- | --- | --- |
| Any element | //* | Extract Text |
| Any <p> element | //p | Extract Text |
| Any <div> element | //div | Extract Text |
| Any element with class "example" | //*[@class='example'] | Extract Text |
| The whole webpage | /html | Extract Inner HTML |
| The whole page body | /html/body | Extract Inner HTML |
| All text | //text() | Extract Text |
| All link URLs (href values) | //@href | Extract Text |
| Links with the anchor text "example" | //a[contains(., 'example')]/@href | Extract Text |
| Email addresses | //a[starts-with(@href, 'mailto')] | Extract Text |
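The sketch below approximates several rows of the table with Python's ElementTree on a hypothetical page. Because ElementTree implements only part of XPath, contains(), starts-with() and attribute steps such as /@href are emulated with ordinary Python string checks; the comment above each line names the XPath it stands in for.

```python
import xml.etree.ElementTree as ET

# Hypothetical page; the email address uses a made-up domain.
PAGE = """<html><body>
  <p>First paragraph.</p>
  <div class="example"><p>Boxed paragraph.</p></div>
  <a href="/example-guide/">An example guide</a>
  <a href="/pricing/">Pricing</a>
  <a href="mailto:hello@acme.test">hello@acme.test</a>
</body></html>"""

root = ET.fromstring(PAGE)

# //p
paragraphs = ["".join(p.itertext()) for p in root.findall(".//p")]
# //*[@class='example']
with_class = [el.tag for el in root.findall(".//*[@class='example']")]
# //@href  (attribute nodes cannot be selected, so read .attrib instead)
all_hrefs = [el.get("href") for el in root.iter() if "href" in el.attrib]
# //a[contains(., 'example')]/@href  (contains() emulated with Python's "in")
example_links = [a.get("href") for a in root.iter("a") if "example" in "".join(a.itertext())]
# //a[starts-with(@href, 'mailto')]  (starts-with() emulated with str.startswith)
emails = ["".join(a.itertext()) for a in root.iter("a") if a.get("href", "").startswith("mailto")]

print(paragraphs)     # ['First paragraph.', 'Boxed paragraph.']
print(example_links)  # ['/example-guide/']
print(emails)         # ['hello@acme.test']
```

Note that Extract Text on the email row returns the link's visible text; here that text happens to be the address, while the address itself always lives in the href attribute.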

Class names and IDs vary from site to site, but the basic XPaths above rely only on standard HTML elements and attributes, so they work on most sites regardless of how they are built.

XPath for SEO

| Element | XPath | Extraction |
| --- | --- | --- |
| H3 | //h3 | Extract Text |
| H3 containing the text "example" | //h3[contains(text(), 'example')] | Extract Text |
| Count of H3s | count(//h3) | Function Value |
| Full hreflang (link + value) | //*[@hreflang] | Extract HTML Element |
| Hreflang values | //*[@hreflang]/@hreflang | Extract Text |
| Types of schema | //*[@itemtype]/@itemtype | Extract Text |
| Schema itemprop names | //*[@itemprop]/@itemprop | Extract Text |
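As with the basic table, here is a rough ElementTree approximation of the SEO rows on a hypothetical page with two hreflang links and some microdata. The markup, URLs and schema type are assumptions for illustration; count() and contains() are emulated in Python rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with hreflang links and schema.org microdata.
PAGE = """<html>
<head>
  <link rel="alternate" hreflang="en" href="https://example.com/en/"/>
  <link rel="alternate" hreflang="de" href="https://example.com/de/"/>
</head>
<body>
  <div itemscope="" itemtype="https://schema.org/Product">
    <h3 itemprop="name">Blue example widget</h3>
    <h3>Delivery options</h3>
    <span itemprop="price">9.99</span>
  </div>
</body>
</html>"""

root = ET.fromstring(PAGE)

# //h3
h3_text = [h.text for h in root.findall(".//h3")]
# //h3[contains(text(), 'example')]  (case-sensitive, like XPath's contains())
h3_example = [t for t in h3_text if t and "example" in t]
# count(//h3)
h3_count = len(h3_text)
# //*[@hreflang]/@hreflang
hreflangs = [el.get("hreflang") for el in root.findall(".//*[@hreflang]")]
# //*[@itemtype]/@itemtype
schema_types = [el.get("itemtype") for el in root.findall(".//*[@itemtype]")]
# //*[@itemprop]/@itemprop
itemprops = [el.get("itemprop") for el in root.findall(".//*[@itemprop]")]

print(h3_count, hreflangs)  # 2 ['en', 'de']
```

XPath 1.0's contains() is case-sensitive, so an H3 reading "Example widget" would not match 'example'.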

Conclusion

When the progress bar reaches 100%, the crawl is complete and you can export the data with the Export button.

In this guide, we extracted the site's H2 headings, as shown in the exported Excel screenshot.
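If you want a similar spreadsheet outside Screaming Frog, the following minimal sketch writes extracted headings to CSV with Python's csv module. The results dictionary is sample data, and the column names Address, H2 1, H2 2 are illustrative, not Screaming Frog's actual export headers.

```python
import csv
import io

# Hypothetical extraction results: URL -> list of H2 texts (sample data, not real crawl output).
results = {
    "https://example.com/": ["Our services", "Contact us"],
    "https://example.com/about/": ["About the team"],
}

# One row per URL, one column per extracted value; column names are illustrative only.
width = max(len(values) for values in results.values())
header = ["Address"] + [f"H2 {i}" for i in range(1, width + 1)]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
for url, values in results.items():
    # Pad shorter rows so every row has the same number of columns.
    writer.writerow([url] + values + [""] * (width - len(values)))

print(buffer.getvalue())
```

To produce a file Excel can open, write to a file opened with newline="" instead of the in-memory StringIO.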