The main advantage of using XPath is that the element you copy doesn’t have to be unique. For example, you can still extract a <div> that has no class. The disadvantage is that the XPath can vary from page to page, even on pages built from the same template.

Since this guide is aimed at relative beginners, the remaining examples will mainly use regex. However, feel free to experiment with XPath, as it can be a lot more powerful when used well.

Thin Categories

One big problem for ecommerce websites is keeping track of thin or empty categories. If your website has a large number of categories which contain few or no products, search engines can see this as a sign of poor user experience.

Too many of these pages can cause your search rankings and conversion rate to suffer. A simple custom extraction can help you monitor these pages and maintain a healthy site architecture that benefits the user.

Have a look at this example from Tesco. The page displays the number of products within the category in the top left, like in the screenshot below:

Note: This is common on many ecommerce websites. If yours does not display it, it is worth asking your web developers to add it.

You can easily find out how many products are in every category by creating a custom extraction for this element.

For this example, the regex required is <div class="filter-productCount">(.*?)</div>, which can be built using the method described earlier from the code in the screenshot below:

A crawl using this extraction will show you how many products are in every category, allowing you to audit your overall architecture and decide whether any pages need to be removed. Since thin pages can hurt SEO performance, categories containing few or no products are the main candidates for removal.
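The same pattern can also be applied to saved page source in Python, for example if you want to post-process pages outside the crawler. This is a minimal sketch, assuming the count is the first number in the captured text (such as "24 products"); `extract_product_count` is a hypothetical helper, not part of DeepCrawl or Screaming Frog. Note that the pattern uses straight quotes, as they appear in the page source.

```python
import re

# The extraction pattern from the text. re.DOTALL lets (.*?) span line
# breaks, in case the count is spread over several lines of HTML.
PRODUCT_COUNT = re.compile(r'<div class="filter-productCount">(.*?)</div>', re.DOTALL)


def extract_product_count(html):
    """Return the product count on a category page as an int, or None."""
    match = PRODUCT_COUNT.search(html)
    if match is None:
        return None  # the element is missing from this page
    # Assumption: the first number in the captured text is the count,
    # e.g. "24 products" or "1,024 items".
    number = re.search(r"\d[\d,]*", match.group(1))
    if number is None:
        return None
    return int(number.group().replace(",", ""))


print(extract_product_count('<div class="filter-productCount">24 products</div>'))  # 24
```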

Alternatively, categories containing a large number of products could represent an opportunity to expand the website, create new sub-categories and increase overall organic visibility.
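Once you have a count for every category, sorting them into thin and large groups is straightforward. This is a minimal sketch, assuming the crawl results have been loaded into a dict that maps each URL to its count (None where the extraction found nothing). The thresholds `thin_max` and `large_min` are arbitrary placeholders that you would tune for your own catalogue.

```python
def classify_categories(counts, thin_max=5, large_min=200):
    """Group category URLs by product count.

    counts: {url: int or None}. None means the extraction found no count.
    thin_max / large_min: illustrative thresholds, not recommendations.
    """
    report = {"thin": [], "large": [], "missing": []}
    for url, count in sorted(counts.items()):
        if count is None:
            report["missing"].append(url)  # check these pages by hand
        elif count <= thin_max:
            report["thin"].append(url)  # candidates for removal or merging
        elif count >= large_min:
            report["large"].append(url)  # candidates for new sub-categories
    return report


crawl = {"/mugs": 3, "/lamps": 0, "/rugs": 48, "/bedding": 350, "/sale": None}
print(classify_categories(crawl))
```

Categories that fall between the two thresholds are simply left out of the report.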

Checking Product Stock Levels

It is vitally important that users can buy the products they see; otherwise they may go looking for the same product elsewhere. Whilst it is perfectly normal for a website to run out of a particular product, you need to make sure that out-of-stock products don’t take over the site.

For this example, the stock status is displayed as in the screenshot below:

Be careful: some websites use different HTML depending on whether or not the product is in stock. For this example, we need two separate extractions running at the same time, using the following regex queries:

<p class="availability out-of-stock"><span>(.*?)</span></p>

<p class="availability in-stock"><span>(.*?)</span></p>
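To see how the two extractions combine, here is a minimal Python sketch that labels a single product page. `stock_status` is a hypothetical helper. A page that matches neither pattern is reported as "unknown", and these pages are worth checking because that usually means they use different markup.

```python
import re

# The two extraction patterns from the text, with straight quotes.
OUT_OF_STOCK = re.compile(r'<p class="availability out-of-stock"><span>(.*?)</span></p>')
IN_STOCK = re.compile(r'<p class="availability in-stock"><span>(.*?)</span></p>')


def stock_status(html):
    """Label a product page as 'in stock', 'out of stock' or 'unknown'."""
    if OUT_OF_STOCK.search(html):
        return "out of stock"
    if IN_STOCK.search(html):
        return "in stock"
    # Neither pattern matched, so the page may use different markup.
    return "unknown"


page = '<p class="availability in-stock"><span>In stock</span></p>'
print(stock_status(page))  # in stock
```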

Using both extractions together will give you a list of all products that are in stock or out of stock. Too many out-of-stock products can signal a poor-quality site and can negatively affect user engagement and search rankings. If a large percentage of products are out of stock, it may be necessary to remove any that are unlikely to be restocked.
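To put a number on "a large percentage", you can work out the share of out-of-stock products from the combined results. This is a minimal sketch, assuming each product URL has already been labelled "in stock", "out of stock" or something else. Pages that match neither extraction are left out of the calculation. What counts as too high is a judgement call for your own site.

```python
def out_of_stock_share(statuses):
    """Percentage of labelled products that are out of stock.

    statuses: {url: "in stock" | "out of stock" | anything else}.
    Products with any other label are ignored.
    """
    labelled = [s for s in statuses.values() if s in ("in stock", "out of stock")]
    if not labelled:
        return 0.0
    out = sum(1 for s in labelled if s == "out of stock")
    return 100.0 * out / len(labelled)


statuses = {"/p1": "in stock", "/p2": "in stock", "/p3": "in stock",
            "/p4": "out of stock", "/p5": "unknown"}
print(out_of_stock_share(statuses))  # 25.0
```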

Duplicate Content

One common issue on ecommerce websites is duplicate product descriptions across a number of pages. This is especially common where similar products are available in different sizes, styles or colours. It should be avoided, but it can be hard to keep track of on larger websites.

DeepCrawl has a built-in feature that can find duplicate content for you. However, if you want results for a specific area of the page (such as the product description), or want to check for close matches, a custom extraction of that area of text is recommended.

For this example, I have used a product page from Pottery Barn below:

We want to check the highlighted text for duplication across the site. Using Inspect Element, we can see that the text is contained within the following code:

Therefore, we want to run a custom extraction for the following regex:

<div class="accordion-tab-copy">(.*?)</div>
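Before you compare descriptions, it helps to strip any tags and extra whitespace the extraction captures, so that two descriptions differing only in formatting count as the same. This is a minimal Python sketch, and `extract_description` is a hypothetical helper. There are two caveats. First, Python's `re` needs `re.DOTALL` for `.` to match across line breaks. Second, the lazy `(.*?)` stops at the first `</div>`, so a description containing nested `<div>`s would be cut short.

```python
import re

# The extraction pattern from the text; DOTALL lets it span several lines.
DESCRIPTION = re.compile(r'<div class="accordion-tab-copy">(.*?)</div>', re.DOTALL)


def extract_description(html):
    """Return the product description as normalised plain text, or None."""
    match = DESCRIPTION.search(html)
    if match is None:
        return None
    text = re.sub(r"<[^>]+>", " ", match.group(1))  # replace inner tags with spaces
    return " ".join(text.split()).lower()  # collapse whitespace, ignore case


html = '<div class="accordion-tab-copy"><p>Hand-thrown\n  stoneware.</p></div>'
print(extract_description(html))  # hand-thrown stoneware.
```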

Once the crawl is complete, use Excel to find exact duplicates, or a fuzzy lookup to find close matches. If a significant number of products use duplicate or templated content, it is worth auditing their search performance to see whether it is being held back.
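If your crawl export is too large for Excel, or you would rather script the check, here is a minimal Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for Excel's fuzzy lookup. Because it is a different similarity measure, its scores will not match Excel's exactly. The 0.9 threshold is an arbitrary starting point, and the input is assumed to be a dict mapping each URL to its normalised description text.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def exact_duplicates(descriptions):
    """Group URLs that share an identical description."""
    groups = defaultdict(list)
    for url, text in descriptions.items():
        if text:  # skip pages where the extraction found nothing
            groups[text].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def close_matches(descriptions, threshold=0.9):
    """Pairs of URLs whose descriptions are similar but not identical."""
    items = sorted((url, text) for url, text in descriptions.items() if text)
    pairs = []
    for (url_a, a), (url_b, b) in combinations(items, 2):
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they differ.
        if a != b and SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((url_a, url_b))
    return pairs


descriptions = {
    "/mug-blue": "stoneware mug, 12 oz.",
    "/mug-red": "stoneware mug, 12 oz.",
    "/mug-green": "stoneware mug, 14 oz.",
    "/lamp": "brass table lamp with linen shade.",
}
print(exact_duplicates(descriptions))  # [['/mug-blue', '/mug-red']]
print(close_matches(descriptions))
```

Comparing every pair grows quadratically with the number of products, so on a very large catalogue you may want to compare products within one category at a time.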

If you find a multitude of products with the same or similar descriptions, it’s likely that they would benefit from being merged into a smaller number of configurable products. These allow a user to select their size or colour from a single page instead of navigating between different product pages.

GA Implementation

If you are using Google Analytics (other tracking tools are available) then it is vital that the tracking code is present on every page. Fortunately, there is a REALLY easy piece of regex code you can use to check this:

(UA-[0-9]+-[0-9]+)
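Here is the same pattern in Python, with plain ASCII hyphens inside the character classes. This is a minimal sketch for checking page source you already have, and `tracking_ids` is a hypothetical helper. Note that the pattern only matches Universal Analytics "UA-" property IDs. Google Analytics 4 measurement IDs start with "G-" and would need a different pattern.

```python
import re

# The regex from the text: "UA-", the account number, "-", the property number.
GA_ID = re.compile(r"(UA-[0-9]+-[0-9]+)")


def tracking_ids(html):
    """Return every Universal Analytics ID found in a page's source."""
    return GA_ID.findall(html)


snippet = "ga('create', 'UA-12345678-1', 'auto');"
print(tracking_ids(snippet))  # ['UA-12345678-1']
```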

If you are using DeepCrawl, this is actually one of the preset options, which makes it even simpler.

Alternatively, the same regex works in Screaming Frog. Any page missing the tracking code will show up in the report as a blank row. Easy peasy.
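On a large crawl, scrolling through the report for blank rows gets tedious, so you can filter the exported report instead. This is a minimal sketch, assuming the export is a CSV with one column for the page URL and one for the extraction. The default column names `Address` and `GA ID` are placeholders, so check them against the headers in your own export.

```python
import csv
import io


def pages_missing_tracking(csv_text, url_column="Address", extraction_column="GA ID"):
    """Return the URLs whose extraction column is empty in a crawl export.

    The column names are placeholders; match them to your export's headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[url_column]
        for row in reader
        if not (row.get(extraction_column) or "").strip()  # blank = no tracking code
    ]


export = (
    "Address,GA ID\n"
    "https://www.example.com/,UA-12345678-1\n"
    "https://www.example.com/about,\n"
)
print(pages_missing_tracking(export))  # ['https://www.example.com/about']
```

To check a saved export, open it with `open(path, newline="", encoding="utf-8")` and pass `f.read()` to the function.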

Number of Reviews / Review Score

Reviews are a great way to boost your conversion rate, or your click-through rate if your structured data is set up correctly. If you have a large number of reviews on a site, a custom extraction can be an effective way to find areas of the site where reviews are either low in quantity or poor in quality. Once you have identified a list of problem products, you can plan a strategy to encourage customers to leave positive reviews in the most important areas.

Websites display review scores in many different ways, so have a play with your own site to find the best way to write your regex.

For this example, the review is displayed on the website as below:

Using Inspect Element on this snippet, we can see that the number of reviews sits inside <p class="rating-links">, so the regex <p class="rating-links">(.*?)</p> will capture it. The rating score is generated by <div class="rating" style="width:93%">, where 93% is the average score, which is then converted into a star rating. Therefore, we can use the regex <div class="rating" style="width:(.*?)"> to obtain the average review score for each product.
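Here is how the two extractions might be turned into numbers in Python. This is a minimal sketch based on two assumptions. First, it assumes the review count is the first number inside the `rating-links` element. Second, it assumes the width percentage maps linearly onto a five-star scale, so 93% becomes 4.65 stars. `review_summary`, `needs_attention` and their thresholds are hypothetical, chosen only to illustrate how you might flag products with few or poor reviews.

```python
import re

# The two extraction patterns from the text, with straight quotes.
REVIEW_COUNT = re.compile(r'<p class="rating-links">(.*?)</p>', re.DOTALL)
RATING_WIDTH = re.compile(r'<div class="rating" style="width:(.*?)">')


def review_summary(html, max_stars=5):
    """Return (review_count, star_rating) for a product page; None where absent."""
    count = None
    count_match = REVIEW_COUNT.search(html)
    if count_match:
        # Assumption: the first number inside the element is the review count.
        number = re.search(r"\d+", count_match.group(1))
        count = int(number.group()) if number else None

    stars = None
    width_match = RATING_WIDTH.search(html)
    if width_match:
        # Assumption: the bar width is linear, e.g. "93%" -> 4.65 out of 5 stars.
        percent = float(width_match.group(1).strip().rstrip("%"))
        stars = round(percent / 100 * max_stars, 2)
    return count, stars


def needs_attention(summary, min_reviews=5, min_stars=3.5):
    """Flag a product with few reviews or a low average (illustrative thresholds)."""
    count, stars = summary
    return (count or 0) < min_reviews or (stars is not None and stars < min_stars)


page = ('<div class="rating" style="width:93%"></div>'
        '<p class="rating-links"><a href="#reviews">12 Review(s)</a></p>')
print(review_summary(page))  # (12, 4.65)
```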

These are just a few of the ways we use custom extractions at We Influence, but there is an incredible amount you can do with them to help improve the SEO and conversion performance of your website. If you have your own ideas for custom extractions or need any help setting them up, let us know.