When used creatively, XPaths can help improve the efficiency of auditing large websites. Consider this another tool in your SEO toolbelt.
There are endless types of information you can unlock with XPaths, which can be used in any category of online business.
Some popular ways to audit large sites with XPaths include:
Building redirect maps at scale.
Auditing product information on ecommerce sites.
Auditing blog content for categories, tags, and keyword coverage.
In this guide, we’ll cover exactly how to perform these audits in detail.
What Are XPaths?
Simply put, XPath is a syntax that uses path expressions to navigate XML documents and identify specified elements.
This is used to find the exact location of any element on a page using the HTML DOM structure.
We can use XPaths to help extract bits of information such as H1 page titles, product descriptions on ecommerce sites, or really anything that’s available on a page.
While this may sound complex to many people, in practice, it’s actually quite easy!
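To make the idea concrete, here is a minimal sketch in Python. The page snippet and class names are hypothetical, and Python's built-in ElementTree only supports a limited XPath subset (tools like Screaming Frog accept full XPath expressions), but the principle is the same: a path expression pinpoints an element in the document tree.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical page snippet to demonstrate XPath-style lookups.
page = ET.fromstring(
    "<html><body>"
    "<h1>XPath Basics</h1>"
    "<span class='author'>Jane Doe</span>"
    "</body></html>"
)

# .//h1 finds the first <h1> anywhere under the root
h1 = page.find(".//h1").text

# .//span[@class='author'] targets an element by its attribute value
author = page.find(".//span[@class='author']").text

print(h1)      # XPath Basics
print(author)  # Jane Doe
```

The second expression shows why XPaths are so useful for scraping: you can select an element by its attributes rather than its position, which tends to survive page redesigns better.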
How to Use XPaths in Screaming Frog
In this guide, we’ll be using Screaming Frog to scrape webpages.
Screaming Frog offers custom extraction methods, such as CSS selectors and XPaths.
It’s entirely possible to use other means to scrape webpages, such as Python. However, the Screaming Frog method requires far less coding knowledge.
(Note: I’m not in any way currently affiliated with Screaming Frog, but I highly recommend their software for web scraping.)
Step 1: Identify Your Data Point
Figure out what data point you want to extract.
For example, let’s pretend Search Engine Journal didn’t have author pages and you wanted to extract the author name for each article.
What you’ll do is:
Right-click on the author name and select Inspect.
In the dev tools Elements panel, you will see your element already highlighted.
Right-click the highlighted HTML element, hover over Copy, and select Copy XPath.
At this point, your computer’s clipboard will have the desired XPath copied.
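The copied expression will look something like this (a hypothetical example; the exact path depends on the site's markup):

```
Browser-copied XPath (absolute, position-based):
/html/body/div[1]/article/header/div/span/a

Handwritten alternative (relative, attribute-based):
//a[@class="author-name"]
```

Browser-generated XPaths are absolute paths, which can break if the page layout shifts. If the copied path fails on some pages, a relative, attribute-based expression like the second one is often more robust.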
Step 2: Set up Custom Extraction
In this step, you will need to open Screaming Frog and set up the website you want to crawl. In this instance, I would enter the full Search Engine Journal URL.
Go to Configuration > Custom > Extraction.
This will bring up the Custom Extraction configuration window. There are a lot of options here, but if you're looking to simply extract text, set the extractor mode to XPath, paste in the XPath you copied, and set the action to "Extract Text."
Step 3: Run Crawl & Export
At this point, you should be all set to run your crawl. You’ll notice that your custom extraction is the second to last column on the right.
When analyzing crawls in bulk, it makes sense to export your crawl into an Excel format. This will allow you to apply a variety of filters, pivot tables, charts, and anything your heart desires.
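If you prefer code to spreadsheets, the same bulk analysis can be sketched in Python against the exported CSV. The column names and URLs below are hypothetical examples of a Screaming Frog export, where the custom extraction column holds the scraped value.

```python
import csv
import io

# Hypothetical excerpt of a crawl export; "Author 1" is the
# custom XPath extraction column.
export = io.StringIO(
    "Address,Status Code,Author 1\n"
    "https://example.com/post-a,200,Jane Doe\n"
    "https://example.com/post-b,200,\n"
)

rows = list(csv.DictReader(export))

# Flag pages where the custom extraction came back empty.
missing = [row["Address"] for row in rows if not row["Author 1"]]
print(missing)  # ['https://example.com/post-b']
```

An empty extraction cell usually means either the page genuinely lacks that element or your XPath doesn't match that page template, so both cases are worth a spot check.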
3 Creative Ways XPaths Help Scale Your Audits
Now that we know how to run an XPath crawl, the possibilities are endless!
We have access to all of the answers; now we just need to find the right questions.
What are some aspects of your audit that could be automated?
Are there common elements in your content silos that can be extracted for auditing?
What are the most important elements on your pages?
The exact problems you’re trying to solve may vary by industry or site type. Below are some unique situations where XPaths can make your SEO life easier.
1. Using XPaths with Redirect Maps
Recently, I had to redesign a site that required a new URL structure. The former pages all had parameters as the URL slug instead of the page name.
This made creating a redirect map for hundreds of pages a complete nightmare!
So I thought to myself, “How can I easily identify each page at scale?”
After analyzing the various page templates, I realized that the title of each page looked like an H1 but was actually just large paragraph text. This meant that I couldn't get the standard H1 data from Screaming Frog.
However, XPaths would allow me to copy the exact location for each page title and extract it in my web scraping report.
In this case, I was able to extract the page titles for all of the old URLs and match them with the new URLs through the VLOOKUP function in Excel. This automated most of the redirect map work for me.
With any automated work, you may have to perform some spot checking for accuracy.
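The VLOOKUP matching step can also be sketched in Python as a simple dictionary lookup. All URLs and titles below are hypothetical; the idea is to join the old URLs to the new ones on the extracted page title.

```python
# Old URL -> page title, as extracted via the custom XPath crawl.
old_pages = {
    "https://example.com/?p=101": "Contact Us",
    "https://example.com/?p=102": "Pricing",
}

# Page title -> new URL, from the redesigned site's crawl.
new_pages = {
    "Contact Us": "https://example.com/contact-us/",
    "Pricing": "https://example.com/pricing/",
}

# Join the two on the title, flagging anything that needs manual review.
redirect_map = {
    old_url: new_pages.get(title, "NO MATCH - check manually")
    for old_url, title in old_pages.items()
}

for old, new in redirect_map.items():
    print(f"{old} -> {new}")
```

Just like the spreadsheet version, any "NO MATCH" rows are the ones to spot-check by hand.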
2. Auditing Ecommerce Sites with XPaths
Stakeholders will sometimes need product-level audits on an ad hoc basis. These may cover just a few categories of products, or the entire site.
Using the XPath extraction method we learned earlier in this article, we can extract all types of product data, including names, prices, descriptions, review data, and much more.
This can help identify products that may be lacking valuable information within your ecommerce site.
The cool thing about Screaming Frog is that you can extract multiple data points to stretch your audits even further.
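Extracting multiple data points per page looks like this in miniature. The product markup and class names below are hypothetical; in practice, you would paste one XPath per data point into separate extraction slots in Screaming Frog.

```python
import xml.etree.ElementTree as ET

# Hypothetical product page markup; the class names are assumptions.
product = ET.fromstring(
    "<div>"
    "<h1 class='product-title'>Blue Widget</h1>"
    "<span class='price'>$19.99</span>"
    "<div class='description'></div>"
    "</div>"
)

def extract(path):
    """Return the element's text for a path, or '' if missing/empty."""
    el = product.find(path)
    return (el.text or "").strip() if el is not None else ""

title = extract(".//h1[@class='product-title']")
price = extract(".//span[@class='price']")
description = extract(".//div[@class='description']")

# An empty value flags a product that is lacking information.
print(title, price, repr(description))
```

Here the empty description string is exactly the kind of gap a product-level audit is meant to surface.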
3. Auditing Blogs with XPaths
This is a more common method for using XPaths. Screaming Frog allows you to set parameters to crawl specific subfolders of sites, such as blogs.
However, using XPaths, we can go beyond simple metadata and grab valuable insights to help identify content gap opportunities.
Categories & Tags
One of the most common ways SEO professionals use XPaths for blog auditing is scraping categories and tags.
This is important because it helps us group related blogs together, which can help us identify content cannibalization and gaps.
This is typically the first step in any blog audit.
This step is a bit more Excel-focused and advanced. It works like this: you set up an XPath extraction to pull the body copy out of each blog post.
Fair warning, this may drastically increase your crawl time.
When you export this crawl into Excel, all of the body text for each page will land in a single cell. I highly recommend disabling text wrapping, or your spreadsheet will look terrifying.
Next, in the column to the right of your extracted body copy, enter a formula along these lines, which returns TRUE when the keyword appears in the body copy: =ISNUMBER(SEARCH("keyword",A1))
In this formula, A1 is the cell containing the body copy.
To scale your efforts, you can have your “keyword” equal the cell that contains your category or tag. However, you may consider adding multiple columns of keywords to get a more accurate and robust picture of your blogging performance.
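The same keyword check can be expressed in Python, mirroring an Excel formula like =ISNUMBER(SEARCH("keyword", A1)) applied to the extracted body copy. The body copy and category values below are hypothetical.

```python
def contains_keyword(body_copy: str, keyword: str) -> bool:
    """Case-insensitive check, like Excel's SEARCH wrapped in ISNUMBER."""
    return keyword.lower() in body_copy.lower()

# Hypothetical (body copy, category keyword) pairs from the crawl export.
rows = [
    ("Our guide to technical SEO covers crawling and indexing.", "technical seo"),
    ("Ten quick dinner recipes for busy weeknights.", "technical seo"),
]

for body, keyword in rows:
    print(contains_keyword(body, keyword))  # True, then False
```

Pages where the check comes back False for their own category keyword are candidates for a content gap or a miscategorized post.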
Google’s smart bidding strategies use a host of signals to inform bids with each auction. Now, Google is starting to show which signals are driving performance to optimize bids for people more or less likely to convert.
Top signals. The signals shown might include device type, location, day of week, time of day, keywords, remarketing and Customer Match lists, and potentially some other signals. You might also see combinations of signals, such as time and keyword. Signals in red are less likely to convert in that strategy, while signals in green are more likely to convert.
Where to see top signals reporting. The top signals will show in the bid strategy report. Keep in mind that this report is only available for portfolio bid strategies. The bid strategy report is found under Tools > Shared Library > Bid Strategies; then select a portfolio strategy.
Google said the reporting will show for Target CPA and Maximize conversions on Search, but you may be able to see top signals for other portfolio strategies; the example above, in fact, is for eCPC.
Why we care. Understanding which contextual signals have particular influence on your automated bidding can give you insights into your target customers and potentially inform your strategy. For example, if you see a keyword being “down signaled,” it may just be a poor match for that particular bid strategy, or perhaps there are ad or landing page optimizations you could make to improve its likelihood to convert.
You might also see trends that can inform other marketing efforts such as email send times. The screenshot above, for example, shows weekends are a strong signal. That could be a good time to test email flights rather than on weekdays.
About The Author
Ginny Marvin is Third Door Media’s Editor-in-Chief, running the day to day editorial operations across all publications and overseeing paid media coverage. Ginny Marvin writes about paid digital advertising and analytics news and trends for Search Engine Land, Marketing Land and MarTech Today. With more than 15 years of marketing experience, Ginny has held both in-house and agency management positions. She can be found on Twitter as @ginnymarvin.
Google is making it possible to use the Assistant (via Duplex) to buy movie tickets online. Back in May at Google I/O, the company announced that it was expanding the AI-powered Duplex beyond restaurant reservations to booking rental cars and buying movie tickets.
Duplex on the web. With what Google calls "Duplex on the web," users will be able to use the Google Assistant for new reservation and purchase categories. Movies is the latest example.
As shown below, Android users in the U.S. or U.K. can ask the Assistant for movie showtimes or search movies in the Google app. The Assistant will then lead searchers through a "buy tickets" process that involves theater selection, movie times and, if available, seat selection. A payment card saved in Chrome is required for this to work.
Expanding to many more categories. It’s not clear that users will prefer this process to manually booking tickets. However, it illustrates how Google is bringing the sophistication of its Duplex technology to the broader mobile internet.
It’s also not clear how much back end integration needs to be done by publishers to enable this; I suspect not that much. Regardless, I’m sure Google has a roadmap that extends to many other categories where online scheduling, reservations and basic transactions are involved.
Rand Fishkin has been speaking, including at SMX East, about how Google has evolved from “everyone’s search engine to everyone’s competitor” and the SEO implications of this. My view is a bit different.
Why we should care. Google has now talked repeatedly about "helping users get things done" in search and with the Google Assistant. This is about making search more transactional and owning the transaction. Google is doing this in shopping and across the board in local (e.g., food ordering).
Google is trying to remove friction and compress the process between search and a sale. It's handing that process off less and less to third parties and site owners. This helps Google 1) improve the consumer experience, 2) keep users within its system, 3) create a closed loop for analytics and 4) generate fees or revenue from commerce, which has implications for smart speakers.
If these capabilities (i.e., Duplex on the web) take off, publishers and brands will need to be partnered or integrated with Google actions/services or risk losing the transaction to a competitor. It will also mean that Google owns the customer.
About The Author
Greg Sterling is a Contributing Editor at Search Engine Land. He writes about the connections between digital and offline commerce. He previously held leadership roles at LSA, The Kelsey Group and TechTV. Follow him on Twitter or find him on LinkedIn.
Google announced it has added new filters to the performance report within Google Search Console to show you how well your product results are doing in search. Google now captures and displays click and impression data when rich results display based on your use of product rich results markup.
The report. Find this data under the Performance report by clicking on “search appearance” and then on “product results.” You’ll see clicks and impressions and can further segment by device, geography and queries.
What is a product rich result? You can learn more in Google's developer documentation, but product rich results typically show product ratings, price, availability and some description information. Note that product rich results are not new, just the report in Search Console.
Why we care. The more data the better for SEOs and publishers, and this gives us more granular data on the impact of adding product rich result markup to our pages. Google said this will show how much traffic comes from experiences with rich data like price and availability, how shopping traffic changes over time, and which shopping search queries your website shows for.
About The Author
Barry Schwartz is Search Engine Land’s News Editor and owns RustyBrick, a NY based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on SEM topics.