I’m a search geek. I read through patents as if they were novels, looking for hints and glimpses behind the curtains of search engines.
I look for patents from specific inventors, like people who might keep their eyes open for news of a new Marvel movie.
Patents don’t always provide actionable insights, but they do suggest questions to ask, things to watch for or test, and ideas about how search engines may be working.
This summer I found a patent that reminded me of the concept of a sea change, and of how search results themselves could transform and undergo one.
One of the inventors I watch out for is Trystan Upstill, at one point the Head of Core Web Ranking and Mobile Content Search at Google.
He has been involved in some of the more interesting patents and processes at Google, like one I wrote about on How Google May Rank Some Results Based on Categorical Quality.
If you read about that one, you may see some similarities to the patent I am writing about today.
He writes about things that we may never visibly notice, processes that happen behind the scenes (or curtains) and decide which pages fill the search results we see in response to a query.
A newly granted (July 2, 2019) patent from Google has his name on it as one of the inventors, and it was filed when he was still the head of Core Web Ranking at Google back in 2015.
Adjusted Search Features
The patent starts out simply enough, by telling us:
“The search system ranks the resources based on their relevance to the query and importance and provides search results that link to the identified resources, and orders the search results according to the rank.”
The results shown are responsive to a query; when determining search scores for the resources that appear in SERPs, the search engine looks at features of the webpages, aspects of the query itself, and possibly other information.
Most patents describe a problem, and that problem explains why the patent was written and why its invented process is needed.
Sometimes a patent will also tell us about the state of the technology at the time it was written. Here are the problem and the state of the technology as described in the summary section of the patent:
“Typically the search operation implements a robust search algorithm that performs well over a wide variety of resources. However, sometimes particular features for a particular query and a particular set of resources may be quite important in determining the search scores for the resources, while for other queries the particular features may be much less important. For example, for a particular query with certain terms, the presence of those terms in the resources may have a very strong impact on the search scores for the resources; conversely, for another query with different terms, the relative importance of the resources in an authority graph may have a much stronger impact on the search scores than the presence of query terms in the resources.
However, the relative importance of particular features for particular queries and resources is often difficult, if not impossible, to predict a priori.”
What these shifts in the importance of the features a page is ranked upon may mean is that, in response to them, Google might sometimes adjust search features and rescore resources.
The process behind the patent can include:
Receiving data that indicates resources identified by a search operation that are responsive to a query and ranked according to a first-order, each resource having a corresponding search score by which the resources are ranked in responsiveness to the query relative to the other resources identified by the search operation as being responsive to the query, wherein the search operation scores each of the resources based, in part, on features of the resource and the query; and selecting a set of the resources.
Determining, from the SERPs and for each of the features of the resources and the query, an impact measure that measures the impact of the feature on the ranking of the resources that belong to the set.
Re-scoring the resources for the query in the SERPs based, in part, on the impact measures and ranking the set of resources according to a second-order that is different from the first order.
Providing, to a searcher in response to the query, search results according to the second-order, each search result identifying a corresponding resource.
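The patent doesn’t disclose any actual formulas, but the steps above can be sketched as a toy re-scorer. Everything in this sketch is an illustrative assumption: the feature names, the impact measure (a feature’s share of the set’s total score), and the threshold and dampening values.

```python
# Toy model of the re-scoring loop described above. All values and
# feature names are made up; nothing here is Google's actual math.

def impact(resources, feature):
    """Share of the selected set's total score contributed by one feature."""
    total = sum(sum(r["features"].values()) for r in resources)
    return sum(r["features"][feature] for r in resources) / total

def rescore(resources, threshold=0.5, dampen=0.5):
    features = list(resources[0]["features"])
    # Adjust the search operation: down-weight any feature whose
    # measured impact on this result set crosses the threshold.
    weights = {f: (dampen if impact(resources, f) > threshold else 1.0)
               for f in features}
    for r in resources:
        r["score"] = sum(weights[f] * v for f, v in r["features"].items())
    # Rank according to a second order that may differ from the first.
    return sorted(resources, key=lambda r: r["score"], reverse=True)

docs = [
    {"url": "/a", "features": {"ir": 9.0, "authority": 1.0}},  # first order: #1
    {"url": "/b", "features": {"ir": 2.0, "authority": 5.0}},  # first order: #2
]
reranked = rescore(docs)  # "ir" dominates this set, so it gets dampened
```

In this made-up data, the information-retrieval feature dominates the set, so it is dampened and the second-order ranking differs from the first: /b moves above /a.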
Many patents include a section in their summary that lists what they refer to as “advantages” for using the process described in the patent. They are a forecast of what the expected outcome of the patent might be.
For this patent the expected advantages include:
Search operations may be adjusted to compensate for emergent phenomena that affect resource scoring.
Those adjustments may be determined at query time so that the foundational search operation need not be adjusted, and thus the foundational search operation can be built on known priors.
This approach allows for the retention of the foundational search operation that performs well for most resources in a corpus given a set of known priors, but also provides flexibility to adjust the search operation on a per-query basis when particular features affect the ranking of resources in a way that departs from the expected effects.
The re-ranking of resources resulting from scoring pursuant to the adjusted search operation tends to surface more prominent resources that are more likely to satisfy a user’s informational need, thereby increasing the quality of the overall user experience.
The ultimate goal is expressed there as providing resources that are “more likely to satisfy a user’s informational need, thereby increasing the quality of the overall user experience.”
This adjusted search features patent can be found at:
Search operation adjustment and re-scoring
Inventors: Trystan G. Upstill, Andre Duque Madeira, Wisam Dakka and Zhong Xiu
Assignee: Google LLC
US Patent: 10,339,144
Granted: July 2, 2019
Filed: May 21, 2015
“Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving queries, and for each received query: receiving data indicating resources identified by a search operation as being responsive to the query, wherein the search operation scores each of the resources based, in part, on features of the resource and the query, selecting a subset of the resources, determining, from the subset of resources and for each of the features of the resources and the query, an impact measure that measures the impact of the feature on the ranking of the resources that belong to the subset, adjusting the search operation based on the respective impact measures, and initiating the search operation to re-score the resources in the subset of resources based, in part, on the adjustment and to rank the subset of resources according to a second-order that is different from the first order.”
More on Adjusted Search Features That May Change Search Engine Scores
I mentioned search engine scores that may be created according to “multiple features of the resource and the query.” These features could be related to:
Information retrieval, such as features related to recall and precision.
The relative authority of a resource in a resource graph.
The query terms.
User feedback of the resource given a query and other queries.
The patent tells us that “these features may be modeled in the search engine as parameters, and various parameter values may be selected for each parameter.”
How these search features are valued may be part of what makes search engine scores work well. The patent gives us an example:
“For example, with respect to a resource’s authority score, a parameter value may be a weight by which a feature value for the resource–the authority score–is multiplied or otherwise adjusted; with respect to resource terms and query terms, parameter values may include synonyms, related terms, and weights by which matches of terms and term counts are multiplied or otherwise adjusted; and so on.”
So according to this patent, search could be a very complex process that looks at multiple types of scoring contributions, based upon a number of different parameters, which could relate to features of web resources or to the content of a query.
The search operation, once built, tends to perform well over a wide variety of search queries and documents. This could present some issues that need to be overcome, and the patent describes those for us.
It tells us that:
For some queries and resources, some features may exhibit much more influence on scoring than they do for other queries and resources
For some queries and resources, some features may exhibit much less influence on scoring than they do for other queries and resources
When a subject is a fairly new one on the Web (which the patent refers to as an “emergent subject”), some aspects of a score may have more impact than others:
“Furthermore, such influences may be evanescent; for example, for an emergent subject, an information retrieval score may be more influential for the first several weeks, and then, at a later time, authority scores and user feedback scores may tend to grow in influence. Thus, tuning a search operation to compensate for these features is difficult prior to their detection, if not impossible.”
So, the focus of this patent is on detecting “when certain features exhibit greater or lesser impacts on the ranking of resources for a search operation for a query” and then adjusting the search operation based on those impacts.
If you’ve ever ranked a page in a fairly new subject area, and one day the search results that it appears in all of a sudden seem to shift around and change (undergoing a sea change), the next paragraph from the patent could explain why that might happen as search results get adjusted:
“The adjusted search operation is then re-run on the identified resources to re-rank the resources in a manner that takes into account the detected impacts. In some implementations, an initial search for a query is executed, and a proper subset of the ranked resources, e.g., the top N ranked resources, is processed to determine appropriate modifications to the search operation. The search operation, adjusted by the appropriate modifications, is then re-run to re-score and re-rank the resources.”
When I read the next paragraph in the patent, I was reminded of a post that Jason Barnard wrote about ranking at Google, based upon information he had received from Gary Illyes, Webmaster Trends Analyst at Google, which he wrote about in How Google Search Ranking Works – Darwinism in Search:
“The search engine utilizes a search operation that generates search scores for the resources and ranks the resources based on search scores. The search operation quantifies the relevance of the resources to the query, and the quantification can be based on a variety of factors. Such factors include information retrieval (“IR”) scores, user feedback scores, and optionally a separate ranking of each resource relative to other resources (e.g., an authority score). The search results are ordered in a first-order according to these search scores and provided to the user device according to the first order, or, in some situations, may be re-ranked by an adjusted search operation and provided to the user device as search results ranked according to a second-order that is different from the first order.”
This patent also tells us about feedback scores based upon information from query logs and click logs:
“In some implementations, the queries submitted from user devices are stored in query logs. Click data for the queries and the web pages referenced by the search results are stored in click logs. The query logs and the click logs define search history data that include data from and related to previous search requests. The query logs and click logs can be used to map queries submitted by the user devices to web pages that were identified in search results and the actions taken by users. The click logs and query logs can thus be used by the search system to determine queries submitted by the user devices, the actions taken in response to the queries, and how often the queries are submitted. Such information can be stored as feedback scores for the queries and resources.”
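As a rough illustration of how query logs and click logs might turn into feedback scores, here is a toy example. The log shape and the use of click-through rate as the score are my assumptions, not anything stated in the patent:

```python
from collections import Counter

# Assumed log shape: one row per impression, flagged if it was clicked.
log = [
    ("best pizza", "/a", True),   # shown and clicked
    ("best pizza", "/a", False),  # shown, not clicked
    ("best pizza", "/b", False),
]

impressions, clicks = Counter(), Counter()
for query, url, clicked in log:
    impressions[(query, url)] += 1
    if clicked:
        clicks[(query, url)] += 1

def feedback_score(query, url):
    """Click-through rate as a stand-in feedback score for a (query, page) pair."""
    shown = impressions[(query, url)]
    return clicks[(query, url)] / shown if shown else 0.0
```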
And Then There Is Reranking of Results, or Adjusted Search Features
This is the adjustment of results described in the patent, where shifts in the values that results were scored upon may modify the search results:
“…the re-ranking engine, for each query, processes resources identified by a search operation as being responsive to the query and ranked according to the first order, selects a proper subset of the resources, and determines, for each feature the search operation takes into account, an impact measure that measures the impact of the feature on the ranking of the resources. The re-ranking engine can then adjust the search operation based on the respective impact measures, and initiate a subsequent run of the search operation to re-score the resources based, in part, on the adjustment, resulting in the search results.”
When search results are ranked, the influence of each feature involved in ranking them is calculated, and any changes to those features may be measured by their impact.
If the impact doesn’t meet a threshold, then the re-ranking engine will not rerank the search results. If it does meet that threshold, then the results will be re-ranked.
The patent provides this peek at how reranking might take place, when Google decides to use adjusted search features.
“…then the process adjusts the search operation based on the impact measures (314). A variety of adjustments can be used. For example, depending on a category of the query, the search algorithm may be adjusted in different ways. By way of one example, if a query is categorized as being a “product” seeking query, then a relevance weight parameter value related to certain commercial content, such as reviews, pricing information, etc., may be increased; conversely, if a query is categorized as being an “informational” seeking query, then the relevance weight parameter value related to certain commercial content, such as reviews, pricing information, may be decreased, while a relevance weight parameter value related to anchor text linking to the resource may be increased, etc.”
And synonyms may play a role as well:
“…if an impact measure related to synonym matching terms is high, then the feature of query expansion may be adjusted such that a more aggressive form of query expansion is used.”
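The category-based adjustment quoted above can be imagined as a simple lookup of per-category parameter tweaks. The category names and weight values below are illustrative assumptions, not Google’s:

```python
# Hypothetical per-category parameter tweaks, following the patent's
# product vs. informational example. All numbers are made up.
BASE_WEIGHTS = {"commercial_content": 1.0, "anchor_text": 1.0}

CATEGORY_ADJUSTMENTS = {
    "product":       {"commercial_content": 1.5},
    "informational": {"commercial_content": 0.5, "anchor_text": 1.5},
}

def adjusted_weights(query_category):
    """Start from the base weights, then apply the category's overrides."""
    weights = dict(BASE_WEIGHTS)
    weights.update(CATEGORY_ADJUSTMENTS.get(query_category, {}))
    return weights
```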
Adjusted Search Features Takeaways
The article that Barnard wrote names specific types of features that may be used to rank pages, such as topicality, quality, speed, RankBrain, entities, structured data, freshness.
Those specific features aren’t named in this patent or discussed in any detail, but they do seem like the kinds of features of ranked resources or queries that this patent says could influence how a page may be ranked.
If you haven’t had a chance to read Barnard’s post, I would recommend it. I read it around the same time that I first saw this patent, and I highlighted the paragraph from this patent that tells us that pages may be ranked based upon a variety of factors.
While this patent doesn’t tell us the same factors that Barnard was told, the idea that multiple factors may be involved in ranking pages at Google is one worth exploring in more detail, if you can.
What this patent adds to what Barnard told us is that, upon seeing the impact of the different ranking signals it may have used to rank a page change beyond a certain threshold, Google may adjust rankings by applying a reranking process.
So, if you have been following a particular query, know the SERP around it well, and know who else occupies positions in it, you may suddenly see it shift around and change.
It is possible that Google adjusted search features and changed those results because the impact of the ranking signals for those features changed.
You can think of Tobiko as a kind of anti-Yelp. Launched in 2018 by Rich Skrenta, the restaurant app relies on data and expert reviews (rather than user reviews) to deliver a kind of curated, foodie-insider experience.
A new Rich Skrenta project. Skrenta is a search veteran with several startups behind him. He was one of the founders of DMOZ, a pioneering web directory that was widely used. Most recently Skrenta was the CEO of human-aided search engine Blekko, whose technology was sold to IBM Watson in roughly 2015.
At the highest level, both DMOZ and Blekko sought to combine human editors and search technology. Tobiko is similar; it uses machine learning, crawling and third-party editorial content to offer restaurant recommendations.
Betting on expert opinion. Tobiko is also seeking to build a community, and user input will likely factor into recommendations at some point. However, what’s interesting is that Skrenta has shunned user reviews in favor of “trusted expert reviews” (read: critics).
Those expert reviews are represented by a range of publisher logos on profile pages that, when clicked, take the user to reviews or articles about the particular restaurant on those sites. Where available, users can also book reservations. And the app can be personalized by engaging a menu of preferences. (Yelp recently launched broad, site-wide personalization itself.)
While Skrenta is taking something of a philosophical stand in avoiding user reviews, his approach also made the app easier to launch because expert content on third-party sites already existed. Community content takes much longer to reach critical mass. However, Tobiko also could have presented or “summarized” user reviews from third-party sites as Google does in knowledge panels, with TripAdvisor or Facebook for example.
Tobiko is free and currently appears to have no ads. The company also offers a subscription-based option that has additional features.
Why we should care. It’s too early to tell whether Tobiko will succeed, but it provocatively bucks conventional wisdom about the importance of user reviews in the restaurant vertical (although reading lots of expert reviews can be burdensome). As they have gained importance, reviews have become somewhat less reliable, with review fraud on the rise. Last month, Google disclosed an algorithm change that has resulted in a sharp decrease in rich review results showing in Search.
Putting aside gamesmanship and fraud, reviews have brought transparency to online shopping but can also make purchase decisions more time-consuming. It would be inaccurate to say there’s widespread “review fatigue,” but there’s anecdotal evidence supporting the simplicity of expert reviews in some cases. Influencer marketing can be seen as an interesting hybrid between user and expert reviews, though it’s also susceptible to manipulation.
About The Author
Greg Sterling is a Contributing Editor at Search Engine Land. He writes about the connections between digital and offline commerce. He previously held leadership roles at LSA, The Kelsey Group and TechTV. Follow him on Twitter or find him on LinkedIn.
When used creatively, XPaths can help improve the efficiency of auditing large websites. Consider this another tool in your SEO toolbelt.
There are endless types of information you can unlock with XPaths, which can be used in any category of online business.
Some popular ways to audit large sites with XPaths include:
Creating redirect maps
Auditing ecommerce product pages
Auditing blog categories and tags
In this guide, we’ll cover exactly how to perform these audits in detail.
What Are XPaths?
Simply put, XPath is a syntax that uses path expressions to navigate XML documents and identify specified elements.
This is used to find the exact location of any element on a page using the HTML DOM structure.
We can use XPaths to help extract bits of information such as H1 page titles, product descriptions on ecommerce sites, or really anything that’s available on a page.
While this may sound complex to many people, in practice, it’s actually quite easy!
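If you’d like to see the idea outside of any SEO tool, Python’s standard library can evaluate a useful subset of XPath. The page and the expressions below are a made-up example:

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed page. Real-world HTML usually needs an
# HTML-aware parser (e.g., lxml), but the XPath idea is the same.
page = """<html><body>
  <h1>How XPaths Work</h1>
  <span class="author">Jane Doe</span>
</body></html>"""

root = ET.fromstring(page)
title = root.find(".//h1").text                      # any <h1> on the page
author = root.find(".//span[@class='author']").text  # element by attribute
```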
How to Use XPaths in Screaming Frog
In this guide, we’ll be using Screaming Frog to scrape webpages.
Screaming Frog offers custom extraction methods, such as CSS selectors and XPaths.
It’s entirely possible to use other means to scrape webpages, such as Python. However, the Screaming Frog method requires far less coding knowledge.
(Note: I’m not in any way currently affiliated with Screaming Frog, but I highly recommend their software for web scraping.)
Step 1: Identify Your Data Point
Figure out what data point you want to extract.
For example, let’s pretend Search Engine Journal didn’t have author pages and you wanted to extract the author name for each article.
What you’ll do is:
Right-click on the author name and select Inspect.
In the dev tools elements panel, you will see your element already highlighted.
Right-click the highlighted HTML element and go to Copy and select Copy XPath.
At this point, your computer’s clipboard will have the desired XPath copied.
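The copied value is a raw XPath expression. For an author name it might look something like this (purely illustrative; the exact path depends on the page’s markup):

```
//*[@id="post"]/header/div[2]/span
```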
Step 2: Set up Custom Extraction
In this step, you will need to open Screaming Frog and set up the website you want to crawl. In this instance, I would enter the full Search Engine Journal URL.
Go to Configuration > Custom > Extraction
This will bring up the Custom Extraction configuration window. There are a lot of options here, but if you’re looking to simply extract text, match your configuration to the screenshot below.
Step 3: Run Crawl & Export
At this point, you should be all set to run your crawl. You’ll notice that your custom extraction is the second to last column on the right.
When analyzing crawls in bulk, it makes sense to export your crawl into an Excel format. This will allow you to apply a variety of filters, pivot tables, charts, and anything your heart desires.
3 Creative Ways XPaths Help Scale Your Audits
Now that we know how to run an XPath crawl, the possibilities are endless!
We have access to all of the answers, now we just need to find the right questions.
What are some aspects of your audit that could be automated?
Are there common elements in your content silos that can be extracted for auditing?
What are the most important elements on your pages?
The exact problems you’re trying to solve may vary by industry or site type. Below are some unique situations where XPaths can make your SEO life easier.
1. Using XPaths with Redirect Maps
Recently, I had to redesign a site that required a new URL structure. The former pages all had parameters as the URL slug instead of the page name.
This made creating a redirect map for hundreds of pages a complete nightmare!
So I thought to myself, “How can I easily identify each page at scale?”
After analyzing the various page templates, I came to the conclusion that the actual title of the page looked like an H1 but was actually just large paragraph text. This meant that I couldn’t just get the standard H1 data from Screaming Frog.
However, XPaths would allow me to copy the exact location for each page title and extract it in my web scraping report.
In this case I was able to extract the page title for all of the old URLs and match them with the new URLs through the VLOOKUP function in Excel. This automated most of the redirect map work for me.
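The lookup itself can be an ordinary VLOOKUP. Assuming the extracted old-page titles are in column A and a title-to-new-URL table lives on a sheet named 'New URLs' (the sheet and column layout here are illustrative assumptions):

```
=VLOOKUP(A2, 'New URLs'!A:B, 2, FALSE)
```

The exact-match flag (FALSE) matters for a redirect map; near-miss title matches are exactly the kind of thing the spot check should catch.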
With any automated work, you may have to perform some spot checking for accuracy.
2. Auditing Ecommerce Sites with XPaths
Sometimes stakeholders will need product-level audits on an ad hoc basis. These may cover just certain categories of products, or the entire site.
Using the XPath extraction method we learned earlier in this article, we can extract all types of data, including product descriptions, pricing information and much more.
This can help identify products that may be lacking valuable information within your ecommerce site.
The cool thing about Screaming Frog is that you can extract multiple data points to stretch your audits even further.
3. Auditing Blogs with XPaths
This is a more common method for using XPaths. Screaming Frog allows you to set parameters to crawl specific subfolders of sites, such as blogs.
However, using XPaths, we can go beyond simple meta data and grab valuable insights to help identify content gap opportunities.
Categories & Tags
One of the most common ways SEO professionals use XPaths for blog auditing is scraping categories and tags.
This is important because it helps us group related blogs together, which can help us identify content cannibalization and gaps.
This is typically the first step in any blog audit.
This next step is a bit more Excel-focused and advanced. How this works is you set up an XPath extraction to pull the body copy out of each blog post.
Fair warning, this may drastically increase your crawl time.
When you export this crawl into Excel, you will get all of the body text in one cell. I highly recommend that you disable text wrapping, or your spreadsheet will look terrifying.
Next, in the column to the right of your extracted body copy, enter a keyword-match formula such as =ISNUMBER(SEARCH("keyword", A1)).
In this formula, A1 equals the cell of the body copy.
To scale your efforts, you can have your “keyword” equal the cell that contains your category or tag. However, you may consider adding multiple columns of keywords to get a more accurate and robust picture of your blogging performance.
Over the almost 16 years I have spent covering search, specifically what Googlers have said on SEO and ranking topics, I have seen my share of contradictory statements. Google’s ranking algorithms are complex, and the way one Googler explains something might sound contradictory to how another Googler talks about it. In reality, they are typically talking about different things or nuances.
Some of it is semantics, some of it is being literal in how one person might explain something while another person speaks figuratively. Some of it is being technically correct versus trying to dumb something down for general practitioners or even non-search marketers to understand. Some of it is that the algorithm can change over the years, so what was true then has evolved.
Does it matter if something is or is not a ranking factor? It can be easy to get wrapped up in details that end up being distractions. Ultimately, SEOs, webmasters, site owners, publishers and those that produce web pages need to care more about providing the best possible web site and web page for the topic. You do not want to chase algorithms, racing after what is or is not a ranking factor. Google’s stated aim is to rank the most relevant results to keep users happy and coming back to the search engine. How Google does that changes over time. It releases core updates, smaller algorithm updates, index updates and more all the time.
For SEOs, the goal is to make sure your pages offer the most authoritative and relevant content for the given query and can be accessed by search crawlers.
When it is and is not a ranking factor. An example of Googlers seeming to contradict themselves popped up this week.
Gary Illyes from Google said at Pubcon Thursday that content accuracy is a ranking factor. That raised eyebrows because in the past Google has seemed to say content accuracy is not a ranking factor. Last month Google’s Danny Sullivan said, “Machines can’t tell the ‘accuracy’ of content. Our systems rely instead on signals we find align with relevancy of topic and authority.” One could interpret that to mean that if Google cannot tell the accuracy of content, it would be unable to use accuracy as a ranking factor.
Upon closer look at the context of Illyes’ comments this week, it’s clear he’s getting at the second part of Sullivan’s comment about using signals to understand “relevancy of topic and authority.” SEO Marie Haynes captured more of the context of Illyes’ comment.
Illyes was talking about YMYL (your money, your life) content. He added that Google goes through “great lengths to surface reputable and trustworthy sources.”
He didn’t outright say Google’s systems are able to tell if a piece of content is factually accurate or not. He implied Google uses multiple signals, like signals that determine reputations and trustworthiness, as a way to infer accuracy.
So is content accuracy a ranking factor? Yes and no. It depends on whether you are being technical, literal, figurative or explanatory. When I covered the different messaging around content accuracy on my personal site, Sullivan pointed out the difference, saying on Twitter, “We don’t know if content is accurate” but “we do look for signals we believe align with that.”
It’s the same with whether there is an E-A-T score. Illyes said there is no E-A-T score. That is correct, technically. But Google has numerous algorithms and ranking signals it uses to figure out E-A-T as an overall theme. Sullivan said on Twitter, “Is E-A-T a ranking factor? Not if you mean there’s some technical thing like with speed that we can measure directly. We do use a variety of signals as a proxy to tell if content seems to match E-A-T as humans would assess it. In that regard, yeah, it’s a ranking factor.”
You can see the dual point Sullivan is making here.
The minutiae. When you have people like me, who for almost 16 years have analyzed and scrutinized every word, tweet, blog post or video that Google produces, it can be hard for a Google representative to always convey the exact clear message at every point. Sometimes it is important to step back, look at the bigger picture, and ask yourself: Why is this Googler saying this or not saying that?
Why we should care. It is important to look at long term goals, and as I said above, not chase the algorithm or specific ranking factors but focus on the ultimate goals of your business (money). Produce content and web pages that Google would be proud to rank at the top of the results for a given query and other sites will want to source and link to. And above all, do whatever you can to make the best possible site for users — beyond what your competitors produce.
About The Author
Barry Schwartz is Search Engine Land’s News Editor and owns RustyBrick, a NY based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on SEM topics.