

Google’s Site Quality Algorithm Patent



A Google patent describes a method of classifying sites as low quality by ranking the links that point to them. The patent is called Classifying Sites as Low Quality Sites, and it names specific factors for identifying low quality sites.

It’s worthwhile to learn these factors and consider them. There’s no way to know if they are in use. But the factors themselves can help improve SEO practices, regardless of whether Google is using the algorithm.

An Obscure Link Algorithm

This patent dates from 2012 to 2015. It corresponds to the time that Penguin was first released.

There have only been a few discussions of this algorithm. It has, in my opinion, not been discussed in the detail offered below. As a consequence, it seems that many people may not be aware of it.

I believe this is an important algorithm to understand. If any parts of it are in use, then it could impact the SEO process.

Just Because it’s Patented…

What must be noted in any discussion of patents or research papers is that just because something is patented does not mean it’s in use. As noted above, this patent dates from 2012 to 2015, which corresponds to the time period of the Penguin Algorithm.

There is no evidence that this is a part of the Penguin Algorithm. But it is interesting because it is one of the few link ranking algorithms we know about from Google. Not a site ranking algorithm, a link ranking algorithm. That quality makes this particular algorithm especially interesting.

Although this algorithm may or may not be in use, I believe that it is worthwhile to understand what is possible. Knowing what is possible can help you better understand what is not possible or likely. And once you know that, you are better able to spot bad SEO information.

How the Algorithm Ranks Links

The patent is called Classifying Sites as Low Quality Sites. The algorithm works by ranking links, not the content itself. The underlying principle is that if the links to a site are low quality, then the site itself must be low quality.

This algorithm may be resistant to spammy scraper links because it only comes into play after the ranking algorithm has done its work. It’s the ranking algorithm that includes Penguin and other link related algorithms. So once the ranking engine has ranked sites, the link data that this algorithm uses will likely be filtered and represent a reduced link graph. A reduced link graph is a map of the links to and from sites that has had all the spam connections removed.

The algorithm ranks the links according to three ranking scores. The patent calls these scores, “quality groups.”

The scores are named Vital, Good, and Bad.

Obviously, the Vital score is the highest, Good is medium and Bad is not good (so to speak!).

The algorithm will then take all the scores and compute a total score. If this score falls below a certain threshold then the site or page itself is deemed low quality.

That’s my plain English translation of the patent.

Here is how the patent itself describes it:

“The system assigns the resources to resource quality groups (310). Each resource quality group is defined by a range of resource quality scores. The ranges can be non-overlapping. The system assigns each resource to the resource quality group defined by the range encompassing the resource quality score for the resource. In some implementations, the system assigns each resource to one of three groups, vital, good, and bad. Vital resources have the highest resource quality scores, good resource have medium resource quality scores, and bad resources have the lowest resource quality scores.”
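To make the grouping and threshold idea concrete, here is a minimal sketch in Python. The score ranges, the group weights, the threshold and the function names are all my own assumptions for illustration. The passage quoted above only says that resources are assigned to non-overlapping score ranges, and it does not give a formula for combining them.

```python
from collections import Counter

# Hypothetical score ranges (low inclusive, high exclusive).
# The patent only says the ranges can be non-overlapping.
GROUP_RANGES = {
    "vital": (0.7, 1.01),
    "good": (0.4, 0.7),
    "bad": (0.0, 0.4),
}

# Hypothetical weights used to combine the groups into one score.
GROUP_WEIGHTS = {"vital": 1.0, "good": 0.5, "bad": -1.0}


def assign_group(resource_score):
    """Return the quality group whose range contains the score."""
    for group, (low, high) in GROUP_RANGES.items():
        if low <= resource_score < high:
            return group
    raise ValueError(f"score out of range: {resource_score}")


def link_quality_score(resource_scores):
    """Average the group weights of all resources linking to a site."""
    counts = Counter(assign_group(score) for score in resource_scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no linking resources to score")
    return sum(GROUP_WEIGHTS[group] * n for group, n in counts.items()) / total


def is_low_quality(resource_scores, threshold=0.0):
    """Flag the site when its link quality score falls below the threshold."""
    return link_quality_score(resource_scores) < threshold
```

With these made-up numbers, a site whose linking resources score 0.1, 0.2 and 0.5 lands in bad, bad and good, averages below zero and is flagged. A site with scores of 0.9, 0.8 and 0.1 is not.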

Implied Links

The patent also describes something called an Implied Link. The concept of implied links must be explained before we proceed further.

There is an idea in the SEO community that Implied Links are unlinked citations. An unlinked citation is a URL that is not a link, a URL that cannot be clicked to visit the site. However, there are other definitions of an Implied Link.

A non-Google researcher named Ryan Rossi describes a Latent Link as a sort of virtual link. Latent means something that is hidden or can’t be readily seen. The paper is called Discovering Latent Graphs with Positive and Negative Links to Eliminate Spam in Adversarial Information Retrieval.

A latent link happens when Site A links to Site B, and Site B links to Site C. So you have this: Site A > Site B > Site C. The implied link exists between Site A and Site C.

Illustration: the link relationships that create a latent (or implied) link. The nodes labeled S represent spam sites. The nodes labeled N represent normal sites. The dotted lines are implied links. What’s notable is that there are no links from the normal sites to the spam sites.
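The Site A > Site B > Site C chain can be sketched in a few lines of Python. This is only a simple two-hop reading of the idea, with a function name and data format I made up. It is not the method from Rossi’s paper, and nothing in the patent confirms that Google computes implied links this way.

```python
def implied_links(express_links):
    """Return pairs (a, c) where a -> b and b -> c exist but a -> c does not."""
    # Build a map of each site to the set of sites it links to directly.
    outgoing = {}
    for source, target in express_links:
        outgoing.setdefault(source, set()).add(target)

    implied = set()
    for a, middle_sites in outgoing.items():
        for b in middle_sites:
            for c in outgoing.get(b, ()):
                # Skip loops back to the start and pairs already linked directly.
                if c != a and c not in middle_sites:
                    implied.add((a, c))
    return implied
```

Given the express links A to B and B to C, the function returns the single implied pair (A, C). If A already links to C directly, nothing is implied.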

Here’s what the non-Google research paper says:

“Latent relationships between sites are discovered based on the structure of the normal and spam communities.

… Automatic ranking of links where latent links are discovered… between the spam sites {S1, S2} and normal sites {N1, N2,N3} based on the fundamental structure of the two communities.

…The results provide significant evidence that our Latent Graph strongly favors normal sites while essentially eliminating spam sites and communities through the suppression of their links.”

The takeaway from the above is the concept of Latent Links, which can correspond with the concept of Implied Links.

Here is what the Google Patent says about Implied Links:

“A link can be an express link or an implied link. An express link exists where a resource explicitly refers to the site. An implied link exists where there is some other relationship between a resource and the site.”

If the Google patent author meant to say that the link was an unlinked URL, it’s not unreasonable to assume they would have said so. Instead, the author states that there is “some other relationship” between the “resource” (the linking site) and the website (the site that’s being linked to implicitly).

It’s my opinion that a likely candidate for an Implied Link is similar to what Ryan Rossi described as a Latent Link.

Link Quality Factors

Here are the quality factors that the patent names. Google does not generally say whether a patent or piece of research is actually in use, or how. And what is actually in use could go beyond what the patent describes. Nevertheless, it’s useful to know that these factors were named in the patent and to then think about these link ranking factors when creating a link strategy.

Diversity Filtering

Diversity filtering is the process of identifying that a site has multiple incoming links from a single site. The algorithm will discard all but one of the links from that linking site.

“Diversity filtering is a process for discarding resources that provide essentially redundant information to the link quality engine.

…the link quality engine can discard one of those resources and select a representative resource quality score for both of them. For example, the link quality engine can receive resource quality scores for both resources and discard the lower resource quality Score.”

The patent goes on to say that it could also use a Site Quality Score to rank the link.
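Here is a rough sketch of what diversity filtering could look like, assuming each linking resource is a (URL, score) pair and that the hostname is a good enough stand-in for “site.” Both of those are my assumptions, as is the function name. Keeping the higher score follows the example in the quote above.

```python
from urllib.parse import urlparse


def diversity_filter(scored_resources):
    """Keep one representative (url, score) per linking host: the highest scored."""
    best = {}
    for url, score in scored_resources:
        # Treat the hostname as the "site" (an assumption for this sketch).
        host = urlparse(url).netloc.lower()
        if host not in best or score > best[host][1]:
            best[host] = (url, score)
    return sorted(best.values())
```

So if a.example links to you from two pages scored 0.2 and 0.6, only the 0.6 page survives, alongside any links from other hosts.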

Boilerplate Links

The patent says that it has the option to not use what it calls “boilerplate” links. It uses navigational links as an example.

That appears to say that links from the navigation and possibly from a sidebar or footer that are repeated across the entire site will optionally not be counted. They may be discarded entirely.

This makes a lot of sense. A link is a vote for another site. In general, a link that has context and meaning is what gets counted, because it says something about the site it links to. There is no such semantic context in a sitewide link.

Links That Are Related

It’s not unusual for groups of sites within a niche to link to each other. This part of the patent describes a group of sites that seem to be linking to similar sites. This could be a statistical number that represents an unnatural amount of similar outbound links to the same sites.

The patent doesn’t go into further detail. But this is, in my opinion, a typical way of identifying related links and unraveling a spam network.

“…the system can determine that a group of candidate resources all belong to a same different site, e.g., by determining that the group of candidate resources are associated with the same domain name or the same Internet Protocol (IP) address, or that each of the candidate resources in the group links to a minimum number of the same sites.”
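The last condition in that quote, resources that each link to a minimum number of the same sites, can be sketched like this. The function name, the input format and the default minimum of three are hypothetical. I also merge groups transitively (if P1 overlaps with P2 and P2 with P3, all three end up together), which is my own simplification and not something the patent spells out.

```python
from itertools import combinations


def related_groups(outbound_sites, min_shared=3):
    """Group resources that link to at least `min_shared` of the same sites.

    `outbound_sites` maps each candidate resource to the set of sites it links to.
    """
    parent = {resource: resource for resource in outbound_sites}

    def find(resource):
        # Follow parent pointers to the group representative.
        while parent[resource] != resource:
            parent[resource] = parent[parent[resource]]
            resource = parent[resource]
        return resource

    # Merge every pair of resources that shares enough outbound sites.
    for a, b in combinations(outbound_sites, 2):
        if len(outbound_sites[a] & outbound_sites[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for resource in outbound_sites:
        groups.setdefault(find(resource), set()).add(resource)
    # Only groups with more than one member are interesting here.
    return [group for group in groups.values() if len(group) > 1]
```

Each resulting group could then be represented by a single resource, the same way diversity filtering keeps one link per site.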

Links from Sites with Similar Content Context

This is an interesting example. If a group of linking resources shares the same content context, the algorithm will discard all but one of them:

“In another example, the system can determine that a group of candidate resources share a same content context.

…The system can then select one candidate resource from the group, e.g., the candidate resource having the highest resource quality score, to represent the group.”

Overview and Takeaways

This algorithm is described as “for enhancing search results.” This means that the ranking engine does its thing and then this algorithm steps in to rank the inbound links and lower the ranking scores of sites that have low quality scores.

An interesting feature is that this belongs to a class of algorithms that ranks links, not sites.

Classifying Sites as Low Quality Sites

Read the entire patent here.  And download the PDF version of the patent here.


Images by Shutterstock, Modified by Author
Screenshots by Author, Modified by Author

Google extends optimization score to Display campaigns



The campaign optimization score that Google Ads shows for Search and Shopping campaigns is now available for Display campaigns. Scores will be available at the campaign level, and a combined account-level score now encompasses Search, Shopping and Display. You may also see recommendations tailored to Display campaigns.


What is Google Ads optimization score? Scores range from 0% to 100% and indicate how well your campaigns are expected to perform based on a number of factors such as targeting, bid automation, ads and extensions and more. The score is accompanied by a set of automated recommendations with indicators of how much of a score improvement you can expect to see by accepting them.

Why we should care. These scores and accompanying recommendations can be directionally helpful, but don’t accept the recommendations blindly. Carefully consider them and whether they are right for your campaign. And equally important on the flip side, an optimization score of 100% with no recommendations does not mean there aren’t plenty of opportunities for improvement.

About The Author

Ginny Marvin is Third Door Media’s Editor-in-Chief, running the day to day editorial operations across all publications and overseeing paid media coverage. Ginny Marvin writes about paid digital advertising and analytics news and trends for Search Engine Land, Marketing Land and MarTech Today. With more than 15 years of marketing experience, Ginny has held both in-house and agency management positions. She can be found on Twitter as @ginnymarvin.



Google Search Console messages can now be viewed without leaving reports



Messages within Google Search Console are now accessible through the bell icon at the top of any page within Search Console, the company announced Wednesday. The updated interface now allows site owners to view their messages from anywhere within the tool, without leaving reports.

Source: Google.

Why we care

Being able to reference messages without having to leave the report you’re viewing makes information more accessible and improves our workflow, which can facilitate better decision making.

The categorized messages will also make it easier to locate communications pertaining to a specific issue.

More on the announcement

  • Messages are now categorized into types, such as Performance, Coverage, Enhancement types and so on.
  • When a user gains access to a new site in Search Console, they will be able to view all messages the site has previously received, dating back to May 23, 2019. Messages sent prior to that date can only be viewed in the legacy message list or in your personal inbox.
  • For the time being, old messages are still available in the “Legacy tools & reports” section of the sidebar.

About The Author

George Nguyen is an Associate Editor at Third Door Media. His background is in content marketing, journalism, and storytelling.



How to breathe fresh life into evergreen content (and get fresh traffic, too)



NEW YORK — Creating content can do wonders for your brand, but not if it goes unseen. A staggering 90% of the content in existence today has been created within the last two years, yet 91% of content gets no traffic from Google, said John Shehata, vice president of audience development strategy for Conde Nast, at SMX East in New York.

Investing in new content isn’t always the right choice for better content marketing. Sometimes, brands are better served by leveraging assets they already have or putting a fresh spin on an existing topic.

Old content, new traffic

“For the first 100 articles that we optimized, we saw a 210% increase in search traffic and our keyword coverage for that content increased by 900%,” said Shehata, explaining the results of his “Pinetree Initiative,” an experiment aimed at expanding existing content and merging underperforming content to increase organic visibility. “Once we refreshed the content, the traffic started increasing immediately. It went from like 100 visits to like 15,000–20,000 visits.”


“You’re reporting news or something trending, the traffic spikes out for like 24 to 48 hours, and it’s done, right?” Shehata said. “Versus evergreen content — that content can bring you traffic for a year plus.”

Content is considered evergreen if it remains relevant long after its publication. Tutorials, FAQs, in-depth guides, expert interviews and case studies are all examples of evergreen content.

In addition to providing more sustainable traffic to your site, evergreen content also insulates publishers from slow news cycles and can drive prospects to the top of the funnel, Shehata said.

However, news content can still be valuable and publishers should aim for a 60/40 split of both content types, in either direction, said Shehata. For example, if you’re a news publisher, 60% news and 40% evergreen content is more likely to resonate with your audience, whereas an industry-based publication might publish 60% evergreen and 40% news content.

Refreshing evergreen content, step by step

Conde Nast’s search traffic and ranking keyword growth was made possible by a process that Shehata developed specifically for content refreshes. It consists of examining your own site, analyzing the search results pages for your target keywords, evaluating competing content, optimizing on-page content, and then publishing and promoting.

1. Assess your existing content. Brands can begin their evergreen content refreshes by either selecting a topic and keywords or selecting a main page to refresh, said Shehata.

Whichever starting point you choose, the next thing you’ll need to do is identify all of your own competing pages that rank for the target keywords. Shehata does this by combining Google Sheets with various keyword research tool APIs to consolidate the URLs and relevant metrics into one place, giving him a better idea of the landscape of his content, which pages to avoid cannibalizing, which underperforming pages can be merged into more authoritative content and which relevant content can be included in your new evergreen article.
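The article doesn’t show Shehata’s spreadsheet or name the keyword tools he uses, so the following is only a hedged sketch of the consolidation step. It assumes you have already exported rows of (keyword, URL, position) for your own site from whatever tool you use. The function name and row format are hypothetical.

```python
from collections import defaultdict


def competing_pages(rankings):
    """Map each keyword to your own URLs that rank for it, best position first.

    Only keywords with more than one ranking URL are returned, since those
    are the candidates for merging or for avoiding cannibalization.
    """
    by_keyword = defaultdict(list)
    for keyword, url, position in rankings:
        by_keyword[keyword].append((position, url))
    return {
        keyword: [url for _, url in sorted(rows)]
        for keyword, rows in by_keyword.items()
        if len(rows) > 1
    }
```

A keyword that maps to several of your URLs is a prompt to decide which page is the main one and which underperforming pages could be folded into it.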

2. Research the results page. “Last year, we had this amazing page about celebrity homes, and it wasn’t getting any traffic at all,” Shehata said as an example of the importance of aligning with search intent.

“When we analyzed the SERPs for other types of content that are ranked for that topic, all of them were galleries. Google identified the intent for ‘celebrity homes’ as people watching galleries. So, we converted the page from an article format with a couple of images to a gallery with less content. And, guess what? Immediately ranked number two. So, the characteristics of the content are very important for the success of the SEO.”

Understanding the type of content search engines surface for specific queries can give publishers an idea of how to present their content so as to increase their chances of ranking well.

The difference in search intent between the queries “how to pack a suitcase” and “best carry on suitcase” manifests in the different types of results that surface.

In addition to the particular formats of content that make up the top organic results, you’ll also want to take note of any rich results that appear and ask yourself why they might be surfacing. For example, if a news carousel is present, is the topic news-driven, and if so, how will that affect your odds of ranking well?

Featured snippets, which often resolve a user’s query right on the search results page, may also provide you with information about the questions people are likely to ask on a given topic. Simple resources such as Google’s “People also ask” box can help you identify common questions to address, which yields opportunities to add more depth to your evergreen content, Shehata said.

3. Evaluate competing content. “If you are writing about how to boil an egg, and all the other sites that are ranking mention ‘eggshells,’ and ‘breakfast,’ and ‘easy,’ you may want to consider these topics to give you complete and in-depth coverage of your topic,” Shehata said.

Conducting a term frequency-inverse document frequency (TF-IDF) analysis is one method that may help you identify those “must-have” terms as well as the related entities that should be included in your refreshed evergreen content.
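For readers who haven’t run one, here is a bare-bones TF-IDF calculation in Python. It uses the textbook formula (term frequency times the log of inverse document frequency) with naive whitespace tokenization. The function name is my own, and this is not a description of the tooling Shehata uses. Dedicated SEO tools and libraries typically add smoothing and better tokenization.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    # Number of documents each term appears in at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(tokenized)

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets a weight of zero, while terms that only some pages use score higher. Running this over the text of the top-ranking pages and comparing the results with your own page is one way to spot terms you haven’t covered.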

The next step in the process involves a more granular look at the pages that rank for your target keywords to determine what search engines consider to be a “right answer” for that type of query, Shehata said. As with the SERP analysis step, you’ll want to examine the way the content is presented, but also its length, publishing date and other commonalities for clues as to why the content might rank well.

4. Optimize on-page content. After collecting the above-mentioned information, it’s time to refresh the content by expanding the original article, merging it with other relevant, underperforming content and setting up redirects.

“When you refresh content, it should be at least 30% new,” Shehata said. A new title, introduction, publishing date and more new internal links should accompany your optimizations.
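Shehata doesn’t say how the 30% is measured. One rough way to estimate it, assuming a word-level comparison is close enough, is to diff the old and new text with Python’s standard difflib module and count the share of words in the new version that have no match in the old one. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def fraction_new(old_text, new_text):
    """Estimate the share of words in `new_text` that were not in `old_text`."""
    old_words, new_words = old_text.split(), new_text.split()
    if not new_words:
        return 0.0
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    # Sum the sizes of all runs of words that appear unchanged in both versions.
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 1 - unchanged / len(new_words)
```

If a seven-word article gains three new words, three of the ten words in the new version are new, so the result is roughly 0.3.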

Once your evergreen content has been updated, look for internal linking opportunities amongst your existing articles. You’ll also want to loop in your social and email teams to make sure that the content that got refreshed is in their workflow. “It’s all the signals that tell Google this is new, refreshed content,” said Shehata.

During your content refresh process, pages that have conversion goals attached to them, such as newsletter signups or affiliate links, may have been affected. This is the time to tie up any loose ends by finding a way to carry those goals over to your updated page.

5. Time to publish. For evergreen content pertaining to seasonal trends, aim to publish three months ahead of time to maximize your results, Shehata advised.

“In general, your refreshed, optimized content will last you at least a year, if not longer,” said Shehata. Should traffic start to substantially decline, it may be time to conduct another round of refreshes. Creating an editorial refresh calendar can also help keep you on track with future updates.

Quality content takes a considerable amount of resources to create. But, by finding creative ways to refresh or repurpose it, while striking a balance between evergreen and news content, you stand to maximize the efficacy of the content you do create and bolster traffic for your brand over the long haul.

About The Author

George Nguyen is an Associate Editor at Third Door Media. His background is in content marketing, journalism, and storytelling.



Copyright © 2019 Plolu.