Google’s Site Quality Algorithm Patent



A Google patent describes a method of classifying sites as low quality by ranking their inbound links. The patent is called Classifying Sites as Low Quality Sites, and it names specific factors for identifying low quality sites.

It’s worthwhile to learn these factors and consider them. There’s no way to know if they are in use. But the factors themselves can help improve SEO practices, regardless of whether Google is using the algorithm.

An Obscure Link Algorithm

This patent dates from 2012 to 2015. It corresponds to the time that Penguin was first released.

There have only been a few discussions of this algorithm. It has, in my opinion, not been discussed in the detail offered below. As a consequence, it seems that many people may not be aware of it.

I believe this is an important algorithm to understand. If any parts of it are in use, then it could impact the SEO process.

Just Because it’s Patented…

What must be noted in any discussion of patents or research papers is that just because something is patented does not mean it is in use.

There is no evidence that this is a part of the Penguin Algorithm. But it is interesting because it is one of the few link ranking algorithms we know about from Google. Not a site ranking algorithm, a link ranking algorithm. That quality makes this particular algorithm especially interesting.

Although this algorithm may or may not be in use, I believe it is worthwhile to understand what is possible. Knowing what is possible can help you better understand what is not possible or likely, and once you know that, you are better able to spot bad SEO information.

How the Algorithm Ranks Links

The algorithm is called Classifying Sites as Low Quality. It works by ranking links, not the content itself. The underlying principle can be said to be that if the links to a site are low quality then the site itself must be low quality.

This algorithm may be resistant to spammy scraper links because it only comes into play after the ranking algorithm has done its work. The ranking algorithm includes Penguin and other link-related algorithms. So once the ranking engine has ranked sites, the link data this algorithm uses will likely be filtered and represent a reduced link graph: a map of the links to and from sites with all the spam connections removed.

The algorithm ranks the links according to three ranking scores. The patent calls these scores “quality groups.”

The scores are named Vital, Good, and Bad.

Obviously, the Vital score is the highest, Good is medium and Bad is not good (so to speak!).

The algorithm will then take all the scores and compute a total score. If this score falls below a certain threshold then the site or page itself is deemed low quality.

That’s my plain English translation of the patent.

Here is how the patent itself describes the process:

“The system assigns the resources to resource quality groups (310). Each resource quality group is defined by a range of resource quality scores. The ranges can be non-overlapping. The system assigns each resource to the resource quality group defined by the range encompassing the resource quality score for the resource. In some implementations, the system assigns each resource to one of three groups, vital, good, and bad. Vital resources have the highest resource quality scores, good resource have medium resource quality scores, and bad resources have the lowest resource quality scores.”
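As a plain illustration of the grouping-and-threshold idea, here is a minimal Python sketch. The cutoffs, weights, and threshold are hypothetical values invented for the example; the patent does not disclose any numbers.

```python
def classify_site(link_scores, vital_cut=0.8, good_cut=0.5, threshold=0.6):
    """Bucket inbound-link quality scores into vital/good/bad groups,
    then aggregate them into a single site-level score."""
    groups = {"vital": [], "good": [], "bad": []}
    for score in link_scores:
        if score >= vital_cut:
            groups["vital"].append(score)
        elif score >= good_cut:
            groups["good"].append(score)
        else:
            groups["bad"].append(score)

    # Weight each group differently, then normalize by the link count.
    weights = {"vital": 1.0, "good": 0.5, "bad": -0.25}
    total_links = sum(len(v) for v in groups.values()) or 1
    aggregate = sum(weights[g] * len(s) for g, s in groups.items()) / total_links

    return ("low quality" if aggregate < threshold else "ok"), groups
```

A site whose inbound links mostly land in the bad group drags the aggregate below the threshold and gets labeled low quality, which is the behavior the patent describes in general terms.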

Implied Links

The patent also describes something called an Implied Link. The concept of implied links must be explained before we proceed further.

There is an idea in the SEO community that Implied Links are unlinked citations. An unlinked citation is a URL that is not a link, a URL that cannot be clicked to visit the site. However, there are other definitions of an Implied Link.

A non-Google researcher named Ryan Rossi describes a Latent Link as a sort of virtual link. Latent means something that is hidden or cannot be readily seen. The paper is called Discovering Latent Graphs with Positive and Negative Links to Eliminate Spam in Adversarial Information Retrieval.

A latent link happens when Site A links to Site B, and Site B links to Site C. So you have this: Site A > Site B > Site C. The implied link exists between Site A and Site C.

This is an illustration showing the link relationships that create a latent (or implied) link. The nodes labeled S represent spam sites. The nodes labeled N represent normal sites. The dotted lines are implied links. What’s notable is that there are no links from the normal sites to the spam sites.

Here’s what the non-Google research paper says:

“Latent relationships between sites are discovered based on the structure of the normal and spam communities.

… Automatic ranking of links where latent links are discovered… between the spam sites {S1, S2} and normal sites {N1, N2,N3} based on the fundamental structure of the two communities.

…The results provide significant evidence that our Latent Graph strongly favors normal sites while essentially eliminating spam sites and communities through the suppression of their links.”

The takeaway from the above is the concept of Latent Links, which can correspond with the concept of Implied Links.
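The two-hop relationship described above can be sketched in a few lines of Python. This is a toy model of the latent-link idea, not anything taken from the patent; the site names are invented.

```python
def implied_links(express):
    """Given a set of (source, target) express links, return the
    two-hop implied links that are not already express links."""
    implied = set()
    for a, b in express:
        for b2, c in express:
            if b == b2 and a != c and (a, c) not in express:
                implied.add((a, c))
    return implied

links = {("SiteA", "SiteB"), ("SiteB", "SiteC")}
print(implied_links(links))  # {('SiteA', 'SiteC')}
```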

Here is what the Google Patent says about Implied Links:

“A link can be an express link or an implied link. An express link exists where a resource explicitly refers to the site. An implied link exists where there is some other relationship between a resource and the site.”

If the Google patent author meant to say that the link was an unlinked URL, it’s not unreasonable to assume they would have said so. Instead, the author states that there is “some other relationship” between the “resource” (the linking site) and the website (the site that’s being linked to implicitly).

It’s my opinion that a likely candidate for an Implied Link is similar to what Ryan Rossi described as a Latent Link.

Link Quality Factors

Here are the quality factors the patent names. Google does not generally say whether a patent or research paper is actually in use, or how, and what is actually in use could go beyond what is described. Nevertheless, it’s useful to know that these factors were named in the patent and to think about them when creating a link strategy.

Diversity Filtering

Diversity filtering is the process of identifying that a site has multiple incoming links from a single site. The algorithm will discard all but one of the links from that linking site.

“Diversity filtering is a process for discarding resources that provide essentially redundant information to the link quality engine.

…the link quality engine can discard one of those resources and select a representative resource quality score for both of them. For example, the link quality engine can receive resource quality scores for both resources and discard the lower resource quality Score.”

The patent goes on to say that it could also use a Site Quality Score to rank the link.
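A minimal sketch of that filtering step, assuming each linking resource carries a quality score (the data and field layout are invented for the example):

```python
def diversity_filter(resources):
    """resources: iterable of (linking_site, quality_score) pairs.
    Keeps one representative score per linking site: the highest."""
    best = {}
    for site, score in resources:
        if site not in best or score > best[site]:
            best[site] = score
    return best

links = [("", 0.4), ("", 0.9), ("", 0.7)]
print(diversity_filter(links))  # {'': 0.9, '': 0.7}
```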

Boilerplate Links

The patent says that it has the option to not use what it calls “boilerplate” links. It uses navigational links as an example.

That appears to say that links from the navigation and possibly from a sidebar or footer that are repeated across the entire site will optionally not be counted. They may be discarded entirely.

This makes a lot of sense. A link is a vote for another site. In general, a link that has context and meaning is what is counted, because it says something about the site it links to. There is no such semantic context in a sitewide link.

Links That Are Related

It’s not unusual for groups of sites within a niche to link to each other. This part of the patent describes a group of sites that seem to be linking to similar sites. This could be a statistical number that represents an unnatural amount of similar outbound links to the same sites.

The patent doesn’t go into further detail. But this is, in my opinion, a typical way of identifying related links and unraveling a spam network.

“…the system can determine that a group of candidate resources all belong to a same different site, e.g., by determining that the group of candidate resources are associated with the same domain name or the same Internet Protocol (IP) address, or that each of the candidate resources in the group links to a minimum number of the same sites.”

Links from Sites with Similar Content Context

This is an interesting example. If a group of links shares the same content context, the algorithm selects just one of them to represent the group:

“In another example, the system can determine that a group of candidate resources share a same content context.

…The system can then select one candidate resource from the group, e.g., the candidate resource having the highest resource quality score, to represent the group.”

Overview and Takeaways

This algorithm is described as being “for enhancing search results.” This means the ranking engine does its thing, and then this algorithm steps in to rank the inbound links and lower the ranking scores of sites with low quality scores.

An interesting feature is that this belongs to a class of algorithms that ranks links, not sites.

Classifying Sites as Low Quality Sites

Read the entire patent here.  And download the PDF version of the patent here.

Images by Shutterstock, Modified by Author
Screenshots by Author, Modified by Author



LinkedIn Users Can View All Sponsored Content From the Past 6 Months



LinkedIn pages will soon feature an ‘Ads’ tab showing all sponsored content an advertiser has run in the past six months.

The company says this change is being made in an effort to bring even greater transparency to ads on LinkedIn.

“At LinkedIn, we are committed to providing a safe, trusted, and professional environment where members can connect with each other, engage with relevant content, and grow their careers. Increased transparency to both our customers and members is critical to creating this trusted environment.”

While viewing ads in the new tab, users can click on the ads but the advertiser will not be charged.

Ad clicks from within the ‘Ads’ tab will not impact campaign reporting either.

From a marketing perspective, I see this as being an opportunity for competitor research.

Do you know a company who is killing it with LinkedIn advertising? View their ads tab to see if you can learn from what they’re doing.

Of course, the Ads tab will only show you what their ads look like.

It won’t reveal anything about how those ads are targeted or what the company’s daily budget is. But hey, it’s something.

LinkedIn says this is the first of many updates to come as the company furthers its effort to provide users with useful information about the ads they see.

The new Ads tab is rolling out globally over the next few weeks.



SEMrush expands to Amazon with Sellerly for product page testing



SEMrush is a popular competitive intelligence platform used by search marketers. The company, recently infused with $40 million in funding to expand beyond Google, Bing and Yahoo insights, has launched a new product called Sellerly specifically for Amazon sellers.

What is Sellerly? Announced Monday, Sellerly is designed to give Amazon sellers the ability to split test product detail pages.

“By introducing Sellerly as a seller’s buddy in Amazon marketing, we hope to improve hundreds of existing Amazon sellers’ strategies,” said SEMrush Chief Strategy Officer Eugene Levin in a statement. “Sellerly split testing is only the first step here. We’ve already started to build a community around the new product, which is very important to us. We believe that by combining feedback from users with our leading technology and 10 years of SEO software experience, we will be able to build something truly exceptional for Amazon sellers.”

How does it work? Sellerly is currently free to use. Amazon sellers connect their Amazon accounts to the tool in order to manage their product pages. Sellers can make changes to product detail pages to test against the controls. Sellerly collects data in real time and sellers can then choose winners based on views and conversions.

Sellers can run an unlimited number of tests.

Why we should care. Optimizing product detail pages on Amazon is a critical aspect of success on the platform. As Amazon continues to generate an increasing share of e-commerce sales for merchants big and small, and competition only increases, product page optimization becomes even more critical. Amazon does not support A/B testing natively. Sellerly is not the first split testing product for Amazon product pages to come to market: Splitly (paid) and Listing Dojo (free) are two others that offer similar services.

About The Author

Ginny Marvin is Third Door Media’s Editor-in-Chief, managing day-to-day editorial operations across all of our publications. Ginny writes about paid online marketing topics including paid search, paid social, display and retargeting for Search Engine Land, Marketing Land and MarTech Today. With more than 15 years of marketing experience, she has held both in-house and agency management positions. She can be found on Twitter as @ginnymarvin.



Google on Domain Penalties that Don’t Expire



Google’s John Mueller was presented with a peculiar situation: a website with zero notifications of a manual action that cannot rank for its own brand name. Mueller analyzed the situation, thought it through, then appeared to reach the conclusion that maybe Google was keeping it from ranking.

This is a problem that has existed for a long time, from before Mueller worked at Google. It’s a penalty that’s associated with a domain that remains even if the domain is registered by a new buyer years later.

Description of the Problem

The site with a penalty has not received notices of a manual penalty.

That’s what makes it weird because, how can a site be penalized if it’s not penalized, right?

The site had an influx of natural links due to word-of-mouth popularity. Yet even with those links, the site cannot rank for its own name or a snippet of content from its home page.

Had those natural links or the content been a problem then Google would have notified the site owner.  So the problem is not with the links or the content.

Nevertheless, the site owner disavowed old inbound links from before he purchased the site but the site still did not rank.

Here is how the site owner described the problem:

“We bought the domain three years ago to have a brand called Girlfriend Collective, it’s a clothing company on the Shopify platform.

We haven’t had any… warnings from our webmaster tools that says we have any penalizations… So I was just wondering if there was any other underlying issues that you would know outside of that…

The domain is and the query would be Girlfriend Collective.

It’s been as high as the second page of the SERPs, but… we get quite a few search queries for our own branded terms… it will not show up.

My assumption was that before we bought it, it was a pretty spammy dating directory.”

John Mueller’s response was:

“I can double check to see from our side if there’s anything kind of sticking around there that you’d need to take care of…”

It appears as if Mueller is being circumspect in his answer and doesn’t wish to say that it might be a problem at Google. At this point, he’s still holding on to the possibility that there’s something wrong with the site. You can’t blame him because he probably gets this all the time, where someone thinks it’s Google but it’s really something wrong with the site.

Is There Something Wrong with the Domain Name?

I checked to see what its history was. It was linking to adult sites prior to 2004, and sometime in mid-2004 the domain switched its monetization strategy away from linking to adult sites to displaying Google ads as a parked domain.

A parked domain is a domain that does not have a website on it. It just has ads. People used to type domain names into the address field, and sites like this would monetize the “type-in” traffic with Google AdSense, usually through a service that shows ads on the domain owner’s behalf in exchange for a percentage of the earnings.

The fact that it was linking to adult sites could be a factor that caused Google to more or less blacklist the domain and keep it from ranking.

Domain Related Penalties Have Existed for a Long Time

This has happened many times over the years. It used to be standard to check the background of a domain before purchasing it.

I remember the case of a newbie SEO who couldn’t rank for his own brand name. Another SEO who was more competent contacted Google on his behalf and Google lifted the legacy domain penalty.

The Search Query

Mueller referred to the search queries the site owner wanted to rank for as being “generic” and commented that ranking for those kinds of “generic” terms is tricky.

This is what John Mueller said:

“In general, when it comes to kind of generic terms like that, that’s always a bit tricky. But it sounds like you’re not trying to rank for like just… girlfriend. “

However the phrase under discussion was the company name, Girlfriend Collective, which is not a generic phrase.

It could be argued that the domain name is not relevant for the brand name. So perhaps Mueller was referencing the generic nature of the domain name when he commented on ranking for “generic” phrases?

I don’t understand why “generic” phrases entered into this discussion. The site owner answered Mueller to reinforce that he’s not trying to rank for generic phrases, that he just wants to rank for his brand name.

The search phrase the site owner is failing to rank for is Girlfriend Collective. Girlfriend Collective is not a generic keyword phrase.

Is the Site Poorly Optimized?

When you visit the website itself, the word Collective does not exist in the visible content.

The word “collective” is nowhere in the page text, not even in the footer copyright. It appears only in an image, and it has to be in text for Google to recognize it for the regular search results.

That’s a considerable oversight to omit your own brand name from the website’s home page.

Screenshot of the site’s footer

  • The brand name exists in the title tag and other meta data.
  • It does not exist in the visible content where it really matters.
  • The word collective is not a part of the domain name.

A reasonable case could be made that the site does not merit ranking for the brand name Girlfriend Collective, because the word collective only exists in the title tag of the home page, not on the page itself.

Google Does Not Even Rank it for Page Snippets

However, that reasonable case falls apart upon closer scrutiny. If you take any content from the page and search for that snippet in Google, you’ll see that the domain does not even rank for the content that is on its own page.

The site is fully indexed, but the content is not allowed to rank.

I searched for the following phrases but only found other pages and social media posts ranking in Google, not the site itself:

  • “Five classic colors made from recycled water bottles.”
  • “A bunch of old water bottles have never looked so good.”

That first phrase, “Five classic colors…” doesn’t rank anywhere on Google for the first several pages.

But as you can see below, the site ranks #6 in Bing:

Bing has no trouble ranking Girlfriend Collective for a snippet of text taken from the home page. Google does not show it at all. This points to the issue being something to do with Google and not with the site itself.

Even though the site appears to fall short in its search optimization, that is not the problem. The problem is that Google is preventing any content from that domain from ranking.

The reason Google is preventing that content from ranking is because the domain was problematic in the past. At some point in its history it was filtered from ranking. It’s a Legacy Google Penalty.

Checking the snapshot of the domain via shows that it was being used to promote adult websites prior to 2004.

This is what it looked like sometime in 2004 and onward. It appears to be a parked domain that is showing Google AdSense ads.

This is a snapshot of the domain circa 2004. It wasn’t a directory as the site owner believed. Checking the HTML source code reveals that the page was displaying Google AdSense ads. That’s what a parked domain looked like.

Parked domains used to be able to rank. But at some point after 2004 Google stopped ranking those pages.

There’s no way to know whether the domain received its penalty before 2004 or after.

Site Can’t Rank for Its Own Brand Name

There are many reasons why a site can’t rank for its own domain name or words from its own pages. If you suspect that your site may be suffering from a legacy Google penalty, you can verify the previous content by checking, a non-profit that stores snapshots of what web pages looked like. It allows you to verify whether your domain was previously used by someone else to host low quality content.

Unfortunately, Google does not provide a way to contact them to resolve this matter.

Bing Ranks the Site for Girlfriend Collective

If there was a big problem with links or content that was keeping the site from ranking on Google, then it would very likely be apparent on Bing.

Bing and Google use different algorithms. But if there was something so massively wrong with Girlfriend Collective, whether site quality or a technical issue, there would be a high probability that the massive problem would keep it from ranking at Bing.

Bing has no problem ranking the site for its brand name:

Bing ranks the site in a normal manner. This may be proof that there is no major issue with the site itself. The problem may be at Google.

Google’s John Mueller Admits it Might be Google

After listening to how the site owner spent three years waiting for the legacy domain penalty to drop off, three years of uploading disavows, and three years of bidding on AdWords for its own brand name, John Mueller seemed to realize that the issue was not on the site owner’s side but on Google’s side.

This is what John Mueller offered:

“I need to take a look to see if there’s anything sticking around there because it does seem like the old domain was pretty problematic. So that… always makes it a little bit harder to turn it around into something reasonable.

But it feels like after a couple of years that should be possible. “

In the end, Mueller admitted that it might be something on Google’s side. However, an issue that remains is that there is no solution for other publishers. This is not something a publisher can fix on their own, like a disavow. It’s something a Googler must be made aware of in order to fix.

Watch the Google Webmaster Hangout here
