

Link Distance Ranking Algorithms – Search Engine Journal



There is a kind of link algorithm that isn’t discussed nearly enough. This article is meant as an introduction to link and link distance ranking algorithms. They may play a role in how sites are ranked, and in my opinion it’s important to be aware of them.

Does Google Use This?

The algorithm under consideration comes from a patent that was filed by Google. Google’s official statement about patents and research papers is that it produces many of them, that not all of them are used, and that those that are used are sometimes applied in a way that differs from what is described.

That said, the details of this algorithm appear to resemble the contours of what Google has officially said about how it handles links.

Complexity of Calculations

There are two sections of the patent (Producing a Ranking for Pages Using Distances in a Web-link Graph) that state how complex the calculations are:

“Unfortunately, this variation of PageRank requires solving the entire system for each seed separately. Hence, as the number of seed pages increases, the complexity of computation increases linearly, thereby limiting the number of seeds that can be practically used.”

“Hence, what is needed is a method and an apparatus for producing a ranking for pages on the web using a large number of diversified seed pages…”

The above points to the difficulty of making these calculations web-wide because of the large number of data points. It states that breaking these down by topic niche makes the calculations easier to compute.

What’s interesting about that statement is that the original Penguin algorithm was calculated once a year or longer. Sites that were penalized pretty much stayed penalized until the next seemingly random date that Google recalculated the Penguin score.

At a certain point Google’s infrastructure must have improved. Google is constantly building its own infrastructure but apparently doesn’t announce it. The Caffeine web indexing system is one of the exceptions.

Real-time Penguin rolled out in the fall of 2016.

It is notable that these calculations are difficult. It points to the possibility that Google would do a periodic calculation for the entire web, then assign scores based on the distances from the trusted sites to all the rest of the sites. Thus, one gigantic calculation, done once a year.

So when a SERP is calculated via PageRank, the distance scores are also calculated. This sounds a lot like the process we know as the Penguin Algorithm.

“The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links. The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages. Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances.”

What is the System Doing?

The system creates a score that is based on the shortest distance between a seed set and the proposed ranked pages. The score is used to rank these pages.
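The three steps in the patent quote (assign link lengths, compute shortest distances from the seeds, turn distances into scores) can be sketched with a multi-source Dijkstra search. This is only a minimal illustration under stated assumptions: the link lengths, the example domains, and the `1 / (1 + distance)` scoring formula are all hypothetical and are not taken from the patent.

```python
import heapq
import math


def seed_distances(graph, seeds):
    """Shortest distance from the nearest seed page to every reachable page.

    graph maps a page to a list of (linked_page, link_length) pairs.
    """
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, math.inf):
            continue  # stale queue entry, a shorter path was already found
        for target, length in graph.get(page, []):
            candidate = d + length
            if candidate < dist.get(target, math.inf):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist


def ranking_scores(graph, seeds):
    """Hypothetical scoring: shorter distance gives a higher score.

    Pages that cannot be reached from any seed get a score of 0.0.
    """
    dist = seed_distances(graph, seeds)
    pages = set(graph) | {t for links in graph.values() for t, _ in links}
    return {p: 1.0 / (1.0 + dist[p]) if p in dist else 0.0 for p in pages}


# Hypothetical link graph: page -> [(linked page, assumed link length)]
example_graph = {
    "seed.example": [("news.example", 1.0), ("blog.example", 2.0)],
    "news.example": [("shop.example", 1.0)],
    "blog.example": [("shop.example", 0.5)],
    "spam.example": [("shop.example", 0.1)],
}
example_scores = ranking_scores(example_graph, ["seed.example"])
```

In this toy graph, `shop.example` ends up at distance 2.0 from the seed (via `news.example`), while `spam.example` has no path from the seed and scores 0.0, no matter how short its own outbound link is.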

So it’s basically an overlay on top of the PageRank score to help weed out manipulated links, based on the theory that manipulated links will naturally have a longer distance of link connections between the spam page and the trusted set.

Ranking a web page can be said to consist of three processes.

  • Indexing
  • Ranking
  • Ranking Modification (usually related to personalization)

That’s an extreme reduction of the ranking process. There’s a lot more that goes on.

Interestingly, this distance ranking process happens during the ranking part of the process. Under this algorithm there’s no chance of ranking for meaningful phrases unless the page is associated with the seed set.

Here is what it says:

“One possible variation of PageRank that would reduce the effect of these techniques is to select a few “trusted” pages (also referred to as the seed pages) and discovers other pages which are likely to be good by following the links from the trusted pages.”

Knowing in what part of the ranking process the seed set calculation happens is an important distinction, because it helps us formulate what our ranking strategy is going to be.

This is different from Yahoo’s TrustRank algorithm, which was shown to be biased.

Majestic’s Topical TrustFlow can be said to be an improved version, similar to a research paper that demonstrated that a seed set organized by niche topics is more accurate. Research also showed that organizing a seed set algorithm by topic is several orders better than not doing so.

Thus, it makes sense that Google’s distance ranking algorithm also organizes its seed set by niche topic buckets.

As I understand it, this Google patent calculates distances between a seed set and other pages, then assigns distance scores.

Reduced Link Graph

“In a variation on this embodiment, the links associated with the computed shortest distances constitute a reduced link-graph.”

What this means is that there’s a map of the Internet commonly known as the link graph, and then there’s a smaller version of the link graph from which spam pages have been filtered out. Sites that primarily obtain links from outside the reduced link graph might never get inside. Dirty links thus get no traction.

What is a Reduced Link Graph?

I’ll keep this short and sweet. The link to the document follows below.

What you really need to know is this part:

“The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and harm the quality of retrieval.

In order to provide high quality search results, it is important to detect them and reduce their influence… With the help of a classifier, these noisy links are detected and dropped. After that, link analysis algorithms are performed on the reduced link graph.”
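As a rough illustration of the quoted idea, the sketch below drops links that a classifier flags as noisy and then runs a trivial link analysis (counting inbound links) on what remains. The classifier here is a stand-in that simply checks a hypothetical list of spam sources; a real classifier and a real link analysis algorithm would be far more involved.

```python
def reduce_link_graph(links, is_noisy):
    """Return only the links the classifier does not flag as noisy."""
    return [link for link in links if not is_noisy(link)]


def inlink_counts(links):
    """Toy link analysis: count inbound links per target page."""
    counts = {}
    for _source, target in links:
        counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical data: (source, target) pairs and an assumed spam list.
all_links = [
    ("news.example", "shop.example"),
    ("spam.example", "shop.example"),
    ("spam.example", "rival.example"),
    ("blog.example", "rival.example"),
]
spam_sources = {"spam.example"}


def from_spam_source(link):
    # Stand-in classifier: a link is noisy if its source is on the spam list.
    return link[0] in spam_sources


reduced_links = reduce_link_graph(all_links, from_spam_source)
reduced_counts = inlink_counts(reduced_links)
```

On the full graph each target has two inbound links; on the reduced graph each has one, because the links from `spam.example` are dropped before any analysis runs.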

Read this PDF for more information about Reduced Link Graphs.

If you’re obtaining links from sites like news organizations, it may be fair to assume they are on the inside of the reduced link graph. But are they a part of the seed set? Maybe we shouldn’t obsess over that.

Is This Why Google Says Negative SEO Doesn’t Exist?

“…the links associated with the computed shortest distances constitute a reduced link-graph”

A reduced link graph is different from a link graph. A link graph can be said to be a map of the entire Internet organized by the link relationships between sites, pages or even parts of pages.

Then there’s a reduced link graph, which is a map of everything minus certain sites that don’t meet specific criteria.

A reduced link graph can be a map of the web minus spam sites. The sites outside of the reduced link graph will have zero effect on the sites inside the reduced link graph, because they’re on the outside.

That’s probably why a spam site linking to a normal site will not cause a negative effect on a non-spam site. Because the spam site is outside of the reduced link graph, it has no effect whatsoever. The link is ignored.

Could this be why Google is so confident that it’s catching link spam and that negative SEO does not exist?

Distance from Seed Set Equals Less Ranking Power?

I don’t think it’s necessary to try to map out what the seed set is.  What’s more important, in my opinion, is to be aware of topical neighborhoods and how that relates to where you get your links.

At one time Google used to publicly display a PageRank score for every page, so I can remember what kinds of sites tended to have low scores. There is a class of sites that have low PageRank and low Moz DA, but they are closely linked to sites that in my opinion are likely a few clicks away from the seed set.

What Moz DA is measuring is an approximation of a site’s authority. It’s a good tool. However, what Moz DA is measuring may not be a distance from a seed set, which cannot be known because it’s a Google secret.

So I’m not putting down the Moz DA tool, keep using it. I’m just suggesting you may want to expand your criteria and definition of what a useful link may be.

What Does it Mean to be Close to a Seed Set?

From a Stanford University classroom document, page 17 asks, “What is a good notion of proximity?” The answers are:

  • Multiple connections
  • Quality of connection
  • Direct & Indirect connections
  • Length, Degree, Weight

That is an interesting consideration.


There are many people who are worried about anchor text ratios and the DA/PA of inbound links, but I think those considerations are somewhat old.

The concern with DA/PA is a throwback to the hand-wringing about obtaining links from pages with a PageRank of 4 or more, which was a practice that began from a randomly chosen PageRank score, the number four.

When we talk or think about links in the context of ranking, it may be useful to consider distance ranking as a part of that conversation.

Read the patent here

Images by Shutterstock, Modified by Author



BrightLocal launches ‘Local RankFlux’ Google local algorithm tracking tool



BrightLocal has launched a new free tool called “Local RankFlux,” designed to alert marketers to changes in local search rankings across multiple industries.

Exclusively focused on the Google local algorithm, it offers tracking for 26 verticals. The ranking fluctuations of individual industries can then be compared to the overall sample.

Tracking over 14,000 keywords. Local RankFlux tracks roughly 560 keywords per industry vertical in 20 cities, according to BrightLocal’s blog post. It “plots the ranking position of each business in the top 20 search results and compares that ranking to the previous day’s position to determine the daily change.” 

Source: BrightLocal

Changes in higher SERP positions (e.g., 1 – 2) are weighted more heavily and are treated as more significant than changes in lower rankings (e.g., 19 – 20) in its scoring. “Local RankFlux then multiplies the change in position between today’s and yesterday’s rankings by the weighting to create a total daily fluctuation. This total is then converted into an average based on the number of keywords that returned meaningful results and a score produced for All Industries and for each individual industry.”
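To make the described calculation concrete, here is a minimal sketch of a position-weighted daily fluctuation average. BrightLocal has not published its weighting in the text quoted here, so the linear `position_weight` function, the handling of businesses that drop out of the top 20, and all function names are assumptions. The result is not scaled to match BrightLocal’s own scores.

```python
def position_weight(position, max_position=20):
    """Assumed linear weighting: position 1 weighs 1.0, position 20 weighs 0.05."""
    return (max_position - position + 1) / max_position


def keyword_fluctuation(yesterday, today, max_position=20):
    """Weighted position change for one keyword.

    yesterday and today map a business to its position (1..max_position).
    A business missing today is treated as having dropped to max_position + 1.
    """
    total = 0.0
    for business, old_pos in yesterday.items():
        new_pos = today.get(business, max_position + 1)
        total += abs(new_pos - old_pos) * position_weight(old_pos, max_position)
    return total


def daily_score(keyword_results):
    """Average fluctuation over keywords that returned results on both days.

    keyword_results is a list of (yesterday, today) pairs, one per keyword.
    """
    meaningful = [(y, t) for y, t in keyword_results if y and t]
    if not meaningful:
        return 0.0
    return sum(keyword_fluctuation(y, t) for y, t in meaningful) / len(meaningful)
```

Under this assumed weighting, two businesses swapping positions 1 and 2 contribute 1.95 to the total, while the same swap at positions 19 and 20 contributes only 0.15, which mirrors the idea that changes near the top matter more.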

Scores above 6 suggest an update. BrightLocal explains that scores between 0 – 3 indicate nothing meaningful has happened – given that there are regular, even daily fluctuations going on. Scores of more than 3 but less than 6 indicate a minor change in the algorithm, according to BrightLocal, while scores of 6 to 10 suggest a local algorithm update. The spike in the chart below (industry average of 6.1) on August 8 suggests a meaningful change in the algorithm.
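The score bands described above can be written as a small lookup. The function name and the label strings are made up for illustration; the thresholds follow the ranges BrightLocal gives, with a score of exactly 6 treated as an update.

```python
def interpret_score(score):
    """Map a Local RankFlux style score (0 to 10) to the bands described above."""
    if score <= 3:
        return "normal fluctuation"
    if score < 6:
        return "minor change"
    return "likely local algorithm update"
```

For example, the 6.1 industry average mentioned for August 8 falls into the last band.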

Local RankFlux score: legal category vs industry average

Source: BrightLocal

In early August Google made a core algorithm update. But the last time there was a significant local impact was in August of last year (and possibly in June, 2019 after another core update). In August 2018, SterlingSky’s Joy Hawkins detailed the ways in which her small business customers were impacted by that 2018 core algorithm update.

Why we should care. This free tool will be a useful way for local SEOs to reality check against broader industry benchmarks, to confirm whether there was indeed a local algorithm update. Informally, a number of local SEOs praised the tool based on early exposure.

Take a look and provide feedback on whether it aligns with your observations and experiences. And be sure not to miss SMX East’s full-day track on local SEO and location-based marketing for brands.

About The Author

Greg Sterling is a Contributing Editor at Search Engine Land. He writes about the connections between digital and offline commerce. He previously held leadership roles at LSA, The Kelsey Group and TechTV. Follow him on Twitter or find him on LinkedIn.



Google’s John Mueller on Where to Insert JSON-LD Structured Data



In the latest instalment of the #AskGoogleWebmasters video series, Google’s John Mueller answers a common question about JSON-LD structured data.

Here is the question that was submitted:

“Is it possible to insert JSON structured data at the bottom of the <body> instead of the <head>? It seems to work fine for many websites.”

In response, Mueller says “yes.” JSON-LD structured data can absolutely be inserted in either the head or body of the page. Just as the person who submitted the question assumed – it will work fine either way.

JSON-LD can also be inserted into pages using JavaScript, if that’s what happens to suit your pages better.
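As a small illustration of the head-or-body point, the sketch below builds a JSON-LD script tag and places it before either closing tag of an HTML string. The helper names, the sample organization data, and the simple string-based insertion are all assumptions made for this example; a production site would more likely do this in its templates or CMS.

```python
import json


def json_ld_script(data):
    """Serialize data as a JSON-LD script tag."""
    # Escape "</" so the payload cannot close the script tag early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


def insert_before_closing(html, tag, snippet):
    """Insert snippet just before the first closing tag, e.g. 'head' or 'body'."""
    closing = f"</{tag}>"
    index = html.find(closing)
    if index == -1:
        raise ValueError(f"no {closing} found in document")
    return html[:index] + snippet + "\n" + html[index:]


# Hypothetical structured data for illustration only.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
}
page = "<html><head><title>Example</title></head><body><p>Hello</p></body></html>"

page_with_head_markup = insert_before_closing(page, "head", json_ld_script(organization))
page_with_body_markup = insert_before_closing(page, "body", json_ld_script(organization))
```

Both resulting pages carry the same structured data; only the position of the script tag differs.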

What’s the Difference Between JSON-LD and Other Structured Data Types?

Before answering the question, Mueller gave a brief explanation of each type of structured data and how they’re different from each other.

There are two other types of structured data in addition to JSON-LD. Here are the differences between each of them.

  • JSON-LD: A JavaScript notation embedded in a script tag in the page head or body.
  • Microdata: An open-community HTML specification used to nest structured data within HTML content.
  • RDFa: An HTML5 extension that supports linked data through additional attributes added to existing HTML tags on the page.

Although all of these types of structured data are acceptable to use, Mueller has gone on record saying Google prefers the use of JSON-LD.



Subdomain leasing and the giant hole in Google’s Medic update



ConsumerAffairs provides buying guides for everything from mattresses to home warranties. But they also direct consumers on purchasing hearing aids, dentures, diabetic supplies, and even LASIK surgery. Many have questioned the legitimacy of ConsumerAffairs buying guides, largely because top-rated brands often have financial relationships with the organization.

ConsumerAffairs’ health content has been hit in the post-Medic world, but now it seems they’ve found a way to circumvent the algorithm update by hosting slightly modified versions of their buying guides on local news websites around the country. Google “hearing aids in Phoenix” and you’ll discover just how well this strategy is working. Local ABC affiliate station ABC15 hosts all of ConsumerAffairs’ buying guides, including those in the health category, on their new “reviews” subdomain.

So far, I’ve counted almost 100 of these ConsumerAffairs content mirrors. Despite cracking down on low-authority medical advice and subdomain leasing, Google seems to be missing this huge hack on their ranking algorithm.

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.

About The Author

Abram Bailey, AuD is a Doctor of Audiology and the founder of, the leading independent resource for informed hearing aid consumers.



Copyright © 2019 Plolu.