Connect with us


Link Distance Ranking Algorithms – Search Engine Journal



There is a kind of link algorithm that isn’t widely discussed, not nearly enough. This article is meant as introduction to link and link distance ranking algorithms. It’s something that may play a role in how sites are ranked. In my opinion it’s important to be aware of this.

Does Google Use This?

While the algorithm under consideration is from a patent that was filed by Google, Google’s official statement about patents and research papers is that they produce many of them and that not all of them are used and sometimes they are used in a way that is different than what is described.

That said, the details of this algorithm appear to resemble the contours of what Google has officially said about how it handles links.

Complexity of Calculations

There are two sections of the patent (Producing a Ranking for Pages Using Distances in a Web-link Graph) that state how complex the calculations are:

“Unfortunately, this variation of PageRank requires solving the entire system for each seed separately. Hence, as the number of seed pages increases, the complexity of computation increases linearly, thereby limiting the number of seeds that can be practically used.”

Hence, what is needed is a method and an apparatus for producing a ranking for pages on the web using a large number of diversified seed pages…”

The above points to the difficulty of making these calculations web wide because of the large number of data points. It states that breaking these down by topic niches the calculations are easier to compute.

What’s interesting about that statement is that the original Penguin algorithm was calculated once a year or longer. Sites that were penalized pretty much stayed penalized until the next seemingly random date that Google recalculated the Penguin score.

At a certain point Google’s infrastructure must have improved. Google is constantly building it’s own infrastructure but apparently doesn’t announce it. The Caffeine web indexing system is one of the exceptions.

Real-time Penguin rolled out in the fall of 2016.

It is notable that these calculations are difficult. It points to the possibility that Google would do a periodic calculation for the entire web, then assign scores based on the distances from the trusted sites to all the rest of the sites. Thus, one gigantic calculation, done a year.

So when a SERP is calculated via PageRank, the distance scores are also calculated. This sounds a lot like the process we know as the Penguin Algorithm.

“The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links. The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages. Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances.”

What is the System Doing?

The system creates a score that is based on the shortest distance between a seed set and the proposed ranked pages. The score is used to rank these pages.

So it’s basically an overlay on top of the PageRank score to help weed out manipulated links, based on the theory that manipulated links will naturally have a longer distance of link connections between the spam page and the trusted set.

Ranking a web page can be said to consist of three processes.

  • Indexing
  • Ranking
  • Ranking Modification (usually related to personalization)

That’s an extreme reduction of the ranking process. There’s a lot more that goes on.

Interestingly, this distance ranking process happens during the ranking part of the process. Under this algorithm there’s no chance of ranking for meaningful phrases unless the page is associated with the seed set.

Here is what it says:

“One possible variation of PageRank that would reduce the effect of these techniques is to select a few “trusted” pages (also referred to as the seed pages) and discovers other pages which are likely to be good by following the links from the trusted pages.”

This is an important distinction, to know in what part of the ranking process the seed set calculation happens because it helps us formulate what our ranking strategy is going to be.

This is different from the Yahoo TrustRank thing. YTR was shown to be biased.

Majestic’s Topical TrustFlow can be said to be an improved version, similar to a research paper that demonstrated that by using a seed set that is organized by niche topics is more accurate. Research also showed that organizing a seed set algorithm by topic is several orders better than not doing so.

Thus, it makes sense that Google’s distance ranking algorithm also organizes it’s seed set by niche topic buckets.

As I understand this, this Google patent calculates distances between a seed set and assigns distance scores.

Reduced Link Graph

“In a variation on this embodiment, the links associated with the computed shortest distances constitute a reduced link-graph.”

What this means is that there’s a map of the Internet commonly known as the Link Graph and then there’s a smaller version the link graph populated by web pages that have had spam pages filtered out. Sites that primarily obtain links outside of the reduced link graph might never get inside. Dirty links thus get no traction.

What is a Reduced Link Graph?

I’ll keep this short and sweet. The link to the document follows below.

What you really need to know is this part:

“The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and harm the quality of retrieval.

In order to provide high quality search results, it is important to detect them and reduce their influence… With the help of a classifier, these noisy links are detected and dropped. After that, link analysis algorithms are performed on the reduced link graph.”

Read this PDF for more information about Reduced Link Graphs.

If you’re obtaining links from sites like news organizations, it may be fair to assume they are on the inside of the reduced link graph. But are they a part of the seed set? Maybe we should’t obsess over that.

Is This Why Google Says Negative SEO Doesn’t Exist?

“…the links associated with the computed shortest distances constitute a reduced link-graph”

A reduced link graph is different from a link graph. A link graph can be said to be a map of the entire Internet organized by the link relationships between sites, pages or even parts of pages.

Then there’s a reduced link graph, which is a map of everything minus certain sites that don’t meet specific criteria.

A reduced link graph can be a map of the web minus non-spam sites. The sites outside of the reduced link graph will have zero effect on the sites inside the link graph, because they’re on the outside.

That’s probably why a spam site linking to a normal site will not cause a negative effect on a non-spam site. Because the spam site is outside of the reduced link graph, it has no effect whatsoever. The link is ignored.

Could this be why Google is so confident that it’s catching link spam and that negative SEO does not exist?

Distance from Seed Set Equals Less Ranking Power?

I don’t think it’s necessary to try to map out what the seed set is.  What’s more important, in my opinion, is to be aware of topical neighborhoods and how that relates to where you get your links.

At one time Google used to publicly display a PageRank score for every page, so I can remember what kinds of sites tended to have low scores. There are a class of sites that have low PageRank and low Moz DA, but they are closely linked to sites that in my opinion are likely a few clicks away from the seed set.

What Moz DA is measuring is an approximation of a site’s authority. It’s a good tool. However, what Moz DA is measuring may not be a distance from a seed set, which cannot be known because it’s a Google secret.

So I’m not putting down the Moz DA tool, keep using it. I’m just suggesting you may want to expand your criteria and definition of what a useful link may be.

What Does it Mean to be Close to a Seed Set?

From a Stanford university classroom document, page 17 asks, What is a good notion of proximity? The answers are:

  •  Multiple connections
  • Quality of connection
  • Direct & Indirect connections
  • Length, Degree, Weight

That is an interesting consideration.


There are many people who are worried about anchor text ratios, DA/PA of inbound links, but I think those considerations are somewhat old.

The concern with DA/PA is a throwback to the hand-wringing about obtaining links from pages with a PageRank of 4 or more, which was a practice that began from a randomly chosen PageRank score, the number four.

When we talk about or think about when considering links in the context of ranking, it may be useful to consider distance ranking as a part of that conversation.

Read the patent here

Images by Shutterstock, Modified by Author

Source link

Continue Reading
Click to comment

You must be logged in to post a comment Login

Leave a Reply


Google Search Console image search reporting bug June 5-7



Google posted a notice that between the dates of June 5 through June 7, it was unable to capture data around image search traffic. This is just a reporting bug and did not impact actual search traffic, but the Search Console performance report may show drops in image search traffic in that date range.

The notice. The notice read, “June 5-7: Some image search statistics were not captured during this period due to an internal issue. Because of this, you may see a drop in your image search statistics during this period. The change did not affect user Search results, only the data reporting.”

How do I see this? If you login to Google Search Console, click into your performance report and then filter by clicking on the “search type” filter. You can then select image from the filters.

Here is a screen shot of this filter:

How To Filter By Image Traffic in Google Search Console

Why we should care. If your site gets a lot of Google Image search traffic, you may notice a dip in your traffic reporting within Google Search Console. You may have not noticed a similar dip in your other analytics tools. That being said, Google said this is only a reporting glitch within Google Search Console and did not impact your actual traffic to your web site.

About The Author

Barry Schwartz is Search Engine Land’s News Editor and owns RustyBrick, a NY based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on SEM topics.

Continue Reading


Facebook Changes Reach of Comments in News Feed



Facebook announced a change to it’s algorithms that will affect the reach of comments on a post. Comments that have specific quality signals will  be highly ranked. Low quality comment practices may result in less reach.

Comment Ranking in News Feeds

Facebook noted that not only are posts ranked in news feeds but comments are also ranked as well.

Posts with comments that have positive quality signals will be seen by more people. Posts with low quality signals will have their news feed reach reduced.

Facebook Comment-Quality Signals

Facebook noted that their updated comment algorithm has four features:

  1. Integrity signals
  2. User indicated preferences
  3. User interaction signals
  4. Moderation signals

Integrity Signals

Integrity Signals are a measure of authenticity. Comments that violate community standards or fall into engagement-bait are negative signals. Violations of community standards are said to be removed.

Engagement Bait

Facebook engagement bait is a practice that has four features:

1. React Baiting

Encouraging users to react to your post

2. Follow and Share Baiting

This is described as telling visitors to like, share or subscribe.

3. Comment Baiting

Encouraging users to comment with a letter or number are given as examples.

. Monetization Baiting

This is described as asking for “stars” in exchange for something else, which could include something trivial like “doing push ups.”

User Indicated Preferences

This is a reference to user polls that Facebook conducts in order to understand what users say they wish to see in comments.

User Interaction Signals

These are signals related to whether users interact with a post.

Moderation Signals

This is a reference to how users hide or delete comments made in their posts.

Here is how Facebook describes it:

“People can moderate the comments on their post by hiding, deleting, or engaging with comments.

Ranking is on by default for Pages and people with a a lot of followers, but Pages and people with a lot of followers can choose to turn off comment ranking.

People who don’t have as many followers will not have comment ranking turned on automatically since there are less comments overall, but any person can decide to enable comment ranking by going to their settings. (See more details here.) “

Facebook Targeting Low Quality Comments

One of the stated goals of this update is to hide low quality posts from people’s Facebook feeds and to promote high quality posts by people you might know.

This is how Facebook described it:

“To improve relevance and quality, we’ll start showing comments on public posts more prominently when:

  • The comments have interactions from the Page or person who originally posted; or
  • The comments or reactions are from friends of the person who posted.”

Read Facebook’s announcement here: Making Public Comments More Meaningful

Images by Shutterstock, Modified by Author


Continue Reading


Build your PPC campaigns with this mini campaign builder script for Google Ads



Need to quickly build a campaign or add keywords to an existing one? This script will do the work for you!

All you need to do is input a few keywords and headlines in a spreadsheet and BAM! You’ve got yourself the beginnings of a great campaign.

I’m a firm believer in Single Keyword per Ad Group (SKAG) structure – it increases ad/keyword relevance and therefore improves quality score, makes CPCs cheaper, gets you a higher ad rank and a better CTR.

Sadly, building out SKAG structures is a pretty time-consuming endeavor. You can’t implement millions of keywords and ads without PPC tech powering your builds.

But if a client just needs a couple of new keywords after updating their site with new content, this script is a quick and easy solution.

And that’s exactly what I love about PPC. There’s a special place in my heart for simple scripts anyone can use to achieve tasks that are otherwise repetitive or near-impossible.

What does the script do?

This tool will save a lot of time with small-scale builds where you know exactly which keywords and ad copy you need, for example when you’re adding a few keywords to an existing campaign.

You input your campaign name, keywords, headlines, descriptions, paths and final URL, and it will output three tabs for you: one with keyword combinations, one with negatives, and ads to upload to Google Ads Editor.

It creates one exact and one broad match modifier campaign and creates a list of keywords as exact negatives in the broad campaign to make sure that search terms that match exactly will go through the exact keyword.

I’m sure you’re dying to give it a whirl, so let’s get cracking!

How do you use it?

Make a copy of this spreadsheet (note: you’ll need to authorize the script to run). You’ll find all the instructions there as a future reminder.

Once you’ve got the spreadsheet ready, input the following:

  • The campaign name
  • The campaign name delimiter to distinguish between broad and exact campaigns
  • Headline 1 (if this cell is not specified, then it will be the same as the keyword)
  • Headline 2
  • Optionally, headline 3
  • Description 1
  • Optionally, description 2
  • Optionally, path 1 and path 2
  • The final URL
  • The keywords (you can keep going outside of the box with these!)

You’ll see a handy character counter which will go red if you exceed the character limit. Bear in mind that this tool will assume that you’re using it correctly and so you’ll need to make sure that you’re staying within the limit!

You can also optionally create a second ad variant by choosing the part of your text you want to vary (e.g., headline 2 or description 2) and inputting the copy. Otherwise, just select “None” from the dropdown menu.

Once you’re done, click the gigantic “Go!” Button, and wait for the magic to happen.

It will generate three tabs labelled “Keywords,” “Negatives” and “Ads.” If you want to run the script again with different keywords, make sure you save these tabs elsewhere or rename them to prevent the script from overriding them.

Finally, you can paste these tabs into Editor and update all the relevant settings and adjustments. Job done!

DOWNLOAD: You’ll need to authorize the script to run after you make a copy of this spreadsheet.

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.

About The Author

Daniel Gilbert is the CEO at Brainlabs, the best paid media agency in the world (self-declared). He has started and invested in a number of big data and technology startups since leaving Google in 2010.

Continue Reading


Copyright © 2019 Plolu.