SEO

Majestic’s enhanced tool now gives SEOs a lot more useful context about backlinks


Not all links are created equal. We all know that, but Majestic now provides much more insight into why and where links appear.

Geeking out like never before. Earlier this week the company announced upgrades that allow SEOs to explore links in much more depth. The tool now organizes webpages into 40 segments and provides visibility into where backlinks physically appear on the page, what text surrounds them, and the density of internal and external links. This makes it easier to determine which links are more valuable and can uncover new link building opportunities.

Dashboard view of British Heart Foundation link context

Source: Majestic

Search/sort/filter. According to Dixon Jones, Majestic’s Global Brand Ambassador, the new capabilities allow SEOs to do things like:

  • Find all the backlinks to a domain where any keyword is used in the surrounding text.
  • Find “DoFollow” links that lie in the top 20% of a web page.
  • Find links which are bundled tightly with other links, so likely to be spam directory listings.
  • Find links on high value pages which do not contain many other external links.

The tool offers numerous ways to search, sort and filter the data to get to some pretty precise and esoteric results.
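To make these filters concrete, here is a minimal Python sketch of the same kind of query run over link-context data. It does not use Majestic's API or export format: the LinkContext record and its field names are hypothetical, and the only detail taken from the announcement is that pages are divided into 40 segments.

```python
from dataclasses import dataclass

SEGMENTS_PER_PAGE = 40  # from the announcement: pages are organized into 40 segments


@dataclass
class LinkContext:
    """Hypothetical record for one backlink; field names are assumptions."""
    source_url: str
    follow: bool                  # True for a "DoFollow" link
    segment: int                  # 1 (top of page) .. 40 (bottom of page)
    surrounding_text: str
    external_links_on_page: int


def in_top_fraction(link, fraction=0.20):
    """True if the link sits within the top `fraction` of the page."""
    return link.segment <= SEGMENTS_PER_PAGE * fraction


def followed_links_near_top(links):
    """Followed links that appear in the top 20% of their page."""
    return [l for l in links if l.follow and in_top_fraction(l)]


def links_mentioning(links, keyword):
    """Links whose surrounding text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [l for l in links if needle in l.surrounding_text.lower()]


sample_links = [
    LinkContext("https://example.org/news", True, 3, "Support heart research today", 4),
    LinkContext("https://example.net/directory", True, 30, "Charity listings A to Z", 120),
    LinkContext("https://example.com/blog", False, 5, "A post about heart health", 9),
]

top_followed = followed_links_near_top(sample_links)
heart_links = links_mentioning(sample_links, "heart")
```

Filtering on external_links_on_page in the same way would approximate the "high value pages with few external links" query from the list above.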

Link context visualization. One particularly noteworthy innovation is Majestic’s link context chart, which helps visualize all this information and for which the company is seeking a patent. The graphic below presents a link analysis of five different types of sites. The green box illustrates where the link appears on the page. The blue bars are internal links and the yellow bars represent external links.

Visualization of distinct types of links from different websites

Source: Majestic

Why we should care. Beyond being a helpful auditing tool, the enhancements should make Majestic more useful and effective for link building.

The video below offers an overview and visual product tour from Dixon Jones. There’s also an in-depth guide (.pdf) to the new capabilities that goes into significantly more detail than I have above.


About The Author

Greg Sterling is a Contributing Editor at Search Engine Land. He writes about the connections between digital and offline commerce. He previously held leadership roles at LSA, The Kelsey Group and TechTV. Follow him on Twitter or find him on LinkedIn.

SEO

TripAdvisor says it blocked or removed nearly 1.5 million fake reviews in 2018


The majority of consumers (80% – 90%) routinely consult reviews before buying something, whether online or off. The powerful influence of reviews on purchase behavior has spawned a cottage industry of fake reviews, a problem that is growing on major sites such as Amazon, Google and Yelp, among other places.

Just over 2% of reviews submitted were fake. TripAdvisor is one of those other places, where reviews form the core of the company’s content and the principal reason consumers visit. How much of the review activity on TripAdvisor is fraudulent? In its inaugural TripAdvisor Transparency Report, the company says that 2.1% of all reviews submitted to the site in 2018 were fake. (A total of 4.7% of all review submissions were rejected or removed for violating TripAdvisor’s review guidelines, which extend beyond fraud.)

Source: TripAdvisor Review Transparency Report

73% blocked by machine detection. Given the volume of review submissions TripAdvisor receives – more than 66 million in 2018 – that translates into roughly 1.4 million fake reviews. TripAdvisor says that 73% of those fake reviews were blocked before being posted, while the remainder of fake reviews were later removed. The company also says that it has “stopped the activity of more than 75 websites that were caught trying to sell reviews” since 2015.
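The arithmetic behind these figures can be checked with a quick back-of-the-envelope calculation. The sketch below treats "more than 66 million" as exactly 66 million, so the results are lower-bound estimates rather than TripAdvisor's own numbers.

```python
# Lower-bound estimate: the report says "more than 66 million" submissions.
submissions = 66_000_000

# 2.1% of submissions were fake (integer math avoids float rounding).
fake_reviews = submissions * 21 // 1000

# 73% of the fake reviews were blocked before being posted.
blocked_before_posting = fake_reviews * 73 // 100

# The remainder were removed after they had been posted.
removed_after_posting = fake_reviews - blocked_before_posting

print(f"fake: {fake_reviews:,}")
print(f"blocked before posting: {blocked_before_posting:,}")
print(f"removed after posting: {removed_after_posting:,}")
```

With these inputs the estimate comes to about 1.39 million fake reviews, of which roughly 1.01 million were blocked before posting and about 370,000 were removed afterward.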

TripAdvisor defines “fake review” as one “written by someone who is trying to unfairly manipulate a business’ average rating or traveler ranking, such as a staff member or a business’ competitor. Reviews that give an account of a genuine customer’s experience, even if elements of that account are disputed by the business in question, are not categorized as fake.”

The company uses a mix of machine detection, human moderation and community flagging to catch fraudulent reviews. The bulk of inauthentic reviews (91%) are fake positive reviews, TripAdvisor says.

Most of the fake reviews that are submitted to TripAdvisor (91%) are "biased positive reviews."
Source: TripAdvisor Review Transparency Report

TripAdvisor says that the review fraud problem is global, with fake reviews originating in most countries. However, it said there was a higher percentage than average of fake reviews “originating from Russia.” By contrast, China is the source of many fake reviews on Amazon.

Punishing fake reviews. TripAdvisor has a number of penalties and punishments for review fraud. In the first instance of a business being caught posting or buying fake reviews, TripAdvisor imposes a temporary ranking penalty.

Upon multiple infractions, the company will impose a content ban that prevents the individual or individuals in question from posting additional reviews and content on the site. It also prevents the involved parties from creating new accounts to circumvent the ban.

In the most extreme cases, the company will apply a badge of shame (penalty badge) that warns consumers the business has repeatedly attempted to defraud them. This is effectively a kiss of death for the business. Yelp does something similar.

Why we should care. Consumer trust is eroding online. It’s incumbent upon major consumer destination sites to police their reviews aggressively and prevent unscrupulous merchants from deceiving consumers. Yelp has been widely criticized for its “review filter,” but the company deserves credit for its long-standing efforts to protect the integrity of its content.

Google and Amazon, in particular, need to do much more to combat review spam and fraud. Hopefully TripAdvisor’s effort and others like it will inspire them to.


About The Author

Greg Sterling is a Contributing Editor at Search Engine Land. He writes about the connections between digital and offline commerce. He previously held leadership roles at LSA, The Kelsey Group and TechTV. Follow him on Twitter or find him on LinkedIn.

SEO

10 Key Checks for Assessing Crawl Hygiene


When optimizing our websites for crawlability, our main goal is to make sure that search engines are spending their time on our most important pages so that they are regularly crawled and any new content can be found.

Each time Googlebot visits your website, it has a limited window in which to crawl and discover as many pages and links on your site as possible. When that limit is hit, it will stop.

The time it takes for your pages to be revisited depends on a number of different factors that play into how Google prioritizes URLs for crawling, including:

  • PageRank.
  • XML sitemap inclusion.
  • Position within the site’s architecture.
  • How frequently the page changes.
  • And more.

The bottom line is: your site only gets Googlebot’s attention for a finite amount of time with each crawl, which could be infrequent. Make sure that time is spent wisely.

It can be hard to know where to start when analyzing how well-optimized your site is for search engine crawlers, especially when you work on a large site with a lot of URLs to analyze, or work in a large company with a lot of competing priorities and outstanding SEO fixes to prioritize.

That’s why I’ve put together this list of top-level checks for assessing crawl hygiene to give you a starting point for your analysis.

1. How Many Pages Are Being Indexed vs. How Many Indexable Pages Are There on the Site?

Why This Is Important

This shows you how many pages on your site are available for Google to index, how many of those pages Google was actually able to find, and how many it determined were important enough to be indexed.

An indexability pie chart in DeepCrawl

Bar chart showing indexed pages in Google Search Console

2. How Many Pages Are Being Crawled Overall?

Why This Is Important

Comparing Googlebot’s crawl activity against the number of pages you have on your site can give you insights into how many pages Google either can’t access, or has determined aren’t enough of a priority to schedule to be crawled regularly.

Crawl stats line graph in Google Search Console

Bar chart showing Googlebot crawling in Logz.io

3. How Many Pages Aren’t Indexable?

Why This Is Important

Spending time crawling non-indexable pages isn’t the best use of Google’s crawl budget. Check how many of these pages are being crawled, and whether or not any of them should be made available for indexing.

Bar chart showing non-indexable pages in DeepCrawl

4. How Many URLs Are Being Disallowed from Being Crawled?

Why This Is Important

This will show you how many pages you are preventing search engines from accessing on your site. It’s important to make sure that these pages aren’t important for indexing or for discovering further pages for crawling.

Bar chart showing pages blocked by the robots.txt in Google Search Console
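One way to run this check outside of a crawling tool is to test a list of known URLs against your robots.txt rules. The sketch below uses Python's standard urllib.robotparser module. The rules and URLs are made-up examples, and the standard-library parser does not replicate every detail of how Googlebot interprets robots.txt (wildcards and longest-match precedence, for instance), so treat the result as a rough first pass.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; substitute the contents of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""


def split_by_crawlability(robots_txt, urls, user_agent="Googlebot"):
    """Return (allowed, blocked) lists of URLs for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed, blocked = [], []
    for url in urls:
        if parser.can_fetch(user_agent, url):
            allowed.append(url)
        else:
            blocked.append(url)
    return allowed, blocked


known_urls = [
    "https://example.com/",
    "https://example.com/search?q=shoes",
    "https://example.com/cart/checkout",
    "https://example.com/products/shoes",
]

allowed, blocked = split_by_crawlability(ROBOTS_TXT, known_urls)
print(f"{len(blocked)} of {len(known_urls)} URLs are disallowed")
```

Reviewing the blocked list by hand is the important part: any URL there that should be indexed, or that is the only path to other pages, points to a rule that needs loosening.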

5. How Many Low-Value Pages Are Being Indexed?

Why This Is Important

Looking at which pages Google has already indexed on your site gives an indication of the areas of the site that the crawler has been able to access.

For example, these might be pages that you haven’t included in your sitemaps as they are low-quality, but have been found and indexed anyway.

Bar chart showing pages indexed but not submitted in a sitemap in Google Search Console

6. How Many 4xx Error Pages Are Being Crawled?

Why This Is Important

It’s important to make sure that crawl budget isn’t being used up on error pages instead of pages that you want to have indexed.

Googlebot will periodically try to crawl 404 error pages to see whether the page is live again, so make sure you use 410 status codes correctly to show that pages are gone and don’t need to be recrawled.

A line graph showing broken pages in DeepCrawl
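If you have access to server logs, a short script can show how often Googlebot is requesting error pages. The sketch below assumes logs in the common "combined" access log format and identifies Googlebot by user-agent string alone, which can be spoofed; the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Matches the request, status, and user agent in a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Invented sample lines; in practice, read these from your access log.
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Sep/2019:06:25:01 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [11/Sep/2019:07:10:44 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [11/Sep/2019:07:11:02 +0000] "GET /gone HTTP/1.1" 410 0 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [11/Sep/2019:07:11:30 +0000] "GET / HTTP/1.1" 200 9000 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [11/Sep/2019:08:00:00 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]


def googlebot_4xx_hits(lines):
    """Count Googlebot requests that returned a 4xx status, keyed by (status, path)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        status = int(match["status"])
        if 400 <= status < 500:
            hits[(status, match["path"])] += 1
    return hits


errors = googlebot_4xx_hits(SAMPLE_LOG)
for (status, path), count in errors.most_common():
    print(status, path, count)
```

Paths that keep returning 404 to Googlebot are candidates for either a redirect to a live equivalent or a 410 if the page is permanently gone.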

7. How Many Internal Redirects Are Being Crawled?

Why This Is Important

Each request that Googlebot makes on a site uses up crawl budget, and this includes any additional requests within each of the steps in a redirect chain.

Help Google crawl more efficiently and conserve crawl budget by making sure only pages with 200 status codes are linked to within your site, and reduce the number of requests being made to pages that aren’t final destination URLs.

Redirect chain report in DeepCrawl
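To see how many extra requests a redirect chain costs, you can walk the chain from each internally linked URL to its final destination. The sketch below works on a plain mapping of redirecting URL to target, such as you might build from a crawl export; the mapping, the function names, and the hop limit are illustrative assumptions.

```python
def resolve_chain(url, redirects, max_hops=10):
    """Follow redirects from `url`; return every URL visited, in order.

    Stops at a URL that does not redirect, on a loop, or after max_hops.
    """
    chain = [url]
    seen = {url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:  # redirect loop detected
            break
        seen.add(target)
    return chain


def links_to_update(internal_links, redirects):
    """Map each internally linked URL that redirects to its final destination."""
    return {
        link: resolve_chain(link, redirects)[-1]
        for link in internal_links
        if link in redirects
    }


# Hypothetical crawl data: /a -> /b -> /c is a two-hop chain.
redirects = {"/a": "/b", "/b": "/c", "/old": "/new"}
internal_links = ["/a", "/c", "/old", "/about"]

chain = resolve_chain("/a", redirects)
fixes = links_to_update(internal_links, redirects)
print(f"/a takes {len(chain) - 1} redirect hops")
```

Each entry in fixes is an internal link that could point straight at its destination, saving one or more requests every time a crawler follows it.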

8. How Many Canonical Pages Are There vs. Canonicalized Pages?

Why This Is Important

The number of canonicalized pages on your site gives an indication of how much duplication there is on your site. While canonical tags consolidate link equity between sets of duplicate pages, they don’t help crawl budget.

Google will choose to index one page out of a set of canonicalized pages, but to be able to decide which is the primary page, it will first have to crawl all of them.

Pie chart showing canonical pages in DeepCrawl
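The canonical versus canonicalized split can be approximated from raw HTML with Python's standard html.parser module. In the sketch below, a page counts as canonical if its canonical tag points to itself; treating a page with no canonical tag as canonical is an assumption made here for simplicity. The sample pages are invented, and real URLs would need normalizing (trailing slashes, protocol, parameters) before comparison.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_of(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def classify(pages):
    """Split {url: html} into (canonical_urls, canonicalized_urls)."""
    canonical, canonicalized = [], []
    for url, html in pages.items():
        target = canonical_of(html)
        if target is None or target == url:
            canonical.append(url)
        else:
            canonicalized.append(url)
    return canonical, canonicalized


pages = {
    "https://example.com/shoes": '<head><link rel="canonical" href="https://example.com/shoes"></head>',
    "https://example.com/shoes?sort=price": '<head><link rel="canonical" href="https://example.com/shoes" /></head>',
    "https://example.com/about": "<head><title>About</title></head>",
}

canonical_urls, canonicalized_urls = classify(pages)
print(len(canonical_urls), "canonical,", len(canonicalized_urls), "canonicalized")
```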

9. How Many Paginated or Faceted Pages Are Being Crawled?

Why This Is Important

Google only needs to crawl pages that include otherwise undiscovered content or unlinked URLs.

Pagination and facets are usually a source of duplicate URLs and crawler traps, so make sure that paginated and faceted pages without any unique content or links aren’t being crawled unnecessarily.

As rel=next and rel=prev are no longer supported by Google, ensure your internal linking is optimized to reduce reliance on pagination for page discovery.

Pie chart showing pagination breakdown in DeepCrawl

10. Are There Mismatches in Page Discovery Across Crawl Sources?

Why This Is Important

If you’re seeing pages being accessed by users through your analytics data that aren’t being crawled by search engines within your log file data, it could be because these pages aren’t as discoverable for search engines as they are for users.

By integrating different data sources with your crawl data, you can spot gaps where pages can’t be easily found by search engines.

Google’s two main sources of URL discovery are external links and XML sitemaps, so if you’re having trouble getting Google to crawl your pages, make sure they are included in your sitemap if they’re not yet being linked to from any other sites that Google already knows about and crawls regularly.

Bar chart showing crawl source gaps in DeepCrawl
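Once URL lists from each source are available, spotting these gaps comes down to set comparisons. The sketch below assumes you have already exported URLs from analytics, server logs (Googlebot requests only), and your XML sitemap, and normalized them to the same form; the function name, the category labels, and the sample data are illustrative.

```python
def discovery_gaps(analytics_urls, googlebot_log_urls, sitemap_urls):
    """Compare URL sets from three sources and report where they disagree."""
    analytics = set(analytics_urls)
    crawled = set(googlebot_log_urls)
    sitemap = set(sitemap_urls)
    return {
        # Users reach these pages, but Googlebot has not requested them.
        "visited_but_not_crawled": sorted(analytics - crawled),
        # Submitted in the sitemap, but not requested by Googlebot.
        "in_sitemap_but_not_crawled": sorted(sitemap - crawled),
        # Requested by Googlebot, but missing from the sitemap.
        "crawled_but_not_in_sitemap": sorted(crawled - sitemap),
    }


gaps = discovery_gaps(
    analytics_urls=["/", "/pricing", "/guides/crawl-budget"],
    googlebot_log_urls=["/", "/pricing", "/tag/misc"],
    sitemap_urls=["/", "/pricing", "/guides/crawl-budget", "/contact"],
)

for label, urls in gaps.items():
    print(label, urls)
```

Pages in the first two groups are candidates for better internal linking, while the third group is worth reviewing for low-value URLs that may be absorbing crawl budget.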

To Sum Up

By running through these 10 checks for the websites you manage, you should be able to get a better understanding of the crawlability and overall technical health of each site.

Once you identify areas of crawl waste, you can instruct Google to crawl less of those pages by using methods like disallowing them in robots.txt.

You can then start influencing it to crawl more of your important pages by optimizing your site’s architecture and internal linking to make them more prominent and discoverable.


Image Credits

All screenshots taken by author, September 2019



SEO

Google explains why syndicators may outrank original publishers


Last week we reported that Google has updated its algorithms to give original reporting preferred ranking in Google search. So when John Shehata, VP of Audience Growth at Condé Nast, a major publishing company, posted on Twitter that Yahoo is outranking the original source of the article, Google took notice.

The complaint. Shehata posted on Twitter, “Recently I see a lot of instances where Google Top Stories ranking syndicated content from Yahoo above or instead of original content. This is disturbing especially for publishers. Yahoo has no canonicals back to original content but sometimes they link back.”

He provided screenshots of this happening as evidence.

No canonical. Shehata also mentioned that Yahoo, which is legally syndicating Condé Nast’s content, is not using a canonical tag to point back to the original source. Google’s recommendation for publishers that allow others to syndicate their content is to include a clause requiring syndicators to use the canonical tag to point back to the source they are syndicating from. Using this canonical tag indicates to Google which article page is the original source.

The issue. Sometimes those who license content, the syndicators, post the content before or at the same time as the source they are syndicating it from. That makes it hard for Google or other search engines to know which is the original source. That is why Google wrote, “Publishers that allow others to republish content can help ensure that their original versions perform better in Google News by asking those republishing to block or make use of canonical. Google News also encourages those that republish material to consider proactively blocking such content or making use of the canonical, so that we can better identify the original content and credit it appropriately.”

Google’s response. Google Search Liaison Danny Sullivan responded on Twitter: “If people deliberately chose to syndicate their content, it makes it difficult to identify the originating source. That’s why we recommend the use of canonical or blocking. The publishers syndicating can require this.”

This affects both web and News results, Sullivan said. In fact, the original reporting algorithm update has not yet rolled out to Google News; it currently applies only to web search.

Solution. If you allow people to syndicate your content, you should require them to use the canonical tag or to block Google from indexing that content. Otherwise, do not expect Google to always be able to figure out where the article originated, especially when your syndication partners publish the story before or at the same time as you do.

Why we care. While the original reporting change is interesting in this case, it is somewhat unrelated. If the same article is published on two different sites at the same time, both sites can appear to the search engines as the original source. If these sites are syndicating your content legally, review or update your contracts to require syndicators to either use canonical tags or block their syndicated content from indexing altogether. If syndicators are stealing your content and outranking you, Google should be better at dealing with that algorithmically; otherwise, you can file a DMCA takedown request with Google.


About The Author

Barry Schwartz is Search Engine Land’s News Editor and owns RustyBrick, a NY based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on SEM topics.

Copyright © 2019 Plolu.