
SEO

Change log: Google updated its Search Quality Rater Guidelines


Google updated its Search Quality Rater Guidelines on September 5; the update was first brought to the attention of the SEO community via a tweet from SEM consultant Marie Haynes. This latest version places more emphasis on vetting news sources, as well as YMYL content and its creators, and expands the basis on which a rater might apply the lowest ratings to content that may potentially spread hate. The previous update occurred on May 16.

Below is an analysis of the changes. The May 16 version of the guidelines appears on the left-hand side of the screenshots, with the corresponding section of the latest version on the right-hand side.

2.3: Your Money or Your Life (YMYL) Pages

Addition of the word “topics,” broadening the content that this section may pertain to beyond web pages.

The previous “News articles or public/official information pages important for having an informed citizenry” and the “Legal information pages” sections have been reorganized and replaced with “News and current events” and “Civics, government, and law,” which now appear at the top of the list.

“News and current events” features examples of news that may not necessarily be considered YMYL. The term “voting” is explicitly mentioned in the new “Civics, government, and law” section.

“Shopping” and “Finance” are now two separate categories. “Shopping” YMYL content now includes “information about or services related to research or purchase of goods/services,” such as reviews.

“Groups of people” is also a new section; the description could be interpreted as pertaining to hate groups, solidarity groups or anything in between.

The “Other” section now also provides more examples of content that could be considered YMYL.

2.5.2 Finding Who is Responsible for the Website and Who Created the Content on the Page

This subsection of 2.5 (Understanding the Website) adds “Websites want users to be able to distinguish between content created by themselves versus content that was added by other users.” For users, being able to discern between in-house and third-party content may affect the publisher’s reputation. For example, lyrics website Genius accused Google of stealing its content; only then did Google disclose that it licenses lyrics from third parties.

The final sentence of the subsection increases the responsibility borne by websites that syndicate content from just having to possess the appropriate licenses to now also potentially being held accountable for the quality of that licensed content.

2.6.1 Research on the Reputation of the Website or Creator of the Main Content

This subsection of 2.6 (Reputation of the Website or Creator of the Main Content) puts more emphasis on print media by replacing “newspaper website” with “newspaper (with an associated website).” It also allows search quality raters to take a track record of high-quality, original reporting into account — not just awards won by a publication.

4.6 Examples of High Quality Pages

The Page Quality (PQ) Rating and Explanation sections of three examples have been expanded. In line with the additions from 2.6.1 above, the news explanations now include having a positive reputation for objective reporting and investigative journalism.

Overall, the guidelines seem to be moving away from Pulitzers as the be-all and end-all of awards and are instructing raters to acknowledge other accolades as well.

5.0 Highest Quality Pages

Section 5.1 (Very High Quality Main Content) now expands YMYL topics outside the bounds of news articles and information pages. There are now criteria for news, artistic and informational content.

Section 5.2 (Very Positive Reputation) removes the reference to E-A-T and reminds raters to carefully check the reputation of YMYL content creators.

Section 5.3 (Very High Level of E-A-T) raises the overall bar for YMYL content, but acknowledges that standards for E-A-T will vary. It also adds video as a content source.

5.4 (Examples of Highest Quality Pages) now features two examples of high-quality news, with explanations that emphasize awards, high-quality main content, uniqueness, originality, depth and investigative journalism.

The “Highest: Entertainment” example is not present on the latest iteration of the guidelines.

The explanations for three high-quality video examples have been expanded to include uniqueness and originality.

6.7 Examples of Low Quality Pages

Four examples from this section now carry the YMYL label.

7.3 Pages That Potentially Spread Hate

“Criteria” has been removed and replaced with broader bases for applying the lowest page rating. There is also a stronger emphasis on groups.

11.0 Page Quality Rating FAQs

Google has made it more explicit to its raters that pages existing for the sake of artistic expression, humor, entertainment and the like are “all valid and valued page purposes,” and thus may not necessarily deserve a low quality rating because they do not serve more practical purposes.

12.9 Rating on Your Phone Issues

Google is no longer telling raters to assume, by default, that queries with device-specific results were issued on an Android device.

13.2.1 Examples of Fully Meets (FullyM) Result Blocks

The note once attached to this explanation has been removed. This is in line with abandoning the assumption that device-specific results are coming from Android devices (mentioned above in 12.9).

13.5.1 Examples of Slightly Meets (SM) Result Blocks

The “ellen degeneres” example has been removed.

13.6 Fails to Meet (FailsM)

The “zoo atlanta” example has been removed.

14.6.1 Using the Upsetting-Offensive Flag

As with section 7.3, the word “criteria” has been removed, allowing for a wider basis on which to justify applying the Upsetting-Offensive flag. There is also additional emphasis on groups of people.


About The Author

George Nguyen is an Associate Editor at Third Door Media. His background is in content marketing, journalism, and storytelling.


TripAdvisor says it blocked or removed nearly 1.5 million fake reviews in 2018



The majority of consumers (80% to 90%) routinely consult reviews before buying something, whether online or off. The powerful influence of reviews on purchase behavior has spawned a cottage industry of fake reviews, a problem that is growing on major sites such as Amazon, Google and Yelp, among other places.

Just over 2% of reviews submitted were fake. TripAdvisor is one of those other places, where reviews form the core of the company’s content and the principal reason consumers visit. How much of the review activity on TripAdvisor is fraudulent? In its inaugural TripAdvisor Transparency Report, the company says that 2.1% of all reviews submitted to the site in 2018 were fake. (A total of 4.7% of all review submissions were rejected or removed for violating TripAdvisor’s review guidelines, which extend beyond fraud.)

Source: TripAdvisor Review Transparency Report

73% blocked by machine detection. Given the volume of review submissions TripAdvisor receives – more than 66 million in 2018 – that translates into roughly 1.4 million fake reviews. TripAdvisor says that 73% of those fake reviews were blocked before being posted, while the remainder of fake reviews were later removed. The company also says that it has “stopped the activity of more than 75 websites that were caught trying to sell reviews” since 2015.
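
The arithmetic behind these figures can be checked with a short Python sketch. It uses only the numbers reported above. Because TripAdvisor says it received "more than" 66 million submissions, the results are approximate lower bounds, and the variable names are purely illustrative.

```python
# Figures as reported in the article; 66 million is a stated lower bound.
total_submissions = 66_000_000
fake_share = 0.021      # 2.1% of submissions were judged fake
blocked_share = 0.73    # 73% of fake reviews were blocked before posting

fake_reviews = total_submissions * fake_share
blocked_before_posting = fake_reviews * blocked_share
removed_after_posting = fake_reviews - blocked_before_posting

print(f"Fake reviews (approx.): {fake_reviews:,.0f}")
print(f"Blocked before posting (approx.): {blocked_before_posting:,.0f}")
print(f"Removed after posting (approx.): {removed_after_posting:,.0f}")
```

The first figure lands just under 1.4 million, which is consistent with the "roughly 1.4 million" estimate in the text.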

TripAdvisor defines “fake review” as one “written by someone who is trying to unfairly manipulate a business’ average rating or traveler ranking, such as a staff member or a business’ competitor. Reviews that give an account of a genuine customer’s experience, even if elements of that account are disputed by the business in question, are not categorized as fake.”

The company uses a mix of machine detection, human moderation and community flagging to catch fraudulent reviews. The bulk of inauthentic reviews (91%) are fake positive reviews, TripAdvisor says.

Most of the fake reviews that are submitted to TripAdvisor (91%) are "biased positive reviews."
Source: TripAdvisor Review Transparency Report

TripAdvisor says that the review fraud problem is global, with fake reviews originating in most countries. However, it said there was a higher percentage than average of fake reviews “originating from Russia.” By contrast, China is the source of many fake reviews on Amazon.

Punishing fake reviews. TripAdvisor has a number of penalties and punishments for review fraud. In the first instance of a business being caught posting or buying fake reviews, TripAdvisor imposes a temporary ranking penalty.

Upon multiple infractions, the company will impose a content ban that prevents the individual or individuals in question from posting additional reviews and content on the site. It also prevents the involved parties from creating new accounts to circumvent the ban.

In the most extreme cases, the company will apply a badge of shame (penalty badge) that warns consumers the business has repeatedly attempted to defraud them. This is effectively a kiss of death for the business. Yelp does something similar.

Why we should care. Consumer trust is eroding online. It’s incumbent upon major consumer destination sites to police their reviews aggressively and prevent unscrupulous merchants from deceiving consumers. Yelp has been widely criticized for its “review filter,” but the company deserves credit for its long-standing efforts to protect the integrity of its content.

Google and Amazon, in particular, need to do much more to combat review spam and fraud. Hopefully TripAdvisor’s effort and others like it will inspire them to.


About The Author

Greg Sterling is a Contributing Editor at Search Engine Land. He writes about the connections between digital and offline commerce. He previously held leadership roles at LSA, The Kelsey Group and TechTV. Follow him on Twitter or find him on LinkedIn.


10 Key Checks for Assessing Crawl Hygiene



When optimizing our websites for crawlability, our main goal is to make sure that search engines are spending their time on our most important pages so that they are regularly crawled and any new content can be found.

Each time Googlebot visits your website, it has a limited window in which to crawl and discover as many pages and links on your site as possible. When that limit is hit, it will stop.

The time it takes for your pages to be revisited depends on a number of different factors that play into how Google prioritizes URLs for crawling, including:

  • PageRank.
  • XML sitemap inclusion.
  • Position within the site’s architecture.
  • How frequently the page changes.
  • And more.

The bottom line is: your site only gets Googlebot’s attention for a finite amount of time with each crawl, which could be infrequent. Make sure that time is spent wisely.

It can be hard to know where to start when analyzing how well-optimized your site is for search engine crawlers, especially when you work on a large site with a lot of URLs to analyze, or work in a large company with a lot of competing priorities and outstanding SEO fixes to prioritize.

That’s why I’ve put together this list of top-level checks for assessing crawl hygiene to give you a starting point for your analysis.

1. How Many Pages Are Being Indexed vs. How Many Indexable Pages Are There on the Site?

Why This Is Important

This shows you how many pages on your site are available for Google to index, how many of those pages Google was actually able to find, and how many it determined were important enough to be indexed.

An indexability pie chart in DeepCrawl

Bar chart showing indexed pages in Google Search Console
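
As a rough illustration of this comparison, the Python sketch below assumes you have exported two URL lists: the indexable URLs found by a crawl and the URLs a search engine reports as indexed. The function name and the sample URLs are hypothetical.

```python
def index_coverage(indexable_urls, indexed_urls):
    """Compare indexable URLs from a crawl with URLs reported as indexed."""
    indexable = set(indexable_urls)
    indexed = set(indexed_urls)
    return {
        "indexable": len(indexable),
        "indexed": len(indexed),
        # Share of indexable pages that are actually indexed.
        "coverage": len(indexable & indexed) / len(indexable) if indexable else 0.0,
        "indexable_but_not_indexed": sorted(indexable - indexed),
        "indexed_but_not_indexable": sorted(indexed - indexable),
    }


# Hypothetical exports, for illustration only.
crawl_export = ["/", "/products/", "/products/blue-widget", "/blog/"]
index_export = ["/", "/products/", "/tag/widgets"]

report = index_coverage(crawl_export, index_export)
print(report)
```

Pages that are indexable but not indexed may need stronger internal linking, while pages that are indexed but not meant to be indexable are worth a closer look.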

2. How Many Pages Are Being Crawled Overall?

Why This Is Important

Comparing Googlebot’s crawl activity against the number of pages you have on your site can give you insights into how many pages Google either can’t access, or has determined aren’t enough of a priority to schedule to be crawled regularly.

Crawl stats line graph in Google Search Console

Bar chart showing Googlebot crawling in Logz.io
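
If you have access to raw server logs, a minimal sketch like the one below can count which URLs Googlebot requested. It assumes logs in the common "combined" format and matches on the user-agent string only, which can be spoofed; the sample lines and the function name are made up for illustration.

```python
import re
from collections import Counter

# Matches the "combined" access log format (an assumption about your server setup).
LOG_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per path where the user agent claims to be Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Sep/2019:10:00:00 +0000] "GET /products/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Sep/2019:10:00:02 +0000] "GET /blog/ HTTP/1.1" 200 4096 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Sep/2019:11:00:00 +0000] "GET /products/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Sep/2019:10:00:05 +0000] "GET /cart/ HTTP/1.1" 200 2048 '
    '"https://www.example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

hits = googlebot_hits(sample_log)
print(f"Unique URLs crawled: {len(hits)}")
print(hits.most_common())
```

Comparing the number of unique crawled URLs with the total number of pages on the site gives a first estimate of how much of the site Googlebot is reaching.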

3. How Many Pages Aren’t Indexable?

Why This Is Important

Spending time crawling non-indexable pages isn’t the best use of Google’s crawl budget. Check how many of these pages are being crawled, and whether or not any of them should be made available for indexing.

Bar chart showing non-indexable pages in DeepCrawl

4. How Many URLs Are Being Disallowed from Being Crawled?

Why This Is Important

This will show you how many pages you are preventing search engines from accessing on your site. It’s important to make sure that these pages aren’t important for indexing or for discovering further pages for crawling.

Bar chart showing pages blocked by the robots.txt in Google Search Console
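
One way to spot-check which URLs a set of robots.txt rules would block is Python's standard `urllib.robotparser` module. The rules and URLs below are hypothetical, and the standard-library parser uses simple prefix matching, so it may not reproduce every detail of how Googlebot interprets a file (wildcards, for example). Treat the output as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /search
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

urls = [
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/cart/checkout",
    "https://www.example.com/search/widgets",
]

# Keep the URLs that the rules do not allow a crawler to fetch.
blocked = [url for url in urls if not parser.can_fetch("Googlebot", url)]
print(f"{len(blocked)} of {len(urls)} URLs are disallowed:")
for url in blocked:
    print(" ", url)
```

Running your list of important pages through a check like this helps confirm that none of them are being disallowed by accident.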

5. How Many Low-Value Pages Are Being Indexed?

Why This Is Important

Looking at which pages Google has already indexed on your site gives an indication of the areas of the site that the crawler has been able to access.

For example, these might be pages that you haven’t included in your sitemaps as they are low-quality, but have been found and indexed anyway.

Bar chart showing pages indexed but not submitted in a sitemap in Google Search Console

6. How Many 4xx Error Pages Are Being Crawled?

Why This Is Important

It’s important to make sure that crawl budget isn’t being used up on error pages instead of pages that you want to have indexed.

Googlebot will periodically try to crawl 404 error pages to see whether the page is live again, so make sure you use 410 status codes correctly to show that pages are gone and don’t need to be recrawled.

A line graph showing broken pages in DeepCrawl
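
To illustrate, the sketch below summarizes client errors from a list of crawler requests and flags URLs that repeatedly return 404. The `(path, status)` pairs are hypothetical and would normally come from your log files; whether a flagged URL should return 410 is a judgment call for you to make, not something the code can decide.

```python
from collections import Counter


def summarize_errors(requests):
    """Return 4xx counts by status code and paths that returned 404 more than once."""
    status_counts = Counter(status for _, status in requests)
    client_errors = {
        status: count
        for status, count in status_counts.items()
        if 400 <= status < 500
    }
    not_found = Counter(path for path, status in requests if status == 404)
    # Paths crawled repeatedly as 404: review these as possible 410 candidates.
    repeat_404s = sorted(path for path, count in not_found.items() if count > 1)
    return client_errors, repeat_404s


# Hypothetical crawler requests, for illustration only.
crawler_requests = [
    ("/old-page", 404),
    ("/old-page", 404),
    ("/products/", 200),
    ("/private", 403),
    ("/discontinued", 410),
    ("/typo", 404),
]

client_errors, repeat_404s = summarize_errors(crawler_requests)
print(client_errors)
print(repeat_404s)
```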

7. How Many Internal Redirects Are Being Crawled?

Why This Is Important

Each request that Googlebot makes on a site uses up crawl budget, and this includes any additional requests within each of the steps in a redirect chain.

Help Google crawl more efficiently and conserve crawl budget by making sure only pages with 200 status codes are linked to within your site, and reduce the number of requests being made to pages that aren’t final destination URLs.

Redirect chain report in DeepCrawl
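
The cost of a redirect chain can be sketched with a simple lookup. The example below assumes you already have a mapping of redirecting URLs to their targets (for instance, from a crawl export); the mapping and function name are hypothetical.

```python
def redirect_chain(start, redirects, max_hops=10):
    """Follow a URL through a redirect mapping and return every URL visited."""
    chain = [start]
    seen = {start}
    current = start
    while current in redirects and len(chain) <= max_hops:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            break  # Redirect loop: stop instead of following it forever.
        seen.add(current)
    return chain


# Hypothetical redirect mapping, for illustration only.
redirects = {
    "/old": "/interim",
    "/interim": "/new",
    "/a": "/b",
    "/b": "/a",
}

chain = redirect_chain("/old", redirects)
print(" -> ".join(chain))
print(f"Extra requests before the final URL: {len(chain) - 1}")

loop = redirect_chain("/a", redirects)
print(" -> ".join(loop))
```

Any internal link whose chain is longer than one URL is a candidate for updating so that it points directly at the final destination.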

8. How Many Canonical Pages Are There vs. Canonicalized Pages?

Why This Is Important

The number of canonicalized pages on your site gives an indication of how much duplication there is on your site. While canonical tags consolidate link equity between sets of duplicate pages, they don’t help crawl budget.

Google will choose to index one page out of a set of canonicalized pages, but to be able to decide which is the primary page, it will first have to crawl all of them.

Pie chart showing canonical pages in DeepCrawl
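
A quick way to size up this ratio is to count self-referencing canonicals against pages that point elsewhere. The sketch below assumes a mapping of each crawled URL to the canonical URL it declares; the data and names are invented for illustration.

```python
def canonical_breakdown(canonical_map):
    """Split URLs into canonical (self-referencing) and canonicalized (pointing elsewhere)."""
    canonical = sorted(url for url, target in canonical_map.items() if url == target)
    canonicalized = sorted(url for url, target in canonical_map.items() if url != target)
    return canonical, canonicalized


# Hypothetical crawl data: URL -> declared canonical URL.
canonical_map = {
    "/shoes": "/shoes",
    "/shoes?sort=price": "/shoes",
    "/shoes?color=red": "/shoes",
    "/hats": "/hats",
}

canonical, canonicalized = canonical_breakdown(canonical_map)
duplicate_share = len(canonicalized) / len(canonical_map)
print(f"Canonical: {len(canonical)}, canonicalized: {len(canonicalized)}")
print(f"Share of crawled URLs that are duplicates: {duplicate_share:.0%}")
```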

9. How Many Paginated or Faceted Pages Are Being Crawled?

Why This Is Important

Google only needs to crawl pages that include otherwise undiscovered content or unlinked URLs.

Pagination and facets are usually a source of duplicate URLs and crawler traps, so make sure that pages without any unique content or links aren’t being crawled unnecessarily.

As rel=next and rel=prev are no longer supported by Google, ensure your internal linking is optimized to reduce reliance on pagination for page discovery.

Pie chart showing pagination breakdown in DeepCrawl
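
As a starting point for this check, crawled URLs can be bucketed by their query parameters. The parameter names below are assumptions; replace them with whatever your own site uses for pagination and facets.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumed parameter names; adjust these to match your own URL structure.
PAGINATION_PARAMS = {"page", "p"}
FACET_PARAMS = {"color", "size", "sort", "brand"}


def classify_url(url):
    """Label a URL as 'faceted', 'paginated' or 'standard' based on its query string."""
    params = set(parse_qs(urlsplit(url).query, keep_blank_values=True))
    if params & FACET_PARAMS:
        return "faceted"
    if params & PAGINATION_PARAMS:
        return "paginated"
    return "standard"


# Hypothetical crawled URLs, for illustration only.
crawled_urls = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?page=2",
    "https://www.example.com/shoes?color=red&page=2",
    "https://www.example.com/shoes?sort=price",
]

breakdown = Counter(classify_url(url) for url in crawled_urls)
print(breakdown)
```

A large share of faceted or paginated URLs in your crawl or log data suggests that crawl budget is going to pages with little unique content.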

10. Are There Mismatches in Page Discovery Across Crawl Sources?

Why This Is Important

If you’re seeing pages being accessed by users through your analytics data that aren’t being crawled by search engines within your log file data, it could be because these pages aren’t as discoverable for search engines as they are for users.

By integrating different data sources with your crawl data, you can spot gaps where pages can’t be easily found by search engines.

Google’s two main sources of URL discovery are external links and XML sitemaps, so if you’re having trouble getting Google to crawl your pages, make sure they are included in your sitemap if they’re not yet being linked to from any other sites that Google already knows about and crawls regularly.

Bar chart showing crawl source gaps in DeepCrawl
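
The gap analysis described here comes down to set comparisons across data sources. In the sketch below, the three URL sets stand in for exports from analytics, log files and an XML sitemap; all of the data is hypothetical.

```python
# Hypothetical exports from three data sources, for illustration only.
analytics_urls = {"/", "/products/", "/guides/sizing", "/landing/spring-sale"}
googlebot_log_urls = {"/", "/products/", "/blog/"}
sitemap_urls = {"/", "/products/", "/blog/", "/guides/sizing"}

# Pages users reach that Googlebot has not requested.
visited_not_crawled = analytics_urls - googlebot_log_urls

# Of those, pages that are also missing from the sitemap are the hardest to discover.
visited_not_crawled_or_listed = visited_not_crawled - sitemap_urls

# Pages listed in the sitemap that Googlebot has not requested yet.
listed_not_crawled = sitemap_urls - googlebot_log_urls

print("Visited by users, not crawled:", sorted(visited_not_crawled))
print("...and missing from the sitemap:", sorted(visited_not_crawled_or_listed))
print("In the sitemap, not crawled:", sorted(listed_not_crawled))
```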

To Sum Up

By running through these 10 checks for the websites you manage, you should be able to get a better understanding of the crawlability and overall technical health of a site.

Once you identify areas of crawl waste, you can instruct Google to crawl less of those pages by using methods like disallowing them in robots.txt.

You can then start influencing it to crawl more of your important pages by optimizing your site’s architecture and internal linking to make them more prominent and discoverable.



Image Credits

All screenshots taken by author, September 2019




Google explains why syndicators may outrank original publishers



Last week we reported that Google has updated its algorithms to give original reporting preferred ranking in Google search. So when John Shehata, VP of Audience Growth at Condé Nast, a major publishing company, posted on Twitter that Yahoo is outranking the original source of the article, Google took notice.

The complaint. Shehata posted on Twitter, “Recently I see a lot of instances where Google Top Stories ranking syndicated content from Yahoo above or instead of original content. This is disturbing especially for publishers. Yahoo has no canonicals back to original content but sometimes they link back.”

He provided screenshots of this happening as evidence.

No canonical. Shehata also mentioned that Yahoo, which is legally syndicating the content on behalf of Condé Nast, is not using a canonical tag to point back to the original source. Google’s recommendation for those allowing others to syndicate content is to include a clause requiring that syndicators use the canonical tag to point back to the source they are syndicating from. Using this canonical tag indicates to Google which article page is the original source.

The issue. Sometimes those who license content, the syndicators, post the content before or at the same time as the source they are syndicating it from. That makes it hard for Google or other search engines to know which is the original source. That is why Google wrote, “Publishers that allow others to republish content can help ensure that their original versions perform better in Google News by asking those republishing to block or make use of canonical. Google News also encourages those that republish material to consider proactively blocking such content or making use of the canonical, so that we can better identify the original content and credit it appropriately.”

Google’s response. Google Search Liaison Danny Sullivan responded on Twitter: “If people deliberately chose to syndicate their content, it makes it difficult to identify the originating source. That’s why we recommend the use of canonical or blocking. The publishers syndicating can require this.”

This affects both web and News results, Sullivan said. In fact, the original reporting algorithm update has not yet rolled out to Google News; it currently applies only to web search.

Solution. If you allow people to syndicate your content, you should require them to use the canonical tag or make them block Google from indexing that content. Otherwise, do not always expect Google to be able to figure out where the article originated, especially when your syndication partners publish the story before or at the same time that you publish your story.
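
Publishers who want to spot-check whether a syndication partner is honoring such a requirement could start with something like the Python sketch below. It parses a page's HTML with the standard library and reports the declared canonical URL. The HTML snippet and URLs are invented for illustration, and fetching the partner's page is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical syndicated page, for illustration only.
syndicated_html = """
<html><head>
  <title>Syndicated story</title>
  <link rel="canonical" href="https://www.original-publisher.example/story" />
</head><body>Story text</body></html>
"""
original_url = "https://www.original-publisher.example/story"

declared = find_canonical(syndicated_html)
print("Declared canonical:", declared)
print("Points to original:", declared == original_url)
```

A result of `None`, or a canonical that points at the syndicator's own URL, would suggest the partner is not signaling the original source.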

Why we care. While the original reporting change is interesting in this case, it is somewhat unrelated. If the same article is published on two different sites at the same time, both sites can appear to the search engines as the original source. If these sites are syndicating your content legally, review or update your contracts to require syndicators to either use canonical tags or block their syndicated content from indexing altogether. If syndicators are stealing your content and outranking you, Google should be better at dealing with that algorithmically; otherwise, you can file a DMCA takedown request with Google.


About The Author

Barry Schwartz is Search Engine Land’s News Editor and owns RustyBrick, a NY based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on SEM topics.


Copyright © 2019 Plolu.