
What Is a Google Penalty in SEO?


Editor’s note: “Ask an SEO” is a weekly column by technical SEO experts Jenny Halasz and Kristine Schachinger. Come up with your hardest SEO question and fill out our form. You might see your answer in the next #AskanSEO post!


This week for “Ask An SEO”, we have a question from Quora:

“What is a Google Penalty in SEO?”

The term “penalty” is often used very loosely in SEO circles.

When a site experiences a downturn in traffic and visibility, the SEO will often say that downturn was the result of a “penalty,” whether it came from a manual action or an algorithmic change.

This use of the term, however, is not accurate.

The term “penalty,” with regard to SEO, has a distinct and definitive meaning.

What Is a Penalty?

The only true penalty (officially) is a “manual action” from Google.

A manual action is when a Google human reviewer has looked at your website and dampened your visibility in the search engine result pages (SERPs) for violating the Webmaster Quality Guidelines in some manner.

Webmaster Quality Guidelines - Manual Action

You can be penalized for reasons that aren’t listed here, but most manual actions come from this list.

A penalty is not when your site has a downturn because Google has rolled out a general algorithmic update affecting many sites either positively or negatively.

There is no official name for this process, but when your site loses rankings in one of these updates, it is often called an “algorithmic devaluation.”

It’s not a term that exactly rolls off the tongue, which is likely why “penalty” is so often bandied about instead. However, penalties are site-specific.

So How Does a Penalty (Manual Action) Differ from an Algorithmic Devaluation?

If you have been in SEO for more than a month or so, you are sure to have heard of the dreaded manual action.

Manual actions are exactly what the name implies: actions applied to your site by Google – or, more specifically, by a human at Google – rather than by an algorithm.

This happens when Google notices you have violated its Webmaster Guidelines and gives you a “slap on the wrist” (or sometimes a full slap-down) for what you have done “wrong” – in other words, a penalty.

The impact of these manually applied penalties differs from site to site, depending on the severity of the issue and the Webmaster Guideline that has been violated. It can range from a small downturn in rankings for a few query terms to full removal of the site from the search results.

How Will You Know If You Have a Manual Action?

Because Google has applied the penalty directly to your site, they will inform you when it happens via Google Search Console.

This is one of the reasons it is so important to have Search Console set up for your site. (I suggest using URL Prefix properties rather than Domain properties.)

“Google issues a manual action against a site when a human reviewer at Google has determined that pages on the site are not compliant with Google’s webmaster quality guidelines. Most manual actions address attempts to manipulate our search index. Most issues reported here will result in pages or sites being ranked lower or omitted from search results without any visual indication to the user.

If your site is affected by a manual action, we will notify you in the Manual Actions report and in the Search Console message center.”  – Google Search Console Help

The main difference between manual actions and algorithmic issues is that manual actions are all documented and can be reviewed on Google’s support page, along with how to fix them.

With algorithmic changes, Google’s response is often that there is nothing to fix. (That is almost never true, by the way, but that is a topic for another article.)

Here are the manual actions that can be applied by Google:

  • User-generated spam
  • Spammy free host
  • Structured data issue
  • Unnatural links to your site
  • Unnatural links from your site
  • Thin content with little or no added value
  • Cloaking and/or sneaky redirects
  • Pure spam
  • Cloaked images
  • Hidden text and/or keyword stuffing
  • AMP content mismatch
  • Sneaky mobile redirects

When you go to Google’s support page to review these items, there is a drop-down for each one that tells you how to fix it.

These fixes, however, can be complicated.

Once you have identified your penalty and fixed it, you can submit the site for a reconsideration request. This is how you ask Google to remove the manually applied action or penalty.

The ability to submit a reconsideration request also helps distinguish a manual action as a penalty.

While a manual action can be removed with a proper reconsideration request, an algorithmic devaluation cannot. For algorithm issues, you have to wait for the algorithm update to run again.

This was one of the core contentions with the old Penguin algorithm (before it became “real time”). There were gaps of as long as two years between updates, and in the meantime, affected sites could not regain their rankings.

With manual actions, all you have to do is fix the issue and request removal of the action.

If Google believes you have done what you should to fix the site and that you won’t violate the guidelines again, they will remove the penalty.

Why Are Algorithmic Devaluations Not Penalties?

This is pretty simple.

Algorithms can devalue sites and limit their visibility and traffic for not meeting the best practices of the Webmaster Guidelines or the ranking factor(s) the algorithm targets – but they can also boost a site and give it much broader visibility and traffic.

Because algorithms can move your site up or down in visibility and traffic, a devaluation cannot be classified as a penalty; had your site been affected in the opposite direction, the term could not apply.

One final note: SEO professionals will probably never stop using the colloquial version of the term “penalty.”

It is much easier to tell a client they had a loss in visibility and traffic because of a penalty than it is to try to explain “algorithmic devaluations,” even if that is what they actually are.


Image Credits

Featured Image: Paulo Bobita
Screenshot taken by author, September 2019





TripAdvisor says it blocked or removed nearly 1.5 million fake reviews in 2018


The majority of consumers (80% to 90%) routinely consult reviews before buying something, whether online or off. The powerful influence of reviews on purchase behavior has spawned a cottage industry of fake reviews, a problem that is growing on major sites such as Amazon, Google, and Yelp, among other places.

Just over 2% of reviews submitted were fake. TripAdvisor is one of those other places, where reviews form the core of the company’s content and are the principal reason consumers visit. How much of the review activity on TripAdvisor is fraudulent? In its inaugural TripAdvisor Transparency Report, the company says that 2.1% of all reviews submitted to the site in 2018 were fake. (A total of 4.7% of all review submissions were rejected or removed for violating TripAdvisor’s review guidelines, which extend beyond fraud.)

Source: TripAdvisor Review Transparency Report

73% blocked by machine detection. Given the volume of review submissions TripAdvisor receives – more than 66 million in 2018 – that translates into roughly 1.4 million fake reviews. TripAdvisor says that 73% of those fake reviews were blocked before being posted, while the remainder of fake reviews were later removed. The company also says that it has “stopped the activity of more than 75 websites that were caught trying to sell reviews” since 2015.

TripAdvisor defines “fake review” as one “written by someone who is trying to unfairly manipulate a business’ average rating or traveler ranking, such as a staff member or a business’ competitor. Reviews that give an account of a genuine customer’s experience, even if elements of that account are disputed by the business in question, are not categorized as fake.”

The company uses a mix of machine detection, human moderation, and community flagging to catch fraudulent reviews. The bulk of inauthentic reviews (91%) are fake positive reviews, TripAdvisor says.

Most of the fake reviews that are submitted to TripAdvisor (91%) are "biased positive reviews."
Source: TripAdvisor Review Transparency Report

TripAdvisor says that the review fraud problem is global, with fake reviews originating in most countries. However, it said a higher-than-average percentage of fake reviews originated from Russia. By contrast, China is the source of many fake reviews on Amazon.

Punishing fake reviews. TripAdvisor has a number of penalties and punishments for review fraud. In the first instance of a business being caught posting or buying fake reviews, TripAdvisor imposes a temporary ranking penalty.

Upon multiple infractions, the company will impose a content ban that prevents the individual or individuals in question from posting additional reviews and content on the site. It also prevents the involved parties from creating new accounts to circumvent the ban.

In the most extreme cases, the company will apply a badge of shame (penalty badge) that warns consumers the business has repeatedly attempted to defraud them. This is effectively a kiss of death for the business. Yelp does something similar.

Why we should care. Consumer trust is eroding online. It’s incumbent upon major consumer destination sites to police their reviews aggressively and prevent unscrupulous merchants from deceiving consumers. Yelp has been widely criticized for its “review filter,” but credit the company for its long-standing efforts to protect the integrity of its content.

Google and Amazon, in particular, need to do much more to combat review spam and fraud. Hopefully, TripAdvisor’s effort and others like it will inspire them to do so.


About The Author

Greg Sterling is a Contributing Editor at Search Engine Land. He writes about the connections between digital and offline commerce. He previously held leadership roles at LSA, The Kelsey Group and TechTV. Follow him on Twitter or find him on LinkedIn.


10 Key Checks for Assessing Crawl Hygiene


When optimizing our websites for crawlability, our main goal is to make sure that search engines are spending their time on our most important pages so that they are regularly crawled and any new content can be found.

Each time Googlebot visits your website, it has a limited window in which to crawl and discover as many pages and links on your site as possible. When that limit is hit, it will stop.

The time it takes for your pages to be revisited depends on a number of different factors that play into how Google prioritizes URLs for crawling, including:

  • PageRank.
  • XML sitemap inclusion.
  • Position within the site’s architecture.
  • How frequently the page changes.
  • And more.

The bottom line is: your site only gets Googlebot’s attention for a finite amount of time with each crawl, which could be infrequent. Make sure that time is spent wisely.

It can be hard to know where to start when analyzing how well-optimized your site is for search engine crawlers, especially when you work on a large site with a lot of URLs to analyze, or work in a large company with a lot of competing priorities and outstanding SEO fixes to prioritize.

That’s why I’ve put together this list of top-level checks for assessing crawl hygiene to give you a starting point for your analysis.

1. How Many Pages Are Being Indexed vs. How Many Indexable Pages Are There on the Site?

Why This Is Important

This shows you how many pages on your site are available for Google to index, how many of those pages Google was actually able to find, and how many it determined were important enough to be indexed.

An indexability pie chart in DeepCrawl
Bar chart showing indexed pages in Google Search Console
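If you want to quantify this gap yourself, a rough comparison can be scripted. The sketch below assumes a hypothetical crawler export ("crawl_export.csv") with "url" and "indexable" columns, and takes the indexed count as a number you copy by hand from the Search Console Coverage report; adjust both to your own tools.

```python
# Minimal sketch: compare indexable URLs from a crawl export with the number
# of pages Google reports as indexed. File name, column names, and the
# indexed count are assumptions -- swap in your own crawler's output.
import csv

def count_indexable(crawl_csv_path):
    with open(crawl_csv_path, newline="", encoding="utf-8") as f:
        return sum(1 for row in csv.DictReader(f)
                   if row["indexable"].strip().lower() == "true")

indexable_pages = count_indexable("crawl_export.csv")
indexed_pages = 12_400  # copied by hand from the GSC Coverage report

print(f"Indexable pages in crawl: {indexable_pages}")
print(f"Pages reported indexed:   {indexed_pages}")
print(f"Difference:               {indexable_pages - indexed_pages}")
```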

2. How Many Pages Are Being Crawled Overall?

Why This Is Important

Comparing Googlebot’s crawl activity against the number of pages you have on your site can give you insights into how many pages Google either can’t access, or has determined aren’t enough of a priority to schedule to be crawled regularly.

Crawl stats line graph in Google Search Console
Bar chart showing Googlebot crawling in Logz.io
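One hedged way to get the crawl side of this comparison is to pull the unique URLs Googlebot requested out of your server logs. The sketch below assumes a combined-format access log named "access.log" and a total page count you supply yourself; note that matching on the user-agent string alone can include spoofed bots, so verify real Googlebot hits via reverse DNS for anything important.

```python
# Minimal sketch: count unique URLs requested by Googlebot in an access log
# and compare against the total number of pages on the site (both assumptions).
import re

request_re = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*"')

googlebot_urls = set()
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue  # user-agent match only; spoofable, so treat as approximate
        match = request_re.search(line)
        if match:
            googlebot_urls.add(match.group("path"))

total_pages_on_site = 50_000  # e.g. from your CMS or crawler export
print(f"Unique URLs crawled by Googlebot: {len(googlebot_urls)}")
print(f"Approximate coverage: {len(googlebot_urls) / total_pages_on_site:.1%}")
```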

3. How Many Pages Aren’t Indexable?

Why This Is Important

Spending time crawling non-indexable pages isn’t the best use of Google’s crawl budget. Check how many of these pages are being crawled, and whether or not any of them should be made available for indexing.

Bar chart showing non-indexable pages in DeepCrawl
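To spot-check whether the URLs Googlebot keeps requesting are even indexable, you can look for a noindex directive in the meta robots tag or the X-Robots-Tag header. A minimal sketch follows; the sample URLs are placeholders and the regex parsing is deliberately simple.

```python
# Minimal sketch: flag URLs that return a noindex directive. Sample URLs are
# placeholders; in practice, feed in URLs taken from your log files.
import re
import urllib.request

META_ROBOTS_RE = re.compile(r'<meta[^>]+name=["\']robots["\'][^>]*>', re.I)

def is_indexable(url):
    req = urllib.request.Request(url, headers={"User-Agent": "crawl-hygiene-check"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        if "noindex" in (resp.headers.get("X-Robots-Tag") or "").lower():
            return False
        html = resp.read(200_000).decode("utf-8", errors="ignore")
    meta = META_ROBOTS_RE.search(html)
    return not (meta and "noindex" in meta.group(0).lower())

for url in ["https://example.com/", "https://example.com/internal-search?q=shoes"]:
    print(url, "->", "indexable" if is_indexable(url) else "non-indexable")
```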

4. How Many URLs Are Being Disallowed from Being Crawled?

Why This Is Important

This will show you how many pages you are preventing search engines from accessing on your site. It’s important to make sure that these pages aren’t important for indexing or for discovering further pages for crawling.

Bar chart showing pages blocked by the robots.txt in Google Search Console
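You can also test individual URLs against your live robots.txt with the standard library's robot parser. The sketch below uses example.com and placeholder paths; point it at your own domain and a sample of URLs you care about.

```python
# Minimal sketch: check which URLs Googlebot is allowed to fetch according to
# robots.txt. Domain and paths are placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

urls_to_check = [
    "https://example.com/category/shoes/",
    "https://example.com/search?q=boots",
    "https://example.com/checkout/",
]

for url in urls_to_check:
    status = "allowed" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status:8} {url}")
```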

5. How Many Low-Value Pages Are Being Indexed?

Why This Is Important

Looking at which pages Google has already indexed on your site gives an indication of the areas of the site that the crawler has been able to access.

For example, these might be pages that you haven’t included in your sitemaps as they are low-quality, but have been found and indexed anyway.

Bar chart showing pages indexed but not submitted in a sitemap in Google Search Console
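One hedged way to surface "found but never submitted" URLs is to diff your XML sitemap against a crawl of the site itself. The sketch below assumes a hypothetical "sitemap.xml" file and the same "crawl_export.csv" used earlier.

```python
# Minimal sketch: list URLs reachable on the site that are not in the XML
# sitemap -- likely candidates for "indexed, not submitted in sitemap".
# File names are assumptions.
import csv
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(path):
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.findall(".//sm:loc", SITEMAP_NS)}

def crawled_urls(path):
    with open(path, newline="", encoding="utf-8") as f:
        return {row["url"] for row in csv.DictReader(f)}

not_in_sitemap = crawled_urls("crawl_export.csv") - sitemap_urls("sitemap.xml")
print(f"{len(not_in_sitemap)} crawlable URLs are missing from the sitemap")
for url in sorted(not_in_sitemap)[:20]:
    print(" ", url)
```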

6. How Many 4xx Error Pages Are Being Crawled?

Why This Is Important

It’s important to make sure that crawl budget isn’t being used up on error pages instead of pages that you want to have indexed.

Googlebot will periodically try to crawl 404 error pages to see whether the page is live again, so make sure you use 410 status codes correctly to show that pages are gone and don’t need to be recrawled.

A line graph showing broken pages in DeepCrawl
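Your access logs can show how much Googlebot activity is landing on error pages. The sketch below breaks Googlebot requests down by status code; the log path and combined log format are assumptions.

```python
# Minimal sketch: tally Googlebot requests by HTTP status code so you can see
# the share of crawl activity spent on 4xx responses. Log path/format assumed.
import re
from collections import Counter

status_re = re.compile(r'" (\d{3}) ')

status_counts = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        match = status_re.search(line)
        if match:
            status_counts[match.group(1)] += 1

for status, count in status_counts.most_common():
    print(f"{status}: {count}")
```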

7. How Many Internal Redirects Are Being Crawled?

Why This Is Important

Each request that Googlebot makes on a site uses up crawl budget, and this includes any additional requests within each of the steps in a redirect chain.

Help Google crawl more efficiently and conserve crawl budget by making sure only pages with 200 status codes are linked to within your site, and reduce the number of requests being made to pages that aren’t final destination URLs.

Redirect chain report in DeepCrawl
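To see how many hops a crawler has to follow before reaching a final URL, you can trace redirect chains for a sample of internally linked URLs. A minimal sketch, using the third-party requests library and placeholder URLs:

```python
# Minimal sketch: follow a redirect chain hop by hop and report its length.
# Uses the "requests" package (pip install requests); URLs are placeholders.
import requests

def redirect_chain(url, max_hops=10):
    chain = [url]
    for _ in range(max_hops):
        resp = requests.head(url, allow_redirects=False, timeout=10)
        location = resp.headers.get("Location")
        if resp.status_code in (301, 302, 307, 308) and location:
            url = requests.compat.urljoin(url, location)
            chain.append(url)
        else:
            break
    return chain

for start in ["http://example.com/old-page", "http://example.com/category//"]:
    hops = redirect_chain(start)
    print(" -> ".join(hops), f"({len(hops) - 1} hop(s))")
```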

8. How Many Canonical Pages Are There vs. Canonicalized Pages?

Why This Is Important

The number of canonicalized pages on your site gives an indication of how much duplication there is across the site. While canonical tags consolidate link equity between sets of duplicate pages, they don’t help crawl budget.

Google will choose to index one page out of a set of canonicalized pages, but to be able to decide which is the primary page, it will first have to crawl all of them.

Pie chart showing canonical pages in DeepCrawl
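A quick way to classify a sample of URLs is to fetch each one and compare its rel=canonical target to the URL itself. The sketch below is a simplification (regex parsing, placeholder URLs, and it assumes rel appears before href in the link tag), but it illustrates the distinction between canonical and canonicalized pages.

```python
# Minimal sketch: report whether each URL is canonical (self-referencing or no
# canonical tag) or canonicalized to another URL. URLs are placeholders.
import re
import urllib.request

CANONICAL_RE = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', re.I)

def canonical_target(url):
    req = urllib.request.Request(url, headers={"User-Agent": "crawl-hygiene-check"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read(300_000).decode("utf-8", errors="ignore")
    match = CANONICAL_RE.search(html)
    return match.group(1) if match else None

for url in ["https://example.com/dress?colour=red", "https://example.com/dress"]:
    target = canonical_target(url)
    if target in (None, url):
        print(f"{url}: canonical")
    else:
        print(f"{url}: canonicalized to {target}")
```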

9. How Many Paginated or Faceted Pages Are Being Crawled?

Why This Is Important

Google only needs to crawl pages that include otherwise undiscovered content or unlinked URLs.

Pagination and facets are usually a source of duplicate URLs and crawler traps, so make sure that these pages that don’t include any unique content or links aren’t being crawled unnecessarily.

As rel=next and rel=prev are no longer supported by Google, ensure your internal linking is optimized to reduce reliance on pagination for page discovery.

Pie chart showing pagination breakdown in DeepCrawl
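If you keep log files, you can estimate how much Googlebot activity goes to paginated or faceted URLs by matching common URL patterns. The patterns below are assumptions; replace them with whatever your pagination and facet parameters actually look like.

```python
# Minimal sketch: estimate the share of Googlebot hits spent on pagination and
# facet URLs. Log path and URL patterns are assumptions.
import re
from collections import Counter

PATTERNS = {
    "pagination": re.compile(r"[?&]page=\d+|/page/\d+"),
    "facets": re.compile(r"[?&](colour|size|sort|filter)="),
}

counts = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        counts["total"] += 1
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1

for label in ("pagination", "facets"):
    share = counts[label] / counts["total"] if counts["total"] else 0
    print(f"{label}: {counts[label]} Googlebot hits ({share:.1%} of total)")
```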

10. Are There Mismatches in Page Discovery Across Crawl Sources?

Why This Is Important

If you’re seeing pages being accessed by users through your analytics data that aren’t being crawled by search engines within your log file data, it could be because these pages aren’t as discoverable for search engines as they are for users.

By integrating different data sources with your crawl data, you can spot gaps where pages can’t be easily found by search engines.

Google’s two main sources of URL discovery are external links and XML sitemaps, so if you’re having trouble getting Google to crawl your pages, make sure they are included in your sitemap if they’re not yet being linked to from any other sites that Google already knows about and crawls regularly.

Bar chart showing crawl source gaps in DeepCrawl
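The gap analysis itself is just a set difference between URL lists from different sources. The sketch below assumes two hypothetical CSV exports, one of analytics landing pages and one of URLs seen in Googlebot log entries, each with a "url" column.

```python
# Minimal sketch: find pages that receive user traffic but show no Googlebot
# activity in the logs. File names and the "url" column are assumptions.
import csv

def urls_from_csv(path, column="url"):
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column] for row in csv.DictReader(f)}

analytics_urls = urls_from_csv("analytics_landing_pages.csv")
logfile_urls = urls_from_csv("googlebot_log_urls.csv")

never_crawled = analytics_urls - logfile_urls
print(f"{len(never_crawled)} pages get visits but no Googlebot hits:")
for url in sorted(never_crawled)[:20]:
    print(" ", url)
```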

To Sum Up

By running through these 10 checks on the websites you manage, you should be able to get a better understanding of a site’s crawlability and overall technical health.

Once you identify areas of crawl waste, you can instruct Google to crawl less of those pages by using methods like disallowing them in robots.txt.

You can then start influencing it to crawl more of your important pages by optimizing your site’s architecture and internal linking to make them more prominent and discoverable.


Image Credits

All screenshots taken by author, September 2019




Google explains why syndicators may outrank original publishers



Last week we reported that Google has updated its algorithms to give original reporting preferred ranking in Google search. So when John Shehata, VP of Audience Growth at Condé Nast, a major publishing company, posted on Twitter that Yahoo was outranking the original source of an article, Google took notice.

The complaint. Shehata posted on Twitter, “Recently I see a lot of instances where Google Top Stories ranking syndicated content from Yahoo above or instead of original content. This is disturbing especially for publishers. Yahoo has no canonicals back to original content but sometimes they link back.”

He provided screenshots of this happening as evidence.

No canonical. John also mentioned that Yahoo, which is legally syndicating the content on behalf of Condé Nast, is not using a canonical tag to point back to the original source. Google’s recommendation for those who allow others to syndicate their content is to include a clause requiring syndicators to use the canonical tag to point back to the source they are syndicating from. Using this canonical tag indicates to Google which article page is the original source.

The issue. Sometimes those who license content, the syndicators, post the content before or at the same time as the source they are syndicating it from. That makes it hard for Google or other search engines to know which is the original source. That is why Google wrote, “Publishers that allow others to republish content can help ensure that their original versions perform better in Google News by asking those republishing to block or make use of canonical. Google News also encourages those that republish material to consider proactively blocking such content or making use of the canonical, so that we can better identify the original content and credit it appropriately.”

Google’s response. Google Search Liaison Danny Sullivan responded on Twitter: “If people deliberately chose to syndicate their content, it makes it difficult to identify the originating source. That’s why we recommend the use of canonical or blocking. The publishers syndicating can require this.”

This affects both web and News results, Sullivan said. In fact, the original reporting algorithm update has not yet rolled out to Google News; it currently applies only to web search.

Solution. If you allow people to syndicate your content, you should require them to use the canonical tag or make them block Google from indexing that content. Otherwise, do not always expect Google to be able to figure out where the article originated, especially when your syndication partners publish the story before or at the same time that you publish yours.
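A lightweight way to audit partners is to fetch their copy of each article and confirm it either declares a rel=canonical pointing at your original URL or carries a noindex directive. The sketch below uses placeholder URLs and simple regex parsing; it is an illustration of such a check, not Google’s own method.

```python
# Minimal sketch: check a syndicated copy for a canonical back to the original
# or, failing that, a noindex meta robots tag. URLs are placeholders.
import re
import urllib.request

CANONICAL_RE = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', re.I)
NOINDEX_RE = re.compile(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', re.I)

def check_syndicated_copy(syndicated_url, original_url):
    req = urllib.request.Request(syndicated_url,
                                 headers={"User-Agent": "syndication-check"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read(300_000).decode("utf-8", errors="ignore")
    canonical = CANONICAL_RE.search(html)
    if canonical and canonical.group(1).rstrip("/") == original_url.rstrip("/"):
        return "OK: canonical points to the original"
    if NOINDEX_RE.search(html):
        return "OK: the copy is noindexed"
    return "PROBLEM: no canonical to the original and not noindexed"

print(check_syndicated_copy(
    "https://syndicator.example.com/republished-article",
    "https://publisher.example.com/original-article"))
```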

Why we care. While the original reporting change is interesting in this case, it is somewhat unrelated. If the same article is published on two different sites at the same time, both sites can appear to the search engines as the original source. If these sites are syndicating your content legally, review or update your contracts to require syndicators to either use canonical tags or block their syndicated content from indexing altogether. If syndicators are stealing your content and outranking you, Google should be better at dealing with that algorithmically; otherwise, you can file a DMCA takedown request with Google.


About The Author

Barry Schwartz is Search Engine Land’s News Editor and owns RustyBrick, a NY based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on SEM topics.
