
Leverage Python and Google Cloud to extract meaningful SEO insights from server log data



For my first post on Search Engine Land, I’ll start by quoting Ian Lurie:

Log file analysis is a lost art. But it can save your SEO butt!

Wise words.

However, getting the data we need out of server log files is usually laborious:

  • Gigantic log files require robust data ingestion pipelines, a reliable cloud storage infrastructure, and a solid querying system
  • Meticulous data modeling is also needed in order to convert cryptic, raw log data into legible bits, suitable for exploratory data analysis and visualization

In the first post of this two-part series, I will show you how to easily scale your analyses to larger datasets, and extract meaningful SEO insights from your server logs.

All of that with just a pinch of Python and a hint of Google Cloud!

Here’s our detailed plan of action:

#1 – I’ll start by giving you a bit of context:

  • What log files are and why they matter for SEO
  • How to get hold of them
  • Why Python alone doesn’t always cut it when it comes to server log analysis

#2 – We’ll then set things up:

  • Create a Google Cloud Platform account
  • Create a Google Cloud Storage bucket to store our log files
  • Use the Command-Line to convert our files to a compliant format for querying
  • Transfer our files to Google Cloud Storage, manually and programmatically

#3 – Lastly, we’ll get into the nitty-gritty of Pythoning – we will:

  • Query our log files with BigQuery, inside Colab!
  • Build a data model that makes our raw logs more legible 
  • Create categorical columns that will enhance our analyses further down the line
  • Filter and export our results to .csv

In part two of this series (available later this year), we’ll discuss more advanced data modeling techniques in Python to assess:

  • Bot crawl volume
  • Crawl budget waste
  • Duplicate URL crawling

I’ll also show you how to aggregate and join log data to Search Console data, and create interactive visualizations with Plotly Dash!

Excited? Let’s get cracking!

System requirements

We will use Google Colab in this article. No specific requirements or backward compatibility issues here, as Google Colab sits in the cloud.

Downloadable files

  • The Colab notebook can be accessed here 
  • The log files can be downloaded from GitHub – four sample files of 20 MB each, spanning four days (one day per file)

Be assured that the notebook has been tested with several million rows at lightning speed and without any hurdles!

Preamble: What are log files?

While I don’t want to babble too much about what log files are, why they can be invaluable for SEO, etc. (heck, there are many great articles on the topic already!), here’s a bit of context.

A server log file records every request made to your web server for content.

Every. Single. One.

In their rawest form, logs are indecipherable. For example, here are a few raw lines from an Apache web server:
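For illustration, a request logged in Apache's combined log format typically looks something like this (an invented example, not one of the actual sample lines):

66.249.66.1 - - [01/Mar/2020:06:25:05 +0000] "GET /blog/seo-tips/ HTTP/1.1" 200 5324 "https://www.example.com/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"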

Daunting, isn’t it?

Raw logs must be “cleansed” in order to be analyzed; that’s where data modeling kicks in. But more on that later.

While the structure of a log file mainly depends on the server (Apache, Nginx, IIS, etc.), it always has a few evergreen attributes:

  • Server IP
  • Date/Time (also called timestamp)
  • Method (GET or POST)
  • URI
  • HTTP status code
  • User-agent

Additional attributes can usually be included, such as:

  • Referrer: the URL that ‘linked’ the user to your site
  • Redirected URL, when a redirect occurs
  • Size of the file sent (in bytes)
  • Time taken: the time it takes for a request to be processed and its response to be sent

Why are log files important for SEO?

If you don’t know why they matter, read this. Time spent wisely!

Accessing your log files

If you’re not sure where to start, the best is to ask your (client’s) Web Developer/DevOps if they can grant you access to raw server logs via FTP, ideally without any filtering applied.

Here are the general guidelines to find and manage log data on the three most popular servers:

We’ll use raw Apache files in this project.

Why Pandas alone is not enough when it comes to log analysis

Pandas (an open-source data manipulation tool built with Python) is pretty ubiquitous in data science.

It’s a must-have for slicing and dicing tabular data structures, and the mammal works like a charm when the data fits in memory!

That is, a few gigabytes. But not terabytes.

Parallel computing aside (e.g. Dask, PySpark), a database is usually a better solution for big data tasks that do not fit in memory. With a database, we can work with datasets that consume terabytes of disk space. Everything can be queried (via SQL), accessed, and updated in a breeze!

In this post, we’ll query our raw log data programmatically in Python via Google BigQuery. It’s easy to use, affordable and lightning-fast – even on terabytes of data!

The Python/BigQuery combo also allows you to query files stored on Google Cloud Storage. Sweet!

If Google is not an option for you and you wish to try alternatives, Amazon and Microsoft also offer cloud data warehouses, and they integrate well with Python too.

Create a GCP account and set up Cloud Storage

Both Google Cloud Storage and BigQuery are part of Google Cloud Platform (GCP), Google’s suite of cloud computing services.

GCP is not free, but you can try it for a year with $300 in credits, with access to all products. Pretty cool.

Note that once the trial expires, Google Cloud Free Tier will still give you access to most Google Cloud resources, free of charge. With 5 GB of storage per month, it’s usually enough if you want to experiment with small datasets, work on proofs of concept, and so on.

Believe me, there are many. Great. Things. To. Try!

You can sign up for a free trial here.

Once you have completed sign-up, a new project will be automatically created with a random, and rather exotic, name – e.g. mine was “learned-spider-266010”!

Create our first bucket to store our log files

In Google Cloud Storage, files are stored in “buckets”. They will contain our log files.

To create your first bucket, go to storage > browser > create bucket:

The bucket name has to be unique. I’ve aptly named mine ‘seo_server_logs’!

We then need to choose where and how to store our log data:

  • #1 Location type – ‘Region’ is usually good enough.
  • #2 Location – As I’m based in the UK, I’ve selected ‘Europe-West2’. Select your nearest location
  • #3 Click on ‘continue’

Default storage class: I’ve had good results with ‘nearline‘. It is cheaper than standard, and the data is retrieved quickly enough:

Access to objects: “Uniform” is fine:

Finally, in the “advanced settings” block, select:

  • #1 – Google-managed key
  • #2 – No retention policy
  • #3 – No need to add a label for now

When you’re done, click “Create.”

You’ve created your first bucket! Time to upload our log data.

Adding log files to your Cloud Storage bucket

You can upload as many files as you wish, whenever you want to!

The simplest way is to drag and drop your files to Cloud Storage’s Web UI, as shown below:

Yet, if you really want to get serious about log analysis, I’d strongly suggest automating the data ingestion process!

Here are a few things you can try:

  • Cron jobs can be set up between FTP servers and Cloud Storage infrastructures
  • FTP managers like Cyberduck also offer automatic transfers to storage systems
  • More data ingestion tips here (App Engine, JSON API, etc.) – see the short Python sketch below for a programmatic upload
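As an illustration of the programmatic route, here’s a minimal sketch using the google-cloud-storage client library. The project ID, bucket name, and file names are placeholders – adapt them to your own setup:

from google.cloud import storage

# Authenticate against your GCP project and point at the target bucket
client = storage.Client(project='your-project-id')
bucket = client.bucket('your-bucket-name')

# Upload a local log file to the bucket
blob = bucket.blob('logs/access_log_day1.csv')
blob.upload_from_filename('access_log_day1.csv')
print('Upload complete')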

A quick note on file formats

The sample files uploaded to GitHub have already been converted to .csv for you.

Bear in mind that you may have to convert your own log files to a compliant file format for SQL querying. BigQuery accepts .csv or .parquet files.

Files can easily be bulk-converted to another format via the command line. You can access the command line as follows on Windows:

  • Open the Windows Start menu
  • Type “command” in the search bar
  • Select “Command Prompt” from the search results
  • I’ve not tried this on a Mac, but I believe the CLI is located in the Utilities folder

Once opened, navigate to the folder containing the files you want to convert via this command:

cd 'path/to/folder'

Simply replace path/to/folder with your path.

Then, type the command below to convert e.g. .log files to .csv:

for file in *.log; do mv "$file" "$(basename "$file" .log).csv"; done

Note that you may need to enable Windows Subsystem for Linux to use this Bash command.

Now that our log files are in, and in the right format, it’s time to start Pythoning!

Unleash the Python

Do I still need to present Python?!

According to Stack Overflow, Python is now the fastest-growing major programming language. It’s also getting incredibly popular in the SEO sphere, thanks to Python preachers like Hamlet or JR.

You can run Python on your local computer via Jupyter notebook or an IDE, or even in the cloud via Google Colab. We’ll use Google Colab in this article.

Remember, the notebook is here, and the code snippets are pasted below, along with explanations.

Import libraries + GCP authentication

We’ll start by running the cell below:

It imports the Python libraries we need and redirects you to an authentication screen.

There you’ll have to choose the Google account linked to your GCP project.
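If you’d like a sense of what such a cell contains, here’s a minimal sketch (the exact libraries imported in the notebook may differ):

import pandas as pd
from google.cloud import bigquery
from google.colab import auth

# Opens Google's authentication flow inside Colab
auth.authenticate_user()
print('Authenticated')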

Connect to Google Cloud Storage (GCS) and BigQuery

There’s quite a bit of info to add in order to connect our Python notebook to GCS & BigQuery. Besides, filling in that info manually can be tedious!

Fortunately, Google Colab’s forms make it easy to parameterize our code and save time.

The forms in this notebook have been pre-populated for you. No need to do anything, although I do suggest you amend the code to suit your needs.

Here’s how to create your own form: Go to Insert > add form field > then fill in the details below:

When you change an element in the form, its corresponding values will magically change in the code!

Fill in ‘project ID’ and ‘bucket location’

In our first form, you’ll need to add two variables:

  • Your GCP PROJECT_ID (mine is ‘learned-spider-266010’)
  • Your bucket location:
    • To find it, in GCP go to storage > browser > check location in table
    • Mine is ‘europe-west2’

Here’s the code snippet for that form:
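As a rough idea, a Colab form of this kind boils down to a few annotated variables. The values below are placeholders and the variable names are assumptions, not necessarily those used in the notebook:

#@title Fill in 'project ID' and 'bucket location'
PROJECT_ID = 'learned-spider-266010'  #@param {type:"string"}
BUCKET_LOCATION = 'europe-west2'  #@param {type:"string"}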

Fill in ‘bucket name’ and ‘file/folder path’:

In the second form, we’ll need to fill in two more variables:

The bucket name:

  • To find it, in GCP go to: storage > browser > then check its ‘name’ in the table
  • I’ve aptly called it ‘apache_seo_logs’!

The file path:

  • You can use a wildcard to query several files – Very nice!
  • E.g. with the wildcarded path ‘Loggy*’, BigQuery would query these three files at once:
    • Loggy01.csv
    • Loggy02.csv
    • Loggy03.csv
  • BigQuery also creates a temporary table for the occasion (more on that below)

Here’s the code for the form:
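Again, a hedged sketch of what that form might boil down to (variable names are assumptions):

#@title Fill in 'bucket name' and 'file/folder path'
BUCKET_NAME = 'apache_seo_logs'  #@param {type:"string"}
FILE_PATH = 'Loggy*'  #@param {type:"string"}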

Connect Python to Google Cloud Storage and BigQuery

In the third form, you need to give a name to your BigQuery table – I’ve called mine ‘log_sample’. Note that, being temporary, this table won’t be stored in your BigQuery account.

Okay, so now things are getting really exciting, as we can start querying our dataset via SQL *without* leaving our notebook – How cool is that?!

As the log data is still in its raw form, querying it is somewhat limited. However, we can apply basic SQL filtering that will speed up Pandas operations later on.

I have created two SQL queries in this form:

  • “SQL_1st_Filter” to filter any text
  • “SQL_Useragent_Filter” to select your User-Agent, via a drop-down

Feel free to check the underlying code and tweak these two queries to your needs.

If your SQL is a bit rusty, here’s a good refresher from Kaggle!

Code for that form:
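The snippet below is a minimal sketch of how such a query can be run from Colab with the BigQuery client library, using a temporary external table pointed at the files sitting in Cloud Storage. The variable names, drop-down values, and the column referenced in the WHERE clause are assumptions – the notebook’s actual code may differ:

#@title Query the log files stored in Cloud Storage
BQ_TABLE_NAME = 'log_sample'  #@param {type:"string"}
SQL_1st_Filter = 'GET'  #@param {type:"string"}
SQL_Useragent_Filter = 'Googlebot'  #@param ["Googlebot", "bingbot", "Other"]

client = bigquery.Client(project=PROJECT_ID)

# Declare a temporary external table over the CSV files in the bucket
external_config = bigquery.ExternalConfig('CSV')
external_config.source_uris = [f'gs://{BUCKET_NAME}/{FILE_PATH}']
external_config.autodetect = True
job_config = bigquery.QueryJobConfig(
    table_definitions={BQ_TABLE_NAME: external_config})

# Basic SQL filtering on the raw lines; the column name depends on your CSV schema
query = f"""
    SELECT *
    FROM {BQ_TABLE_NAME}
    WHERE string_field_0 LIKE '%{SQL_1st_Filter}%'
    AND string_field_0 LIKE '%{SQL_Useragent_Filter}%'
"""
rows = client.query(query, job_config=job_config).result()
response = [list(row.values()) for row in rows]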

Converting the list output to a Pandas Dataframe

The output generated by BigQuery is a two-dimensional list (also called ‘list of lists’). We’ll need to convert it to a Pandas Dataframe via this code:
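A hedged sketch of that conversion, assuming ‘response’ is the two-dimensional list returned above – pass one column name per field returned by your query (the name below is a placeholder):

df = pd.DataFrame(response, columns=['raw_log_line'])
df.head()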

Done! We now have a Dataframe that can be wrangled in Pandas!

Data cleansing time, the Pandas way!

Time to make these cryptic logs a bit more presentable by:

  • Splitting each element
  • Creating a column for each element

Split IP addresses
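As an illustration of the approach used throughout this section, here’s a hedged sketch assuming the full Apache line sits in a ‘raw_log_line’ column and the IP address is its first space-separated token; the same string-splitting pattern applies to the other ‘Split’ steps below:

df['IP'] = df['raw_log_line'].str.split(' ').str[0]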

Split dates and times

We now need to convert the date column from string to a “Date time” object, via the Pandas to_datetime() method:
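A minimal sketch, assuming the timestamp sits between square brackets in the raw line and follows Apache’s usual %d/%b/%Y:%H:%M:%S %z layout:

# Extract the bracketed timestamp, parse it, and drop the timezone offset
df['timestamp'] = df['raw_log_line'].str.extract(r'\[(.*?)\]')[0]
df['timestamp'] = pd.to_datetime(df['timestamp'],
                                 format='%d/%b/%Y:%H:%M:%S %z',
                                 errors='coerce',
                                 utc=True).dt.tz_localize(None)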

Doing so will allow us to perform time-series operations such as:

  • Slicing specific date ranges 
  • Resampling time series for different time periods (e.g. from day to month)
  • Computing rolling statistics, such as a rolling average

The Pandas/NumPy combo is really powerful when it comes to time series manipulation – check out all you can do here!

More split operations below:

Split domains

Split methods (GET, POST, etc.)

Split URLs

Split HTTP Protocols

Split status codes

Split ‘time taken’

Split referral URLs

Split User Agents

Split redirected URLs (when existing)

Reorder columns

Time to check our masterpiece:
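For instance, a quick look at the first few rows:

df.head(10)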

Well done! With just a few lines of code, you converted a set of cryptic logs to a structured Dataframe, ready for exploratory data analysis.

Let’s add a few more extras.

Create categorical columns

These categorical columns will come in handy for data analysis and visualization tasks. We’ll create two, paving the way for your own experiments!

Create an HTTP codes class column
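A hedged sketch, assuming the status codes ended up in a ‘statusCode’ column of strings during the split steps above (np.select maps each code to its class; the labels are illustrative):

import numpy as np

conditions = [
    df['statusCode'].str.startswith('2', na=False),
    df['statusCode'].str.startswith('3', na=False),
    df['statusCode'].str.startswith('4', na=False),
    df['statusCode'].str.startswith('5', na=False),
]
choices = ['Success (2XX)', 'Redirection (3XX)',
           'Client error (4XX)', 'Server error (5XX)']
df['httpCodeClass'] = np.select(conditions, choices, default='Other')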

Create a search engine bots category column
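And a similar sketch for the bot category, assuming the user-agent string sits in a ‘userAgent’ column:

bot_conditions = [
    df['userAgent'].str.contains('Googlebot', case=False, na=False),
    df['userAgent'].str.contains('bingbot', case=False, na=False),
]
bot_choices = ['Googlebot', 'Bingbot']
df['SEBotClass'] = np.select(bot_conditions, bot_choices, default='Other')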

As you can see, our new columns httpCodeClass and SEBotClass have been created:

Spotting ‘spoofed’ search engine bots

We still need to tackle one crucial step for SEO: verify that IP addresses are genuinely from Googlebots.

All credit due to the great Tyler Reardon for this bit! Tyler created searchtools.io, a clever tool that checks IP addresses and flags ‘fake’ Googlebot ones, based on a reverse DNS lookup.

We’ve simply integrated that script into the notebook – code snippet below:
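I won’t reproduce Tyler’s script here, but the underlying technique is the reverse-then-forward DNS verification that Google itself recommends for Googlebot. A simplified sketch of that idea (column names assumed; real-world code would also need rate limiting and caching):

import socket

def is_real_googlebot(ip):
    """Reverse DNS lookup, then forward-confirm the hostname."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not (host.endswith('.googlebot.com') or host.endswith('.google.com')):
            return False
        # The hostname must resolve back to the original IP address
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False

df['isRealGbot?'] = df['IP'].apply(is_real_googlebot)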

Running the cell above will create a new column called ‘isRealGbot?’:

Note that the script is still in its early days, so please consider the following caveats:

  • You may get errors when checking a huge amount of IP addresses. If so, just bypass the cell
  • Only Googlebots are checked currently

Tyler and I are working on the script to improve it, so keep an eye on Twitter for future enhancements!

Filter the Dataframe before final export

If you wish to further refine the table before exporting to .csv, here’s your chance to filter out status codes you don’t need and refine timescales.

Some common use cases:

  • You have 12 months’ worth of log data stored in the cloud, but only want to review the last 2 weeks
  • You’ve had a recent website migration and want to check all the redirects (301s, 302s, etc.) and their redirect locations
  • You want to check all 4XX response codes

Filter by date 

Refine start and end dates via this form:
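A hedged sketch of such a form, assuming the parsed timestamps sit in the ‘timestamp’ column created earlier (the dates are placeholders):

#@title Filter by date
START_DATE = '2020-03-01'  #@param {type:"date"}
END_DATE = '2020-03-04'  #@param {type:"date"}

mask = (df['timestamp'] >= START_DATE) & (df['timestamp'] <= END_DATE)
df = df.loc[mask]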

Filter by status codes

Check status codes distribution before filtering:

Code:
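Something along these lines, assuming the ‘statusCode’ column from the split steps above:

df['statusCode'].value_counts()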

Then filter HTTP status codes via this form:

Related code:
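A sketch of the status-code filter (the codes listed are placeholders):

#@title Filter by HTTP status code
STATUS_CODES = ['301', '302', '404']  #@param {type:"raw"}

df = df[df['statusCode'].isin(STATUS_CODES)]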

Export to .csv 

Our last step is to export our Dataframe to a .csv file. Give it a name via the export form:

Code for that last form:
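A minimal sketch, assuming you also want to pull the file from the Colab runtime down to your local machine (the filename is a placeholder):

#@title Export to .csv
EXPORT_FILENAME = 'seo_log_analysis.csv'  #@param {type:"string"}

df.to_csv(EXPORT_FILENAME, index=False)

# Download the exported file from the Colab environment
from google.colab import files
files.download(EXPORT_FILENAME)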

Give yourself a pat on the back if you’ve followed along this far – you’ve achieved a lot over the course of this article!

I cannot wait to take it to the next level in my next column, with more advanced data modeling/visualization techniques!

I’d like to thank the following people:

  • Tyler Reardon, who helped me integrate his anti-spoofing tool into this notebook!
  • Paul Adams from Octamis and my dear compatriot Olivier Papon for their expert advice
  • Last but not least, kudos to Hamlet Batista and JR Oakes – thanks, guys, for being so inspirational to the SEO community!

Please reach out to me on Twitter if you have questions or need further assistance. Any feedback (including pull requests! :)) is also greatly appreciated!

Happy Pythoning!





About The Author

Charly Wargnier is a seasoned digital marketing consultant based in the UK, leaning on over a decade of in-the-trenches SEO, BI and Data engineering experience. Charly has worked both in-house and agency-side, primarily for large enterprises in Retail and Fashion, and on a wide range of fronts including complex technical SEO issues, site performance, data pipelining and visualization frameworks. When he isn’t working, he enjoys coding for good and spending quality time with his family – cooking, listening to Jazz music and playing chess, in no particular order!





How to drive digital innovation necessary during the pandemic



30-second summary:

  • COVID-19 has kept consumers in their homes, which has led to significant spikes in internet use and companies scrambling to digitize in order to meet customers where they are.
  • The ability to quickly develop digital capabilities will continue to be critical for meeting customer needs and ensuring organizations’ survival.
  • To remain competitive, companies must enhance the digital customer experiences they offer through upgraded social media, optimized conversion strategies, better marketing research, an effective internal website search, and fresh customer touchpoints.

Emerging digital technologies like artificial intelligence (AI) and cloud computing enticed leaders with their agility and efficiency. Many companies planned to make digitization a goal for the new decade.

In hindsight, they probably wish they hadn’t waited.

The novel coronavirus upended every aspect of our lives. As businesses and governments around the world try to combat the pandemic, millions of consumers sit inside their homes. And where do people go during a government-mandated lockdown? Online.

The unprecedented shift to remote work and online learning, combined with a dramatic increase in movie streaming, videoconferencing, and social media traffic, has led to significant spikes in internet use. In this same time frame, big tech companies — the businesses at the forefront of digital innovation — have flourished, as have brands that capitalized on the power of social media engagement.

The biggest trick to digitization right now is meeting customers where they are. For example, my company, Teknicks, is working with an online K-12 speech and occupational therapy provider. When schools began transitioning to remote learning, students’ needs changed, too. We helped the provider pivot its value proposition and messaging to accommodate school districts’ new realities. By focusing on teletherapy tools and reassuring parents, we’ve seen substantial growth and brand recognition during the pandemic.

Until we find a vaccine for the novel coronavirus, your customers will likely engage with you through online channels. The ability to develop digital capabilities quickly will continue to be critical for meeting customer needs and ensuring survival for your organization. With that in mind, here’s how you can enhance your digital customers’ experiences:

1. Upgrade your social media

It’s not hard to be good at social media marketing — it’s hard to be great. As you build your audience on websites like Facebook and Instagram, be sure to engage with followers consistently. Create a content calendar mapping out your posts and sharing strategies and stick to it. These platforms are also a great channel for customer service, allowing you to provide personalized support and become instantaneously useful (something that customer support tickets and chatbots never seem to be).

If you already have a sizable engaged audience, it’s time to work on your content strategy. Don’t build your content strategy around keywords. Instead, focus on your audiences’ needs. A truly effective content strategy will be customized for the platform you’re on and will account for the user behavior most characteristic of that platform. Naturally, you will use keywords and phrases that are optimized for discoverability while maintaining authenticity.

One key strategy is to conduct marketing research using a survey. This tactic goes well beyond traditional keyword research and generates content ideas directly from your targeted audience, not a keyword tool. Surveying your prospective customers allows them to tell you what type of content they want to consume, significantly increasing the likelihood of engagement. Often, this research is the key to a successful marketing strategy. I’ll go into more detail below.

2. Focus on and prioritize conversion optimization

Ideally, your website looks good and loads quickly, but those qualities alone don’t make a website great. The user experience that your website offers is ultimately what determines whether customers bounce in droves or actually stick around. Attempting to boost your initial traffic will exponentially increase customer acquisition costs, so improving your conversion rates via website optimization is a more affordable (and profitable) solution.

We often see double-digit increases in conversion rates on our first test. We typically focus on the most trafficked pages to increase the likelihood of big, impactful wins. There is an entire science behind conversion optimization, but the core fundamentals have remained the same for years.

To make sure your website’s architecture is seamless and intuitive, develop a conversion rate optimization strategy that works for you. This will require you to ask visitors for feedback, experiment with different messaging options, and regularly review your analytics, among other things. The idea is to get to know your visitors well. It takes work, but it will pay off over time as the incremental conversion rate increases impact top-line revenue.

3. Conduct marketing research surveys

With the right insights, you can turn every engagement into a memorable and valuable experience for both you and your customers. The best way to get customer insights is to ask. Design a survey of up to 10 questions in a variety of formats along with some screening questions to make sure the feedback you get is actually useful.

When designing, consider your potential customers’ preferences and pain points. For example, if you know your audience is mostly on Instagram, asking “What do you like about social media?” won’t be as effective as “What makes Instagram posts better than Facebook posts?” Once the survey’s drafted, post it to your social channels and send it out to your mailing list. You want to understand which messages resonate with your audience before you spend a cent on marketing. Learning how to conduct marketing research is one of the most important marketing skills you can attain.

Asking individual customers how they feel about various messaging options can give you a goldmine of useful data to help inform the language and design choices you make. Not every customer will choose to participate in a survey, but some will. Show them you appreciate their input by offering a small discount or another incentive once the survey is completed. You’ll be surprised by how many responses you get and how beneficial the precursory information is.

4. Review your internal website search

As much as you’d love for every visitor to spend hours exploring every nook and cranny of your website, most will want to get on with their lives after they’ve found what they came for. To make the process faster, you should offer some sort of internal website search functionality. If you don’t already have one, add a search box to your navigation menu.

Not every website has one, and even the ones that do have very surface-level functions. However, search bars are a valuable asset that can increase internal sessions and conversion. Internal website searchers are 216% likelier to convert, according to WebLinc. Search bars assist your visitors and expand your understanding of user behavior, providing you with the information you need in order to adjust your website accordingly.

Evaluate the effectiveness of your internal search, taking notice of how it finds and organizes the content after a search. Most native search functionality is very basic and just looks for the presence of “search term,” but you may want to test out more advanced filters that help users more effectively find the information they are looking for.

I recommend looking at the search data monthly to see what users have been looking for. Be sure to review which searches yielded zero results and which brought up irrelevant content. Identify areas that can be improved and understand the content gaps where additional content is needed to support demand.

5. Identify new customer touchpoints

Innovation is all about using new technology to improve old processes. While your typical customer journey might depend on your industry and business, chances are good that you can find ways to enhance it with emerging technologies.

Evaluating whether an emerging technology is a fit for your business, and whether you should invest in testing it out, starts with (drumroll…) a survey. As we discussed earlier, surveys can answer just about anything you want to know about your target audience. Go ahead and ask your audience if they own or use the emerging tech and validate its place in the customer journey.

Take the new home buying process, for example. David Weekley Homes, the largest privately-held home builder in the U.S., wanted to better understand whether voice-enabled devices can play a role in the customer journey. The company also wanted to propose a voice app idea to the audience and understand how they felt about the emerging technology concept. By conducting a survey, we uncovered that 81% of the respondents would consider the voice app idea to be somewhat to extremely valuable and 70% would possibly to definitely use the voice app if it existed.

The increasing usage of voice search and voice-enabled devices also offers an opportunity for consumer brands to make it easier than ever for customers to find their products. Tide, for example, has capitalized on marketing on Amazon’s Alexa Skills platform to remove a step from the purchasing process. Customers can use the company’s skill to order Tide products without having to pull up the Amazon app or go to the Tide website. In that way, new tech makes an old process (purchasing detergent) more frictionless than ever.

The COVID-19 pandemic has made digital innovation a business imperative. Regardless of your industry, you should look for ways to anticipate and meet customer needs. Your customers expect a seamless digital experience. If you can’t provide it, they won’t have to leave their homes to find someone else that can.

Nick Chasinov is the founder and CEO of Teknicks, a research-based internet marketing agency certified by Google in Analytics, Tag Manager, and a Google Premier AdWords partner.






Core Web Vitals, E-A-T, or AMP?



30-second summary:

  • The biggest Google update of the year is called the Page Experience update.
  • Core Web Vitals are part of that update, and they are definitely ranking factors to keep in mind, especially when optimizing images.
  • AMP is no longer the only way to get a “Top Stories” feature on mobile. Starting in 2021, any news webpage can become a “Top Story”.
  • Combining AMP’s privacy concerns and cost of operation might mean that AMP will disappear within a couple of years.
  • E-A-T is not a ranking factor right now, and we don’t know if it will become one in the future.

2020. What a year. History is happening around us, and Google? Well, Google keeps on revamping its search algorithms. Over the years, there have been many, many major algorithm updates, as Google worked to keep us on our toes. 2020 was no different: in one fell swoop, we got news of the Page Experience update and of changes to AMP. All the while, the debate about whether or not you need E-A-T to rank rages on. Where do Core Web Vitals stand in changing the search game in 2021?

Let’s go over each of these innovations and see which will change the way we do SEO, and which will fade into obscurity sooner rather than later.

1. Importance of core web vitals for SEO

Core Web Vitals are part of the Page Experience update and, by far, caused the biggest ruckus.

There’s a lot to learn about Core Web Vitals, but they boil down to the three biggest issues on our webpages:

  1. LCP — Largest Contentful Paint, which deals with the loading speed of the largest single object on the page.
  2. FID — First Input Delay, which measures the page’s reaction time to the first user input (whether the user clicks, taps, or presses a key).
  3. CLS — Cumulative Layout Shift, which measures how much the page’s content, mostly visual content, jumps around while the page loads.

How Core Web Vitals influence rankings

Of course, some SEO experts think that the entire Page Experience update is nothing special and could even “[…] distract […] from the core mission of communication and storytelling.”

And, sure, most of the Page Experience update is simply an assembly of things we’ve known for a while: use HTTPS, be mobile-friendly, control your page speed, and so on.

But Core Web Vitals are a bit different and can influence SEO practice in unexpected ways. The key factor that’s already changing rankings is Cumulative Layout Shift.

As most SEO experts know, for a while an important part of image optimization was using the decoding="async" attribute on the <img> tag to avoid losing page speed while rendering the page.

Asynchronous decoding could lead to some seriously janky pages if coders didn’t specify the height and width of every single image to be rendered. Some websites did it anyway; Wikipedia, for example, has predefined space for images on most of its pages, created ahead of time.

Core Web Vitals and other ranking factors for 2021 - Wikipedia

But as SEO experts, we didn’t have to worry too much about pages being jumpy, as that didn’t influence rankings. Now, with CLS formally announced as a ranking factor, things will change for a whole slew of websites and SEO experts.

We’ll need to make sure that every webpage is coded with CLS in mind, with the needed space for every image defined ahead of time, to avoid the layout shifts.

The verdict

Overall, of course, it’s too early to tell, and more work by SEOs around the web needs to be done here. However, it seems that if you aren’t used to focusing on technical SEO, Core Web Vitals becoming ranking signals might not influence your day-to-day work at all.

However, if you are conducting complicated technical SEO, then Core Web Vitals will definitely change the way you work in as-yet unexpected ways.

2. Importance of AMP for SEO

AMP’s relevance today is kind of an open question. While it’s always been great as a quick-and-easy way to increase page speed, privacy concerns have been voiced over and over again since the technology’s very inception.

But in 2020, significant changes are afoot, since, within the same Page Experience update, Google announced that there’s finally no requirement for us to create AMP pages to occupy the “Top Stories” SERP feature.

That’s a pretty huge step for anybody trying to accrue as many SERP features as they can, and, in particular, for news websites.

Core Web Vitals and other search ranking factors for 2021 - Top Stories

How AMP influences rankings

If we believe John Mueller’s words, then AMP is not a ranking factor. Seems plain and simple enough. But of course, things aren’t so simple, because AMP comes with pretty significant gains in page speed, and speed is an important ranking factor.

Thanks to AMP’s pre-rendering combined with some severe design limitations, AMP webpages often really do win in page speed, even if not in ranking as is.

The “Top Stories” SERP feature, however, was a huge benefit to using an AMP for any news agency with a website, and it’s easy to understand why. Just look at how much of the page is occupied by the “Top Stories” results.

Not only do “Top Stories” automatically appear at the top of the SERP, they also sport the logo of the website posting them, standing out even more from the boring old blue-link SERP.

This means that for a few years now news websites were essentially forced into using AMP to get into a “Top Stories” SERP feature on mobile since it absorbs a whole lot of clicks.

On the other hand, it takes quite a lot of resources to support AMP versions of the webpages, because you are basically maintaining a whole additional version of your website.

On top of that, a page that’s been properly optimized for speed might not need AMP for those speed gains at all.

The verdict

While it’s tough to imagine that AMP will fade away completely within the next couple of years, AMP’s privacy issues combined with the cost of maintaining it might spell the end of it being a widely used practice.

Now, with the “Top Stories” becoming available to non-AMP pages, there’s virtually no reason to jeopardize the users’ security for speed gains you could get by proper optimization.

3. Importance of E-A-T for SEO

Expertise. Authority. Trust. All perfectly positive words and something we should all strive for in our professional lives. But what about search optimization?

Coming straight from Google’s Quality Rater Guidelines, E-A-T has been the talk of the town for a good moment now. Let’s dive in and see how they might change the way we optimize for search.

How E-A-T influences rankings

For most of us, it doesn’t, really.

Sure, Quality Rater Guidelines provide valuable insights into Google’s ranking process. However, E-A-T is one of the lesser-important factors we should be focusing on, partly because these are nebulous, abstract concepts, and partly because Google doesn’t exactly want us to.

As Google’s official representatives informed us, E-A-T is not in itself a ranking factor.

Receiving follow-up questions, Google’s John Mueller then reiterated that point, and Ben Gomes, Google’s VP of search engineering, confirmed that quality raters don’t influence any page’s rankings directly.

However, in practice, we often see that the so-called YMYL websites already can’t rank without having some expertise and authority established. A very popular example is that it’s virtually impossible to rank a website providing medical advice without an actual doctor writing the articles.

The problem here is that expertise, authority, and trustworthiness are not easily interpreted by the search algorithms, which only understand code.

And, at the moment, there seems to be no surefire way for Google to transform these signals into rankings, except to read the feedback of their quality raters before each algorithm update.

The verdict

While using E-A-T to rank websites might sound like an inarguable benefit for the searcher, there are a couple of concerns that aren’t easily solved, namely:

  1. Who exactly will be determining the E-A-T signals, and according to which standard?
  2. The introduction of such factors creates a system where the smaller and newer websites are punished in rankings for not having the trustworthiness that they couldn’t realistically acquire.

Responding to both of these concerns requires time and effort on the search engine’s side.

As things stand right now, E-A-T is not something to keep in mind while doing day-to-day SEO operations.

Let’s imagine a fantastical scenario where a webmaster/SEO expert has some free time. Then they might want to work on E-A-T, to try and stay ahead of the curve.

On the other hand, there simply isn’t any proof that Google will actually use E-A-T. Or that, even if used, these signals will become major ranking factors. For this reason, E-A-T shouldn’t be your priority ahead of traditional SEO tasks like link building and technical optimization.

Additionally, consider this. The entire Quality Rater Guidelines is 168 pages long. However, a comprehensive explanation of what E-A-T is and why it might be calculated a certain way will take many more pages than that.

Conclusion

As of the time of this writing, Core Web Vitals seem to be the most important ranking news to come out of 2020 in practical terms. However, search is an extremely volatile field: what worked two weeks ago may not work today, and what works today might not work two weeks from now.

The matters are further complicated because we’re fighting an uneven battle: it’s simply not in search engines’ best interest to give us a full and detailed picture of how ranking works, lest we abuse it.

This is why it’s crucial to keep our hand on the pulse of optimization news and changes occurring every single day. With constant efforts from our SEO community to work out the best way to top rankings, it’s possible for us to close that gap and know for sure which trends are paramount, and which we can allow ourselves to overlook.

Aleh Barysevich is Founder and CMO at SEO PowerSuite and Awario.








How to optimize and use partial match domains effectively



30-second summary:

  • Partial match domains refer to when your domain name has partially included the main keyword that you are trying to rank for.
  • There are many aspects that make it different from how the exact match domain works.
  • Tudor Lodge Consultants share a quick guide to help you succeed at partial match domains, understand the caveats, and optimize effectively.

Partial match domains refer to when your domain name has partially included the main keyword that you are trying to rank for.

Commonly used by SEO professionals to gain an advantage when it comes to ranking in the search engines or from business owners who have a company name that is closely linked to the services they offer or area they work in.

Examples of partial matches include having vital keywords like “insurance”, “loans”, or “casino” in the domain name or adding words like “hub”, “network”, or “quick” to the beginning or end of the domain, such as casinohub.com, everydayinsurance.com or quickmoney.com

This is different from an exact match domain (EMD), which stipulates the exact keywords you are trying to rank for in your domain name, e.g. carinsurance.com, plumbing.com, bestcasinos.com

Content created in partnership with Tudor Lodge Consultants.

Why can partial match domains be an issue?

Historically, having an exact match or partial match domain was a sure-fire way to rank top for your target keywords – only for Google to weigh this down considerably in recent years as a way to make SEO positions more ‘earned’ rather than ‘gained.’

Partial match and exact match domains have been shown to have a higher click-through rate (CTR) in search results – largely because they mention the exact words that the customer is looking for. Unsurprisingly, these domains can be worth thousands and are put on sale through the likes of GoDaddy and 123 Reg.

Whilst having a partial match domain can be an advantage for SEO, there is always the risk of exposing your business to a Google penalty, especially as Google’s guidelines become stricter and give preference to brands that demonstrate good use of content, link building, varied traffic sources, and user experience.

Although you may demonstrate very good SEO results initially, you may find yourself compromised during the next algorithm update, which could have a massive impact on your website and its rankings – and make it very challenging to recover from the penalty. Not to mention, the financial implications to you and your client.

Therefore, being conscious of partial matches and how they work for SEO is of vital importance.

When partial match domains are high risk

Partial matches are high risk when optimizing in an industry that is highly competitive and prone to algorithm updates – such as casino and gambling, loans and credit, finance and insurance, web hosting, FX, and more.

Reason 1: There is a risk that you may use too many keywords in your URL, meta-data, and content and this is deemed as keyword stuffing by Google and is therefore penalized in the next update.

Reason 2: You may be generating links back to the site, but getting your brand name linked back to the site might be considered overkill if it mentions high-risk words like “casino”, “loans”, or “insurance” too often.

When partial match domains are low risk

Partial match domains are low risk when targeting local SEO searches (that is, a location) or when the keywords are not competitive.

After all, if you have the domain name malibu-hairdressers.com, there are only going to be a handful of hairdressers in the Malibu area to compete against and this is a viable name for a company in that area. Also, local SEO searches are not often included in algorithm updates, which makes them a safer bet and you can always gain good and free exposure through the three results that feature on Google Local Listings.

If your keywords are not competitive and you are more or less the only person in your industry, you should be low risk, since you may not need many optimizations to get to position one on Google, and keyword stuffing does not come into play as much.

In addition, if your website is an information resource, you are trying to capture lots of search phrases and not heavily relying on just a few that might be struck by an algorithm. A website that is full of guides or news, should generate content and links more naturally, even though it has a partial match domain. Successful examples of sites like this include searchenginewatch.com, moneyadviceservice.co.uk, and smcrcompliance.com.

How to optimize partial match domains

1. Be as natural as possible

If you have a partial match domain and are already optimizing it, try to make the SEO as natural as possible. Create good quality content guides or blog posts and when getting links, drive them towards these pages, not your money pages.

If you are linking back to money pages, use anchor text like ‘read more’ or ‘find out more’ to hyperlink back to them. Try to steer clear of exact match or partial match anchor text, as this could be seen as too spammy. It’s not too late to contact the sites behind the links you have generated so far and get them redirected to safer pages or blog posts on your website. This approach may take longer but will be much safer and more effective long-term.

2. Manage your keyword stuffing

Try to avoid using the main keyword, like “casino” or “insurance”, too often. One of the simplest ways is to mention it only once across the meta title, meta description, and URL.

Rather than: quickcarinsurance.com/car-insurance

Use: quickcarinsurance.com/car

3. Try to avoid using one from the start

If you can avoid using a partial match domain from the start, this would be ideal. As SEOs, we never know what is round the corner with Google’s guidelines, but we can certainly see the trend of brands taking center stage on page one. So with this in mind, try using a brand name if you can.

Be clever with your domain name: You do not necessarily have to include the money word to get the value of a high click-through rate. You can be smart with your domain choices, such as the company Fetch.com, which is a pick-up delivery app, or Paw.com for dog accessories, or GetIndemnity.co.uk, the large business insurance broker. Think of good synonyms or words connected to the brand, without compromising your Google ranking.

4. Manage the expectations of your client

The majority of SEO clients want quick results, even though we really need six to 12 months (or longer) to show the full impact of our work. When pitching to a client with a partial match or exact match domain, you need to manage expectations that there might be a fall in rankings during the course of a year due to an algorithm change – and you may need to make changes for this to recover. Someone with a long-term view on their SEO will appreciate this, but someone who wants quick results will not and will likely demand their money back before you know it.




