My original plan was to cover this topic: “How to Build a Bot to Automate your Mindless Tasks using Python and BigQuery”. I made some slight course changes, but hopefully, the original intention remains the same!
The inspiration for this article comes from this tweet from JR Oakes. 🙂
I think I just have the inspiration I was looking for for my next @sejournal column “How to build a bot to automate your mindless tasks using #python and @bigquery” 🤓 Thanks JR! 🍺 https://t.co/dQqULIH2p2
— Hamlet 🇩🇴 (@hamletbatista) July 12, 2019
As Uber released an updated version of Ludwig and Google announced the ability to execute TensorFlow models in BigQuery, I thought the timing couldn't be better.
In this article, we will revisit the intent classification problem I addressed before, but we will replace our original encoder with a state-of-the-art one: BERT, which stands for Bidirectional Encoder Representations from Transformers.
This small change will help us improve the combined test accuracy from 0.66 to 0.89 while using the same dataset and no custom coding!
Here is our plan of action:
- We will rebuild the intent classification model we built in part one, but we will leverage pre-trained knowledge by using a BERT encoder.
- We will test it again against the questions we pulled from Google Search Console.
- We will upload our queries and intent predictions data to BigQuery.
- We will connect BigQuery to Google Data Studio to group the questions by their intention and extract actionable insights we can use to prioritize content development efforts.
- We will go over the new underlying concepts that help BERT perform significantly better than our previous model.
Setting up Google Colaboratory
As in part one, we will run Ludwig from within Google Colaboratory in order to use their free GPU runtime.
First, run this code to check the installed TensorFlow version.
import tensorflow as tf; print(tf.__version__)
Let’s make sure our notebook uses the right version expected by Ludwig and that it also supports the GPU runtime.
I get 1.14.0, which is great, as Ludwig requires at least 1.14.0.
Under the Runtime menu, select Change runtime type, then choose Python 3 and GPU.
You can confirm you have a GPU by typing:
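One way to confirm, assuming a Colab notebook with TensorFlow available:

```python
import tensorflow as tf

# Returns a device string such as '/device:GPU:0' when a GPU
# runtime is active, or an empty string otherwise
device_name = tf.test.gpu_device_name()
print(device_name if device_name else "No GPU found")
```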
At the time of this writing, you need to install some system libraries before installing the latest Ludwig (0.2). I got some errors that they later resolved.
!apt-get install libgmp-dev libmpfr-dev libmpc-dev
When the installation failed for me, I found the solution from this StackOverflow answer, which wasn’t even the accepted one!
!pip install ludwig
You should get:
Successfully installed gmpy-1.17 ludwig-0.2
Prepare the Dataset for Training
We are going to use the same question classification dataset that we used in the first article.
After you log in to Kaggle and download the dataset, you can use the code to load it into a dataframe in Colab.
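For illustration, here is the loading pattern with a tiny in-memory stand-in; the column names below are my assumption about the Kaggle file's layout, so check them against your download:

```python
import io
import pandas as pd

# Stand-in for Question_Classification_Dataset.csv; in Colab you
# would pass the path of the uploaded Kaggle file to read_csv
sample_csv = io.StringIO(
    "Questions,Category0,Category1,Category2\n"
    "What is the capital of France ?,LOCATION,LOC,city\n"
    "Who wrote Hamlet ?,HUMAN,HUM,ind\n"
)
df = pd.read_csv(sample_csv)
print(df.shape)  # the real dataset has thousands of rows
```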
Configuring the BERT Encoder
Instead of using the parallel CNN encoder that we used in the first part, we will use the BERT encoder that was recently added to Ludwig.
This encoder leverages pre-trained data that enables it to perform better than our previous encoder while requiring far less training data. I will explain how it works in simple terms at the end of this article.
Let’s first download a pretrained language model. We will download the files for the model BERT-Base, Uncased.
I tried the bigger models first, but hit some roadblocks due to their memory requirements and the limitations in Google Colab.
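Assuming the standard release URL from Google's BERT repository, the download looks like this in Colab:

```shell
!wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
```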
Unzip it with:
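Assuming the archive landed in the current working directory:

```shell
!unzip uncased_L-12_H-768_A-12.zip
```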
The output should look like this:
Archive:  uncased_L-12_H-768_A-12.zip
   creating: uncased_L-12_H-768_A-12/
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.meta
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001
  inflating: uncased_L-12_H-768_A-12/vocab.txt
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.index
  inflating: uncased_L-12_H-768_A-12/bert_config.json
Now we can put together the model definition file.
Let’s compare it to the one we created in part one.
I made a number of changes. Let’s review them.
I essentially changed the encoder from parallel_cnn to bert and added extra parameters required by bert: config_path, checkpoint_path, word_tokenizer, word_vocab_file, padding_symbol, and unknown_symbol.
Most of the values come from the language model we downloaded.
I added a few more parameters that I figured out empirically: batch_size, learning_rate and word_sequence_length_limit.
The default values Ludwig uses for these parameters don't work for the BERT encoder because they are way off from the values used to pre-train the model. I found working values in the BERT documentation.
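For reference, here is a sketch of what the resulting model_definition.yaml might look like. Treat every value as an assumption to check against your dataset's column names and the Ludwig and BERT documentation; the file paths come from the archive we unzipped earlier:

```yaml
input_features:
  - name: Questions
    type: text
    encoder: bert
    config_path: uncased_L-12_H-768_A-12/bert_config.json
    checkpoint_path: uncased_L-12_H-768_A-12/bert_model.ckpt
    preprocessing:
      word_tokenizer: bert
      word_vocab_file: uncased_L-12_H-768_A-12/vocab.txt
      padding_symbol: '[PAD]'
      unknown_symbol: '[UNK]'
      word_sequence_length_limit: 128
output_features:
  - name: Category0
    type: category
  - name: Category2
    type: category
training:
  batch_size: 32
  learning_rate: 0.00002
```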
The training process is the same as we’ve done previously. However, we need to install bert-tensorflow first.
!pip install bert-tensorflow
!ludwig experiment --data_csv Question_Classification_Dataset.csv --model_definition_file model_definition.yaml
We beat our previous model performance after only two epochs.
The final improvement was 0.89 combined test accuracy after 10 epochs. Our previous model took 14 epochs to get to 0.66.
This is pretty remarkable considering we didn’t write any code. We only changed some settings.
It is incredible and exciting how fast deep learning research is improving and how accessible it is now.
Why BERT Performs So Well
There are two primary advantages of using BERT compared to traditional encoders:
- The bidirectional word embeddings.
- The language model leveraged through transfer learning.
Bidirectional Word Embeddings
When I explained word vectors and embeddings in part one, I was referring to the traditional approach (I used a GPS analogy of coordinates in an imaginary space).
Traditional word embedding approaches assign the equivalent of a GPS coordinate to each word.
Let’s review the different meanings of the word “Washington” to illustrate why this could be a problem in some scenarios.
- George Washington (person)
- Washington (State)
- Washington D.C. (City)
- George Washington Bridge (bridge)
The word “Washington” above represents completely different things, and a system that assigns the same coordinates regardless of context won’t be very precise.
If we are in Google’s NYC office and we want to visit “Washington”, we need to provide more context.
- Are we planning to visit the George Washington memorial?
- Do we plan to drive south to visit Washington, D.C.?
- Are we planning a cross country trip to Washington State?
As you can see in the text, the surrounding words provide some context that can more clearly define what “Washington” might mean.
If you read from left to right, the word “George” might indicate you are talking about the person, and if you read from right to left, the word “D.C.” might indicate you are referring to the city.
But you need to read both from left to right and from right to left to tell that someone actually wants to visit the bridge.
BERT works by encoding different word embeddings for each word usage, and relies on the surrounding words to accomplish this. It reads the context words bidirectionally (from left to right and from right to left).
Back to our GPS analogy, imagine an NYC block with two Starbucks coffee shops on the same street. If you want to get to a specific one, it would be much easier to refer to it by the businesses that come before and/or after it.
Transfer Learning
Transfer learning is probably one of the most important concepts in deep learning today. It makes many applications practical even when you have very small datasets to train on.
Traditionally, transfer learning was primarily used in computer vision tasks.
You typically have research groups from big companies (Google, Facebook, Stanford, etc.) train an image classification model on a large dataset like ImageNet.
This process would take days and generally be very expensive. But, once the training is done, the final part of the trained model is replaced, and retrained on new data to perform similar but new tasks.
This process is called fine tuning and works extremely well. Fine tuning can take hours or minutes depending on the size of the new data and is accessible to most companies.
Let’s get back to our GPS analogy to understand this.
Say you want to travel from New York City to Washington state and someone you know is going to Michigan.
Instead of renting a car to drive the whole way, you could hitch a ride to Michigan and then rent a car to drive from Michigan to Washington state, at a much lower cost and with less driving time.
BERT is one of the first models to successfully apply transfer learning in NLP (Natural Language Processing). There are several pre-trained models that typically take days to train, but you can fine-tune them in hours or even minutes if you use Google Cloud TPUs.
Automating Intent Insights with BigQuery & Data Studio
Now that we have a trained model, we can test it on new questions grabbed from Google Search Console using the report I created in part one.
We can run the same code as before to generate the predictions.
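As a sketch (the results directory name and the Search Console export filename are assumptions, not values from the article), the Ludwig Python API pattern is roughly:

```python
import pandas as pd
from ludwig.api import LudwigModel

# Load the model produced by the ludwig experiment run above;
# the exact results path depends on your experiment name
model = LudwigModel.load("results/experiment_run/model")

# Questions exported from the Google Search Console report
test_df = pd.read_csv("Queries.csv")
predictions = model.predict(data_df=test_df)
```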
This time, I also want to export them to a CSV and import into BigQuery.
test_df.join(predictions)[["Query", "Clicks", "Impressions", "Category0_predictions", "Category2_predictions"]].to_csv("intent_predictions.csv")
First, log in to Google Cloud.
!gcloud auth login --no-launch-browser
Open the authorization window in a separate tab and copy the token back to Colab.
Create a bucket in Google Cloud Storage and copy the CSV file there. I named my bucket bert_intent_questions.
This command will upload our CSV file to our bucket.
!gsutil cp -r intent_predictions.csv gs://bert_intent_questions
You should also create a dataset in BigQuery to import the file into. I named my dataset bert_intent_questions.
!bq load --autodetect --source_format=CSV bert_intent_questions.intent_predictions gs://bert_intent_questions/intent_predictions.csv
After we have our predictions in BigQuery, we can connect it to Data Studio and create a super valuable report that helps us visualize which intentions have the greatest opportunity.
After I connected Data Studio to our BigQuery dataset, I created a new field, CTR, by dividing clicks by impressions.
As we are grouping queries by their predicted intentions, we can find content opportunities where we have intentions with high search impressions and low number of clicks. Those are the lighter blue squares.
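The grouping Data Studio does can be reproduced in a few lines of pandas; here is a toy version with made-up numbers to show the CTR-by-intent idea:

```python
import pandas as pd

# Made-up rows standing in for the intent_predictions table
df = pd.DataFrame({
    "Category0_predictions": ["DESCRIPTION", "DESCRIPTION", "HUMAN"],
    "Clicks": [10, 5, 40],
    "Impressions": [1000, 500, 800],
})

by_intent = df.groupby("Category0_predictions")[["Clicks", "Impressions"]].sum()
by_intent["CTR"] = by_intent["Clicks"] / by_intent["Impressions"]

# Intents with many impressions but low CTR are content opportunities
print(by_intent.sort_values("CTR"))
```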
How the Learning Process Works
I want to cover this last foundational topic to expand the encoder/decoder idea I briefly covered in part one.
Let’s take a look at the charts below that help us visualize the training process.
But what exactly is happening here? How is the machine learning model able to perform the tasks we are training it on?
The first chart shows how the error/loss decreases with each training step (blue line).
But, more importantly, the error also decreases when the model is tested on “unseen” data. Then comes a point where no further improvements take place.
I like to think about this training process as removing noise/errors from the input by trial and error, until you are left with what is essential for the task at hand.
There is some random searching involved to learn what to remove and what to keep, but as the ideal output/behavior is known, the random search can be super selective and efficient.
Let’s say again that you want to drive from NYC to Washington and all the roads are covered with snow. The encoder, in this case, would play the role of a snowblower truck with the task of carving out a road for you.
It has the GPS coordinates of the destination and can use it to tell how far or close it is, but needs to figure out how to get there by intelligent trial and error. The decoder would be our car following the roads created by the snowblower for this trip.
If the snowblower moves too far south, it can tell it is going in the wrong direction because it is getting farther from the final GPS destination.
A Note on Overfitting
After the snowblower is done, it is tempting to just memorize all the turns required to get there, but that would make our trip inflexible in case we need to take detours and have no roads carved out for them.
So, memorizing is not good and is called overfitting in deep learning terms. Ideally, the snowblower would carve out more than one way to get to our destination.
In other words, we need as generalized routes as possible.
We accomplish this by holding out data during the training process.
We use testing and validation datasets to keep our models as generic as possible.
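The holdout idea in miniature, as a generic illustration with scikit-learn (Ludwig handles this splitting for you, so this is not part of the article's pipeline):

```python
from sklearn.model_selection import train_test_split

examples = list(range(100))

# Hold out 20% for the test set, then 20% of the remainder for validation
train, test = train_test_split(examples, test_size=0.2, random_state=42)
train, validation = train_test_split(train, test_size=0.2, random_state=42)

print(len(train), len(validation), len(test))  # 64 16 20
```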
A Note on Tensorflow for BigQuery
I tried to run our predictions directly from BigQuery, but hit a roadblock when I tried to import our trained model.
!bq query --use_legacy_sql=false "CREATE MODEL bert_intent_questions.BERT OPTIONS (MODEL_TYPE='TENSORFLOW', MODEL_PATH='gs://bert_intent_questions/*')"
BigQuery complained that the size of the model exceeded its limit.
Waiting on bqjob_r594b9ea2b1b7fe62_0000016c34e8b072_1 ... (0s) Current status: DONE
BigQuery error in query operation: Error processing job 'sturdy-now-248018:bqjob_r594b9ea2b1b7fe62_0000016c34e8b072_1': Error while reading data, error message: Total TensorFlow data size exceeds max allowed size; Total size is at least: 1319235047; Max allowed size is: 268435456
I reached out to their support and they offered some suggestions. I’m sharing them here in case someone finds the time to test them out.
Resources to Learn More
When I started taking deep learning classes, I didn’t see BERT or any of the latest state of the art neural network architectures.
However, the foundation I received has helped me pick up new concepts and ideas fairly quickly. One of the articles that I found most useful for learning about the new advances was this one: The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning).
I also found this one very useful: Paper Dissected: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” Explained, and this other one from the same publication: Paper Dissected: “XLNet: Generalized Autoregressive Pretraining for Language Understanding” Explained.
BERT has recently been beaten by a new model called XLNet. I am hoping to cover it in a future article when it becomes available in Ludwig.
The Python momentum in the SEO community continues to grow. Here are some examples:
Paul Shapiro brought Python to the MozCon stage earlier this month. He shared the scripts he discussed during his talk.
I was pleasantly surprised when I shared a code snippet on Twitter and Tyler Reardon, a fellow SEO, quickly spotted a bug I missed because he had created similar code independently.
Big shoutout to @TylerReardon who spotted a bug in my code pretty quickly! It is already fixed https://t.co/gvypIOVuBp
I thought I was comparing the IP from the log and the one from the DNS, but I was comparing the log IP twice! 😅
— Hamlet 🇩🇴 (@hamletbatista) July 25, 2019
Michael Weber shared his awesome ranking predictor that uses a multi-layer perceptron classifier, and Antoine Eripret shared a super valuable robots.txt change monitor!
I should also mention that JR contributed a very useful Python piece for opensource.com that shows practical use cases of the Google Natural Language API.
All screenshots taken by author, July 2019
How to increase organic traffic: 14 Practical tips
- Organic traffic is the best free way to attract visitors who already intend to make a deal. To increase it, you should conduct a thorough SEO analysis and take care of the quality of your site.
- You can get more organic visits if you develop a strategy, eliminate your site's technical errors, provide a good mobile version, perform correct external and internal optimization, optimize URLs, update the site content regularly, develop a blog with unique content, analyze competitors, and promote your site through social networks, press releases, newsjacking, emails, and messengers.
- The correct implementation of the above-mentioned tasks will provide a long-lasting result for you.
Ordinary users trust organic search results more than advertising and links marked as “ads”. Correctly performed optimization, troubleshooting, and the use of promising channels will quickly bring good results.
14 Practical tips to increase organic traffic
You can get organic visits using a set of working methods, tools, and recommendations. The best 14 ways are summarized in the review below.
1. Developing a strategy to increase organic traffic
The solution to any problem begins with the development of a strategy to leave room for financial and time planning. Strategy development is carried out in stages:
- You should set goals and objectives at first
- Then, identify weaknesses using a comprehensive site audit and analyze the competitive environment
- As the next step, you can eliminate identified errors and problems
- Also, you need to select priority methods to attract organic traffic and increase the position of the site in SERP
- Then, map out the work and budget, and prepare a content plan (the golden rule for a content plan is 60/30/10: 60% third-party content, 30% unique content, 10% advertising)
- If needed, select specialists and draw up technical tasks
- As the last step, perform the tasks and analyze the results using Google Analytics
Attracting organic traffic is a rather lengthy process, which distinguishes it from contextual advertising. Ads start working immediately after launch. However, organic traffic will keep working for a long time without any additional investment.
2. Elimination of technical errors of the site
You can identify and eliminate technical errors of the site using the following methods:
- Surface self-check
- Comprehensive site audit with the help of professionals
- Usage of paid and free services. It's the optimal solution for those who want to get a quick result with minimal financial investment. Services allow you to identify SEO errors and issues in other key areas. The best of them are Semrush, Ahrefs, and SEOptimizer
You should eliminate the identified errors yourself or with the help of professionals. It'll make your website more attractive to users and search engines. After the site audit, you can get rid of duplicates, speed up site loading, identify affiliates, and solve other problems.
3. Mobile version of the site
More than 65% of internet users prefer to select and order products from mobile devices. You can't afford to lose such a huge audience, so you should take care of your site's mobile version. It allows you to increase target audience coverage several times over and increase sales and subscriptions. You can create a separate mobile version or use an adaptive design for your main site. In the latter case, the layout will adjust automatically to the screens of different devices.
4. Correct external and internal website optimization
External optimization is aimed at obtaining links from third-party sites. External links that aren't blocked from indexing transfer part of the donor site's weight to the acceptor site. When working on building an external link mass, you should consider:
- Donor site trust, spam level of backlinks. The first index should be high, the second one – low
- Rules of posting links. It’s recommended to surround them with content
- Donor site topics (should be related)
- Frequency of placement. You should increase the link juice gradually. It’s especially important for young sites that have a low level of trust in search engines. A sharp increase can lead to the pessimization of the acceptor site
Internal optimization helps make the site relevant to the queries you are promoting for. It consists of:
- Keyword list collection
- Keywords grouping
- Preparing and publishing content optimized with LSI and SEO
- Formation and optimization of meta tags: title and description, headings and subheadings, image tags
- Creation of robots.txt files and sitemap.xml (if it’s not generated automatically)
- Interlinking and other related work
It's important to ensure that meta tags and content are supplemented with relevant keywords but not stuffed with them. Otherwise, you can fall under search engine filters.
5. URLs optimization
You can complement URLs with keywords. It makes them more understandable for website visitors. When optimizing URLs, it’s recommended:
- Use three to five relevant words; longer links will be cut off in the SERP
- Use hyphens rather than underscores
- Take into account spam indicators. Keywords from URLs are added to the overall frequency on the page
Optimized URLs look more attractive so visitors click on them more likely.
6. Regular content updates
Content updates are a rather important factor influencing ranking. This means updating previously posted materials as well as publishing new ones. It helps you keep pace, increases credibility, and has a positive effect on indexing.
You should carry out updates regularly following the content plan. It allows you to work with new keywords and attract organic traffic from search engines.
7. Developing a blog with unique content
A blog is a valuable resource for attracting organic traffic, not only for commercial but also for informational requests. People tend to research goods carefully before making a deal. A blog with interesting and relevant content increases the chances that, after reading a review, the visitor will perform the target action.
On the blog, you can publish news and informational materials, as well as infographics and video reviews – anything that can attract attention and encourage visitors to make a deal. When writing articles for the blog, you can include links to the catalog so that the client can immediately buy the product they like without spending time searching the site.
8. Expertise and uniqueness of the content
Using non-unique content is a losing proposition from the start. As a result, you can get a claim from the copyright holder. Therefore, it's necessary to create and optimize your own content, which will bring organic visits. This rule applies not only to texts but also to photos, pictures, and videos. If you publish someone else's content, you must obtain the permission of the copyright holder and link to the source.
There is one more caveat – expertise, which plays an important role in ranking. Search engines don't focus on optimization alone but on the semantic uniqueness and benefit that the site's content can bring to the visitor. The content should answer the question that the user enters in the search bar. If the materials contain outdated, uninteresting, or deliberately untruthful information, the visitor will leave the site. A growing number of such failures will hurt ranking.
9. Promotion in social networks
Social networks are an effective tool with which you can shape opinions and drive traffic to your website. You can create a group for communication with potential customers and publish announcements, information about promotions, discounts, assortment updates, and other content that encourages them to click on the link. Before starting a campaign on social networks, you need to analyze your competitors' groups and look at the situation through an ordinary user's eyes. If the posts are interesting, subscribers will start to like and share them. This will provide additional free advertising and reach.
10. Competitive analysis
To be first, you should know what your competitors are doing. To solve this problem, you need an audit, which will help:
- Define a keywords cluster
- Keep abreast of all events, updates and new products introduced by competitors
- Form advertising budgets and solve other strategic tasks
For audit, you can use online services, questionnaires, secret shoppers, newsletter subscription, analysis of social networks groups, and other tools. You can use the information you’ve got to improve and optimize your website.
11. Press releases on third party resources
Regular publication of press releases on popular sites will help to solve several problems. The first is traffic attraction; the second, external optimization. News site visitors click the links willingly. The only negative aspect is that it's difficult to place such publications. You should put in real effort to get a positive result in outreach and lead generation.
12. Using newsjacking
Newsjacking is a variety of guerrilla marketing that provides unobtrusive advertising. The advertising is presented against the backdrop of an important event rather than as the main focus. The plus is that users will visit the site both through search queries and through aggregators or news portals. The main rule is to link the offer to a genuinely interesting and important event. Otherwise, the tool will not work.
13. Email marketing setup
From year to year, newsletters demonstrate their effectiveness. They allow you not only to communicate with customers but also to receive visits to the site. To configure the newsletter, you must have your contact base. To collect the latter, you need to place a simple registration or subscription form on the site consisting of a minimum number of lines. After that, you can establish communication with customers, notifying them of promotions, catalog updates, and other important events.
14. Mailing in messengers
Mailing in messengers is similar to email. However, messages in Facebook Messenger, Snapchat, or WhatsApp have a higher open rate. A smartphone is always near its owner, so such messages are more familiar and convenient. Therefore, you should not ignore the potential of this channel. Before starting such mailing, it's necessary to ask clients whether they mind receiving advertising materials. Otherwise, the sender (you) may be blocked.
To round up
Correct external and internal optimization, work in social networks and messengers, competitive analysis, eliminating technical errors, and improving usability are the priority tasks for increasing organic traffic. You can perform some tasks on your own. Others will have to be entrusted to professionals. The correct implementation of these tasks will provide a long-lasting result: an increase in organic traffic and sales and an influx of hot customers.