

How to Uncover Powerful Data Stories with Python



There are many emotional and powerful stories hidden in gobs of data just waiting to be found.

When these stories get told, they have the power to change careers, businesses, and whole groups of people.

Take Whirlpool, for example. They discovered a socio-economic problem that they could leverage with their brand.

They mined data to find a social cause to align with and discovered that every day 4,000 students drop out of school because they cannot afford to keep their clothes clean.

Whirlpool donated washers and dryers to the schools with the most at-risk children and tracked attendance.

The brand found that 90% of these students had improved attendance rates and close to the same number of children had improved class participation. The campaign was so effective that it won a number of awards, including the Cannes Lions Grand Prix for Creative Data Collection and Research.

While big brands can afford to hire award-winning creative agencies that can produce campaigns like this one, for most small businesses, that is out of the question.

One way to get into the spotlight is to find powerful stories that remain undiscovered because of the gap between marketers and data scientists.

I introduced a simple framework for doing this, built around reframing already popular visualizations. The opportunity to reframe exists because marketers and developers operate in silos.


As a marketer, when you hand off a data project to a developer, the first thing they do is remove the context.

The developer’s job is to generalize. But when you get their results back, you need to add the context back in so you can personalize.

Without the user context, the developer is unable to ask the right questions that can lead to making strong emotional connections.

In this article, I’m going to walk you through one example to show you how you can come up with powerful visualizations and data stories by piggybacking on popular ones.

Here is our plan of action.

  • We are going to rebuild a popular data visualization from the subreddit Data is Beautiful.
  • We will collect data from public web pages (including some of it from moving charts).
  • We will reframe the visualization by asking different questions than the original author.

Our Reframed Visualization


This is what our reframed visualization looks like. It shows the best Disney rides ranked by how much fun they would be for different age groups.


This is the original one shared on Reddit. It shows the best Disney rides compared by how long they last and how long you need to wait in line.

Our Rebuilt Visualization


Our first step is to rebuild the original visualization shared in the subreddit. The data scientist shared the data sources he used, but not the code.

This gives us a great opportunity to learn how to scrape data and visualize it in Python.

I will share some code snippets as usual, but you can find all the code in this Google Colab notebook.

Extracting Our Source Data

The original visualization contains two datasets, one with the duration of the rides and another with their average wait time.

Let’s first collect the ride durations from this page.

We are going to complete these steps to extract the ride durations:

  1. Use Google Chrome to get an HTML DOM element selector with the ride durations.
  2. Use requests-html to extract the elements from the source page.
  3. Use a simple regular expression for duration numbers.
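Here is a minimal sketch of these steps. The page URL and the CSS selector are placeholders, because the article doesn't reproduce them. The regular expression assumes each selected element reads something like "Space Mountain: 3 minutes", so adapt both to the real page.

```python
import re

# Assumed text pattern: "<ride name>: <number> min..." e.g. "Space Mountain: 3 minutes".
DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)\s*min", re.IGNORECASE)


def parse_ride_duration(text):
    """Split an element's text into (ride name, minutes), or return None if no duration is found."""
    match = DURATION_RE.search(text)
    if match is None:
        return None
    # Everything before the number is treated as the ride name (an assumption about the page layout).
    name = text[: match.start()].strip(" :-\u2013\t\n")
    return name, float(match.group(1))


def fetch_ride_durations(url, selector):
    """Fetch `url` and return {ride name: minutes} for every element matching `selector`.

    `selector` is what Chrome DevTools gives you via right-click > Copy > Copy selector.
    """
    from requests_html import HTMLSession  # third-party: pip install requests-html

    response = HTMLSession().get(url)
    durations = {}
    for element in response.html.find(selector):
        parsed = parse_ride_duration(element.text)
        if parsed is not None:
            name, minutes = parsed
            durations[name] = minutes
    return durations
```

Keeping `parse_ride_duration` separate from the network call lets you check the parsing rule offline before pointing it at the live page.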


Next, we need to collect the average wait times from this page.


This is a more challenging extraction because the data we want is in the moving charts.

We are going to complete these steps to extract the average wait times:

  1. Use requests-html to extract the JavaScript snippets from the source page.
  2. Use regular expressions to extract the data rows from the JavaScript code and also the ride name/title of the chart.
  3. Use a Jinja2 template to stitch together a custom JavaScript function that returns the values we extracted in step 2.
  4. Use Py_mini_racer to execute the custom JavaScript function and get the data in Python format.
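The following sketch covers steps 1 and 2. Purely for illustration, it assumes each chart is drawn with Google Charts-style code, meaning a `data.addRows([...])` call plus a `title:` option. It also assumes the rows contain JavaScript expressions such as `new Date(...)`, which is exactly what stops a plain `json.loads` from working. Check the real page source and adjust the two delimiter patterns to match.

```python
import re

# Assumed delimiters (Google Charts style); verify against the real page source.
ROWS_RE = re.compile(r"data\.addRows\((\[.*?\])\);", re.DOTALL)
TITLE_RE = re.compile(r"title\s*:\s*['\"](.*?)['\"]")


def fetch_chart_scripts(url, marker="addRows"):
    """Step 1: return the source of every <script> on `url` that contains `marker`."""
    from requests_html import HTMLSession  # third-party: pip install requests-html

    response = HTMLSession().get(url)
    scripts = (script.full_text for script in response.html.find("script"))
    return [source for source in scripts if marker in source]


def extract_chart_fragments(script_text):
    """Step 2: return (title, rows_js) from one chart script, or None if a piece is missing.

    rows_js is left as JavaScript source because it may contain expressions
    like `new Date(...)`; it gets evaluated later instead of parsed as JSON.
    """
    rows = ROWS_RE.search(script_text)
    title = TITLE_RE.search(script_text)
    if rows is None or title is None:
        return None
    return title.group(1), rows.group(1)
```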

In order to convert the JavaScript data embedded in the charts to Python, we are going to perform a clever trick.

We are going to stitch together JavaScript functions using fragments of the code we are scraping.

We will use delimiters to define which fragments we will extract, and a Jinja2 template to combine them into a JavaScript function that runs correctly. The function will return a dictionary with the wait-time data for each ride.
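Here is a minimal sketch of the stitching step. It assumes the fragments were cut out as a chart title and a JavaScript array literal of `[timestamp, wait]` rows. The template and the `extractChartData` function name are hypothetical. The `tojson` filter quotes the title safely, while the rows are pasted in verbatim because they are already JavaScript.

```python
# Hypothetical template that turns scraped fragments into one callable JS function.
JS_TEMPLATE = """
function extractChartData() {
  var rows = {{ rows_js }};
  // Convert Date objects to ISO strings so the result is JSON-friendly.
  var data = rows.map(function (row) {
    return [row[0] instanceof Date ? row[0].toISOString() : row[0], row[1]];
  });
  return {title: {{ title | tojson }}, rows: data};
}
"""


def stitch_function(title, rows_js):
    """Render JS_TEMPLATE with the scraped chart title and rows source."""
    from jinja2 import Template  # third-party: pip install Jinja2

    # Template() has autoescaping off by default, so rows_js is inserted unchanged.
    return Template(JS_TEMPLATE).render(title=title, rows_js=rows_js)
```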

We will execute such functions using an obscure library called Py_mini_racer. That library runs JavaScript code from Python, returning Python objects that we can use.
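Evaluating a stitched function could look like the sketch below. It assumes the `py_mini_racer` package (PyPI: `py-mini-racer`) is installed. The call is wrapped in `JSON.stringify` so Python always receives a plain string; that JSON round-trip is our choice here, not necessarily the author's.

```python
import json


def wrap_for_json(js_source, function_name):
    """Append a call that serializes the function's result to a JSON string."""
    return f"{js_source}\nJSON.stringify({function_name}());"


def run_js_function(js_source, function_name):
    """Evaluate `js_source` in V8 via py_mini_racer and return the function's result as Python objects."""
    try:
        from py_mini_racer import MiniRacer  # newer releases export it at package level
    except ImportError:
        from py_mini_racer.py_mini_racer import MiniRacer  # older releases
    ctx = MiniRacer()
    # eval() returns the value of the last expression: the JSON string.
    return json.loads(ctx.eval(wrap_for_json(js_source, function_name)))
```

For example, `run_js_function("function f() { return {a: 1}; }", "f")` should give back the Python dict `{"a": 1}`.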

I tried to use PyV8, a Python wrapper for Google’s V8 JavaScript engine, but couldn’t get it to work. It seems the project has been abandoned.

Now, we have the two datasets we need to produce our chart, but there is some processing we need to do first.

Processing Our Source Data

We need to combine the datasets we scraped, clean them up, calculate averages, and so on.

We are going to complete these steps:

  1. Split the extracted dataset into two Python dictionaries: one with the timestamps and one with the wait times per ride.
  2. Filter out rides with fewer than 64 data points to keep the same number of data rows per ride.
  3. Calculate the average wait time per ride.
  4. Combine average wait time per ride and ride duration into one data frame.
  5. Eliminate rows with empty columns.
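These processing steps can be sketched with plain Python dictionaries. The input shapes are assumptions about what the scraping stage returns: `{ride: [[timestamp, wait_minutes], ...]}` for wait times and `{ride: minutes}` for durations. The column names are also assumed.

```python
MIN_POINTS = 64  # rides with fewer samples than this are dropped (threshold from the article)


def split_series(extracted):
    """Step 1: split {ride: [[timestamp, wait], ...]} into a timestamps dict and a waits dict."""
    timestamps, waits = {}, {}
    for ride, rows in extracted.items():
        timestamps[ride] = [row[0] for row in rows]
        waits[ride] = [row[1] for row in rows]
    return timestamps, waits


def average_waits(waits, min_points=MIN_POINTS):
    """Steps 2-3: drop short series, then average the remaining wait times per ride."""
    return {
        ride: sum(values) / len(values)
        for ride, values in waits.items()
        if len(values) >= min_points
    }


def combine(avg_waits, durations):
    """Steps 4-5: join on ride name, keeping only rides that have both values."""
    rows = []
    for ride, avg_wait in avg_waits.items():
        duration = durations.get(ride)
        if duration is None:
            continue  # same effect as dropping rows with empty columns
        rows.append({"Ride": ride, "Average Wait Time": avg_wait, "Ride Duration": duration})
    return rows
```

Passing the result to `pandas.DataFrame(...)` produces the data frame. With pandas, you could instead merge the two tables and call `dropna()` for step 5.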

Here is what the final data frame looks like.


Visualizing Our Data

We are almost at the finish line. In this step, we get to do the fun part: visualizing the data frame we created.

We are going to complete these steps:

  1. Convert the pandas data frame to a row-oriented dictionary. The X-axis is the Average Wait Time, the Y-axis is the Ride Duration, and the label is the Ride name.
  2. Use Plotly to generate a labeled scatter plot.

You need to manually drag the labels around to make them more legible.
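Here is a sketch of both steps using Plotly's `graph_objects` API. Drawing one annotation per ride, rather than marker text, is our assumption; we chose it because Plotly lets you drag annotations in the browser. The column names `Ride`, `Average Wait Time`, and `Ride Duration` are also assumed.

```python
def scatter_series(records):
    """Turn row-oriented records (e.g. df.to_dict("records")) into parallel x, y, label lists."""
    xs = [row["Average Wait Time"] for row in records]
    ys = [row["Ride Duration"] for row in records]
    labels = [row["Ride"] for row in records]
    return xs, ys, labels


def labeled_scatter(records):
    """Scatter plot with one annotation per ride."""
    import plotly.graph_objects as go  # third-party: pip install plotly

    xs, ys, labels = scatter_series(records)
    fig = go.Figure(go.Scatter(x=xs, y=ys, mode="markers"))
    for x, y, label in zip(xs, ys, labels):
        fig.add_annotation(x=x, y=y, text=label, showarrow=True)
    fig.update_layout(
        xaxis_title="Average Wait Time (minutes)",
        yaxis_title="Ride Duration (minutes)",
    )
    return fig
```

Calling `labeled_scatter(df.to_dict("records")).show(config={"edits": {"annotationPosition": True, "annotationTail": True}})` renders the chart with draggable labels, so you can move them around until they are legible.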


We finally have a visualization that closely resembles the original one we found on Reddit.

In our final step, we will produce an original visualization built from the same data we collected for this one.

Reframing Our Data

Rebuilding the original visualization took serious work, but so far we haven't produced anything new. We will address that in this final section.

The original visualization lacked an emotional hook. What if the rides are not fun for me?

We will pull an additional dataset: the ratings per ride by different age groups. This will help us visualize not just the best rides with the shortest wait times, but also which ones would be the most fun for a particular age group.

We are going to complete these steps to reframe the original visualization:

  1. We want to know which age groups will have the most fun per ride.
  2. We will fetch the average ride ratings per age group from
  3. We will calculate an “Enjoyment Score” per ride and age group, which is the number of minutes per ride divided by average minutes of wait time.
  4. We will use Plotly to display a bar chart with the results.


This is the page with our extra data.

We scrape it just like we pulled the ride durations.

Let’s summarize the original data frame using a new metric: an Enjoyment Score. 🙂

We define it as the ride duration divided by the average wait time. The bigger the number, the more fun we should have, because we spend less time waiting in line.
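The Enjoyment Score is simple arithmetic: minutes of ride per minute spent in line. A 10-minute ride with a 20-minute average wait scores 10 / 20 = 0.5, while a 3-minute ride with a 60-minute wait scores 3 / 60 = 0.05. Here is a minimal sketch, assuming rows with `Ride Duration` and `Average Wait Time` keys in minutes:

```python
def enjoyment_score(duration_min, avg_wait_min):
    """Minutes of ride per minute of waiting; higher means more fun per minute in line."""
    if avg_wait_min <= 0:
        raise ValueError("average wait time must be positive")
    return duration_min / avg_wait_min


def add_enjoyment_scores(rows):
    """Return copies of `rows` with an "Enjoyment Score" key, best first."""
    scored = [
        {**row, "Enjoyment Score": enjoyment_score(row["Ride Duration"], row["Average Wait Time"])}
        for row in rows
    ]
    return sorted(scored, key=lambda row: row["Enjoyment Score"], reverse=True)
```

With pandas, the same column is `df["Ride Duration"] / df["Average Wait Time"]`.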

This is what the updated data frame looks like with our new Enjoyment Score metric.


Now, let’s visualize it.
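A grouped bar chart is one way to draw this. The sketch assumes the scores are already in a nested dictionary, `{age_group: {ride: score}}`. The article doesn't spell out how the age-group ratings enter the score, so the function simply takes the scores as given. The group names and the ride ordering (by each ride's best score) are our assumptions.

```python
def ride_order(scores_by_group):
    """Order rides by their best score in any age group (highest first, ties by name)."""
    rides = {ride for scores in scores_by_group.values() for ride in scores}
    best = {
        ride: max(scores.get(ride, 0.0) for scores in scores_by_group.values())
        for ride in rides
    }
    return sorted(rides, key=lambda ride: (-best[ride], ride))


def grouped_bar_chart(scores_by_group):
    """One bar trace per age group, rides on the x-axis."""
    import plotly.graph_objects as go  # third-party: pip install plotly

    rides = ride_order(scores_by_group)
    fig = go.Figure([
        go.Bar(name=group, x=rides, y=[scores.get(ride, 0.0) for ride in rides])
        for group, scores in scores_by_group.items()
    ])
    fig.update_layout(barmode="group", yaxis_title="Enjoyment Score")
    return fig
```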

Finally, we get this beautiful and super valuable visualization.


Resources & Community Projects

Last January, I received an email that kickstarted my “Python crusade”. Braintree had rejected RankSense’s application for a merchant account because they saw SEO as a high-risk category.

Right next to fortune tellers, mail-order brides and “get rich quick” schemes!

We had worked on the integration for three weeks. I felt really mad and embarrassed.

I had been enjoying my time in the data science and AI community last year. I was learning a lot of cool stuff and having fun.

I’ve been in the SEO space for probably too long. Sadly, my generation made the big mistake of letting speculation and magic tricks rule the perception of what SEO is about.

As a result of this, too many businesses have fallen prey to charlatans.

I had the choice to leave the SEO community or try to encourage the new generation to drive change so our community could be a fun and proud place to be.

I decided to stay, but I was afraid that trying to drive change by myself with minimal social presence would be impossible.

Fortunately, I watched this powerful video, wrote this sort of manifesto, and put my head down to write practical Python articles every month.

I’m excited to see that in less than six months, Python is everywhere in the SEO community and the momentum keeps growing.

I’m really excited about our community and the brilliant future ahead.

Now, let me continue to shine a light on the awesome projects our community churns out each month. It’s so exciting to see more people joining the Python bandwagon. 🐍 🔥

Tyler shared a project to auto-generate meta descriptions using a TextRank summarizer.

Hugo shared his first script that automates exporting SEMrush reports.

Jeffrey is working on an AI tool to break the writer’s block and open-sourced his Python backend.

Charly is working on a URL translator and classifier.


Image Credits

All screenshots taken by author, October 2019
In-post images: Provided by author


Google Search Console unparsable structured data report data issue



Google has informed us that you may see a spike in errors in the unparsable structured data report within Google Search Console. This is a bug in the reporting system and you do not need to worry. The issue happened between January 13, 2020 and January 16, 2020.

The bug. Google wrote on the data anomalies page “Some users may see a spike in unparsable structured data errors. This was due to an internal misconfiguration that will be fixed soon, and can be ignored.” This was dated January 13, 2020 through January 16, 2020.

To be fixed. Google said they will fix the issue with the internal misconfiguration. It is, however, unclear whether the data will be fixed or whether you will still see a spike in those errors for that date range.

Unparsable structured data report. The unparsable structured data report is accessible within Google Search Console. The report aggregates structured data syntax errors: it collects all the parsing issues, including structured data syntax errors, that specifically prevented Google from identifying the feature type.

Why we care. The main thing here is that if you see a spike in errors in that report between January 13th and 16th, do not worry. It is a bug with the report and not an issue with your website. Go back to the report in a few days and make sure that you do not see errors occurring after the 17th of January to be sure you have no technical issues.

About The Author

Barry Schwartz is a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a NY-based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics. Barry’s personal blog is named Cartoon Barry and he can be followed on Twitter here.



Google rolls out organic ‘Popular Products’ listings in mobile search results



Several years ago now, Google made the significant move to turn product search listings into an entirely paid product. Shopping campaigns, as they’re now called, have accounted for an increasing share of retail search budgets ever since. More recently, however, Google has been augmenting organic search results with product listings. It’s in a product search battle with Amazon, after all. On Thursday, the company announced the official rollout of “Popular Products” for apparel, shoe and similar searches in mobile results.

Organic product listings. Google has been experimenting with ways to surface product listings in organic search results, including Popular Products, which has been spotted for several months now. The section is powered by those organic feeds. Google says it identifies popular products from merchants to show them in a single spot, allowing users to filter by style, department and size type. The listings link to the retailers’ websites.

Popular Products is now live in Google mobile search results.

Why we care. This is part of a broader effort by Google to enhance product search experiences as it faces increasing competition from Amazon and other marketplaces as well as social platforms. Earlier this week, Google announced it has acquired Pointy, a hardware solution for capturing product and inventory data from small local merchants that can then be used in search results (and ads).

In the past few years, Google has also prompted retailers to adopt product schema markup on their sites by adding support for it in Search and Image search results. Then last spring, Google opened up Merchant Center to all retailers, regardless of whether they were running Shopping campaigns. Any retailer can submit their feed in real time to Google to make their products eligible to appear in search results.

Ad revenue was certainly at the heart of the shift to paid product listings, but before the move, product search on Google was often a terrible user experience, with search listings frequently not matching what was on the landing page, from availability to pricing to even the product itself. The move to a paid solution imposed quality standards that forced merchants to clean up their product data and provide it to Google in a structured manner in the form of product feeds through Google Merchant Center.

About The Author

Ginny Marvin is Third Door Media’s Editor-in-Chief, running the day to day editorial operations across all publications and overseeing paid media coverage. Ginny Marvin writes about paid digital advertising and analytics news and trends for Search Engine Land, Marketing Land and MarTech Today. With more than 15 years of marketing experience, Ginny has held both in-house and agency management positions. She can be found on Twitter as @ginnymarvin.



Google buys Pointy to bring SMB store inventory online



Google is acquiring Irish startup Pointy, the companies announced Tuesday. Pointy has solved a problem that vexed startups for more than a decade: how to bring small, independent retailer inventory online.

The terms of the deal were not disclosed, but Pointy had raised less than $20 million so it probably wasn’t an expensive buy for Google. But it could have a significant impact for the future of product search.

Complements local inventory feeds. This acquisition will help Google offer more local inventory data in Google My Business (GMB) listings, knowledge panels and ads especially. It complements Google Shopping Campaigns’ local inventory ads, which are largely utilized by enterprise merchants and first launched in 2013.

Numerous companies over the last decade tried to solve the challenge of how to bring small business product inventory online. However, most failed because the majority of SMB retailers lack sophisticated inventory management systems that can generate product feeds and integrate with APIs.

Pointy POS hardware

Source: Pointy

How Pointy works. The company created a simple way to get local store inventory online and then showcase that inventory in organic search results or paid search ads. It utilizes a low-cost hardware device that attaches to a point-of-sale barcode scanner (see image above). It’s compatible with multiple other POS systems, including Square.

Once the device is installed, it captures every product sold by the merchant and then creates a digital record of products, which can be pushed out in paid or organic results. (The company also helps small retailers set up local inventory ads using the data.) Pointy also creates local inventory pages for each store and product, which are optimized and can rank for product searches.

Pointy doesn’t actually understand real-time inventory. Cleverly, however, it uses machine learning algorithms to estimate this by measuring product purchase frequency. The system assumes local retailers are going to stock frequently purchased items. That’s an oversimplification, but is essentially how it works.

Pointy said in a blog post that it “serve[s] local retailers in almost every city and every town in the U.S. and throughout Ireland.”

Why we care. The Pointy acquisition will likely help Google in at least three ways:

  • Provide more structured, local inventory data for consumers to find in Search.
  • Generate more advertising revenue over time from independent retailers.
  • Help Google more effectively compete with Amazon in product search.

Notwithstanding the fact that e-commerce outperformed traditional retail over the holidays, most people spend the bulk of their shopping budgets offline and prefer to shop locally. Indeed, Generation Z prefers to shop in stores, according to an A.T. Kearney survey.

One of the reasons people shop at Amazon is that they can find the products they’re looking for. They often don’t know where to find a particular product locally. But the more inventory data becomes available, the more people may opt to buy from local stores instead.

About The Author

Greg Sterling is a Contributing Editor at Search Engine Land. He writes about the connections between digital and offline commerce. He previously held leadership roles at LSA, The Kelsey Group and TechTV. Follow him on Twitter or find him on LinkedIn.



Copyright © 2019 Plolu.