Blog

At Sisense, we continuously help people build, diagnose, and update their data. When approaching a table of data for the first time, it helps our understanding to see summarizing statistics.

By looking at such staples as the range, sum, and average, you can get a quick glance at the shape and distribution of the data before taking a deep dive.

Here we will demonstrate how we append this data to our SQL tables. ​

Understanding Blog Post Metrics

Whenever we write a blog post, we want to understand if readers are visiting our homepage and signing up for a trial. We have tables blog_pings, homepage_pings, and signup_pings that let us observe when users visit the homepage after reading a blog post.

We want to count visitors to the blog even if they did not visit the homepage. Since blog_pings includes everyone in homepage_pings, which in turn includes all of signup_pings, we use left join to merge these records without losing any of the rows.

We want to count visitors to the blog even if they did not visit the homepage. Since blog_pings includes everyone in homepage_pings, which in turn includes all of signup_pings, we use left join to merge these records without losing any of the rows.

select post , count(distinct blog_pings.cookie) as blog , count(distinct homepage_pings.cookie) as homepage , count(distinct signup_pings.cookie) as signups from blog_pings left join homepage_pings on blog_pings.cookie = homepage_pings.cookie left join signup_pings on blog_pings.cookie = signup_pings.cookie group by 1

Running this query gives us the counts of visitors who visited the blog page, the home page, and those who signed up for a trial.

This table makes it very easy to analyze the conversion rates of the most active and least active blog posts, but not how an average post performed just by scrolling through the table. We can add the average and standard deviation as headers to the table to guide the analyst when checking up on a blog post.

First, we bundle the previous query into a temporary table using with blog_signups as (…)

with blog_signups as ( select post , count(distinct blog_pings.cookie) as blog , count(distinct homepage_pings.cookie) as homepage , count(distinct signup_pings.cookie) as signups from blog_pings left join homepage_pings on blog_pings.cookie = homepage_pings.cookie left join signup_pings on blog_pings.cookie = signup_pings.cookie group by 1 )

Now we can pull statistics like the total, mean, and standard deviation from blog_signups.

select 'Total', sum(blog), sum(homepage), sum(signups) from blog_signups union select 'Average', avg(blog), avg(homepage), avg(signups) from blog_signups union select 'Standard Deviation', stddev(blog), stddev(homepage), stddev(signups) from blog_signups

Having these statistics is great, but it becomes tedious to check multiple data sources. We can address this by simply folding our statistics into the main table by unioning them together.

select 'Total', sum(blog), sum(homepage), sum(signups) from blog_signups union select 'Average', avg(blog), avg(homepage), avg(signups) from blog_signups union select 'Standard Deviation', stddev(blog), stddev(homepage), stddev(signups) from blog_signups union select * from blog_signups

This will make sorting the table difficult, since the Total, Average, and Standard Deviation are on either ends of the range of the data. To help us in sorting the data, we can add an index column that we will use as a primary sorting key. First, we add an index to blog_signups:

with blog_signups as ( select 1 as index , post , count(distinct blog_pings.cookie) as blog , count(distinct homepage_pings.cookie) as homepage , count(distinct signup_pings.cookie) as signups from blog_pings left join homepage_pings on blog_pings.cookie = homepage_pings.cookie left join signup_pings on blog_pings.cookie = signup_pings.cookie group by 1, 2 )

Then we index the statistics in our order preference:

select 4, 'Total', sum(blog), sum(homepage), sum(signups) from blog_signups union select 3, 'Average', avg(blog), avg(homepage), avg(signups) from blog_signups union select 2, 'Standard Deviation', stddev(blog), stddev(homepage), stddev(signups) from blog_signups

Now we wrap this entire query into another with block which we call signup_statistics:

with signup_statistics as ( with blog_signups as (...) select 4, 'Total', sum(blog), sum(homepage), sum(signups) from blog_signups union select 3, 'Average', avg(blog), avg(homepage), avg(signups) from blog_signups union select 2, 'Standard Deviation', stddev(blog), stddev(homepage), stddev(signups) from blog_signups union select * from blog_signups ) select post , blog , homepage , signups from signup_statistics order by index desc , signups desc

And there we have it! Now whenever an author wants to check up on their post, they will be able to more easily understand how their post fared against the distribution.