In order to use BigQuery you need to create a project in the development console and the open one of the tables in the public dataset.

Without signing up to a free trial, or without signing up for a paid plan BigQuery will let you run a certain number of search queries per day, but even with that limitation in place, this is still quite useful.

I thought about finding subreddits where users feel more excited about the posts we're interested in than what they usually see on that subreddit. However, that caused the most relevant subs to fall beneath the average (probably because users of the most relevant sub are not so excited about posts on the most relevant topic in that subreddit).

Another way to rank them is: higher ranking for subreddits with low posts/subscribers ratio since the time a post stays on the first page is longer (because it doesn't get pushed down so fast by new posts) and it gets viewed more (since there are many subscribers).

SELECT F.*, CONCAT("http://reddit.com/r/", F.display_name) AS sub_url FROM ( -- delta_over_average measures how much above the subreddit average SELECT E.display_name AS display_name, E.all_posts_cnt AS all_posts_cnt, E.matched_posts_cnt AS matched_posts_cnt, E.matched_avg_vote AS matched_avg_vote, E.all_avg_vote AS all_avg_vote, E.subscribers AS subscribers, (E.all_posts_cnt/E.subscribers) AS posts_subscribers_ratio, (((E.matched_avg_vote - E.all_avg_vote)/E.all_avg_vote)*100.0) AS delta_over_average FROM ( SELECT subs.display_name AS display_name, matching.cnt AS matched_posts_cnt, subreddit_counts.cnt AS all_posts_cnt, matching.matched_avg_vote as matched_avg_vote, subreddit_counts.all_avg_vote as all_avg_vote, subs.subscribers as subscribers FROM ( -- find all subreddits that have posts matching the keywords -- and stats for each of those subreddits SELECT subreddit, COUNT (id) AS cnt, AVG (ups - downs) AS matched_avg_vote FROM [fh-bigquery:reddit_posts.full_corpus_201509] WHERE REGEXP_MATCH( LOWER (selftext), '(postgis|qgis|openstreetmap|spatial query|spatial index)' ) GROUP BY subreddit ) matching JOIN ( -- all subreddits and stats for each SELECT subreddit, COUNT (id) as cnt, AVG (ups - downs) AS all_avg_vote FROM [fh-bigquery:reddit_posts.full_corpus_201509] GROUP BY subreddit ) subreddit_counts ON matching.subreddit = subreddit_counts.subreddit JOIN ( -- all subreddits with a minimum number of subscribers SELECT * FROM [fh-bigquery:reddit.subreddits_201509] WHERE subscribers > 100 ) subs -- make sure the post is from that subreddit ON subreddit_counts.subreddit = subs.display_name ) E WHERE E.matched_posts_cnt > 10 ---- uncommenting this would only select subreddits where the matching posts ---- are voted better than the average post ---- for that subreddit -- AND E.matched_avg_vote >= E.all_avg_vote ) F WHERE F.posts_subscribers_ratio > 0 AND F.all_posts_cnt > 50 -- ORDER BY F.posts_subscribers_ratio ASC; ORDER BY F.delta_over_average DESC ;

Something more useful would also use some insight from Reddit's hot formula but I'm looking for a low cost solution (on top of this impediment, you would also need the vote data table in the reddit dataset, which is currently unavailable in the sample bigquery datasets, however.. a full data dump is available here).

There's two problems that this query does not address: