Learn more about Google’s serverless, highly scalable, and lightning-fast data warehouse here: BigQuery.

In this article, we will look at how to access and query data stored in BigQuery using R and Python.

Using R

We will be using the bigrquery package created by Hadley Wickham, which provides a simple, easy-to-use interface to Google’s BigQuery API. Go ahead and install the package if you haven’t already.

#Install and load packages

install.packages("bigrquery")

library(bigrquery)

Before you can start querying, you need to authorize bigrquery to access your BigQuery projects. The easiest way to do this is with the bq_auth() function, which opens a browser where you can sign in to your Google account and grant permissions. By default, these credentials are cached in ~/.R/gargle/gargle-oauth under your home folder, so you don’t have to repeat this every time.

The downside to this method is that it requires user interaction to perform the authentication. If you need to authenticate without user interaction, you can create a service account on GCP and use a service token instead. This approach is preferable, especially if you are building applications or running workloads on virtual machines.
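As a side note not covered in the steps below: Google’s client libraries can also pick up a service-account key automatically through the GOOGLE_APPLICATION_CREDENTIALS environment variable (known as Application Default Credentials), which avoids hard-coding the key path in your code. A minimal sketch, where the path is a placeholder:

```python
import os

# Application Default Credentials: Google client libraries look up this
# environment variable and load the service-account key it points to.
# The path below is a placeholder, not a real key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/service-account.json"

print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```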

Creating a Service Account on GCP

1. Log in to your Google Cloud Console and navigate to “IAM & admin”.

2. Go to “Service Accounts” from the navigation menu on the left and click “Create Service Account”.

3. Fill in the “Service Account Name” and “Description”, then click “Create”.

4. For the service account to be able to access BigQuery datasets, grant it the “BigQuery Admin” role and click “Continue”. Optionally, you can also grant your personal account access to manage this service account.

5. Click “Create Key”, select “JSON”, and click “Create”. The service account key is downloaded to your local disk; this is the file you will use to authenticate. Finally, click “Done”.
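The downloaded key is an ordinary JSON file. As a rough sketch of its shape (the values below are fabricated placeholders, not a real key; real keys also contain a "private_key" field and token URIs), you can inspect it with Python’s standard json module, for example to read the project id:

```python
import json

# Fake stand-in for a downloaded service-account key file
fake_key = {
    "type": "service_account",
    "project_id": "my-gcp-project",
    "client_email": "bq-reader@my-gcp-project.iam.gserviceaccount.com",
}

with open("fake-service-account.json", "w") as f:
    json.dump(fake_key, f)

# Reading the project id back out of the key file
with open("fake-service-account.json") as f:
    project_id = json.load(f)["project_id"]

print(project_id)  # my-gcp-project
```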

Now that we have the service token, let’s go back to R and use it to authenticate. We use the same function as above, but point it to the JSON file that contains the credentials.

#Authentication

bq_auth(path = "/path/to/your/service-account.json")

Now let’s put everything together and see how to query a dataset.

#Install and load packages
install.packages("bigrquery")
library(bigrquery)

#Authenticate
bq_auth(path = "/path/to/your/service-account.json")

#Querying
project = "your project name"
sql = '
SELECT year, count(*)
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year
'
query_res <- bq_table_download(bq_project_query(project, sql))
query_res

Using Python

Google provides client libraries for many of its services, and we will be using the official BigQuery client library for Python. Install the following packages if you don’t have them already; you can even install them directly from a Jupyter notebook, as shown below.

#Install and load packages
%pip install google-cloud
%pip install google-cloud-bigquery

from google.cloud import bigquery
from google.oauth2 import service_account #Needed for authentication

To authenticate, we will use the same JSON service account key containing the credentials that we downloaded above.

Now let’s put everything together and see how to query a dataset.

#Install and load packages
%pip install google-cloud
%pip install google-cloud-bigquery
%pip install pandas

from google.cloud import bigquery
from google.oauth2 import service_account #Needed for authentication
import pandas as pd

#Authentication
key_path = "/path/to/your/service-account.json"
credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

#Querying
sql = """
SELECT year, count(*) as num_births
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year
LIMIT 10
"""
client = bigquery.Client(credentials=credentials,
                         project=credentials.project_id)
query_res = client.query(sql)
query_res_df = query_res.result().to_dataframe()
query_res_df.head(10)
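If you want intuition for what the query itself computes: GROUP BY year with count(*) simply tallies rows per year. Here is the same aggregation done in pure Python on a tiny made-up sample (the years and counts are illustrative, not real natality data):

```python
from collections import Counter

# Made-up sample of birth-record years standing in for the natality table
sample_years = [1969, 1969, 1970, 1970, 1970, 1971]

# GROUP BY year, count(*) -> number of rows per year
num_births = Counter(sample_years)

# ORDER BY year
result = sorted(num_births.items())
print(result)  # [(1969, 2), (1970, 3), (1971, 1)]
```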

And that’s it. Happy querying!