Today I was playing with a report in PowerBI when I got the idea of exporting data to BigQuery from PowerQuery. Let me tell you something: it is very easy and it works rather well. PowerQuery is an amazing technology (and it is free).

In PowerBI, you can export from R or Python visuals, but there is a limit of 150K rows. If you use PowerQuery instead, there is no such limit (I tried with a table of 23 million records and it worked).

Here is the code using Python, but you can do the same with R:

import pandas as pd
import os
from google.cloud import bigquery

# Power Query passes the current table to the script as a DataFrame named `dataset`.
# Coerce the columns to the types expected by the BigQuery schema below.
dataset['SETTLEMENTDATE'] = pd.to_datetime(dataset['SETTLEMENTDATE'])
dataset['INITIALMW'] = pd.to_numeric(dataset['INITIALMW'])

# Point the client at the service-account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:/BigQuery/test-990c2f64d86d.json"
client = bigquery.Client()

# Destination: dataset 'work', table 'test'.
dataset_ref = client.dataset('work')
table_ref = dataset_ref.table('test')

# Overwrite the destination table on every run.
job_config = bigquery.LoadJobConfig()
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.schema = [
    bigquery.SchemaField("SETTLEMENTDATE", "TIMESTAMP"),
    bigquery.SchemaField("DUID", "STRING"),
    bigquery.SchemaField("INITIALMW", "FLOAT"),
    bigquery.SchemaField("UNIT", "STRING"),
]

job = client.load_table_from_dataframe(dataset, table_ref, job_config=job_config)
job.result()  # waits for the table load to complete
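Those two `to_datetime`/`to_numeric` lines matter because Power Query often hands the script text-typed columns, which the BigQuery client cannot map to TIMESTAMP or FLOAT. A standalone sketch of the coercion (the sample values here are made up for illustration):

```python
import pandas as pd

# Power Query can pass columns through as strings, so coerce explicitly
# before handing the frame to the BigQuery client (sample data made up).
dataset = pd.DataFrame({
    "SETTLEMENTDATE": ["2020-01-01 00:05:00", "2020-01-01 00:10:00"],
    "INITIALMW": ["101.5", "98.25"],
})
dataset["SETTLEMENTDATE"] = pd.to_datetime(dataset["SETTLEMENTDATE"])
dataset["INITIALMW"] = pd.to_numeric(dataset["INITIALMW"])
```

After the coercion, the dtypes line up with the TIMESTAMP and FLOAT schema fields declared in the load job.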

Interestingly, after the Python step we get a table back in PowerQuery; simply expand it.
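The table you expand is whatever DataFrame is left in scope when the script finishes. A minimal sketch, assuming you would rather return a small one-row summary than echo the full data back (the `status` frame is my own addition, not part of the script above):

```python
import pandas as pd

# Stand-in for the DataFrame Power Query passes in (values made up).
dataset = pd.DataFrame({"DUID": ["A", "B"], "INITIALMW": [1.5, 2.5]})

# Any DataFrame still in scope after the script runs shows up as an
# expandable table in Power Query; a tiny summary keeps the step cheap.
status = pd.DataFrame({"rows_loaded": [len(dataset)]})
```

Expanding `status` in the query editor then gives you a quick confirmation of how many rows were sent.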

Here is the total row count of the table in PowerBI:

And the results in BigQuery:

OK, a PowerQuery flow can execute multiple times per refresh (a bit of black magic that only a handful of people know about), but in this case it does not matter: the BigQuery job truncates the table on every run, so there is no risk of data duplication.

You may ask why do this when there are plenty of data preparation tools that natively support BigQuery. Based on my own experience, most of my data sources are Excel files, and PowerQuery is just very powerful and versatile, especially when you deal with "dirty" formats.