Editor's note: Join Felipe Hoffa and Michael Manoochehri tomorrow, September 6th at 11:00AM PST on Google Developers Live where they'll discuss correlation with BigQuery.





Google BigQuery is designed to make it easy to analyze large amounts of data quickly. We are always looking into how to make BigQuery even more powerful, so today we'll introduce a feature that we couldn't wait to share with you: Pearson correlation

SELECT CORR(a.data, b.data) corr, a.room room, count(*) c FROM ( SELECT TIME(USEC_TO_TIMESTAMP(INTEGER(Timestamp / 60000000) * 60000000)) time, AVG(DATA) data, room FROM [io_sensor_data.moscone_io13] WHERE DATE(USEC_TO_TIMESTAMP(Timestamp- 8*60*60000000)) = '2013-05-16' AND sensortype='temperature' GROUP EACH BY time, room) a JOIN EACH ( SELECT TIME(USEC_TO_TIMESTAMP(INTEGER(Timestamp / 60000000) * 60000000)) time, AVG(data) data, room FROM [io_sensor_data.moscone_io13] WHERE DATE(USEC_TO_TIMESTAMP(Timestamp- 8*60*60000000)) = '2013-05-17' AND sensortype='temperature' GROUP EACH BY time, room) b ON a.time=b.time AND a.room = b.room GROUP EACH BY room HAVING corr IS NOT NULL AND c > 800 ORDER EACH BY corr DESC

corr room c 0.9387693711 sf desk 1st floor 1331 0.8488553811 chrome east 1418 0.8423597116 chrome hobbit 1372 0.8162574011 chrome west 1401 0.7696065852 chrome north 1374 ... ... ... -0.1048712561 Room 1 1390 -0.1508345595 keynote crowd 1358 -0.5467798237 android east 1402

SELECT CORR(a.data, b.data) corr, a.sensortype a_sensortype, b.sensortype b_sensortype, a.room room, count(*) c FROM ( SELECT TIME(USEC_TO_TIMESTAMP(INTEGER(Timestamp / 60000000) * 60000000)) time, AVG(DATA) data, room, sensortype FROM [io_sensor_data.moscone_io13] WHERE DATE(USEC_TO_TIMESTAMP(Timestamp- 8*60*60000000)) = '2013-05-16' AND sensortype != 'pressure' AND sensortype != 'altitude' AND room != 'None' GROUP EACH BY time, room, sensortype) a JOIN EACH ( SELECT TIME(USEC_TO_TIMESTAMP(INTEGER(Timestamp / 60000000) * 60000000)) time, AVG(data) data, room, sensortype FROM [io_sensor_data.moscone_io13] WHERE DATE(USEC_TO_TIMESTAMP(Timestamp- 8*60*60000000)) = '2013-05-16' AND sensortype != 'pressure' AND sensortype != 'altitude' AND room != 'None' GROUP EACH BY time, room, sensortype) b ON a.time=b.time AND a.room = b.room WHERE a.sensortype > b.sensortype GROUP EACH BY room, a_sensortype, b_sensortype HAVING corr IS NOT NULL AND c > 800 ORDER EACH BY corr DESC