Can we run a static code analysis tool for JavaScript inside BigQuery? Yes we can.

This BigQuery query runs JSHint and reports the most common errors in a sample of all the open source JavaScript code on GitHub:

#set UDF Source URI option to "gs://fh-bigquery/js/jshint-2.5.11.js"
SELECT x error, COUNT(*) files_affected
FROM js(
  (
    SELECT content, sample_path, sample_repo_name
    FROM [fh-bigquery:github_extracts.contents_js]
    WHERE LENGTH(content) BETWEEN 1000 AND 1800
    AND ABS(HASH(id))%1000=0 # sampling
  ),
  content, sample_path, sample_repo_name,
  "[
    {name: 'x', type:'string'},
    {name: 'sample_path', type:'string'},
    {name: 'sample_repo_name', type:'string'},
    {name: 'content', type:'string'}]",
  "function(r, emit) {
    JSHINT(r.content, {'maxdepth':2});
    // data = JSHINT.data();
    errors = JSHINT.errors;
    set_errors=new Set(errors.map(
      function(x) {
        if(x && 'raw' in x) {return x.raw}}));
    set_errors.forEach(function(x) {
      if(!x) {return;}
      emit({
        x: x,
        sample_repo_name: r.sample_repo_name,
        sample_path: r.sample_path,
      });
    });
  }")
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100

7.4s elapsed, 103 GB processed
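
The Set in the UDF is what makes COUNT(*) mean "files affected": each file emits a given error message at most once, no matter how many times JSHint reports it. Here is a minimal standalone sketch of that step. The helper name `uniqueRawErrors` and the sample data are made up for illustration; only the shape (objects with a `raw` field, plus a possible falsy entry) follows what the query above guards against.

```javascript
// Hypothetical helper (not part of the original query): collect the distinct
// `raw` message templates from a JSHint-style errors array.
function uniqueRawErrors(errors) {
  var seen = new Set();
  errors.forEach(function (e) {
    // Skip falsy entries and entries without a `raw` field,
    // mirroring the checks in the inline UDF.
    if (e && e.raw) {
      seen.add(e.raw);
    }
  });
  // A Set iterates in insertion order, so the first occurrence wins.
  return Array.from(seen);
}

// Hand-written sample shaped like JSHINT.errors; the values are illustrative.
var sampleErrors = [
  {raw: 'Missing semicolon.', line: 1},
  {raw: 'Missing semicolon.', line: 7},
  {raw: "'{a}' is not defined.", line: 3},
  null
];

var distinct = uniqueRawErrors(sampleErrors);
console.log(distinct);
```

With this sample, the two "Missing semicolon." entries collapse into one, so the file would contribute a single row for that error to the GROUP BY.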

Some notes:

This is some heavyweight JavaScript code: we are running a static JavaScript code analyzer inside BigQuery, and it works. That’s pretty cool.

I’m running this code over a sample of all JS files (see the query for the current filters). There’s a lot that you can do with BigQuery and SQL, and as we push the boundaries, some code will run better over smaller datasets. In the meantime, it would be nice if there were a lighter-weight JSHint equivalent.

I’m using JSHint 2.5.11 as newer versions fail. Ping me if you find out how to solve this.

The above query does not follow the officially supported BigQuery UDF syntax. See the docs for the correct format; I’m using this style because it’s easier to share this way.
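
For comparison, here is a rough sketch of how the same UDF might look in the documented legacy SQL style, where the function lives in its own JavaScript file and is registered with `bigquery.defineFunction`. The name `jshintErrors` is my own placeholder, and I haven't run this exact file in BigQuery, so treat it as a starting point rather than a tested replacement.

```javascript
// Row-processing function with the same logic as the inline version.
// `jshintErrors` is a hypothetical name chosen for this sketch.
function jshintErrors(r, emit) {
  // JSHINT is assumed to come from the jshint-2.5.11.js file loaded
  // alongside this one as a UDF resource.
  JSHINT(r.content, {maxdepth: 2});
  var seen = new Set();
  (JSHINT.errors || []).forEach(function (e) {
    if (e && e.raw) { seen.add(e.raw); }
  });
  // Emit one output row per distinct error message in this file.
  seen.forEach(function (raw) {
    emit({
      x: raw,
      sample_repo_name: r.sample_repo_name,
      sample_path: r.sample_path
    });
  });
}

// Registration in the legacy SQL UDF style. The guard is only there so the
// file can also be loaded outside BigQuery, where `bigquery` is undefined.
if (typeof bigquery !== 'undefined') {
  bigquery.defineFunction(
    'jshintErrors',                                  // name used in the FROM clause
    ['content', 'sample_path', 'sample_repo_name'],  // input columns
    [{name: 'x', type: 'string'},                    // output schema
     {name: 'sample_path', type: 'string'},
     {name: 'sample_repo_name', type: 'string'}],
    jshintErrors                                     // function reference
  );
}
```

The query would then select FROM jshintErrors(...) with the same sampling subquery inside the parentheses, instead of FROM js(...) with the schema and function body passed as strings.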

More resources for GitHub on BigQuery: https://medium.com/@hoffa/b3576fd2b150