We will use the following columns to make our model:

mother_age: Reported age of the mother when giving birth,

mother_married: Whether the mother was married when she gave birth,

mother_race: The race of the mother,

gestation_weeks: The number of weeks of the pregnancy,

weight_gain_pounds: How much weight the mother gained during pregnancy,

is_male: Whether the child is male.

Step 2: Create the model

We can use the CREATE MODEL statement to create our model. The only options we need to specify are the model type (linear_reg) and the target variable (weight_pounds).

We use linear regression because we are predicting a continuous quantity (i.e., birth weight). If we instead wanted to predict a category, we would use logistic regression.
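For comparison, here is a minimal sketch of what the classification variant might look like. It is not part of this walkthrough: the model name natality_weight_class_model, the derived weight_class label, and the 5.5 pound cutoff are all illustrative assumptions. The only real changes from a regression model are the model type (logistic_reg) and a categorical label column.

```sql
# Hypothetical model name; not used elsewhere in this tutorial.
CREATE OR REPLACE MODEL `bigquery_ml_example.natality_weight_class_model`
OPTIONS
  (model_type='logistic_reg',            # classification instead of regression
   input_label_cols=['weight_class']) AS
SELECT
  mother_age,
  gestation_weeks,
  weight_gain_pounds,
  # Derived categorical label. The 5.5 lb threshold is an assumption
  # chosen only to illustrate a two-class target.
  IF(weight_pounds < 5.5, 'low', 'normal') AS weight_class
FROM
  `bigquery-public-data.samples.natality`
WHERE
  weight_pounds IS NOT NULL
```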

The following code will create and train our model. It should take about 15 minutes to train.

# Provide name of model
CREATE OR REPLACE MODEL `bigquery_ml_example.simple_natality_model`
# Specify options
OPTIONS
  (model_type='linear_reg',
   input_label_cols=['weight_pounds']) AS
# Provide training data
SELECT
  mother_age,
  mother_married,
  CAST(mother_race AS STRING) AS mother_race, # race is a category, not a number.
  gestation_weeks,
  weight_gain_pounds,
  is_male,
  weight_pounds
FROM
  `bigquery-public-data.samples.natality`
WHERE
  weight_pounds IS NOT NULL # Filter for rows containing data we want to predict.

The mother_race column is a category, not a number, so we cast it to a string. BigQuery ML will automatically one-hot encode categorical data for us, which saves a lot of data-wrangling effort.
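To make "one-hot encoding" concrete, the sketch below does by hand what BigQuery ML does for us behind the scenes: each distinct category value becomes its own 0/1 column. The inline sample table and the category values 'A', 'B', and 'C' are placeholders invented for illustration. They are not real mother_race codes, and this is not how BigQuery ML implements the encoding internally.

```sql
# Placeholder data: three rows with made-up category values.
WITH sample AS (
  SELECT 'A' AS category UNION ALL
  SELECT 'B' UNION ALL
  SELECT 'C'
)
SELECT
  category,
  IF(category = 'A', 1, 0) AS category_a, # 1 only for rows in category A
  IF(category = 'B', 1, 0) AS category_b, # 1 only for rows in category B
  IF(category = 'C', 1, 0) AS category_c  # 1 only for rows in category C
FROM
  sample
```

Each row ends up with exactly one column set to 1. Writing this by hand for every categorical column is the wrangling that the cast to STRING lets us skip.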

Step 3: Evaluate the model

We can see how well our model performed by using the ML.EVALUATE function. The function takes a model name and a table (or a query that returns one). The table should have the same schema as the table used to create the model.
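As a rough sketch, a call to ML.EVALUATE could look like the following. It assumes the model created in Step 2 and reuses the training query as the evaluation input so that the schema matches, including the cast on mother_race. Evaluating on the training rows is a simplification; a held-out subset would give a more honest estimate.

```sql
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `bigquery_ml_example.simple_natality_model`,
    (
      # Same columns and casts as the training query.
      SELECT
        mother_age,
        mother_married,
        CAST(mother_race AS STRING) AS mother_race,
        gestation_weeks,
        weight_gain_pounds,
        is_male,
        weight_pounds
      FROM
        `bigquery-public-data.samples.natality`
      WHERE
        weight_pounds IS NOT NULL
    )
  )
```

For a linear regression model, the result is a single row of regression metrics such as mean_absolute_error and r2_score.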