interviewer: Welcome, can I get you coffee or anything? Do you need a break?

me: No, I've probably had too much coffee already!

interviewer: Great, great. And are you OK with writing code on the whiteboard?

me: It's the only way I code!

interviewer: ...

me: That was a joke.

interviewer: OK, so are you familiar with "fizz buzz"?

me: ...

interviewer: Is that a yes or a no?

me: It's more of a "I can't believe you're asking me that."

interviewer: OK, so I need you to print the numbers from 1 to 100, except that if the number is divisible by 3 print "fizz", if it's divisible by 5 print "buzz", and if it's divisible by 15 print "fizzbuzz".

me: I'm familiar with it.

interviewer: Great, we find that candidates who can't get this right don't do well here.

me: ...

interviewer: Here's a marker and an eraser.

me: [thinks for a couple of minutes]

interviewer: Do you need help getting started?

me: No, no, I'm good. So let's start with some standard imports:

import numpy as np
import tensorflow as tf

interviewer: Um, you understand the problem is fizzbuzz, right?

me: Do I ever. So, now let's talk models. I'm thinking a simple multi-layer perceptron with one hidden layer.

interviewer: Perceptron?

me: Or neural network, whatever you want to call it. We want the input to be a number, and the output to be the correct "fizzbuzz" representation of that number. In particular, we need to turn each input into a vector of "activations". One simple way would be to convert it to binary.

interviewer: Binary?

me: Yeah, you know, 0's and 1's? Something like:

def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])
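
So, for instance, 7 comes out as its bits, least-significant first:

binary_encode(7, 4)   # => array([1, 1, 1, 0])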

interviewer: [stares at whiteboard for a minute]

me: And our output will be a one-hot encoding of the fizzbuzz representation of the number, where the first position indicates "print as-is", the second indicates "fizz", and so on:

def fizz_buzz_encode(i):
    if   i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5  == 0: return np.array([0, 0, 1, 0])
    elif i % 3  == 0: return np.array([0, 1, 0, 0])
    else:             return np.array([1, 0, 0, 0])
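
So, for example:

fizz_buzz_encode(15)   # => array([0, 0, 0, 1]), "fizzbuzz"
fizz_buzz_encode(7)    # => array([1, 0, 0, 0]), print as-is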

interviewer: OK, that's probably enough.

me: That's enough setup; you're exactly right. Now we need to generate some training data. It would be cheating to use the numbers 1 to 100 in our training data, so let's train it on all the remaining numbers up to 1024:

NUM_DIGITS = 10

trX = np.array([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = np.array([fizz_buzz_encode(i)          for i in range(101, 2 ** NUM_DIGITS)])
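
That gives us 923 training examples, one for each number from 101 to 1023:

trX.shape   # => (923, 10)
trY.shape   # => (923, 4)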

interviewer: ...

me: Now we need to set up our model in TensorFlow. Off the top of my head I'm not sure how many hidden units to use, maybe 10?

interviewer: ...

me: Yeah, possibly 100 is better. We can always change it later.

NUM_HIDDEN = 100

We'll need an input variable with width NUM_DIGITS, and an output variable with width 4:

X = tf.placeholder("float", [None, NUM_DIGITS])
Y = tf.placeholder("float", [None, 4])

interviewer: How far are you intending to take this?

me: Oh, just two layers deep -- one hidden layer and one output layer. Let's use randomly-initialized weights for our neurons:

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

w_h = init_weights([NUM_DIGITS, NUM_HIDDEN])
w_o = init_weights([NUM_HIDDEN, 4])

And we're ready to define the model. As I said before, one hidden layer, and let's use, I don't know, ReLU activation:

def model(X, w_h, w_o):
    h = tf.nn.relu(tf.matmul(X, w_h))
    return tf.matmul(h, w_o)

We can use softmax cross-entropy as our cost function and try to minimize it:

py_x = model(X, w_h, w_o)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)

interviewer: ...

me: And, of course, the prediction will just be the largest output:

predict_op = tf.argmax(py_x, 1)

interviewer: Before you get too far astray, the problem you're supposed to be solving is to generate fizz buzz for the numbers from 1 to 100.

me: Oh, great point: predict_op will output a number from 0 to 3, but we want a "fizz buzz" output:

def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]
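
For instance:

fizz_buzz(30, 3)   # => 'fizzbuzz'
fizz_buzz(7, 0)    # => '7'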

interviewer: ...

me: So now we're ready to train the model. Let's grab a TensorFlow session and initialize the variables:

with tf.Session() as sess:
    tf.global_variables_initializer().run()

Now let's run, say, 1000 epochs of training?

interviewer: ...

me: Yeah, maybe that's not enough -- so let's do 10000 just to be safe.

And our training data are sequential, which I don't like, so let's shuffle them each iteration:

    for epoch in range(10000):
        p = np.random.permutation(range(len(trX)))
        trX, trY = trX[p], trY[p]

And each epoch we'll train in batches of, I don't know, 128 inputs?

BATCH_SIZE = 128

So each training pass looks like

        for start in range(0, len(trX), BATCH_SIZE):
            end = start + BATCH_SIZE
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

and then we can print the accuracy on the training data, since why not?

        print(epoch, np.mean(np.argmax(trY, axis=1) ==
                             sess.run(predict_op, feed_dict={X: trX, Y: trY})))

interviewer: Are you serious?

me: Yeah, I find it helpful to see how the training accuracy evolves.

interviewer: ...

me: So, once the model has been trained, it's fizz buzz time. Our input should just be the binary encoding of the numbers 1 to 100:

    numbers = np.arange(1, 101)
    # binary_encode broadcasts across the whole array, producing shape
    # (NUM_DIGITS, 100), so transpose to get one row per number.
    teX = np.transpose(binary_encode(numbers, NUM_DIGITS))

And then our output is just our fizz_buzz function applied to the model output:

    teY = sess.run(predict_op, feed_dict={X: teX})
    output = np.vectorize(fizz_buzz)(numbers, teY)

    print(output)

interviewer: ...

me: And that should be your fizz buzz!

interviewer: Really, that's enough. We'll be in touch.

me: In touch, that sounds promising.

interviewer: ...

Postscript

I didn't get the job. So I tried actually running this (code on GitHub), and it turned out it got some of the outputs wrong! Thanks a lot, machine learning!

In [185]: output
Out[185]:
array(['1', '2', 'fizz', '4', 'buzz', 'fizz', '7', '8', 'fizz', 'buzz',
       '11', 'fizz', '13', '14', 'fizzbuzz', '16', '17', 'fizz', '19',
       'buzz', '21', '22', '23', 'fizz', 'buzz', '26', 'fizz', '28',
       '29', 'fizzbuzz', '31', 'fizz', 'fizz', '34', 'buzz', 'fizz',
       '37', '38', 'fizz', 'buzz', '41', '42', '43', '44', 'fizzbuzz',
       '46', '47', 'fizz', '49', 'buzz', 'fizz', '52', 'fizz', 'fizz',
       'buzz', '56', 'fizz', '58', '59', 'fizzbuzz', '61', '62', 'fizz',
       '64', 'buzz', 'fizz', '67', '68', '69', 'buzz', '71', 'fizz',
       '73', '74', 'fizzbuzz', '76', '77', 'fizz', '79', 'buzz', '81',
       '82', '83', '84', 'buzz', '86', '87', '88', '89', 'fizzbuzz',
       '91', '92', '93', '94', 'buzz', 'fizz', '97', '98', 'fizz',
       'fizz'], dtype='<U8')
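
Here's a quick way to count the mistakes, reusing fizz_buzz and fizz_buzz_encode from the interview (numbers and output are still in scope):

expected = [fizz_buzz(i, np.argmax(fizz_buzz_encode(i))) for i in numbers]
mistakes = [(i, got, want)
            for i, got, want in zip(numbers, output, expected)
            if got != want]
print(len(mistakes), "wrong out of 100")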

I guess maybe I should have used a deeper network.