

Google released TensorFlow (www.tensorflow.org), an open source machine learning library, last November, and it attracted huge attention in the AI field. TensorFlow is also known as “Machine Learning for Everyone” because it is relatively easy to get hands-on with, even for those without much machine learning experience. Today we are excited to announce that TensorFlow is now available on Rescale’s platform. This means you can learn to create and train machine learning models with TensorFlow using just a web browser. I’ll walk you through how in this blog post.

Let’s Start With a Simple Case

We’ll start with the first official TensorFlow tutorial: MNIST for ML beginners. It introduces the MNIST dataset and shows how to model it and train on it in TensorFlow with softmax regression, a basic machine learning method. Here we’ll focus on how to set the job up and run it on the Rescale platform.
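For reference, the model and loss that the tutorial script builds can be written compactly. This is only a sketch of the math behind the script's variables x, W, b, y and y_ (written y' here). The batch index n and class index i are notation introduced for illustration.

```latex
% Softmax regression: x is a 784-pixel image, W is 784 x 10, b has 10 entries
y = \operatorname{softmax}(xW + b),
\qquad
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{10} e^{z_j}}

% Cross-entropy summed over the batch (n) and the 10 classes (i),
% where y' is the one-hot true label
H(y', y) = -\sum_{n} \sum_{i=1}^{10} y'_{n,i} \, \log y_{n,i}
```

Each training step moves W and b a small distance (learning rate 0.01 in the script) in the direction that reduces H on a batch of 100 images.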

Create the Python script in a local editor and save it as mnist_for_beginners.py:

# Load the MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Define the model
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Train the model
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluating the model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

The script above simply puts all the snippets from the tutorial together. Now we need to run it on Rescale’s GPU hardware.

First, you need a Rescale account. If you don’t have one yet, click here to create one.

If you want to skip the hassle of setting up the job step-by-step, you can also click here to view the tutorial job and clone it into your own account.

After registering, log in to Rescale and click the “+ New Job” button at the top left to create a new job.



Click “upload from this computer” and upload your Python script to Rescale.



Click “Next” to go to the Software Settings page and choose TensorFlow from the software list. Currently, 0.71 is the only supported version on Rescale, so choose this version and type “python ./mnist_for_beginners.py” in the Command field. Select “Next” to go to the Hardware Settings page.



In Hardware Settings, choose the core type Jade and select 4 cores. This job is not very compute intensive, so we choose the minimum valid number of cores. We can skip post-processing for this example and click “Submit” on the Review page to submit the job.





It will take 4 – 5 minutes to launch the server and 1 minute to run the job. When the job is running, you can use Rescale’s live tailing feature to monitor the files in the working directory.

After the job is finished, you can view the files from the results page. Let’s take a look at process_output.log, which contains the output of the Python script we uploaded. On the third line from the bottom, we can verify that the accuracy is 91.45%.
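If you download process_output.log, a few lines of Python can pull the accuracy out instead of reading it by eye. This is a minimal sketch, assuming the accuracy appears as a bare number on its own line (which is what the script's final print produces) and that the log sits in the current directory. The helper name extract_accuracy is hypothetical.

```python
import os


def extract_accuracy(log_text):
    """Return the last line of log_text that parses as a float in [0, 1], or None."""
    for line in reversed(log_text.splitlines()):
        try:
            value = float(line.strip())
        except ValueError:
            continue  # not a bare number, so not the accuracy line
        if 0.0 <= value <= 1.0:
            return value
    return None


if __name__ == "__main__":
    # Assumption: the log was downloaded from the results page to this directory.
    log_path = "process_output.log"
    if os.path.exists(log_path):
        with open(log_path) as log_file:
            accuracy = extract_accuracy(log_file.read())
        if accuracy is None:
            print("No accuracy line found")
        else:
            print("Test accuracy: {:.2%}".format(accuracy))
```

Scanning from the bottom means the sketch does not depend on the accuracy being exactly three lines from the end.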



A More Advanced Model

In the second TensorFlow tutorial, a more advanced model is built with a multilayer convolutional network to increase the accuracy to 99.32%.

To run this advanced model on Rescale, you can simply repeat the process of the first one and replace the python script with the new model from the tutorial. You can also view and clone an existing job from here.

Single GPU vs. Multiple GPU Performance Speedup Test

If you have more than one GPU on your machine, TensorFlow can utilize all of them for better performance. In this section, we benchmark a machine with a single K520 GPU against a machine with four K520 GPUs and measure the speedup.

The CIFAR10 Convolutional Neural Network example is used as our benchmarking job. The result below shows that with four times as many GPUs, the number of examples processed per second is only 2.37 times that of the single-GPU run.
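One way to read that number is as scaling efficiency: the measured speedup divided by the ideal linear speedup. The snippet below is just that arithmetic, using the 2.37 figure from the benchmark. The function name parallel_efficiency is introduced here for illustration.

```python
def parallel_efficiency(speedup, num_gpus):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / num_gpus


measured_speedup = 2.37  # 4-GPU throughput relative to 1 GPU, from the benchmark
num_gpus = 4

efficiency = parallel_efficiency(measured_speedup, num_gpus)
print("Scaling efficiency: {:.2%}".format(efficiency))
```

In other words, each of the four GPUs contributes roughly 59% of what a single GPU delivers on its own in this benchmark.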



Work Ahead

TensorFlow released a new distributed version (v0.8) on 4/13/2016, which can distribute the workload across GPUs on multiple machines. It will be very interesting to see how it performs on a multi-node, multi-GPU cluster. Before that, we’ll make launching a multi-node, multi-GPU cluster with TensorFlow support on Rescale as simple as possible.
