This suggests using the training data to compute average
darknesses for each digit, 0, 1, 2, ..., 9. When presented with a
new image, we compute how dark the image is, and then guess
that it's whichever digit has the closest average darkness. This
is a simple procedure, and is easy to code up, so I won't
explicitly write out the code. If you're interested, it's in the
GitHub repository. But it's a big improvement over random
guessing, getting 2,225 of the 10,000 test images correct, i.e.,
22.25 percent accuracy.
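
For readers who would like to see the idea spelled out, here is a minimal sketch of the average-darkness classifier. It is not the code from the repository. It assumes that "darkness" means the sum of an image's pixel intensities, and it uses small synthetic images as a stand-in for the MNIST data. The names `average_darknesses` and `guess_digit` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def average_darknesses(images, labels):
    """Return a length-10 array holding the mean darkness of the
    training images for each digit 0..9. Darkness is assumed here
    to be the sum of all pixel intensities in an image."""
    darkness = images.reshape(len(images), -1).sum(axis=1)
    return np.array([darkness[labels == d].mean() for d in range(10)])

def guess_digit(image, avgs):
    """Guess the digit whose average darkness is closest to the
    darkness of the given image."""
    return int(np.argmin(np.abs(avgs - image.sum())))

# Synthetic stand-in data (an assumption, NOT MNIST): 20 images per
# digit, with images labelled d made brighter in proportion to d + 1.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)
scale = (labels + 1) / 10.0
images = rng.random((200, 28, 28)) * scale[:, None, None]

avgs = average_darknesses(images, labels)
predictions = np.array([guess_digit(im, avgs) for im in images])
accuracy = (predictions == labels).mean()
print("fraction classified correctly:", accuracy)
```

The synthetic images are built so that darkness separates the classes cleanly, which makes the sketch look far more accurate than the method is on real handwritten digits, where the darkness of different digits overlaps heavily.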

It's not difficult to find other ideas which achieve accuracies in
the 20 to 50 percent range. If you work a bit harder you can get
up over 50 percent. But to get much higher accuracies it helps
to use established machine learning algorithms. Let's try using
one of the best known algorithms, the support vector machine,
or SVM. If you're not familiar with SVMs, not to worry: we're
not going to need to understand the details of how SVMs work.
Instead, we'll use a Python library called scikit-learn, which
provides a simple Python interface to a fast C-based library for
SVMs known as LIBSVM.

If we run scikit-learn's SVM classifier using the default
settings, then it gets 9,435 of 10,000 test images correct. (The
code is available here.) That's a big improvement over our
naive approach of classifying an image based on how dark it is.
Indeed, it means that the SVM is performing roughly as well as
our neural networks, just a little worse. In later chapters we'll
introduce new techniques that enable us to improve our neural
networks so that they perform much better than the SVM.
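
To give a feel for what running an SVM with default settings looks like, here is a rough sketch using scikit-learn's `svm.SVC` class. It is not the code referred to above. To stay self-contained it uses the small 8x8 digits dataset bundled with scikit-learn (`load_digits`) as a stand-in for MNIST, and the 75/25 train/test split is an arbitrary choice, so the count it prints will not match the 9,435 of 10,000 figure.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Stand-in data (an assumption): scikit-learn's bundled 8x8 digit
# images, NOT the 28x28 MNIST images used in the text.
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# An SVM classifier with scikit-learn's default settings.
clf = svm.SVC()
clf.fit(X_train, y_train)

# Count how many held-out test images are classified correctly.
predictions = clf.predict(X_test)
num_correct = int((predictions == y_test).sum())
print(f"{num_correct} of {len(y_test)} test images correct")
```

Passing arguments to `svm.SVC`, for example `C` or `gamma`, is how the tunable parameters of the classifier would be changed from their defaults.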

That's not the end of the story, however. The 9,435 of 10,000
result is for scikit-learn's default settings for SVMs. SVMs have
a number of tunable parameters, and it's possible to search for
parameters which improve this out-of-the-box performance. I
won't explicitly do this search, but instead refer you to this blog
post by Andreas Mueller if you'd like to know more. Mueller
shows that with some work optimizing the SVM's parameters
it's possible to get the performance up above 98.5 percent