A brief review of Martin Ford's book that features interviews with 23 of the most well-known and brightest minds working on AI.

In this blog post, I am (briefly) reviewing Christoph Molnar's *Interpretable Machine Learning Book*. Then, I am writing about two classic generalized linear models, linear and logistic regression. Mainly, this blog post explains the relationship between feature weights and predictions and demonstrates how to construct confidence intervals via Python.

Not too long ago, in the Summer of 2018, I was super excited to join the Department of Statistics at the University of Wisconsin-Madison after obtaining my Ph.D. after ~5 long and productive years. Now, two semesters later after finals' week, I finally found some quiet days to look back on what's happened since then. In this post, I am sharing a short reflection as well as a some of the exciting projects my students were working on.

I thought that it would be nice to have short and concise summaries of recent projects handy, to share them with a more general audience, including colleagues and students. So, I challenged myself to use fewer than 1000 words without getting distracted by the nitty-gritty details and technical jargon. In this post, I mainly cover some of my recent research in collaboration with the [iPRoBe Lab](http://iprobe.cse.msu.edu) that falls under the broad category of developing approaches to hide specific information in face images. The research discussed in this post is about "maximizing privacy while preserving utility."

This final article in the series *Model evaluation, model selection, and algorithm selection in machine learning* presents overviews of several statistical hypothesis testing approaches, with applications to machine learning model and algorithm comparisons. This includes statistical tests based on target predictions for independent test sets (the downsides of using a single test set for model comparisons was discussed in previous articles) as well as methods for algorithm comparisons by fitting and evaluating models via cross-validation. Lastly, this article will introduce *nested cross-validation*, which has become a common and recommended a method of choice for algorithm comparisons for small to moderately-sized datasets.

Machine learning has become a central part of our life -- as consumers, customers, and hopefully as researchers and practitioners! Whether we are applying predictive modeling techniques to our research or business problems, I believe we have one thing in common : We want to make good predictions! Fitting a model to our training data is one thing, but how do we know that it generalizes well to unseen data? How do we know that it doesn't simply memorize the data we fed it and fails to make good predictions on future samples, samples that it hasn't seen before? And how do we select a good model in the first place? Maybe a different learning algorithm could be better-suited for the problem at hand? Model evaluation is certainly not just the end point of our machine learning pipeline. Before we handle any data, we want to plan ahead and use techniques that are suited for our purposes. In this article, we will go over a selection of these techniques, and we will see how they fit into the bigger picture, a typical machine learning workflow.

In this second part of this series, we will look at some advanced techniques for model evaluation and techniques to estimate the uncertainty of our estimated model performance as well as its variance and stability. Then, in the next article, we will shift the focus onto another task that is one of the main pillar of successful, real-world machine learning applications -- Model Selection.

Almost every machine learning algorithm comes with a large number of settings that we, the machine learning researchers and practitioners, need to specify. These tuning knobs, the so-called hyperparameters, help us control the behavior of machine learning algorithms when optimizing for performance, finding the right balance between bias and variance. Hyperparameter tuning for performance optimization is an art in itself, and there are no hard-and-fast rules that guarantee best performance on a given dataset. In Part I and Part II, we saw different holdout and bootstrap techniques for estimating the generalization performance of a model. We learned about the bias-variance trade-off, and we computed the uncertainty of our estimates. In this third part, we will focus on different methods of cross-validation for model evaluation and model selection. We will use these cross-validation techniques to rank models from several hyperparameter configurations and estimate how well they generalize to independent datasets.

Here, I want to present a simple and conservative approach of implementing a weighted majority rule ensemble classifier in scikit-learn that yielded remarkably good results when I tried it in a kaggle competition. For me personally, kaggle competitions are just a nice way to try out and compare different approaches and ideas -- basically an opportunity to learn in a controlled environment with nice datasets.

Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as stock market predictions, the analysis of gene expression data, and many more. In this tutorial, we will see that PCA is not just a “black box”, and we are going to unravel its internals in 3 basic steps.

This article offers a brief glimpse of the history and basic concepts of machine learning. We will take a look at the first algorithmically described neural network and the gradient descent algorithm in context of adaptive linear neurons, which will not only introduce the principles of machine learning but also serve as the basis for modern multilayer neural networks in future articles.

This has really been quite a journey for me lately. And regarding the frequently asked question “Why did you choose Python for Machine Learning?” I guess it is about time to write my script. In this article, I really don’t mean to tell you why you or anyone else should use Python. But read on if you are interested in my opinion.

It's been about time. I am happy to announce that "Python Machine Learning" was finally released today! Sure, I could just send an email around to all the people who were interested in this book. On the other hand, I could put down those 140 characters on Twitter (minus what it takes to insert a hyperlink) and be done with it. Even so, writing "Python Machine Learning" really was quite a journey for a few months, and I would like to sit down in my favorite coffeehouse once more to say a few words about this experience.

MusicMood – A Machine Learning Model for Classifying Music by Mood Based on Song Lyrics

In this article, I want to share my experience with a recent data mining project which probably was one of my most favorite hobby projects so far. It's all about building a classification model that can automatically predict the mood of music based on song lyrics.

Turn Your Twitter Timeline into a Word Cloud – using Python

Last week, I posted some visualizations in context of Happy Rock Song data mining project, and some people were curious about how I created the word clouds. Learn how to create YOUR personal Twitter Timeline!

Naive Bayes and Text Classification – Introduction and Theory

Naive Bayes classifiers, a family of classifiers that are based on the popular Bayes’ probability theorem, are known for creating simple yet well performing models, especially in the fields of document classification and disease prediction. In this first part of a series, we will take a look at the theory of naive Bayes classifiers and introduce the basic concepts of text classification. In following articles, we will implement those concepts to train a naive Bayes spam filter and apply naive Bayes to song classification based on lyrics.

Kernel tricks and nonlinear dimensionality reduction via RBF kernel PCA

The focus of this article is to briefly introduce the idea of kernel methods and to implement a Gaussian radius basis function (RBF) kernel that is used to perform nonlinear dimensionality reduction via KBF kernel principal component analysis (kPCA).

Predictive modeling, supervised machine learning, and pattern classification — the big picture

When I was working on my next pattern classification application, I realized that it might be worthwhile to take a step back and look at the big picture of pattern classification in order to put my previous topics into context and to provide and introduction for the future topics that are going to follow.

Linear Discriminant Analysis – Bit by Bit

I received a lot of positive feedback about the step-wise Principal Component Analysis (PCA) implementation. Thus, I decided to write a little follow-up about Linear Discriminant Analysis (LDA) — another useful linear transformation technique. Both LDA and PCA are commonly used dimensionality reduction techniques in statistics, pattern classification, and machine learning applications. By implementing the LDA step-by-step in Python, we will see and understand how it works, and we will compare it to a PCA to see how it differs.

Dixon's Q test for outlier identification – A questionable practice

I recently faced the impossible task to identify outliers in a dataset with very, very small sample sizes and Dixon's Q test caught my attention. Honestly, I am not a big fan of this statistical test, but since Dixon's Q-test is still quite popular in certain scientific fields (e.g., chemistry) that it is important to understand its principles in order to draw your own conclusion of the presented research data that you might stumble upon in research articles or scientific talks.

About Feature Scaling and Normalization – and the effect of standardization for machine learning algorithms

I received a couple of questions in response to my previous article (Entry point: Data) where people asked me why I used Z-score standardization as feature scaling method prior to the PCA. I added additional information to the original article, however, I thought that it might be worthwhile to write a few more lines about this important topic in a separate article.

Entry Point Data – Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses

In this short tutorial I want to provide a short overview of some of my favorite Python tools for common procedures as entry points for general pattern classification and machine learning tasks, and various other data analyses.

Molecular docking, estimating free energies of binding, and AutoDock's semi-empirical force field

Discussions and questions about methods, approaches, and tools for estimating (relative) binding free energies of protein-ligand complexes are quite popular, and even the simplest tools can be quite tricky to use. Here, I want to briefly summarize the idea of molecular docking, and give a short overview about how we can use AutoDock 4.2's hybrid approach for evaluating binding affinities.

An introduction to parallel programming using Python's multiprocessing module – using Python's multiprocessing module

The default Python interpreter was designed with simplicity in mind and has a thread-safe mechanism, the so-called "GIL" (Global Interpreter Lock). In order to prevent conflicts between threads, it executes only one statement at a time (so-called serial processing, or single-threading). In this introduction to Python's multiprocessing module, we will see how we can spawn multiple subprocesses to avoid some of the GIL's disadvantages and make best use of the multiple cores in our CPU.

Kernel density estimation via the Parzen-Rosenblatt window method – explained using Python

The Parzen-window method (also known as Parzen-Rosenblatt window method) is a widely used non-parametric approach to estimate a probability density function *p(**x**)* for a specific point *p(**x**)* from a sample *p(**x** n )* that doesn't require any knowledge or assumption about the underlying distribution.

Numeric matrix manipulation – The cheat sheet for MATLAB, Python NumPy, R, and Julia

At its core, this article is about a simple cheat sheet for basic operations on numeric matrices, which can be very useful if you working and experimenting with some of the most popular languages that are used for scientific computing, statistics, and data analysis.

The key differences between Python 2.7.x and Python 3.x with examples

Many beginning Python users are wondering with which version of Python they should start. My answer to this question is usually something along the lines 'just go with the version your favorite tutorial was written in, and check out the differences later on.'\ But what if you are starting a new project and have the choice to pick? I would say there is currently no 'right' or 'wrong' as long as both Python 2.7.x and Python 3.x support the libraries that you are planning to use. However, it is worthwhile to have a look at the major differences between those two most popular versions of Python to avoid common pitfalls when writing the code for either one of them, or if you are planning to port your project.

5 simple steps for converting Markdown documents into HTML and adding Python syntax highlighting

In this little tutorial, I want to show you in 5 simple steps how easy it is to add code syntax highlighting to your blog articles.

Creating a table of contents with internal links in IPython Notebooks and Markdown documents

Many people have asked me how I create the table of contents with internal links for my IPython Notebooks and Markdown documents on GitHub. Well, no (IPython) magic is involved, it is just a little bit of HTML, but I thought it might be worthwhile to write this little how-to tutorial.

A Beginner's Guide to Python's Namespaces, Scope Resolution, and the LEGB Rule

A short tutorial about Python's namespaces and the scope resolution for variable names using the LEGB-rule with little quiz-like exercises.

Diving deep into Python – the not-so-obvious language parts

Some while ago, I started to collect some of the not-so-obvious things I encountered when I was coding in Python. I thought that it was worthwhile sharing them and encourage you to take a brief look at the section-overview and maybe you'll find something that you do not already know - I can guarantee you that it'll likely save you some time at one or the other tricky debugging challenge.

Implementing a Principal Component Analysis (PCA) – in Python, step by step

In this article I want to explain how a Principal Component Analysis (PCA) works by implementing it in Python step by step. At the end we will compare the results to the more convenient Python PCA() classes that are available through the popular matplotlib and scipy libraries and discuss how they differ.

Installing Scientific Packages for Python3 on MacOS 10.9 Mavericks

I just went through some pain (again) when I wanted to install some of Python's scientific libraries on my second Mac. I summarized the setup and installation process for future reference.\ If you encounter any different or additional obstacles let me know, and please feel free to make any suggestions to improve this short walkthrough.

A thorough guide to SQLite database operations in Python

After I wrote the initial teaser article "SQLite - Working with large data sets in Python effectively" about how awesome SQLite databases are via sqlite3 in Python, I wanted to delve a little bit more into the SQLite syntax and provide you with some more hands-on examples.