
Recently I have spent a large chunk of my time implementing the adaptive mixture model paper written by Stauffer and Grimson from 1998. It is an old paper which introduces the idea of modeling the value of a pixel over time with K Gaussian components. You would expect that most of the time a pixel has an intensity corresponding to the background scene, and some of the time the pixel's intensity jumps, indicating a foreground object. If we can learn the background distribution quickly, then pixels that fall inside it can be classified as background, and those that don't as foreground. When I read this paper the first time I thought it was great; it seemed adaptive enough to deal with 'real-life' challenging data. I was optimistic and decided to implement the algorithm in Matlab, which shouldn't take that long, right? It was only after working on the implementation that I realized I actually hated the paper, and hated implementing the algorithm.
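
To make the idea concrete, here is a minimal sketch of the model and the background test for a single pixel. The 2.5-sigma matching test, the ranking by weight over sigma, and the background threshold T come from the paper; the specific numbers are placeholders I made up.

    % Minimal sketch of the per-pixel mixture model (one grayscale pixel).
    % All numeric values below are made up for illustration.
    mu    = [120; 130; 200];   % K = 3 component means
    sigma = [10; 15; 20];      % component standard deviations
    w     = [0.6; 0.3; 0.1];   % component weights (sum to 1)

    x = 125;                   % new pixel intensity

    % A component "matches" if x lies within 2.5 sigmas of its mean.
    matched = abs(x - mu) < 2.5 * sigma;

    % Rank components by w/sigma; the first B components whose cumulative
    % weight exceeds a threshold T form the background model.
    T = 0.7;
    [~, order] = sort(w ./ sigma, 'descend');
    B = find(cumsum(w(order)) > T, 1);
    isBackground = any(matched(order(1:B)));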

See, the pain begins when you realize just how many questions are left unanswered by the paper. You realize that the algorithm depends on many parameters and heuristics! In the hope that I had missed something, I kept going back and re-reading the paper. I didn't find what I was looking for. The paper did little to address how the algorithm's parameters should be chosen. It didn't discuss how sensitive the algorithm was to those parameters. It didn't go into how such an algorithm should be initialized or whether initialization even matters. The algorithm description itself had some ambiguity: for example, when you update the components of your mixture based on a new pixel value, do you update only one component, or all that "match" (defined in the paper as components for which the pixel is within 2.5 sigmas)? When replacing a component, how do we decide which one to replace? Do we just find the one with the lowest weight, or perhaps the lowest ratio of weight to sigma? When it came to implementation choices I just went with the flow and picked what made sense to me: this parameter value seems fine, this heuristic will probably work, these data structures are probably okay. When I finally finished the implementation (which took longer than I thought) it was painfully slow, at 1-2 minutes per frame. Worse, I didn't even know if it was correct. In fact, testing whether the implementation was correct seemed impossible, since the performance of the algorithm depended so heavily on the data and the parameters! My friend Ankur suggested I test on a synthetic data set. It hit me that this algorithm was indeed complicated! Even on my simple synthetic data set I wasn't getting the results I had expected.
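
For the record, here is the update rule I ended up with for a single pixel, reflecting my own resolutions of those ambiguities: update only the best-matching component, and when nothing matches, replace the component with the lowest weight-to-sigma ratio. The learning rate alpha and the replacement values are placeholders, not values from the paper.

    % One update step for a single pixel; alpha is the learning rate.
    % My choice: update only the best-matching component.
    alpha = 0.01;
    matched = abs(x - mu) < 2.5 * sigma;

    if any(matched)
        % Among matched components, pick the one ranked highest by w/sigma.
        [~, k] = max((w ./ sigma) .* matched);
        % The paper sets rho to alpha times the component's likelihood of x.
        rho = alpha * exp(-(x - mu(k))^2 / (2 * sigma(k)^2)) ...
              / (sqrt(2*pi) * sigma(k));
        mu(k)    = (1 - rho) * mu(k) + rho * x;
        sigma(k) = sqrt((1 - rho) * sigma(k)^2 + rho * (x - mu(k))^2);
        w    = (1 - alpha) * w;    % decay all weights...
        w(k) = w(k) + alpha;       % ...and reinforce the matched one
    else
        % No match: my choice is to replace the component with the lowest
        % w/sigma by a new one centered on x (placeholder sigma and weight).
        [~, k] = min(w ./ sigma);
        mu(k) = x;  sigma(k) = 30;  w(k) = 0.05;
    end
    w = w / sum(w);                % renormalize weights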

I knew that the parameters of the algorithm had to be chosen carefully, but how? Spending some time on Google, I stumbled upon a survey paper covering the many other papers that try to model the background with mixtures of Gaussians. There were almost 200 references in this survey, and many of the surveyed papers described how to choose parameters for this base algorithm. Holy crap, I thought, there are a million ways to improve this algorithm. What did I get myself into? This led me to read several other papers on choosing the learning rates in the algorithm. Maybe with a few of the improvements suggested by these other papers, the algorithm could finally yield decent detection results, not ones riddled with false detections.

I rewrote my Matlab implementation to avoid loops as much as possible. I learned to vectorize everything I could and perform operations at the multidimensional-array level rather than element by element. I used reshape to go between matrix and vector forms. I used sub2ind religiously to go from multidimensional array indices to single linear indices. My implementation was now at 4-5 seconds per frame, much, much better. But choosing parameters was still difficult.
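
As a toy illustration of the style (hypothetical, not my actual code): updating the mean of the closest component at every pixel simultaneously, with the K components stored along the third dimension and sub2ind collapsing the per-pixel winner into linear indices.

    % Hypothetical example of the vectorized style. mu is H-by-W-by-K
    % (K component means per pixel); frame is the new H-by-W image.
    [H, W, K] = size(mu);
    alpha = 0.01;

    % Distance from the new frame to every component mean, all pixels at once.
    d = abs(bsxfun(@minus, mu, frame));            % H-by-W-by-K

    % Index of the closest component at each pixel.
    [~, kBest] = min(d, [], 3);                    % H-by-W

    % sub2ind turns (row, col, component) triples into linear indices so
    % one component per pixel can be updated without any loops.
    [r, c] = ndgrid(1:H, 1:W);
    idx = sub2ind([H, W, K], r(:), c(:), kBest(:));
    mu(idx) = (1 - alpha) * mu(idx) + alpha * frame(:);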

One of the difficulties in setting the learning rate is that you don't want to learn too quickly. If you learn your background distribution quickly, you may find that the covariance of that component rapidly becomes very small. This causes a new component to be spawned which is close to the original component and also models the background. It gains support relatively quickly, but because the original background component is heavily weighted, this new component is treated as foreground. To avoid this you can try learning slowly, but that backfires when your initial frame has a foreground object in it: if you learn too slowly, it takes too many frames before that object is regarded as foreground rather than background. It's difficult to choose a good initial learning rate.

Another problem is that when a new foreground object appears, it can gain support pretty quickly, taking support away from the background component. This can cause the foreground component to be treated as part of the background because it has gained enough support. To avoid this we could make the learning rate really small so that foreground components can't gain support too quickly, but again this means we will struggle to adapt quickly enough in other situations. My intuition is that if you can get a good enough initialization of the background components and place a large initial weight on them, then you can get away with adapting slowly, as long as your data's background doesn't change quickly. To a large extent, making this algorithm work is tough! That also explains why there are hundreds of papers and a survey paper on the subject! I still have much to figure out.
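
Concretely, the initialization I have in mind looks something like this for one pixel, with the background component seeded from a median over a few initial frames and given most of the weight. Every number here is a placeholder, and the median seeding is my own assumption, not something from the paper.

    % Sketch of the initialization strategy described above (one pixel at
    % row i, column j). All numbers are placeholders.
    bg = median(firstFrames, 3);      % firstFrames is H-by-W-by-N
    mu    = [bg(i, j); 0; 0];         % first component seeded from background
    sigma = [10; 30; 30];             % tight spread on the trusted component
    w     = [0.9; 0.05; 0.05];        % large initial background weight
    alpha = 0.002;                    % then adapt slowly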

Later on, I will likely post my Matlab implementation. I have found a way to get perfect results on the synthetic data set I created using the strategy above. I hope to try a similar strategy on my research data set. I guess the moral of the story is that papers are a double-edged sword, and the implementation can be a real pain in the ass. Getting something to work on your data can be much, much harder than it seems at first. It is taking me a lot of patience to deal with crappy results after putting in a fair amount of effort. If anything exciting happens I'll update this post.

So long for now.