I had tinkered with genetic algorithms for a bit during college, working on a method to optimize stock portfolios using historical trends. Algorithms inspired by biology have always fascinated me with their elegance and intuitiveness. If you can relate an algorithm to something other than diagrams and numbers, there's a good chance you will understand and use it better. With concepts such as Swarm/Colony Optimization or Evolutionary Algorithms, the ease of understanding can be attributed to their roots in what we learn right in school. Of course, they aren't always the right way to go about doing things - especially when there exists an efficient and mathematically sound way of reaching the goal. For example, if Lagrangians can get you there fast enough, you probably don't need a particle swarm optimizer. The main reason is that most biologically inspired algorithms come with no guarantee: given a good definition of 'utility' and enough time, they can provide a good or even great solution - but not necessarily the best one (as promised by analytical algorithms). However, if your solution space is too large and/or you don't fully understand the nature of your utility function (or it depends on too many factors), it's a good idea to try stochastic optimization methods like genetic algorithms.

If you are new to genetic algorithms, here's a good place to learn the basics. All GAs follow the same basic working format. What changes is how you represent your candidate solutions (the chromosomes) and how you perform key operations on them (the genetic operators - selection, crossover, mutation). Selection is usually governed by the fitness function, which is perhaps the most important part of your GA - it defines how good a solution is for the problem you are trying to solve. Usually, the easier it is for you to define your fitness function, the less likely it is that you need a genetic algorithm for your problem (ironically).
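That shared working format fits in a few lines of Python. Everything below is a generic sketch - the `select`, `crossover`, and `mutate` arguments are placeholders for whatever operators suit your representation:

```python
import random

def run_ga(init_population, fitness, select, crossover, mutate, generations):
    """Generic GA loop: every GA is some instantiation of this skeleton."""
    population = init_population
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        # Breed a new generation: pick two parents, cross them, mutate the child.
        population = [
            mutate(crossover(select(population, scores),
                             select(population, scores)))
            for _ in range(len(population))
        ]
    # Return the fittest survivor of the final generation.
    return max(population, key=fitness)
```

The rest of this post is really just a walk through what each of those plug-in pieces could look like for recommendations.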

Anyway, I was thinking of how one could use a genetic algorithm as a recommendation system. The link between the two is clearer if you think of recommendation-sets as evolving instead of being generated. Let me explain a little more…

1. Chromosome

Your individual chromosome would be a set of recommendations, such as [r1, r2, …, rk]. By analogy, each of the ri‘s would denote a gene. The length of the recommendation-sets could be fixed or variable, depending on what you start out with, and what your crossover/mutation operators are.
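In Python, such a chromosome could be as simple as a list of distinct item IDs. The catalog size, set length `k`, and population size below are all made up for illustration:

```python
import random

# Hypothetical catalog of recommendable item IDs.
CATALOG = list(range(100))

def random_chromosome(k=5):
    """A chromosome is a set of k distinct item IDs; each ID is one gene."""
    return random.sample(CATALOG, k)

# An initial population of candidate recommendation-sets.
population = [random_chromosome() for _ in range(20)]
```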

2. Fitness function

Dealing with recommendation-sets instead of individual recommendations would allow you to judge the ‘fitness’ of a set of recommendations as a whole. The fitness of a candidate set would typically be:

a. Directly proportional to the average relevance of a recommendation to the user (using similarity functions such as the Jaccard coefficient on vector representations of recommendations and users - how you do it is up to you).

b. Inversely proportional to how many times the user has seen each of those recommendations in the past, and how recently.

c. Directly proportional to how diverse the recommendations are. (Maybe use something similar to an entropy for this?)
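As a rough sketch, the three terms above could be folded into one weighted score. Everything here - the tag sets, the seen-counts, the weights - is invented purely for illustration, and Jaccard similarity stands in for whatever relevance measure you prefer:

```python
from itertools import combinations

# Toy data: each item has a set of tags, the user profile is a set of
# liked tags, and seen_counts tracks past impressions. All hypothetical.
ITEM_TAGS = {
    1: {"rock", "guitar"}, 2: {"rock", "vocal"}, 3: {"jazz", "piano"},
    4: {"jazz", "sax"},    5: {"pop", "vocal"},
}
USER_PROFILE = {"rock", "guitar", "vocal"}
SEEN_COUNTS = {1: 3, 5: 1}  # item id -> times already shown

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def fitness(chromosome, w_rel=1.0, w_seen=0.5, w_div=0.5):
    tags = [ITEM_TAGS[i] for i in chromosome]
    # (a) average relevance of each item to the user
    relevance = sum(jaccard(t, USER_PROFILE) for t in tags) / len(tags)
    # (b) penalty for items the user has already been shown
    seen = sum(SEEN_COUNTS.get(i, 0) for i in chromosome) / len(chromosome)
    # (c) diversity: average pairwise dissimilarity within the set
    pairs = list(combinations(tags, 2))
    diversity = (sum(1 - jaccard(a, b) for a, b in pairs) / len(pairs)
                 if pairs else 0.0)
    return w_rel * relevance - w_seen * seen + w_div * diversity
```

With this toy data, a fresh, diverse set like `[2, 3, 4]` scores higher than `[1, 2, 5]`, which leans on items the user has already seen.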

3. Selection

Selection could be done using a simple roulette wheel, or using better explore/exploit techniques such as Boltzmann selection or sigma scaling. These techniques let you explore more of your domain in the earlier stages, when you don't know much about what the ideal solution should be. As the GA iterations progress, they give greater weight to better solutions when selecting candidates for 'mating'.
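Both variants fit in a few lines; this is a hedged sketch, not the only way to write them. In the Boltzmann version, the temperature controls the explore/exploit trade-off - keep it high early on, and anneal it down as iterations progress:

```python
import math
import random

def roulette_select(population, fitnesses):
    """Pick one parent with probability proportional to (shifted) fitness."""
    low = min(fitnesses)
    # Shift so all weights are non-negative; roulette slices can't be negative.
    weights = [f - low + 1e-9 for f in fitnesses]
    return random.choices(population, weights=weights, k=1)[0]

def boltzmann_select(population, fitnesses, temperature):
    """High temperature -> near-uniform (explore); low -> greedy (exploit)."""
    weights = [math.exp(f / temperature) for f in fitnesses]
    return random.choices(population, weights=weights, k=1)[0]
```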

4. Crossover

Crossovers could be simple single-point or two-point. Since the order of recommendations does not matter, even a uniform crossover would do the trick. Depending on how you implement this, you could have variable-length recommendation sets (but if you prefer sets that are neither too large nor too small, make the fitness function depend on the candidate's length accordingly).
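One wrinkle: a naive uniform crossover on duplicate-free sets can produce a child containing the same item twice. A sketch with a small repair step (assuming fixed-length parents of distinct items):

```python
import random

def uniform_crossover(parent_a, parent_b):
    """Uniform crossover for unordered sets: each slot takes its gene from
    a random parent, with a repair step so no item appears twice."""
    pool = list(set(parent_a) | set(parent_b))
    child = []
    for a, b in zip(parent_a, parent_b):
        gene = random.choice([a, b])
        if gene in child:
            # Repair: substitute any gene from the pool not already used.
            gene = random.choice([g for g in pool if g not in child])
        child.append(gene)
    return child
```

The repair can never run dry: the pool holds at least as many distinct genes as the child needs slots.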

5. Mutation

Simply put, this means replacing, adding, or removing a random recommendation in the set. This could be tricky if the domain of your recommendations is very large. In that case, you could use heuristics, such as drawing replacements from the items most frequently bought/used, or from those used by the users most similar to the current one, etc.
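A minimal replacement-style mutation might look like this; the optional `popular_items` pool stands in for whichever heuristic (frequently bought, similar users, etc.) you actually use:

```python
import random

def mutate(chromosome, catalog, rate=0.2, popular_items=None):
    """With probability `rate` per gene, swap it for an item not already in
    the set, drawn from a heuristic pool if given, else the whole catalog."""
    source = popular_items if popular_items else catalog
    out = list(chromosome)
    for i in range(len(out)):
        if random.random() < rate:
            candidates = [x for x in source if x not in out]
            if candidates:
                out[i] = random.choice(candidates)
    return out
```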

6. When to perform the iterations?

Every time there is a sufficient change in the user's profile or the environment (other users joining, more recommendation items being added), you could run a given number of iterations of the GA. Alternatively, you could run it once you have exhausted all the sets generated in the last run, by showing them one by one. The advantage of this approach (and of the whole idea of using GAs for recommendation systems) is that you can use the candidates generated in the last run as the starting population of the next run. Usually, the user's profile and the environment won't be radically different from the last run, so the past 'best' candidates are a good place to start the new evolution process.
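Putting the pieces together, a driver like the hypothetical `evolve` below makes the warm-start idea concrete: the final population of one run is passed straight in as the seed of the next. The fitness function here is a deliberately silly stand-in (it just prefers low item IDs), and the operator choices are one option among many:

```python
import random

CATALOG = list(range(50))  # hypothetical item catalog

def toy_fitness(chromosome):
    # Stand-in for the real fitness: pretend low item IDs are the most
    # relevant ones for the current user profile.
    return -sum(chromosome)

def evolve(seed_population, fitness, generations=30, mutation_rate=0.2):
    population = [list(c) for c in seed_population]
    k = len(population[0])
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]  # roulette-wheel weights
        next_gen = []
        while len(next_gen) < len(population):
            pa, pb = random.choices(population, weights=weights, k=2)
            pool = list(set(pa) | set(pb))
            child = random.sample(pool, k)        # crossover: sample the union
            if random.random() < mutation_rate:   # mutate one gene
                unused = [x for x in CATALOG if x not in child]
                child[random.randrange(k)] = random.choice(unused)
            next_gen.append(child)
        population = next_gen
    return population

# First run from scratch; the next run warm-starts from its output.
random.seed(42)
run1 = evolve([random.sample(CATALOG, 5) for _ in range(20)], toy_fitness)
run2 = evolve(run1, toy_fitness)  # seeded with the last run's candidates
```

When the profile or environment changes, only `toy_fitness` needs to change - the seeded population carries over whatever is still good about the old candidates.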

Of course, this doesn't solve many of the problems currently plaguing recommendation systems, such as cold start. However, using biological evolution as the blueprint for generating newer sets of recommendations could be an interesting way to go about things :-).