This will be a fast errata to the previous one. This time I will expand the oldest performance mantra:

The fastest code is the one that doesn’t execute. Second to that is the one that executes once

Last time I’ve forgot to mention one very important optimization. It was one of two steps that allowed me to go from 1530 to 484 seconds in the sample run.

Let’s look once more at the code:

public float Similarity ( IDictionary < int , float > otherVector ) { var num = this . Sum ( ingredient => { float value = 0 ; if ( otherVector . TryGetValue ( ingredient . Key , out value )) return ingredient . Value * value ; return 0 ; }); var denom = Lengh ()* otherVector . Length (); return num / denom ; }

and Length looking like this:

public float Length ( Vector a ){ return ( float ) Math . Sqrt ( a . Sum ( value => value * value )); }

Similarity will be called for every pair of recipes and that means a lot:

(182184*182184)/2 = 16 595 504 928 times

But there is one fragment that doesn’t change depending on input parameters, so it means i can calculate it less often.

Can You spot it?

It’s Length.

Changing it’s code to this:

public float Length () { if ( _wasIngredientWeightsChanged ) { _len = ( float ) Math . Sqrt ( _ingredientWeightsInternal . Values . Sum ( value => value * value )); _wasIngredientWeightsChanged = false ; } return _len ; }

Allowed me to go from 968 seconds to 745 seconds for the sample. Scaling it to all recipes gives:

(745 / 2199) * 182184 ~ 17 hours (starting from 34 hours)

Conslusion

“Can I make it faster” is not the only question to ask when optimizing. Sometimes asking “does it have to execute this often?” is also a very valid question.