Different versions

The version of a library is not something that usually gets much attention. Few people care whether they are using version 0.19.0 or 0.19.1, unless they have found a bug and want to complain about it or, much better, want to submit a pull request to fix it. Otherwise, the current version is unsurprisingly similar to the last one, except for the latest features and bugs, so who cares…

But what if you do a quick pip install <library_name> --upgrade and your model results change?

I was recently re-running some older code and noticed that the results of a model had changed, even though the data and (hyper-)parameter choices were the same as before. After some puzzlement, I realized that the only difference was that I had upgraded to a newer version of the library.
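One way to catch this early is to record the exact library versions next to the results, so that a drifted environment shows up before the numbers do. A minimal sketch using Python's standard importlib.metadata (the library list is just an example):

    from importlib.metadata import PackageNotFoundError, version

    # Snapshot the installed versions of the libraries a result depends on,
    # so a later re-run can tell whether the environment has changed.
    for lib in ("scikit-learn", "xgboost", "lightgbm", "catboost"):
        try:
            print(f"{lib}=={version(lib)}")
        except PackageNotFoundError:
            print(f"{lib}: not installed")

Saving that output alongside the model results would have turned my puzzlement into a one-line diff.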

On the one hand, this isn’t something that we generally expect, since we assume that model consistency was checked before releasing a new version.

On the other hand, this is open source and there are no rules. Results can obviously change if the codebase changes. There may be tests, but most likely on standard datasets such as iris and Boston house prices. These are easy datasets with strong correlations that drive model performance. Even big codebase changes would not make much of a difference in model performance on these kinds of datasets, but there could be changes for your dataset, and you may be the first person to test it.
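If nobody has tested a new release against data like yours, you can write that test yourself: fix the seeds, fit on your own data, and compare the predictions against a reference stored before the upgrade. A minimal sketch, where make_classification stands in for your dataset and GradientBoostingClassifier for your model of choice:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # Hypothetical stand-in for your own data; load the real dataset here.
    X, y = make_classification(n_samples=500, random_state=0)

    model = GradientBoostingClassifier(random_state=0).fit(X, y)
    preds = model.predict_proba(X)[:, 1]

    # First run: store a reference. Later runs: compare against it.
    try:
        reference = np.load("reference_preds.npy")
        if not np.allclose(preds, reference):
            print("Model output changed -- check your library versions.")
    except FileNotFoundError:
        np.save("reference_preds.npy", preds)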

So I wondered: how much of a difference does versioning make in data science libraries? Are younger libraries such as LightGBM and CatBoost more susceptible to such changes than more established ones like scikit-learn and XGBoost?