After heated discussions about the article on "The Big Calculator", I felt like checking some other projects related to scientific computing. The first program that came to hand was OpenMS, an open-source project dealing with protein mass spectrometry. It appears to have been written in a very serious and responsible way: the developers use at least Cppcheck to analyze their code. That's why I didn't expect to find anything sensational left unnoticed by that tool. On the other hand, I was curious to see what bugs PVS-Studio would be able to find in the code after Cppcheck. If you want to know too, follow me.

OpenMP support in PVS-Studio was dropped after version 5.20. If you have any questions, feel free to contact our support.

So, there exists a project named OpenMS. I won't dare to explain what it is designed to do in my own words, for I may say something stupid. Here's just an extract from the product description on Wikipedia:

OpenMS is an open-source project for data analysis and processing in protein mass spectrometry and is released under the 2-clause BSD licence. OpenMS has tools for many common data analysis pipelines used in proteomics, providing algorithms for signal processing, feature finding (including de-isotoping), visualization in 1D (spectra or chromatogram level), 2D and 3D, map mapping and peptide identification. It supports label-free and isotopic-label based quantification (such as iTRAQ and TMT and SILAC). Furthermore, it also supports metabolomics workflows and DIA/SWATH targeted analysis.

Taken from: Wikipedia. OpenMS.

The project is of a medium size yet rather complex. The source code's size is 20 Mbytes plus a lot of third-party libraries (Boost, Qt, Zlib, and so on). The project exploits templates very extensively. You can download the source code from the SourceForge website.

I can say for sure that static analysis is employed in the OpenMS development process. Presence of the "cppcheck.cmake" file and comments like this:

if (i != peptide.size()) // added for cppcheck

indicate that the developers use Cppcheck at least. I also saw mentions of Cpplint and the file "cpplint.py". That's a really serious way to do the job. Well done!

Now let's see what PVS-Studio has managed to find in the project.

Note. The project C++ files have the '*.C' extension for some reason. So, don't be confused when you see a C++ code sample located in a '*.C' file.

1. Defects related to OpenMP

It is very seldom that I come across projects employing the OpenMP technology. You know, I even think sometimes of removing all the OpenMP-related diagnostics from the analyzer. That's why I was genuinely surprised to see these warnings in the message list. I have checked dozens of projects during the last year and I have never seen a warning on OpenMP. Well, glad to see there's somebody using this technology.

There were false positives among those messages, but a few reported genuine bugs.

DoubleReal ILPDCWrapper::compute(....) const
{
  ....
  DoubleReal score = 0;
  ....
  #pragma omp parallel for schedule(dynamic, 1)
  for (SignedSize i = 0; i < (SignedSize)bins.size(); ++i)
  {
    score += computeSlice_(fm, pairs, bins[i].first,
                           bins[i].second, verbose_level);
  }
  return score;
}

PVS-Studio's diagnostic message: V1205 Data race risk. Unprotected concurrent operation with the 'score' variable. ilpdcwrapper.c 213

The sum may be calculated incorrectly. The variable 'score' is shared by all the threads, and the '+=' operation on it is not protected from simultaneous use, so some of the updates can be lost.
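One common way to repair a loop like this is the OpenMP 'reduction' clause. Below is a minimal sketch of the idea, not the actual OpenMS fix: computeSliceScore() is a hypothetical stand-in for computeSlice_(), and the bins are reduced to plain numbers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for computeSlice_(): returns the score of one bin.
double computeSliceScore(const std::vector<double>& bins, std::ptrdiff_t i)
{
  return bins[static_cast<std::size_t>(i)];
}

// Sums the per-bin scores. The reduction clause gives every thread a private
// copy of 'score' and adds the copies together when the loop has finished.
double computeTotalScore(const std::vector<double>& bins)
{
  double score = 0;
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(bins.size());
#pragma omp parallel for schedule(dynamic, 1) reduction(+ : score)
  for (std::ptrdiff_t i = 0; i < n; ++i)
  {
    score += computeSliceScore(bins, i);
  }
  return score;
}
```

Keep in mind that the order of the additions is not fixed with a reduction, so a floating-point sum may differ slightly from run to run. If the compiler is run without OpenMP support, the pragma is ignored and the loop simply runs sequentially.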

Other warnings are not that critical, but I think we still should take a look at them. Every exception must be caught inside parallel sections. If an exception leaves a parallel section, it will most likely lead to a crash. This subject is discussed in more detail in the following posts: "OpenMP and exceptions", "Processing of exceptions inside parallel sections".

An exception can be thrown explicitly with the 'throw' operator, or it may occur when the 'new' operator fails (std::bad_alloc).

The first way. The function getTheoreticalmaxPosition() may throw an exception.

Size getTheoreticalmaxPosition() const
{
  if (!this->size())
  {
    throw Exception::Precondition(__FILE__, __LINE__,
      __PRETTY_FUNCTION__,
      "There must be at least one trace to ......");
  }
  ....
}

virtual void run()
{
  ....
  #pragma omp parallel for
  for (SignedSize i = 0; i < (SignedSize)seeds.size(); ++i)
  {
    ....
    f.setMZ(
      traces[traces.getTheoreticalmaxPosition()].getAvgMZ());
    ....
  }
  ....
}

PVS-Studio's diagnostic message: V1301 The 'throw' keyword cannot be used outside of a try..catch block in a parallel section. featurefinderalgorithmpickedhelperstructs.h 199

The second way. Calling the 'new' operator might lead to throwing an exception.

TraceFitter<PeakType>* chooseTraceFitter_(double& tau)
{
  // choose fitter
  if (param_.getValue("feature:rt_shape") == "asymmetric")
  {
    LOG_DEBUG << "use asymmetric rt peak shape" << std::endl;
    tau = -1.0;
    return new EGHTraceFitter<PeakType>();
  }
  ....
}

virtual void run()
{
  ....
  #pragma omp parallel for
  for (SignedSize i = 0; i < (SignedSize)seeds.size(); ++i)
  {
    ....
    TraceFitter<PeakType>* fitter = chooseTraceFitter_(egh_tau);
    ....
  }
  ....
}

PVS-Studio's diagnostic message: V1302 The 'new' operator cannot be used outside of a try..catch block in a parallel section. featurefinderalgorithmpicked.h 1926

Other defects of this kind:

V1301 featurefinderalgorithmpicked.h 1261

V1301 mzmlfile.h 114

V1301 rawmssignalsimulation.c 598

V1301 rawmssignalsimulation.c 1152

V1301 chromatogramextractor.h 103

V1301 chromatogramextractor.h 118

V1302 featurefinderalgorithmpicked.h 1931

V1302 rawmssignalsimulation.c 592

V1302 rawmssignalsimulation.c 601

V1302 openswathanalyzer.c 246
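To illustrate the rule about exceptions, here is a small sketch of a parallel loop that catches everything inside the loop body. It is only an illustration: processSeed() and processAll() are hypothetical names, and counting failures is just one possible way to report errors.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical per-seed work that may throw.
double processSeed(double seed)
{
  if (seed < 0)
  {
    throw std::runtime_error("negative seed");
  }
  return seed * 2;
}

// Processes all seeds in parallel and returns the number of failed seeds.
std::size_t processAll(const std::vector<double>& seeds,
                       std::vector<double>& results)
{
  results.assign(seeds.size(), 0.0);
  std::size_t failures = 0;
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(seeds.size());
#pragma omp parallel for reduction(+ : failures)
  for (std::ptrdiff_t i = 0; i < n; ++i)
  {
    const std::size_t idx = static_cast<std::size_t>(i);
    try
    {
      results[idx] = processSeed(seeds[idx]);
    }
    catch (const std::exception&)
    {
      // The exception is handled inside the loop body, so it never
      // crosses the boundary of the parallel region.
      ++failures;
    }
  }
  return failures;
}
```

The caller can then decide what to do after the parallel region has ended, for example throw a single exception if the returned count is not zero.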

2. Misprints

std::vector< std::pair<std::string, long> > spectra_offsets;
std::vector< std::pair<std::string, long> > chromatograms_offsets;

template <typename MapType>
void MzMLHandler<MapType>::writeFooter_(std::ostream& os)
{
  ....
  int indexlists;
  if (spectra_offsets.empty() && spectra_offsets.empty() )
  {
    indexlists = 0;
  }
  else if (!spectra_offsets.empty() && !spectra_offsets.empty() )
  {
    indexlists = 2;
  }
  else
  {
    indexlists = 1;
  }
  ....
}

PVS-Studio's diagnostic messages:

V501 There are identical sub-expressions 'spectra_offsets.empty()' to the left and to the right of the '&&' operator. mzmlhandler.h 5288

V501 There are identical sub-expressions '!spectra_offsets.empty()' to the left and to the right of the '&&' operator. mzmlhandler.h 5292

These checks are very strange. The container 'spectra_offsets' is checked twice. There must be a misprint and actually two different containers must be checked: 'spectra_offsets' and 'chromatograms_offsets'.
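Assuming that this guess is right, the intended logic can be sketched as follows. The helper countIndexLists() and the typedef OffsetList are my own names, and the if-else chain is collapsed into a sum, so this is an equivalent rewrite rather than the original code.

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, long> > OffsetList;

// Returns how many of the two index lists are non-empty: 0, 1 or 2.
int countIndexLists(const OffsetList& spectra_offsets,
                    const OffsetList& chromatograms_offsets)
{
  return (spectra_offsets.empty() ? 0 : 1) +
         (chromatograms_offsets.empty() ? 0 : 1);
}
```

Each container is mentioned exactly once here, which leaves less room for this kind of misprint.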

template <typename MapType>
void MzMLHandler<MapType>::characters(
  const XMLCh* const chars, const XMLSize_t)
{
  ....
  if (optionalAttributeAsString_(data_processing_ref,
                                 attributes,
                                 s_data_processing_ref))
  {
    data_.back().meta.setDataProcessing(
      processing_[data_processing_ref]);
  }
  else
  {
    data_.back().meta.setDataProcessing(
      processing_[data_processing_ref]);
  }
  ....
}

PVS-Studio's diagnostic message: V523 The 'then' statement is equivalent to the 'else' statement. mzmlhandler.h 534

If you look at other similar code fragments, you can guess what should have been written there, presumably the first expression in the 'then' branch and the second one in the 'else' branch:

processing_[data_processing_ref]

processing_[default_processing_]

Many misprints have to do with throwing exceptions. These mistakes are trivial: the keyword 'throw' is missing. Because of that, a temporary object is created and destroyed at once, and no exception is thrown. For example:

inline UInt asUInt_(const String & in)
{
  UInt res = 0;
  try
  {
    Int tmp = in.toInt();
    if (tmp < 0)
    {
      Exception::ConversionError(
        __FILE__, __LINE__, __PRETTY_FUNCTION__, "");
    }
    res = UInt(tmp);
  }
  catch (Exception::ConversionError)
  {
    error(LOAD,
      String("UInt conversion error of \"") + in + "\"");
  }
  return res;
}

PVS-Studio's diagnostic message: V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw ConversionError(FOO); xmlhandler.h 247

Similar misprints can be found in the following fragments:

inclusionexclusionlist.c 281

inclusionexclusionlist.c 285

precursorionselectionpreprocessing.c 257

modificationsdb.c 419

modificationsdb.c 442

svmtheoreticalspectrumgeneratorset.c 103

logconfighandler.c 285

logconfighandler.c 315

suffixarraytrypticcompressed.c 488

tooldescription.c 147

tofcalibration.c 147
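The effect of the missing keyword is easy to reproduce outside OpenMS. The sketch below uses std::invalid_argument instead of the OpenMS exception classes, and all three function names are made up for the illustration.

```cpp
#include <stdexcept>

// Buggy variant: the exception object is constructed and destroyed at once,
// nothing is thrown, and the negative value is converted silently.
unsigned toUnsignedBuggy(int value)
{
  if (value < 0)
  {
    std::invalid_argument("negative value"); // 'throw' is missing
  }
  return static_cast<unsigned>(value);
}

// Fixed variant: the exception is actually thrown.
unsigned toUnsignedFixed(int value)
{
  if (value < 0)
  {
    throw std::invalid_argument("negative value");
  }
  return static_cast<unsigned>(value);
}

// Returns false instead of propagating the conversion error.
bool tryToUnsigned(int value, unsigned& out)
{
  try
  {
    out = toUnsignedFixed(value);
    return true;
  }
  catch (const std::invalid_argument&)
  {
    return false;
  }
}
```

With the buggy variant, a call like toUnsignedBuggy(-1) returns a huge unsigned number instead of reporting an error, which is exactly the kind of silent failure the 'throw' was supposed to prevent.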

The last misprint I've noticed:

inline typename Value<Pipe>::Type const & operator*()
{
  tmp.i1 = *in.in1;
  tmp.i2 = *in.in2;
  tmp.i3 = *in.in2;
  return tmp;
}

PVS-Studio's diagnostic message: V525 The code containing the collection of similar blocks. Check items 'in1', 'in2', 'in2' in lines 112, 113, 114. pipe_joiner.h 112

The correct code should look like this:

tmp.i1 = *in.in1;
tmp.i2 = *in.in2;
tmp.i3 = *in.in3;

3. Strange condition

CompressedInputSource::CompressedInputSource(
  const String & file_path, const char * header,
  MemoryManager * const manager)
  : xercesc::InputSource(manager)
{
  if (sizeof(header) / sizeof(char) > 1)
  {
    head_[0] = header[0];
    head_[1] = header[1];
  }
  else
  {
    head_[0] = '\0';
    head_[1] = '\0';
  }
  ....
}

PVS-Studio's diagnostic message: V514 Dividing sizeof a pointer 'sizeof (header)' by another value. There is a probability of logical error presence. compressedinputsource.c 52

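The 'header' parameter is a pointer, so 'sizeof(header)' is the size of the pointer itself, not of the data it points to. Since sizeof(char) is always 1, the division yields the pointer size, which is larger than one. At least, I don't know of an architecture so intricate that it isn't. So the condition is always true, and this must be a mistake.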

A similar strange check can be found here: compressedinputsource.c 104
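I can't tell what the authors really wanted to check, so the following is only a sketch under one assumption of mine: that 'header' is either a null pointer or a null-terminated string. The functions pointerSize() and copyHeader() are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// sizeof applied to a pointer parameter yields the size of the pointer
// itself (typically 4 or 8), regardless of the data it points to.
std::size_t pointerSize(const char* header)
{
  return sizeof(header);
}

// Assumption: 'header' is a null pointer or a null-terminated string.
// Copies the first two characters if they exist, otherwise zeroes 'head'.
void copyHeader(const char* header, char (&head)[2])
{
  if (header != nullptr && std::strlen(header) > 1)
  {
    head[0] = header[0];
    head[1] = header[1];
  }
  else
  {
    head[0] = '\0';
    head[1] = '\0';
  }
}
```

If the header is binary data that may contain zero bytes, strlen() is not suitable, and the length has to be passed to the function explicitly.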

4. Returning a reference to a local object

template <typename TStringSet, typename TSpec>
inline Iter<TStringSet, ConcatVirtual<TSpec> > const &
operator++(Iter<TStringSet, ConcatVirtual<TSpec> > & me, int)
{
  Iter<TStringSet, ConcatVirtual<TSpec> > before = me;
  goNext(me);
  return before;
}

PVS-Studio's diagnostic message: V558 Function returns the reference to temporary local object: before. iter_concat_virtual.h 277

The function returns a reference to the local variable 'before'. When the function returns, this variable is destroyed. Using a reference to a destroyed object is undefined behavior and may have unexpected consequences.

The fixed operator looks like this:

template <typename TStringSet, typename TSpec>
inline Iter<TStringSet, ConcatVirtual<TSpec> > const
operator++(Iter<TStringSet, ConcatVirtual<TSpec> > & me, int)
{ ... }

A similar trouble is with the '--' operator: iter_concat_virtual.h 310
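The same pattern on a toy type shows why the postfix operators have to return by value. Counter is a made-up type for this sketch and has nothing to do with the iterator classes above.

```cpp
// A made-up type standing in for the iterator.
struct Counter
{
  int value;
  explicit Counter(int v) : value(v) {}
};

// Postfix increment: the old state is returned BY VALUE, so the caller
// receives its own copy that outlives the local variable 'before'.
inline Counter operator++(Counter& me, int)
{
  Counter before = me;
  ++me.value;
  return before;
}

// Postfix decrement follows the same rule.
inline Counter operator--(Counter& me, int)
{
  Counter before = me;
  --me.value;
  return before;
}
```

After "Counter old = c++;" the object 'old' holds the previous state and 'c' holds the new one, and no reference to a destroyed local object is involved.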

5. Inaccurate calculations

typedef size_t Size;
typedef double DoubleReal;

void updateMeanEstimate(const DoubleReal & x_t,
                        DoubleReal & mean_t, Size t)
{
  DoubleReal tmp(mean_t);
  tmp = mean_t + (1 / (t + 1)) * (x_t - mean_t);
  mean_t = tmp;
}

PVS-Studio's diagnostic message: V636 The '1 / (t + 1)' expression was implicitly casted from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. masstracedetection.c 129

The "(1 / (t + 1))" expression is always equal to zero or one, because both operands are integers and the division is therefore an integer one: the result is 1 when 't' is 0 and 0 otherwise. Perhaps the programmer intended to get quite a different value. I'm not familiar with the program logic, but I guess the following thing was meant:

tmp = mean_t + (1.0 / (t + 1)) * (x_t - mean_t);
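Here is a small sketch that shows the difference. It assumes, as I guessed above, that the function is an incremental mean update where 't' is the number of samples seen so far; integerWeight() is a helper I added only to expose the integer result.

```cpp
#include <cstddef>

// What the original expression computes: an integer division that yields
// 1 for t == 0 and 0 for every other t.
std::size_t integerWeight(std::size_t t)
{
  return 1 / (t + 1);
}

// Incremental mean update with a floating-point weight. Assumption: 't' is
// the number of samples already included in 'mean_t'.
void updateMeanEstimate(const double& x_t, double& mean_t, std::size_t t)
{
  mean_t += (1.0 / static_cast<double>(t + 1)) * (x_t - mean_t);
}
```

For example, starting from a mean of 0 and feeding the values 2 and 4 with t equal to 0 and 1, the fixed function produces 2 and then 3. With the integer weight, the mean would stay at 2 forever after the first sample.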

I also didn't like that explicit values are used instead of the M_PI constant, and these values are, moreover, not very accurate. This is not an error, of course, but it's still no good. Here's an example:

bool PosteriorErrorProbabilityModel::fit(
  std::vector<double> & search_engine_scores)
{
  ....
  incorrectly_assigned_fit_param_.A =
    1 / sqrt(2 * 3.14159 *
             pow(incorrectly_assigned_fit_param_.sigma, 2));
  ....
}

PVS-Studio's diagnostic message: V624 The constant 3.14159 is being utilized. The resulting value could be inaccurate. Consider using the M_PI constant from <math.h>. posteriorerrorprobabilitymodel.c 92

Other similar fragments:

posteriorerrorprobabilitymodel.c 101

posteriorerrorprobabilitymodel.c 110

posteriorerrorprobabilitymodel.c 155

posteriorerrorprobabilitymodel.c 162
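M_PI is widely available but is not part of the C++ standard, so here is a sketch that gets a full-precision value in a portable way. The function gaussianNorm() is a hypothetical helper of mine that computes the same kind of factor as the fragment above.

```cpp
#include <cmath>

// Normalization factor of a Gaussian: 1 / sqrt(2 * pi * sigma^2).
double gaussianNorm(double sigma)
{
  // acos(-1) equals pi, so no hand-typed literal is needed.
  const double pi = std::acos(-1.0);
  return 1.0 / std::sqrt(2.0 * pi * sigma * sigma);
}
```

The literal 3.14159 differs from pi in the sixth decimal place, which limits the relative accuracy of everything computed from it to roughly 1e-6.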

6. Array index out of bounds

static const Int CHANNELS_FOURPLEX[4][1];
static const Int CHANNELS_EIGHTPLEX[8][1];

ExitCodes main_(int, const char **)
{
  ....
  if (itraq_type == ItraqQuantifier::FOURPLEX)
  {
    for (Size i = 0; i < 4; ++i)
    {
      std::vector<std::pair<String, DoubleReal> > one_label;
      one_label.push_back(std::make_pair<String, DoubleReal>(
        String("Channel ") +
          String(ItraqConstants::CHANNELS_FOURPLEX[i][0]),
        DoubleReal(ItraqConstants::CHANNELS_FOURPLEX[i][0])));
      labels.push_back(one_label);
    }
  }
  else //ItraqQuantifier::EIGHTPLEX
  {
    for (Size i = 0; i < 8; ++i)
    {
      std::vector<std::pair<String, DoubleReal> > one_label;
      one_label.push_back(std::make_pair<String, DoubleReal>(
        String("Channel ") +
          String(ItraqConstants::CHANNELS_FOURPLEX[i][0]),
        DoubleReal(ItraqConstants::CHANNELS_FOURPLEX[i][0])));
      labels.push_back(one_label);
    }
  }
  ....
}

PVS-Studio's diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 7. itraqanalyzer.c 232

This bug can be actually put into the category of Copy-Paste related bugs. But anyway, let it be "array index out of bounds" - it sounds scarier this way. And, after all, this classification is pretty relative; you can put one and the same bug into different categories.

In this sample, the 'CHANNELS_EIGHTPLEX' array must have been handled in the 'else' branch. There's a comment proving that:

else //ItraqQuantifier::EIGHTPLEX

However, the copied-and-pasted code fragment was modified only partially: the loop now runs up to 8, but it still indexes the CHANNELS_FOURPLEX array, which has only 4 rows. As a result, the array is overrun.

A similar bug can be found here (also caused by Copy-Paste): tmtanalyzer.c 225

One more sample.

DoubleReal masse_[255]; ///< mass table

EdwardsLippertIterator::EdwardsLippertIterator(
  const EdwardsLippertIterator & source) :
  PepIterator(source),
  f_file_(source.f_file_),
  actual_pep_(source.actual_pep_),
  spec_(source.spec_),
  tol_(source.tol_),
  is_at_end_(source.is_at_end_),
  f_iterator_(source.f_iterator_),
  f_entry_(source.f_entry_),
  b_(source.b_),
  e_(source.e_),
  m_(source.m_),
  massMax_(source.massMax_)
{
  for (Size i = 0; i < 256; i++)
  {
    masse_[i] = source.masse_[i];
  }
}

PVS-Studio's diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 255. edwardslippertiterator.c 134

The masse_ array is handled incorrectly in the copy constructor: the array consists of 255 items, while 256 items are copied.

The fixed loop looks like this:

for (Size i = 0; i < 255; i++)
{
  masse_[i] = source.masse_[i];
}

An even better practice is to avoid using magic constants.
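One possible way to get rid of the magic constant is to name the array size once and let the copy use that name. The class MassTable and the constant kTableSize below are hypothetical and only sketch the idea.

```cpp
#include <algorithm>
#include <cstddef>

// A made-up class holding a fixed-size mass table.
struct MassTable
{
  // The size is written in exactly one place.
  static const std::size_t kTableSize = 255;

  double masse_[kTableSize];

  MassTable()
  {
    std::fill(masse_, masse_ + kTableSize, 0.0);
  }

  // The copy constructor cannot run past the end of the array, because the
  // declaration and the copy share the same constant.
  MassTable(const MassTable& source)
  {
    std::copy(source.masse_, source.masse_ + kTableSize, masse_);
  }
};
```

If the table ever has to grow to 256 items, only the constant changes, and the declaration and the copy cannot get out of sync.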

7. Obsolete way of calling 'new' operator

svm_problem * LibSVMEncoder::encodeLibSVMProblem(....)
{
  ....
  node_vectors = new svm_node *[problem->l];
  if (node_vectors == NULL)
  {
    delete[] problem->y;
    delete problem;
    return NULL;
  }
  ....
}

PVS-Studio's diagnostic message: V668 There is no sense in testing the 'node_vectors' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error. libsvmencoder.c 177

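The check "if (node_vectors == NULL)" is pointless: if memory cannot be allocated, the 'new' operator throws an exception (std::bad_alloc) instead of returning a null pointer. The program therefore behaves quite differently from what the programmer expects. For instance, the cleanup code in the 'if' branch never runs, so a memory leak might occur.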

There are other similar checks implemented in an obsolete manner:

file_page.h 728

libsvmencoder.c 160
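If the authors want to keep the null checks, one option is the non-throwing form of 'new'. The sketch below shows this on a simplified structure; Problem, createProblem() and destroyProblem() are hypothetical stand-ins and not the libsvm types.

```cpp
#include <cstddef>
#include <new>

// A simplified, made-up stand-in for svm_problem.
struct Problem
{
  double* y;
  int** nodes;
};

// Returns a fully allocated Problem, or a null pointer if any of the
// allocations fails. new (std::nothrow) returns a null pointer on failure
// instead of throwing std::bad_alloc, so the checks are meaningful.
Problem* createProblem(std::size_t count)
{
  Problem* problem = new (std::nothrow) Problem();
  if (problem == nullptr)
  {
    return nullptr;
  }
  problem->y = new (std::nothrow) double[count];
  problem->nodes = new (std::nothrow) int*[count];
  if (problem->y == nullptr || problem->nodes == nullptr)
  {
    delete[] problem->y;     // deleting a null pointer is harmless
    delete[] problem->nodes;
    delete problem;
    return nullptr;
  }
  return problem;
}

// Releases everything allocated by createProblem().
void destroyProblem(Problem* problem)
{
  if (problem != nullptr)
  {
    delete[] problem->y;
    delete[] problem->nodes;
    delete problem;
  }
}
```

Another option is to keep the ordinary 'new' and store the intermediate allocations in containers or smart pointers, so that they are released automatically when std::bad_alloc propagates.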

Conclusion

I think the OpenMS developers would benefit from using PVS-Studio in addition to Cppcheck and Cpplint, especially if they run it regularly. So, I invite you to write to us at support@viva64.com. We can grant you a free registration key so that you can do a complete check of OpenMS.