Introduction

While Stockholm was holding the 118th Nobel Week, I was sitting in our office, where we develop the PVS-Studio static analyzer, working on an analysis review of the ROOT project, a big-data processing framework used in scientific research. This code wouldn't win a prize, of course, but the authors can definitely count on a detailed review of the most interesting defects plus a free license to thoroughly check the project on their own.

A new diagnostic's debut

int AddFunction(const ROOT::Math::IMultiGenFunction & func) { ROOT::Math::IMultiGenFunction * f = func.Clone(); if (!f) return 0; fFunctions.push_back(f); return fFunctions.size(); } template<class FuncIterator> bool SetFunctionList( FuncIterator begin, FuncIterator end) { bool ret = true; for (FuncIterator itr = begin; itr != end; ++itr) { const ROOT::Math::IMultiGenFunction * f = *itr; ret &= AddFunction(*f); } return ret; }

ret &= AddFunction(*f);

ROOT is a modular scientific software toolkit. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualisation and storage. It is mainly written in C++. ROOT was born at CERN , at the heart of the research on high-energy physics. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations. PVS-Studio is a tool for detecting software bugs and potential vulnerabilities in the source code of programs written in C, C++, C#, and Java. It runs on 64-bit Windows, Linux, and macOS and can analyze source code written for 32-bit, 64-bit, and embedded ARM platforms. V1046 Unsafe usage of the bool' and 'int' types together in the operation '&='. GSLMultiRootFinder.h 175First off, here's a wonderful bug found by the beta version of PVS-Studio, which I was using for this review.Thefunction traverses an iterator list. If at least one iterator is invalid, the function returns, orotherwise.Thefunction can returneven for valid iterators. Let's figure out why. Thefunction returns the number of valid iterators on thelist. That is, adding non-null iterators will cause the list to incrementally grow in size: 1, 2, 3, 4, and so on. This is where the bug comes into play:Since the function returns a value of typerather than, the '&=' operation will returnfor even values because the least significant bit of an even number is always set to zero. This is how one subtle bug can break the return value ofeven when its arguments are valid.

Errors in conditional expressions

virtual void HandleDiagnostic(....) override { .... bool isROOTSystemModuleDiag = module && ....; bool isSystemModuleDiag = module && module && module->IsSystem; if (!isROOTSystemModuleDiag && !isSystemModuleDiag) fChild->HandleDiagnostic(DiagLevel, Info); .... }

TAuthenticate::TAuthenticate(TSocket *sock, const char *remote, const char *proto, const char *user) { .... // If generic THostAuth (i.e. with wild card or user == any) // make a personalized memory copy of this THostAuth if (strchr(fHostAuth->GetHost(),'*') || strchr(fHostAuth->GetHost(),'*') || fHostAuth->GetServer() == -1 ) { fHostAuth = new THostAuth(*fHostAuth); fHostAuth->SetHost(fqdn); fHostAuth->SetUser(checkUser); fHostAuth->SetServer(servtype); } .... }

Int_t TProofMonSenderML::SendSummary(TList *recs, const char *id) { .... if (fSummaryVrs == 0) { if ((dsn = recs->FindObject("dataset"))) recs->Remove(dsn); } else if (fSummaryVrs == 0) { // Only the first records xrecs = new TList; xrecs->SetOwner(kFALSE); TIter nxr(recs); TObject *o = 0; while ((o = nxr())) { if (!strcmp(o->GetName(), "vmemmxw")) break; xrecs->Add(o); } } .... }

template <typename Index, typename Value> void TKDTree<Index, Value>::UpdateRange(....) { .... if (point[fAxis[inode]]<=fValue[inode]){ //first examine the node that contains the point UpdateRange(GetLeft(inode),point, range, res); UpdateRange(GetRight(inode),point, range, res); } else { UpdateRange(GetLeft(inode),point, range, res); UpdateRange(GetRight(inode),point, range, res); } .... }

V523 The 'then' statement is equivalent to the 'else' statement. TContainerConverters.cxx 51

V523 The 'then' statement is equivalent to the 'else' statement. TWebFile.cxx 1310

V523 The 'then' statement is equivalent to the 'else' statement. MethodMLP.cxx 423

V523 The 'then' statement is equivalent to the 'else' statement. RooAbsCategory.cxx 394

bool SelectionRules::AreAllSelectionRulesUsed() const { for(auto&& rule : fClassSelectionRules){ .... std::string file_name_value; if (!rule.GetAttributeValue("file_name", file_name_value)) file_name_value.clear(); if (!file_name_value.empty()) { // <= // don't complain about defined_in rules continue; } const char* attrName = nullptr; const char* attrVal = nullptr; if (!file_name_value.empty()) { // <= attrName = "file name"; attrVal = file_name_value.c_str(); } else { attrName = "class"; if (!name.empty()) attrVal = name.c_str(); } ROOT::TMetaUtils::Warning(0,"Unused %s rule: %s

", attrName, attrVal); } .... }

TString TTabCom::DetermineClass(const char varName[]) { .... c = file1.get(); if (!file1 || c <= 0 || c == '*' || c != '(') { Error("TTabCom::DetermineClass", "variable \"%s\" not defined?", varName); goto cleanup; } .... }

if (.... || c == '*' || c != '(') { .... }

V590 Consider inspecting this expression. The expression is excessive or contains a misprint. TFile.cxx 3963

V590 Consider inspecting this expression. The expression is excessive or contains a misprint. TStreamerInfoActions.cxx 3084

Int_t TProofServ::HandleSocketInput(TMessage *mess, Bool_t all) { .... if (Int_t ret = fProof->AddWorkers(workerList) < 0) { Error("HandleSocketInput:kPROOF_GETSLAVEINFO", "adding a list of worker nodes returned: %d", ret); } .... }

V593 Consider reviewing the expression of the 'A = B < C' kind. The expression is calculated as following: 'A = (B < C)'. TProofServ.cxx 3897

enum EPruneMethod {kExpectedErrorPruning=0, kCostComplexityPruning, kNoPruning}; void TMVA::MethodDT::ProcessOptions() { .... if (fPruneStrength < 0) fAutomatic = kTRUE; else fAutomatic = kFALSE; if (fAutomatic && fPruneMethod==!DecisionTree::kCostComplexityPruning){ Log() << kFATAL << "Sorry automatic pruning strength determination is ...." << Endl; } .... }

Pointer handling errors

void TSynapse::SetPre(TNeuron * pre) { if (pre) { Error("SetPre","this synapse is already assigned to a pre-neuron."); return; } fpre = pre; pre->AddPost(this); }

void TSynapse::SetPre(TNeuron * pre) { if (fpre) { Error("SetPre","this synapse is already assigned to a pre-neuron."); return; } fpre = pre; pre->AddPost(this); }

V522 Dereferencing of the null pointer 'post' might take place. TSynapse.cxx 74

bool RScanner::shouldVisitDecl(clang::NamedDecl *D) { if (auto M = D->getOwningModule()) { // <= 2 return fInterpreter.getSema().isModuleVisible(M); } return true; } bool RScanner::VisitNamespaceDecl(clang::NamespaceDecl* N) { if (fScanType == EScanType::kOnePCM) return true; if (!shouldVisitDecl(N)) // <= 1 return true; if((N && N->isImplicit()) || !N){ // <= 3 return true; } .... }

V595 The 'file' pointer was utilized before it was verified against nullptr. Check lines: 141, 153. TFileCacheRead.cxx 141

V595 The 'fFree' pointer was utilized before it was verified against nullptr. Check lines: 2029, 2038. TFile.cxx 2029

V595 The 'tbuf' pointer was utilized before it was verified against nullptr. Check lines: 586, 591. TGText.cxx 586

V595 The 'fPlayer' pointer was utilized before it was verified against nullptr. Check lines: 3425, 3430. TProof.cxx 3425

V595 The 'gProofServ' pointer was utilized before it was verified against nullptr. Check lines: 1192, 1194. TProofPlayer.cxx 1192

V595 The 'projDataTmp' pointer was utilized before it was verified against nullptr. Check lines: 791, 804. RooSimultaneous.cxx 791

#define SafeDelete(p) { if (p) { delete p; p = 0; } } void TCanvas::Close(Option_t *option) { .... if (fCanvasImp) SafeDelete(fCanvasImp); .... }

Array handling errors

size_t find_last_non_alnum(const std::string &str, std::string::size_type index = std::string::npos) { .... char tmp = Line.GetText()[Cursor]; Line[Cursor] = Line[Cursor - 1]; Line[Cursor] = tmp; .... }

bool BasicMinimizer::SetVariableValue(unsigned int ivar, double val) { if (ivar > fValues.size() ) return false; fValues[ivar] = val; return true; }

V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 186

V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 194

V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 209

V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 215

V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 230

Int_t TDataMember::GetArrayDim() const { if (fArrayDim<0 && fInfo) { R__LOCKGUARD(gInterpreterMutex); TDataMember *dm = const_cast<TDataMember*>(this); dm->fArrayDim = gCling->DataMemberInfo_ArrayDim(fInfo); // fArrayMaxIndex should be zero if (dm->fArrayDim) { dm->fArrayMaxIndex = new Int_t[fArrayDim]; for(Int_t dim = 0; dim < fArrayDim; ++dim) { dm->fArrayMaxIndex[dim] = gCling->DataMemberInfo_MaxIndex(fInfo,dim); } } } return fArrayDim; }

llvm::StringRef ROOT::TMetaUtils::DataMemberInfo__ValidArrayIndex(....) { .... while (current!=0) { // Check the token if (isdigit(current[0])) { for(i=0;i<strlen(current);i++) { if (!isdigit(current[0])) { if (errstr) *errstr = current; if (errnum) *errnum = NOT_INT; return llvm::StringRef(); } } } else { // current token is not a digit .... } .... } .... }

V501 There are identical sub-expressions to the left and to the right of the '&&' operator: module && module rootcling_impl.cxx 3650Let's start with the least harmful bug. Thepointer is checked twice. One of the checks is probably redundant, yet it would still be wise to fix it to avoid any confusion in the future. V501 There are identical sub-expressions 'strchr(fHostAuth->GetHost(), '*')' to the left and to the right of the '||' operator. TAuthenticate.cxx 300Thestring is scanned for the '*' character twice. One of these checks was probably meant to look for the '?' character as these two characters are typically the ones used to specify various wildcard masks. V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 163, 165. TProofMonSenderML.cxx 163Thevariable is compared with zero twice, so execution never reaches the code in thebranch. And there's quite a bit of code there… V523 The 'then' statement is equivalent to the 'else' statement. TKDTree.cxx 805The same block of code, which is a copy-paste clone, is executed no matter the condition. I guess there's a confusion between the wordsandThe project is full of suspicious spots like that: V547 Expression '!file_name_value.empty()' is always false. SelectionRules.cxx 1423This is probably not a bug; the analyzer just found some code that can be simplified. Since the return value ofis already checked at the beginning of the loop, the second duplicate check can be removed, thus throwing away a good amount of unnecessary code. V590 Consider inspecting the '!file1 || c <= 0 || c == '*' || c != '('' expression. The expression is excessive or contains a misprint. TTabCom.cxx 840Here's the problem part of the conditional expression reported by the analyzer:The check for the asterisk character won't affect the condition's result. This part will always be true for any character other than '('. You can easily check it for yourself by drawing a truth table.Two more warnings about conditions with strange logic: V593 Consider reviewing the expression of the 'A = B < C' kind. The expression is calculated as following: 'A = (B < C)'. TProofServ.cxx 1903This bug reveals itself only in the case of the program's faulty behavior. Thevariable is supposed to store the return code of thefunction and write that value to the log in case of error condition. But it doesn't work as intended. The condition lacks additional parentheses forcing the desired order of evaluation. What thevariable actually stores is not the return code but the result of the logical comparison, i.e. either 0 or 1.Another similar problem: V768 The enumeration constant 'kCostComplexityPruning' is used as a variable of a Boolean-type. MethodDT.cxx 283Hm… Why negate the constant value? I suspect the negation character is a typo, which now distorts the execution logic. V522 Dereferencing of the null pointer 'pre' might take place. TSynapse.cxx 61I did my best trying to understand this strange code, and it seems the idea was to avoid assigning a new value to thefield. If so, the programmer is accidentally checking the wrong pointer. The current implementation leads to dereferencing a null pointer if you pass thevalue to thefunction.I think this snippet should be fixed as follows:This, however, wouldn't prevent the passing of a null pointer to the function, but at least this version is more logically consistent than the original one.A slightly modified clone of this code can be found in another spot: V595 The 'N' pointer was utilized before it was verified against nullptr. Check lines: 484, 488. Scanner.cxx 484This is an extremely dangerous piece of code! Thepointer isn't checked for null before it gets dereferenced the first time. What is more, you can't see it happen here because the dereference takes place inside thefunction.This diagnostic traditionally generates a bunch of relevant warnings. Here are just a few examples:The next one is not a bug, but it's yet another example of how macros encourage writing faulty or redundant code. V571 Recurring check. The 'if (fCanvasImp)' condition was already verified in line 799. TCanvas.cxx 800Thepointer is checked twice, with one of the checks already implemented in themacro. One of the problems with macros is that they are difficult to navigate from within the code, which is the reason why many programmers don't examine their contents before use. V519 The 'Line[Cursor]' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 352, 353. Editor.cpp 353The elementis assigned a new value, which then immediately gets overwritten. That doesn't look right… V557 Array overrun is possible. The 'ivar' index is pointing beyond array bound. BasicMinimizer.cxx 130Making this mistake when checking array indexes is a recent trend; we see it in almost every third project. While indexing into an array inside a loop is easy – you typically use the '=' operator, not '>'. Otherwise you risk indexing one element beyond the array's bound.This bug was cloned throughout the code a few times: V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. TDataMember.cxx 554In theloop, the developers apparently meant to compare thevariable withrather than. The value ofis negative, which is guaranteed by the condition at the beginning of the function. Consequently, this loop will never execute. V767 Suspicious access to element of 'current' array by a constant index inside a loop. TClingUtils.cxx 3082This code is parsing and checking some string. If thestring's first character (i.e. at index 0) has been recognized as a number, the loop will traverse all the rest characters to make sure all of them are numbers. Well, at least that's the idea. The problem is, thecounter is not used in the loop. The condition should be rewritten so that it checksrather than

Memory leak

void TDataMember::Init(bool afterReading) { .... TList *optionlist = new TList(); //storage for options strings for (i=0;i<token_cnt;i++) { if (strstr(tokens[i],"Items")) { ptr1 = R__STRTOK_R(tokens[i], "()", &rest); if (ptr1 == 0) { Fatal("TDataMember","Internal error, found \"Items....",GetTitle()); return; } ptr1 = R__STRTOK_R(nullptr, "()", &rest); if (ptr1 == 0) { Fatal("TDataMember","Internal error, found \"Items....",GetTitle()); return; } .... } .... } .... // dispose of temporary option list... delete optionlist; .... }

memset again

void TMD5::Transform(UInt_t buf[4], const UChar_t in[64]) { UInt_t a, b, c, d, x[16]; .... // Zero out sensitive information memset(x, 0, sizeof(x)); }

Miscellaneous

LogLikelihoodFCN & operator = (const LogLikelihoodFCN & rhs) { SetData(rhs.DataPtr() ); SetModelFunction(rhs.ModelFunctionPtr() ); fNEffPoints = rhs.fNEffPoints; fGrad = rhs.fGrad; fIsExtended = rhs.fIsExtended; fWeight = rhs.fWeight; fExecutionPolicy = rhs.fExecutionPolicy; }

template <typename Value_t, typename Container_t> inline RTensor<Value_t, Container_t> RTensor<Value_t, Container_t>::Transpose() { if (fLayout == MemoryLayout::RowMajor) { fLayout = MemoryLayout::ColumnMajor; } else if (fLayout == MemoryLayout::ColumnMajor) { fLayout = MemoryLayout::RowMajor; } else { std::runtime_error("Memory layout is not known."); } .... }

V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw runtime_error(FOO); Forest.hxx 137

const char *TGHtml::GetPctWidth(TGHtmlElement *p, char *opt, char *ret) { int n, m, val; .... if (n < 0 || n > 100) return z; if (opt[0] == 'h') { val = fCanvas->GetHeight() * 100; } else { val = fCanvas->GetWidth() * 100; } if (!fInTd) { snprintf(ret, 15, "%d", val / n); // <= } else { .... } .... }

if (n <= 0 || n > 100) return z;

TProofServ::TProofServ(Int_t *argc, char **argv, FILE *flog) : TApplication("proofserv", argc, argv, 0, -1) { .... if (!logmx.IsDigit()) { if (logmx.EndsWith("K")) { xf = 1024; logmx.Remove(TString::kTrailing, 'K'); } else if (logmx.EndsWith("M")) { xf = 1024*1024; logmx.Remove(TString::kTrailing, 'M'); } if (logmx.EndsWith("G")) { xf = 1024*1024*1024; logmx.Remove(TString::kTrailing, 'G'); } } .... }

V646 Consider inspecting the application's logic. It's possible that 'else' keyword is missing. TFormula_v5.cxx 3702

V646 Consider inspecting the application's logic. It's possible that 'else' keyword is missing. RooAbsCategory.cxx 604

void TMVA::MethodKNN::ReadWeightsFromStream(std::istream& is) { .... while (!is.eof()) { std::string line; std::getline(is, line); if (line.empty() || line.find("#") != std::string::npos) { continue; } .... } .... }

while (!is.eof() && !is.fail()) { .... }

while (is) { .... }

TFormLeafInfoMultiVarDim::TFormLeafInfoMultiVarDim( const TFormLeafInfoMultiVarDim& orig) : TFormLeafInfo(orig) { fNsize = orig.fNsize; fSizes.Copy(fSizes); // <= fCounter2 = orig.fCounter2?orig.fCounter2->DeepCopy():0; fSumOfSizes = orig.fSumOfSizes; fDim = orig.fDim; fVirtDim = orig.fVirtDim; fPrimaryIndex = orig.fPrimaryIndex; fSecondaryIndex = orig.fSecondaryIndex; }

Conclusion