About PVS-Studio

About Apache Hive

Running the analysis

Downloaded Apache Hive from GitHub;

Read the guide on starting the Java analyzer and launched the analysis;

Got the analyzer's report, studied it, and wrote out the most interesting cases.

Predetermined conditions

void initOptions() {
  ....
  if (key == null || value == null || !key.startsWith("hplsql.")) {   // <=
    continue;
  }
  else if (key.compareToIgnoreCase(Conf.CONN_DEFAULT) == 0) {
    ....
  }
  else if (key.startsWith("hplsql.conn.init.")) {
    ....
  }
  else if (key.startsWith(Conf.CONN_CONVERT)) {
    ....
  }
  else if (key.startsWith("hplsql.conn.")) {
    ....
  }
  else if (key.startsWith("hplsql.")) {                               // <=
    ....
  }
}
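The last branch is reachable only for keys that already passed the bare "hplsql." prefix test in the first condition, so it can never add anything. A minimal sketch (with a hypothetical classify method, not Hive's actual code) of the usual way to keep such a dispatch free of dead branches, testing specific prefixes before the general one:

```java
public class PrefixDispatch {
    // Hypothetical dispatch: the more specific prefixes go first, and the
    // bare "hplsql." prefix is the catch-all, so no branch is redundant.
    public static String classify(String key) {
        if (key == null || !key.startsWith("hplsql.")) {
            return "skip";
        }
        if (key.startsWith("hplsql.conn.init.")) return "conn-init";
        if (key.startsWith("hplsql.conn.")) return "conn";
        return "general";   // reached for every remaining "hplsql." key
    }

    public static void main(String[] args) {
        System.out.println(classify("hplsql.conn.init.x")); // conn-init
        System.out.println(classify("hplsql.onerror"));     // general
        System.out.println(classify("other"));              // skip
    }
}
```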

private static TypeDescription getTypeDescriptionFromTableProperties(....) {
  ....
  if (tableProperties != null) {
    final String columnNameProperty = ....;
    final String columnTypeProperty = ....;
    if (!Strings.isNullOrEmpty(columnNameProperty) &&
        !Strings.isNullOrEmpty(columnTypeProperty)) {
      List<String> columnNames = columnNameProperty.length() == 0
          ? new ArrayList<String>()
          : ....;
      List<TypeInfo> columnTypes = columnTypeProperty.length() == 0
          ? new ArrayList<TypeInfo>()
          : ....;
      ....
    }
  }
  ....
}

V6007 Expression 'columnTypeProperty.length() == 0' is always false. OrcRecordUpdater.java(239)

private void generateDateTimeArithmeticIntervalYearMonth(String[] tdesc)
    throws Exception {
  ....
  String colOrScalar1 = tdesc[4];
  ....
  String colOrScalar2 = tdesc[6];
  ....
  if (colOrScalar1.equals("Col") && colOrScalar1.equals("Column")) {          // <=
    ....
  } else if (colOrScalar1.equals("Col") && colOrScalar1.equals("Scalar")) {
    ....
  } else if (colOrScalar1.equals("Scalar") && colOrScalar1.equals("Column")) {
    ....
  }
}
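A single String can never equal two different literals at once, so every branch above is always false. A tiny self-contained illustration:

```java
public class CopyPasteCheck {
    public static void main(String[] args) {
        String colOrScalar1 = "Col";
        // The flawed condition shape: one variable compared with two
        // different literals cannot be true for any value.
        boolean flawed = colOrScalar1.equals("Col")
                      && colOrScalar1.equals("Column");
        System.out.println(flawed);   // false
    }
}
```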

V6007 Expression 'colOrScalar1.equals("Scalar")' is always false. GenVectorCode.java(3475)

V6007 Expression 'colOrScalar1.equals("Column")' is always false. GenVectorCode.java(3486)

V6007 Expression 'characters == null' is always false. RandomTypeUtil.java(43)

V6007 Expression 'writeIdHwm > 0' is always false. TxnHandler.java(1603)

V6007 Expression 'fields.equals("*")' is always true. Server.java(983)

V6007 Expression 'currentGroups != null' is always true. GenericUDFCurrentGroups.java(90)

V6007 Expression 'this.wh == null' is always false. New returns not-null reference. StorageBasedAuthorizationProvider.java(93), StorageBasedAuthorizationProvider.java(92)

and so on...

NPE

private void handleFragmentCompleteExternalQuery(QueryInfo queryInfo) {
  if (queryInfo.isExternalQuery()) {
    ReadWriteLock dagLock = getDagLock(queryInfo.getQueryIdentifier());
    if (dagLock == null) {
      LOG.warn("Ignoring fragment completion for unknown query: {}",
               queryInfo.getQueryIdentifier());
    }
    boolean locked = dagLock.writeLock().tryLock();
    .....
  }
}
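A logged null check only helps if it also stops execution. A minimal sketch (hypothetical names, not Hive's code) contrasting the fall-through shape above with a guard that returns after logging:

```java
public class NullGuardDemo {
    // Buggy shape: the null case is logged, but execution falls through
    // to the dereference anyway.
    public static int lengthBuggy(String s) {
        if (s == null) {
            System.err.println("warn: null input");   // logged, not handled
        }
        return s.length();                            // NPE when s == null
    }

    // Fixed shape: leave the method right after logging.
    public static int lengthFixed(String s) {
        if (s == null) {
            System.err.println("warn: null input");
            return -1;
        }
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(lengthFixed(null));    // -1, no exception
        System.out.println(lengthFixed("abc"));   // 3
    }
}
```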

private boolean lockBuffer(LlapBufferOrBuffers buffers, ....) {
  LlapAllocatorBuffer buffer = buffers.getSingleLlapBuffer();
  if (buffer != null) {                                    // <=
    return lockOneBuffer(buffer, doNotifyPolicy);
  }
  LlapAllocatorBuffer[] bufferArray = buffers.getMultipleLlapBuffers();
  for (int i = 0; i < bufferArray.length; ++i) {
    if (lockOneBuffer(bufferArray[i], doNotifyPolicy)) continue;
    for (int j = 0; j < i; ++j) {
      unlockSingleBuffer(buffer, true);                    // <=
    }
    ....
  }
  ....
}
....
private void unlockSingleBuffer(LlapAllocatorBuffer buffer, ....) {
  boolean isLastDecref = (buffer.decRef() == 0);           // <=
  if (isLastDecref) {
    ....
  }
}

A shift gone wild

private void shiftRightDestructive(int wordShifts,
                                   int bitShiftsInWord,
                                   boolean roundUp) {
  if (wordShifts == 0 && bitShiftsInWord == 0) {
    return;
  }
  assert (wordShifts >= 0);
  assert (bitShiftsInWord >= 0);
  assert (bitShiftsInWord < 32);
  if (wordShifts >= 4) {
    zeroClear();
    return;
  }
  final int shiftRestore = 32 - bitShiftsInWord;
  // check this because "123 << 32" will be 123.
  final boolean noRestore = bitShiftsInWord == 0;
  final int roundCarryNoRestoreMask = 1 << 31;
  final int roundCarryMask = (1 << (bitShiftsInWord - 1));   // <=
  ....
}
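When bitShiftsInWord is 0, the marked line computes 1 << -1. Java does not reject a negative shift count; for an int it masks the count to its low 5 bits, so -1 behaves as 31. A small demonstration:

```java
public class ShiftByMinusOne {
    public static void main(String[] args) {
        int bitShiftsInWord = 0;   // the case the analyzer warns about
        int roundCarryMask = 1 << (bitShiftsInWord - 1);   // 1 << -1
        // Java masks int shift counts to 5 bits, so -1 acts as 31:
        System.out.println(roundCarryMask == (1 << 31));   // true
        System.out.println(roundCarryMask);                // -2147483648
    }
}
```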

public void logSargResult(int stripeIx, boolean[] rgsToRead) {
  ....
  for (int i = 0, valOffset = 0; i < elements; ++i, valOffset += 64) {
    long val = 0;
    for (int j = 0; j < 64; ++j) {
      int ix = valOffset + j;
      if (rgsToRead.length == ix) break;
      if (!rgsToRead[ix]) continue;
      val = val | (1 << j);   // <=
    }
    ....
  }
  ....
}
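The same masking rule bites here: with an int left operand, 1 << j uses only j & 31, so bit positions 32 through 63 can never be set. Promoting the constant to long restores them:

```java
public class IntShiftOverflow {
    public static void main(String[] args) {
        int j = 32;
        long wrong = 1 << j;    // int shift: count masked to j & 31 == 0, so this is 1
        long right = 1L << j;   // long shift: all 64 bit positions reachable
        System.out.println(wrong);   // 1
        System.out.println(right);   // 4294967296
    }
}
```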

Getting carried away by logging

private static ImmutableList<PersistedRuntimeStats>
extractStatsFromPlanMapper(....) {
  ....
  if (stat.size() > 1 || sig.size() > 1) {
    StringBuffer sb = new StringBuffer();
    sb.append(String.format(
        "expected(stat-sig) 1-1, got {}-{} ;",   // <=
        stat.size(), sig.size()));
    ....
  }
  ....
  if (e.getAll(OperatorStats.IncorrectRuntimeStatsMarker.class).size() > 0) {
    LOG.debug(
        "Ignoring {}, marked with OperatorStats.IncorrectRuntimeStatsMarker",
        sig.get(0));
    continue;
  }
  ....
}
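String.format silently ignores surplus arguments when the format string contains no % specifiers, so the SLF4J-style {} braces come through literally. A short demonstration:

```java
public class FormatPlaceholders {
    public static void main(String[] args) {
        // SLF4J-style "{}" placeholders are not understood by String.format:
        String wrong = String.format("expected(stat-sig) 1-1, got {}-{}", 2, 3);
        System.out.println(wrong);   // expected(stat-sig) 1-1, got {}-{}

        // String.format needs printf-style specifiers instead:
        String right = String.format("expected(stat-sig) 1-1, got %d-%d", 2, 3);
        System.out.println(right);   // expected(stat-sig) 1-1, got 2-3
    }
}
```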

A stolen exception

private List<MPartitionColumnStatistics>
getMPartitionColumnStatistics(....)
    throws NoSuchObjectException, MetaException {
  boolean committed = false;
  try {
    .... /* some actions */
    committed = commitTransaction();
    return result;
  } catch (Exception ex) {
    LOG.error("Error retrieving statistics via jdo", ex);
    if (ex instanceof MetaException) {
      throw (MetaException) ex;
    }
    throw new MetaException(ex.getMessage());
  } finally {
    if (!committed) {
      rollbackTransaction();
      return Lists.newArrayList();
    }
  }
}

V6051 The use of the 'return' statement in the 'finally' block can lead to the loss of unhandled exceptions. ObjectStore.java(808)
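Why a return in finally is dangerous can be reduced to a few lines: the in-flight exception is simply discarded (a minimal sketch, not the Hive method):

```java
public class ReturnInFinally {
    // The "return" in finally discards the exception thrown in try.
    public static String swallowing() {
        try {
            throw new IllegalStateException("this exception is lost");
        } finally {
            return "empty result";   // compiles, but hides the throw above
        }
    }

    public static void main(String[] args) {
        // No exception ever reaches the caller:
        System.out.println(swallowing());   // empty result
    }
}
```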

Miscellaneous

@Override
public List<LlapServiceInstance> getAllInstancesOrdered(....) {
  ....
  Collections.sort(list, new Comparator<LlapServiceInstance>() {
    @Override
    public int compare(LlapServiceInstance o1, LlapServiceInstance o2) {
      return o2.getWorkerIdentity().compareTo(o2.getWorkerIdentity());   // <=
    }
  });
  ....
}
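Because the comparator compares o2 with itself, every pair looks equal and the stable sort leaves the list untouched. A small sketch of the bug and the one-character fix:

```java
import java.util.ArrayList;
import java.util.List;

public class ComparatorFix {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("b", "c", "a"));

        // Buggy shape: o2 compared with itself always yields 0,
        // so the (stable) sort changes nothing.
        list.sort((o1, o2) -> o2.compareTo(o2));
        System.out.println(list);   // [b, c, a]

        // Fixed shape: compare the two distinct operands.
        list.sort((o1, o2) -> o1.compareTo(o2));
        System.out.println(list);   // [a, b, c]
    }
}
```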

public static long divideUnsignedLong(long dividend, long divisor) {
  if (divisor < 0L) {
    /* some comments */
    return (compareUnsignedLong(dividend, divisor)) < 0 ? 0L : 1L;
  }
  if (dividend >= 0) {
    // Both inputs non-negative
    return dividend / divisor;   // <=
  } else {
    ....
  }
}

V6020 Mod by zero. The range of the 'divisor' denominator values includes zero. SqlMathUtil.java(309)

V6020 Divide by zero. The range of the 'divisor' denominator values includes zero. SqlMathUtil.java(276)

V6020 Divide by zero. The range of the 'divisor' denominator values includes zero. SqlMathUtil.java(312)
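The intent of the original method, dividing two longs interpreted as unsigned values, can be sketched with the JDK's own Long.divideUnsigned (JDK 8+) plus an explicit zero-divisor guard. This is a sketch of the pattern, not the SqlMathUtil implementation:

```java
public class UnsignedDivide {
    // Guarded sketch: reject a zero divisor explicitly instead of
    // letting "/" throw from deep inside the arithmetic.
    public static long divideUnsigned(long dividend, long divisor) {
        if (divisor == 0L) {
            throw new ArithmeticException("unsigned division by zero");
        }
        return Long.divideUnsigned(dividend, divisor);
    }

    public static void main(String[] args) {
        // -1L reads as 2^64 - 1 unsigned, so halving it gives Long.MAX_VALUE:
        System.out.println(divideUnsigned(-1L, 2L) == Long.MAX_VALUE);   // true
    }
}
```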

public static Operator<? extends OperatorDesc> findSourceRS(....) {
  ....
  List<Operator<? extends OperatorDesc>> parents = ....;
  if (parents == null | parents.isEmpty()) {   // <=
    // reached end e.g. TS operator
    return null;
  }
  ....
}
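The difference between | and || on boolean operands is exactly whether the right side is evaluated when the left one already decides the result. A minimal sketch (hypothetical helper, not Hive's code):

```java
import java.util.List;

public class ShortCircuitDemo {
    // "||" short-circuits: when l == null the right-hand side never runs.
    public static boolean isNullOrEmpty(List<?> l) {
        return l == null || l.isEmpty();
    }

    // "|" always evaluates both sides: l.isEmpty() runs even on null.
    public static boolean isNullOrEmptyBuggy(List<?> l) {
        return l == null | l.isEmpty();   // NPE on null input
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmpty(null));        // true, no exception
        System.out.println(isNullOrEmpty(List.of(1)));  // false
    }
}
```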

public static VectorColumnAssign buildObjectAssign(VectorizedRowBatch outputBatch,
                                                   int outColIndex,
                                                   PrimitiveCategory category)
    throws HiveException {
  VectorColumnAssign outVCA = null;
  ColumnVector destCol = outputBatch.cols[outColIndex];
  if (destCol == null) {
    ....
  } else if (destCol instanceof LongColumnVector) {
    switch (category) {
      ....
      case LONG:
        outVCA = new VectorLongColumnAssign() {
          ....
        }.init(...., (LongColumnVector) destCol);
        break;
      case TIMESTAMP:
        outVCA = new VectorTimestampColumnAssign() {
          ....
        }.init(...., (TimestampColumnVector) destCol);          // <=
        break;
      case DATE:
        outVCA = new VectorLongColumnAssign() {
          ....
        }.init(...., (LongColumnVector) destCol);
        break;
      case INTERVAL_YEAR_MONTH:
        outVCA = new VectorLongColumnAssign() {
          ....
        }.init(...., (LongColumnVector) destCol);
        break;
      case INTERVAL_DAY_TIME:
        outVCA = new VectorIntervalDayTimeColumnAssign() {
          ....
        }.init(...., (IntervalDayTimeColumnVector) destCol);    // <=
        break;
      default:
        throw new HiveException(....);
    }
  } else if (destCol instanceof DoubleColumnVector) {
    ....
  }
  ....
  else {
    throw new HiveException(....);
  }
  return outVCA;
}

V6042 The expression is checked for compatibility with type 'A' but is cast to type 'B'. VectorColumnAssignFactory.java(390)
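The instanceof/cast mismatch can be reproduced with any two sibling classes: a cast to a sibling of the checked type compiles but fails at run time (a toy sketch with hypothetical class names):

```java
public class CheckedCast {
    static class Base {}
    static class LongCol extends Base {}        // stands in for LongColumnVector
    static class TimestampCol extends Base {}   // stands in for TimestampColumnVector

    public static void main(String[] args) {
        Base destCol = new LongCol();
        if (destCol instanceof LongCol) {
            try {
                // Casting to a sibling type despite the instanceof check:
                TimestampCol t = (TimestampCol) destCol;
                System.out.println(t);
            } catch (ClassCastException e) {
                System.out.println("ClassCastException, as the analyzer predicted");
            }
        }
    }
}
```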

@Override
public boolean equals(Object obj) {
  if (getClass() != obj.getClass()) {   // <=
    return false;
  }
  Var var = (Var) obj;
  if (this == var) {
    return true;
  } else if (var == null ||             // <=
             var.value == null ||
             this.value == null) {
    return false;
  }
  ....
}

V6060 The 'value' reference was utilized before it was verified against null. ParquetRecordReaderWrapper.java(168), ParquetRecordReaderWrapper.java(166)

V6060 The 'defaultConstraintCols' reference was utilized before it was verified against null. HiveMetaStore.java(2539), HiveMetaStore.java(2530)

V6060 The 'projIndxLst' reference was utilized before it was verified against null. RelOptHiveTable.java(683), RelOptHiveTable.java(682)

V6060 The 'oldp' reference was utilized before it was verified against null. ObjectStore.java(4343), ObjectStore.java(4339)

and so on...
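A null-safe equals checks obj against null before calling anything on it, and before the cast. A minimal sketch (a hypothetical SafeEquals class, not Hive's Var) of the conventional ordering:

```java
public class SafeEquals {
    private final String value;

    public SafeEquals(String value) { this.value = value; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        // Reject null BEFORE touching obj: obj.getClass() on null throws NPE.
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        SafeEquals other = (SafeEquals) obj;
        return value == null ? other.value == null : value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return value == null ? 0 : value.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(new SafeEquals("x").equals(null));                // false
        System.out.println(new SafeEquals("x").equals(new SafeEquals("x"))); // true
    }
}
```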

For the past ten years, the open-source movement has been one of the key drivers of the IT industry's development and a crucial component of it. The role of open-source projects is becoming more and more prominent not only in terms of quantity but also in terms of quality, which changes the very concept of how they are positioned in the IT market in general. Our courageous PVS-Studio team is not sitting idly by and is taking an active part in strengthening the presence of open-source software by finding hidden bugs in the enormous depths of codebases and offering free license options to the authors of such projects. This article is just another piece of that activity! Today we are going to talk about Apache Hive. I've got the report, and there are things worth looking at.

The static code analyzer PVS-Studio, which has been around for more than 10 years now, is a multi-functional and easy-to-integrate software solution. At present, it supports C, C++, C#, and Java and runs on Windows, Linux, and macOS. PVS-Studio is a paid B2B solution used by numerous teams in a number of companies. If you want to try the analyzer, visit this page to download the distribution and request a trial key. If you are an open-source geek or, say, a student, you can take advantage of one of our free license options.

The amount of data has been growing at an enormous rate over the past years. Standard databases can no longer cope with this rapid growth, which is where the term Big Data comes from, along with other related notions (processing, storage, and other operations on big data). Apache Hadoop is currently considered one of the pioneering Big Data technologies. Its primary tasks are storing, processing, and managing large amounts of data. The main components of the framework are Hadoop Common, HDFS, Hadoop MapReduce, and Hadoop YARN.

Over time, a large ecosystem of related projects and technologies has developed around Hadoop, many of which originally started as part of the project and then budded off to become independent. Apache Hive is one of them. Apache Hive is a distributed data warehouse: it manages the data stored in HDFS and provides a query language based on SQL (HiveQL) to handle that data. More detail on the project can be found here.

It didn't take much effort or time to start the analysis; my algorithm is the three steps listed under "Running the analysis" above. The analysis results are as follows: 1456 warnings of High and Medium levels (602 and 854 respectively) on 6500+ files.

Not all warnings refer to genuine bugs. That's quite normal; you have to tweak the analyzer's settings before you start using it regularly. After that, you can typically expect a fairly low rate of false positives (example).

I left out the 407 warnings (177 High- and 230 Medium-level) triggered by the test files. I also ignored the V6022 diagnostic (since you can't reliably distinguish between faulty and correct fragments when you are not familiar with the code), which was triggered as many as 482 times. Neither did I examine the 179 warnings produced by the V6021 diagnostic.

In the end, I still had enough warnings to go on, and since I didn't tweak the settings, there is still some percentage of false positives among them. There's just no point in including too many warnings in an article like this :). So we'll just talk about what caught my eye and looked curious enough.

Among the diagnostics examined for this analysis, V6007 holds the record for the number of issued warnings: a little over 200 messages! Some look harmless, others suspicious, and some are genuine bugs after all. Let's take a look at some of them.

V6007 Expression 'key.startsWith("hplsql.")' is always true. Exec.java(675)

That's quite a lengthy if-else-if construct! The analyzer doesn't like the condition in the last branch, key.startsWith("hplsql."), because if execution reaches it, that condition is bound to be true. Indeed, the first line of the whole if-else-if construct already contains the opposite check: if the string doesn't begin with "hplsql.", execution immediately skips to the next iteration.

V6007 Expression 'columnNameProperty.length() == 0' is always false. OrcRecordUpdater.java(238)

The comparison of the columnNameProperty string's length with zero will always return false. This happens because the comparison follows the !Strings.isNullOrEmpty(columnNameProperty) check, so if execution reaches our condition, the columnNameProperty string is surely neither null nor empty. The same is true for the columnTypeProperty string one line later.

V6007 Expression 'colOrScalar1.equals("Column")' is always false. GenVectorCode.java(3469)

The good old copy-paste. From the viewpoint of the current logic, the colOrScalar1 string would have to hold two different values at once, which is impossible. Obviously, the checks should have the variable colOrScalar1 on the left and colOrScalar2 on the right. Similar warnings follow a few lines below, on GenVectorCode.java(3475) and GenVectorCode.java(3486). As a result, this if-else-if construct will never do anything. A few more V6007 warnings are listed above.

V6008 Potential null dereference of 'dagLock'. QueryTracker.java(557), QueryTracker.java(553)

A null object is caught, logged, and... the program just keeps running. As a result, the check is followed by a null pointer dereference. Ouch! The developers must have wanted the program to exit the function or throw a special exception when a null reference is received.

V6008 Null dereference of 'buffer' in function 'unlockSingleBuffer'. MetadataCache.java(410), MetadataCache.java(465)

Another potential NPE. If execution reaches this call of the unlockSingleBuffer method, the buffer object is null. Suppose that's what happened: look at the unlockSingleBuffer method, and you'll notice that the object is dereferenced right in its first line. Gotcha!

V6034 Shift by the value of 'bitShiftsInWord - 1' could be inconsistent with the size of type: 'bitShiftsInWord - 1' = [-1..30]. UnsignedInt128.java(1791)

This is a potential shift by -1. If the method is called with, say, wordShifts == 3 and bitShiftsInWord == 0, the reported line will end up computing 1 << -1. Is that planned behavior?

V6034 Shift by the value of 'j' could be inconsistent with the size of type: 'j' = [0..63]. IoTrace.java(272)

In the reported line, the j variable can take any value within the range [0..63]. Because of that, the calculation of val in the loop may run in an unexpected way: in the 1 << j expression, the value 1 is of type int, so shifting it by 32 bits or more takes us beyond the limits of the type's range. This can be fixed by writing 1L << j.

V6046 Incorrect format. A different number of format items is expected. Arguments not used: 1, 2. StatsSources.java(89)

When formatting the string with String.format, the developer used the wrong syntax: the {} placeholders. As a result, the passed parameters never made it into the resulting string. My guess is that the developer had been working on logging before writing this, which is where they borrowed the placeholder syntax from.

V6051 The use of the 'return' statement in the 'finally' block can lead to the loss of unhandled exceptions. ObjectStore.java(9080)

Returning anything from a finally block is a very bad practice, and this example vividly shows why. In the try block, the program forms a request and accesses the storage. The committed variable is false by default and changes its state only after all the preceding actions in the try block have executed successfully. It means that if an exception is raised, that variable will always be false. The catch block will catch the exception, adjust it a bit, and throw it on. But when it's the turn of the finally block, execution enters the condition and an empty list is returned. What does this return cost us? It prevents any caught exception from propagating to the outside, where it could be properly handled. None of the exceptions specified in the method's signature will ever be thrown; they are simply misleading. A similar diagnostic message, for ObjectStore.java(808), is shown above.

V6009 Function 'compareTo' receives an odd argument. An object 'o2.getWorkerIdentity()' is used as an argument to its own method. LlapFixedRegistryImpl.java(244)

There could be a number of causes of such a silly mistake: copy-paste, carelessness, hurry, and so on. We often see errors like this in open-source projects and even have a whole article about them.

V6020 Divide by zero. The range of the 'divisor' denominator values includes zero. SqlMathUtil.java(265)

This one is quite trivial: a series of checks failed to avert the division by zero. A few more warnings of the same kind are listed above.

V6030 The method located to the right of the '|' operator will be called regardless of the value of the left operand. Perhaps it is better to use '||'. OperatorUtils.java(573)

The programmer wrote the bitwise operator | instead of the logical ||, which means the right part is evaluated regardless of the result of the left one. If parents == null, this typo will end up with an NPE right in the next logical subexpression.

V6042 The expression is checked for compatibility with type 'A' but is cast to type 'B'. VectorColumnAssignFactory.java(347)

We are interested in the classes LongColumnVector and TimestampColumnVector. The check that the destCol object is an instance of LongColumnVector explicitly suggests that it is an object of this class that will be dealt with in the body of the conditional statement. Despite that, it is cast to TimestampColumnVector! As you can see, these are different classes, although they are derived from the same parent. As a result, we get a ClassCastException. The same is true for the cast to IntervalDayTimeColumnVector.

V6060 The 'var' reference was utilized before it was verified against null. Var.java(402), Var.java(395)

Here you see a strange check of the var object for null after the dereference has already occurred. In this context, var and obj are the same object (var = (Var) obj). The presence of the null check implies that the passed object may be null, so calling equals(null) will result in an NPE right at the first line instead of the expected false. Yes, the check is there, but, sadly, it's in the wrong place. A few other similar cases, where an object is used before the check, are listed above.

Conclusion

If you've ever taken even a slight interest in Big Data, you can hardly be unaware of how significant Apache Hive is. This is a popular project, and quite a big one, comprising over 6500 source files (*.java). Many developers have been writing it over many years, which means there are a lot of things for a static analyzer to find there. It proves one more time that static analysis is extremely important and useful when developing medium and large projects!

Note. One-time checks like the one I did here are good for showcasing the analyzer's capabilities, but they are an improper scenario of using it. This idea is elaborated on here and here. Static analysis is meant to be used regularly!

This check of Hive revealed quite a few defects and suspicious fragments. If the authors of Apache Hive come across this article, we'll be glad to help with the hard job of improving the project.

You can't imagine Apache Hive without Apache Hadoop, so the Unicorn from PVS-Studio may well pay a visit to that one too. But that's all for today. Meanwhile, I invite you to download the analyzer and check your own projects.