Just a while ago, we released a new version of our analyzer PVS-Studio with support of C#-code analysis. With the development paused for the time of the release, I took this opportunity to test the analyzer. For my experiments, I picked projects IronPython and IronRuby. After I scanned them, I thought I could tell you about the analysis results in this small article.

IronPython and IronRuby

IronPython and IronRuby are implementations of programming languages Python and Ruby on the .NET platform. The projects' source codes can be downloaded from GitHub here. The pack also includes the source code of DLR. Starting with .NET Framework 4.0, DLR ships as its integral part, and IronPython and IronRuby use it. But since the old version of DLR happened to be in the pack, I analyzed it too.

Analysis details

So, the whole code is made up of three large parts: DLR, IronPython, and IronRuby, and contains 1630 *.cs files. The analysis was done with PVS-Studio 6.00, which can be downloaded from our website. It took me a bit more than a minute to analyze the solution. The analyzer output 34 warnings of the first level, 15 warnings of the second level, and 280 warnings of the third level.

Out of 34 first-level warnings, 19 turned out to be genuine bugs (which is a good result), and 6 warnings refer to suspicious fragments that should be reviewed. The remaining 9 warnings are false positives, half of which can be eliminated through some improvements to the analyzer itself, which we will make soon.

Among second- and third-level warnings, there were way fewer bugs and suspicious fragments.

Errors found

Now let's discuss examples of real bugs found by PVS-Studio in the projects:

Samples 1 and 2. Carelessness.

private bool Enter(RangeExpression/*!*/ node, bool isCondition) { .... if (!isCondition && litBegin != null && litEnd != null && litBegin.Value is int && litBegin.Value is int) { _result = MakeNode(NodeKind.lit, new Range( (int)litBegin.Value, (int)litEnd.Value, node.IsExclusive)); } else { .... } .... }

PVS-Studio diagnostic message: V3001 There are identical sub-expressions 'litBegin.Value is int' to the left and to the right of the '&&' operator. IronRubyParseTreeOps.cs 277

In the condition, litBegin.Value is checked twice for being of type 'int' instead of checking litEnd.Value as well.

Similar duplicate checks can be found in two more places, for example:

private static PythonTuple ReduceProtocol2( CodeContext/*!*/ context, object self) { .... if (self is PythonDictionary || self is PythonDictionary) { dictIterator = PythonOps.Invoke(context, self, "iteritems", ArrayUtils.EmptyObjects); } .... }

PVS-Studio diagnostic message: V3001 There are identical sub-expressions 'self is PythonDictionary' to the left and to the right of the '||' operator. IronPython ObjectOps.cs 452

Sample 3. Identical expressions.

protected override MSAst.Expression VisitTry( MSAst.TryExpression node) { .... if (newHandlers != null || newFinally != null) { node = Ast.MakeTry(node.Type, node.Body, newFinally != null ? newFinally : node.Finally, node.Fault, newHandlers != null ? newHandlers : newHandlers ); } return node; }

PVS-Studio diagnostic message: V3012 The '?:' operator, regardless of its conditional expression, always returns one and the same value: newHandlers. DebugInfoRewriter.cs 252

In this example, newHandlers is used in both parts of the conditional statement. Actually, it is node.Handlers that was meant to be used when newHandlers is null.

Samples 4 and 5. Carelessness.

public static bool HasValue(RubyContext/*!*/ context, object/*!*/ self, object value) { var strValue = value as MutableString; if (value == null) { return false; } var clrStrValue = strValue.ConvertToString(); .... }

PVS-Studio diagnostic message: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'value', 'strValue'. EnvironmentSingletonOps.cs 189

When casting a variable's type with the 'as' operator, programmers' common mistake is to check the source object, instead of the resulting one, for null and then go on to use an unchecked reference.

Another similar case:

private static RubyRegex/*!*/ ConstructRubyRegexp( RubyConstructor/*!*/ ctor, Node/*!*/ node) { ScalarNode scalar = node as ScalarNode; if (node == null) { throw RubyExceptions.CreateTypeError( "Can only create regex from scalar node"); } Match match = _regexPattern.Match(scalar.Value); .... }

PVS-Studio diagnostic message: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'node', 'scalar'. RubyConstructor.cs 230

Sample 6. Copy-Paste.

private void LoadNewObj(CodeContext/*!*/ context) { PythonTuple args = PopStack() as PythonTuple; if (args == null) { throw PythonOps.TypeError("expected second argument, got {0}", DynamicHelpers.GetPythonType(args)); } PythonType cls = PopStack() as PythonType; if (args == null) { throw PythonOps.TypeError("expected first argument, got {0}", DynamicHelpers.GetPythonType(args)); } .... }

PVS-Studio diagnostic message: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless. cPickle.cs 2194

In this code fragment, two conditions and calls on function GetPythonType() are totally the same. The second condition was obviously written by copying the first one, but the programmer forgot to change the name of the variable in the copied fragment. There were two more errors of this kind in the project.

Sample 7. Identical conditions.

public static int Compare(SourceLocation left, SourceLocation right) { if (left < right) return -1; if (right > left) return 1; return 0; }

PVS-Studio diagnostic message: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless. SourceLocation.cs 156

This method seems too simple to make a mistake in it, doesn't it? Nevertheless, the programmer swapped parameters left and right in the second condition for some reason. As a result, both conditions check one and the same thing – and this is what the analyzer didn't like.

The fixed version of the code:

public static int Compare(SourceLocation left, SourceLocation right) { if (left < right) return -1; if (left > right) return 1; return 0; }

Sample 8. Extra condition.

private void WriteSingleQuoted(string text, bool split) { .... while (ending <= text.Length) { c = '\0'; if (ending < text.Length) { c = text[ending]; } if (spaces) { if (c == 0 || c != 32) { .... }

PVS-Studio diagnostic message: V3023 Consider inspecting the 'c == 0 || c != 32' expression. The expression is excessive or contains a misprint. Emitter.cs 308

The 'c' variable is first assigned a default value, '\0'. Then, in case the whole string hasn't been processed yet, 'c' is assigned the next character of the string. At the end, it is checked if 'c' still contains the default value or any other character except space. The zero check is not necessary here, actually, since zero can't be equal to 32 (the space code) anyway. This defect doesn't cause any bugs but makes the code less clear, so the null check should be left out. The analyzer found a few more similar extra checks in this project.

Samples 9 and 10. Incorrect format string.

The general problem about using function String.Format is that the compiler doesn't check if the amount and numbers of a format string's parameters correspond with the numbers of parameters passed to String.Format. It may result in forming an incorrect string or raising a FormatException. See the following examples.

public T Current { get { try { return (T)enumerable.Current; } catch (InvalidCastException iex) { throw new InvalidCastException(string.Format( "Error in IEnumeratorOfTWrapper.Current. Could not cast: {0} in {0}", typeof(T).ToString(), enumerable.Current.GetType().ToString()), iex); } } }

PVS-Studio diagnostic message: V3025 Incorrect format. A different number of format items is expected while calling 'Format' function. Expected: 1. Present: 2. ConversionWrappers.cs 235

In this example, the last parameter is not used. Instead, value typeof(T).ToString() will be printed twice.

private static void DumpGenericParameters( MetadataTableView genericParams, MetadataRecord owner) { foreach (GenericParamDef gp in genericParams) { _output.WriteLine(" generic parameter #{0}: {1}", gp.Index, gp.Name, gp.Attributes); .... }

PVS-Studio diagnostic message: V3025 Incorrect format. A different number of format items is expected while calling 'WriteLine' function. Expected: 2. Present: 3. Program.cs 268

And here, function WriteLine receives one parameter more than suggested by the format string.

Sample 11. Null check after addressing.

public static MutableString ChompInPlace(....) { MutableString result = InternalChomp(self, separator); if (result.Equals(self) || result == null) { self.RequireNotFrozen(); return null; } .... }

PVS-Studio diagnostic message: V3027 The variable 'result' was utilized in the logical expression before it was verified against null in the same logical expression. MutableStringOps.cs 1097

In this condition, the null check and the call on method Equals should be swapped. The way it is written originally, the application may crash, raising a NullReferenceException.

Sample 12. Troubles with syncing.

class DictThreadGlobalState { public int DoneCount; .... } private static void RunThreadTest(DictThreadGlobalState globalState) { .... globalState.DoneEvent.Reset(); globalState.Event.Set(); while (globalState.DoneCount != 0) { // wait for threads to get back to finish } .... }

PVS-Studio diagnostic message: V3032 Waiting on this expression is unreliable, as compiler may optimize some of the variables. Use volatile variable(s) or synchronization primitives to avoid this. EngineTest.cs 2558

This code contains an error that will only show up on some occasions, depending on the execution environment, .NET Framework's version, the number of processors on the computer, and other implementation specifics. Such bugs are very difficult to catch. In this case, the DoneCount variable is not declared as volatile; therefore, the compiler assumes it is used by one thread only and its value can be cached and then restored from the cache all the time as this variable doesn't change inside the loop. In our case, however, it does change in another thread. That's why variables should be declared as volatile when used to sync threads. See MSDN for details.

Sample 13. Double assignment

private static Dictionary<string, EncodingInfoWrapper> MakeCodecsDict() { .... switch (normalizedName) { case "iso_8859_1": d["8859"] = d["latin_1"] = d["latin1"] = d["iso 8859_1"] = d["iso8859_1"] = d["cp819"] = d["819"] = d["latin"] = d["latin1"] = d["l1"] = encs[i]; break; .... }

PVS-Studio diagnostic message: V3005 The 'd["latin1"]' variable is assigned to itself. StringOps.cs 1905

In this code, the d["latin1"] variable is assigned values twice. The second assignment seems to be just superfluous code, not a bug. But it is also possible that this code was meant to handle some code page. Anyway, it should be checked.

Sample 14. Checking an unsigned variable for null

public static int __hash__(UInt64 x) { int total = unchecked((int) (((uint)x) + (uint)(x >> 32))); if (x < 0) { return unchecked(-total); } return total; }

PVS-Studio diagnostic message: V3022 Expression 'x < 0' is always false. Unsigned type value is always >= 0. IntOps.Generated.cs 1967

I'm almost sure it is 'total', not 'x', that should be compared with null because it doesn't look right doing something to 'x' all the time and then check a special case. Besides, 'total' is signed, so the check "total < 0" seems to make more sense.

Sample 15. Identical checks.

public void ReflectTypes(Type[]/*!*/ allTypes) { .... def.Super = null; if (cls != null && def.Extends != typeof(BasicObject) && !def.Extends.IsInterface) { if (cls != null && cls.Inherits != null) { def.Super = new TypeRef(cls.Inherits); .... }

PVS-Studio diagnostic message: V3030 Recurring check. The 'cls != null' condition was already verified in line 373. LibraryDef.cs 374

In both conditions, the 'cls' variable is checked for null. The programmer probably wanted to check 'def' for null in the first condition since they address its property Extends right after the check. But it's not really necessary, either, because 'def.Super' is assigned null right before the condition, which means that 'def' is not null anymore. So, it's just an extra check.

Sample 16. Copy-paste.

Now we've got to the third-level warnings, which make a total of 280. Most of them deal with pairs of functions with identical bodies and comparison of floating-point numbers. I didn't expect to find anything serious here, so I just started skimming the warnings but eventually stumbled upon one interesting thing.

public static bool IsPositiveOne(BigDecimal x) { return IsOne(x) && IsPositive(x); } public static bool IsNegativeOne(BigDecimal x) { return IsOne(x) && IsPositive(x); }

PVS-Studio diagnostic message: V3013 It is odd that the body of 'IsPositiveOne' function is fully equivalent to the body of 'IsNegativeOne' function (351, line 355). BigDecimal.cs 351

This is a real bug that results from copying code from one function to another. The fixed version of the code should look like this:

public static bool IsNegativeOne(BigDecimal x) { return IsOne(x) && IsNegative(x); }

Sample 17. Strange check for NaN.

public static bool Equals(float x, float y) { if (x == y) { return !Single.IsNaN(x); } return x == y; }

PVS-Studio diagnostic message: V3024 An odd precise comparison: x == y. Consider using a comparison with defined precision: Math.Abs(A - B) < Epsilon. FloatOps.cs 1048

I'm not sure why one would need a special check for NaN here. If the (x == y) condition is true, then neither 'x' nor 'y' is NaN because NaN doesn't equal any other value, including itself. That is, the first return statement will always return true. It seems this check for NaN is just superfluous.

Conclusion

I think the analyzer had done well with the analysis of these projects. Firstly, it caught a couple of dozens of interesting bugs, the fixing of which will make the project's code better; secondly, I found a few false positives that can be eliminated by making some improvements to our product. So, I encourage everyone to go download the PVS-Studio demo version and run it on their code.