Comparing F# and C# with dependency networks

9 June 2014

Fans of different programming languages always argue about the benefits of their language of choice. It is difficult to use objective criteria in a debate like this. Terms like ‘clarity’ or ‘maintainability’ are too vague and subjective. What if we used some tools from network science to compare projects written in different languages?

In this blog post I use network analysis to investigate how complex dependency graphs are and if they differ between C# and F#. It turns out that F# and C# dependency networks have quite different structures and use different local network patterns. For example, I’ll describe specific types of cyclic dependencies that frequently appear only in C# projects.

This blog post is an addition to an excellent article by Scott Wlaschin on modularity and cyclic dependencies in real-world F# and C# projects, Cycles and modularity in the wild. I wanted to look at the same data that Scott extracted in his article, but from a network analysis perspective.

Dependency networks

For my analysis, I extracted dependency networks from 40 different projects, half of them written in C# and half of them in F#. All the networks come from compiled assemblies that can be downloaded through NuGet. I used a method similar to Scott’s. If you want more details, head over to F# for fun and profit for a fuller description. I’ll give just a brief overview here.

Structure of a dependency network

A dependency network is formed by nodes and oriented links between them.

Nodes in the dependency network are formed by:

Classes in C#

Modules in F#

The compiler turns F# modules into static classes, so the two definitions should be roughly comparable, at least at the CIL level. Both types of nodes represent only top-level classes and modules; nested types are incorporated into their parent class or module. The networks analyzed in this blog post contain all the classes and modules from each project, not just the public ones.

Links between the nodes represent dependencies. There is a link from A to B in the network if:

Class B inherits from class A or implements interface A.

A function in B calls a function or method from A.

A field, property, method or function in B references A as a parameter or as a return type.

Note that I switched the direction of the dependency arrows in the network compared to the original article at F# for fun and profit. Now links represent the direction in which information is passed between nodes. This definition corresponds more closely to the logic of information flow in a program. For example, if there is a bug in a function, it will propagate along the dependency arrows into all nodes that call the function.
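The link direction described above can be sketched in a few lines of Python. This is only an illustration, not the extraction code used for this post: the function names and the example class names (Lexer, Parser, Compiler, Utils, Logger) are made up. It stores a link from A to B whenever B depends on A, and then follows the links to find everything a bug in one node could reach.

```python
from collections import defaultdict, deque

def build_network(dependencies):
    """Build adjacency sets from (dependent, dependency) pairs.

    A pair (B, A) means "B depends on A". The stored link goes from A to B,
    i.e. in the direction in which information flows.
    """
    links = defaultdict(set)
    for dependent, dependency in dependencies:
        if dependent != dependency:       # ignore self-references
            links[dependency].add(dependent)
            links[dependent]              # register the dependent as a node too
    return dict(links)

def affected_by(links, start):
    """Return every node reachable from `start` by following the links."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical example: Parser uses Lexer, Compiler uses Parser, Logger uses Utils.
example = build_network([
    ("Parser", "Lexer"),
    ("Compiler", "Parser"),
    ("Logger", "Utils"),
])
print(sorted(affected_by(example, "Lexer")))
```

In this toy network a bug in Lexer can reach Parser and Compiler, but not Logger or Utils.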

Projects under the spotlight

I expanded the list of analysed projects compared to the original analysis at F# for fun and profit. Again, the projects are not directly comparable in general. I hope that by using more projects, the data get averaged and we can get a bigger picture out of them. The results are still biased by the small sample size, though.

Here are the 40 projects (individual dlls) that were included in the analysis (in no particular order):

C# projects:

Antlr, AutoMapper, Castle, elmah, EntityFramework, FParsecCS, log4net, MathNet.Numerics, SignalR, Bcl.Runtime, Owin, Cecil, Moq, Nancy, Newtonsoft.Json, Nuget, NUnit, SpecFlow, xunit, YamlDotNet

F# projects:

canopy, Deedle, Fake, Foq, FParsecFS, FsCheck, FSharp.Compiler.Service, FSharp.Core, FSharp.Data, FSharp.Data.Twitter, FSharpx, FsPowerPack, FsSql, FsUnit, FsYaml, Storm, TickSpec, WebSharper, WebSharper.Core, WebSharper.Html

Network statistics

The networks extracted from compiled project dlls have very different sizes. The following chart shows the number of nodes (classes or modules) and number of dependencies in each project. The axes in the figure are logarithmic so that we can put data with different scales into one picture.

Projects written in F# seem to be generally smaller. On the other hand, C# projects tend to be larger both in the number of nodes and number of dependencies. It is interesting that the plot looks approximately like a straight line. This indicates a power law relation between the number of nodes and links both in F# and C# projects.
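A straight line on a log-log plot corresponds to a relation of the form links = c * nodes^k. A minimal sketch of how such an exponent could be estimated is below. It uses an ordinary least-squares fit on the logarithms, which is my choice for illustration and not necessarily how the chart was analysed; the (nodes, links) pairs are copied from the per-project statistics listed later in this post.

```python
import math

# (number of nodes, number of links) per project, from the statistics tables
CSHARP = [(91, 257), (152, 549), (430, 1766), (116, 300), (1679, 11671),
          (35, 48), (227, 746), (342, 1285), (221, 735), (8, 2),
          (55, 98), (240, 1145), (541, 1536), (369, 1205), (237, 1005),
          (229, 943), (183, 505), (242, 578), (72, 209), (161, 550)]
FSHARP = [(11, 12), (95, 249), (3, 1), (40, 75), (6, 4),
          (54, 103), (42, 23), (154, 287), (94, 173), (20, 29),
          (175, 77), (93, 68), (13, 14), (2, 0), (8, 10),
          (67, 195), (34, 48), (56, 22), (12, 13), (19, 37)]

def fit_power_law(pairs):
    """Fit log(links) = log(c) + k * log(nodes) by least squares.

    Projects with zero nodes or zero links are skipped because their
    logarithm is undefined. Returns the pair (c, k).
    """
    points = [(math.log(n), math.log(m)) for n, m in pairs if n > 0 and m > 0]
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return c, k

for name, data in [("C#", CSHARP), ("F#", FSHARP)]:
    c, k = fit_power_law(data)
    print(f"{name}: links ~ {c:.2f} * nodes^{k:.2f}")
```

With only 20 projects per language, the fitted exponents should be read as rough indications rather than precise estimates.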

The next question we might ask is how complex the networks are. One measure of complexity in code dependency networks is how many dependencies are chained together in the graph. Long chains of dependencies increase the complexity of code. For example, bugs that get propagated through a long dependency path might affect a large part of the whole project. A standard measure for this is the network diameter. It is computed by looking at shortest paths between all possible pairs of nodes in a network. The diameter is defined as the length of the longest of these paths. For diameters in C# and F# projects we get these box plots:
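The definition of the diameter can be sketched as a breadth-first search from every node. This is a minimal illustration, not the code behind the box plots. It assumes that paths follow the direction of the links and that pairs of nodes with no connecting path are simply ignored.

```python
from collections import deque

def diameter(links):
    """Length of the longest shortest directed path in the network.

    `links` maps each node to the nodes it points to. Pairs of nodes that
    are not connected by any path are ignored (an assumption of this sketch).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    longest = 0
    for source in nodes:
        # Breadth-first search gives shortest path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in links.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest

# A chain A -> B -> C -> D with a shortcut A -> C.
print(diameter({"A": ["B", "C"], "B": ["C"], "C": ["D"]}))
```

In the example, the shortcut means that no node is more than two links away from any node that can reach it, so the longest shortest path has length 2.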

The diameter of the analyzed C# projects is on average more than double the diameter of the F# projects (a mean of roughly 8.4 versus 3.1 in the tables below). Diameters tend to grow with the number of nodes and links in each network, so the larger C# networks also have larger diameters.

One aspect where F# and C# projects differ dramatically is the number of isolated nodes. These represent standalone modules or classes that do not have any dependency within the project. Here is a box plot showing the proportion of standalone nodes.
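As a small sketch of this statistic: a node is isolated when it is neither the source nor the target of any link. The function name and the example node names below are illustrative only.

```python
def isolated_fraction(nodes, links):
    """Proportion of nodes that take part in no link at all.

    `nodes` is an iterable of node names and `links` an iterable of
    (source, target) pairs. Self-links are ignored in this sketch.
    """
    nodes = set(nodes)
    if not nodes:
        return 0.0
    connected = set()
    for source, target in links:
        if source != target:
            connected.add(source)
            connected.add(target)
    return len(nodes - connected) / len(nodes)

# Four hypothetical modules, only two of which are linked.
print(isolated_fraction(["A", "B", "C", "D"], [("A", "B")]))
```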

Isolated nodes appear much more frequently in F# projects than in C# projects. This is probably an effect of the different programming paradigms. An object-oriented language like C# might require the programmer to introduce more dependencies into the code. As a result, functional F# has cleaner modularity than C# on average.

Below are images of networks from two different projects as an example. YamlDotNet is on the left and FSharp.Core on the right. The two projects are not comparable in terms of their scope. However, their networks have roughly the same number of nodes and a similar diameter.

FSharp.Core has more isolated nodes that do not have any dependencies within the project, which seems to be typical for F# projects. The densely connected core of the project is much smaller than in the C# project. The two networks are meant just as an illustration of typical features of C# and F# dependency networks.

Here are the detailed numbers for the analyzed projects:

C# code statistics

| Project | Code size | Number of nodes | Number of links | Isolated nodes | Diameter |
|---|---|---|---|---|---|
| Antlr | 34344 | 91 | 257 | 8.8 % | 5 |
| AutoMapper | 34793 | 152 | 549 | 5.3 % | 8 |
| Castle | 112538 | 430 | 1766 | 5.6 % | 8 |
| elmah | 43728 | 116 | 300 | 7.8 % | 5 |
| EntityFramework | 1144189 | 1679 | 11671 | 4.7 % | 16 |
| FParsecCS | 32230 | 35 | 48 | 14.3 % | 3 |
| log4net | 102651 | 227 | 746 | 0.9 % | 10 |
| MathNet.Numerics | 492095 | 342 | 1285 | 5.6 % | 8 |
| SignalR | 63690 | 221 | 735 | 6.8 % | 11 |
| Bcl.Runtime | 73 | 8 | 2 | 62.5 % | 1 |
| Owin | 13376 | 55 | 98 | 10.9 % | 7 |
| Cecil | 100650 | 240 | 1145 | 5.0 % | 8 |
| Moq | 158417 | 541 | 1536 | 11.1 % | 14 |
| Nancy | 130818 | 369 | 1205 | 5.4 % | 12 |
| Newtonsoft.Json | 157716 | 237 | 1005 | 4.6 % | 13 |
| Nuget | 101586 | 229 | 943 | 2.2 % | 10 |
| NUnit | 45873 | 183 | 505 | 14.2 % | 7 |
| SpecFlow | 41187 | 242 | 578 | 2.5 % | 7 |
| xunit | 14590 | 72 | 209 | 1.4 % | 7 |
| YamlDotNet | 42372 | 161 | 550 | 2.5 % | 7 |

F# code statistics

| Project | Code size | Number of nodes | Number of links | Isolated nodes | Diameter |
|---|---|---|---|---|---|
| canopy | 23630 | 11 | 12 | 27.3 % | 2 |
| Deedle | 122918 | 95 | 249 | 18.9 % | 5 |
| Fake | 1395 | 3 | 1 | 33.3 % | 1 |
| Foq | 38532 | 40 | 75 | 5.0 % | 3 |
| FParsecFS | 45946 | 6 | 4 | 33.3 % | 2 |
| FsCheck | 76418 | 54 | 103 | 16.7 % | 5 |
| FSharp.Compiler.Service | 110523 | 42 | 23 | 50.0 % | 2 |
| FSharp.Core | 206348 | 154 | 287 | 40.3 % | 6 |
| FSharp.Data | 135001 | 94 | 173 | 8.5 % | 6 |
| FSharp.Data.Twitter | 10372 | 20 | 29 | 25.0 % | 3 |
| FSharpx | 290577 | 175 | 77 | 56.0 % | 2 |
| FsPowerPack | 102878 | 93 | 68 | 46.2 % | 4 |
| FsSql | 15311 | 13 | 14 | 0.0 % | 4 |
| FsUnit | 1580 | 2 | 0 | 100.0 % | 0 |
| FsYaml | 14573 | 8 | 10 | 12.5 % | 3 |
| Storm | 55072 | 67 | 195 | 3.0 % | 5 |
| TickSpec | 27970 | 34 | 48 | 5.9 % | 3 |
| WebSharper | 43747 | 56 | 22 | 57.1 % | 2 |
| WebSharper.Core | 83201 | 12 | 13 | 25.0 % | 2 |
| WebSharper.Html | 14152 | 19 | 37 | 10.5 % | 2 |

Network motifs

We looked at some global properties of dependency networks; now we turn to more local features. Motifs are small recurring patterns of links between nodes that appear in real-life networks. For example, there has been a lot of research on motifs in gene regulatory networks and their functional meaning. We can apply the same approach to our dependency networks to see if there are any typical patterns.

Motif finding in general networks is computationally hard because it involves identifying graph isomorphisms. The larger the motif, the harder it is to find it in a network. In this analysis, I looked only at motifs on three and four nodes. I used the igraph package in R with F# RProvider. The motif finding function from igraph counts the number of times each possible motif on three or four nodes appears in a given network.
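To make the idea of motif counting concrete, here is a brute-force sketch in Python. It is not the igraph implementation, which is far more efficient, and it was not used to produce the results in this post. It looks at every group of three nodes, keeps the connected ones, and identifies their pattern by trying all relabellings of the nodes. The labels it produces are arbitrary and do not match the motif numbers used in the figures.

```python
from collections import Counter
from itertools import combinations, permutations

def canonical_form(nodes, edges):
    """Label shared by all isomorphic copies of a small subgraph.

    Tries every relabelling of the nodes with 0..n-1 and keeps the
    smallest sorted edge list.
    """
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        code = tuple(sorted((relabel[a], relabel[b]) for a, b in edges))
        if best is None or code < best:
            best = code
    return best

def is_weakly_connected(nodes, edges):
    """True if the nodes form one piece when link direction is ignored."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)

def count_motifs(edge_list, size=3):
    """Count connected induced subgraphs of `size` nodes by pattern."""
    edge_set = {(a, b) for a, b in edge_list if a != b}
    all_nodes = sorted({n for edge in edge_set for n in edge})
    counts = Counter()
    for subset in combinations(all_nodes, size):
        induced = [(a, b) for a in subset for b in subset if (a, b) in edge_set]
        if is_weakly_connected(subset, induced):
            counts[canonical_form(subset, induced)] += 1
    return counts

# A three-node cycle A -> B -> C -> A with one extra link C -> D.
for pattern, n in count_motifs([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]).items():
    print(pattern, n)
```

The number of node groups grows very quickly with network and motif size, which is why this approach is only practical for tiny examples.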

Motifs on 3 nodes

There are 13 possible motifs on three nodes. I computed how many times each of these motifs appears in the project networks. Because each network has a different size, the counts were normalized with respect to the total number of motifs in each network. The following bar plot compares the average frequencies of all the motifs.
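The number of possible motifs can be checked by enumeration. The sketch below generates every directed graph on a given number of nodes, keeps the connected ones and counts the distinct patterns. It also includes a small helper that turns raw counts into relative frequencies, as described above. Both functions are illustrations with names of my own choosing, not the code used for the plots.

```python
from itertools import permutations, product

def canonical_form(size, edges):
    """Smallest sorted edge list over all relabellings of nodes 0..size-1."""
    best = None
    for perm in permutations(range(size)):
        code = tuple(sorted((perm[a], perm[b]) for a, b in edges))
        if best is None or code < best:
            best = code
    return best

def is_weakly_connected(size, edges):
    """True if nodes 0..size-1 form one piece when direction is ignored."""
    neighbours = {n: set() for n in range(size)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {0}
    stack = [0]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == size

def motif_classes(size):
    """All distinct connected directed patterns on `size` nodes."""
    slots = [(a, b) for a in range(size) for b in range(size) if a != b]
    classes = set()
    for mask in product([False, True], repeat=len(slots)):
        edges = [slot for slot, present in zip(slots, mask) if present]
        if is_weakly_connected(size, edges):
            classes.add(canonical_form(size, edges))
    return classes

def normalise(counts):
    """Turn motif counts into frequencies that sum to one."""
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}

print(len(motif_classes(3)))
```

Calling motif_classes(4) performs the same enumeration for the motifs on four nodes.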

Average motif profiles on 3 nodes in C# and F# projects

Motifs number 1, 2, 4 and 5 are the most common in both C# and F# projects. They seem to differ only in how often each motif appears. The results seem quite intuitive because these motifs look like standard patterns that would be expected in a software project. The bar plot shows only the average frequencies, and the variance between individual projects is quite high. A summary of results for each project is available here.

Motifs that are C#-specific

What is interesting is that there are several motifs that appear in many C# projects but not in any of the analyzed F# projects. They are motifs number 8, 9, 10 and 13:

Additionally, motif number 12 appears just once in FSharp.Core and nowhere else among the F# projects. What all these motifs have in common is that they contain cyclic dependencies. Scott Wlaschin wrote a nice blog post on why cyclic dependencies are evil. Simply put, they add complexity, mess up the structure of code and complicate maintenance. So, this is how the evil cyclic dependencies look in real-world projects. Especially motif number 13 with full connectivity looks like something that should be avoided. How frequent are these cyclic motifs?

| Motif | Number of projects |
|---|---|
| 8 | 13 |
| 9 | 9 |
| 10 | 14 |
| 13 | 4 |

The table shows how many projects contain each of the C#-specific motifs. Motifs number 8 and 10 are in the majority of the analysed C# networks, which means they are quite widespread. Fortunately, the most entangled motif number 13 is the least common one and occurs in only 4 projects. There are no motifs that appear only in F# projects.
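Cyclic dependencies of the kind these motifs contain can also be located directly, without motif counting. A minimal sketch: two nodes belong to the same cyclic group when each can be reached from the other. The code below uses a simple reachability search per node, which is fine for networks of the sizes discussed here but is not the most efficient algorithm. The function names are illustrative, and it assumes that the network has no links from a node to itself.

```python
def reachable(links, start):
    """Nodes reachable from `start` by at least one link.

    `start` itself is included only if it lies on a cycle.
    """
    seen = set()
    stack = [start]
    while stack:
        for nxt in links.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def cyclic_groups(links):
    """Groups of nodes that all depend on each other through some cycle."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    reach = {node: reachable(links, node) for node in nodes}

    groups = []
    assigned = set()
    for node in sorted(nodes):
        if node in assigned or node not in reach[node]:
            continue                      # already grouped, or not on a cycle
        group = {other for other in reach[node] if node in reach[other]}
        groups.append(sorted(group))
        assigned.update(group)
    return groups

# A -> B -> C -> A is a cycle; D only hangs off C.
print(cyclic_groups({"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}))
```

A network without cyclic dependencies gives an empty list.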

Motifs on 4 nodes

I will not give the full analysis of motifs on 4 nodes because there are 199 of them. However, there are a few interesting things to point out. Again, F# and C# share the most common motifs which look like patterns that we would expect to see:

And again, some motifs appear exclusively in C# projects: this time, 129 motifs are C#-only. There are no motifs that appear only in F# projects. These are the most common C#-specific ones:

These motifs are also quite widespread. The first one appears in 14 projects, the rest of them in 13 projects. Finally, what about the most complex motif on 4 nodes?

It turns out that this motif appears in 3 of the C# projects (specifically in EntityFramework, Mono.Cecil and Newtonsoft.Json). This pattern looks like quite a poor design choice.

Explore motifs in your projects

If you want to find the motif profile of your own project, this FsLab Journal shows how to run the analysis. Source code from the Journal is available here. You can also download the full source code that replicates the results from this blog post from my GitHub page.

Summary

In this blog post, I looked at dependency networks in several C# and F# projects. The analysis shows some similarities and differences between the two programming languages. In general, C# projects tend to be larger, with more classes and dependencies. They also have longer chains of dependencies on average. Real-world F# projects are smaller, with cleaner modularity.

I also described recurring patterns (motifs) that appear in dependency networks. The most common motifs are similar in C# and F# projects. However, most C# projects contain motifs with complicated cyclic dependencies that do not appear in F# at all. Cyclic dependencies in general complicate the code and obscure the dependency structure.

This analysis is still very limited. For example, we can debate whether the dependency networks are defined well enough in both languages to be truly comparable. Nevertheless, it seems that this type of analysis can reveal some aspects of dependency networks.

In general, it seems that most C# projects would be harder to maintain because of all the cyclic dependencies and more complex structure overall. The question is whether it is a feature of the language itself that encourages programmers to create more complex systems.

I also presented a poster on this topic at Cambridge Networks Day 2014.

Correction 13/6/2014: Relation between number of nodes and number of links is a power law function.