Hello there, fellow Code Reversers!

In this tutorial, we are going to learn the basics of how CIL (Common Intermediate Language) works and get our hands dirty building your very own (first?) .NET decompiler!


Introduction

tl;dr: I first wrote this tutorial over at www.zerratar.com, but since I'm more active here, I've decided to share the article on this website instead and continue with part 2 here as well :-)

Over a year ago, I threw myself into learning some CIL for a project called PapyrusDotNet (available on GitHub): a program that translates compiled .NET code into Papyrus assembly code, the assembly form of Skyrim's own scripting language. That makes it possible to write your very own Skyrim scripts directly in C#!



I instantly fell in love with the technology I used back then and with how much you can really do with just a little knowledge in the area!



For PapyrusDotNet I used a library called Mono.Cecil, created by JB Evain. It is a beautiful library for reading and writing CLR assemblies, and I recommend reading as much as you can about it, as the power of Mono.Cecil is just incredible.



In this tutorial, we are going to use Mono.Cecil for reading assemblies.

I might write another tutorial in the future that uses Roslyn to format the code.

I did not want to cover Roslyn here; it would make this tutorial, like, supah huuge!



This tutorial will not cover everything involved in creating a fully fledged .NET decompiler either. However, it will give you a simple starting point for learning the power of Mono.Cecil, or at least spark your interest in using this wonderful library in future projects of yours.

Please note

This article will be split into multiple parts in the future due to its length. It is also not the complete tutorial, just one part of it. The real part "2" is still being written and will cover a more advanced way of parsing CIL and building a better core.

(The number 2 will be changed as soon as this one gets split up. That means what I call part 2 in this article may actually be part 3. Sorry for the confusion!)

Be aware, a note to everyone reading this tutorial:

At the time of this writing, I'm creating the material for this tutorial as I go. I have never created a decompiler before, nor am I an expert in CIL or in the full extent of Mono.Cecil and Roslyn. That means what you will read in this tutorial is completely new to me as well. Knowing this, the code or explanations might not be "perfect" or even fully correct... You have been warned!

Requirements

Intermediate skills in C#, as I will not describe every line of code.

Visual Studio 2013 or higher; I will be using VS2015 in this tutorial.

I recommend downloading Visual Studio 2015 RC Community Edition.

It is completely free and offers everything necessary for this tutorial.

Patience, you will need it. This will be a long read!

(Optional) Previous experience with CIL or Mono.Cecil can be very useful.

Getting started, get that awesome project running!

Start that lovely Visual Studio of yours! It is time to create a new Console Application project. I will call it Tutorials.SimpleDecompiler, but you can, of course, call it whatever you like.

Awesome! We've gotten somewhere already! Next, install the following NuGet package from the Package Manager Console:

Install-Package Mono.Cecil

You can open up the NuGet console by going to Tools > NuGet Package Manager > Package Manager Console



Never used NuGet before?

NuGet is a wonderful service: basically an online library of packages that you can reference in any of your .NET projects. You can install them either through the GUI that Visual Studio provides or, as we just did, through the console.

By now, it should have added a couple of new dll references to your project. These will be our core for grabbing information from an already compiled .NET assembly.

Mono.Cecil will be used for parsing the target assembly. Gosh! The easy part is done. Good job so far!

Starting somewhere over the rainbow, with a pot of code.

You've guessed it! We are starting off with some code right away. In our new project, I took the liberty of writing some basic code that reads through an existing assembly file. Start by replacing all the code in Program.cs with the following:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using Mono.Cecil;

namespace Tutorials.SimpleDecompiler
{
    class Program
    {
        public string just_a_dummy;

        static void Main(string[] args)
        {
            // Decompile ourselves: the assembly that is currently running.
            DecompileAssembly(Assembly.GetExecutingAssembly().Location);
            Console.ReadKey();
        }

        public static void DecompileAssembly(string fileName)
        {
            // Let Mono.Cecil read the assembly file from disk.
            var asm = Mono.Cecil.AssemblyDefinition.ReadAssembly(fileName);
            Console.WriteLine("{0} successfully loaded.", asm.Name.Name);

            // An assembly contains one or more modules...
            foreach (var module in asm.Modules)
            {
                Console.WriteLine("{0} ({1})", module.Name, module.Architecture);

                // ...each module contains types (GetTypes also returns nested types)...
                var types = module.GetTypes();
                foreach (var type in types)
                {
                    Console.WriteLine("\t- {0}", type.FullName);

                    // ...and each type contains methods...
                    var methods = type.Methods;
                    foreach (var method in methods)
                    {
                        Console.WriteLine("\tMethod\t- {0}", method.Name);
                    }

                    // ...and fields.
                    var fields = type.Fields;
                    foreach (var field in fields)
                    {
                        Console.WriteLine("\tField\t- {0}", field.Name);
                    }
                }
            }
        }
    }
}

Oh snap! That is a lot to read right off the bat, but the code itself should not be too hard to understand. I've tried to make the comments as descriptive as possible.

Right! So, what is this? Well...

With our awesome reference to Mono.Cecil in place, we load the currently executing assembly (our own program) and list its modules, types, methods and fields!

Don't wait any longer! Try it out! Hit F5 and see for yourself!

Ohh.. Mono.Cecil, how you fill my dreams with compassion, joy and love.

Let's continue.

The declaration of a method

Now that we have a basic loop for going through our assembly, we want to grab some more detailed information about each part!

So let's start with our method loop! Mono.Cecil gives us a lot of information that we can use straight away: we can check whether the method is static, public, protected, abstract, a generic instance, and much more!

To get more information about the methods declared in the fancy assembly we are parsing, add a new function:

public static string GetMethodDecleration(MethodDefinition method)
{
    // Modifiers (simplified: anything that isn't public is printed as private).
    var output = (method.IsPublic ? "public " : "private ")
               + (method.IsStatic ? "static " : "")
               + (method.IsAbstract ? "abstract " : "");

    if (method.IsConstructor)
    {
        // Constructors are named ".ctor" in CIL, so use the declaring type's name instead.
        output += method.MethodReturnType.ReturnType.Name + " " + method.DeclaringType.Name;
    }
    else
    {
        output += method.MethodReturnType.ReturnType.Name + " " + method.Name;
    }

    if (method.HasParameters)
    {
        // Build a "Type name" pair for each parameter.
        var parameters = new List<string>();
        foreach (var parameter in method.Parameters)
        {
            parameters.Add(parameter.ParameterType.Name + " " + parameter.Name);
        }
        return output + "(" + string.Join(",", parameters) + ")";
    }

    return output + "()";
}

(MethodDefinition lives in the Mono.Cecil namespace, so make sure using Mono.Cecil; is at the top of Program.cs.)

In this function, we check whether the method we are parsing is public, static or abstract. We could add more checks, such as whether it is virtual, but I will leave it for now; it gives a better understanding of how CIL works when we can see the effect of our changes up front. Note that anything that isn't public is printed as private, which is not always true: protected and internal methods exist too.
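If you want the real access keyword instead of the public/private shortcut, here is a minimal sketch (a standalone C# 9 top-level program). It uses System.Reflection.MethodAttributes so it runs without Mono.Cecil; as far as I know, Cecil's own MethodAttributes enum uses the same ECMA-335 bit values, so an explicit cast should carry over, and Cecil's MethodDefinition also exposes flags such as IsFamily and IsAssembly directly. AccessModifier is a hypothetical helper name.

```csharp
using System;
using System.Reflection;

// Hypothetical helper (not part of the tutorial's code): maps the access bits
// of a method's attributes to the matching C# keyword(s).
string AccessModifier(MethodAttributes attributes)
{
    switch (attributes & MethodAttributes.MemberAccessMask)
    {
        case MethodAttributes.Public: return "public";
        case MethodAttributes.Family: return "protected";
        case MethodAttributes.Assembly: return "internal";
        case MethodAttributes.FamORAssem: return "protected internal";
        case MethodAttributes.FamANDAssem: return "private protected";
        default: return "private"; // Private, or PrivateScope (compiler-controlled)
    }
}

// Try it on two real framework methods, read through reflection.
var toString = typeof(object).GetMethod("ToString")!;
var memberwiseClone = typeof(object).GetMethod(
    "MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic)!;

Console.WriteLine(AccessModifier(toString.Attributes));        // public
Console.WriteLine(AccessModifier(memberwiseClone.Attributes)); // protected
```

The mask matters: the access level is a 3-bit number, not a set of independent flags, so comparing the masked value is more reliable than testing single bits.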



We then look up the type this method returns, add its parameters (if there are any) and return the full declaration as a single string.

Note that we do a conditional check on method.IsConstructor.

We do this because instance constructors in the compiled assembly are always named .ctor, which is not how we see them when writing code. In C#, a constructor is always written as public/private CLASS_NAME(), and since .ctor != CLASS_NAME, we can't just use method.Name to get the name of the constructor.

If a class does not declare any constructor at all, the compiler automatically creates a public parameterless one for you.
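A C# constructor also has no return type, so "Void AwesomeClass()" is still not quite how you would write it. Below is a minimal sketch of the naming part (a standalone C# 9 program), assuming you only have the raw method name and the declaring type's name. DisplayName is a hypothetical helper; System.Reflection is used only to show that the runtime reports the same ".ctor" name. It also covers static constructors, which are named .cctor; as far as I know, Cecil's IsConstructor is true for those as well.

```csharp
using System;
using System.Reflection;
using System.Text;

// Hypothetical helper: renders a method's metadata name the way C# source spells it.
string DisplayName(string methodName, string declaringTypeName)
{
    if (methodName == ConstructorInfo.ConstructorName)      // ".ctor", instance constructor
        return declaringTypeName;
    if (methodName == ConstructorInfo.TypeConstructorName)  // ".cctor", static constructor
        return "static " + declaringTypeName;
    return methodName;                                      // any ordinary method
}

// Reflection confirms the raw names: every StringBuilder constructor is called ".ctor".
foreach (var ctor in typeof(StringBuilder).GetConstructors())
    Console.WriteLine(ctor.Name);

Console.WriteLine(DisplayName(".ctor", "AwesomeClass"));  // AwesomeClass
Console.WriteLine(DisplayName(".cctor", "AwesomeClass")); // static AwesomeClass
```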

Alright! Time for you to update the method loop with the following code:

var methods = type.Methods;
foreach (var method in methods)
{
    var methodDecleration = GetMethodDecleration(method);
    Console.WriteLine("\tMethod\t- {0}", methodDecleration);
}

Don't wait, hit F5!

Proceeding with the fields

As with the methods above, we want to do the same thing for our fields. If you haven't noticed it already, we have a field called just_a_dummy at the top of our class, and it shows up when running the application. However, we don't know anything about it yet. All we know is that it is a field. That's it.

What we want to do is to generate some more detailed information for us to see.

So add the following function to your class:

public static string GetFieldDecleration(FieldDefinition field)
{
    var output = (field.IsPublic ? "public " : "private ")
               + (field.IsStatic ? "static " : "");

    // FieldType is a TypeReference; concatenating it prints its full name, e.g. System.String.
    return output + field.FieldType + " " + field.Name;
}

This function is very similar to the one we already created for method declarations, just slightly simpler.

We will keep it like this for now. Don't forget to update your field loop; it should look something like this:

var fields = type.Fields;
foreach (var field in fields)
{
    var fieldDecleration = GetFieldDecleration(field);
    Console.WriteLine("\tField\t- {0}", fieldDecleration);
}

F5 it once again!

You should now see some useful details about your fields as well! Great job!
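Our field declaration only knows about public/private and static, but fields can also be readonly or const, and those live in the same attribute bits. A small sketch (standalone C# 9 program) using System.Reflection.FieldAttributes as a stand-in for Cecil, whose FieldDefinition has, as far as I know, matching IsStatic, IsInitOnly and IsLiteral properties. FieldModifiers is a hypothetical helper name.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: the non-access modifiers of a field declaration.
string FieldModifiers(FieldAttributes attributes)
{
    // A C# const is stored as a "literal" field. It is implicitly static,
    // so C# never writes "static const".
    if ((attributes & FieldAttributes.Literal) != 0)
        return "const";

    var modifiers = "";
    if ((attributes & FieldAttributes.Static) != 0) modifiers += "static ";
    if ((attributes & FieldAttributes.InitOnly) != 0) modifiers += "readonly ";
    return modifiers.TrimEnd();
}

var empty = typeof(string).GetField("Empty")!;    // public static readonly string Empty
var maxValue = typeof(int).GetField("MaxValue")!; // public const int MaxValue

Console.WriteLine(FieldModifiers(empty.Attributes));    // static readonly
Console.WriteLine(FieldModifiers(maxValue.Attributes)); // const
```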

But... but... knowing only the field and method declarations, we can't really make a decompiler. So far this isn't anything more than what you would expect from the assembly browser that already exists in Visual Studio. Not only that, we still don't know much about our actual types!

Close friendship with my Types

You could probably guess it: next, we want to add a new function that reads the information from our types.

So without any further ado. Please add the following code to your class:

public static string GetTypeDecleration(TypeDefinition type)
{
    var output = (type.IsPublic ? "public " : "private ")
               + (type.IsSealed ? "sealed " : "")
               + (type.IsAbstract ? "abstract " : "")
               + (type.IsInterface ? "interface " : "")
               + (type.IsEnum ? "enum " : "")
               + (type.IsClass ? "class " : "");

    if (type.HasGenericParameters)
    {
        // Select and ToList come from System.Linq.
        var parameters = type.GenericParameters.Select(t => t.Name).ToList();
        if (parameters.Count > 0)
            return output + type.Name.Replace("`1", "") + "<" + string.Join(",", parameters) + ">";
    }

    return output + type.Name;
}

Alright! Yet another Get...Declaration function. This one differs slightly from the rest.

As you can see, I've added a conditional check against type.HasGenericParameters. Some classes, like List<T>, declare a generic parameter T, and this check tells us whether the type declares any such parameters.

There is also something more interesting going on. We are modifying the type.Name by replacing any instances of `1 into oblivion.

Types that declare generic parameters also get a backtick and a number appended to their names, and the number is the count of generic parameters: List<T> is stored as List`1 and Dictionary<TKey, TValue> as Dictionary`2. This distinguishes them from a "normal" type with the same name, and it also means our Replace("`1", "") only cleans up types with exactly one generic parameter.
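Since the number after the backtick can be anything, cutting the name at the backtick handles every case. A minimal sketch (standalone C# 9 program); StripArity is a hypothetical helper, and System.Type stands in for Cecil's TypeDefinition, whose Name carries the same suffix judging by the output above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: drops the "`N" arity suffix from a metadata type name.
string StripArity(string metadataName)
{
    var tick = metadataName.IndexOf('`');
    return tick < 0 ? metadataName : metadataName.Substring(0, tick);
}

Console.WriteLine(typeof(List<>).Name);                    // List`1
Console.WriteLine(typeof(Dictionary<,>).Name);             // Dictionary`2
Console.WriteLine(StripArity(typeof(Dictionary<,>).Name)); // Dictionary
```

In GetTypeDecleration, StripArity(type.Name) would take the place of type.Name.Replace("`1", "").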

Next, as with the others, let's use this new function inside our type loop:

foreach (var type in types)
{
    var typeDecleration = GetTypeDecleration(type);
    Console.WriteLine("\t- {0}", typeDecleration);

    // ... the method and field loops stay as they are ...
}

This should do the trick!

F5! F5! F5! Sorry! You only need to press that once!

At this point, you should have more detailed information about the types available in the assembly. Sweet!

However, you're probably also seeing a lot of extra stuff being listed.

I myself got some extra classes and stuff that were actually part of Mono.Cecil. Ouch!
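While we're looking at types: GetTypeDecleration can stack keywords in ways C# wouldn't, because the flags overlap. An interface is also abstract, a static class is compiled as abstract and sealed, and enums and structs are sealed value types. Here is a sketch (standalone C# 9 program), assuming that checking the flags in a fixed order is good enough; it uses System.Type so it runs without Cecil, and TypeKeyword is a hypothetical name. As far as I know, Cecil's TypeDefinition.IsClass is true for structs and enums too, so the ordering matters there as well.

```csharp
using System;

// Hypothetical helper: picks the C# keyword(s) for a type. The order of the
// checks matters, because several of these flags overlap.
string TypeKeyword(Type type)
{
    if (type.IsInterface) return "interface";                    // interfaces are also abstract
    if (type.IsEnum) return "enum";                              // enums are sealed value types
    if (type.IsValueType) return "struct";                       // structs are sealed too
    if (type.IsAbstract && type.IsSealed) return "static class"; // how static classes are compiled
    if (type.IsAbstract) return "abstract class";
    if (type.IsSealed) return "sealed class";
    return "class";
}

Console.WriteLine(TypeKeyword(typeof(IDisposable))); // interface
Console.WriteLine(TypeKeyword(typeof(DayOfWeek)));   // enum
Console.WriteLine(TypeKeyword(typeof(Console)));     // static class
Console.WriteLine(TypeKeyword(typeof(string)));      // sealed class
```

Also note that in C#, a top-level type that isn't public is internal, not private, so "internal " would be the more accurate fallback in GetTypeDecleration.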

Assemblies to test against

We have gotten pretty far, but there is still a lot to do!

As our console gets more and more cluttered with information, it will get harder to read and understand what is happening. So go ahead and create a new solution folder; I called mine Test Libraries.

Inside this folder, I created a new Class Library project (.dll) called Test1.

This newly created project will be our sandbox, it will be where we push in code, types, fields, methods, etc. Anything we want to try and get to work!

You can rename Class1.cs to whatever you like. I just called mine AwesomeClass for now.

Alrighties! Jump back to our "decompiler".

We want to point to this new library, rather than the executing assembly.

Replace your static void Main function with the following code.

static void Main(string[] args)
{
    // Relative to the working directory, which is bin\Debug when started from Visual Studio.
    DecompileAssembly(@"..\..\..\Test1\bin\Debug\Test1.dll");
    Console.ReadKey();
}

Run forest, run! Hit F5!
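If F5 ends in an exception about a missing file instead, the relative path is the usual suspect, since it is resolved against the current working directory. A sketch (standalone C# 9 program) that resolves it against the program's own folder instead, assuming the folder layout described in the first comment; remember to build Test1 first.

```csharp
using System;
using System.IO;

// Assumed layout: <solution>\Tutorials.SimpleDecompiler\bin\Debug\<exe> and
// <solution>\Test1\bin\Debug\Test1.dll. SDK-style projects add a framework
// folder such as bin\Debug\net8.0, which would need one more "..".
var relative = Path.Combine("..", "..", "..", "Test1", "bin", "Debug", "Test1.dll");

// AppContext.BaseDirectory is the folder the running program was loaded from,
// so the result no longer depends on the current working directory.
var fullPath = Path.GetFullPath(Path.Combine(AppContext.BaseDirectory, relative));

if (File.Exists(fullPath))
    Console.WriteLine("Decompiling " + fullPath);
else
    Console.WriteLine("Not found: " + fullPath + " (did you build Test1?)");
```

Path.Combine picks the right separator for the platform, so the same code works with forward slashes on Linux or macOS.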

Sweet! Our console is no longer cluttered with as much stuff.

However, the console will soon be cluttered again unless we change the way we present our parsed assembly.

So for now, let's replace the contents of our DecompileAssembly function with something like this:

public static void DecompileAssembly(string fileName)
{
    var asm = Mono.Cecil.AssemblyDefinition.ReadAssembly(fileName);
    Console.WriteLine("{0} successfully loaded.", asm.Name.Name);
    Console.WriteLine();

    foreach (var module in asm.Modules)
    {
        var types = module.GetTypes();
        foreach (var type in types)
        {
            // Print the type declaration and open its body.
            var typeDecleration = GetTypeDecleration(type);
            Console.WriteLine("{0}", typeDecleration);
            Console.WriteLine("{");

            var methods = type.Methods;
            foreach (var method in methods)
            {
                var methodDecleration = GetMethodDecleration(method);
                Console.WriteLine("    {0}", methodDecleration);
            }

            var fields = type.Fields;
            foreach (var field in fields)
            {
                var fieldDecleration = GetFieldDecleration(field);
                Console.WriteLine("    {0}", fieldDecleration);
            }

            // Close the type body.
            Console.WriteLine("}");
        }
    }
}

Don't forget to run again! F5.

Wow. Much.. So much cleaner now, huh?

On closer inspection (unless you noticed it earlier): when a void method is listed, its return type shows up as Void and not void. That's because Void is the name of the System.Void type; void is just the C# keyword for it.

We could change this now by adding a conditional check inside GetMethodDecleration that replaces Void with void, but I will leave it as is for now and come back to it later.
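When you do get to it, one way is a lookup from the CLR's full type names to C# keywords, which also turns Int32 into int and String into string. A sketch (standalone C# 9 program), assuming you pass both a full and a short name (with Cecil, that would be TypeReference.FullName and TypeReference.Name); CSharpName and the table are hypothetical, and the table lists only the common built-in types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical table of built-in C# keywords, keyed by CLR full name. Full names
// avoid clashing with a user type that happens to be called "String".
var keywords = new Dictionary<string, string>
{
    ["System.Void"] = "void",
    ["System.Object"] = "object",
    ["System.Boolean"] = "bool",
    ["System.Char"] = "char",
    ["System.String"] = "string",
    ["System.Byte"] = "byte",
    ["System.SByte"] = "sbyte",
    ["System.Int16"] = "short",
    ["System.UInt16"] = "ushort",
    ["System.Int32"] = "int",
    ["System.UInt32"] = "uint",
    ["System.Int64"] = "long",
    ["System.UInt64"] = "ulong",
    ["System.Single"] = "float",
    ["System.Double"] = "double",
    ["System.Decimal"] = "decimal",
};

// Use the keyword when there is one, otherwise fall back to the short type name.
string CSharpName(string fullName, string shortName) =>
    keywords.TryGetValue(fullName, out var keyword) ? keyword : shortName;

Console.WriteLine(CSharpName(typeof(void).FullName!, typeof(void).Name)); // void
Console.WriteLine(CSharpName("Test1.AwesomeClass", "AwesomeClass"));     // AwesomeClass
```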

Before we get too deep into adding tons and tons of code to our poor little Program.cs, we should create a new class to hold our "helper" functions.



Let's call it... Assembly... wait for it... Helper! AssemblyHelper!

Hahaha. I'm really sorry about that name.

I couldn't come up with anything else right now. You can of course name it whatever you like. But I will stick with this for now.

Now, take all the methods you've created so far in Program.cs and move them into your AssemblyHelper class, except the Main method. That one needs to stay put, or else the program won't run. Bring the using directives along as well (using System.Collections.Generic;, using System.Linq; and using Mono.Cecil;), since the helper functions need them.

If you want to, you can keep DecompileAssembly in Program.cs or move it as you wish. I will keep it in Program.cs (together with the Main method).

Ah, fresh! I'd rather keep my little helper functions separate from the main code, as helper classes tend to grow very big.

Keeping things clean

Aight shoo! At this point, we do have some nice information showing up in our pretty little console. However, this is still far from being usable.

What we want to do next though, is to make the output code a bit easier to manage.

Jump into the AssemblyHelper.cs and add the following function:

public static string Indent(string inputFormat, int numberOfIndents, params object[] vars)
{
    var output = inputFormat;

    // Prefix one indentation step per level.
    for (var i = 0; i < numberOfIndents; i++)
    {
        output = "    " + output;
    }

    // Only run string.Format when arguments were passed, so "{" and "}" can be printed as-is.
    if (vars.Length > 0)
        return string.Format(output, vars);

    return output;
}

This little function will help us create indentation in our output code. You know... tabs!
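An alternative some code generators use is to let the writer remember the nesting level itself, so callers never pass a number. The sketch below (standalone C# 9 program) is a hypothetical Line helper, not part of the tutorial's AssemblyHelper; it assumes four spaces per level and adjusts the level whenever a line opens or closes a brace.

```csharp
using System;
using System.Text;

// Hypothetical alternative to Indent: the writer tracks the nesting level,
// so callers never pass a number. Four spaces per level is an arbitrary choice.
var output = new StringBuilder();
var level = 0;

void Line(string text)
{
    // A closing brace sits one level further out than the lines before it.
    if (text.StartsWith("}")) level--;
    output.Append(' ', level * 4).AppendLine(text);
    // Everything after an opening brace is nested one level deeper.
    if (text.EndsWith("{")) level++;
}

Line("public class AwesomeClass");
Line("{");
Line("public AwesomeClass()");
Line("{");
Line("}");
Line("}");

Console.Write(output.ToString());
```

The trade-off: you can no longer put an opening brace at the end of a longer line without it changing the level, which is fine for the brace-on-its-own-line style our output uses.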

Go back to your DecompileAssembly method and replace it with the following code:

public static void DecompileAssembly(string fileName)
{
    var asm = Mono.Cecil.AssemblyDefinition.ReadAssembly(fileName);
    Console.WriteLine("{0} successfully loaded.", asm.Name.Name);
    Console.WriteLine();

    foreach (var module in asm.Modules)
    {
        var types = module.GetTypes();
        foreach (var type in types)
        {
            var typeDecleration = AssemblyHelper.GetTypeDecleration(type);
            Console.WriteLine("{0}", typeDecleration);
            Console.WriteLine("{");

            var methods = type.Methods;
            foreach (var method in methods)
            {
                var methodDecleration = AssemblyHelper.GetMethodDecleration(method);
                Console.WriteLine(AssemblyHelper.Indent("{0}", 1, methodDecleration));

                // Abstract methods have no body, so they get no braces.
                if (!method.IsAbstract)
                {
                    Console.WriteLine(AssemblyHelper.Indent("{", 1));
                    // Time to grab some body for this method!
                    Console.WriteLine(AssemblyHelper.Indent("}", 1));
                }
            }

            var fields = type.Fields;
            foreach (var field in fields)
            {
                var fieldDecleration = AssemblyHelper.GetFieldDecleration(field);
                Console.WriteLine(AssemblyHelper.Indent("{0}", 1, fieldDecleration));
            }

            Console.WriteLine("}");
        }
    }
}

Sweet loordi! Don't forget to try it out! F5.

Wow, this is actually starting to look something like code!

But let's not stop here. Even though we have a somewhat structured starting point, we are still missing all the vital parts: there are no method bodies! And we even have a strange private class called <Module>. Seriously?? <Module> is a special type that every .NET module contains. It holds module-level ("global") fields and methods, which C# doesn't let you declare directly, so for typical C# code it is usually empty.

I'm no expert at this, but I do know it's something we don't want in our decompiled output.

Let's just get rid of it for now. If I find a proper use for it, I will cover it later on as we progress.

Alright, so all you have to do is add a new line at the very top of your type loop:

foreach (var type in types)
{
    // Skip the special <Module> type.
    if (type.Name == "<Module>")
        continue;

    // ... the rest of the type loop stays as it is ...
}

Yep, that should do it. If we encounter the type name <Module>, we simply skip it and continue as if nothing had happened. I'm sorry, Module, for ignoring you, but it is for the best.
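<Module> isn't the only name like that. Once Test1 contains lambdas or iterators, the compiler adds more helper types of its own. Here is a heuristic sketch (standalone C# 9 program), assuming we simply want to hide every type whose name starts with '<'. A real decompiler would instead fold lambdas back into the methods that use them. IsCompilerGenerated is a hypothetical name.

```csharp
using System;

// Hypothetical heuristic: C# identifiers can't start with '<', so a type name
// that does was invented by the compiler: "<Module>", "<PrivateImplementationDetails>",
// the "<>c" and "<>c__DisplayClass..." classes behind lambdas, and so on.
bool IsCompilerGenerated(string typeName) =>
    typeName.Length > 0 && typeName[0] == '<';

foreach (var name in new[] { "<Module>", "<>c", "AwesomeClass" })
    Console.WriteLine(name + (IsCompilerGenerated(name) ? " -> skip" : " -> keep"));
```

In the type loop, if (IsCompilerGenerated(type.Name)) continue; would replace the "<Module>" comparison.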

Mirror mirror on the wall, where the hell is the CIL in it all?

Don't worry! We are getting there. We have made quite a few changes, so you should probably try them out. But...

Before I start screaming "F5!", I want you to add one more thing. Inside your method loop, we have a comment stating "Time to grab some body for this method!". Well... let's do that. Replace the code inside your if(!method.IsAbstract) condition with:

if (!method.IsAbstract)
{
    Console.WriteLine(AssemblyHelper.Indent("{", 1));

    // Print every CIL instruction of the method body: offset, opcode and operand.
    foreach (var instruction in method.Body.Instructions)
    {
        Console.WriteLine(AssemblyHelper.Indent("{0}: {1} {2}", 2,
            instruction.Offset, instruction.OpCode, instruction.Operand));
    }

    Console.WriteLine(AssemblyHelper.Indent("}", 1));
}

Alright, before I start explaining what we just added, I want you to hit F5 once again!

You should be amazed now. I know I am. What we are actually seeing is the method body of our little constructor we have in our test library.

The result in my console is as follows:

public class AwesomeClass
{
    public Void AwesomeClass()
    {
        0: ldarg.0
        1: call System.Void System.Object::.ctor()
        6: nop
        7: ret
    }
}

Oh deary! Those lines might not say much to you yet. If they do, you might not need my explanation.

Alright, so what we added prints every instruction in the method body. Every method that has a body (anything that isn't abstract or extern) is made up of instructions, and instructions are what tell the .NET runtime what to do.

But wait! All I see is a number, some weird text and some more text. If you look at the line we added above:

Console.WriteLine(AssemblyHelper.Indent("{0}: {1} {2}", 2, instruction.Offset, instruction.OpCode, instruction.Operand));

We print something called an Offset, an OpCode and an Operand.

Assemblies are binary files filled with 0s and 1s, or bytes if you prefer. The .NET runtime turns those bytes into machine code and does something magical with them, like showing a cat video on YouTube.

Offset

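Each instruction takes up a certain number of bytes, and the offset tells us where the instruction starts, counted in bytes from the beginning of the method's body (not from the start of the assembly). That's why the numbers jump from 1 to 6 above: call is followed by a 4-byte token that identifies the method to call.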
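To see where those numbers come from, here is a small sketch (standalone C# 9 program) that recomputes them, assuming we only know each instruction's opcode. It borrows the opcode table in System.Reflection.Emit (OpCode.Size and OpCode.OperandType) because it ships with .NET. Mono.Cecil already fills in Instruction.Offset for you, so this is only for understanding. Offsets and OperandSize are hypothetical names, and switch is left out because its operand length varies.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Emit;

// Hypothetical helper: bytes taken by an operand, depending on its kind.
int OperandSize(OperandType kind) => kind switch
{
    OperandType.InlineNone => 0,
    OperandType.ShortInlineI or OperandType.ShortInlineVar or OperandType.ShortInlineBrTarget => 1,
    OperandType.InlineVar => 2,
    OperandType.InlineI8 or OperandType.InlineR => 8,
    OperandType.InlineSwitch => throw new NotSupportedException("switch has a variable-length operand"),
    _ => 4, // metadata tokens (methods, fields, types, strings), int32, float32, long branches
};

// Hypothetical helper: each instruction starts where the previous one ended,
// i.e. offset += opcode bytes + operand bytes.
List<int> Offsets(IEnumerable<OpCode> opcodes)
{
    var offsets = new List<int>();
    var offset = 0;
    foreach (var op in opcodes)
    {
        offsets.Add(offset);
        offset += op.Size + OperandSize(op.OperandType);
    }
    return offsets;
}

// The Debug constructor from the output above.
var ctor = new[] { OpCodes.Ldarg_0, OpCodes.Call, OpCodes.Nop, OpCodes.Ret };
Console.WriteLine(string.Join(", ", Offsets(ctor))); // 0, 1, 6, 7
```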

OpCode

Each instruction has an OpCode, which represents what we want to do, such as jumping to another part of the code, loading a number or calling a function.

Operand

The operand describes what the OpCode should work with.

If we use the OpCode call, as seen above, we need to supply an operand that specifies which method to call. Or, if we want to load a constant number such as 10, the operand holds the number itself (ldc.i4.s 10). Many OpCodes, like nop or ret, take no operand at all.

The OpCode property also contains a Value: the number that the OpCode is encoded as in the assembly (ldarg.0 is 0x02, ret is 0x2A).

For a list of all OpCodes, please take a look here.

This will come in very handy!
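The runtime also carries its own opcode table in System.Reflection.Emit.OpCodes, which you can dump yourself. A sketch (standalone C# 9 program); nothing here is Mono.Cecil-specific, although, as far as I know, Cecil's Mono.Cecil.Cil.OpCode exposes a similar Name, Value and OperandType.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// OpCodes exposes every opcode as a public static field; read them all via reflection.
foreach (var field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
{
    var op = (OpCode)field.GetValue(null)!;
    // Name as printed in IL listings, the encoded value (0xFE.. for two-byte
    // opcodes) and the kind of operand that follows it in the byte stream.
    Console.WriteLine($"{op.Name,-16} 0x{(ushort)op.Value:X4} {op.OperandType}");
}
```

Look for ldarg.0, call, nop and ret in the output: they are the four opcodes in our constructor.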

Now for the actual OpCodes used in the example above.

ldarg.0

ldarg.0 is short for load argument at position 0.

There are six different ldarg OpCodes: ldarg.0, ldarg.1, ldarg.2 and ldarg.3 (short forms with the index built in), plus ldarg.s <index> and ldarg <index>, which take the index as an operand.



Not only that: you might believe that ldarg.0 loads the very first parameter of your function/method. For instance methods, this is wrong. The first parameter is actually at position 1. (In static methods, which have no this, the first parameter is at position 0.)



So, let's say we have an instance method with two parameters:



public void test(int a, int b) { }

To load the first parameter, a, you would actually use ldarg.1, which pushes the value of a onto the stack.

So what the hell is ldarg.0 doing?

In an instance method, ldarg.0 loads the current instance of the class, the object the method was called on. ldarg.0 corresponds to the keyword this in C#.

This will lead on to the next OpCode.

call System.Void System.Object::.ctor()

call <return type> <declaring type>::<method name>(<parameter types>)

Alright, what is this? The OpCode call instructs the runtime to execute a method, here the .ctor() of System.Object, and as we know from earlier, .ctor is the name of a constructor. But... why are we calling a constructor inside our constructor? Every class we declare derives from System.Object by default, unless we give it another base class.

What happens behind the scenes is that our constructor is basically saying: "run the base type's constructor as well."



That's where the previous ldarg.0 comes into play: Object's constructor is an instance method too, so it needs to know which object to initialize, and ldarg.0 supplies this.

Whenever we load a parameter, a value or whatnot, it is pushed onto the evaluation stack.

The stack is basically a pile of values waiting to be used by later instructions. Each OpCode pops the values it needs off the top of the stack (call pops one value per argument, plus this for instance methods) and may push a result back.

We will be coming back to this very, very soon.

nop

nope nope nope! I'm boarding the Nope train! Anyone with me?

nop is short for no operation: it does nothing at all.



It still has its uses, though. In Debug builds, the compiler sprinkles nop instructions into the code so that the debugger has a spot to attach breakpoints and step to, for example on an opening brace, even when no real work happens there.

It is not what gets rid of unused values: if you call a method and ignore its return value, the compiler emits a pop instruction to remove that value from the stack.

nop instructions are the first thing the compiler leaves out when you build your assembly with "Optimize code" enabled, which is what the "Release" configuration does instead of "Debug". Their main purpose is to give the debugger room to work.

ret

Can you guess it?... Yep, you got it! ret stands for return. Almost every method has a return instruction; even a void method has one. Although it doesn't return anything, it is still there to tell the runtime that the method is done. In a method that does return something, ret hands back the value on top of the stack.

Further understanding of what is running under the hood

Wow, the previous part was long. I must apologize for that! I also hope that you learned something from it; otherwise, I'd expect nothing less than some flaming coming my way right now.

Don't worry, I can't teach you everything. I don't know everything. I only want to give you a basic understanding of these things.

Regardless, let's continue.

Let's add two new methods to AwesomeClass.cs in our test library:

public void test(int a, int b)
{
    a = 4;
    b = c();
}

public int c()
{
    return 10;
}

Alright, these are very, very basic methods. test has two parameters, a and b, and all it does is assign them new values: a gets the number 4 directly, and b gets the result of another method, c(), which just returns the value 10.

Hit F5 and check it out! (Don't forget to build your test library first, or the changes might not show up!)

Our result is as follows:

public class AwesomeClass
{
    public Void test(Int32 a,Int32 b)
    {
        0: nop
        1: ldc.i4.4
        2: starg.s a
        4: ldarg.0
        5: call System.Int32 Test1.AwesomeClass::c()
        10: starg.s b
        12: ret
    }
    public Int32 c()
    {
        0: nop
        1: ldc.i4.s 10
        3: stloc.0
        4: br.s IL_0006: ldloc.0
        6: ldloc.0
        7: ret
    }
    public Void AwesomeClass()
    {
        0: ldarg.0
        1: call System.Void System.Object::.ctor()
        6: nop
        7: ret
    }
}

As you can see, we now have a few new instructions that I didn't cover earlier.

Inside test

1: ldc.i4.4 // Load the constant 4 as a 4-byte integer (int32) and push it onto the stack

2: starg.s a // Pop that value off the stack and store it in the argument a

4: ldarg.0 // Push this

5: call System.Int32 Test1.AwesomeClass::c() // Call this.c(): pops this and pushes the returned value

10: starg.s b // Pop the value returned by c() and store it in the argument b

12: ret // Leave the method

Inside c

1: ldc.i4.s 10 // Load the constant 10 (stored as a 1-byte operand) and push it onto the stack as an int32

3: stloc.0 // Pop the value and store it in local variable 0, a temporary variable the compiler created

4: br.s IL_0006: ldloc.0 // Branch (jump) to offset 6; the operand is the target instruction itself, printed as IL_0006: ldloc.0

6: ldloc.0 // Push the value of local variable 0 back onto the stack

7: ret // Return the value on top of the stack

With the comments above, I try to explain what is happening here.

In our first method, test, we load the value 4 and assign it to our parameter a.

We then load this and call the method c on the current instance.

Finally, we assign the value returned by c to our parameter b.
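To make the stack behavior concrete, here is a toy model (standalone C# 9 program) of test() as it appears in the Release listing further down (the Debug version only adds a nop). It is an illustration under heavy assumptions, not how the runtime works: the real runtime compiles CIL to machine code instead of interpreting it step by step, and this model only knows the handful of opcodes used here.

```csharp
using System;
using System.Collections.Generic;

// Toy evaluation-stack model for the Release body of test():
//   ldc.i4.4, starg.s a, ldarg.0, call c(), starg.s b, ret
// Assumptions: only these opcodes are handled, and "call c()" is faked by
// popping "this" and pushing the 10 that c() would return.
var stack = new Stack<object>();
var arguments = new Dictionary<string, object>
{
    ["this"] = "<AwesomeClass instance>",
    ["a"] = 0,
    ["b"] = 0,
};

void Run(string opcode, string operand = "")
{
    switch (opcode)
    {
        case "ldc.i4.4": stack.Push(4); break;                    // push the constant 4
        case "ldarg.0": stack.Push(arguments["this"]); break;     // push "this"
        case "starg.s": arguments[operand] = stack.Pop(); break;  // pop into an argument
        case "call":
            stack.Pop();    // c() is an instance method, so it consumes "this"...
            stack.Push(10); // ...and leaves its return value on the stack
            break;
        case "ret": break;  // nothing to do in this toy; test() returns void
        default: throw new NotSupportedException(opcode);
    }
    Console.WriteLine($"{opcode,-9} {operand,-4} stack depth: {stack.Count}");
}

Run("ldc.i4.4");
Run("starg.s", "a");
Run("ldarg.0");
Run("call", "c()");
Run("starg.s", "b");
Run("ret");

Console.WriteLine("a = " + arguments["a"] + ", b = " + arguments["b"]); // a = 4, b = 10
```

Watch the printed stack depth: it goes 1, 0, 1, 1, 0, 0, so every value that is pushed is popped again before ret.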



In the second method, c, a lot more stuff is happening.

Since we are building in Debug, we get additional instructions that store the value in a temporary variable and then make a seemingly useless jump, only to load the value once more before returning.

Wow... pointless! Pointless for what the method is trying to do, at least. As far as I understand, it gives the debugger something to hold on to: a single place to stop before the method returns (the closing brace) and a local variable holding the return value. Let's switch to "Release" mode and change the following line

DecompileAssembly(@"..\..\..\Test1\bin\Debug\Test1.dll");

into

DecompileAssembly(@"..\..\..\Test1\bin\Release\Test1.dll");

And hit F5!

This is our result now:

public class AwesomeClass
{
    public Void test(Int32 a,Int32 b)
    {
        0: ldc.i4.4
        1: starg.s a
        3: ldarg.0
        4: call System.Int32 Test1.AwesomeClass::c()
        9: starg.s b
        11: ret
    }
    public Int32 c()
    {
        0: ldc.i4.s 10
        2: ret
    }
    public Void AwesomeClass()
    {
        0: ldarg.0
        1: call System.Void System.Object::.ctor()
        6: ret
    }
}

As you can see, the constructor no longer has the nop instruction, and neither does test.

We also no longer have the extra branching in our c() method.

It simply loads the value 10 and returns it directly.

Let's keep it at Release for now.

This first half of Part 1 ends here.

Continue to the second half of Part 1