Reverse engineering intermediate languages like .NET

 

What is Reverse Engineering?

Reverse engineering is the process of taking apart software or technology to understand how it works, even without access to its original source code. In practice, this means carefully analysing the compiled program and reconstructing its functionality from observed behaviour and knowledge of programming.
 

Why is Reverse Engineering Necessary?

Reverse engineering is important because sometimes we need to understand how a piece of software works without access to its original source code. Normally, when developers write software, they give their variables and functions clear, descriptive names so that others can easily understand what the program does.

For example, imagine an online shopping website. A developer might use a variable called Cart to store a list of items a customer wants to buy. There could also be a function called add_product_to_cart, which finds the right product and adds it to the list. This makes it easy for other developers to understand the code and modify it if needed.

However, companies that sell software or hackers who create malicious programs may not want people to see how their code works. They use techniques to hide the original structure of the code, making it difficult to read. This is where reverse engineering becomes useful. If you can't trust the source of a program, it's dangerous to run it without understanding what it does. Security experts, software developers, and researchers use reverse engineering to analyse unknown software, detect security threats, and even fix or improve existing programs.

A Simple Example: The Locked Box

Imagine you find a locked box with several buttons but no instructions on how to open it. You don’t know what is inside, but you’re curious. You start pressing the buttons in different sequences, watching what happens each time. Maybe a light turns on when you press one button, or you hear a click after pressing another. By experimenting, you slowly figure out which buttons need to be pressed, and in what order, to unlock the box. 

Reverse engineering software works in a similar way. Instead of pressing buttons, reverse engineers analyse how a program responds to different inputs. By testing, observing patterns, and making educated guesses, they can work out how a program was built and how it functions, even without access to the original source code.

Obfuscation: Hiding How Software Works 

Some software creators don’t want their code to be understood, so they use a technique called obfuscation to make it harder to analyse. This is like removing the labels from the buttons on the locked box, so that even someone who finds them can’t easily work out how to open it. Sometimes they even add buttons that do nothing, to waste your time in the hope that you will give up. Later, we will see a fundamental insight that lets us deduce what any program does, provided we are willing to invest the time to understand it.

For example, instead of using clear variable names like Cart and add_product_to_cart, an obfuscated program might rename them to random, meaningless names like X1A and F2G. Instead of readable instructions, it might use unnecessary or confusing steps to make the program harder to follow. Some programs even detect when they are being analysed and change their behaviour to avoid being understood.
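The last technique can be surprisingly simple. The following minimal C# sketch shows one common form of it, built on the real .NET property System.Diagnostics.Debugger.IsAttached; the class name, method names and the decoy/real messages are hypothetical placeholders rather than code from any actual sample.

```csharp
using System;
using System.Diagnostics;

class AntiAnalysisDemo
{
    // True when a debugger is attached to this process.
    internal static bool DebuggerPresent() => Debugger.IsAttached;

    // Hypothetical decision point: behave innocently while being watched.
    internal static string ChooseBehaviour(bool beingAnalysed) =>
        beingAnalysed ? "decoy: print something harmless" : "real: do the actual work";

    static void Main()
    {
        Console.WriteLine(ChooseBehaviour(DebuggerPresent()));
    }
}
```

Because the check is ordinary code, an analyst who finds it can simply patch it out or force the condition to false.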

Let us illustrate this with an example. Here is a simple C# program that reads a file and prints its contents:

using System;
using System.IO;

class ReadAndPrintFile
{
    static void Main()
    {
        string filePath = "secret.txt";
        string content = ReadSecretFile(filePath);
        Console.WriteLine(content);
    }

    // Reads the whole file into a single string.
    static string ReadSecretFile(string path)
    {
        return File.ReadAllText(path);
    }
}

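And here is the obfuscated version, in which we have changed several things:

  • The class name is now just X3 instead of ReadAndPrintFile.
  • The file name is Base64-encoded ("U2VjcmV0LnR4dA==" decodes to "Secret.txt", which Windows treats as the same file as "secret.txt" because its file names are case-insensitive) and is decoded at runtime. The decoding method is misleadingly named Decrypt: Base64 is an encoding, not encryption.
  • The method names are changed (ReadSecretFile → P).
  • Junk code (meaningless conditions) is added to confuse reverse engineers.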

using System;
using System.IO;

class X3
{
    static void Main()
    {
        string y = Decrypt("U2VjcmV0LnR4dA==");   // Base64-encoded filename
        string z = P(y);
        Console.WriteLine(z);
    }

    static string P(string x)
    {
        if (x.Length > 0) { }  // Junk code
        return File.ReadAllText(x);
    }

    // Despite the name, this only decodes Base64.
    static string Decrypt(string s)
    {
        byte[] b = Convert.FromBase64String(s);
        return System.Text.Encoding.UTF8.GetString(b);
    }
}

The program still does exactly the same thing, but it is harder for someone analysing it to read.

If we now deobfuscate it again, we might arrive at a program like this:

using System;
using System.IO;
using System.Text;

class Program
{
    static void Main()
    {
        // Decoded by hand: "Secret.txt" (Windows file names are case-insensitive)
        string filePath = DecodeBase64("U2VjcmV0LnR4dA==");
        string content = ReadSecretFile(filePath);
        Console.WriteLine(content);
    }

    // Reads the whole file into a single string.
    static string ReadSecretFile(string path)
    {
        return File.ReadAllText(path);
    }

    // Turns a Base64 string back into UTF-8 text.
    static string DecodeBase64(string input)
    {
        byte[] data = Convert.FromBase64String(input);
        return Encoding.UTF8.GetString(data);
    }
}

Deobfuscation: Undoing the Confusion

Just as reverse engineers can figure out the correct button sequence on the locked box, they can also deobfuscate software — undoing the tricks used to hide its logic. This often involves renaming variables to something meaningful, simplifying complex code structures, and removing unnecessary steps. 

For example, if a reverse engineer finds a function named F2G, they might figure out that it is actually adding a product to a cart, so they rename it add_product_to_cart to make the program easier to understand. Over time, with enough effort, they can reconstruct the original logic of the software, just like solving the puzzle of the locked box. 
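Some of this work can be automated. As a minimal sketch (the StringDeobfuscator class and its heuristics are hypothetical, not a standard tool), the following C# program scans decompiled source for string literals that look like Base64, decodes them with the standard Convert.FromBase64String, and appends the result as a comment, much as an analyst would annotate the X3 program by hand.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

class StringDeobfuscator
{
    // Matches double-quoted literals shaped like Base64 (groups of 4, optional padding).
    static readonly Regex Base64Literal =
        new Regex("\"((?:[A-Za-z0-9+/]{4})+(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?)\"");

    // Appends the decoded text as a comment after each literal that decodes cleanly.
    internal static string Annotate(string source) =>
        Base64Literal.Replace(source, m =>
        {
            string decoded = TryDecode(m.Groups[1].Value);
            return decoded == null ? m.Value : $"{m.Value} /* \"{decoded}\" */";
        });

    static string TryDecode(string s)
    {
        try
        {
            string text = Encoding.UTF8.GetString(Convert.FromBase64String(s));
            // Keep only printable ASCII so ordinary strings are not "decoded" into noise.
            return text.All(c => c >= 0x20 && c < 0x7F) ? text : null;
        }
        catch (FormatException)
        {
            return null;
        }
    }

    static void Main()
    {
        string line = "string y = Decrypt(\"U2VjcmV0LnR4dA==\");";
        Console.WriteLine(Annotate(line));
    }
}
```

Applied to the obfuscated Main method, this turns Decrypt("U2VjcmV0LnR4dA==") into Decrypt("U2VjcmV0LnR4dA==" /* "Secret.txt" */). The printable-ASCII filter is only a rough guard, and real samples often use custom encodings or genuine encryption that need a sample-specific decoder.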

Obfuscation Can’t Hide Everything 

Even though obfuscation makes code harder to read, it has a fundamental weakness: a program still has to execute real instructions on a computer. No matter how much code is scrambled, at some point, it has to interact with the operating system — this is where system calls (syscalls) come in. 

A syscall is a request a program makes to the operating system to perform a basic task, such as reading a file, sending data over the network, or allocating memory. These requests must follow strict rules defined by the operating system, so while a program can disguise the code path that leads to a syscall, the request itself, and its effect, remain visible to the operating system.

This is a key insight in reverse engineering. Even if a function name is hidden, and the logic is full of unnecessary complexity, the program still has to make syscalls to do anything useful. By monitoring these syscalls, reverse engineers can deduce the real purpose of a program without needing to understand every line of obfuscated code. 

How This Relates to .NET

The .NET framework (used for C#, VB.NET, and F# applications) is particularly interesting in the context of obfuscation and reverse engineering because .NET programs don’t compile directly to machine code. Instead, they compile into Intermediate Language (IL), which the Common Language Runtime (CLR) just-in-time (JIT) compiles into native machine code when the program runs.

Because .NET assemblies contain IL together with rich metadata (the names and signatures of types, methods, and fields), they are much easier to analyse than native machine code. Reverse engineers can use tools like dnSpy or ILSpy to decompile a .NET program, often recovering large portions of readable code even when some obfuscation has been applied.
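To see how much this metadata gives away, here is a minimal sketch using standard reflection (Type.GetMethods, MethodInfo.GetParameters). It lists the methods declared on a stand-in copy of the obfuscated X3 class; the MetadataDump helper and the trimmed method bodies are illustrative assumptions. The names are meaningless, but the structure and the parameter and return types are still there for the taking.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-in for the obfuscated class from earlier (bodies trimmed for brevity).
class X3
{
    static string P(string x) => x;
    static string Decrypt(string s) => s;
}

class MetadataDump
{
    // Lists every method a type declares, as recorded in the assembly's metadata.
    internal static string[] Describe(Type type) =>
        type.GetMethods(BindingFlags.Public | BindingFlags.NonPublic |
                        BindingFlags.Static | BindingFlags.Instance | BindingFlags.DeclaredOnly)
            .Select(m => $"{m.ReturnType.Name} {m.Name}(" +
                         string.Join(", ", m.GetParameters().Select(p => p.ParameterType.Name)) + ")")
            .OrderBy(s => s, StringComparer.Ordinal)   // reflection order is not guaranteed
            .ToArray();

    static void Main()
    {
        foreach (string line in Describe(typeof(X3)))
            Console.WriteLine(line);
    }
}
```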

However, even a heavily obfuscated .NET program still has to interact with the underlying Windows system, either through the framework’s own calls into native code or directly through P/Invoke. For example, a program calling System.IO.File.ReadAllText("secret.txt") must eventually reach the Windows native API function NtReadFile to read the file. Likewise, a malware sample that launches another program with System.Diagnostics.Process.Start() ultimately calls the Win32 function CreateProcessW, and one that injects code into another process typically uses P/Invoke to call functions such as OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread.
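The same holds inside the IL itself: a renamed method still records which framework methods it calls, as metadata tokens the runtime must resolve. The minimal sketch below, a hypothetical CallScanner that is not a standard tool, reads the raw IL of a stand-in copy of the obfuscated P method with MethodBody.GetILAsByteArray, looks for the call (0x28) and callvirt (0x6F) opcodes, and resolves the token after each one with Module.ResolveMethod. The byte-level scan is only a heuristic, because operand bytes can look like opcodes; decompilers such as dnSpy and ILSpy decode every instruction properly.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Stand-in for the obfuscated method from earlier: the name hides intent, the call does not.
class X3
{
    static string P(string x)
    {
        if (x.Length > 0) { }  // Junk code
        return File.ReadAllText(x);
    }
}

class CallScanner
{
    // Crude heuristic: find call/callvirt opcodes in the raw IL and resolve
    // the 4-byte little-endian metadata token that follows each one.
    internal static List<string> FindCalls(MethodInfo method)
    {
        var calls = new List<string>();
        byte[] il = method.GetMethodBody()?.GetILAsByteArray() ?? Array.Empty<byte>();
        for (int i = 0; i + 4 < il.Length; i++)
        {
            if (il[i] != 0x28 && il[i] != 0x6F) continue;
            int token = BinaryPrimitives.ReadInt32LittleEndian(il.AsSpan(i + 1, 4));
            try
            {
                MethodBase target = method.Module.ResolveMethod(token);
                calls.Add($"{target.DeclaringType?.FullName}.{target.Name}");
                i += 4; // skip the operand we just resolved
            }
            catch (ArgumentException)
            {
                // Not a method token: this byte was part of some other operand.
            }
        }
        return calls;
    }

    static void Main()
    {
        MethodInfo p = typeof(X3).GetMethod("P", BindingFlags.NonPublic | BindingFlags.Static);
        foreach (string call in FindCalls(p))
            Console.WriteLine(call);
    }
}
```

Whatever P is called, its output includes System.IO.File.ReadAllText, the same framework call that eventually leads to NtReadFile.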

Tracking Syscalls to Uncover Behaviour

Reverse engineers can often sidestep obfuscation entirely and instead monitor system calls to see what a program is doing. Useful tools include:

  • Process Monitor (ProcMon) – Shows what files, registry keys, and network connections a program interacts with. 
  • WinDbg – A debugger that, with the SOS debugging extension, can inspect running .NET applications at a low level.
  • API Monitor – Captures the Windows API calls a program makes, including those made by .NET applications.

Since all programs must eventually interact with the operating system, reverse engineers can track system calls to uncover what is really happening. With .NET applications they have an extra advantage: the code remains in an intermediate form, which makes decompilation more straightforward.

No matter how well a program tries to hide its logic, it still has to ask the operating system to do the actual work — and that’s where reverse engineers can step in.

Why This Matters

Reverse engineering and deobfuscation are important for many reasons, but they are particularly important in security. To fully understand the timeline of a cyber security incident, it is vital to know what every tool involved is capable of. This matters not only for assessing the damage and the attacker’s data exfiltration capabilities, but especially during the clean-up phase, to determine whether any backdoors have been left behind.

If you have recently found a piece of software within your network that you suspect to be malicious, our trained reverse engineering experts at BDO Cyber Security will gladly help you examine it and provide recommendations and measures to take based on the type of sample you provide.