DCG937 July Meeting: Reverse Engineering C# and Modifying Unity3D Games
- July 25, 2020
- Posted by: Logan Rickert
- Category: Reverse Engineering
This presentation introduces the basics of reverse engineering C# and applies that knowledge to modifying Unity3D games. It’s aimed at people who have a basic understanding of how programming works and want to learn more about modifying, extending, and experimenting with Unity3D games. The presentation covers some of the basic building blocks behind reverse engineering in general, and behind reverse engineering and modifying Unity3D games in particular. This knowledge is useful for people who want to improve community support for an application or game, add patches or fixes to a game or software that is no longer supported, or sharpen their skills by learning how some games operate under the hood.
The majority of source code written today is heavily abstracted from the type of code that computers actually understand. This makes the code much easier for humans to read, write, and understand. Programming languages that are abstracted in this way are known as high-level programming languages. In order for this source code to be understood by computers, it must first be translated into something that the computer will understand. This translation is the job of a compiler. In most cases, a compiler will generate what is known as machine code.
Under the hood, computers are extremely dumb but extremely good at doing exactly what they are told. The instructions a CPU executes are called machine code and are stored in binary form, which means they are just a bunch of 1s and 0s. Humans are exceptionally bad at reading binary. To help with this, a human-readable representation called assembly was created. It’s usually not an exact 1 to 1 representation, but there is an extremely strong relationship between the two.
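As a rough illustration of that relationship, the sketch below pairs a few bytes of 32-bit x86 machine code with the assembly they correspond to. Python is used here only as a convenient way to print the bits. The byte values and the `to_bits` helper are choices made for this example, not something taken from the presentation.

```python
# Machine code for a tiny x86 routine that returns 42, paired with its assembly.
# Each entry is (raw bytes the CPU executes, human-readable assembly).
INSTRUCTIONS = [
    (bytes([0xB8, 0x2A, 0x00, 0x00, 0x00]), "mov eax, 42"),
    (bytes([0xC3]), "ret"),
]

def to_bits(raw: bytes) -> str:
    """Render bytes as groups of eight 1s and 0s, the form the CPU actually sees."""
    return " ".join(f"{byte:08b}" for byte in raw)

for raw, assembly in INSTRUCTIONS:
    print(f"{to_bits(raw):<45} {assembly}")
```

The left column is what the computer works with and the right column is what a person would rather read.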
Not every computer can understand the same machine code. Each type of CPU has a slightly or drastically different set of instructions. Most Intel and AMD CPUs share a standard instruction set (x86 and its 64-bit extension, x86-64) to allow compatibility across many CPUs (and therefore computers). This means that almost all programs compiled for that standard will work on every CPU that follows it. ARM CPUs do not follow that same standard. This is why a program compiled for a Windows desktop will not run natively on a Windows ARM tablet.
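To make the difference concrete, here is a hedged sketch comparing how a routine that simply returns 42 could be encoded for x86-64 and for 64-bit ARM (AArch64). The byte strings are hand-assembled for this illustration and the helper names are hypothetical.

```python
# Hand-assembled machine code for "return 42" on two CPU families.
X86_64 = bytes.fromhex("b82a000000" "c3")        # mov eax, 42 ; ret
AARCH64 = bytes.fromhex("40058052" "c0035fd6")   # mov w0, #42 ; ret

def x86_immediate(code: bytes) -> int:
    """Read the 32-bit little-endian constant that follows the 0xB8 opcode."""
    return int.from_bytes(code[1:5], "little")

def aarch64_immediate(code: bytes) -> int:
    """Read the 16-bit constant stored in bits 5..20 of the first instruction."""
    word = int.from_bytes(code[0:4], "little")
    return (word >> 5) & 0xFFFF

print(X86_64.hex(), "->", x86_immediate(X86_64))
print(AARCH64.hex(), "->", aarch64_immediate(AARCH64))
```

Both byte strings carry the same constant, yet neither CPU could make sense of the other's bytes, which is the compatibility problem described above.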
Disassembly is the process of taking machine code and generating the assembly that represents it. This is important when analyzing how a program works without having access to the source code. There are a multitude of reasons for analyzing a program, including:
- To audit for vulnerabilities and bugs
  - We want to ensure the software running on our machine is safe
- To understand how it works
  - Improve documentation
  - Understand how the software was designed
- Make sure the software is not doing anything malicious
  - If it is doing something malicious, figure out what it is doing to better detect and protect against it
- Apply fixes
  - Fix bugs or crashes in the software
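The disassembly step described above can be sketched as a toy disassembler. This is a minimal sketch that only understands two x86 opcodes. Real disassemblers handle hundreds of instruction forms, and the `disassemble` function here is a hypothetical helper written for this example.

```python
import struct

def disassemble(code: bytes) -> list:
    """Turn raw x86 bytes into assembly text, one line per instruction."""
    lines = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        if opcode == 0xB8 and offset + 5 <= len(code):
            # 0xB8 is "mov eax, imm32": the next four bytes are the constant.
            (value,) = struct.unpack_from("<I", code, offset + 1)
            lines.append(f"{offset:04x}: mov eax, {value}")
            offset += 5
        elif opcode == 0xC3:
            lines.append(f"{offset:04x}: ret")
            offset += 1
        else:
            # Anything this toy does not recognize is shown as a raw data byte.
            lines.append(f"{offset:04x}: db 0x{opcode:02x}")
            offset += 1
    return lines

for line in disassemble(bytes.fromhex("b82a000000c3")):
    print(line)
```

Even this tiny version shows the core idea: read an opcode, work out how many bytes belong to it, print the matching mnemonic, and move on.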
Managed vs Unmanaged Language
Having to recompile an application’s source code for every platform it should run on can be cumbersome.
- It forces the user to know which platform they have
- The user has to download the correct software for the correct platform
- The developers have to host and maintain every platform’s version
- The developers have to recompile the software when a new platform is released
One solution to this problem is to use an intermediate translator. The source code is compiled for just one platform, the intermediary. When the application runs, the translator takes its intermediate instructions and translates them on the fly into the correct machine instructions for the CPU it is running on. Higher-level programming languages that use such a translator are considered managed languages. Languages that do not are considered unmanaged languages.
This is how Java and C# work. Java’s translator is called the “JVM” (Java Virtual Machine), and C#’s equivalent is the .NET Common Language Runtime (CLR), which runs Common Intermediate Language (CIL). A developer can compile their Java code once and know that no matter what platform they run it on, the code will work as intended as long as it’s running on the JVM.
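Python is not Java or C#, but its standard interpreter works on a similar principle: source code is compiled to intermediate instructions (bytecode) that a virtual machine executes. The sketch below uses that as an analogy to show what intermediate code looks like. The `add` function is a made-up example, and the exact instruction names printed vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

# The raw intermediate code: bytes meant for the Python virtual machine,
# not for any particular CPU.
raw = add.__code__.co_code
print(raw.hex())

# A readable listing of the same instructions, much like assembly is a
# readable form of machine code.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

The same bytecode runs unchanged wherever the virtual machine is available, which is the property the JVM and the .NET runtime provide for Java and C#.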
Assembly can be very difficult and tricky to read. Decompiling, which means going back from assembly to the original higher-level language, is possible but can be extremely difficult. This is mainly because compilers are very complex. They also optimize the output code, scrambling and rearranging it, which makes it even harder. On top of that, recall from earlier that each CPU has a slightly different instruction set. That’s a lot of variables to account for.
Many of these issues go away when dealing with managed programming languages and decompiling the intermediate code. The intermediate code retains a lot of metadata and is the same across all platforms, making it easier to decipher. One remaining issue with decompiling is that while the output will be valid source code in the desired programming language, it will not be identical to the original source code. Details such as comments and the original formatting are not stored in the compiled output, so they cannot be recovered.
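The "valid but not the same" effect can be imitated in a few lines. This is not a real decompiler: as a stand-in, the sketch parses Python source into a syntax tree and regenerates code from it, which loses comments and formatting in a similar way. It assumes Python 3.9 or newer for `ast.unparse`, and the `area` function is invented for the example.

```python
import ast

ORIGINAL = '''
def area(width, height):
    # width times height, in square units
    result = width * height      # computed once
    return result
'''

# Parse the source into a tree, then generate source code back from the tree.
regenerated = ast.unparse(ast.parse(ORIGINAL))
print(regenerated)
```

The regenerated function behaves exactly like the original, but the comments and spacing the author wrote are gone.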
Dynamically Linked Libraries
There are many routines that developers commonly want to execute across many different applications. A set of these routines can be packaged as a library. When the library is compiled, it produces a dynamically linked library (DLL). Any developer can then include the library in their code without having to compile it themselves. This can save a lot of time and space.
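As a small, hedged example of using a library that someone else already compiled, the sketch below loads the operating system's C runtime through Python's `ctypes` module and calls its `strlen` routine. The `load_c_library` helper is hypothetical, and the library lookup assumes a typical Windows, Linux, or macOS setup.

```python
import ctypes
import ctypes.util
import sys

def load_c_library():
    """Load the platform's precompiled C runtime library."""
    if sys.platform == "win32":
        return ctypes.CDLL("msvcrt")  # a DLL that ships with Windows
    # On Linux/macOS, look up the C library by name. If the lookup returns
    # None, CDLL(None) exposes the libraries already loaded in this process.
    return ctypes.CDLL(ctypes.util.find_library("c"))

libc = load_c_library()

# Describe the routine's signature so arguments are converted correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"Unity3D"))
```

Nothing in this script compiles `strlen`. The routine already exists in the shared library, and the program just links to it at run time.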
Reverse Engineering C#
Introduction to ILSpy
ILSpy is an application that takes a C# executable or DLL and decompiles it. On the left-hand side, it displays a tree of all of the namespaces, classes, member functions, and member variables. On the right, it displays the decompiled C# code for the selected function or class.
ILSpy does not natively support editing the files. To do this, a plugin called Reflexil needs to be installed. Installing it is easy: download the Reflexil files and drop them into the root of the ILSpy folder. Reflexil adds the window seen on the bottom right, which lists all of the intermediate instructions for the C# code shown above it. To edit the files, the user must edit the code as intermediate language (IL) instructions rather than as C#. This can be quite a pain. For a much better experience, dnSpy is recommended instead. For more in-depth information on ILSpy, please consider watching the presentation.
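To give a feel for what editing at the intermediate level involves, here is a loose Python analogy. It is not how Reflexil works, and the `max_health` function is invented for the example. The sketch changes a function's behavior by patching a constant inside its compiled code object instead of editing its source.

```python
def max_health():
    return 1000

# Look inside the compiled code rather than the source text.
code = max_health.__code__

# Swap the constant 1000 for 9999 in the compiled constants table.
patched_constants = tuple(9999 if value == 1000 else value for value in code.co_consts)

# Build a new code object with the patched table and attach it to the function.
max_health.__code__ = code.replace(co_consts=patched_constants)

print(max_health())
```

Even for a one-line function, the edit requires knowing how the compiled form stores its data, which is why editing in the source language is so much more comfortable.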
Introduction to dnSpy
dnSpy is similar to ILSpy but provides a better overall experience. dnSpy also allows the user to write and edit the code of functions directly in C#. For more in-depth information on dnSpy, please consider watching the presentation.
Link to dnSpy: https://github.com/0xd4d/dnSpy