I have the compiled C code in text format. I need to extract the source code by decompiling the machine code. How to do that?
From a binary you can produce the corresponding assembly source, along with some associated information if debug data is present (binary not stripped), but do not expect much more. Furthermore, several different C sources can compile to the same binary, so even if you manage to reverse the compilation and produce C code, it will not be the original, and it is likely to be hard to read. – bruno, Feb 17, 2019 at 13:03
Please describe in more detail what you expect. You do not expect to get readable source code from an executable, do you? What do you mean by "compiled C code in text format"? "Text format" sounds like something human-readable, but "compiled" is the opposite. Note that even humans are not very good at writing readable, or even similar, code for the exact same purpose. Also, there are always several different ways of writing code that results in the same binary. And a version consisting only of information found in the binary is surely not one that would generally be considered readable. – Yunnosch, Feb 17, 2019 at 13:08
The colloquial term for this is “turning hamburger back into cows”. It’s effectively impossible to recover the original source code from compiled machine code. Decompilers will give you something that’s functionally equivalent, but it won’t be the original source code. – John Bode, Feb 17, 2019 at 16:26
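To illustrate the point made in the comments that several sources can lead to the same binary, here is a minimal sketch. The function names sum_for and sum_while are made up for this example. An optimizing compiler may well emit identical or near-identical instructions for both, although that is not guaranteed and depends on the compiler and its options.

```c
#include <stddef.h>

/* Version A: index-based for loop. */
int sum_for(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Version B: pointer-walking while loop, with different local names.
 * Same behaviour as version A; once compiled, nothing in the machine
 * code records which of the two the author actually wrote. */
int sum_while(const int *values, size_t count)
{
    int acc = 0;
    const int *end = values + count;
    while (values != end) {
        acc += *values++;
    }
    return acc;
}
```

Compiling both with something like gcc -O2 -S and comparing the generated assembly is one way to see how little of the source-level difference survives.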
3 Answers
"True" decompiling is, basically, impossible. First and foremost, you can't "decompile" local names (those local to functions and to source code files / modules). For those you'll get generated names, for example, for int local variables: i1, i2... Unless, of course, you also have debug information, which is not often the case.
Decompiling to "something" (which might not be very readable) is possible, but it usually relies on heuristics that recognize the code patterns compilers generate, and those heuristics can be fooled into producing strange (possibly even incorrect) C code. In practice that means a decompiler usually works OK for a certain compiler with certain (default) compile options, but not so well with others.
Having said that, decompilers do exist and you can try your luck with, say, Snowman.
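As a rough illustration of the lost names, compare a small original function with a hand-written imitation of decompiler-style output. The second function is not the output of any real tool. The name sub_401000 and the variables a1, i1 and c1 are invented here to mimic the kind of generated names described above.

```c
/* Original source: the names carry the intent. */
int count_vowels(const char *text)
{
    int vowels = 0;
    for (; *text != '\0'; text++) {
        switch (*text) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            vowels++;
            break;
        default:
            break;
        }
    }
    return vowels;
}

/* Hand-written imitation of what a decompiler might reconstruct
 * (hypothetical, for illustration only): same behaviour, but the
 * names are generated and character literals appear as numbers. */
int sub_401000(const char *a1)
{
    int i1 = 0;
    char c1;
    while ((c1 = *a1) != 0) {
        if (c1 == 97 || c1 == 101 || c1 == 105 || c1 == 111 || c1 == 117)
            i1++;
        a1++;
    }
    return i1;
}
```

Both functions return the same result for every input, yet only the first one tells a reader what it is for.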
As Srdjan has said, in general decompilation of a C (or C++) program is not possible. Too much information is lost during the compilation process. For example, consider a declaration such as int x: it is "lost" because it does not directly produce any machine-level instruction. The compiler uses it for type checking and for choosing storage and instructions, but the declaration itself does not survive.
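A small sketch of what "lost" means here, using made-up function names. With optimization enabled, a compiler will typically generate the same code for the first two functions, so the declaration of x leaves no trace. The last two functions suggest that a type usually survives only indirectly, through the instructions the compiler chose.

```c
/* The declaration "int x;" emits no instruction by itself. */
int add_one(int input)
{
    int x;              /* name and type exist only at compile time */
    x = input + 1;
    return x;
}

/* Same behaviour without the named local. */
int add_one_direct(int input)
{
    return input + 1;
}

/* Signed division truncates toward zero, so -7 / 2 is -3. */
int half_signed(int v)
{
    return v / 2;
}

/* Unsigned division by two; compilers commonly use a plain shift. */
unsigned half_unsigned(unsigned v)
{
    return v / 2u;
}
```

A decompiler looking at the machine code has to guess from such instruction patterns whether a value was signed or unsigned, and it cannot know that a variable called x ever existed.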
Now, however, it is possible to disassemble, which takes the compiled executable back up one level to assembly language. Interpreting the assembly might (will?) be difficult and will certainly be time-consuming. There are several disassemblers available. If you have the money, IDA Pro is probably the industry standard among disassemblers, and if you do this type of work it is well worth the several thousand dollars per license. There are also a number of open source disassemblers available; Google can find them.
That being said, there have been efforts to create decompilers. IDA Pro has one, and you can look at http://boomerang.sourceforge.net/ in addition to Snowman, linked above.
Lastly, other languages are more friendly towards decompilation than C or C++. For example, a C# program is decompilable with tools like dotPeek or ILSpy. Similarly, for Java there are a number of tools that can convert Java bytecode back into Java source.
Please post a sample of the "compiled C code in text format."
Perhaps then it will be easier to see what you are trying to achieve.
Typically it is not practical to reverse engineer assembly language into C, because much of the human-readable information, in the form of labels and variable names, is permanently lost in the compilation process.