I've been tinkering in the reverse-engineering space. My problem amounts to reusing compiled binaries by combining them in novel ways. That is, I would like to take algorithm/subsystem X from software Y and combine it with something else. The goal is to build up a library of components that I can mix and match.
It has led me to investigate a few technologies I have been meaning to invest time in, like LLVM and QEMU. There are a few projects that combine these, as well as related tools like DECAF, radare2, DynamoRIO and McSema.
The hard problem I am facing, and for which I by no means have an effective solution, is extracting the semantic essence of the program independently of the ISA, and finding a balance between emulation and re-adaptability (i.e. abstracting out the code that depends on some base-address assumption).
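To make the base-address part concrete: when a binary ships relocation information (for example the base relocation table in PE images), that should be used to rebase code. Without it, one crude fallback is to scan the image for aligned values that fall inside the original image range and treat them as absolute pointers. The sketch below assumes a little-endian 32-bit image; the function name `rebase_pointers` is hypothetical, and the heuristic will misfire on ordinary data that happens to look like an in-range address.

```python
import struct


def rebase_pointers(blob: bytes, old_base: int, new_base: int, image_size: int):
    """Heuristically rebase absolute 32-bit pointers in a little-endian image.

    Every 4-byte-aligned value inside [old_base, old_base + image_size) is
    assumed to be a pointer and shifted by (new_base - old_base). Returns the
    patched bytes and the list of offsets that were rewritten.
    """
    out = bytearray(blob)
    patched = []
    delta = new_base - old_base
    for off in range(0, len(blob) - 3, 4):
        (value,) = struct.unpack_from("<I", blob, off)
        if old_base <= value < old_base + image_size:
            # Wrap to 32 bits, as the target machine would.
            struct.pack_into("<I", out, off, (value + delta) & 0xFFFFFFFF)
            patched.append(off)
    return bytes(out), patched


if __name__ == "__main__":
    image = struct.pack("<III", 0x00401010, 0x12345678, 0x00402000)
    rebased, offsets = rebase_pointers(image, 0x00400000, 0x10000000, 0x10000)
    print(offsets, [hex(v) for v in struct.unpack("<III", rebased)])
```

Real code also embeds addresses at unaligned offsets inside instructions, so a serious tool would combine this with disassembly rather than a flat scan.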
On the surface, the value seems small relative to the investment, especially coming from a software engineering background, where accomplishing this is hard enough even with the source code available. Although the application is broad, I've focused intermittently on video games. I believe this is where some of the value lies, as a finely tuned subsystem can be the heart of a franchise.
Very interesting. I'm trying to do something closely related for other reasons.
One thing I'm considering is that it might be possible to brute-force the semantics of short code snippets using genetic algorithms. A similar technique has been demonstrated a few times by the author of [1].
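As a rough illustration of the idea (not necessarily what [1] does), here is a minimal genetic-programming sketch that evolves expression trees until one reproduces the observed input/output behavior of a snippet. It assumes the snippet has already been reduced to a black box taking two 32-bit inputs and returning one 32-bit output; in practice the samples would come from running the snippet under an emulator, while here a hypothetical target, ((x ^ y) + 1) mod 2**32, stands in for it. The operator set, depth limit, population size and truncation selection are all illustrative choices.

```python
import random

MASK = 0xFFFFFFFF  # treat all values as 32-bit registers

# Assumed, deliberately small set of building blocks for candidate programs.
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}
TERMINALS = [("x",), ("y",), ("c", 1)]
MAX_DEPTH = 4


def evaluate(tree, x, y):
    """Interpret an expression tree on concrete 32-bit inputs."""
    kind = tree[0]
    if kind == "x":
        return x
    if kind == "y":
        return y
    if kind == "c":
        return tree[1]
    return OPS[kind](evaluate(tree[1], x, y), evaluate(tree[2], x, y))


def random_tree(rng, depth):
    """Grow a random tree no deeper than `depth`."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def mutate(rng, tree, depth=MAX_DEPTH):
    """Replace one randomly chosen subtree, keeping the result within `depth`."""
    if tree[0] not in OPS or rng.random() < 0.3:
        return random_tree(rng, depth)
    if rng.random() < 0.5:
        return (tree[0], mutate(rng, tree[1], depth - 1), tree[2])
    return (tree[0], tree[1], mutate(rng, tree[2], depth - 1))


def fitness(tree, samples):
    """Count how many observed (x, y) -> output pairs the candidate reproduces."""
    return sum(evaluate(tree, x, y) == out for (x, y), out in samples)


def evolve(samples, rng, pop_size=100, generations=50):
    population = [random_tree(rng, MAX_DEPTH) for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        scored = sorted(population, key=lambda t: fitness(t, samples), reverse=True)
        best = scored[0]
        history.append(fitness(best, samples))
        if history[-1] == len(samples):
            break  # this candidate explains every observation
        # Truncation selection with elitism: the top quarter survives unchanged.
        parents = scored[: pop_size // 4]
        children = [mutate(rng, rng.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    return best, history


if __name__ == "__main__":
    rng = random.Random(0)
    inputs = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(16)]
    samples = [((x, y), ((x ^ y) + 1) & MASK) for x, y in inputs]
    best, history = evolve(samples, rng)
    print(best, history[-1], "/", len(samples))
```

Because the top quarter is carried over unchanged, the best fitness never decreases between generations, but the search can still stall; a real attempt would likely add crossover, restarts and a richer operator set.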
Eventually I want to use this to rapidly search a large number of binaries for insecure behavior. To do that, I need to be able to formulate questions like:
"Find me a function where attacker controlled data is marshaled to a size type and then used to allocate memory, to which a different attacker controlled amount of attacker controlled data is written".
Basically this: https://github.com/Battelle/PaperMachete
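The quoted query can be read as a taint-tracking question. Below is a toy version over a hypothetical straight-line mini-IR; the `Insn` class, its op names and `find_pattern` are invented for illustration, a real tool would work on IR lifted from a disassembler, and this is not a description of how PaperMachete works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """One instruction of a hypothetical, already-lifted mini-IR."""
    op: str           # "input", "assign", "to_size", "alloc" or "write"
    dst: str = ""     # defined variable, or the target buffer for "write"
    srcs: tuple = ()  # operand variable names


def find_pattern(insns):
    """Return indices of writes matching the quoted bug pattern."""
    taint = {}             # var -> frozenset of attacker inputs it derives from
    size_typed = set()     # vars produced by a conversion to a size type
    alloc_size_taint = {}  # buffer -> taint of the size it was allocated with
    hits = []
    for i, insn in enumerate(insns):
        if insn.op == "input":
            taint[insn.dst] = frozenset({insn.dst})  # each input is its own source
        elif insn.op in ("assign", "to_size"):
            # Destination inherits the union of its operands' taint.
            taint[insn.dst] = frozenset().union(*(taint.get(s, frozenset()) for s in insn.srcs))
            if insn.op == "to_size":
                size_typed.add(insn.dst)
            else:
                size_typed.discard(insn.dst)
        elif insn.op == "alloc":
            (size,) = insn.srcs
            if size in size_typed and taint.get(size):
                alloc_size_taint[insn.dst] = taint[size]
        elif insn.op == "write":
            length, data = insn.srcs
            size_labels = alloc_size_taint.get(insn.dst)
            len_labels = taint.get(length, frozenset())
            # Attacker data, plus a length derived from an input that did
            # not determine the allocation size.
            if size_labels and taint.get(data) and (len_labels - size_labels):
                hits.append(i)
    return hits


if __name__ == "__main__":
    program = [
        Insn("input", "hdr_len"),
        Insn("input", "body_len"),
        Insn("input", "body"),
        Insn("to_size", "n", ("hdr_len",)),
        Insn("alloc", "buf", ("n",)),
        Insn("write", "buf", ("body_len", "body")),
    ]
    print(find_pattern(program))
```

This pass is linear and intra-procedural, with no control flow, aliasing or pointer tracking; handling those is exactly where a lifted IR plus a proper graph or query engine becomes necessary.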