I think there is a general problem with this line of research. This paper and others try to attribute execution time to specific instructions to provide feedback to software developers. But there’s no way to verify that the resulting execution time breakdown is “correct.” If the tool says 5% of execution time is spent in a particular ADD instruction, I can’t just remove the instruction and verify the claim; the resulting program would be functionally different. So how do I know whether it’s really 5%, 10%, or some other number?

That’s not to say there’s no value in this research area; I would like the processor to provide feedback on where the performance bottlenecks in my program are. I would rather the researchers focus on generating meaningful optimization suggestions, though, instead of just a table of numbers.

Take a look at coz, the causal profiler: https://github.com/plasma-umass/coz

From what I know, this is the closest project to what you’re describing. It also sidesteps the verification problem above: instead of attributing time to instructions, coz estimates the effect of speeding up a given line by inserting matching delays everywhere else (a “virtual speedup”), so its numbers are predictions about actual end-to-end impact.