What does HackerNews think of async-profiler?

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

Language: C++

Try looking at it with async-profiler. This can be done in production. I discovered performance problems in unexpected (and some expected) places with it in the past. It may be more helpful if it's your application code that's to blame, though, less so if it's the JVM itself.

https://github.com/jvm-profiling-tools/async-profiler

> The other two languages I’ve used mostly in recent decades are Java and Ruby and the profiler situation for both those languages is kind of shitty. I had to pay real money to get the Java profiler I used at AWS and while it worked, it was klunky, not fun to use.

These days, async profiler (https://github.com/jvm-profiling-tools/async-profiler) is much better than the Go tooling for performance. It is a joy to use and features a top-like view for the hottest methods. It works for locks, allocations and CPU time. It also integrates with JMH.
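To illustrate what a top-like "hottest methods" view boils down to, here is a minimal Java sketch that ranks methods by self samples. It assumes the profile is available as semicolon-separated "collapsed" stack lines (`frame;frame;frame count`); the class name, method names, and sample data are made up for illustration and are not part of async-profiler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotMethods {
    /** Counts samples per leaf frame, i.e. the method that was running when the sample was taken. */
    static Map<String, Long> selfSamples(List<String> collapsedLines) {
        Map<String, Long> counts = new HashMap<>();
        for (String raw : collapsedLines) {
            String line = raw.trim();
            int space = line.lastIndexOf(' ');
            if (space < 0) continue; // skip lines without a sample count
            long samples = Long.parseLong(line.substring(space + 1));
            String stack = line.substring(0, space);
            String leaf = stack.substring(stack.lastIndexOf(';') + 1);
            counts.merge(leaf, samples, Long::sum);
        }
        return counts;
    }

    /** Returns method names sorted by descending self-sample count. */
    static List<String> ranked(Map<String, Long> counts) {
        List<String> names = new ArrayList<>(counts.keySet());
        names.sort((a, b) -> Long.compare(counts.get(b), counts.get(a)));
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not output from a real profiling run.
        List<String> lines = List.of(
            "main;handle;parse 70",
            "main;handle;render 20",
            "main;handle 10",
            "main;warmup;parse 5");
        Map<String, Long> counts = selfSamples(lines);
        for (String name : ranked(counts)) {
            System.out.println(counts.get(name) + "\t" + name);
        }
    }
}
```

Ranking by leaf frame shows where the CPU was actually executing; time spent in callees is deliberately not attributed to the caller.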

Have you tried setting up a JMH benchmark? This should allow you to see if the JIT is the cause of your slowdowns. Also, running it under a profiler (I recommend async-profiler[1]) should give you a good idea of where the slowdown occurs which might help you pin it down further.

[1] https://github.com/jvm-profiling-tools/async-profiler

I've run IntelliJ on aarch64 using my system jdk11 instead of the bundled one. I ran into issues with two native binaries that come bundled with it:

1. An inotify library for detecting file changes. In this case, IntelliJ detected that it didn't have a binary for my architecture and just let me know it would be slower. Not a big deal.

2. The async profiler[0]. While available for other architectures, only x86_64 binaries are bundled with IntelliJ. And unfortunately, it didn't detect this either; it just quietly failed to work.

Hopefully with this patch they've also fixed these issues on linux-aarch64.

0. https://github.com/jvm-profiling-tools/async-profiler

This looks like a neat tool, and I can see myself using this!

My main project so far this year has been creating actual performance metrics, not just guesstimates (this is especially important with the JVM, which will optimize code at runtime for your actual payloads). The best tool so far has been FlameGraphs [1]. I urge everyone to try to find an implementation for their specific language, as these things are actually interactive. It's not just a nice graphic; it can tell you very directly where you're spending time. We've found countless minor bugs and wasted cycles.

The best Java integration I could find is Async-Profiler [2], which can - as the name implies - be attached to any running JVM. The config is pretty powerful and intuitive. It's one of those magical things that just work.

[1]: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

[2]: https://github.com/jvm-profiling-tools/async-profiler
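For readers wondering why a flame graph tells you "very directly" where time goes: the width of each box is the number of samples whose stack passes through that frame in that calling context. A minimal Java sketch of that aggregation follows, assuming the input is in the semicolon-separated "folded" stack format (`frame;frame;frame count`). The class name and sample data are hypothetical, and this only computes the widths; it does not render anything.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlameWidths {
    /**
     * For every stack prefix (a frame in its calling context), sums the samples
     * of all stacks passing through it. That sum is the width of the frame's box.
     */
    static Map<String, Long> inclusiveSamples(List<String> foldedLines) {
        Map<String, Long> widths = new TreeMap<>();
        for (String raw : foldedLines) {
            String line = raw.trim();
            int space = line.lastIndexOf(' ');
            if (space < 0) continue; // skip lines without a sample count
            long samples = Long.parseLong(line.substring(space + 1));
            String[] frames = line.substring(0, space).split(";");
            StringBuilder prefix = new StringBuilder();
            for (String frame : frames) {
                if (prefix.length() > 0) prefix.append(';');
                prefix.append(frame);
                widths.merge(prefix.toString(), samples, Long::sum);
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not output from a real profiling run.
        List<String> lines = List.of(
            "main;handle;parse 70",
            "main;handle;render 20",
            "main;idle 10");
        inclusiveSamples(lines).forEach(
            (stack, samples) -> System.out.println(samples + "\t" + stack));
    }
}
```

Keying on the full prefix rather than the bare method name keeps the same method called from two different places as two separate boxes, which is what makes the picture readable as a call tree.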

Although the article is a bit long-winded as others have pointed out, it did mention the importance of profiling your application before making any changes.

In particular, you should always aim to profile your app when running under a production load so that you do not have to make assumptions about its behaviour. Something like async-profiler[0] is good, since it avoids the safepoint bias issue and can also track heap allocations.

0. https://github.com/jvm-profiling-tools/async-profiler

For Java you can do better than perf. Sun's Performance Analyzer [1] has had hardware-counter-based profiling of real Java (not just JIT-compiled code) for more than 10 years. The open-source async-profiler [2] seems to be doing a decent job on the data collection side, though it doesn't go beyond a basic flame graph for analysis.

[1] https://en.wikipedia.org/wiki/Performance_Analyzer

[2] https://github.com/jvm-profiling-tools/async-profiler

Yes, perf sampling of Java is still going well, via -XX:+PreserveFramePointer. Our one microservice that had high overhead with that option (up to 10%, which is rare) has since improved its code, lowered the overhead, and now enables this option by default.

But there's also a newer way that can do Java stack sampling from perf without the frame pointer: https://github.com/jvm-profiling-tools/async-profiler

We're not running it yet. I want to try it out. Note that the stacks with async-profiler are a bit broken -- Java methods become detached from the JVM -- but I'm hoping that's fixable (it should be with a JVM change, at least).
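If you rely on -XX:+PreserveFramePointer for perf, it can be handy for a service to report at startup whether the option was actually passed. Here is a minimal Java sketch using the standard `RuntimeMXBean.getInputArguments()` call. The class and helper names are hypothetical, and it makes two assumptions: the option is off unless given on the command line, and the last occurrence of a repeated flag wins. It only sees flags that show up in the JVM's input argument list.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class FramePointerCheck {
    /**
     * Returns true if the last -XX:+name / -XX:-name occurrence enables the flag.
     * An absent flag is reported as false (assumed default: off).
     */
    static boolean isBooleanFlagEnabled(List<String> jvmArgs, String name) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + name)) {
                enabled = true;
            } else if (arg.equals("-XX:-" + name)) {
                enabled = false;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with, excluding the arguments to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        boolean on = isBooleanFlagEnabled(jvmArgs, "PreserveFramePointer");
        System.out.println("PreserveFramePointer passed as enabled: " + on);
    }
}
```

Run it as `java -XX:+PreserveFramePointer FramePointerCheck` and again without the flag to compare the two results.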