https://github.com/benfred/py-spy
It's worked really well for my needs.
I'm not sure if it exports results in a format Chrome can render, but it does produce great interactive SVGs and is compatible with speedscope.app.
- py-spy: https://github.com/benfred/py-spy (written in Rust)
- pyflame: https://github.com/uber-archive/pyflame (C++, appears to be unmaintained)
The "no performance cost" thing is interesting: my experience writing a similar profiler is that there are a couple of things that can affect performance a little bit:
1. You have to make a lot of system calls to read the memory of the target process, and if you want to sample at a high rate then that does use some CPU. This can be an issue if you only have 1 CPU.
2. You have two choices when reading memory from a process: you can race with the program and hope you read the function stack from its memory before it changes which function it's running (you're likely to win the race, because C is faster than Python), or you can pause the program briefly while taking a sample. py-spy has an option to choose which one you want: https://github.com/benfred/py-spy#how-can-i-avoid-pausing-th...
This method definitely has much lower overhead than a tracing profiler that instruments every single function call, and in practice it works well.
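To make the sampling idea concrete, here is a minimal in-process sketch. It is not how py-spy works (py-spy reads the target's memory from a separate process); it swaps in `sys._current_frames()` from a second thread instead. Because of the GIL, the target thread is not running while a sample is taken, so this is closer to the "pause briefly" option than the "race" option. The names `sample_stacks`, `busy_work` and `profile_busy_work` are made up for illustration.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval=0.005, duration=0.2):
    """Sample one thread's Python stack at a fixed interval and count each stack."""
    counts = collections.Counter()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(target_thread_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        if stack:
            # Root first, leaf last, joined with ';' (a collapsed-stack style key).
            counts[";".join(reversed(stack))] += 1
        time.sleep(interval)
    return counts


def busy_work(stop_event):
    """Hypothetical CPU-bound workload to profile."""
    total = 0
    while not stop_event.is_set():
        total += sum(i * i for i in range(1000))
    return total


def profile_busy_work(duration=0.2):
    """Run busy_work in a thread and sample it from the calling thread."""
    stop = threading.Event()
    worker = threading.Thread(target=busy_work, args=(stop,))
    worker.start()
    try:
        return sample_stacks(worker.ident, duration=duration)
    finally:
        stop.set()
        worker.join()
```

Calling `profile_busy_work().most_common(5)` gives the most frequently seen stacks. The cost is one stack walk per sample rather than a hook on every function call, which is where the overhead difference versus a tracing profiler comes from.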
One thing I think is nice about this kind of profiler is that reading memory from the target process sounds like a complicated thing, but it's not: you can see austin's code for reading memory here, and it's implemented for 3 platforms in just 130 lines of C: https://github.com/P403n1x87/austin/blob/877e2ff946ea5313e47...
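To show how little is involved, here is a Linux-only sketch that reads bytes through `/proc/<pid>/mem`. This is an assumption chosen for illustration, not something taken from austin's source; other platforms need different calls. The helper names `read_process_memory` and `read_own_buffer` are hypothetical.

```python
import ctypes
import os


def read_process_memory(pid, address, size):
    """Read `size` bytes at `address` from process `pid` via /proc/<pid>/mem (Linux only)."""
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(size)


def read_own_buffer():
    """Demo: read a known buffer back out of this very process."""
    buf = ctypes.create_string_buffer(b"hello from the target")
    return read_process_memory(os.getpid(), ctypes.addressof(buf), len(buf.value))
```

Reading your own process like this needs no special rights; reading another process typically needs ptrace-level permission (often root). The hard part of a real profiler is knowing which addresses hold the interpreter's frame structures, not the read itself.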
As a quick compare-and-contrast between py-spy and pyinstrument: py-spy has the advantage of being able to attach to an already-running process, which is super useful when your program is stuck and you don't know why. I haven't used pyinstrument yet, but I do like that it can render its flame graph in the console; sometimes I find saving an SVG file and opening a browser a bit arduous. Excited to give it a try.
[1] https://github.com/benfred/py-spy
[2] https://jvns.ca/blog/2018/09/08/an-awesome-new-python-profil...
I published that here:
https://pypi.org/project/better_exchook/ https://github.com/albertz/py_better_exchook
I also often have a SIGUSR1 handler which prints the stack trace of all threads. This is useful in long-running, multithreaded processes, where you might run into strange hangs or deadlocks.
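A minimal sketch of such a handler, using only the standard library. The function names are made up for illustration, and this is not the code from better_exchook.

```python
import signal
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return a text dump of the current Python stack of every thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append(f"Thread {names.get(thread_id, '?')} ({thread_id}):\n")
        lines.extend(traceback.format_stack(frame))
    return "".join(lines)


def install_stack_dump_handler(signum=None, out=None):
    """Print all thread stacks whenever `signum` (default SIGUSR1) arrives."""
    if signum is None:
        signum = signal.SIGUSR1  # not available on Windows

    def handler(received_signum, frame):
        print(format_all_thread_stacks(), file=out or sys.stderr)

    # signal.signal must be called from the main thread.
    return signal.signal(signum, handler)
```

After `install_stack_dump_handler()`, running `kill -USR1 <pid>` prints the dump. A Python-level handler only runs once the main thread gets back to executing bytecode, so if the main thread is stuck inside a C call, the standard library's `faulthandler.register(signal.SIGUSR1, all_threads=True)` is an alternative that dumps from the C-level signal handler.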
In addition to that, if you might hit crashes (segfaults and the like), something like faulthandler is useful.
If you also want to see the C stack trace in such cases, I load libSegFault.so, like here: https://github.com/rwth-i6/returnn/blob/5b8e34ec1fd725d0e20b...
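For the faulthandler part, a minimal sketch looks like this. The helper name `enable_crash_tracebacks` and the log-file approach are assumptions for illustration; by default faulthandler writes to stderr.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Dump Python tracebacks of all threads to `log_path` on fatal signals."""
    log_file = open(log_path, "w")
    # faulthandler keeps using this file descriptor, so the file must stay open.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same thing can be switched on without code changes by running `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1`. Note that this only shows the Python-level stack, not the C one.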
Profiling is another topic. Something like py-spy (https://github.com/benfred/py-spy) can be very helpful.
I also found a remote background ZMQ IPython/Jupyter kernel to be useful sometimes. I published that here: https://github.com/albertz/background-zmq-ipython
I wrote a tool, py-spy (https://github.com/benfred/py-spy), that is worth checking out if you're interested in profiling Python programs. Not only does it solve those problems with cProfile, it also lets you generate a flamegraph, profile running programs in production, work with multiprocess Python applications, profile native Python extensions, etc.
> While pyflame is a great project, it doesn't support Python 3.7 yet and doesn't work on OSX, Windows or FreeBSD.
I wonder how the CPU profiling in Scalene is different. Its README does not mention pyflame or py-spy at all. Of course, the memory profiler is a nice extra.
I've had success using `py-spy` for debugging perf issues. Flamegraphs are much nicer to work with than cProfile's output.