Is there a well-established Python stack for load testing and profiling? I would like to load test a Python web application, then use an analysis tool to understand which part of the stack is eating the CPU cycles and/or RAM.

Ditto.

Edit:

Yappi - A tracing profiler that is multithreading, asyncio and gevent aware.

https://github.com/sumerc/yappi
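
For what it's worth, here is a rough sketch of the "generate load, then see what eats CPU and RAM" workflow. It uses only the standard library (`cProfile` and `tracemalloc`), not Yappi, and the handlers `slow_handler` and `leaky_handler` are made-up stand-ins for real request handlers, so treat it as an illustration rather than a recommended setup. Note that `cProfile` only records the thread it was enabled in, which is why the load loop here is single-threaded; a threaded or asyncio app is where a tool like Yappi would come in.

```python
import cProfile
import io
import pstats
import tracemalloc


def slow_handler(n):
    # Hypothetical CPU-heavy request handler.
    return sum(i * i for i in range(n))


def leaky_handler(store, n):
    # Hypothetical handler that keeps memory alive between requests.
    store.append(bytearray(n))
    return len(store)


def run_load(requests=200):
    # Stand-in for a load generator: call the handlers in a loop.
    store = []
    for _ in range(requests):
        slow_handler(2000)
        leaky_handler(store, 1024)
    return store


def profile_load(requests=200):
    # Track allocations and CPU time while the load runs.
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()
    store = run_load(requests)
    profiler.disable()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # Text report of the functions with the highest cumulative time.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)

    # Source lines responsible for the most allocated memory.
    top_alloc = snapshot.statistics("lineno")[:3]
    return out.getvalue(), top_alloc, store


if __name__ == "__main__":
    report, top_alloc, _ = profile_load()
    print(report)
    for stat in top_alloc:
        print(stat)
```

In a real setup the loop in `run_load` would be replaced by an external load generator hitting the running application, with the profiler attached to the server process.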