What does HackerNews think of yappi?
Yet Another Python Profiler, but this time multithreading, asyncio and gevent aware.
Edit:
Yappi - A tracing profiler that is multithreading, asyncio and gevent aware.
For async code, the issue with a normal profiler is that most of the recorded time ends up in the event loop. In Python there is https://github.com/sumerc/yappi, which has a notion of coroutine profiling (check the README there), so I'm wondering if this would make sense in the context of Austin.
Anyway thanks for your work!
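To make the event-loop problem concrete, here is a minimal sketch using only the standard library. The `fetch` coroutine and the `profile_fetch` helper are hypothetical names invented for illustration. The sketch runs a coroutine that waits about 0.2 seconds under cProfile and compares the wall-clock time with the cumulative time cProfile attributes to the coroutine itself. Since the profiler sees the coroutine leave each time it suspends, the waiting time should not be charged to it.

```python
import asyncio
import cProfile
import pstats
import time


async def fetch():
    # Hypothetical coroutine standing in for an I/O-bound call.
    await asyncio.sleep(0.2)


def profile_fetch():
    """Run fetch() under cProfile.

    Returns (wall_seconds, seconds_attributed_to_fetch).
    """
    profiler = cProfile.Profile()
    started = time.perf_counter()
    profiler.enable()
    asyncio.run(fetch())
    profiler.disable()
    wall = time.perf_counter() - started

    stats = pstats.Stats(profiler)
    code = fetch.__code__
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    attributed = sum(
        entry[3]
        for (filename, _lineno, name), entry in stats.stats.items()
        if filename == code.co_filename and name == code.co_name
    )
    return wall, attributed


if __name__ == "__main__":
    wall, attributed = profile_fetch()
    print(f"wall clock: {wall:.3f}s, attributed to fetch(): {attributed:.3f}s")
```

The gap between the two numbers is the time spent suspended, which a conventional profiler reports under the event loop's own functions rather than under the coroutine that was waiting.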
- Profiling and refactoring Python code in general:
Using yappi[^1] to profile and generate profile data, and KCachegrind[^2] to visualize that data in the form of call graphs, number of cycles, etc. can yield great results. You can find which functions in your code base are taking too long, and this can give great pointers to where bottlenecks are.
Using pyreverse[^3], now integrated in pylint[^4], to generate, say, a PNG image of the class hierarchy and "UML diagrams" is extremely helpful. When I have used it and saw the arrows going all over the place, it has helped me eke out better abstractions, remove a lot of code, write cleaner interfaces, and frankly write code I and others could actually read.
After installing pylint, you can run it at the package level. Say the package is named foo and follows the standard layout with `foo/foo`:

    cd foo
    pyreverse -o png .
    # generates classes.png and packages.png
    # You can also run: pyreverse -o png foo

- Profiling in the context of Flask:
Using Werkzeug's ProfilerMiddleware[^5] helps you see what's going on with each request: which functions are called, the number of calls, total time, time per call, which line, etc.
If the example in the documentation does not work, try the following:

    ...
    try:
        from werkzeug.middleware.profiler import ProfilerMiddleware
    except ModuleNotFoundError:
        # Older version
        from werkzeug.contrib.profiler import ProfilerMiddleware
    ...
    # Assuming you have an app object
    app.config['PROFILE'] = True
    app.wsgi_app = ProfilerMiddleware(app.wsgi_app, restrictions=[50])

General things: it is very helpful to extract as much code as possible from the routes. This helps make the functions usable elsewhere, and means you don't have to rely on Flask's testing client, which can be pretty frustrating when dealing with the app context, especially in test suites involving database actions and weird connections in setUp and tearDown if you're using unittest.
As I said, this is general and not very specific to "big data" or "billions of rows", but these small things lead to bigger things in my opinion: making the code easier to read and extend, easier to test and cover, and easier to profile and improve compounds to the point where you may be able to postpone more involved approaches.
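To illustrate the point about pulling logic out of routes, here is a small sketch. `summarize_orders`, `load_orders` and the `/summary` route are hypothetical names invented for this example. The route body shrinks to a one-line wrapper (shown only as a comment, so the sketch needs no third-party imports), while the logic lives in a plain function that unittest can exercise without an app context or test client.

```python
import unittest


def summarize_orders(rows):
    """Pure function: needs no Flask request, app context, or database handle.

    `rows` is assumed to be an iterable of (customer, amount) pairs.
    """
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


# The route then becomes a thin wrapper (hypothetical, shown as a comment):
#
#   @app.route("/summary")
#   def summary():
#       return jsonify(summarize_orders(load_orders()))


class SummarizeOrdersTest(unittest.TestCase):
    def test_totals_are_grouped_per_customer(self):
        rows = [("alice", 10), ("bob", 5), ("alice", 2)]
        self.assertEqual(summarize_orders(rows), {"alice": 12, "bob": 5})


if __name__ == "__main__":
    unittest.main()
```

A function like this can also be profiled on its own, without spinning up the app.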
[^1]: https://github.com/sumerc/yappi
[^2]: https://kcachegrind.github.io/
[^3]: https://www.logilab.org/blogentry/6883
[^4]: https://github.com/PyCQA/pylint
[^5]: https://werkzeug.palletsprojects.com/en/1.0.x/middleware/pro...