What does HackerNews think of yappi?

Grasshopper – An open-source Python library for load testing | May 2023

Ditto.

Edit:

Yappi - A tracing profiler that is multithreading, asyncio and gevent aware.

Spy on Python down to the Linux kernel level | Sep 2021

I've just realised after posting that the AUR package uses the git version, so it's actually expected that we have to use the git version of austin-tui too, not the PyPI one. Just a heads-up in case someone like me installs the PyPI version without paying attention: the git one is necessary.

For async code, the issue with a normal profiler is that most of the time ends up attributed to the event loop. In Python there is https://github.com/sumerc/yappi, which has a notion of coroutine profiling (check the README there), so I'm wondering if this would make sense in the context of Austin.
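To make the event-loop problem concrete, here is a minimal sketch using the standard library's `cProfile` rather than yappi or Austin. `slow_io`, `main` and `profile_async` are hypothetical names, and the 0.2 s sleep stands in for real I/O. cProfile stops a coroutine's clock every time it suspends, so the awaited time is charged to the loop machinery under `asyncio.run` rather than to `slow_io`.

```python
import asyncio
import cProfile
import pstats


async def slow_io():
    # Hypothetical stand-in for real I/O: the coroutine is suspended here, not computing.
    await asyncio.sleep(0.2)


async def main():
    await slow_io()


def profile_async(coro_fn):
    """Run coro_fn() under cProfile and return {function name: largest cumulative seconds}."""
    profiler = cProfile.Profile()
    profiler.enable()
    asyncio.run(coro_fn())
    profiler.disable()

    cumtimes = {}
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers)
    for (_, _, funcname), (_, _, _, cumtime, _) in stats.stats.items():
        cumtimes[funcname] = max(cumtime, cumtimes.get(funcname, 0.0))
    return cumtimes


if __name__ == "__main__":
    times = profile_async(main)
    # Expect a tiny value: time spent suspended is not counted for the coroutine.
    print(f"slow_io:     {times['slow_io']:.4f}s")
    # Expect roughly the 0.2 s sleep: it is spent waiting inside the event loop.
    print(f"asyncio.run: {times['run']:.4f}s")
```

This misattribution is what yappi's coroutine profiling (per its README) is meant to address.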

Anyway thanks for your work!

PyInstrument – A statistical Python profiler that focuses on the slow parts | Oct 2020

How does it compare with yappi[1]?

[1] https://github.com/sumerc/yappi

Ask HN: What do I need to learn how to write a scalable app? | Jun 2020

Since you mention you're using Flask: before diving into data-intensive stuff (if need be), a lot can be done by profiling, refactoring, and improving the actual code:

- Profiling and refactoring Python code in general:

Using yappi[^1] to profile and generate profile data, and KCachegrind[^2] to visualize that data in the form of call graphs, number of cycles, etc., can yield great results: you can find which functions in your code base are taking too long, which gives good pointers to where the bottlenecks are.
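yappi is a third-party package, so as a self-contained stand-in here is a sketch of the same "which functions take too long" question using the standard library's `cProfile` and `pstats` instead. `slow_part`, `fast_part`, `workload` and `top_functions` are hypothetical. With yappi, the KCachegrind route is to save the function stats in callgrind format, e.g. `yappi.get_func_stats().save("callgrind.out", type="callgrind")`, and open that file in KCachegrind.

```python
import cProfile
import pstats


def slow_part():
    # Hypothetical bottleneck: a pure-Python loop that dominates own time.
    total = 0
    for i in range(300_000):
        total += i * i
    return total


def fast_part():
    return sum(range(100))


def workload():
    for _ in range(3):
        slow_part()
        fast_part()


def top_functions(fn, n=3):
    """Profile fn() and return the n (funcname, own_seconds) pairs with the most own time."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers)
    rows = [(name, tt) for (_, _, name), (_, _, tt, _, _) in stats.stats.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, seconds in top_functions(workload):
        print(f"{name:40} {seconds:.4f}s")
```

`pstats.Stats(profiler).sort_stats("tottime").print_stats(10)` prints a similar ranking as a ready-made table.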

Using pyreverse[^3], now integrated in pylint[^4], to generate, say, a PNG image of the class hierarchy and "UML diagrams" is extremely helpful. When I've used it and the arrows were going all over the place, it helped me eke out better abstractions, remove a lot of code, write cleaner interfaces, and frankly write code that I and others could actually read.

After installing pylint, you can run it at the package level, for instance. Say the package is named foo and follows the standard `foo/foo` layout:

  cd foo
  pyreverse -o png .
  # generates classes.png and packages.png
  # (PNG output requires Graphviz to be installed)
  # You can also run: pyreverse -o png foo

- Profiling in the context of Flask:

Using Werkzeug's ProfilerMiddleware[^5] helps you see what's going on with each request: which functions are called, the number of calls, total time, time per call, which line, etc.

If the example in the documentation does not work, try the following:

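  from flask import Flask

  try:
      from werkzeug.middleware.profiler import ProfilerMiddleware
  except ImportError:
      # Older Werkzeug (before 0.15) kept it in werkzeug.contrib
      from werkzeug.contrib.profiler import ProfilerMiddleware

  app = Flask(__name__)  # or your existing app object

  # Optional flag for your own code; ProfilerMiddleware does not read it
  app.config['PROFILE'] = True
  # Print per-request stats, limited to the top 50 lines
  app.wsgi_app = ProfilerMiddleware(app.wsgi_app, restrictions=[50])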

General things: it is very helpful to extract as much code as possible from the routes. This makes the functions usable elsewhere and means you don't have to rely on Flask's test client, which can be pretty frustrating when dealing with the app context, especially in test suites involving database actions and awkward connection handling in setUp and tearDown if you're using unittest.
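A minimal sketch of that advice, assuming a hypothetical `order_total` helper: the logic lives in a plain function that a unittest case can call directly, and the route (shown only as a comment, since it needs Flask) shrinks to a thin wrapper around it.

```python
import unittest


def order_total(items, tax_rate=0.2):
    """Plain business logic (hypothetical) with no Flask or app-context dependency."""
    subtotal = sum(item["price"] * item["qty"] for item in items)
    return round(subtotal * (1 + tax_rate), 2)


# The Flask route then becomes a thin wrapper (hypothetical endpoint):
#
#   @app.route("/orders/total", methods=["POST"])
#   def total():
#       return {"total": order_total(request.get_json()["items"])}


class OrderTotalTest(unittest.TestCase):
    # No test client, app context, or database setUp/tearDown needed.
    def test_total_includes_tax(self):
        items = [{"price": 10.0, "qty": 2}, {"price": 5.0, "qty": 1}]
        self.assertEqual(order_total(items), 30.0)

    def test_empty_order(self):
        self.assertEqual(order_total([]), 0.0)


if __name__ == "__main__":
    unittest.main()
```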

As I said, this is general and not very specific to "big data" or "billions of rows", but in my opinion these small things lead to bigger things: making the code easier to read and extend, easier to test and cover, and easier to profile and improve compounds to the point where you may be able to postpone more involved approaches.

[^1]: https://github.com/sumerc/yappi

[^2]: https://kcachegrind.github.io/

[^3]: https://www.logilab.org/blogentry/6883

[^4]: https://github.com/PyCQA/pylint

[^5]: https://werkzeug.palletsprojects.com/en/1.0.x/middleware/pro...