This looks like a pure Python implementation. I wonder what the performance is like?

Author of mpmath here. It depends a lot on what you're doing.

A single floating-point arithmetic operation in mpmath at low precision involves something like 100 "Python cycles" (bytecode ops), each of which takes perhaps 100 machine cycles.

That makes it, roughly:

100 times slower than machine arithmetic in Python.

10000 times slower than machine arithmetic in C (or NumPy if you can vectorize your code fully).

100 times slower than arbitrary precision floating-point arithmetic implemented in C (~100 machine cycles).

These are obviously just order of magnitude estimates.
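To get a rough feel for the first ratio on your own machine, here is a quick, unscientific timing sketch (absolute numbers depend heavily on the machine, Python version, and mpmath version; only the ratio is interesting):

    # Rough timing of one multiply: native float vs. mpmath mpf at 53-bit precision.
    import timeit
    from mpmath import mp, mpf

    mp.prec = 53
    a, b = 1.2345678901234567, 7.6543210987654321
    x, y = mpf(a), mpf(b)

    t_float = timeit.timeit("a * b", globals=globals(), number=100_000)
    t_mpf = timeit.timeit("x * y", globals=globals(), number=100_000)
    print(f"mpf/float overhead: ~{t_mpf / t_float:.0f}x")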

However, low precision is the worst case, relatively speaking. mpmath uses Python longs internally, and it can also use GMPY when available. At sufficiently high precision, the time is dominated by the Python/GMP kernel for multiplying integers, and performance is close to that of other bignum implementations.
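For instance, at very high precision the per-operation Python overhead becomes negligible next to the bignum arithmetic itself. A small sketch (the BACKEND check should report 'gmpy' when GMPY is installed, 'python' otherwise):

    # At high precision, each operation's cost is essentially one big-integer
    # multiplication in the underlying kernel, not Python interpreter overhead.
    from mpmath import mp, sqrt
    import mpmath.libmp

    print(mpmath.libmp.BACKEND)   # 'gmpy' if GMPY is available, else 'python'

    mp.prec = 100_000             # roughly 30,000 decimal digits
    x = sqrt(2)
    y = sqrt(3)
    z = x * y                     # dominated by the bignum multiply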

Also, for computing transcendental functions, mpmath uses fixed-point arithmetic internally, which reduces overhead a lot.
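The fixed-point idea: represent a real number in [0, 1) as an integer scaled by 2**prec, so that series summation becomes pure integer arithmetic with no per-term normalization. A toy version of exp via its Taylor series (a sketch of the technique, not mpmath's actual code):

    def fixed_exp(x, prec):
        """exp(x) for fixed-point x (an int scaled by 2**prec), 0 <= x < 1."""
        one = 1 << prec
        s = one + x          # partial sum: 1 + x
        term = x             # current term x**k / k!
        k = 2
        while term:
            term = (term * x >> prec) // k   # multiply by x/k in fixed point
            s += term
            k += 1
        return s

    prec = 64
    x = (1 << prec) // 2                  # x = 0.5 in fixed point
    print(fixed_exp(x, prec) / 2**prec)   # ~1.6487212707... = exp(0.5)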

The biggest problem with mpmath is that it doesn't implement algorithms that scale optimally for all operations, and a lot of the error analysis is completely nonrigorous.

Since 2012, I have been developing a C library (https://github.com/fredrik-johansson/arb/) that addresses many of mpmath's shortcomings. It is obviously much faster at low precision (the factor of 100 mentioned above), it generally uses much better algorithms, and it tracks error bounds automatically using interval arithmetic.
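To illustrate the interval ("ball") idea in a few lines of Python (a toy sketch of the concept, not Arb's actual C API): each value carries a midpoint and a radius, and every operation propagates a rigorous error bound, so wrong digits can never masquerade as correct ones.

    # Toy midpoint-radius ("ball") arithmetic. The true value is guaranteed
    # to lie in [mid - rad, mid + rad]; 2**-52 overestimates double rounding.
    class Ball:
        def __init__(self, mid, rad=0.0):
            self.mid = mid
            self.rad = rad

        def __add__(self, other):
            mid = self.mid + other.mid
            return Ball(mid, self.rad + other.rad + abs(mid) * 2**-52)

        def __mul__(self, other):
            # |xy - mx*my| <= |mx|*ry + |my|*rx + rx*ry, plus midpoint rounding.
            mid = self.mid * other.mid
            rad = (abs(self.mid) * other.rad + abs(other.mid) * self.rad
                   + self.rad * other.rad + abs(mid) * 2**-52)
            return Ball(mid, rad)

        def __repr__(self):
            return f"[{self.mid} +/- {self.rad:.3g}]"

    x = Ball(1.0, 1e-10)
    y = Ball(3.0, 1e-10)
    print(x + y, x * y)   # radii propagate as rigorous (toy) error bounds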