For comparison, I just implemented the same thing as a C SWIG extension [1]. It's about 10% faster, but it cheats by comparing bytes instead of UTF-8 encoded characters. To me, the more interesting part is comparing the amount of boilerplate code each approach requires.
[1] https://github.com/martinxyz/rust-python-example/commit/f8e3...
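To illustrate why comparing bytes is "cheating", here is a rough C++ sketch. It assumes the benchmarked function counts adjacent equal characters, which is my reading of the example and not something stated above. The names `count_doubles_bytes`, `decode_utf8`, and `count_doubles_chars` are made up for this illustration, and the decoder assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical byte-wise version: counts adjacent equal bytes.
std::size_t count_doubles_bytes(const std::string& s) {
    std::size_t total = 0;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (s[i] == s[i - 1]) ++total;
    }
    return total;
}

// Decode UTF-8 into code points. Assumes well-formed input; no validation.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Sequence length is determined by the lead byte.
        std::size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        std::uint32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        // Append 6 payload bits from each continuation byte.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical character-wise version: counts adjacent equal code points.
std::size_t count_doubles_chars(const std::string& s) {
    const std::vector<std::uint32_t> cps = decode_utf8(s);
    std::size_t total = 0;
    for (std::size_t i = 1; i < cps.size(); ++i) {
        if (cps[i] == cps[i - 1]) ++total;
    }
    return total;
}
```

For pure ASCII input the two agree. They diverge on multi-byte text: "éé" (bytes C3 A9 C3 A9) has one doubled character but no doubled byte, while U+2000 (bytes E2 80 80) is a single character that contains a doubled byte.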
I find pybind11 [1] to be perfect for my C++ code. There's so little boilerplate, and I get RAII-guaranteed memory safety and all the speed my C++ development can bring.
For example, binding an accelerated HyperLogLog implementation requires only a tiny amount of work, plus a line in my Makefile:
    #include <pybind11/pybind11.h>
    namespace py = pybind11;
    // hll_t is the C++ sketch class being wrapped; its header is not shown here.

    PYBIND11_MODULE(_hll, m) {
        m.doc() = "pybind11-powered HyperLogLog"; // optional module docstring
        py::class_<hll_t>(m, "hll")
            .def(py::init())
            .def("clear", &hll_t::clear, "Clear all entries.")
            .def("resize", &hll_t::resize, "Change old size to a new size.")
            .def("sum", &hll_t::sum, "Add up results.")
            .def("report", &hll_t::report, "Emit estimated cardinality. Performs sum if not performed, but sum must be recalculated if further entries are added.")
            .def("add", &hll_t::add, "Add a (hashed) value to the sketch.")
            .def("addh_", &hll_t::addh, "Hash an integer value and then add that to the sketch.");
    }
[1] https://github.com/pybind/pybind11