For comparison, I just implemented the same thing as a C SWIG extension [1]. It's about 10% faster, but it cheats by comparing bytes instead of UTF-8 encoded characters. The more interesting part to me is comparing how much boilerplate code each approach requires.
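
To make the bytes-versus-characters caveat concrete, here is a rough C++ sketch. It assumes the task is counting neighbouring pairs of equal symbols, which the comment itself does not state; the function names are made up for illustration and the decoder assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Count positions where two neighbouring bytes are equal.
std::size_t count_equal_neighbours_bytes(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 1; i < s.size(); ++i)
        if (s[i] == s[i - 1]) ++n;
    return n;
}

// Decode UTF-8 into code points. No validation: assumes well-formed input.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Sequence length is determined by the lead byte.
        std::size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Count positions where two neighbouring code points are equal.
std::size_t count_equal_neighbours_chars(const std::string& s) {
    const std::vector<char32_t> cps = decode_utf8(s);
    std::size_t n = 0;
    for (std::size_t i = 1; i < cps.size(); ++i)
        if (cps[i] == cps[i - 1]) ++n;
    return n;
}
```

The two versions agree on ASCII but diverge elsewhere: "ää" (bytes C3 A4 C3 A4) has one equal character pair and no equal byte pair, while the single character U+2000 (bytes E2 80 80) has no character pair but one equal byte pair. The byte loop also skips the decoding work, which is where a speed advantage of this kind would come from.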

[1] https://github.com/martinxyz/rust-python-example/commit/f8e3...

I find pybind11 [1] to be perfect for my C++ code. There's so little boilerplate, and I get RAII-guaranteed memory safety and all the speed my C++ development can bring.

For example, binding an accelerated HyperLogLog implementation requires only a tiny amount of work, plus a line in my Makefile:

  PYBIND11_MODULE(_hll, m) {
      m.doc() = "pybind11-powered HyperLogLog"; // optional module docstring
      py::class_<hll_t>(m, "hll")
          .def(py::init<>())  // constructor argument types, if any, go inside <>
          .def("clear", &hll_t::clear, "Clear all entries.")
          .def("resize", &hll_t::resize, "Change old size to a new size.")
          .def("sum", &hll_t::sum, "Add up results.")
          .def("report", &hll_t::report, "Emit estimated cardinality. Performs sum if not performed, but sum must be recalculated if further entries are added.")
          .def("add", &hll_t::add, "Add a (hashed) value to the sketch.")
          .def("addh_", &hll_t::addh, "Hash an integer value and then add that to the sketch.");
  }
[1] https://github.com/pybind/pybind11
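
For readers who want to see what kind of class sits behind bindings like these, here is a toy sketch with the same method names. Everything in it is an assumption: the real hll_t is an accelerated implementation and certainly differs, the register layout, the splitmix64-style hash, and the precision parameter p are picked only for illustration, and p is assumed to lie roughly between 7 and 16.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for hll_t: a toy HyperLogLog with 2^p one-byte registers.
class hll_t {
    unsigned p_;                      // number of index bits
    std::vector<std::uint8_t> regs_;  // max rank seen per register
    double estimate_ = 0.0;           // cached result of sum()
    bool summed_ = false;             // false once new entries invalidate the cache

    // splitmix64-style finalizer used as the integer hash.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9E3779B97F4A7C15ULL;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }

public:
    explicit hll_t(unsigned p = 10) : p_(p), regs_(std::size_t{1} << p, 0) {}

    // Clear all entries.
    void clear() { regs_.assign(regs_.size(), 0); summed_ = false; }

    // Change the size; this toy version discards existing entries.
    void resize(unsigned p) { p_ = p; regs_.assign(std::size_t{1} << p, 0); summed_ = false; }

    // Add an already hashed value: top p bits pick the register,
    // the rank is the position of the first 1 bit in the remaining bits.
    void add(std::uint64_t h) {
        const std::size_t idx = h >> (64 - p_);
        std::uint64_t rest = h << p_;
        std::uint8_t rank = 1;
        while (rank <= 64 - p_ && !(rest >> 63)) { rest <<= 1; ++rank; }
        if (rank > regs_[idx]) regs_[idx] = rank;
        summed_ = false;
    }

    // Hash an integer value, then add it.
    void addh(std::uint64_t v) { add(mix(v)); }

    // Compute and cache the estimate from the registers.
    void sum() {
        const double m = static_cast<double>(regs_.size());
        double inv = 0.0;
        std::size_t zeros = 0;
        for (std::uint8_t r : regs_) { inv += std::ldexp(1.0, -int(r)); zeros += (r == 0); }
        const double alpha = 0.7213 / (1.0 + 1.079 / m);  // constant for m >= 128
        double e = alpha * m * m / inv;
        // Small-range correction: fall back to linear counting.
        if (e <= 2.5 * m && zeros > 0) e = m * std::log(m / zeros);
        estimate_ = e;
        summed_ = true;
    }

    // Emit the estimated cardinality, running sum() first if needed.
    double report() { if (!summed_) sum(); return estimate_; }
};
```

With p = 12 the expected relative error is about 1.04 / sqrt(4096), or roughly 1.6%, so feeding 100000 distinct integers through addh should give a report() close to 100000. The cached estimate mirrors the docstring in the binding: report() sums on demand, and any later add marks the cache stale.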