What does HackerNews think of libdivide?

Official git repository for libdivide: optimized integer division

Language: C++

libdivide tl;dr:

> libdivide allows you to replace expensive integer divides with comparatively cheap multiplication and bitshifts. Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime. The result is that integer division can become faster - a lot faster. [...] divide SIMD vectors by runtime constants, [1]

> libdivide.h is a header-only C/C++ library for optimizing integer division. Integer division is one of the slowest instructions on most CPUs e.g. on current x64 CPUs a 64-bit integer division has a latency of up to 90 clock cycles whereas a multiplication has a latency of only 3 clock cycles. libdivide allows you to replace expensive integer divsion instructions by a sequence of shift, add and multiply instructions that will calculate the integer division much faster.

> On current CPUs you can get a speedup of up to 10x for 64-bit integer division and a speedup of up to to 5x for 32-bit integer division when using libdivide. libdivide also supports SSE2, AVX2 and AVX512 vector division which provides an even larger speedup. You can test how much speedup you can achieve on your CPU using the benchmark program.[2]

[1] https://libdivide.com/ [2] https://github.com/ridiculousfish/libdivide

That's a difficult question. As I am working at Google I need to consult each open source I want to publish on my behalf. This does not involve contributing to the list of the approved projects and telling about these contributions.

If you want to workaround this for now, I suggest looking into libdivide (https://github.com/ridiculousfish/libdivide), it is published with the boost license and the library contains all the needed artifacts I described in the article (unfortunately, not combined).

I haven't checked what exactly massscan is doing to randomize the IP and port sequences, but if reduction modulo some runtime constant is that much of a problem (according to the Readme at least) perhaps you should consider replacing the modulo and division operations by multiplications?

The canonical reference is Granlund and Montgomery [1]. Luckily, there are ready-made libraries for this, like libdivide [2], which would probably lower the reported 90 cycles into something more palatable (and pipelineable).

[1] http://gmplib.org/~tege/divcnst-pldi94.pdf

[2] https://github.com/ridiculousfish/libdivide