Conversely, we found in a recent microbenchmark that allowing THP more than doubled the speed.
Note that glibc lets you turn THP on and off per process, which is pretty useful for benchmarking whether it helps or hinders performance.
$ hyperfine ' nbdkit -U - data "1 * 10737418240" --run exit '
Benchmark 1: nbdkit -U - data "1 * 10737418240" --run exit
Time (mean ± σ): 3.658 s ± 0.049 s [User: 0.406 s, System: 3.242 s]
Range (min … max): 3.576 s … 3.713 s 10 runs
$ hyperfine ' GLIBC_TUNABLES=glibc.malloc.hugetlb=1 nbdkit -U - data "1 * 10737418240" --run exit '
Benchmark 1: GLIBC_TUNABLES=glibc.malloc.hugetlb=1 nbdkit -U - data "1 * 10737418240" --run exit
Time (mean ± σ): 1.655 s ± 0.007 s [User: 0.299 s, System: 1.350 s]
Range (min … max): 1.643 s … 1.666 s 10 runs
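The knob used above is glibc's `glibc.malloc.hugetlb` tunable, which makes this per-process toggling easy. A minimal sketch of the supported values, per the glibc tunables documentation (`getconf PAGESIZE` is just a stand-in workload; substitute the program you want to benchmark):

```shell
# Default: malloc gives no huge-page hints
getconf PAGESIZE

# hugetlb=1: malloc calls madvise(MADV_HUGEPAGE) on its mmap'd arenas,
# opting this process into THP even under the "madvise" system policy
GLIBC_TUNABLES=glibc.malloc.hugetlb=1 getconf PAGESIZE

# hugetlb=2: malloc instead requests explicit huge pages via MAP_HUGETLB
# (these must be pre-reserved, e.g. via /proc/sys/vm/nr_hugepages)
GLIBC_TUNABLES=glibc.malloc.hugetlb=2 getconf PAGESIZE
```

Because the setting is an environment variable, it composes naturally with hyperfine for A/B comparisons like the one above.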
But there was a surprise... a more-than-tenfold degradation of overall Linux server performance, caused by increased physical memory fragmentation after a few days in production: https://github.com/ClickHouse/ClickHouse/commit/60054d177c8b...
That was seven years ago, and I hope that the Linux kernel has improved since then; I will need to try a "revert of the revert" of this commit. Changes like these cannot be tested with microbenchmarks, and only production usage can show their actual impact.
Also, we successfully use huge pages for the text section of the executable, which is beneficial for the stability of performance benchmarks because it lowers the number of iTLB misses.
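To see whether huge pages are actually in effect on a given machine, a quick Linux-only sketch (`./your-binary` is a placeholder, and perf event names vary by CPU):

```shell
# System-wide THP policy; the active mode is shown in brackets
cat /sys/kernel/mm/transparent_hugepage/enabled

# Anonymous memory currently backed by huge pages, system-wide
grep AnonHugePages /proc/meminfo

# To measure iTLB pressure for a binary under test (requires perf):
#   perf stat -e iTLB-loads,iTLB-load-misses ./your-binary
```

The per-process view is in /proc/PID/smaps, where each mapping reports its own AnonHugePages counter.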
[1] ClickHouse - high-performance OLAP DBMS: https://github.com/ClickHouse/ClickHouse/