I've heard this used as an argument that GC can, in some cases, be faster or use less memory than manual memory management: you can return memory to the OS asynchronously, in a separate thread. Boehm GC in particular is really easy to link into an app and use instead of malloc and friends.
Allocators for manual memory management could return memory asynchronously in a separate thread as well. A call to free() only has to update the allocator's internal bookkeeping. A separate thread could easily watch the data structures that track raw pages obtained from the OS, notice when a page has become completely free, and return it to the OS.
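To make that concrete, here is a minimal sketch of the idea as a toy model in Python, not a real allocator. Everything in it (`ToyAllocator`, `PAGE_SLOTS`, `release_free_pages`, `start_reclaimer`) is a made-up name for illustration, and "pages" are just dictionary entries rather than memory obtained from the OS. The point is only the split of work: free() touches bookkeeping, and a separate thread decides when a page goes back.

```python
import threading

PAGE_SLOTS = 4  # hypothetical: how many objects fit on one page in this toy model


class ToyAllocator:
    """Toy model of an allocator; pages are dict entries, not real OS memory."""

    def __init__(self):
        self._lock = threading.Lock()
        self._live = {}      # page id -> number of live objects on that page
        self._next_page = 0
        self.released = []   # page ids that were "returned to the OS"

    def malloc(self):
        """Hand out a slot on a page with room, opening a new page if needed."""
        with self._lock:
            for page, live in self._live.items():
                if live < PAGE_SLOTS:
                    self._live[page] = live + 1
                    return page
            page = self._next_page
            self._next_page += 1
            self._live[page] = 1
            return page

    def free(self, page):
        """Only updates bookkeeping; never releases anything itself."""
        with self._lock:
            self._live[page] -= 1

    def release_free_pages(self):
        """Release every page with no live objects; returns how many were released."""
        with self._lock:
            empty = [p for p, live in self._live.items() if live == 0]
            for p in empty:
                del self._live[p]
                self.released.append(p)  # stand-in for munmap() or madvise()
        return len(empty)


def start_reclaimer(alloc, interval=0.01):
    """Start a background thread that sweeps until the returned event is set."""
    stop = threading.Event()

    def loop():
        # Event.wait() returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            alloc.release_free_pages()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread
```

A caller would run `stop, thread = start_reclaimer(alloc)` once at startup and call `stop.set()` at shutdown. A real allocator would also have to worry about lock contention with the hot path and about releasing pages that are about to be reused, which this sketch ignores.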
That's what tcmalloc does, at least if you set TCMALLOC_RELEASE_RATE.
I don't think that's completely true. TCMALLOC_RELEASE_RATE controls how often tcmalloc returns memory to the OS, but I don't believe it uses a separate thread to do so.
Hrmm, you’re right. This seems to be one of the gratuitous differences between the internal and open-source versions of tcmalloc.