This article is full of misinformation. Just a few representative things:
- The expansion of pclntab in Go 1.2 dramatically improved startup time and reduced memory footprint, by letting the OS demand-page this critical table that is used any time a stack must be walked (in particular, during garbage collection). See https://golang.org/s/go12symtab for details.
- We (the Go team) did not “recompress” pclntab in Go 1.15. We did not remove pclntab in Go 1.16. Nor do we have plans to do either. Consequently, we never claimed “pclntab has been reduced to zero”, which is presented in the article as if it were a direct quote.
- If the 73% of the binary diagnosed as “not useful” were really not useful, a reasonable demonstration would be to delete it from the binary and see the binary still run. It clearly would not.
- The big table seems to claim that a 40 MB Go 1.8 binary has grown to a 289 MB Go 1.16 binary. That’s certainly not the case. More is changing from line to line in that table than the Go version.
Overall, the claim of “dark bytes” or “non-useful bytes” strikes me as similar to the claims of “junk DNA”. They’re not dark or non-useful. It turns out that having the necessary metadata for garbage collection and reflection in a statically-compiled language takes up a significant amount of space, which we’ve worked to reduce over time. But the dynamic possibilities in reflection and interface assertions mean that fewer bytes can be dropped than you’d hope. We track binary size work in https://golang.org/issue/6853.
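To make the "not dark" point concrete, here is a minimal sketch of a program asking the runtime where it is. The helper name where is made up for illustration. To my understanding, the function name, file, and line it prints are resolved from the same pclntab data that stack walks during garbage collection rely on.

```go
package main

import (
	"fmt"
	"runtime"
)

// where is a hypothetical helper that reports the function name, file,
// and line of its caller.
func where() (fn, file string, line int) {
	pcs := make([]uintptr, 1)
	// Skip 2 frames: runtime.Callers itself and where.
	if runtime.Callers(2, pcs) == 0 {
		return "", "", 0
	}
	// Turning a program counter into a name and position is a lookup in
	// the metadata the linker wrote into the binary.
	frame, _ := runtime.CallersFrames(pcs).Next()
	return frame.Function, frame.File, frame.Line
}

func main() {
	fn, file, line := where()
	fmt.Printf("%s at %s:%d\n", fn, file, line)
}
```

Panics, runtime.Stack output, and profiles depend on the same kind of lookup, which is why these bytes cannot simply be deleted.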
An unfortunate article.
An easily obtained apples-to-apples¹ table:
$ for i in $(seq 3 16); do
curl -sLo go1.$i.tgz https://golang.org/dl/go1.$i.linux-amd64.tar.gz
tar xzf go1.$i.tgz go/bin/gofmt
size=$(ls -l go/bin/gofmt | awk '{print $5}')
strip go/bin/gofmt
size2=$(ls -l go/bin/gofmt | awk '{print $5}')
echo go1.$i $size $size2
done
go1.3 3496520 2528664
go1.4² 14398336 13139184
go1.5 3937888 2765696
go1.6 3894568 2725376
go1.7 3036195 1913704
go1.8 3481554 2326760
go1.9 3257829 2190792
go1.10 3477807 2166536
go1.11 3369391 2441288
go1.12 3513529 2506632
go1.13 3543823 2552632
go1.14 3587746 2561208
go1.15 3501176 2432248
go1.16 3448663 2443736
$
Size fluctuates from release to release, but the overall trendline is flat: Go 1.16 binaries are roughly where Go 1.3 binaries were. At the moment, it looks like Go 1.17 binaries will get a bit smaller thanks to the new register ABI making executable code smaller (and faster).
¹ Well, not completely. The gofmt code itself was changing from release to release, but not much. Most of the binary is the libraries and runtime, though, so it's still accurate for trends.
² Turns out we shipped the Go 1.4 gofmt binary built with the race detector enabled! Oops.
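For anyone who wants to check the "flat trendline" reading against the numbers, a small sketch that computes the relative change between the go1.3 and go1.16 rows. The sizes are copied from the table above; the helper name pctChange is just for illustration.

```go
package main

import "fmt"

// pctChange returns the percentage change from before to after.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// gofmt sizes in bytes for go1.3 and go1.16, taken from the table.
	fmt.Printf("unstripped: %+.1f%%\n", pctChange(3496520, 3448663)) // -1.4%
	fmt.Printf("stripped:   %+.1f%%\n", pctChange(2528664, 2443736)) // -3.4%
}
```

Both figures are within a few percent of zero, which matches the claim that the binary is about the same size as it was in Go 1.3.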
Strip removes the symbol tables and DWARF information.
But still, the sum of all the bytes advertised in symbol tables for the non-DWARF data does not sum up to the stripped size. What's the remainder about?
I am reminded of how early versions of MSWord were embedding pages of heap space in save files that were not relevant to the document being saved, just because it made the saving algorithm simpler. For all we know, the go linker could be embedding random data.
> For all we know, the go linker could be embedding random data.
I do know, and it is not.
> But still, the sum of all the bytes advertised in symbol tables for the non-DWARF data does not sum up to the stripped size. What's the remainder about?
If you do know, then pray, what is the answer to this question?
Maybe another day I will take the time to write a full-length blog post examining the bytes in a Go binary. Today I have other work planned and still intend to do it.
My points today are only that:
1. Go binary size has not gotten dramatically better or worse in any particular release and is mostly unchanged since Go 1.3.
2. Many claimed facts in the blog post are incorrect.
3. The linker is not "embedding random data" into Go binaries as you conjectured in the comment above.
Stepping back a level, you don't seem to be interested in engaging in good faith at all. I'm not going to reply to any more of your comments.
I have no dog in this fight either way, I'm just very curious about the answer: if something like 30-40% of a Go executable that clocks in at more than 100 megabytes is not taken up by either symbols, debug information or the pclntab, what exactly is in it? You mentioned "necessary metadata for garbage collection and reflection in a statically-compiled language" in a previous comment. Can you give some more details on what that means?
You can see the true size of the Go pclntab in ELF binaries using "readelf -S" and in Mac binaries using "otool -l". It's not zero.
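The same section headers that "readelf -S" prints can also be read programmatically. A minimal sketch using the standard debug/elf package follows. It assumes an ELF binary in which the table lives in a section named ".gopclntab" (other build modes may name it differently), and the helper name sectionSizes is made up for illustration.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// sectionSizes returns the size in bytes of every named section in the
// ELF file at path, as recorded in the section headers.
func sectionSizes(path string) (map[string]uint64, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	sizes := make(map[string]uint64)
	for _, s := range f.Sections {
		sizes[s.Name] = s.Size
	}
	return sizes, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: sections <elf-binary>")
		os.Exit(2)
	}
	sizes, err := sectionSizes(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Assumption: the binary uses these section names. A missing name
	// prints as 0 bytes.
	for _, name := range []string{".text", ".rodata", ".gopclntab"} {
		fmt.Printf("%-12s %d bytes\n", name, sizes[name])
	}
}
```

Run it against any Go binary, for example the go/bin/gofmt extracted in the loop earlier in the thread.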
One thing that did change from Go 1.15 to Go 1.16 is that we broke up the pclntab into a few different pieces. Again, it's all in the section headers. But the pieces are not in the actual binary's symbol table anymore, because they don't need to be. And since the format is different, we would have removed the old "runtime.pclntab" symbol entirely, except some old tools got mad if the symbol was missing. So we left the old symbol table entry present, with a zero length.
Clearly, we could emit many more symbols accurately describing all these specific pieces of the binary. But ironically that would just make the binary even larger. Leaving them out is probably the right call for nearly all use cases.
Except perhaps trying to analyze binary size, although even there symbols don't paint a full picture. OK, the pclntab is large. Why is it large? What are the specific things in it that are large? Symbols don't help there at all. You need to analyze the actual data, add debug prints to the linker, and so on.
That would make an interesting post, and perhaps we will write one like that. But not today.
Is it reasonably easy to attribute individual entries in pclntab to specific symbols? If so I'd love to add this capability to https://github.com/google/bloaty which already tries to do per-symbol analysis of many other sections (eh_frame, rela.dyn, etc).