What does HackerNews think of bloaty?

Bloaty McBloatface: a size profiler for binaries

Language: C++

ESP32s aren't really ‘lower level’ in the sense that anyone is likely to write assembly code for them (compared to, say, 8051 or PIC), other than maybe some driver author at Espressif. The big win from using RISC-V, other than name recognition, is mainstream compiler support (which is nothing to sneeze at, especially when it's largely funded by someone else).

When I worked on Matter¹, the Xtensa and RISC-V versions were basically fungible from the software point of view. (And really, so were other vendors' various ARMs.) We did find that Bloaty McBloatface² didn't support Xtensa, so I had to write an alternative.

¹ https://github.com/project-chip/connectedhomeip/

² https://github.com/google/bloaty

I’ve gotten good insight into what takes up space in binaries by profiling with Bloaty McBloatface. My last profiling session showed that clang’s ThinLTO was inlining too aggressively in some cases, causing functions that should be tiny to balloon to 75 kB or more.

https://github.com/google/bloaty
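
For anyone who wants to try this kind of session themselves, the invocation is simple; the binary names here are hypothetical, but it looks something like:

    # top 20 symbols by size; oversized "tiny" functions jump out here
    $ bloaty -d symbols -n 20 mybinary

    # or diff two builds to see exactly what changed between them
    $ bloaty -d symbols mybinary-thinlto -- mybinary-nolto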

I'm surprised they didn't go for binary size analysis tools like

https://github.com/google/bloaty

Or goweight.

Loggy McLogface

Like the other Google project, Bloaty McBloatface.

https://github.com/google/bloaty

A famous one is Google's Bloaty McBloatface [1]. It's a very powerful and useful tool.

1. https://github.com/google/bloaty

> OK, the pclntab is large. Why is it large? What are the specific things in it that are large?

Is it reasonably easy to attribute individual entries in pclntab to specific symbols? If so, I'd love to add this capability to https://github.com/google/bloaty, which already tries to do per-symbol analysis of many other sections (eh_frame, rela.dyn, etc.).

> The sum of the sizes reported by go tool nm does not add up to the final size of the Go executable.

> At this time, I do not have a satisfying explanation for this “dark” file usage.

The author's journey of starting with "nm --size", discovering "dark" bytes, and wanting to attribute them properly is exactly what led me to create and invest so much effort in Bloaty McBloatface: https://github.com/google/bloaty

Bloaty's core principle is that every byte of the file should be attributed to something, so that the sum of the parts always adds up to the total file size. If we can't get detailed information (symbols, etc.) for a given region of the file, we can at least fall back to describing which section the bytes were in.

Attributing all of the bytes requires parsing much more than just the symbol table. Bloaty parses many different sections of the binary, including unwind information, relocation information, debug info, and the data section itself in an attempt to attribute every part of the binary to the function/data that emitted it. It will even disassemble the binary looking for references to anonymous data (some data won't make it into the symbol table, especially things like string literals).
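
To see this in action (the binary name here is hypothetical), ask for the symbols data source and look for the bracketed fallback entries:

    # bytes that can't be tied to a named symbol show up as fallback
    # entries like "[section .rodata]" rather than silently vanishing
    $ bloaty -d symbols -n 10 mybinary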

I wrote up some details of how Bloaty works here: https://github.com/google/bloaty/blob/master/doc/how-bloaty-.... The section on the "Symbols" data source is particularly relevant:

> I excerpted two symbols from the report. Between these two symbols, Bloaty has found seven distinct kinds of data that contributed to these two symbols. If you wrote a tool that naively just parsed the symbol table, you would only find the first of these seven.

The author's contention that these "dark" bytes are "non-useful" is not quite fair. There are plenty of things a binary contains that are useful even though they are not literally executable code. For example, making a binary position-independent (which is good for security) requires emitting relocations into the binary so that globals with pointer values can be relocated at program load time, once the base address of the binary is chosen. I don't know if Go does this or not, but it's just one example.
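
Here's a minimal sketch of that relocation example for a typical Linux/ELF toolchain (file names made up):

    $ cat > reloc.c <<'EOF'
    /* a global whose value is a pointer: in a position-independent
       executable this needs a relocation so the loader can patch in
       the final address once the base address is chosen */
    const char *const msg = "hello";
    int main(void) { return msg[0]; }
    EOF
    $ cc -fPIE -pie reloc.c -o reloc
    $ bloaty reloc -d sections   # note .rela.dyn and .data.rel.ro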

On the other hand, I do agree that the ability to produce slim binaries is an important and often undervalued property of modern compiler toolchains. All else being equal, I much prefer a toolchain that can make the smallest binaries.

Bloaty should work reasonably well for Go binaries, though I have gotten some bug reports about things Bloaty is not yet handling properly for Go (https://github.com/google/bloaty/issues/204). Bloaty is just a side project for me, so I often don't get as much time as I'd like to fix bugs like this.

I used my tool Bloaty, which parses the ELF and DWARF information in the binary: https://github.com/google/bloaty

Since we're talking about ELF binaries, let me drop a link for my favorite size profiler (that is compatible with ELF and other formats):

Bloaty McBloatface.

https://github.com/google/bloaty

I just ran Bloaty (https://github.com/google/bloaty) on /usr/bin/touch from Catalina, and I got this:

    $ bloaty `which touch` -d segments,sections --domain=file
       FILE SIZE
    ---------------
     55.1%  19.6Ki    __LINKEDIT
       91.7%  18.0Ki    Code Signature
        2.7%     544    Symbol Table
        1.9%     376    Lazy Binding Info
        1.6%     328    String Table
        1.2%     232    Indirect Symbol Table
        0.6%     112    Binding Info
        0.2%      32    Export Info
        0.1%      24    Function Start Addresses
        0.0%       8    Rebase Info
        0.0%       8    Table of Non-instructions
        0.0%       0    [__LINKEDIT]
     22.2%  7.90Ki    __TEXT
       35.0%  2.77Ki    __TEXT,__text
       33.6%  2.65Ki    [__TEXT]
       17.2%  1.36Ki    [Mach-O Headers]
        4.2%     339    __TEXT,__cstring
        3.4%     274    __TEXT,__const
        3.3%     266    __TEXT,__stub_helper
        1.9%     150    __TEXT,__stubs
        1.5%     120    __TEXT,__unwind_info
     11.2%  4.00Ki    __DATA
       94.9%  3.80Ki    [__DATA]
        4.9%     200    __DATA,__la_symbol_ptr
        0.2%       8    __DATA,__data
     11.2%  4.00Ki    __DATA_CONST
       98.4%  3.94Ki    [__DATA_CONST]
        1.6%      64    __DATA_CONST,__got
      0.3%     104    [Mach-O Headers]
    100.0%  35.6Ki    TOTAL

So 2.77Ki of actual code in "__TEXT,__text".

Those [__TEXT], [__DATA], and [__DATA_CONST] sections are the part that is lost to padding, so 10.4Ki or so.

Disclosure: I am the author of Bloaty.

Language flame wars aside, Bloaty McBloatface is a wonderful tool for analyzing why a binary is big: https://github.com/google/bloaty

I frequently use this tool to answer questions for C++ binaries, another language that has a penchant for producing large executables.

I wrote Bloaty (https://github.com/google/bloaty) which involved writing a totally custom ELF file parser. Here are some epiphanies I had about ELF while writing it.

ELF (and Mach-O, PE, etc.) are designed to optimize the creation of a process image. The runtime loader mainly just has to mmap() a bunch of file ranges into memory with various permissions. This is quite different from loading a .jar file, .pyc, etc., which involves building a runtime heap and loading objects into it.

ELF has two file-level tables: sections and segments (the latter are also called program headers). Things clicked for me when I realized: sections are for the linker, and segments are for the runtime loader. Sections are the atomic unit of data that linkers operate on: the linker will never rearrange data within a section (though it may concatenate several input sections into a single output section). The loader doesn't even look at the section table, AFAIK; everything needed to load the binary is in the segments/program headers.
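
An easy way to see the two views side by side on a typical Linux system:

    # the linker's view: sections
    $ readelf -SW /bin/ls

    # the loader's view: segments (program headers), including the
    # "Section to Segment mapping" table at the bottom
    $ readelf -lW /bin/ls

    # or both at once, with sizes, via bloaty
    $ bloaty /bin/ls -d segments,sections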

Only some parts of the binary are actually read/loaded when the binary is executed. Debug info may bloat the binary on disk, but it costs no RAM at runtime because it's never loaded unless you run a debugger. Bloaty makes this distinction clear by showing both VM size and file size: https://github.com/google/bloaty#running-bloaty
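
For example, on an unstripped binary (the name here is hypothetical), the report makes the distinction obvious:

    # .debug_* and .symtab show a FILE SIZE cost but a VM SIZE of
    # zero: they cost disk space, not RAM
    $ bloaty -d sections mybinary-with-debug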

> I suppose this is the end of the dive, unless someone knows some tools to get more insight into .rodata data. I suppose in theory it should be possible to track down where in the code each bit of .rodata is accessed from, but that seems a bit of a stretch.

My tool Bloaty (https://github.com/google/bloaty) attempts to do exactly this. It even disassembles the binary looking for instructions that reference other sections like .rodata.

It doesn't currently know anything about Rust's name-mangling scheme. I'd be happy to add support, though I suppose Rust's demangler is probably written in Rust, and Bloaty is written in C++.

Shameless plug: my project Bloaty (https://github.com/google/bloaty) is a size profiler that shows you both mapped size and file size out of the box.

If you're into stuff like this, you might like my project Bloaty McBloatface, which can dump size profiles of binaries:

https://github.com/google/bloaty

I agree with your point completely. I just want to add that throwing out raw numbers like "10.3 MB" or "400 KB" is not very precise. Binaries can vary immensely based on whether they have debug info, string tables, etc., or whether these have been stripped away.

I wrote a size profiling tool that can give much more precise measurements (like size(1) on steroids; see https://github.com/google/bloaty). Here is output for LuaJIT:

    $ bloaty src/luajit -n 5
         VM SIZE                     FILE SIZE
     --------------               --------------
      74.3%   323Ki .text           323Ki  73.8%
      12.5%  54.5Ki .eh_frame      54.5Ki  12.4%
       7.6%  33.2Ki .rodata        33.2Ki   7.6%
       2.2%  9.72Ki [Other]        12.9Ki   2.9%
       2.1%  9.03Ki .eh_frame_hdr  9.03Ki   2.1%
       1.2%  5.41Ki .dynsym        5.41Ki   1.2%
     100.0%   435Ki TOTAL           438Ki 100.0%

And for Pixie:

    $ bloaty pixie/pixie-vm -n 5
         VM SIZE               FILE SIZE
     --------------         --------------
      57.5%  4.39Mi .text    4.39Mi  44.7%
      33.7%  2.58Mi .data    2.58Mi  26.3%
       0.0%       0 .symtab  1.31Mi  13.4%
       0.0%       0 .strtab   978Ki   9.7%
       8.8%   688Ki [Other]   595Ki   5.9%
       0.0%       8 [None]        0   0.0%
     100.0%  7.64Mi TOTAL    9.82Mi 100.0%

In this case, neither binary had debug info. Pixie does appear to have a symbol table though, which LuaJIT has mostly stripped.

In general, I think "VM size" is the best single number to cite when talking about binary size, since it avoids penalizing binaries for keeping around debug info or symbol tables. Symbol tables and debug info are useful; we don't want people to feel pressured to strip them just to avoid looking bad in conversations about binary size.
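
If you want to report just that number, Bloaty can restrict itself to the VM view (binary name hypothetical):

    # count only bytes that are mapped at runtime; file-only sections
    # like .symtab, .strtab, and .debug_* drop out of the report
    $ bloaty --domain=vm -d sections mybinary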

Really great article and blog.

> You can examine binary images using the nm and objdump commands to display symbols, their addresses, segments, and so on.

You can also use my new tool Bloaty McBloatface (https://github.com/google/bloaty). Check out the -v option especially, which will dump a memory map of both the file domain and the VM address domain:

    $ ./bloaty `which ls` -v -d segments
    FILE MAP:
    [0, 19d44] LOAD [RX], LOAD [RX]
    [19d44, 19df0] [None], [Unmapped]
    [19df0, 1a5f4] LOAD [RW], LOAD [RW]
    [1a5f4, 1a700] [None], [Unmapped]
    [1a700, 1ae00] [None], [ELF Headers]
    VM MAP:
    [0, 400000] NO ENTRY
    [400000, 419d44] LOAD [RX], LOAD [RX]
    [419d44, 619df0] NO ENTRY
    [619df0, 61a5f4] LOAD [RW], LOAD [RW]
    [61a5f4, 61b360] LOAD [RW], LOAD [RW]
         VM SIZE                     FILE SIZE
     --------------               --------------
      95.1%   103Ki LOAD [RX]       103Ki  96.1%
       4.9%  5.36Ki LOAD [RW]      2.00Ki   1.9%
       0.0%       0 [ELF Headers]  1.75Ki   1.6%
       0.0%       0 [Unmapped]        440   0.4%
     100.0%   108Ki TOTAL           107Ki 100.0%

If you leave off "-d segments", the map will include all sections too (.bss, .text, etc.). Here is an example of that output: http://pastebin.com/3XGcqA8k