What does HackerNews think of incbin?

Include binary files in C/C++

Language: C

Tangentially, I've been exploring something kind of fascinating...

So, in compiled binary executables it was not uncommon to include encoded binary resources, like files or images. Not tons of them, because they would explode the size of the executable.

For example, for C or C++ https://github.com/graphitemaster/incbin
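As a rough sketch of the "encoded resource" idea (not incbin itself): the oldest approach is to write the file's bytes out as a C array, which is the shape a converter such as `xxd -i` emits. The resource name and accessor functions below are hypothetical, and the bytes are just the 8-byte PNG signature standing in for a real file.

```c
#include <stddef.h>

/* Hypothetical resource: the 8-byte PNG signature, written out as an array
   the way a converter such as `xxd -i` would emit it. */
static const unsigned char logo_png[] = {
    0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a
};
static const size_t logo_png_len = sizeof logo_png;

/* Accessors so other translation units can reach the embedded bytes. */
const unsigned char *logo_data(void) { return logo_png; }
size_t logo_size(void) { return logo_png_len; }
```

Because the array is `const`, it typically lands in a read-only section of the executable, which is the "cannot be changed" property discussed next.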

So here's the interesting part - we decided to build executables in a way that data in those executables could not be changed (makes sense since it's compiled and the compiled code runs on the machine, and for security reasons).

That said, think about what a container (e.g. docker) is. A running container is kind of like a packaged executable, except it also has a filesystem.

So if you pack data (which can be modified at runtime!) into a container, it's a similar concept to an embedded resource in an executable, except it can change.

Now, in a container, any changes made to data at runtime inside the container won't persist unless the container is given persistent storage.

What I've been wondering lately is why we didn't invent some kind of single "executable plus volatile data space embedded within the executable", so that programs and data (say, a database) could couple together into a single file.

Just a musing, but it's tangentially related to your "baked data": an embedded resource is basically just the encoded data placed right into the executable.

For scripting languages, of course, we can just make a script file that contains a variable with the encoded data as base64 directly.

For the latter two, it's only good for relatively small static data of course, but it would be interesting to build tech that somehow lifted that constraint on executables.

Sure, but only if the text file looks like a C string literal, i.e. starts and ends with double quotes (which would make it into a weird text file).

Doing

    const char *s = "
    #include "test.txt"
    ";
won't work, since the preprocessor won't interpret directives inside string literals of course.
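A small sketch of why, with hypothetical names: anything inside a string literal is ordinary characters, so the directive text ends up stored verbatim in the program instead of pulling in the file. The newlines are written as `\n` escapes here so that the literal is valid C.

```c
#include <string.h>

/* The directive sits inside a string literal, so the preprocessor never
   treats it as a directive; no file named test.txt is needed to compile. */
static const char *s = "\n#include \"test.txt\"\n";

/* Returns 1 if the literal still contains the directive text verbatim. */
int directive_was_kept_as_text(void) {
    return strstr(s, "#include \"test.txt\"") != NULL;
}
```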

In many assemblers, there is a directive called "incbin" which pastes in unstructured binary data at the point of usage. I just found a very clever C and C++ wrapper [1] for that, which gives you an INCBIN() macro. Nice!

[1]: https://github.com/graphitemaster/incbin

I have been using a library named incbin (https://github.com/graphitemaster/incbin). On Mac and Linux it doesn't even require a CLI tool to convert the file. It just embeds the content using the `.incbin` directive of the inline assembler.

It is pretty perfect for my project, which is a deep learning application for Android. I use it to embed the CNN model file into the C++ code. It lets me avoid putting it in the APK, then loading it from Java, and then passing it to C++.
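For anyone curious about the mechanism, here is a minimal sketch of the inline-assembler `.incbin` technique. It is not incbin's actual macro; it assumes GCC or Clang targeting ELF (e.g. Linux), and the symbol names `embedded_start`/`embedded_end` and the helper `embedded_size` are made up for illustration. To avoid needing a separate input file, it embeds its own source file via `__FILE__`; in real use you would put the path of your data file there.

```c
#include <stddef.h>

/* File-scope assembly: place the raw bytes of a file into .rodata and
   bracket them with two global symbols. __FILE__ is used only so that
   this sketch does not depend on any extra input file. */
__asm__(
    ".pushsection .rodata\n"
    ".global embedded_start\n"
    ".global embedded_end\n"
    "embedded_start:\n"
    ".incbin \"" __FILE__ "\"\n"
    "embedded_end:\n"
    ".popsection\n"
);

/* C-side declarations for the symbols defined in the assembly above. */
extern const unsigned char embedded_start[];
extern const unsigned char embedded_end[];

/* Number of embedded bytes, computed from the two bracketing symbols. */
size_t embedded_size(void) {
    return (size_t)(embedded_end - embedded_start);
}
```

The assembler resolves the `.incbin` path relative to the directory the compiler is run from, so the path has to be valid from there. Other object formats (for example Mach-O on macOS) use different section and symbol-naming conventions, which is the kind of detail a wrapper library handles for you.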

There is also an option to use assembler or inline asm. I found quite a nice utility that uses inline asm [0]. It's widely portable and I think that I will use it instead of my naive asm/shell combo that doesn't work with mingw asm.

The problem with objcopy, aside from its inconvenient usage and symbol naming, is that naive use of objcopy will result in your binary having an executable stack [1]. You can change a symbol name, but that's also inconvenient.

Check resulting binary with:

  $ readelf -lW the_binary | grep GNU_STACK
    GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RWE  0x8
  $
Notice the flags: RWE instead of RW.

Also: https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart#H...
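The same check can be sketched in C by reading the program headers directly. This is only an illustration: it assumes Linux with `<elf.h>` available and a 64-bit ELF file, and the function name `has_executable_stack` is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the ELF64 file at `path` requests an executable stack,
   0 if it does not, and -1 on error or if no PT_GNU_STACK header exists. */
int has_executable_stack(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    Elf64_Ehdr eh;
    int result = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        /* Walk the program header table looking for PT_GNU_STACK. */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1)
                break;
            if (ph.p_type == PT_GNU_STACK) {
                /* PF_X set corresponds to the "E" in readelf's RWE. */
                result = (ph.p_flags & PF_X) ? 1 : 0;
                break;
            }
        }
    }
    fclose(f);
    return result;
}
```

Calling `has_executable_stack("the_binary")` should return 1 for a binary that shows RWE in the readelf output above, and 0 for one that shows RW.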

[0] https://github.com/graphitemaster/incbin

[1] https://news.ycombinator.com/item?id=10816322#10818085

EDIT:

My shell script - bin2o.sh:

  #!/bin/sh
  set -e
  
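  # $1: file to embed, $2: object file to write
  filename="$1"
  # symbol name: the input path with every non-alphanumeric replaced by "_"
  name=$(echo "$1" | sed "s/[^A-Za-z0-9]/_/g")
  obj="$2"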
  
  echo \
  "	.section .rodata
  	.global ${name}
  	.type ${name}, @object
  	.global ${name}_size
  ${name}:
  	.incbin \"${filename}\"
  1:
  ${name}_size:
  	.int 1b - ${name}
  	.section .note.GNU-stack,\"\",%progbits
  " | gcc -x assembler -c - -o "$obj"