So, in compiled binary executables it was not uncommon to include encoded binary resources, like files or images. Not tons of them, because they would explode the size of the executable.
For example, for C or C++ https://github.com/graphitemaster/incbin
So here's the interesting part - we decided to build executables in a way that data in those executables could not be changed (makes sense since it's compiled and the compiled code runs on the machine, and for security reasons).
That said, think about what a container (e.g. docker) is. A running container is kind of like a packaged executable, except it also has a filesystem.
So if you pack data (which can be modified at runtime!) into a container, it's a similar concept to an embedded resource in an executable, except it can change.
Now, in a container, any changes made to data at runtime inside the container won't persist unless the container is given persistent storage.
What I've been wondering lately is why we didn't invent some kind of single "executable plus volatile data space embedded within the executable", so that programs and data (say, a database) could couple together into a single file.
Just a musing, but tangentially related to your "baked data": embedded resources in executables basically just embed the encoded data right into the executable.
For scripting languages, of course, we can just make a script file that contains a variable with the encoded data as base64 directly.
For the latter 2, it's only good for relatively small static data of course, but it would be interesting to build tech that somehow lifted that constraint of executables.
Doing
const char *s = "
#include "test.txt"
";
won't work, since the preprocessor won't interpret directives inside string literals, of course.
In many assemblers, there is a directive called "incbin" which pastes in unstructured binary data at the point of usage. I just found a very clever C and C++ wrapper [1] for that, which gives you an INCBIN() macro. Nice!
It is pretty much perfect for my project, which is a deep learning application for Android. I use it to embed the CNN model file into the C++ code. It lets me avoid putting the model in the APK, loading it from Java, and then passing it to C++.
The problem with objcopy, apart from its inconvenient usage and symbol naming, is that naive use of objcopy will leave your binary with an executable stack [1]. You can change the symbol name, but that's also inconvenient.
Check resulting binary with:
$ readelf -lW the_binary | grep GNU_STACK
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RWE 0x8
$
Note the flags: RWE instead of RW.
Also: https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart#H...
[0] https://github.com/graphitemaster/incbin
[1] https://news.ycombinator.com/item?id=10816322#10818085
EDIT:
My shell script - bin2o.sh:
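#!/bin/sh
# Usage: bin2o.sh <input-file> <output-object>
set -e
filename="$1"
# Derive a valid symbol name: every non-alphanumeric character becomes "_".
name=$(echo "$1" | sed "s/[^A-Za-z0-9]/_/g")
obj="$2"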
echo \
" .section .rodata
.global ${name}
.type ${name}, @object
.global ${name}_size
${name}:
.incbin \"${filename}\"
1:
${name}_size:
.int 1b - ${name}
.section .note.GNU-stack,\"\",%progbits
" | gcc -x assembler -c - -o "$obj"