Would this approach also work for Tar archives? Transparent support for sub-files from a .tar would be badass.
Tar doesn't use any sort of index the way zip does, so to extract a given file the server side would have to walk the archive's headers, possibly all the way to the end, just to see whether the requested file is there before it could start streaming it. Requests for files that aren't in the tar archive would be prohibitively expensive.
There are definitely ways to do it without those problems, though. They just wouldn't be quite as simple as the approach used for zip.
You could pre-index them, I suppose, though even that would only work with a subset of compression methods, or with no compression at all.
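A minimal sketch of what such a pre-index could look like, assuming an uncompressed tar and a plain JSON index file (real tools like cotar use their own binary index format; the file names here are hypothetical). It walks every header once, up front, so later reads never have to scan the archive:

    import json
    import tarfile

    def build_tar_index(tar_path: str, index_path: str) -> None:
        """Walk every tar header once and record where each member's data lives.

        Uses TarInfo.offset_data (the byte offset of the member's content
        within the archive) and TarInfo.size, so a later reader can fetch
        any member with a single byte-range read. This only works for an
        uncompressed tar: a gzipped tar stream has no stable byte offsets
        to point at.
        """
        index = {}
        with tarfile.open(tar_path, "r:") as tar:  # "r:" = no compression
            for member in tar:
                if member.isfile():
                    index[member.name] = (member.offset_data, member.size)
        with open(index_path, "w") as f:
            json.dump(index, f)

    build_tar_index("tiles.tar", "tiles.index.json")

With the index in hand, a lookup for a missing file is answered from the index alone, which removes the "expensive miss" problem entirely.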
I work with serving tiled geospatial data [2] (Mapbox vector tiles) to our users as slippy maps, serving millions of small (mostly <100KB) files. Our data only changes weekly, so we precompute all the tiles and store them in a tar file in S3.
We compute an index for the tar file [1], then use S3 range requests to serve the tiles to our users. This means we can generally fetch a tile from S3 with 2 requests (or 1, if the index is cached), typically in ~20-50ms.
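The serving side then reduces to one byte-range GET per tile. A minimal sketch, assuming the JSON index from above and a hypothetical bucket URL and tile path (S3 supports the standard HTTP Range header on GET):

    import json
    import urllib.request

    def fetch_tile(tar_url: str, index: dict, name: str) -> bytes:
        """Fetch one member of a remote tar with a single HTTP range request."""
        if name not in index:
            raise FileNotFoundError(name)  # misses cost zero requests
        offset, size = index[name]
        req = urllib.request.Request(
            tar_url,
            headers={"Range": f"bytes={offset}-{offset + size - 1}"},  # inclusive end
        )
        with urllib.request.urlopen(req) as resp:  # expect HTTP 206 Partial Content
            return resp.read()

    with open("tiles.index.json") as f:
        index = json.load(f)
    tile = fetch_tile(
        "https://example-bucket.s3.amazonaws.com/tiles.tar",
        index,
        "z10/x512/y340.pbf",
    )

The "2 requests" in the comment come from fetching the index itself (or the relevant slice of it) with one range request, then the tile with another; caching the index drops that to one.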
Full coverage of the world with Mapbox vector tiles is around 270M tiles and a ~90GB tar file, which can be computed from OpenStreetMap data [3].
> Though even that would only work with a subset of compression methods or no compression.
We compress the individual files as a workaround. There are options for indexing a compressed (gzip) tar file, but the benefit of a compressed tar over individually compressed files is small for our use case.
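A minimal sketch of that workaround, continuing the assumptions above: the tar itself stays uncompressed (so the byte offsets in the index remain valid), while each tile is gzipped on its own and can be served as-is with Content-Encoding: gzip.

    import gzip
    import io
    import tarfile

    def add_gzipped(tar: tarfile.TarFile, name: str, data: bytes) -> None:
        """Store a member pre-compressed so it can still be range-read directly."""
        gz = gzip.compress(data)
        info = tarfile.TarInfo(name=name + ".gz")
        info.size = len(gz)
        tar.addfile(info, io.BytesIO(gz))

    with tarfile.open("tiles.tar", "w") as tar:  # "w" = uncompressed tar
        add_gzipped(tar, "z0/x0/y0.pbf", b"...tile bytes...")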
[1] https://github.com/linz/cotar (or the WIP Rust version, https://github.com/blacha/cotar-rs)

[2] https://github.com/linz/basemaps or https://basemaps.linz.govt.nz

[3] https://github.com/onthegomap/planetiler