> The data has to be on the filesystem. You can’t load data from a blob store like AWS S3.

You sure about this? You can mount S3 locally, and the filesystem should be transparent to the kernel.

There are two ways of doing this that I know of:

1. FUSE filesystems. This means sending really slow queries to S3, and so you're much better off using compression to get better performance.

2. Syncing to the filesystem with AWS DataSync. This would work, yes, but it's an S3-specific feature and not all blob storage systems have it.
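
Either way, the application just sees a directory once the bucket is mounted or synced. A minimal sketch of what that looks like from the reading side, assuming a hypothetical mount point such as `/mnt/s3-bucket` (the path and the `count_bytes` helper are made up for illustration; the demo uses a temporary directory as a stand-in so it runs anywhere):

```python
import os
import tempfile


def count_bytes(root: str, chunk_size: int = 1 << 20) -> int:
    """Read every file under `root` with ordinary file I/O and return the
    total number of bytes read. Nothing here is S3-specific: `root` could
    be a local disk, a FUSE mount, or a synced copy of a bucket."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                # Large sequential reads; how fast these are on a FUSE
                # mount depends on the FUSE implementation in use.
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


if __name__ == "__main__":
    # Stand-in for a hypothetical mount point like /mnt/s3-bucket.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.bin"), "wb") as f:
            f.write(b"hello")
        print(count_bytes(root))
```

Whether this is fast enough is a separate question from whether it works; the code has no way to tell the backing store apart.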

> 1. FUSE filesystems. This means sending really slow queries to S3, and so you're much better off using compression to get better performance.

As author of https://github.com/kahing/goofys/ I respectfully disagree :-)