What does Hacker News think of SeaweedFS?
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, built for billions of files! The blob store has O(1) disk seeks and cloud tiering. The filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and erasure coding.
This benchmark uses a large batch size, 64 MB, for testing. There is nothing new here; most common file systems can easily do the same.
The difficult task is reading and writing lots of small files; there is a term for it: LOSF (lots of small files). I work on SeaweedFS, https://github.com/chrislusf/seaweedfs , which is designed to handle LOSF. And of course, it has no problem with large files at all.
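The way Haystack-style stores sidestep LOSF can be sketched in a few lines: keep one large append-only volume file plus a tiny in-memory map from file id to (offset, size), so any read is one dict lookup and one disk seek. This is a simplified illustration of the idea, not SeaweedFS's actual on-disk format; all names are hypothetical:

```python
import io

class Volume:
    """Toy append-only volume: one big file plus an in-memory
    needle map (fid -> offset, size), so each read is a single seek.
    A simplified sketch of the Haystack idea, not SeaweedFS's format."""

    def __init__(self):
        self.store = io.BytesIO()   # stands in for one large volume file on disk
        self.needles = {}           # fid -> (offset, size); tiny per-file overhead

    def put(self, fid: str, data: bytes) -> None:
        offset = self.store.seek(0, io.SEEK_END)  # append-only write
        self.store.write(data)
        self.needles[fid] = (offset, len(data))

    def get(self, fid: str) -> bytes:
        offset, size = self.needles[fid]          # O(1) dict lookup
        self.store.seek(offset)                   # exactly one seek
        return self.store.read(size)

vol = Volume()
for i in range(1000):                             # a thousand small files
    vol.put(f"3,{i:08x}", f"thumbnail-{i}".encode())
print(vol.get("3,000001ff").decode())             # -> thumbnail-511
```

The point is that per-file overhead is a few bytes of RAM rather than a full filesystem inode plus directory lookups, which is what makes billions of small files tractable.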
However, it seems this research did not look into Apache projects, which maintain a different culture meant to encourage more contributors, so much so that the main contributors are encouraged to refrain from jumping in to solve an issue until another person steps in first.
Check out the Reed-Solomon erasure coding implementation in https://github.com/chrislusf/seaweedfs/ . Small files can still be served from a single server.
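For intuition about erasure-coded recovery, here is a deliberately simplified single-parity sketch. SeaweedFS uses Reed-Solomon coding, which tolerates multiple lost shards; plain XOR parity, shown below, survives only one, but the rebuild-the-missing-shard mechanics are analogous. Everything here is illustrative, not SeaweedFS code:

```python
from functools import reduce
from typing import Optional

def encode(shards: list) -> bytes:
    """XOR parity over equal-size data shards: a single-parity
    stand-in for Reed-Solomon (which can lose several shards)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*shards))

def recover(shards: list, parity: bytes) -> list:
    """Rebuild the one missing shard (marked None) from survivors + parity."""
    missing = shards.index(None)
    survivors = [s for s in shards if s is not None] + [parity]
    shards[missing] = bytes(
        reduce(lambda a, b: a ^ b, col) for col in zip(*survivors)
    )
    return shards

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = encode(data)
damaged = [b"AAAA", None, b"CCCC", b"DDDD"]   # one shard lost
print(recover(damaged, parity)[1])            # -> b'BBBB'
```

Because XOR is its own inverse, XOR-ing the survivors with the parity cancels everything except the lost shard; Reed-Solomon generalizes this with more parity shards over a Galois field.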
It's also efficient for small files, which an image store requires.
You might want to look at other options as well, like SeaweedFS [0], a POSIX-compliant, S3-compatible distributed file system.
For anyone who wants HA and horizontal elastic scalability, check out SeaweedFS instead; it is based on Facebook's "Haystack" paper: https://github.com/chrislusf/seaweedfs
There is also a SeaweedFS CSI driver: https://github.com/seaweedfs/seaweedfs-csi-driver
With your own dedicated server, the latency is consistent and there is no API/network cost. Extra data can be tiered to S3.
SeaweedFS as a Key-Large-Value store https://github.com/chrislusf/seaweedfs/wiki/Filer-as-a-Key-L...
Cloud Tiering https://github.com/chrislusf/seaweedfs/wiki/Cloud-Tier
And it already supports the S3 API, plus HTTP, FUSE, WebDAV, Hadoop, etc.
There should be many existing hardware options that are much cheaper than AWS S3.
Not something to be proud of if you include the time spent evolving the project.
You should be extra careful with big servers on little bandwidth: you might need a month to fill, empty, or rebalance them.
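A quick back-of-envelope shows why (all figures below are illustrative assumptions, not SeaweedFS defaults):

```python
# Back-of-envelope: time to fill (or drain/rebalance) a dense storage
# server over a modest network link. All figures are illustrative.

capacity_tb = 100                 # e.g. ~12 x 10 TB drives, usable
link_gbps = 1                     # a 1 Gbit/s uplink
efficiency = 0.8                  # protocol overhead, competing traffic

bytes_total = capacity_tb * 1e12
bytes_per_sec = link_gbps * 1e9 / 8 * efficiency   # 125 MB/s raw -> 100 MB/s effective
days = bytes_total / bytes_per_sec / 86400
print(f"~{days:.0f} days")        # -> ~12 days for 100 TB at 1 Gbit/s
```

Double the capacity or halve the effective bandwidth and you are in month territory, which is why dense-but-slow nodes make rebalancing and recovery painful.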
How will you host the images? Metadata will become a bottleneck before HDD capacity does.
Check out https://github.com/chrislusf/seaweedfs
It really shouldn't be this complex. I would love to be able to just boot an executable with a simple config file and be done with it. SeaweedFS shines a light on how this could be improved: https://github.com/chrislusf/seaweedfs
https://github.com/chrislusf/seaweedfs
SeaweedFS is a simple and highly scalable distributed file system with two objectives: to store billions of files, and to serve those files fast!