I've worked on Ambry at LinkedIn for a little while, so I'd be happy to answer any questions about the architecture or things we've done since 2016 (I wasn't part of the original team). One thing I'd call attention to from the article:
> it’s key-value based approach to interacting with blobs doesn’t support file-system like capabilities, posing more of a burden on the user of the system (who must manage metadata and relationships between entities themselves).
I think this trade-off is one of Ambry's strongest design decisions. By giving up key-value access, Ambry gets to dictate the location of an object at write time. When a partition fills up, it's set to read-only and new partitions are created on new hosts. And because Ambry generates the blob ID, the system can embed information (like the partition number) right in the ID. With a key-value approach you need to worry about balancing (and re-balancing) the key space over your topology, and with dense storage nodes, re-balancing is VERY expensive.
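To make that concrete, here's a minimal sketch of a storage-generated ID that embeds its own routing info. The field layout here is made up for illustration; the real Ambry BlobId carries more fields (version, datacenter, account, container, etc.), but the idea is the same: anyone holding the ID can recover the partition without a lookup.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch of a storage-generated blob ID, not Ambry's actual format.
public final class SketchBlobId {
    private final short partitionId; // which partition the blob was written to
    private final UUID uuid;         // uniqueness within that partition

    public SketchBlobId(short partitionId, UUID uuid) {
        this.partitionId = partitionId;
        this.uuid = uuid;
    }

    // Serialize so the partition can be read straight out of the ID.
    public byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(2 + 16);
        buf.putShort(partitionId);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    public static SketchBlobId fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        short partitionId = buf.getShort();
        UUID uuid = new UUID(buf.getLong(), buf.getLong());
        return new SketchBlobId(partitionId, uuid);
    }

    public short partitionId() {
        return partitionId;
    }
}
```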
Also, most applications don't actually need key-value access. For storing something like media (think: LinkedIn profile photo), you've already got a database row for the user profile; now one of those fields is a reference to your object store. It might as well be a storage-generated reference instead of one where the application tries to manage reference uniqueness itself and ends up using UUIDs or something similar anyway (a sketch of that pattern below).
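Something like this, where the app never invents a key at all. BlobStoreClient and UserProfileDao are made-up names for illustration, not Ambry's actual client API:

```java
// Hypothetical sketch: upload the photo, let storage pick the ID, record it.
public class ProfilePhotoUpload {

    interface BlobStoreClient {
        // Returns the blob ID generated by the storage system.
        String putBlob(byte[] photoBytes);
    }

    interface UserProfileDao {
        // Persists the opaque reference alongside the rest of the profile row.
        void setPhotoBlobId(long userId, String blobId);
    }

    void uploadPhoto(long userId, byte[] photoBytes,
                     BlobStoreClient blobStore, UserProfileDao profiles) {
        String blobId = blobStore.putBlob(photoBytes); // storage picks the ID
        profiles.setPhotoBlobId(userId, blobId);       // the app just records it
    }
}
```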
Apologies for the new account; I try to keep my main HN account semi-anonymous.
> Ambry is a distributed object store that supports storage of trillions of small immutable objects (50K - 100K) as well as billions of large objects.
1. What is considered to be the size of a “large object”?

2. Can Ambry handle multipart uploads for large objects, or would one have to build that on top by storing chunks as separate objects?