What does HackerNews think of minio?

High Performance Object Storage for AI

Language: Go

#9 in Go
#2 in Kubernetes
#4 in Kubernetes
>Again, here you seem to be arguing against a strawman that doesn't know that blocking the IO loop is bad. Try arguing against one that knows ways to work around that. This is why I'm saying this rule isn't true. Extensive computation on single-threaded "scripting" languages is possible (and even if it wasn't, punt it off to a remote pool of workers, which could also be NodeJS!).

It's very rare to find a rule that's absolutely true. I clearly stated exceptions to the rule (which you repeated), but the generalization still holds.

Threading in Node.js is new; it didn't exist the last time I touched it. It doesn't look like the standard use case, since Google searches still turn up sites with titles saying Node is single-threaded everywhere. The only way I can see this being done is multiple processes (meaning each with a copy of V8) using OS shared memory as IPC, and they're just calling it threads. It would take a shitload of work to make V8 actually multi-threaded.

Processes are expensive, so you can't really follow this model per request. And we stopped doing thread-per-request over a decade ago.

Again, these are exceptions to the rule. From what I'm reading, Node.js is normally still single-threaded, with a fixed number of worker processes that are called "threads". Under this, my general rule is still generally true: backend engineering does not typically involve writing non-blocking code and offloading compute to other sources. Again, there are exceptions, but as I stated before, these exceptions are rare.
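The thread is about Node.js, but the pattern being argued over (keep the event loop free and push CPU-heavy work to a fixed pool of workers) is language-neutral. Here is a minimal sketch of it in Python's asyncio, swapped in for Node purely as an illustration. `fib` stands in for any expensive computation, and all names are hypothetical:

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


def fib(n: int) -> int:
    """Deliberately slow, CPU-bound work that would stall the event loop if run inline."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


async def handle_request(n: int, pool: Optional[Executor]) -> int:
    """Handle one request without blocking the event loop.

    The CPU-bound call runs in `pool` while the loop keeps serving other
    coroutines. A ProcessPoolExecutor gives real parallelism (separate
    interpreters); None falls back to the loop's default thread pool.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, fib, n)


async def serve(ns: list[int], pool: Optional[Executor]) -> list[int]:
    """Run several requests concurrently, each offloaded to the pool."""
    return list(await asyncio.gather(*(handle_request(n, pool) for n in ns)))


if __name__ == "__main__":
    # One fixed-size pool, created once and reused by every request,
    # rather than a new process per request.
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(asyncio.run(serve([20, 21, 22], pool)))
```

The pool is sized once up front, which matches both sides' agreement that spawning a process per request is too expensive.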

>Here's what I mean -- you can actually solve the ordering problem in O(N) + O(M) time by keeping track of the max you've seen and building a sparse array and running through every single index from max to zero. It's overkill, but it's generally referred to as a counting sort:

Oh come on. We both know these sorts won't work here. Large counts will blow up memory. Imagine 3 routes: one gets 352 hits, another gets 400 hits, and another gets 600,000 hits. What's the Big O for memory and runtime?

It's O(600,000) for both memory and runtime; N = 3 barely matters here. Yeah, these types of sorts are almost never used for this reason: they only work for values with small ranges. It's also especially not useful for this project. It's like this project was designed so that "counting sort" fails big time.
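To make the memory argument concrete, here is a minimal Python sketch of the counting sort described in the quote, next to an ordinary comparison sort. It uses the three hit counts from the reply; the route names are invented for illustration:

```python
def counting_sort_desc(hits: dict[str, int]) -> list[tuple[str, int]]:
    """Order routes by hit count, highest first, using a counting sort.

    Allocates one bucket per possible count from 0 to max, so time and
    memory are O(N + K): N routes, K = the largest count. Counts are
    assumed to be non-negative integers.
    """
    if not hits:
        return []
    k = max(hits.values())
    buckets: list[list[str]] = [[] for _ in range(k + 1)]
    for route, count in hits.items():
        buckets[count].append(route)
    # Walk every index from max down to zero, as in the quoted description.
    return [(route, count) for count in range(k, -1, -1) for route in buckets[count]]


def comparison_sort_desc(hits: dict[str, int]) -> list[tuple[str, int]]:
    """Same ordering via an O(N log N) comparison sort; memory does not depend on K."""
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    hits = {"/a": 352, "/b": 400, "/c": 600_000}
    print(counting_sort_desc(hits))    # allocates 600,001 buckets for 3 routes
    print(comparison_sort_desc(hits))  # sorts just 3 items
```

With K = 600,000 and N = 3, the counting sort does work proportional to K, which is the reply's point. The comparison sort's cost depends only on N.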

Also, we don't need to talk about the O(N) read and write. That's a given; it's always there.

>I don't think these statements make sense -- having docker installed and having redis installed are basically equivalent work. At the end of the day, the outcome is the same -- the developer is capable of running redis locally. Having redis installed on your local machine is absolutely within range for a backend developer.

Unfortunately, these statements do make sense, and your characterization seems completely dishonest to me. People like to keep their local environments pure and segregated from daemons that belong on a web server. I'm sure in your universe web developers install Redis, PostgreSQL, and Kafka all locally, but that just sounds absurd to me. We can agree to disagree, but from my perspective I don't think you're being realistic here.

>Also, remote development is not practiced by many companies -- the only companies I've seen doing thin clients are large ones.

It's practiced by a large number of companies, including basically every company I've worked at for the past 5 years. Every company has to do at least partial remote dev in order to fully test E2E flows or integrations.

>I see it as just spinning up docker, not compose -- you already have access to the app (ex. if it was buildable via a function) so you could spawn redis in a subprocess (or container) on a random port, and then spawn the app.

Sure. The point is it's hacky to do this without an existing framework. I'll check out that library you linked.
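The quoted suggestion (spawn Redis in a subprocess on a random port, then start the app against it) could look roughly like this in Python. It is a sketch that assumes a `redis-server` binary on `PATH`; the helper names `free_port`, `wait_for_port`, and `redis_server` are hypothetical, not from any library:

```python
import contextlib
import socket
import subprocess
import time
from collections.abc import Iterator


def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0.

    There is a small race: another process could take the port between
    this socket closing and the server binding it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def wait_for_port(port: int, timeout: float = 5.0) -> None:
    """Poll until something accepts TCP connections on 127.0.0.1:port."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=0.2):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"nothing is listening on port {port}") from None
            time.sleep(0.05)


@contextlib.contextmanager
def redis_server(timeout: float = 5.0) -> Iterator[int]:
    """Run a throwaway redis-server on a random port; yield the port, then stop it."""
    port = free_port()
    proc = subprocess.Popen(
        ["redis-server", "--port", str(port)],
        stdout=subprocess.DEVNULL,
    )
    try:
        wait_for_port(port, timeout)
        yield port
    finally:
        proc.terminate()
        proc.wait()
```

In a test you would write `with redis_server() as port:` and start the app pointed at `127.0.0.1:port`. This is the kind of glue the reply calls hacky without an existing framework.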

>I agree that integration testing is harder -- I think there's more value there.

Of course there's more value. You get more value at higher cost. That's been my entire point.

>Also, for replicating S3, minio (https://github.com/minio/minio) is a good stand-in. For replicating Lambda, localstack (https://docs.localstack.cloud/user-guide/aws/lambda/) is probably reasonable; there are also frameworks with some consideration for this built in (https://www.serverless.com/framework/docs/providers/aws/guid...).

Good finds. But what about SNS, IoT, BigQuery, and Redshift? Again, my problem isn't about specific services; it's about infra in general.

>Ah, this is true -- but I think this is what people are testing in interviews. There is a predominant culture/shared values, and the test is literally whether someone can fit into those values.

No. I think what's going on is that people aren't putting much thought into what they're actually interviewing for. They just have some made-up bar in their mind, whether it's a leetcode algorithm or whether the guy wrote a unit test for the one pure function available to test.

>Whether they should or should not be, that's at least partially what interviews are -- does the new team member feel the same way about technical culture currently shared by the team.

The answer is no. There are always developers who disagree with things and just don't reveal it. Think about the places you've worked. Were you in total agreement? I doubt it. A huge number of devs are opinionated and think company policies or practices are BS. People adapt.

>Now in the case of this interview your solution was just fine, even excellent (because you went out of your way to do async io, use newer/easier packaging methodologies, etc), but it's clearly not just that.

The testing is just a game. I can play the game and suddenly pass all the interviews. I think this is the flaw with your methodology: I just need to write tests to get in. Google, for example, attempted in spirit another method, which involves testing IQ via algorithms. It's a much higher bar.

The problem with Google is that their methodology can also be gamed, but it's much harder to game, and often the bar is too high for the actual job the engineer is expected to do.

I think both methodologies are flawed, but hiring by ignoring raw ability and picking people based on weirdly specific cultural preferences is the worse of the two.

Put it this way. If a company has a strong testing culture, then engineers who don't typically test things will adapt. It's not hard to do, and testing isn't so annoying that they won't do it.

You say protocol alternative, but assuming you're more concerned with AWS as the host than S3 as the protocol, you might try https://github.com/minio/minio

If you do feel an aversion to the protocol, then the rclone backend list would be a good starting point:

https://rclone.org/overview/

I like recent (v3) SMB over the network personally.

Maybe Minio: https://github.com/minio/minio / https://min.io

I've only used it as a fairly straightforward object store, though, so I'm not sure about privileges/permissions, etc.

> we are thinking of switching to a distributed object store for our public server

As a data point, we're using Minio (https://github.com/minio/minio) for the object store on our backend. It's been very reliable, though we're not at a point where we're pushing things very hard. :)

Minio is a self-hostable object storage project: https://github.com/minio/minio

> I still run a NAS at home, and it's terrible... I prefer self-hosted...

One could self-host minio, which is a nice S3 replacement: https://github.com/minio/minio

In the next 2 weeks, we are releasing Redshift as a destination. After that, we have a PostgreSQL destination in our pipeline. You can configure it and capture everything there.

If you prefer that to be in files, you could set up a minio server on your VPS instance. Support for that is coming in the next few weeks too. https://github.com/minio/minio

We would like to understand your preference on this so that we could align our next set of destinations.

Please drop an email at [email protected] or join our Slack/Discord channels.

That's a good point, though there are on-prem and "other vendor" api compatible services for some of those, like S3.

Minio, for example: https://github.com/minio/minio, or Google Cloud Storage. Both are compatible with the S3 API.

If your data storage requirement is semi-constant at 1.5 TB and ephemeral, have you looked at just using your own replicated servers?

That would probably turn out to be cheaper than using S3 - https://github.com/minio/minio

Dangit, I trust the CoreOS team more/better than a lot of people in the space. Torus would have been so useful.

At the other end of the spectrum, though, maybe this is reasonable? As a developer, my first thought for "I want my own S3" is not etcd (strong consistency) but projects like https://github.com/minio/minio, or even eventually consistent SQLite replication/synchronization tools like https://github.com/gundb/sqlite.

So that makes me ask about rook.io too: what layer of the "stack" is it trying to fit into? Obviously pretty low, but that also seems unnecessary (and is part of why I suspect Torus is stopping).

Running this without Amazon seems to require replacing the hardcoded "{}://{}.s3{}.amazonaws.com" URL with the location where the S3-compatible service is running on the local network. If only that were just a default that could be overridden with an env var as well... :)

And minio [1] seems to be the easiest pseudo-s3 there is ($ ./minio server CacheDir/, done?), or are there better alternatives by now?

1: https://github.com/minio/minio
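The env-var override wished for above could look like the following minimal sketch. `S3_ENDPOINT` is a hypothetical variable name, and the meaning of the third `{}` in the quoted template (assumed here to be a legacy region suffix such as `-us-west-2`) is a guess. The override uses path-style URLs (`http://host:port/bucket`), which MinIO accepts without extra configuration; its API listens on port 9000 by default.

```python
import os

# The hardcoded template quoted above: scheme, bucket, then a suffix that is
# assumed (not confirmed) to be a region segment such as "-us-west-2".
AWS_TEMPLATE = "{}://{}.s3{}.amazonaws.com"


def bucket_base_url(bucket: str, scheme: str = "https", region_suffix: str = "") -> str:
    """Return the base URL for `bucket`.

    If the hypothetical S3_ENDPOINT environment variable is set (for example
    to a local MinIO server), use path-style addressing against it;
    otherwise fall back to the AWS virtual-hosted default.
    """
    endpoint = os.environ.get("S3_ENDPOINT")
    if endpoint:
        return f"{endpoint.rstrip('/')}/{bucket}"
    return AWS_TEMPLATE.format(scheme, bucket, region_suffix)
```

Keeping AWS as the default means existing deployments behave as before; only setting the variable redirects traffic to the local service.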

Yes, true. The OpenSSL version gives a great boost. Rather than SHA-512, we ended up using BLAKE2b (https://github.com/minio/blake2b-simd, also optimized with SIMD instructions) internally for bit-rot verification in https://github.com/minio/minio.

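Bit-rot verification with a hash boils down to recording a digest when data is written and re-hashing on read. A minimal sketch using Python's `hashlib.blake2b`; the 32-byte digest size and the function names are illustrative assumptions, not MinIO's actual settings or code:

```python
import hashlib

DIGEST_SIZE = 32  # bytes; an illustrative choice, not MinIO's documented setting


def checksum(block: bytes) -> bytes:
    """BLAKE2b digest of a stored block, recorded at write time."""
    return hashlib.blake2b(block, digest_size=DIGEST_SIZE).digest()


def is_intact(block: bytes, expected: bytes) -> bool:
    """Re-hash on read; a mismatch means the bytes changed on disk (bit rot)."""
    return checksum(block) == expected
```

A single flipped bit in the stored block changes the digest, so `is_intact` returns False and the block can be repaired from a redundant copy.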
Another project to be added to this list would be

https://github.com/minio/minio - Minio is an object storage server compatible with Amazon S3 and licensed under Apache 2.0 License.

Disclaimer: I work for Minio.

Yup, it does. https://github.com/minio/minio has all the details. Happy to help, and we would love to run s3io on Minio. :)

I recently came across minio [1] and was blown away. It is feature-complete and comes with a beautiful web interface, though you can't delete files via the web interface yet [2].

[1] https://github.com/minio/minio
[2] https://github.com/minio/miniobrowser/issues/154