Ok, now do it for >2tb.
Our prod hadoop dataset is now > 130tb, try that!
What kind of data do you have? Is this mostly text, or more like compressed time series?
We're analyzing genome sequence data on that scale: https://github.com/hail-is/hail
Ok, now do it for >2tb.
Our prod hadoop dataset is now > 130tb, try that!
What kind of data do you have? Is this mostly text, or more like compressed time series?