Ok, now do it for >2tb.

Our prod hadoop dataset is now > 130tb, try that!

What kind of data do you have? Is this mostly text, or more like compressed time series?

We're analyzing genome sequence data on that scale: https://github.com/hail-is/hail