I'm going to humbly put my kubecon talk here because it's unlike any conference talk you've ever seen and I'm really happy with how it turned out. https://youtu.be/VtedIghTPzI

Really enjoyed that. I think I will also really enjoy a longer version that talks a little more concretely about the tech stack being used. Also, not sure if the QR code is just a prop or not; scanning it didn't work.

Thanks! I was constrained by the talk length but intended it to be a starting point for future conversations if people are curious about details. Zach Bintliff has a re:Invent talk in January going over some details of how we did deployments at Disney+ if you'd like some specific details about that part of the stack.

The QR code was intended to be functional and worked in editing, but I never got it to work after uploading to YouTube. It was just a link to justinplus.com

I am actually curious about the details, especially the cloud storage. What kind of storage makes sense in this kind of infrastructure? Is it just regular block volumes being mounted as ext4, or is it a more k8s-ish solution like Longhorn or OpenEBS? I recently started learning and maintaining a k8s cluster, so I tend to really enjoy this kind of content. Really looking forward to the January talk.

Also, Turkey Leg and Mmm The Peas are both masterpieces.

The storage animation goes through some of it, but for the basics: if you use something like an 80/20 rule for content popularity, you can also apply that to data cost. Your most popular 20% of content needs to be as close to the client as possible and will probably be the most expensive for you to store (typically in one or many CDNs).

The next most popular 20% will probably be semi-expensive but doesn't need to be as close. This can be a subset of CDNs (where it makes financial sense) or geo-replicated S3 buckets.

The remaining ~60% of your content probably doesn't get accessed very often (relative to "hot" content) and, depending on your content and user demand, can probably live in your own data centers or POP locations.

You'll want to make it as easy as possible to promote/demote content between these tiers. Content popularity shifts seasonally and by region, so keeping the APIs similar between tiers will help you a lot. S3 is a pretty good standard to build toward, so you'll probably want something like it on-prem or in your k8s clusters (MinIO, Swift, etc.). We're also talking about each movie having potentially thousands of assets to track (different encodings for video and audio, different DRM wrappers, etc.) and hundreds or thousands of terabytes for big catalogs. So using immutable object storage is going to save you a TON of time compared to sticking with POSIX-based block or file storage.
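A minimal sketch of that tiering idea (tier names, thresholds, and backing stores here are illustrative assumptions, not actual production values): map an asset's popularity percentile to a tier, and because every tier speaks the same S3-style API, promotion/demotion is basically just a copy to a different endpoint.

```python
# Hypothetical sketch of popularity-based storage tiering.
# Tier names, thresholds, and backing stores are illustrative
# assumptions, not an actual production configuration.

# Each tier exposes the same S3-style interface, so moving content
# between tiers is just a copy to a different endpoint.
TIERS = [
    # (max popularity percentile, tier name, example backing store)
    (0.20, "hot",  "cdn"),                  # top 20%: one or many CDNs
    (0.40, "warm", "geo-replicated-s3"),    # next 20%: fewer CDNs / geo-replicated buckets
    (1.00, "cold", "on-prem-object-store"), # remaining ~60%: own DCs / POPs (MinIO, Swift, ...)
]

def tier_for(popularity_percentile: float) -> str:
    """Map an asset's popularity percentile (0.0 = most popular) to a tier."""
    for threshold, name, _store in TIERS:
        if popularity_percentile <= threshold:
            return name
    return "cold"

def retier(asset_key: str, old: str, new: str) -> str:
    """Promote/demote: with identical APIs per tier this reduces to a copy.
    (In practice: an S3 copy between endpoints, then delete the old copy.)"""
    return f"copy {asset_key}: {old} -> {new}"
```

Keeping the tier decision in one place like this is what makes the seasonal/regional reshuffling cheap: only the threshold table changes, not the callers.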

During file ingest (when you're encoding it) you're probably going to be dealing with POSIX-based files so the encoders can work on them, but you'll want as many systems after that dealing with object storage as soon as possible.
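One way to read "immutable object storage" in practice (my own hedged sketch, with an assumed key scheme, not necessarily how any particular studio does it) is to derive the object key from the content itself, so an encoded rendition never changes under its key and every encoding/DRM variant gets its own address:

```python
# Hypothetical sketch: after the encoder writes a POSIX file, hand it
# off to object storage under an immutable, content-addressed key.
# The key scheme below is an illustrative assumption.
import hashlib
from pathlib import Path

def immutable_key(title_id: str, profile: str, data: bytes) -> str:
    """Build an object key that is never overwritten in place:
    same content + profile -> same key; any change -> a new key."""
    digest = hashlib.sha256(data).hexdigest()[:16]
    return f"assets/{title_id}/{profile}/{digest}"

def ingest(title_id: str, profile: str, local_file: Path) -> str:
    """Read the encoder's POSIX output and return its object key.
    (In practice you'd follow this with an upload to your
    S3-compatible store and drop the local file.)"""
    data = local_file.read_bytes()
    return immutable_key(title_id, profile, data)
```

With keys like these, the thousands of per-title assets (encodings, audio tracks, DRM wrappers) never conflict, and caches and CDNs can treat every object as cacheable forever.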

Hmm, makes sense. Do the same principles also hold while rendering the movie? The 80/20 rule will be harder to maintain because assets can change frequently and unexpectedly during production. But I guess the need for the data to be geographically replicated, with fast and easy access, remains (especially during these remote times). In that situation, is S3 still a good enough choice, or do you want something more robust so your rendering process is not bottlenecked by I/O?

Rendering is different because the assets are very interdependent and the tooling is very POSIX-specific. Rendering a scene (especially with ray tracing) can use thousands of assets from the movie even if they aren't directly on screen. Artists draw and save files in tools like Autodesk Maya, and renderers read files from disk and work on them in memory.

For some assets and tooling, FUSE-mounted S3 can work, but generally FUSE and the other userspace mounters I've seen slow down artists and rendering measurably.

Think of rendering assets more like code files in git with trunk-based development. You want all of your artists to use the latest assets, which are updated daily. All of the assets should be co-located, and you don't want geo-replication because of latency. Even if your artists are located all over the world, you'll want them to store the saved assets in one place: where the rendering happens.

There will be different assets that are hot as the movie progresses. But you're more likely to try to keep the latest version of all assets hot rather than all versions of specific assets hot.
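To make that "latest version of everything hot" idea concrete, here's a toy sketch (the data layout and names are my own assumptions): given every saved version of every asset, the hot set for rendering is the newest version of each asset, not the full history of any one asset.

```python
# Hypothetical sketch of the "trunk-based" hot set for rendering:
# keep the latest version of every asset hot, rather than every
# version of a few assets. The data layout is an illustrative assumption.

def hot_set(versions: list[tuple[str, int]]) -> dict[str, int]:
    """Given (asset_path, version) pairs for all saved versions,
    return the newest version of each asset -- the working set you
    keep co-located with the render farm."""
    latest: dict[str, int] = {}
    for asset, version in versions:
        if version > latest.get(asset, -1):
            latest[asset] = version
    return latest
```

This is the inverse of the streaming case above: instead of a stable catalog with shifting popularity, you have a shifting catalog where "latest" is always what's hot.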

Most studios use HPC-style environments: NFS plus big compute servers connected with high-bandwidth networking.

Interesting. Is Kubernetes ever used as the orchestrator in HPC workloads, or are containers even used there? Also, what are some good resources to get better at Kubernetes? Currently I am mostly playing with managed k8s like DigitalOcean's and thinking about transitioning to a k3s-based bare-metal solution, so I would really like to learn the right way of doing this.

There are some HPC environments that use Kubernetes but they likely use custom schedulers optimized for batch workloads (e.g. https://github.com/volcano-sh/volcano).

"containers" are often used but not always docker containers. HPC environments I've seen will often use container primitives (e.g. cgroups, namespaces).

There's a lot you can learn with managed Kubernetes, and it's a great place to start. You can learn a lot of the parts of Kubernetes by running through https://github.com/kelseyhightower/kubernetes-the-hard-way or reading https://www.amazon.com/Kubernetes-Running-Dive-Future-Infras...

I'll email you to follow-up since tracking HN comments isn't a great way to have a conversation.