What does HackerNews think of nos?

Module to automatically maximize the utilization of GPU resources in a Kubernetes cluster through real-time dynamic partitioning and elastic quotas.

Language: Go

#29 in Kubernetes
Some of the available modules include:

Speedster: Automatically apply the best set of SOTA optimization techniques to achieve the maximum inference speed-up on your hardware. https://github.com/nebuly-ai/nebullvm/blob/main/apps/acceler...

Nos: Automatically maximize the utilization of GPU resources in a Kubernetes cluster through real-time dynamic partitioning and elastic quotas. https://github.com/nebuly-ai/nos

ChatLLaMA: Build a faster and cheaper ChatGPT-like training process based on the LLaMA architecture. https://github.com/nebuly-ai/nebullvm/tree/main/apps/acceler...

OpenAlphaTensor: Increase the computational performance of an AI model with custom-generated matrix multiplication algorithms fine-tuned for your specific hardware. https://github.com/nebuly-ai/nebullvm/tree/main/apps/acceler...

Forward-Forward: The Forward Forward algorithm is a method for training deep neural networks that replaces the backpropagation forward and backward passes with two forward passes. https://github.com/nebuly-ai/nebullvm/tree/main/apps/acceler...

Hi HN! I’m Michele Zanotti and today I’m releasing nos, an open-source module to efficiently run GPU workloads on Kubernetes!

Nos is meant to increase GPU utilization and cut down infrastructure and operational costs by providing two main features:

1. Dynamic GPU Partitioning: you can think of this as a cluster autoscaler for GPUs. Instead of scaling up the number of nodes and GPUs, it dynamically partitions them into smaller “GPU slices”. This ensures that each workload uses only the GPU resources it actually needs, leaving spare GPU capacity for other workloads. To partition GPUs, nos leverages NVIDIA's MPS and MIG [1, 2], finally making them dynamic (a sketch of what a slice request looks like follows this list).

2. Elastic Resource Quota management: it increases the number of Pods that can run on the cluster by letting teams (namespaces) borrow quotas of reserved resources from other teams as long as those teams are not using them.
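
Just to make the first point concrete, here is a minimal sketch of what a Pod requesting a GPU slice can look like once a node has been partitioned with MIG. The slice resource name and the image are illustrative assumptions; the actual names depend on the GPU model and on the partitioning nos applies to the node.

```yaml
# Illustrative sketch: a Pod asking for a single MIG slice instead of a whole GPU.
# The resource name (nvidia.com/mig-1g.10gb) and the image are assumptions; actual
# slice names depend on the GPU model and on how the node has been partitioned.
apiVersion: v1
kind: Pod
metadata:
  name: ds-notebook
spec:
  containers:
    - name: notebook
      image: registry.example.com/ds-notebook:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1  # one GPU slice, not a full GPU
```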

https://github.com/nebuly-ai/nos

What are your thoughts on this project? :)

Nos addresses some key challenges of Kubernetes tied to the fact that Kubernetes was not designed to support GPU and AI/machine learning workloads. In Kubernetes, GPUs are managed with the NVIDIA k8s Device Plugin [3], which has a few major downsides. First, it requires allocating an integer number of GPUs per workload, so workloads cannot request just a fraction of a GPU. Second, when GPU sharing is enabled with either time-slicing or MIG, the device plugin advertises to Kubernetes a fixed set of GPU resources that does not dynamically adapt to the Pods' requests over time.

This often leads to underutilized GPUs and pending Pods, and to cluster admins spending a lot of time looking for workarounds to make the best use of their GPUs.
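
To make the first downside concrete, this is what a GPU request looks like with the plain device plugin: the request must be a whole integer, so even a tiny job reserves an entire device (the image name below is a placeholder).

```yaml
# With the standard NVIDIA device plugin, GPU requests must be whole integers:
# there is no way to ask for a fraction of a GPU, so this small job still
# locks up an entire device.
apiVersion: v1
kind: Pod
metadata:
  name: small-inference-job
spec:
  containers:
    - name: worker
      image: registry.example.com/inference:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1  # smallest possible request: one whole GPU
```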

For example, consider a company with a k8s cluster of 20 GPUs, where 3 of those GPUs have been reserved for the data science team using Resource Quota objects. In most cases, the data scientists' workloads (notebooks, scripts, etc.) require far less memory and compute than an entire GPU offers, yet Kubernetes forces each container to consume a whole GPU. Conversely, if the team occasionally needs to run a heavy workload, it may want to use as many resources as possible; however, the Resource Quota on its namespace constrains the team to at most the 3 reserved GPUs, even if the rest of the company's cluster is full of unused GPUs!

With nos, instead, the data science team would use Dynamic GPU Partitioning to request GPU slices so that many workloads can share the same GPU. Elastic Resource Quotas would also allow the team to consume more than the 3 reserved GPUs by borrowing unused quotas from other teams. To recap, the team would be able to launch more Pods and the company would likely need fewer nodes, all with minimal effort from the cluster admin, who only has to set up nos.
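
To sketch the quota side: nos's Elastic Resource Quota builds on the ElasticQuota concept from the Kubernetes capacity-scheduling work, so a quota for the data science team could look roughly like the object below. The apiVersion, kind, and field names here follow the scheduler-plugins ElasticQuota and are assumptions; the exact CRD used by nos is described in its documentation.

```yaml
# Rough sketch of an elastic quota for the data science team, modeled on the
# scheduler-plugins ElasticQuota CRD. Treat apiVersion, kind, and fields as
# assumptions; the actual nos CRD may differ.
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: data-science
  namespace: data-science
spec:
  min:
    nvidia.com/gpu: 3   # guaranteed share: the 3 reserved GPUs
  max:
    nvidia.com/gpu: 10  # may borrow up to 10 GPUs while other teams are idle
```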

Let me know what you think of nos, feedback would be very helpful! :) And please leave a star on GitHub if you like this open-source project: https://github.com/nebuly-ai/nos

Here are some other links that may be useful:

- Tutorial on how to use Dynamic GPU Partitioning with Nvidia MIG https://towardsdatascience.com/dynamic-mig-partitioning-in-k...

- Pros and cons of different technologies for sharing resources among workloads in Kubernetes: time-slicing, MIG, and MPS https://docs.nebuly.com/nos/dynamic-gpu-partitioning/partiti...

- Nos documentation https://docs.nebuly.com/nos/overview/

References

[1] Multi-Process Service (MPS) https://docs.nvidia.com/deploy/mps/index.html

[2] Multi-Instance GPU (MIG) https://docs.nvidia.com/datacenter/tesla/mig-user-guide/

[3] NVIDIA k8s Device Plugin https://github.com/NVIDIA/k8s-device-plugin

Hi HN, I'm excited to share my latest Towards Data Science article on how to take advantage of Dynamic MIG Partitioning to efficiently run AI workloads on Kubernetes

MIG Partitioning is a way to divide GPU resources into smaller slices. This allows Pods to be allocated only the memory and compute resources they actually need, thus increasing GPU utilization and reducing infrastructure costs in Kubernetes clusters.

In this article, I discuss how the NVIDIA GPU Operator supports Multi-Instance GPU (MIG) in Kubernetes by exposing isolated GPU partitions as resources. I then present the limitations of static MIG configurations that can lead to inefficiencies and impracticality for cluster admins.
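
For context, without nos the MIG layout of a node is typically chosen statically: with the GPU Operator it is selected through a node label pointing to a predefined profile, roughly as below (node name and profile name are illustrative, and in practice the label is applied with kubectl label nodes). The point is that the layout stays fixed until the admin changes it by hand, which is what Dynamic MIG Partitioning automates.

```yaml
# Static MIG configuration with the NVIDIA GPU Operator: the admin picks one
# predefined layout for the whole node via a label. Node and profile names are
# illustrative; the available profiles depend on the GPU model.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
  labels:
    nvidia.com/mig.config: all-1g.5gb  # every GPU split into 1g.5gb slices
```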

Finally, the article introduces Dynamic MIG Partitioning as a more effective solution to manage MIG configurations, improving GPU utilization while reducing the burden on cluster admins.

Dynamic GPU partitioning is a component of nos, an open-source project I developed to maximize GPU utilization by dynamically partitioning GPUs with NVIDIA MIG and MPS and leveraging Elastic Quotas. https://github.com/nebuly-ai/nos

I hope this will be helpful to you as well. And let me know your feedback on this solution!

Hi HN, my name is Michele Zanotti and I’ve just released an open-source project called nos https://github.com/nebuly-ai/nos that finally makes GPU partitioning on Kubernetes dynamic (and much more)!

You can think of Dynamic GPU Partitioning as a cluster autoscaler for GPUs. Instead of scaling up the number of nodes and GPUs, it dynamically partitions them into smaller “GPU slices”. This ensures that each workload uses only the GPU resources it actually needs, leaving spare GPU capacity for other workloads. To partition GPUs, nos leverages NVIDIA MIG, as well as the lesser-known MPS, finally making them dynamic.
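
For MPS-based slices the idea is the same as with MIG: a Pod requests a fraction-of-GPU resource instead of a whole device. The resource name below is purely a hypothetical placeholder to convey the idea; the actual naming scheme nos uses for MPS resources is described in its documentation.

```yaml
# Purely illustrative: a Pod requesting an MPS-backed GPU slice.
# The resource name is a hypothetical placeholder, not the real nos naming
# scheme; check the nos documentation for the actual MPS resource names.
apiVersion: v1
kind: Pod
metadata:
  name: mps-workload
spec:
  containers:
    - name: worker
      image: registry.example.com/worker:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu-10gb: 1  # hypothetical "10 GB slice" resource
```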

On Towards Data Science you can find a tutorial on how to use Dynamic GPU Partitioning: https://towardsdatascience.com/dynamic-mig-partitioning-in-k...

I also wrote a review of the pros and cons of the different technologies for sharing resources among workloads in Kubernetes (time-slicing, MIG, and MPS): https://docs.nebuly.com/nos/dynamic-gpu-partitioning/partiti...

In addition, nos has another component to increase GPU utilization even further: Elastic Resource Quota management. It increases the number of Pods that can run on the cluster by letting teams (namespaces) borrow quotas of reserved resources from other teams as long as those teams are not using them. It’s described in more detail in the nos documentation: https://docs.nebuly.com/nos/overview/

Let me know your thoughts on the project!
