It's a bit eerie: three years after the video was posted, I opened it for the first time all of a sudden, and he correctly reminds me that I am watching it on a Tuesday.
I'm good! All this ML has paid off!
Same here. lol
Could you do a video on installation?
absolutely! Thanks for asking!
If you have Helm installed, add NVIDIA's Helm repo and install the operator:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator
Check with:
kubectl get pods -n gpu-operator
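To sanity-check that the operator actually exposed the GPUs to the scheduler, a throwaway pod can request one and run nvidia-smi. This is a sketch: the CUDA image tag is just an example, and the nvidia.com/gpu resource name assumes the standard NVIDIA device plugin that the operator deploys.

```shell
# Launch a one-shot pod that requests a single GPU and prints nvidia-smi output
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# If scheduling worked, the logs show the usual nvidia-smi table
kubectl logs gpu-smoke-test
```

If the pod stays Pending, the node either has no allocatable nvidia.com/gpu resource yet or the operator pods are still initializing.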
Another easy non-daemonset approach is to edit the nodes to add labels, then use affinity or topology spread constraints; but I guess the daemonset approach is for when you're running a giant bazillion-node cluster and you want it done automatically as new nodes join.
Do you know if any special setup needs to happen on the hosts before rolling out the nvidia image?
Also, any recommendations on which driver containers to use (i.e., what repo/image the daemonset is trying to pull)?
You are right. Another consideration is a cluster that changes size, i.e., you scale up once a week to do a full retrain, etc. This makes adding GPU nodes easier. This is an older video, though; maybe I should update it to be a little more modern.
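For what it's worth, the label-plus-affinity route mentioned above can be sketched roughly like this. The node name and label key/value are made up for illustration; any consistent pair works.

```shell
# Manually label a GPU node (hypothetical node name)
kubectl label nodes worker-3 accelerator=nvidia

# Pods then opt in via a nodeSelector (the simplest form of node affinity)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: needs-gpu-node
spec:
  nodeSelector:
    accelerator: nvidia
  containers:
  - name: app
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["sleep", "infinity"]
EOF
```

The downside, as noted above, is that every new node has to be labeled by hand, which is exactly what the daemonset/operator approach automates.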
With ARM images, EKS still needs GPU support on top, for which only the nvidia daemonset can be used.
Thanks for the info! I have not used ARM images on EKS! Thanks!
But how to make multiple pods access the same GPU?
To do this you need special virtualization software from Nvidia. At least, that is the way I know of! :)
Hello, I have configured Kubeflow on a k8s cluster. However, I recently added GPU nodes to my Kubernetes cluster, which is on premise. I need to make sure the notebook servers in Kubeflow make use of the GPU nodes… how do I achieve this?
Hey! I see you have also joined the discord server and asked the question there! I have responded there as it's a bit easier to have a back and forth! Hope you found these videos useful and thank you for watching!
@@NullLabs yes thank you
So 1 GPU for multiple Docker containers is a dead-end idea?
I'm actively pursuing this. At this point with Nvidia, I don't think it is possible. There are things like www.nvidia.com/en-us/data-center/virtual-gpu-technology/ but that requires both extra software and hardware licensing, and as Nvidia does not sponsor me... it is very expensive. You can run multiple containers, just not in parallel.
If I figure anything out in this area you all will be one of the first to know.
Yes, unless you are using A100
www.nvidia.com/en-us/technologies/multi-instance-gpu/
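For reference, MIG on an A100 is driven from nvidia-smi on the host. A rough sketch of carving one GPU into isolated instances follows; the profile IDs vary by card and driver version, so treat the 9,9 below as a placeholder, not a recommendation.

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset or reboot to take effect)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this card supports
nvidia-smi mig -lgip

# Create two GPU instances from a profile, plus their compute instances (-C)
sudo nvidia-smi mig -cgi 9,9 -C

# Each MIG device now shows up as a separately schedulable unit
nvidia-smi -L
```

Once the device plugin is configured for MIG, each instance can be requested by a different pod, which is what makes true parallel sharing possible on that hardware.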
@@MrBillyClanton This is correct. And even if you have the A100, it is my understanding you have to pay for/license some extra software to do this. It is a shame, honestly. Though three 2060s are about the same price as one 2080, so really it's best to get three of those, which lets you run three containers. (The same holds true for previous generations as well.)
@@NullLabs Do you have any news about this? Because it seems to be possible "by accident": github.com/NVIDIA/gpu-operator/issues/28
Is it possible to run multiple driver versions? Have two containers, each with a different driver version?
I think that the operator can do this, though I have not tried it directly. I have put it on my list of things to check in the future.
You CAN do this if you install the drivers manually and expose them to the cluster manually. This I know works (and thus I'm pretty sure the operator can do it; if it can't, I doubt it would be hard to update it). Mind you, you would need two separate nodes, as these are not the drivers "in" the container but on the host system we are talking about. So not even an A100 with GPU virtualization would support two sets of drivers on the same GPU.
V V V Slow
Nice!
A video without demo, useless.
Kind of, maybe. Unless you did not know this existed. Also, if you want to get me enough money to have two GPUs to make this work, I would gladly update it with a demo!
@@NullLabs Yes, you can use k8s on Docker and at least make a good demo of sharing a single GPU with multiple containers. Hope this doesn't cost you!