The reasoning you provide on the configurations is gold, thanks.
My fav ML channel
Trelis at it again...
Thanks for the various scenario simulations.
Wonderful work.
I see Trelis, I click.
Oh my, more cool content!
Awesome video, Ronan, as always. This video is very timely, as we are optimizing our costs by moving away from RunPod serverless. I have a couple of questions.
- Can the service you have written scale to 0? It seems that with the minimum TPS being a positive number, this wouldn't work, right? Scaling to 0 is very important for us, as we have bursty traffic with long idle times, and this is the primary motivation for serverless.
- Is there any alternative to configuring the TPS scaling limits manually for each GPU/model combination? This seems a bit cumbersome. Would it be possible to scale directly on GPU utilization instead? I am thinking of something like SSHing into the instance with paramiko and running nvidia-smi automatically (you can output results to CSV with the --format=csv and --query-gpu parameters), then using that output to determine whether the GPUs are at full utilization. Maybe take a sample over a time window, as this number can fluctuate a lot. You could then use this to decide whether to add or remove instances, and use current TPS to determine whether an instance is being used at all (scale to 0). Do you think this approach would work? (Rough sketch after this list.)
- Do you only support RunPod, or can other clouds like Vast.ai or Shadeform be added as well? Both have APIs that allow you to create, delete, and configure specific instances. RunPod has had many GPU shortage issues lately, specifically for 48GB GPUs (A40, L4, L40, 6000 Ada, etc.).
- Is there any configuration here for Secure Cloud vs. Community Cloud? I think that if you don't specify in the RunPod API, it defaults to "ALL", which means you will get whatever is available. Community Cloud can be less stable and less secure, so many users may want to opt for Secure Cloud only.
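To make the sampling idea concrete, here is a rough, untested sketch of what I have in mind; the host details, window, and thresholds are all placeholders:

```python
# Untested sketch: SSH in with paramiko, sample nvidia-smi a few times,
# and average the readings to smooth out fluctuations.
import time
import paramiko

def avg_gpu_utilization(host, user, key_path, samples=6, interval_s=10):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, key_filename=key_path)
    readings = []
    try:
        for _ in range(samples):
            # one utilization figure per GPU, e.g. "87"
            _, stdout, _ = client.exec_command(
                "nvidia-smi --query-gpu=utilization.gpu "
                "--format=csv,noheader,nounits"
            )
            per_gpu = [int(v) for v in stdout.read().decode().split()]
            readings.append(sum(per_gpu) / len(per_gpu))
            time.sleep(interval_s)
    finally:
        client.close()
    return sum(readings) / len(readings)
```

You could then add an instance above, say, 90% average utilization, remove one below 30%, and treat zero TPS over the window as the scale-to-zero signal.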
Again, I really appreciate the content you produce. For anyone who hasn't purchased access to the Trelis git repos yet, they are quite the value. Ronan consistently keeps them up-to-date with the latest models and new approaches. It is a great return on your investment and the gift that keeps on giving!
Howdy!
1. Yes, if you set the min instances to zero, it will scale to zero.
2. Scaling based on utilisation might work, yeah, it's a cool idea. That may be more robust than using TPS. SSHing might be needed, or maybe there's a way to get that info from vLLM; I'd have to dig. (Rough sketch below.)
3. Yes, you could use other platforms by updating pod_utils.py to hit those APIs (it will require some different syntax there); see the second sketch below.
4. Secure Cloud is currently hard-coded, yeah, for the reasons you said.
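On the vLLM route in point 2: the OpenAI-compatible server exposes a Prometheus /metrics endpoint, so polling that might avoid SSH entirely. Rough untested sketch; the exact metric names depend on your vLLM version, so check your own /metrics output first:

```python
# Untested sketch: poll vLLM's /metrics endpoint instead of SSHing in.
# Metric names vary by vLLM version - check your own /metrics output.
import requests

def vllm_load(base_url):
    text = requests.get(f"{base_url}/metrics", timeout=5).text
    wanted = ("vllm:num_requests_running", "vllm:num_requests_waiting")
    values = {}
    for line in text.splitlines():
        if line.startswith(wanted):
            name, value = line.rsplit(" ", 1)
            values[name] = float(value)
    return values

# e.g. vllm_load("http://<pod-ip>:8000"); zero running and zero waiting
# requests over the whole window would be the scale-to-zero signal.
```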
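And for point 3, the cleanest way is probably to put a small provider interface behind pod_utils.py. All names here are illustrative, not the repo's actual code:

```python
# Illustrative only - the idea is that pod_utils.py dispatches to
# whichever provider is configured.
from abc import ABC, abstractmethod

class GPUProvider(ABC):
    @abstractmethod
    def create_instance(self, gpu_type: str, image: str) -> str:
        """Start an instance and return its provider-side ID."""

    @abstractmethod
    def delete_instance(self, instance_id: str) -> None:
        """Terminate the instance."""

class RunPodProvider(GPUProvider):
    ...  # the current RunPod logic would move here

class VastProvider(GPUProvider):
    ...  # same interface, Vast.ai's REST API underneath
```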
@@TrelisResearch Awesome, thanks!
10/10
Hi Trelis,
Are you planning to give a Black Friday discount?
Howdy!
No Black Friday discounts. The way I work pricing is to keep it consistent and rising over time as I add new content. This way I benefit earlier supporters, which I think is the right incentive.
@@TrelisResearch hey thanks. Is there any plan for purchasing power parity pricing?
@@subhamkundu5043 yeah, there is some built in already. Where are you based?
@@TrelisResearch I am based in India; it would be great if there is some additional discount for PPP.
@@subhamkundu5043 yes, it's already there in that case.
What I recommend is to just buy pieces of the repos (individual scripts) if the full repos are too expensive.
Hi Ronan,
Can you give guidance on the best resources for algorithmic trading using ML, DL, and AI?
Also, are you planning to offer a Black Friday sale or Christmas discount on your Trelis Advanced Repo?
Howdy! The closest thing I have done is build a neural network from scratch to train on weather forecasts. You can find that live stream and replace temperature with stock price to get a trading-type tool.
No BF or Christmas discounts. My approach is to keep pricing straightforward and rising over time as I add more content to the products - this way the earlier buyers benefit most.
@@TrelisResearch Thanks. Not going to have buyer's remorse, at least.
Why is it better than AWS? Everyone uses AWS.
a) It can be cheaper.
b) On AWS, it can be hard to get allocations of good GPUs unless you are a massive company.