How to Build an Inference Service

  • Published 21 Jan 2025

COMMENTS • 22

  • @Maskra_
    @Maskra_ 1 month ago

    Reasoning you provide on the configurations is gold, thanks

  • @Little-bird-told-me
    @Little-bird-told-me 1 month ago

    My fav ML channel

  • @ChrisSMurphy1
    @ChrisSMurphy1 2 months ago +3

    Trelis at it again..

  • @Techonsapevole
    @Techonsapevole 2 months ago

    thanks for the various scenario simulations

  • @MegaClockworkDoc
    @MegaClockworkDoc 2 months ago

    Wonderful work.

  • @iTube4U
    @iTube4U 1 month ago

    I see Trelis, I click

  • @NLPprompter
    @NLPprompter 2 months ago

    Oh my, another piece of cool content!

  • @danieldemillard9412
    @danieldemillard9412 2 months ago

    Awesome video as always, Ronan. This video is very timely, as we are optimizing our costs by moving away from RunPod serverless. I have a couple of questions.
    - Can the service you have written scale to 0? It seems that with the minimum TPS being a positive number, this wouldn't work, right? Scaling to 0 is very important for us, as we have bursty traffic with long idle times, and this is the primary motivation for serverless.
    - Is there any alternative to configuring the TPS scaling limits manually for each GPU/model combination? This seems a bit cumbersome. Would it be possible to scale directly based on GPU utilization? I am thinking of something like SSHing into the instance with paramiko and automatically running nvidia-smi (you can output results to a CSV with the --format=csv and --query-gpu parameters). You can then use that output to determine whether the GPUs are at full utilization. Maybe take a sample over your time window, as this number can fluctuate a lot. Then you can use this to decide whether to add or remove instances, and you could use current TPS to determine whether an instance is being used at all (scale to 0). Do you think this approach would work?
    - Do you only support RunPod, or can other clouds like vast.ai or Shadeform be added as well? Both have APIs that allow you to create, delete, and configure specific instances. RunPod has had many GPU shortage issues lately, specifically for 48 GB GPUs (A40, L4, L40, 6000 Ada, etc.).
    - Is there any configuration here for Secure Cloud vs. Community Cloud? I think that if you don't specify in the RunPod API, it defaults to "ALL", which means you will get whatever is available. Community Cloud can be less stable and less secure, so many users may want to opt for Secure Cloud only.
    Again, I really appreciate the content you produce. For anyone who hasn't purchased access to the Trelis git repos yet, they are quite the value. Ronan consistently keeps them up to date with the latest models and new approaches. It is a great return on your investment and the gift that keeps on giving!

    • @TrelisResearch
      @TrelisResearch  2 months ago

      Howdy!
      1. Yes, if you set the min instances to zero, it will scale to zero.
      2. Scaling based on utilisation might work, yeah, it's a cool idea. That may be more robust than using TPS. SSHing might be needed, or maybe there's a way to get that info from vLLM; I'd have to dig.
      3. Yes, you could use other platforms by updating pod_utils.py to hit those APIs (it will require some different syntax there).
      4. Secure cloud is currently hard coded, yeah, for the reasons you said.

    • @danieldemillard9412
      @danieldemillard9412 2 months ago

      @@TrelisResearch Awesome, thanks!
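    The utilization-based scaling idea from the thread above can be sketched in a few lines of Python. This is a hypothetical sketch, not code from the Trelis repo: the function names and thresholds are assumptions, and it only covers parsing the CSV that `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` prints plus the scale up/down decision; fetching that CSV over SSH (e.g. with paramiko) is left as a comment.

    ```python
    # Hypothetical sketch of utilization-based autoscaling (not from the Trelis repo).
    # In practice you would fetch the CSV over SSH, e.g. with paramiko:
    #   _, stdout, _ = ssh.exec_command(
    #       "nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits")
    #   csv_text = stdout.read().decode()

    def parse_gpu_utilization(csv_text: str) -> list[int]:
        """Parse nvidia-smi CSV output: one integer percentage per GPU, one per line."""
        return [int(line.strip()) for line in csv_text.strip().splitlines() if line.strip()]

    def scaling_decision(window_samples: list[float],
                         high: float = 85.0, low: float = 15.0) -> str:
        """Average utilization over the sampling window (instantaneous readings
        fluctuate a lot), then decide whether to add or remove an instance."""
        avg = sum(window_samples) / len(window_samples)
        if avg >= high:
            return "scale_up"
        if avg <= low:
            return "scale_down"  # combine with TPS == 0 to scale all the way to zero
        return "hold"
    ```

    For example, `parse_gpu_utilization("93\n88\n")` returns `[93, 88]`; averaging several such readings over a window before calling `scaling_decision` smooths out spikes, which is the sampling idea raised in the comment.
    
    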

  • @fearnworks
    @fearnworks 2 months ago

    10/10

  • @subhamkundu5043
    @subhamkundu5043 2 months ago

    Hi Trelis,
    Are you planning to give a Black Friday discount?

    • @TrelisResearch
      @TrelisResearch  2 months ago

      Howdy!
      No Black Friday discounts. The way I handle pricing is to keep it consistent and rising over time as I add new content. This way I benefit earlier supporters, which I think is the right incentive.

    • @subhamkundu5043
      @subhamkundu5043 1 month ago

      @@TrelisResearch hey thanks. Is there any plan for purchasing power parity?

    • @TrelisResearch
      @TrelisResearch  1 month ago

      @@subhamkundu5043 yeah there is some built in already, where are you based?

    • @subhamkundu5043
      @subhamkundu5043 1 month ago

      @@TrelisResearch I am based in India; it would be great if there is some additional discount for PPP.

    • @TrelisResearch
      @TrelisResearch  1 month ago

      @@subhamkundu5043 yes, it's already there in that case.
      What I recommend is to just buy pieces of the repos (scripts) if the full repos are too expensive.

  • @mdrafatsiddiqui
    @mdrafatsiddiqui 2 months ago

    Hi Ronan,
    Can you recommend the best resources for algorithmic trading using ML, DL, and AI?
    Also, are you planning to offer a Black Friday Sale or Christmas Discount on your Trelis Advanced Repo?

    • @TrelisResearch
      @TrelisResearch  2 months ago +1

      Howdy! The closest thing I have done is build a neural network from scratch to train on weather forecasts. You can find that live stream and replace temperature with stock price to get a trading-type tool.
      No BF or Christmas discounts. My approach is to keep pricing straightforward and rising over time as I add more content to the products - this way the earlier buyers benefit most.

    • @mdrafatsiddiqui
      @mdrafatsiddiqui 1 month ago

      @@TrelisResearch Thanks. Not going to have buyer's remorse, at least.

  • @sndrstpnv8419
    @sndrstpnv8419 20 days ago

    Why is it better than AWS? Everyone uses AWS.

    • @TrelisResearch
      @TrelisResearch  19 days ago

      a) Can be cheaper
      b) On AWS it can be hard to get allocations of good GPUs unless you are a massive company.