How to Build an Inference Service

  • Published 21 Jan 2025

COMMENTS • 22

  • @Maskra_
    @Maskra_ 1 month ago

    Reasoning you provide on the configurations is gold, thanks

  • @Little-bird-told-me
    @Little-bird-told-me 1 month ago

    My fav ML channel

  • @ChrisSMurphy1
    @ChrisSMurphy1 2 months ago +3

    Trelis at it again..

  • @Techonsapevole
    @Techonsapevole 2 months ago

    thanks for the various scenario simulations

  • @MegaClockworkDoc
    @MegaClockworkDoc 2 months ago

    Wonderful work.

  • @iTube4U
    @iTube4U 1 month ago

    I see Trelis, I click

  • @NLPprompter
    @NLPprompter 2 months ago

    Oh my, another piece of cool content!

  • @danieldemillard9412
    @danieldemillard9412 2 months ago

    Awesome video as always, Ronan. This video is very timely, as we are optimizing our costs by moving away from RunPod serverless. I have a couple of questions.
    - Can the service you have written scale to 0? It seems that with the minimum TPS being a positive number, this wouldn't work, right? Scaling to 0 is very important for us, as we have bursty traffic with long idle times, and this is the primary motivation for serverless.
    - Is there any alternative to configuring the TPS scaling limits manually for each GPU/model combination? This seems a bit cumbersome. Would it be possible to scale directly based on GPU utilization? I am thinking of something like SSHing into the instance with paramiko and automatically running nvidia-smi (you can output results to a CSV with the --format=csv and --query-gpu parameters). You can then use that output to determine whether the GPUs are at full utilization. Maybe take a sample over your time window, as this number can fluctuate a lot. Then you can use this to decide whether to add or remove instances, and you could use current TPS to determine whether an instance is being used at all (scale to 0). Do you think this approach would work?
    - Do you only support RunPod, or can other clouds like vast.ai or Shadeform be added as well? Both have APIs that allow you to create, delete, and configure specific instances. RunPod has had many GPU shortage issues lately, specifically for 48 GB GPUs (A40, L4, L40, 6000 Ada, etc.).
    - Is there any configuration here for Secure Cloud vs. Community Cloud? I think that if you don't specify in the RunPod API, it defaults to "ALL", which means you will get whatever is available. Community Cloud can be less stable and less secure, so many users may want to opt for Secure Cloud only.
    Again, I really appreciate the content you produce. For anyone who hasn't purchased access to the Trelis git repos yet, they are quite the value. Ronan consistently keeps them up to date with the latest models and new approaches. It is a great return on your investment and the gift that keeps on giving!

    • @TrelisResearch
      @TrelisResearch  2 months ago

      Howdy!
      1. Yes, if you set the min instances to zero, it will scale to zero.
      2. Scaling based on utilisation might work, yeah, it's a cool idea. That may be more robust than using TPS. SSHing might be needed, or maybe there's a way to get that info from vLLM; I'd have to dig.
      3. Yes, you could use other platforms by updating pod_utils.py to hit those APIs (it will require some different syntax there).
      4. Secure cloud is currently hard coded, yeah, for the reasons you said.

    • @danieldemillard9412
      @danieldemillard9412 2 months ago

      @@TrelisResearch Awesome, thanks!
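    The utilization-based scaling idea from the thread above can be sketched in a few lines of Python. This is a hypothetical sketch, not code from the Trelis repo: the function names and thresholds are assumptions, and it only covers parsing the CSV that `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` prints plus the scale up/down decision; fetching that CSV over SSH (e.g. with paramiko) is left as a comment.

    ```python
    # Hypothetical sketch of utilization-based autoscaling (not from the Trelis repo).
    # In practice you would fetch the CSV over SSH, e.g. with paramiko:
    #   _, stdout, _ = ssh.exec_command(
    #       "nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits")
    #   csv_text = stdout.read().decode()

    def parse_gpu_utilization(csv_text: str) -> list[int]:
        """Parse nvidia-smi CSV output: one integer percentage per GPU, one per line."""
        return [int(line.strip()) for line in csv_text.strip().splitlines() if line.strip()]

    def scaling_decision(window_samples: list[float],
                         high: float = 85.0, low: float = 15.0) -> str:
        """Average utilization over the sampling window (instantaneous readings
        fluctuate a lot), then decide whether to add or remove an instance."""
        avg = sum(window_samples) / len(window_samples)
        if avg >= high:
            return "scale_up"
        if avg <= low:
            return "scale_down"  # combine with TPS == 0 to scale all the way to zero
        return "hold"
    ```

    For example, `parse_gpu_utilization("93\n88\n")` returns `[93, 88]`; averaging several such readings over a window before calling `scaling_decision` smooths out spikes, which is the sampling idea raised in the comment.
    
    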

  • @fearnworks
    @fearnworks 2 months ago

    10/10

  • @subhamkundu5043
    @subhamkundu5043 2 months ago

    Hi Trelis,
    Are you planning to give a Black Friday discount?

    • @TrelisResearch
      @TrelisResearch  2 months ago

      Howdy!
      No Black Friday discounts. The way I handle pricing is to keep it consistent and rising over time as I add new content. This way I benefit earlier supporters, which I think is the right incentive.

    • @subhamkundu5043
      @subhamkundu5043 1 month ago

      @@TrelisResearch hey thanks. Is there any plan for purchasing power parity?

    • @TrelisResearch
      @TrelisResearch  1 month ago

      @@subhamkundu5043 yeah there is some built in already, where are you based?

    • @subhamkundu5043
      @subhamkundu5043 1 month ago

      @@TrelisResearch I am based in India; it would be great if there is some additional discount for PPP.

    • @TrelisResearch
      @TrelisResearch  1 month ago

      @@subhamkundu5043 yes, it's already there in that case.
      What I recommend is to just buy pieces of the repos (scripts) if the full repos are too expensive.

  • @mdrafatsiddiqui
    @mdrafatsiddiqui 2 months ago

    Hi Ronan,
    Can you recommend the best resources for algorithmic trading using ML, DL, and AI?
    Also, are you planning to offer a Black Friday Sale or Christmas Discount on your Trelis Advanced Repo?

    • @TrelisResearch
      @TrelisResearch  2 months ago +1

      Howdy! The closest thing I have done is build a neural network from scratch to train on weather forecasts. You can find that live stream and replace temperature with stock price to get a trading-type tool.
      No BF or Christmas discounts. My approach is to keep pricing straightforward and rising over time as I add more content to the products - this way the earlier buyers benefit most.

    • @mdrafatsiddiqui
      @mdrafatsiddiqui 1 month ago

      @@TrelisResearch Thanks. Not going to have buyer's remorse, at least.

  • @sndrstpnv8419
    @sndrstpnv8419 20 days ago

    Why is it better than AWS? Everyone uses AWS.

    • @TrelisResearch
      @TrelisResearch  19 days ago

      a) Can be cheaper
      b) On AWS it can be hard to get allocations of good GPUs unless you are a massive company.