👏 I'm glad to see you're focusing on DevOps options for AI apps. In my opinion, LlamaCpp will remain the best way to launch a production LLM server. One notable feature is its built-in support for concurrent requests. Using the `-np 4` (or `--parallel 4`) flag runs 4 slots in parallel, where 4 can be any number of concurrent requests you want. One thing to remember: the total context window is divided evenly across the slots. For example, if you pass `-c 4096` with `-np 4`, each slot gets a context size of 1024. Adding the `--n-gpu-layers` (`-ngl 99`) flag offloads the model layers to your GPU, giving the best performance. So a command like `-c 4096 -np 4 -ngl 99` will offer excellent concurrency on a machine with a 4090 GPU.
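For reference, here is a minimal sketch of how those flags might be combined into a full `llama-server` invocation; the model path, host, and port are assumptions, not from the comment above:

```sh
# Assumed model path and port; the flags mirror the comment above.
# -c 4096  : total context window, split evenly across slots (1024 per slot here)
# -np 4    : 4 parallel slots, i.e. 4 requests served concurrently
# -ngl 99  : offload all model layers to the GPU
./llama-server -m ./models/model.gguf -c 4096 -np 4 -ngl 99 --host 0.0.0.0 --port 8080
```

Once it's up, clients can hit the server's OpenAI-compatible `/v1/chat/completions` endpoint, and with `-np 4` up to four requests are processed at the same time.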
Mozilla's Llamafile format is very flexible for deploying LLM(s) across operating systems. NIM has the advantage of bundling other types of models like audio or video.
If you want to build robust RAG applications based on your own datasets, this is for you: prompt-s-site.thinkific.com/courses/rag
Amazing, thanks!
Can we finetune it using LoRA? I need it to be about AI, so I have downloaded data about AI and I want to add it to this model.
Which server specs did you use?
It's running on an A6000 with 48GB VRAM. Hope that helps.
Bro, I wanna ask, do I need to use a GPU to run this?
No, llama-cpp specifically enables LLMs on CPUs. It's just going to be a bit slow, mate.
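For anyone curious, a minimal sketch of a CPU-only launch (the model path and thread count are assumptions): leaving out `-ngl`, or setting it to 0, keeps every layer on the CPU, and `-t` controls how many CPU threads are used.

```sh
# Assumed model path; -ngl 0 keeps all layers on the CPU,
# -t 8 uses 8 CPU threads (tune this to your core count).
./llama-server -m ./models/model.gguf -c 2048 -ngl 0 -t 8
```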
@sadsagftrwre Okay, thanks for the answer. I just want to try it but was afraid it wouldn't work without a GPU.
@andreawijayakusuma6008 I tried it on CPU and it worked.