
Deploy Open LLMs with LLAMA-CPP Server

  • Published Aug 19, 2024

COMMENTS • 12

  • @engineerprompt
    @engineerprompt  2 months ago

    If you want to build robust RAG applications based on your own datasets, this is for you: prompt-s-site.thinkific.com/courses/rag

  • @unclecode
    @unclecode 2 months ago +3

    👏 I'm glad to see you're focusing on DevOps options for AI apps. In my opinion, LlamaCpp will remain the best way to launch a production LLM server. One notable feature is its support for hardware-level concurrency. Using the `-np 4` (or `--parallel 4`) flag allows running 4 slots in parallel, where 4 can be any number of concurrent requests you want.
    One thing to remember: the context window will be divided across the slots. For example, if you pass `-c 4096` with `-np 4`, each slot gets a context size of 1024. Adding the `--n-gpu-layers` (`-ngl 99`) flag will offload the model layers to your GPU, providing the best performance. So, a command like `-c 4096 -np 4 -ngl 99` will offer excellent concurrency on a machine with a 4090 GPU.
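
    A minimal launch sketch putting those flags together; `llama-server` is the binary name in current llama.cpp builds, and the model path and port here are placeholders:

    ```bash
    # Sketch: launch llama.cpp's HTTP server with 4 parallel slots,
    # fully offloaded to one GPU. The model path is a placeholder.
    # With -c 4096 and -np 4, each slot gets a 4096 / 4 = 1024-token context.
    llama-server \
      -m ./models/model-q4_k_m.gguf \
      -c 4096 \
      -np 4 \
      -ngl 99 \
      --host 0.0.0.0 \
      --port 8080
    ```

    To keep a full 4096-token context per slot at `-np 4`, you would pass `-c 16384` instead.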

  • @johnkost2514
    @johnkost2514 2 months ago

    Mozilla's Llamafile format is very flexible for deploying LLMs across operating systems. NIM has the advantage of bundling other types of models, like audio or video.
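
    For reference, a rough sketch of the llamafile workflow; the file name is a placeholder (a llamafile is a single self-contained executable that bundles the weights with the llama.cpp runtime and runs on macOS, Linux, Windows, and the BSDs):

    ```bash
    # Sketch: download a .llamafile, mark it executable, and run it.
    # The file name is a placeholder; running it starts a local
    # llama.cpp-based server with a chat UI.
    chmod +x ./model.llamafile
    ./model.llamafile
    ```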

  • @Nihilvs
    @Nihilvs 2 months ago +1

    Amazing, thanks!

  • @thecodingchallengeshow
    @thecodingchallengeshow 11 days ago

    Can we finetune it using LoRA? I need it to be about AI, so I have downloaded data about AI and I want to add it to this model.
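
    For context: llama.cpp does not train LoRA adapters itself, but it can apply one at serve time. A hedged sketch, assuming an adapter trained elsewhere (e.g. with PEFT) and converted to GGUF with llama.cpp's `convert_lora_to_gguf.py` script; all paths are placeholders, and the script's `--help` should be checked for the exact arguments:

    ```bash
    # Sketch (paths are placeholders): convert a PEFT LoRA adapter to
    # GGUF, then serve the base model with the adapter applied.
    python convert_lora_to_gguf.py ./my-ai-lora \
      --base ./base-model-hf \
      --outfile my-ai-lora.gguf

    llama-server \
      -m ./models/base-model.gguf \
      --lora ./my-ai-lora.gguf \
      -ngl 99
    ```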

  • @andreawijayakusuma6008
    @andreawijayakusuma6008 1 month ago

    Bro, I wanna ask: do I need to use a GPU to run this?

    • @sadsagftrwre
      @sadsagftrwre 1 month ago

      No, llama-cpp specifically enables LLMs on CPUs. It's just going to be a bit slow, mate.
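
      A minimal CPU-only sketch (the model path is a placeholder): simply omit `-ngl`, so no layers are offloaded, and optionally pin the thread count with `-t`:

      ```bash
      # Sketch: CPU-only serving. Without -ngl, all layers stay on the
      # CPU; -t sets the number of CPU threads. Model path is a placeholder.
      llama-server \
        -m ./models/model-q4_k_m.gguf \
        -c 4096 \
        -t 8
      ```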

    • @andreawijayakusuma6008
      @andreawijayakusuma6008 1 month ago +1

      @@sadsagftrwre Okay, thanks for the answer. I just wanted to try it but was afraid it wouldn't work without a GPU.

    • @sadsagftrwre
      @sadsagftrwre 1 month ago

      @@andreawijayakusuma6008 I tried on CPU and it worked.

  • @marcaodd
    @marcaodd 1 month ago

    What server specs did you use?

    • @engineerprompt
      @engineerprompt  1 month ago

      It's running on an A6000 with 48GB of VRAM. Hope that helps.