Try llama.cpp with alpaca-lora-30B-ggml

  • Published 29 Jan 2025

COMMENTS • 13

  • @BB-ko3fh · A year ago · +1

    Is there a particular reason why they ported the model to C++ (newbie question), other than to make the model smaller?

    • @TMichael66 · A year ago · +2

      C++ allows the entire model to be loaded into regular RAM. This is helpful for those of us without beefy GPUs.

    • @JackLin-ct3wx · A year ago

      There's a 7B model, which takes up only about 4 GB of memory. But I wasn't sure whether 7B would work at the time, because there was a breaking change in this project. So the authors didn't just run the models in C++, they also made them smaller (see the quantization sketch after this thread).

    • @Phasma6969 · A year ago

      Because otherwise you have to use a GPU, which uses its own RAM (VRAM) rather than system RAM. You can also get more system RAM for less money than multiple GPUs, and most consumer GPUs have very little VRAM, 4-8 GB on average, which usually isn't enough. That said, GPU inference is much, much faster than CPU inference, since you get parallel compute (and can afford higher floating-point precision) for the next-token predictions.

    • @samas69420 · A year ago

      Everyone is saying it's because this way you can load the model into regular RAM, but if I'm not mistaken PyTorch already has that feature, so you don't need to reimplement everything in C++ if you only care about where the model is loaded. I think the real difference is that you need to reimplement things when you want custom protocols or formats (like the ggml format here) and to control how they're managed at a low level for more efficiency, so I guess that's the main reason (see the file-header sketch after this thread).
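
To make the ~4 GB figure mentioned above concrete, here is a minimal C++ sketch of block-wise 4-bit quantization in the spirit of ggml's Q4 formats. The block size, scale rule, and struct layout are simplifying assumptions for illustration, not the exact llama.cpp file layout.

    // Sketch: quantize 32-float blocks to 4 bits each with one per-block scale.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    constexpr int QK = 32; // values per block (assumption; ggml uses 32 in Q4_0)

    struct BlockQ4 {
        float   d;           // per-block scale
        uint8_t q[QK / 2];   // 32 quants, two 4-bit values per byte
    };

    BlockQ4 quantize_block(const float *x) {
        float amax = 0.0f;
        for (int i = 0; i < QK; i++) amax = std::max(amax, std::fabs(x[i]));
        BlockQ4 b{};
        b.d = amax / 7.0f;                         // map [-amax, amax] onto [-7, 7]
        const float id = b.d != 0.0f ? 1.0f / b.d : 0.0f;
        for (int i = 0; i < QK; i += 2) {
            const int q0 = (int)std::round(x[i]     * id) + 8; // bias into 0..15
            const int q1 = (int)std::round(x[i + 1] * id) + 8;
            b.q[i / 2] = (uint8_t)(q0 | (q1 << 4));
        }
        return b;
    }

    int main() {
        float x[QK];
        for (int i = 0; i < QK; i++) x[i] = std::sin((float)i); // dummy weights
        const BlockQ4 b = quantize_block(x);
        std::printf("block scale d = %f, first packed byte = 0x%02x\n", b.d, b.q[0]);

        // Why 7B fits in roughly 4 GB: 20 bytes per 32 weights vs 2 bytes each at fp16.
        const double fp16_gb = 7e9 * 2.0 / 1e9;
        const double q4_gb   = 7e9 * ((double)sizeof(BlockQ4) / QK) / 1e9;
        std::printf("7B at fp16: ~%.1f GB, 7B at 4-bit blocks: ~%.1f GB\n", fp16_gb, q4_gb);
        return 0;
    }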
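And a short sketch of the low-level loading work a custom format implies, as the last reply suggests: reading a ggml-style header straight from a binary file. The HParams fields here are illustrative assumptions; only the 0x67676d6c ('ggml') magic matches the early llama.cpp files.

    // Sketch: parse a hypothetical ggml-style model header from disk.
    #include <cstdint>
    #include <cstdio>

    struct HParams { // hypothetical hyperparameter block, not the real layout
        int32_t n_vocab, n_embd, n_head, n_layer;
    };

    int main(int argc, char **argv) {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s model.bin\n", argv[0]);
            return 1;
        }
        std::FILE *f = std::fopen(argv[1], "rb");
        if (!f) {
            std::perror("fopen");
            return 1;
        }
        uint32_t magic = 0;
        HParams hp{};
        if (std::fread(&magic, sizeof magic, 1, f) != 1 ||
            std::fread(&hp, sizeof hp, 1, f) != 1) {
            std::fprintf(stderr, "short read\n");
            std::fclose(f);
            return 1;
        }
        if (magic != 0x67676d6c) { // 'ggml' magic used by the early files
            std::fprintf(stderr, "unknown magic: maybe an older/newer format revision\n");
            std::fclose(f);
            return 1;
        }
        std::printf("n_vocab=%d n_embd=%d n_head=%d n_layer=%d\n",
                    hp.n_vocab, hp.n_embd, hp.n_head, hp.n_layer);
        // Tensor data follows in the real files; llama.cpp can mmap() it so the
        // OS pages the weights into regular RAM on demand.
        std::fclose(f);
        return 0;
    }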

  • @csabaczcsomps7655 · A year ago

    How do you quit the chat? I asked the AI and it said Ctrl+T, but that doesn't work. In the end I just closed the prompt window, but I think there must be a way to quit somehow?

    • @JackLin-ct3wx · A year ago

      Just press Ctrl+C two or three times (in case the prompt doesn't catch it the first time); it sends the interrupt signal (SIGINT) on Linux (see the sketch below).
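
A minimal C++ sketch of the usual pattern behind this advice: the first Ctrl+C (SIGINT) sets a flag that interrupts the current generation, and a repeated Ctrl+C exits the process. This shows the general technique, not llama.cpp's exact code.

    // Sketch: first SIGINT interrupts generation, second SIGINT exits.
    #include <atomic>
    #include <chrono>
    #include <csignal>
    #include <cstdio>
    #include <cstdlib>
    #include <thread>

    std::atomic<bool> stop_generation{false};
    std::atomic<int>  sigint_count{0};

    extern "C" void on_sigint(int) {
        stop_generation.store(true);
        if (sigint_count.fetch_add(1) >= 1) {
            std::_Exit(130); // 128 + SIGINT(2): conventional exit code
        }
    }

    int main() {
        std::signal(SIGINT, on_sigint);
        std::puts("generating... Ctrl+C once to interrupt, twice to quit");
        for (int tok = 0; tok < 200; ++tok) {
            // A real loop would emit model tokens here; we poll the flag between them.
            if (stop_generation.exchange(false)) {
                sigint_count.store(0);
                std::puts("\n[interrupted -- back at the prompt]");
                break;
            }
            std::fputc('.', stdout);
            std::fflush(stdout);
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
        return 0;
    }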

  • @joakimjocka8022 · A year ago

    Did you slow down this video?

  • @misterxxxxxxxx · A year ago

    How did you transform the model? (.tmp?) I get a "too old, regenerate your model files or convert" error when trying to use it...

    • @JackLin-ct3wx · A year ago

      I followed the comment github.com/ggerganov/llama.cpp/issues/382#issuecomment-1479091459 to transform the model.

    • @JackLin-ct3wx · A year ago

      But I've noticed there are some newer alpaca-lora projects with a more user-friendly setup, like github.com/nomic-ai/gpt4all. Maybe you can try that.

  • @Patrick-rj8gh · A year ago

    Your computer is slow.