Can the Ollama API be slower than the CLI?

  • Published Sep 10, 2024

COMMENTS • 15

  • @eldaria • 9 days ago

    I came to your video searching for this, and it makes so much sense, thanks.

  • @albertbozesan • 24 days ago

    This is such a specific question, I never thought I'd get such a good video about it! Thanks!

  • @vexy1987 • 23 days ago

    I have been having this "issue" with Open WebUI. Thanks for clearing that up; I thought I had something set up incorrectly.

  • @fabriai • 26 days ago

    As usual, wonderful video, Matt. This is exactly the kind of question I ask myself when learning Ollama. And here's the answer on a silver platter.

  • @SlykeThePhoxenix • 26 days ago +1

    So I made a Discord bot that streams from the Ollama API to Discord. Discord has an API rate limit of about 3 requests per second, so I had to buffer all the stream payloads in memory and flush them to Discord in chunks (a sketch of this buffering approach follows the thread below). I did this in Node-RED if anyone wants the code. I had to reimplement the HTTP client from the TCP layer up to support streaming, but I've wrapped it up nicely into a single function node. I should also mention that it supports multiple concurrent conversations (without mixing up the streams).

    • @technovangelist • 26 days ago

      If you are getting rate limited at 3/s, your code is probably doing something wrong. Discord allows 50 requests/sec. I guess unless you have lots of bots.

    • @SlykeThePhoxenix • 26 days ago +1

      @technovangelist This is the only bot I have on my test server, lol. It's either Discord or the Node-RED plugin for Discord. It's definitely around 3/s. It's possible that it's because it's an unapproved bot I just use for testing, and the rate limit is lifted once your bot is approved (this could be to help prevent abuse).
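
    [Editor's note] A minimal TypeScript sketch of the buffering approach described above, not the commenter's Node-RED flow: it streams from Ollama's /api/generate endpoint and flushes accumulated text to a rate-limited consumer on a fixed interval. The model name, flush interval, and sink function are illustrative assumptions.

    ```typescript
    // Minimal sketch: buffer Ollama's streamed tokens in memory and flush
    // them to a rate-limited sink on a fixed interval.
    // Assumes Node 18+ (global fetch, async-iterable response bodies).

    const OLLAMA_URL = "http://localhost:11434/api/generate"; // Ollama's default port
    const FLUSH_INTERVAL_MS = 1000; // one flush per second stays under ~3 req/s

    async function streamBuffered(
      prompt: string,
      sink: (text: string) => Promise<void>, // hypothetical rate-limited consumer
    ): Promise<void> {
      const res = await fetch(OLLAMA_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "llama3", prompt, stream: true }),
      });
      if (!res.ok || !res.body) throw new Error(`request failed: ${res.status}`);

      let buffer = "";
      // Forwarding every token would trip the rate limit; instead, flush
      // whatever has accumulated once per interval.
      const timer = setInterval(() => {
        if (!buffer) return;
        const out = buffer;
        buffer = "";
        void sink(out);
      }, FLUSH_INTERVAL_MS);

      const decoder = new TextDecoder();
      let pending = ""; // partial JSON line carried across chunks
      try {
        // Node's fetch body is async iterable; Ollama streams one JSON object per line.
        for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
          pending += decoder.decode(chunk, { stream: true });
          const lines = pending.split("\n");
          pending = lines.pop() ?? ""; // keep the incomplete trailing line
          for (const line of lines) {
            if (!line.trim()) continue;
            const msg = JSON.parse(line) as { response?: string };
            if (msg.response) buffer += msg.response;
          }
        }
      } finally {
        clearInterval(timer);
        if (buffer) await sink(buffer); // flush the tail
      }
    }
    ```

    Keeping one buffer and timer per conversation (e.g. keyed by Discord channel ID) would support concurrent conversations without mixing up the streams.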

  • @twilkpsu • 26 days ago

    Great educational content. Bravo! 🎉🎉🎉

  • @YeryBytes • 25 days ago

    Can you explain why the Windows executable is significantly slower than running in WSL? I also found that running Ollama in Docker with the WSL2 backend is faster than just running in WSL. Why!?

    • @technovangelist • 25 days ago

      Running natively on Windows is in most cases 10-15% faster than using WSL. If it's not, there is something wrong with the install.

  • @UnwalledGarden • 26 days ago

    Keep up the great myth busting.

  • @romulopontual6254 • 27 days ago

    When accessing Ollama via the API, can we set keep alive to forever? If so, would it prevent the API from later switching models?

    • @technovangelist • 27 days ago +6

      You can set it to -1, which will keep the model in memory until you run out of memory or change models (see the sketch below).
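
    [Editor's note] For reference, a minimal sketch of such a request; the endpoint and model name are just illustrative defaults. `keep_alive` also accepts duration strings like "10m", and -1 disables the idle unload.

    ```typescript
    // Minimal sketch, assuming Ollama's default local endpoint.
    // keep_alive: -1 asks Ollama to keep the model loaded indefinitely.
    const res = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "llama3", // illustrative model name
        prompt: "Why is the sky blue?",
        stream: false,
        keep_alive: -1, // never unload on idle; loading another model can still displace it
      }),
    });
    console.log(((await res.json()) as { response: string }).response);
    ```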

  • @yuvrajkukreja9727 • 10 days ago

    Any sample code?