On-Device LLM Inference at 600 Tokens/Sec.: All Open Source

  • Published Oct 23, 2024

COMMENTS • 21

  • @akj3344 • 6 months ago +2

    For people facing the error "Failed to initialize the task": you have to enable WebGPU, and then the application will work. (See the WebGPU feature-check sketch at the end of this thread.)

    • @toniramchandani • 6 months ago

      Same error here, on a Windows 10 machine; not sure what the problem is. I messaged Sonu on LinkedIn with a screenshot.

    • @akj3344 • 6 months ago

      @toniramchandani I'd suggest using Chrome for this project; your OS doesn't matter. Also check your browser console for errors.

    • @toniramchandani • 6 months ago

      @akj3344 I am using Chrome; I was just updating it on my system.

    • @toniramchandani • 6 months ago

      Hey, it's working now; I had made a very small, silly mistake.
      There is a saying: don't commit small crimes. I did exactly that here 😎😄

    • @kushis4ever • 1 month ago

      What is WebGPU? How do I enable it?
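
For anyone hitting the "Failed to initialize the task" error discussed in this thread: WebGPU is the browser GPU API that MediaPipe's LLM Inference task relies on for GPU model variants. Recent Chrome builds ship it enabled by default; on older builds it can be switched on at chrome://flags/#enable-unsafe-webgpu. A minimal feature-check sketch before creating the task is shown below; the CDN path and model path are placeholders, not from the video, and the navigator.gpu typing assumes @webgpu/types is installed.

```ts
// Minimal sketch: verify WebGPU support before creating the MediaPipe
// LLM Inference task (assumes @webgpu/types for the navigator.gpu typing).
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

export async function initLlm(): Promise<LlmInference> {
  // navigator.gpu is only defined when the browser exposes WebGPU.
  if (!("gpu" in navigator)) {
    throw new Error(
      "WebGPU unavailable: update Chrome or enable chrome://flags/#enable-unsafe-webgpu"
    );
  }
  // requestAdapter() resolves to null when no usable GPU is found.
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    throw new Error("WebGPU is exposed, but no GPU adapter was found");
  }

  const genai = await FilesetResolver.forGenAiTasks(
    // CDN location of the MediaPipe GenAI wasm files.
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
  );
  // modelAssetPath is a placeholder; point it at your downloaded model file.
  return LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: "/models/gemma-2b-it-gpu-int4.bin" },
  });
}
```

If requestAdapter() returns null, the browser exposes WebGPU but cannot find a usable GPU, which produces the same initialization failure as having the API disabled.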

  • @janux. • 6 months ago +1

    It works on desktop, but not in Android Chrome; is there any way to get it working? I tried spinning up a server in both Flask and Node Express, in Termux and also in the Pydroid IDE. Please advise.

  • @janux. • 5 months ago

    Hello, great tutorial. A quick question: is it possible to do function calling with this LLM, which runs on MediaPipe inference, by creating a custom LLM with LangChain or another framework?
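
On the custom-LLM route: small on-device models like this do not expose native function calling, but the MediaPipe task can be wrapped as a LangChain JS custom LLM and tool use driven from prompts. A minimal sketch follows, assuming an already-initialized LlmInference instance; the class name MediaPipeLLM is made up for illustration.

```ts
// Sketch: wrap the MediaPipe LLM Inference task as a LangChain custom LLM
// so it can be dropped into LangChain chains and agents.
import { LLM } from "@langchain/core/language_models/llms";
import type { LlmInference } from "@mediapipe/tasks-genai";

// MediaPipeLLM is a hypothetical name, not from the video.
class MediaPipeLLM extends LLM {
  constructor(private mp: LlmInference) {
    super({});
  }

  _llmType(): string {
    return "mediapipe-llm-inference";
  }

  // LangChain hands the fully rendered prompt string to _call.
  async _call(prompt: string): Promise<string> {
    // generateResponse resolves with the complete generated text.
    return this.mp.generateResponse(prompt);
  }
}
```

From there, "function calling" would have to be prompt-based, for example asking the model to emit a JSON tool call and parsing it yourself, since the MediaPipe runtime only returns text.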

  • @Epirium • 6 months ago

    Will this stress the client side? I mean, when I tested it in my browser, the browser hung for 2 to 4 seconds and then was fine.
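
The brief freeze is most likely model loading and graph initialization blocking the main thread. One common mitigation, not shown in the video, is to initialize and run the task inside a module Web Worker, assuming WebGPU is available in the worker context (it is in recent Chrome). A rough sketch; the file name and model path are placeholders.

```ts
// worker.ts: runs MediaPipe LLM inference off the main thread so model
// loading does not freeze the page.
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

let llm: LlmInference | undefined;

self.onmessage = async (e: MessageEvent<string>) => {
  if (!llm) {
    const genai = await FilesetResolver.forGenAiTasks(
      "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
    );
    llm = await LlmInference.createFromOptions(genai, {
      baseOptions: { modelAssetPath: "/models/gemma-2b-it-gpu-int4.bin" },
    });
  }
  // Send the finished completion back to the page.
  self.postMessage(await llm.generateResponse(e.data));
};
```

The page side would create it with new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }) and exchange prompt and response strings via postMessage.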

  • @sheikhakbar2067 • 6 months ago

    Could anyone recommend an inference solution for fine-tuned models? Yes, I've heard of Groq, but I want to make use of the models I fine-tuned. I have access to Google Colab Pro, and I intend to use something other than Colab for production. The models I've fine-tuned are based on Llama-3-8B using the Unsloth library. I tried vLLM, but it crashed.

  • @Vvvv-oh6pe • 6 months ago

    Hey buddy, does it run on just the CPU?

  • @fakhrullo5836 • 6 months ago

    It keeps saying "Failed to initialize the task." I tried both the GPU and the CPU models; the same error keeps repeating.

  • @HeRaUSA • 6 months ago

    None of these LLMs can create an output form where product-description update information can be written.

    • @HeRaUSA • 6 months ago

      If you can do this, I will pay you.

  • @snehaarora-f8y • 6 months ago +1

    Can we use it as a "talk with my PDF" tool? I have a PDF with 5000+ pages.

    • @KhajaMD143 • 6 months ago

      That's a better use case for Google Gemini.

    • @NLPprompter • 6 months ago

      Try Ollama + AnythingLLM with Gemma or another small model, but I don't think a 5000+ page PDF can be done in one shot, since... token limit... (see the retrieval sketch after this thread).
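
One way around the context-window limit mentioned above is retrieval: embed the PDF's text chunks, keep only the few most similar to the question, and send just those to the LLM. A minimal sketch using MediaPipe's TextEmbedder follows; the embedder model path, CDN path, and the fixed-size chunking are placeholder assumptions, not from the video, and the PDF text is assumed to be extracted already.

```ts
// Sketch: naive retrieval over pre-extracted PDF text so only the most
// relevant chunks are sent to the on-device LLM (sidesteps the token limit).
import { FilesetResolver, TextEmbedder } from "@mediapipe/tasks-text";
import type { LlmInference } from "@mediapipe/tasks-genai";

async function askPdf(llm: LlmInference, pdfText: string, question: string) {
  const textFiles = await FilesetResolver.forTextTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-text/wasm"
  );
  // The embedder model path is a placeholder assumption.
  const embedder = await TextEmbedder.createFromOptions(textFiles, {
    baseOptions: { modelAssetPath: "/models/universal_sentence_encoder.tflite" },
  });

  // Naive fixed-size chunking; a real pipeline would chunk by page or section.
  const chunks = pdfText.match(/[\s\S]{1,1000}/g) ?? [];
  const qEmb = embedder.embed(question).embeddings[0];

  // Score every chunk by cosine similarity to the question; keep the top 3.
  const top = chunks
    .map((c) => ({
      c,
      score: TextEmbedder.cosineSimilarity(embedder.embed(c).embeddings[0], qEmb),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map((x) => x.c);

  // Stuff only the retrieved chunks into the prompt, staying under the limit.
  return llm.generateResponse(
    `Answer using only this context:\n${top.join("\n---\n")}\n\nQ: ${question}`
  );
}
```

Embedding thousands of chunks up front is slow in the browser, so for a 5000+ page PDF the embeddings would realistically be precomputed once and cached, for example in IndexedDB.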

  • @user4-j1w • 6 months ago

    You are fast, bro.

  • @laboninath • 4 months ago

    Sort