Document Querying with Qwen2-VL-7B and JSON Output

  • Published 8 Nov 2024

COMMENTS • 20

  • @kenchang3456 • 1 month ago

    That's impressive accuracy, thanks for showing this. I wonder how it would do if I wanted to add fields that are use-case specific? I'll have to give it a try for sure. Thanks again.

  • @harrykekgmail • 1 month ago

    Fantastic! Thanks very much

  • @hadyanpratama • 23 days ago

    Hi, thank you for your amazing video. Do you know how to fine-tune Qwen2 for this use case using our own dataset? Thanks!

    • @AndrejBaranovskij • 22 days ago

      Hi, this may be an unpopular opinion, but I believe in most cases fine-tuning is not required. The Qwen2-VL model is general enough to handle various use cases out of the box.

  • @kareemyoussef2304 • 1 month ago

    How would this handle a PDF consisting of images/diagrams, e.g. technical documentation?

    • @AndrejBaranovskij • 1 month ago

      You can try it yourself using the sample HF Space for this model: huggingface.co/spaces/GanymedeNil/Qwen2-VL-7B

  • @hsnavas • 1 month ago

    Which OCR do you recommend using along with this model for handwritten data extraction? I used Tesseract, but the results are not promising.

    • @AndrejBaranovskij • 1 month ago

      The Qwen2 vision LLM handles OCR out of the box; you don't need a separate OCR engine.

    • @hsnavas • 1 month ago

      @AndrejBaranovskij thank you. So if I need to do handwritten text extraction, how can we achieve that? Do we need to use an OCR engine, or will it be handled out of the box?

    • @hsnavas • 1 month ago

      I would also like to know if I can train this model with handwritten docs. I can share a few docs if required.

    • @AndrejBaranovskij • 1 month ago

      @hsnavas It should work out of the box with the vision LLM, as described in this video.

    • @AndrejBaranovskij • 1 month ago

      @hsnavas Normally you don't need to train the vision LLM; it already knows how to recognize handwritten text (a minimal sketch follows below).
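
      A minimal sketch of the out-of-the-box approach, using the transformers library and the qwen_vl_utils helper package from the Qwen2-VL documentation; the image file name and prompt wording are illustrative assumptions, not taken from the video:

      # Minimal sketch: handwritten text extraction with Qwen2-VL-7B-Instruct,
      # with no separate OCR step. The image path and prompt are illustrative.
      import torch
      from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
      from qwen_vl_utils import process_vision_info

      model = Qwen2VLForConditionalGeneration.from_pretrained(
          "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
      )
      processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

      messages = [{
          "role": "user",
          "content": [
              {"type": "image", "image": "handwritten_note.jpg"},  # hypothetical file
              {"type": "text", "text": "Transcribe the handwritten text in this image and return it as JSON."},
          ],
      }]

      # Build model inputs from the chat template and the referenced image.
      text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
      image_inputs, video_inputs = process_vision_info(messages)
      inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                         padding=True, return_tensors="pt").to(model.device)

      generated_ids = model.generate(**inputs, max_new_tokens=1024)
      # Trim the prompt tokens before decoding the generated answer.
      trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
      print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])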

  • @cristiantironi296 • 1 day ago

    Hey, great video! I always have the problem that my Colab runs out of memory, even running on an A100. I also tried your notebook, but it always fails at the same point:
    # Inference: Generation of the output
    generated_ids = model.generate(**inputs, max_new_tokens=1024)
    Do you know any solution?

    • @AndrejBaranovskij • 20 hours ago

      Hey, I was facing this issue when the input image resolution was too big. It works better when the image is resized to max_width=1250, max_height=1750 (a resize sketch follows below).
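
      A minimal sketch of that resize step, assuming Pillow; the function name and file name are hypothetical, while the size limits follow the values mentioned above:

      # Minimal sketch: cap the input image resolution before inference,
      # preserving aspect ratio, to reduce memory usage.
      from PIL import Image

      def resize_for_inference(path, max_width=1250, max_height=1750):
          image = Image.open(path)
          # thumbnail() shrinks in place, keeps the aspect ratio, never upscales.
          image.thumbnail((max_width, max_height), Image.LANCZOS)
          return image

      image = resize_for_inference("invoice_page.jpg")  # hypothetical file name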

    • @cristiantironi296 • 18 hours ago

      @AndrejBaranovskij Thank you very much. I had to split the RAG pipeline to retrieve the page number in one iteration, and then apply the retrieved image and text to the vision LLM to generate the answer... I had to resize to max_width=600, max_height=800, and I was still using 33 of the 40 available GB of RAM. Do you know how I can reduce the RAM usage? Thanks a lot.

    • @AndrejBaranovskij • 16 hours ago

      @cristiantironi296 I don't know about reducing RAM usage. But in general, I always try to use one iteration only: get all the page data with the vision LLM, then process that data without the LLM, using my own code. For a multipage doc, I split it into pages, process each page separately, and merge the results afterwards (a sketch of this flow follows below).
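
      A minimal sketch of that page-by-page flow, assuming pdf2image for rasterizing the PDF; query_page() is a hypothetical helper wrapping the Qwen2-VL inference call, and the merge step is illustrative:

      # Minimal sketch: split a multipage PDF into page images, query each page
      # separately with the vision LLM, then merge the per-page JSON results in code.
      import json
      from pdf2image import convert_from_path  # requires poppler to be installed

      def process_document(pdf_path, query_page):
          # query_page(image) is a hypothetical helper that runs Qwen2-VL on one
          # page image and returns a JSON string with the extracted fields.
          pages = convert_from_path(pdf_path, dpi=150)
          results = []
          for page_number, image in enumerate(pages, start=1):
              image.thumbnail((1250, 1750))  # cap resolution to limit memory use
              fields = json.loads(query_page(image))
              fields["page"] = page_number
              results.append(fields)
          # Merge per-page results without the LLM, using plain Python code.
          return results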

  • @harunulrasheedshaik5879 • 1 month ago

    Could you please share the invoice document?

    • @AndrejBaranovskij • 1 month ago

      The sample doc is inside the Sparrow repo: github.com/katanaml/sparrow/tree/main/sparrow-ml/llm/data