Best OCR Models to Extract Text from Images (EasyOCR, PyTesseract, Idefics2, Claude, GPT-4, Gemini)

  • Published Dec 4, 2024

COMMENTS • 23

  • @SethMitchell-b6m · 1 month ago · +2

    Wow! This was EXACTLY what I was looking for. I had to go on Reddit to find it, lol.

  • @arashpirak · 1 month ago · +1

    Thank you! It was useful 👌

  • @KevinCleppe-gl7hx · 1 month ago · +1

    awesome info!

  • @ihebennine3110 · 1 month ago

    Thank you for the video! For better visualization of the results, I'd suggest scatter plots of accuracy vs. speed (or accuracy vs. cost), where each dot represents a model. Great content, though!
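The accuracy-vs-speed scatter plot suggested above could be sketched with matplotlib roughly as follows. This is a minimal sketch, not the video's method: the helper name `accuracy_vs_speed_plot` and its arguments are hypothetical, and the model names, accuracies, and timings have to come from your own benchmark (none are supplied here).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def accuracy_vs_speed_plot(models, accuracy, seconds_per_image):
    """Hypothetical helper: one dot per OCR model, accuracy on y, time per image on x."""
    if not (len(models) == len(accuracy) == len(seconds_per_image)):
        raise ValueError("models, accuracy and seconds_per_image must have the same length")
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(seconds_per_image, accuracy)
    # Label each dot with its model name, nudged so the text does not cover the marker.
    for name, x, y in zip(models, seconds_per_image, accuracy):
        ax.annotate(name, (x, y), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel("Time per image (s)")
    ax.set_ylabel("Accuracy")
    ax.set_title("OCR models: accuracy vs. speed")
    return fig, ax
```

Passing cost per image instead of `seconds_per_image` (and relabelling the x-axis) gives the accuracy-vs-cost variant; `fig.savefig("ocr_tradeoff.png")` writes the chart to disk.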

  • @AzizKerimzhanov-ho3cb · 1 month ago · +1

    I was doing text recognition on images, specifically serial numbers, and I'd say the most accurate and consistent is AWS Textract. I compared GPT-4o-mini, Azure Document AI, and AWS. GPT-4 sometimes misses 1 character out of 16 or adds an extra one so the total becomes 17, and sometimes it outputs 0 instead of the letter D, and so on (this usually happens only in the middle of the serial number; the beginning and the end were fine).
    The images were of pretty good quality (product information), and AWS and Azure read them all correctly. But AWS can also retrieve information based on a customer query (NLP), which makes it better than Azure. I tested on 16 images: Azure and AWS read all of them correctly, whereas GPT-4o-mini read the correct serial number in only 10 out of 17 images. Just in case someone needs it.
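The error types described above (a dropped character, an extra one that turns 16 characters into 17, a D read as 0) are exactly what edit distance counts, so a comparison like this one could report character error rate (CER) alongside exact-match accuracy. A minimal sketch in plain Python; the function names and the serial number `AB12CD34EF56GH78` are hypothetical placeholders, not data from the comment.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (costs nothing if equal)
            ))
        prev = curr
    return prev[-1]


def score_ocr(predictions: list[str], truths: list[str]) -> tuple[float, float]:
    """Return (exact-match accuracy, character error rate) over paired strings."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("need equal-length, non-empty lists")
    exact = sum(p == t for p, t in zip(predictions, truths))
    edits = sum(levenshtein(p, t) for p, t in zip(predictions, truths))
    total_chars = sum(len(t) for t in truths)
    return exact / len(truths), edits / total_chars


truth = "AB12CD34EF56GH78"  # hypothetical 16-character serial number
predictions = [
    truth,                # read correctly
    "AB12C034EF56GH78",   # D read as 0 (one substitution)
    "AB12CD345EF56GH78",  # extra character, 17 long (one insertion)
]
accuracy, cer = score_ocr(predictions, [truth] * 3)
```

With these three placeholder predictions, exact-match accuracy is 1/3 while CER is only 2/48, which is why the two metrics can tell quite different stories for near-miss serial numbers.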

    • @kevinwoodrobotics · 1 month ago · +1

      Thanks for sharing your experience! It's definitely useful to hear about these use cases.

    • @rafaeel731 · 19 days ago

      GPT-4o mini also uses way too many tokens; I wouldn't recommend it for vision.

    • @AzizKerimzhanov-ho3cb · 19 days ago

      @rafaeel731 It took me around 1,300 input tokens, but the output is only around 20 tokens, so approximately 0.0015 per processed image.
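The per-image figure above comes from multiplying each token count by its per-token price and summing. A minimal sketch; the comment does not say which prices were used, so the per-million-token rates are parameters to fill in from the provider's current price list, and the helper name `cost_per_image` is hypothetical.

```python
def cost_per_image(input_tokens: int, output_tokens: int,
                   usd_per_1m_input: float, usd_per_1m_output: float) -> float:
    """Cost of one OCR request, given token counts and prices quoted per million tokens."""
    input_cost = input_tokens * usd_per_1m_input / 1_000_000
    output_cost = output_tokens * usd_per_1m_output / 1_000_000
    return input_cost + output_cost
```

With the comment's counts, `cost_per_image(1300, 20, p_in, p_out)`, the input side outweighs the output side unless output tokens are priced more than 65 times higher than input tokens (1,300 / 20 = 65), so trimming image tokens is where the savings are.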

  • @AmphibianDev · 1 month ago · +1

    What about the Llama vision model?

  • @rafaeel731 · 19 days ago

    I find Gemini Flash 1.5 8B to be particularly good. It's a very new model. Plus, you can't really run an LLM locally, however small, unless it's at least open source.

  • @ahmedsaliem7041 · 1 month ago · +1

    Valid for Arabic text

  • @maniksarmaal · 1 month ago

    What about PaddleOCR??

  • @tikkivolta2854 · 22 days ago

    Very useful. Unfortunately, EasyOCR isn't able to crack a well-scanned gas station slip. It just can't.

    • @kevinwoodrobotics · 22 days ago

      Oh man. Anything that worked for you?

    • @tikkivolta2854 · 22 days ago

      @kevinwoodrobotics I tried a tuned EasyOCR model with at least 6 different settings: greyscale, blur, contrast, and whatnot. I had no idea OCR is the endgame. What we need is an open-source Sonnet. No product out there can handle this (the worst is the much-praised ABBYY FineReader); I tried them all. It turns out our brain is still the best at deciphering handwriting. In the meantime, I'd settle for 65% accuracy and then let an LLM stitch the missing parts together. PDF is the worst invention ever made; it was never meant to transport data efficiently.

    • @rafaeel731 · 19 days ago

      @tikkivolta2854 So ABBYY FineReader was bad? A client uses FlexiCapture and it's horrible; I was wondering whether FineReader is better.
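The approach @tikkivolta2854 describes, trying several preprocessing settings (greyscale, blur, contrast) on the same scan before OCR, can be sketched with Pillow as below. This is a minimal illustration, not the commenter's actual setup: the helper name `preprocessing_variants`, the blur radius, the contrast factor, the 128 binarization threshold, and the sharpen/autocontrast variants are all assumptions to tune per document type.

```python
from PIL import Image, ImageEnhance, ImageFilter, ImageOps


def preprocessing_variants(img: Image.Image) -> dict[str, Image.Image]:
    """Hypothetical helper: several greyscale variants of one scan to try with an OCR engine."""
    gray = ImageOps.grayscale(img)  # 8-bit single-channel ("L") image
    return {
        "gray": gray,
        "gray_blur": gray.filter(ImageFilter.GaussianBlur(radius=1)),  # smooth scanner noise
        "gray_contrast": ImageEnhance.Contrast(gray).enhance(2.0),     # double the contrast
        "gray_autocontrast": ImageOps.autocontrast(gray),              # stretch histogram to 0-255
        "gray_sharpen": gray.filter(ImageFilter.SHARPEN),              # crisper character edges
        # Simple global threshold; faded thermal slips may need a different cutoff.
        "binary": gray.point(lambda v: 255 if v > 128 else 0),
    }
```

Each variant can then be fed to the OCR engine and the outputs compared. For example, EasyOCR's `Reader.readtext` accepts a NumPy array, so `reader.readtext(numpy.asarray(variant), detail=0)` would return just the recognized strings for one variant.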