Wow! this was EXACTLY what I was looking for. Took me going on reddit to find it lol
Glad to hear!
Thank you! It was useful 👌
Glad to hear
awesome info!
Thanks!
Thank you for the video! For better visualization of the results, I'd suggest scatter plots: accuracy vs. speed (or accuracy vs. cost), where each dot represents a model. Great content though!
Yes great idea!
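A minimal sketch of the scatter plot suggested above, assuming matplotlib is installed. The model names and numbers are placeholders, not measurements from the video, and `ModelResult`, `to_xy` and `plot_tradeoff` are hypothetical helper names.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    name: str
    accuracy: float           # fraction of correctly read samples, 0..1
    seconds_per_image: float  # average latency per image


def to_xy(results):
    """Split results into x values (speed), y values (accuracy) and labels."""
    xs = [r.seconds_per_image for r in results]
    ys = [r.accuracy for r in results]
    labels = [r.name for r in results]
    return xs, ys, labels


def plot_tradeoff(results, path="accuracy_vs_speed.png"):
    """Save a scatter plot with one labelled dot per model."""
    # Imported here so the data helpers above work without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs, ys, labels = to_xy(results)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    for x, y, label in zip(xs, ys, labels):
        ax.annotate(label, (x, y), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel("Seconds per image (lower is better)")
    ax.set_ylabel("Accuracy (higher is better)")
    ax.set_title("OCR models: accuracy vs. speed")
    fig.savefig(path)
    plt.close(fig)


# Placeholder values only; replace with your own benchmark results.
results = [
    ModelResult("model_a", 0.95, 1.8),
    ModelResult("model_b", 0.88, 0.4),
    ModelResult("model_c", 0.70, 0.1),
]
```

Calling `plot_tradeoff(results)` writes the image to disk. For accuracy vs. cost, swap `seconds_per_image` for a cost field.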
I was doing text recognition on images, namely serial numbers, and would say that the most accurate and consistent is AWS Textract. I was comparing GPT-4o mini, Azure Document AI and AWS. GPT-4o mini sometimes misses one digit out of 16 or adds an extra one so the total becomes 17, and sometimes it puts a 0 instead of the letter D, and so on (usually this only happens in the middle of the serial number; the beginning and the end were OK).
The images were of pretty good quality (product information), and AWS and Azure detected them all correctly. But AWS can also retrieve info based on a customer's natural-language query, which is an advantage over Azure. I tested on 16 images: Azure and AWS detected all of them correctly, whereas GPT-4o mini detected the correct serial number in only 10 out of 17 images. Just in case someone needs it.
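The failure modes described above (a dropped or extra character, D read as 0) can be caught with a length check and a character-level diff against a known-good value. This is only a sketch using Python's standard `difflib`. The 16-character length comes from the comment, while the function names and the sample serial number are made up for illustration.

```python
from difflib import SequenceMatcher

EXPECTED_LENGTH = 16  # serial number length mentioned in the comment above


def has_expected_length(text, expected=EXPECTED_LENGTH):
    """Cheap sanity check: flags reads with a dropped or extra character."""
    return len(text) == expected


def diff_against_truth(ocr_text, truth):
    """List the differences as (operation, truth_part, ocr_part) tuples."""
    matcher = SequenceMatcher(None, truth, ocr_text, autojunk=False)
    return [
        (tag, truth[i1:i2], ocr_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Made-up serial number, not taken from the commenter's data.
truth = "AB12CD34EF56GH78"
substituted = "AB12C034EF56GH78"  # letter D read as digit 0
dropped = "AB12CD3EF56GH78"       # one digit missing, 15 characters

for read in (substituted, dropped):
    print(read, has_expected_length(read), diff_against_truth(read, truth))
```

The length check needs no ground truth, so it can run in production. The diff is mainly useful when scoring a model against a labelled test set.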
Thanks for sharing your experience! Definitely useful to hear these use cases
GPT-4o mini uses way too many tokens too, wouldn't recommend it for vision
@rafaeel731 It took me around 1,300 tokens for input per image, but the output is only around 20 tokens, so approximately 0.0015 per image processed.
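For anyone who wants to redo this estimate with their own numbers, here is a rough sketch of the arithmetic. The token counts are the ones quoted above. The per-million-token prices are placeholders, not real rates for any provider, so check the current pricing page before relying on the result. `cost_per_image` is a hypothetical helper name.

```python
def cost_per_image(input_tokens, output_tokens,
                   input_price_per_million, output_price_per_million):
    """Cost of one request, given prices per one million tokens."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Token counts from the comment above; prices are PLACEHOLDERS.
INPUT_TOKENS = 1300
OUTPUT_TOKENS = 20
PLACEHOLDER_INPUT_PRICE = 1.00   # currency units per 1M input tokens (assumed)
PLACEHOLDER_OUTPUT_PRICE = 2.00  # currency units per 1M output tokens (assumed)

per_image = cost_per_image(INPUT_TOKENS, OUTPUT_TOKENS,
                           PLACEHOLDER_INPUT_PRICE, PLACEHOLDER_OUTPUT_PRICE)
print(f"{per_image:.5f} per image, {per_image * 1000:.2f} per 1000 images")
```

With so few output tokens, the input side (mostly the image itself) dominates the cost.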
How about the Llama vision model?
Would be cool to test
I find Gemini Flash 1.5 8B to be particularly good. It's a very new model. Plus you can't really run any LLM locally even if small; it has to be open source at least.
Oh good to know!
Valid for Arabic text
What about PaddleOCR??
Yes I heard it’s good. Will evaluate it
Very useful. Unfortunately EasyOCR isn't able to crack a well-scanned gas station slip. It just can't.
Oh man. Anything that worked for you?
@kevinwoodrobotics Tried a tuned EasyOCR model with at least 6 different settings: greyscale, blur, contrast, whatnot. Had no idea OCR is the endgame. So we need an open-source Sonnet. No product out there (the worst is the much-praised ABBYY FineReader) can handle this. Tried them all. Turns out our brain is still the best at deciphering handwriting. In the meantime I'd go for 65% accuracy and then let an LLM stitch the missing parts together. PDF is the worst invention ever made; it was never meant to transport data efficiently.
@tikkivolta2854 So ABBYY FineReader was bad? A client uses FlexiCapture and it is horrible, so I'm wondering if FineReader is any better.