OCR Text from PDFs and Image Documents using docTR | Better than Tesseract OCR | Text Extraction

Karndeep Singh

Published Dec 9, 2024
OCR text extraction using docTR. Its OCR output also seems to be better on table data, where Tesseract OCR generally fails to extract the structured content.
docTR GitHub: github.com/min...
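The workflow described above could look roughly like the following minimal sketch. It assumes the python-doctr package and its pretrained ocr_predictor / DocumentFile API, which downloads model weights on first use. ocr_pdf and export_to_text are hypothetical helper names. export_to_text walks the pages → blocks → lines → words nesting of the dictionary returned by result.export() and returns one string per detected line.

```python
from typing import Any, Dict, List


def export_to_text(export: Dict[str, Any]) -> List[str]:
    """Flatten a docTR-style export dict into one string per detected line."""
    lines: List[str] = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                # Join the recognized words of this line with single spaces
                words = [word["value"] for word in line.get("words", [])]
                lines.append(" ".join(words))
    return lines


def ocr_pdf(path: str) -> Dict[str, Any]:
    """Run docTR's pretrained OCR on a PDF and return the exported dict."""
    # Imported lazily so export_to_text works even without docTR installed
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)  # fetches pretrained weights on first use
    doc = DocumentFile.from_pdf(path)       # one image per PDF page
    result = model(doc)
    return result.export()
```

Usage would be something like `for text_line in export_to_text(ocr_pdf("invoice.pdf")): print(text_line)`, where the path is a placeholder. docTR's own result.render() returns a similar plain-text rendering directly.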
✅ Recommended gaming laptops for machine learning and deep learning:
👉 1. HP Pavilion (Ryzen 5 / RTX 3050) - amzn.to/3HM2hI1
👉 2. Asus TUF (Ryzen 7 / RTX 3050) - amzn.to/3sISj5P
👉 3. Acer Nitro 5 (Ryzen 5 / GTX 1650) - amzn.to/3HII8mi
👉 4. Acer Nitro 5 (Intel Core i5 11th Gen / GTX 1650) - amzn.to/3hHBAcN
👉 5. Lenovo Legion 5 (Ryzen 5 / GTX 1650) - amzn.to/3KjpB1r
✅ Best work-from-home utilities to purchase for data scientists:
👉 1. Wi-Fi range extender - amzn.to/3INxUCf
👉 2. Samsung LED monitor (24 inches) - amzn.to/35U8sN3
👉 3. Laptop stand - amzn.to/3KhUzqS
👉 4. Office chair - amzn.to/3IJoiZl
👉 5. Power bank - amzn.to/3IMISrQ
👉 6. Wireless keyboard and mouse (without backlight) - amzn.to/3tthnNC
👉 7. Table lamp - amzn.to/3IJIieg
👉 8. Table - amzn.to/3tv6tXA
👉 9. Mic - amzn.to/35rnzOb
✅ Recommended books to read on machine learning and deep learning:
👉 1. Natural Language Processing - amzn.to/3KhqszI
👉 2. Hands-On Machine Learning with Keras and TensorFlow - amzn.to/3KddeE2
👉 3. Deep Learning with PyTorch - amzn.to/35Lk2Kd
👉 4. Practical Machine Learning for Computer Vision - amzn.to/3HFfaDz
👉 5. Applied Data Science Using PySpark - amzn.to/3sLaV5s
Connect with me on:
1. LinkedIn: / karndeepsingh
2. Telegram Group: telegram.me/da...
3. Github: www.github.com...
#datascience #nlp #deeplearning #documentunderstanding

COMMENTS • 33

@NickWindham 2 years ago ⁺¹
Thanks a lot for sharing this better OCR engine!
@anubhavsrivastav196 2 years ago ⁺¹
Thanks for such an informative video.
@pranay6177 1 year ago ⁺¹
Can docTR OCR be used for commercial purposes?
@ramyas9837 2 years ago ⁺¹
Thanks a lot for sharing this concept.
Can you explain how to train docTR's text detection and recognition models, please?
@josuedegbun6270 7 months ago
Please, can you make a video on how to fine-tune docTR on a custom dataset?
@ramnivasjat6326 2 years ago ⁺¹
I'm not able to read a PDF file.
Error: module 'pypdfium2' has no attribute 'render_pdf_topil'
@robindas9474 2 years ago ⁺¹
You need to downgrade pypdfium2: pip install pypdfium2==1.0.0
@celinesyriac6199 1 year ago ⁺¹
Where can I get the code?
@pratikshapawar-u2i 1 year ago
Hi, please help me.
I got this error: partially initialized module 'doctr.models' has no attribute 'classification' (most likely due to a circular import)
@copaceticobserver 10 months ago
Is there any way to turn the exported JS object/JSON back into a PDF?
@JaiKumar-ds2rq 2 years ago ⁺¹
Do you have a process for getting text from scans of different banks' passbooks, extracting information like account holder name, account number, nominee name, and IFSC code, and saving it in a dataframe?
But remember, all the passbooks have different layouts and different clarity and quality.
@karndeepsingh 2 years ago ⁺²
You can train a layout model to extract such entities from the banks' templates.
@Tamilgamesandtech 2 years ago
@karndeepsingh How do I train a layout model, Karn?
@Tamilgamesandtech 2 years ago
@karndeepsingh Can we extract only the needed text from entities, e.g. (account number: 12345), as a key-value pair?
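For labelled fields like the account number asked about above, a plain regular-expression pass over the OCR'd lines can serve as a simple baseline before training a layout model. This is a swapped-in technique, not the approach suggested in the reply. The sketch below assumes the OCR output is already a list of text lines, for example from result.render().splitlines(). The field names and patterns are hypothetical and would need tuning for each bank's passbook layout.

```python
import re
from typing import Dict, List, Optional

# Hypothetical field patterns: a label, an optional ":" or "-", then the value
FIELD_PATTERNS: Dict[str, re.Pattern] = {
    "account_number": re.compile(
        r"account\s*(?:number|no\.?)\s*[:\-]?\s*(\d+)", re.IGNORECASE
    ),
    # IFSC codes are 11 characters: 4 letters, a literal 0, then 6 letters/digits
    "ifsc_code": re.compile(
        r"ifsc\s*(?:code)?\s*[:\-]?\s*([A-Z]{4}0[A-Z0-9]{6})", re.IGNORECASE
    ),
}


def extract_fields(lines: List[str]) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None if it was not found."""
    found: Dict[str, Optional[str]] = {name: None for name in FIELD_PATTERNS}
    for line in lines:
        for name, pattern in FIELD_PATTERNS.items():
            if found[name] is None:
                match = pattern.search(line)
                if match:
                    found[name] = match.group(1)
    return found
```

Because every scanned layout differs, this approach only helps when the label text itself is printed next to the value. Unlabelled or table-style fields are where a trained layout model would be needed.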
@gokuliveyt3564 1 year ago
I have a problem: I want the extracted text in the same format as the image. Can you tell me how to get structured output that matches the image?
@shreyajang 1 year ago
Hi, I am facing an error related to doctr_io.
@umamaheswararaom7909 2 years ago
Hey, how do I convert a scanned image PDF containing many individuals' ID cards into Excel?
@karndeepsingh 2 years ago
If you want specific things to be extracted, you can do object detection (only if the templates remain the same) and then apply OCR to the detected regions; otherwise, first apply OCR and then NER.
@mushafmughal4760 1 year ago ⁺¹
Hi buddy, I followed your video "OCR Text from PDFs and Image Documents using docTR | Better than Tesseract OCR | Text Extraction" and got a JSON file of the text in my images. Can you tell me how to get that text into a .txt or .docx file, or any other format you suggest, with the same structure the text had in the image? How do I do that? I've tried every way I could think of, but all of them failed. Can you help me solve this problem? It's for my FYP (final-year project). Thanks in advance.
@gokuliveyt3564 1 year ago
Same situation here. I tried every possible way too. I used PaddleOCR; it gives the output as text, but the problem is that it's not structured the same way as the image.
@felixdittrich9959 11 months ago
result.render() 😊 instead of .export()
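result.render() gives plain text line by line, but as far as I know it does not keep the horizontal spacing between words, which is what the two questions above are after. One rough approach is to place each word on a fixed-width character grid using its coordinates. The sketch below assumes each word in a page of the export() output has a geometry of ((xmin, ymin), (xmax, ymax)) in relative 0 to 1 coordinates, which holds for straight, non-rotated pages. layout_text and its width and row_tolerance parameters are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def layout_text(page: Dict[str, Any], width: int = 100, row_tolerance: float = 0.01) -> str:
    """Approximate a page's layout as fixed-width text from relative word boxes."""
    # Collect (vertical centre, left edge, text) for every word on the page
    words: List[Tuple[float, float, str]] = []
    for block in page.get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                (xmin, ymin), (_xmax, ymax) = word["geometry"]
                words.append(((ymin + ymax) / 2, xmin, word["value"]))

    # Group words into rows: a new row starts when a word's centre lies more than
    # row_tolerance below the centre of the first word in the current row
    rows: List[List[Tuple[float, str]]] = []
    row_y: Optional[float] = None
    for y_center, xmin, value in sorted(words):
        if row_y is None or y_center - row_y > row_tolerance:
            rows.append([])
            row_y = y_center
        rows[-1].append((xmin, value))

    out_lines: List[str] = []
    for row in rows:
        text = ""
        for xmin, value in sorted(row):
            column = round(xmin * width)
            if text:
                # Pad to the word's column, keeping at least one space between words
                text += " " * max(1, column - len(text))
            else:
                text = " " * column
            text += value
        out_lines.append(text.rstrip())
    return "\n".join(out_lines)
```

Writing the returned string to a .txt file keeps that spacing. The columns only line up when the file is viewed in a monospaced font. row_tolerance will likely need tuning for dense or slightly skewed scans.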
@cafercalisan 7 months ago
Can I use it offline?
@jaikumardaiya4503 2 years ago
What about after extracting the text? Could you please show us how to store the values in an Excel file or a dataframe?
@karndeepsingh 2 years ago ⁺²
Once you have the JSON output, you can convert it into any format.
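The snippet below sketches that conversion step. It writes one CSV row per recognized word, with page, block, line, and word indices, the text, the confidence, and the relative box. Excel or pandas.read_csv can open this format directly. It assumes the docTR export() layout, in which each word has value, confidence, and geometry keys. The export_to_csv name and the column names are hypothetical.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical column layout: position indices, recognized text, confidence, relative box
COLUMNS: List[str] = ["page", "block", "line", "word", "value", "confidence",
                      "xmin", "ymin", "xmax", "ymax"]


def export_to_csv(export: Dict[str, Any]) -> str:
    """Flatten a docTR-style export dict into CSV text with one row per word."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    for page_idx, page in enumerate(export.get("pages", [])):
        for block_idx, block in enumerate(page.get("blocks", [])):
            for line_idx, line in enumerate(block.get("lines", [])):
                for word_idx, word in enumerate(line.get("words", [])):
                    (xmin, ymin), (xmax, ymax) = word["geometry"]
                    writer.writerow([page_idx, block_idx, line_idx, word_idx,
                                     word["value"], word["confidence"],
                                     xmin, ymin, xmax, ymax])
    return buffer.getvalue()
```

With pandas installed, pandas.read_csv(io.StringIO(export_to_csv(data))) gives a dataframe. DataFrame.to_excel(...) then writes an .xlsx file, which requires an engine such as openpyxl.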
@giritejareddy8195 2 years ago
Hey, did you try swapping in different extraction algorithms like MASTER and sar_resnet31? I tried, and it's not working. Did they not release those models as open source?
@karndeepsingh 2 years ago
I haven't tried different variations of the models, but it should work.
@venkateshvanka8964 1 year ago
Thanks for the video. When I try to install docTR in Jupyter, I get the following error:
OSError: cannot load library 'gobject-2.0-0': error 0x7e. Additionally, ctypes.util.find_library() did not manage to locate a library called 'gobject-2.0-0'
However, I am able to install it on Google Colab. Any help with the Jupyter installation would be greatly appreciated!
@karndeepsingh 1 year ago
Maybe some dependency changes have happened.
You can try installing older versions of the OCR library.
@GuruTechHub 2 years ago
Hi, please make a video on extracting Hindi tables containing Devanagari (UTF-8) text from images into CSV. I tried a lot on the internet but didn't find any video or method. Please make a video on this; it would help a lot.
@karndeepsingh 2 years ago
Sure.
@machinelearningzone.6230 2 years ago
Nice video. Could you please share the Colab notebook link?
I am facing an error: pypdfium2 --> AttributeError: module 'pypdfium2' has no attribute 'render_pdf_topil'. I even downgraded pypdfium2 to 1.0.0, without success. Could you shed some light on it?
Thanks
@bruhm0ment767 2 years ago ⁺¹
Hey, have you found a solution yet?
@mrityunjaykarmankar9239 1 year ago
Code
