This is a really good tutorial. I appreciate the care you took in going step by step, especially through altering the path.
This is the most helpful tutorial on Tesseract that I've found. Thank you.
omg. I was watching your video to install Tesseract. Meanwhile, I was amazed that you can read Korean. I thought you chose a random non-English language to prove Tesseract works with different languages. Amazed as a Korean.
I am trying to learn how OCR works because I want to make an app that requires OCR. But since I have no coding experience or anything even close to digital languages, I am having some difficulties. At least I was able to use Tesseract after watching this video. Thank you so much!
Very, very good Tesseract tutorial for Koreans, and clear pronunciation. Thank you.
keep making these videos man! interesting content
Thanks for this tutorial: I have had trouble converting text in a Mayan language here in Guatemala. I followed your steps and voila!
Next step for me is to figure out how to train a recognition set for our local Mayan alphabets.
Thanks a lot.
Did you get to train it for a different alphabet? Can you help me? I'm trying to get OCR working for IPA character recognition.
Your voice makes me happy to browse youtube, so clear fuark
FYI, if you have never added anything to PATH beyond the default, the edit selection box will not pop up.
So, going by your video, I need to make the entry manually, separating the new one with ";" (semicolon).
Afterwards, if I click the edit button, I get the same pop-up edit box.
Can you tell us how to train on our own dataset?
Thanks a lot for this but can i use this for manuscripts as well? And if so plz tell me how :)
Hi, a very good tutorial, but as mentioned by yourself and in a comment by someone else regarding batch folder/file processing, I cannot find any uploaded tutorial video on it?
thanks so much. easy to understand and so helpful. you're a legend
You can convert your PDF to a single TIFF file instead of converting it to several PNG files.
How did you turn each page of the pdf into pngs? Thank you for this high-quality video.
Alright, alright, I got that to work. Now I am wondering how you write the code to make it run all the pngs at once instead of having to do each one line by line, one at a time.
Hey there, you can use Snip & Sketch on Windows. I'm making a guide on just that currently.
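On the question above about running all the PNGs at once: here is a minimal sketch in Python, not the script from the video. It assumes `tesseract` is on your PATH, and the names `build_commands` and `run_batch` are made up for this example. It builds one ordinary `tesseract image outputbase -l lang` command per PNG in a folder and runs them one after another.

```python
import subprocess
from pathlib import Path


def build_commands(folder, lang="eng", exe="tesseract"):
    """Return one tesseract command per PNG in `folder`, sorted by file name."""
    commands = []
    for image in sorted(Path(folder).glob("*.png")):
        # Tesseract appends ".txt" to the output base, so page1.png -> page1.txt
        output_base = image.with_suffix("")
        commands.append([exe, str(image), str(output_base), "-l", lang])
    return commands


def run_batch(folder, lang="eng"):
    """Run tesseract on every PNG in `folder` (assumes tesseract is on PATH)."""
    for command in build_commands(folder, lang):
        subprocess.run(command, check=True)
```

Calling `run_batch("pages")`, where `pages` is a placeholder folder name, should leave one `.txt` file next to each PNG.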
Thanks for no bs tutorial!
thank you that was very helpful:-D
Glad it helped!
It helped me a lot. Thank you very much
nice video, it's what I'm looking for , So, thank you very much!😀
I have photographs of people with the date printed below, can this solution extract the date? I need to do this for 1000s of photos. (batch)
Hi, where is the link to the batch folder converting thingy?
Interestingly enough, the default install path for the Windows x64 version is:
C:\Users\username\AppData\Local\Programs\Tesseract-OCR
Your instructions are phenomenal. You are amazing at explaining computer commands and tricks. The only problem is that this program sucks and it is a nightmare to use.
It's not your fault. Thanks so much for teaching so many tricks.
That's a good video,
but how do I preprocess the input image and then pass it through Tesseract?
Can you please help with that ASAP?
This is the video I was looking for. Thank you. ^^
So, should I do it one by one? I have complete books, is there no way to do this for several images?
A video with tips on how to train Tesseract would be great! Anyway, thanks a lot for this video; it was helpful for my first steps and really appreciated!
I'm wondering if someone has already built (or knows how to build) something more like an end-user application than a tool for programmers, which would: 1) overlay the OCR result on the pictured document, so that the original document stays displayed as it is but becomes "highlightable"; or 2) generate a parallel OCR document that keeps the position of the letters and the page layout of the original picture, and keeps the original cropped image wherever recognition is difficult or confidence is low, for example on graphs, pictures, and drawings.
Really good tutorial, clear.
I need help with mixed language pdfs - English and Ancient Greek. Also, I would like to target positions within the image taken from a pdf file.
Have you maybe tried out whether it also works with handwritten text?
Hand-written text (block letters) will work, but not be very accurate. Ideally, Tesseract should be re-trained on whatever font you are focused on.
@@DFIRScience I see, thank you very much!
Hi, we are looking for someone knowledgeable about OCR, specifically for text from a video feed. The text would most often appear distorted, non-horizontal, and sometimes wrapped or partially wrapped. The text to be read is strictly a short sequence of numbers and/or letters. There can be multiple variations of those sequences in the same image. Contact me if that rings a bell :)
Thanks bro it is really helpful
Thanks a lot! I appreciate it.
I'm kind of skeptical of allowing changes to hardware. Is it completely safe?
Thanks for the information.
How can I install languages in addition to the ones in your sample? Maybe you already said it, but my English is not very good and I didn't catch it.
Thanks a lot. How can I add a new language after the installation?
Hi sir
Much needed video..
Can you tell me how to train Tesseract to identify a specific font?
Hi. Please guide me on how I can retrieve the coordinates of the words that I extracted from the image.
I usually play trivia games and I want to use it there. Can you please try to make a video on that?
Basically a nice video. But why do you open and use PowerShell ISE, and then not use anything from PowerShell?
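For the word coordinates question above, one option (a sketch, assuming a Tesseract build that ships the `tsv` config, e.g. running `tesseract image.png out tsv`) is to parse the resulting tab-separated file. The helper name `parse_word_boxes` is made up for this example; the column names are the ones in Tesseract's TSV header.

```python
import csv
import io


def parse_word_boxes(tsv_text):
    """Return (text, left, top, width, height, conf) for each recognized word."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    boxes = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are words; lower levels describe pages, blocks, lines
        if row.get("level") == "5" and text:
            boxes.append((
                text,
                int(row["left"]),
                int(row["top"]),
                int(row["width"]),
                int(row["height"]),
                float(row["conf"]),
            ))
    return boxes
```

`left` and `top` are pixel offsets from the top-left corner of the image, so each tuple describes a bounding box you can draw or crop.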
What mic are you using? Great video, thanks!
lstm_recognizer_->DeSerialize(&fp):Error:Assert failed:in file ../../../../ccmain/tessedit.cpp, line 193
I got the above error when trying to run:
tesseract.exe 3.jpeg ..\out1.txt -l ben
plz help me out
Try completely uninstalling and downloading an updated version :v
hope it helps
Great tutorial! thx
Nice tutorial, makes everything nice and simple to handle - On another note, I want to call the tesseract.exe file from a .NET application that has just taken an image of some text, is there a way to get the output of the OCR as a string in the console? Or would I have to wait until the character recognition has completed, then go and read that text file at a later time?
Yeah, I'm pretty sure you have to read the file after. I'll check if you can output to pipe.
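Following up on the pipe question: as far as I know, recent Tesseract versions accept `stdout` as the output base and print the text instead of writing a file. Here is a small Python sketch of capturing that as a string; the same idea should carry over to any language that can start a process and read its output, .NET included. The function names are made up for this example, and it assumes `tesseract` is on PATH.

```python
import subprocess


def stdout_command(image_path, lang="eng", exe="tesseract"):
    # "stdout" as the output base asks tesseract to print the recognized text
    return [exe, str(image_path), "stdout", "-l", lang]


def ocr_to_string(image_path, lang="eng"):
    """Run tesseract and return the recognized text as a string."""
    result = subprocess.run(
        stdout_command(image_path, lang),
        capture_output=True,
        encoding="utf-8",  # tesseract emits UTF-8 text
        check=True,
    )
    return result.stdout
```

The call blocks until recognition finishes, so there is no need to poll for a text file afterwards.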
How do u train the tesseract? Can u point me in the right direction with something I can use?
I'll try to do a video about that shortly. Until then you can check the documentation here: github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract
Can you please help me with training Tesseract, for the sake of helping the public? I will be very thankful to you.
What a great job.
Not sure if you will answer this, but I'd love it if you could help me with the PowerShell/batch code you spoke about at the end to make it work on a whole file. I'm currently trying but no success yet. Good video btw!
Hey there. Sure, I can help with that. I'll post back after recording.
@@DFIRScience did you make a tutorial for training the ocr to get another alphabet? I'm trying to get it to work with IPA
Hi, please share this pdf file to download.
Can I actually use this to categorize files into different folders? Btw, I'm using PHP so I don't know if it will work.
can you please do the batch file video?
How do I train a new language that is not in the language list?
How did you manage to get such fast results? It is taking me at least 15 seconds to OCR a full page...
The quality of your image will make a difference. Try around 300 dpi. That should give you good recognition while reducing processing time.
What happened with the tutorial on making your own data trainer? :(
hey, does anyone know how to scan multiple pictures in one go and measure the amount of time taken for the same?
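For measuring the time taken per picture, a minimal sketch: wrap whatever does the OCR in a timer. `time_each` is a made-up helper name, and `process` is any function you supply, for example one that calls `tesseract` through `subprocess.run`.

```python
import time


def time_each(items, process):
    """Call process(item) for every item and return {item: seconds taken}."""
    timings = {}
    for item in items:
        start = time.perf_counter()
        process(item)
        timings[item] = time.perf_counter() - start
    return timings
```

Summing the values gives the total time for the whole run.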
Thanks for the great video
Thanks a lot, brother
Thank you very much for your nice tutorial. But I would like your help with how to use Tesseract OCR without PowerShell. Is there a very easy way to use it? First I take the PNG or other image; then how can I use Tesseract some other way, easily and without any complexity? After installation, how can I get the text from the image in a very easy way?
please help me find how can i use it on MAC
pleeeeease
Can we convert a captcha image into text?
Thank you, brother.
The moment I type tesseract.exe --help, it opens the exe for installation. I don't know why.
Try uninstalling, and downloading the installer from here: digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.01.exe
Excellent video, however, my output was dreadful.
The text was English and clear to see, and it rendered about 90% fine; however, there are Wingdings-style artefacts all over the place. A bit pants really.
Can also render as different file formats with some more easily readable formatting (.odt) etc etc
Will look for an alternative to compare against
If you'll be using the same types of input, you may want to train a new classifier on your specific dataset. For a random image 90% is not bad. I would make a filter script to clean the text and remove wingdings, etc.
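A filter script along those lines could look roughly like this. It is only a sketch and rests on one loud assumption: the expected text is plain English, so every character outside printable ASCII is treated as an artefact. That would also strip curly quotes and accented letters, so adjust the pattern for other inputs. `clean_ocr_text` is a made-up name.

```python
import re

# Assumption: plain English text, so anything outside printable ASCII
# (plus newlines and tabs) is treated as an OCR artefact and removed.
_NON_ASCII = re.compile(r"[^\x20-\x7E\n\t]")


def clean_ocr_text(text):
    """Remove non-ASCII symbols and tidy the whitespace they leave behind."""
    cleaned = _NON_ASCII.sub("", text)
    # Collapse runs of spaces left behind by the removed symbols
    cleaned = re.sub(r" {2,}", " ", cleaned)
    # Strip trailing spaces on each line
    return "\n".join(line.rstrip() for line in cleaned.split("\n"))
```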
hello, i want to use another language in tesseract
Can it do this with a captcha image?
Great video, very clear.
Do you have a video on how to train Tesseract?
Please, it would be very useful for me.
How do I convert multiple images from a folder without giving the image names one by one?
Is there any command to do it?
Hey there, you can use Snip & Sketch on Windows. I'm making a guide on just that currently.
How can I increase the accuracy?
You will need to retrain the model based on your specific problem. I'm working on a video for training tesseract.
Congratulations on the video. I'm from Rio de Janeiro - Brazil. Great accent in English! Can we work with tesseract with PHP?
By the way what's your name?
Wanted this same thing using java ..Please help!!!!
I get "Warning. Invalid resolution 0 dpi. Using 70 instead." and the output text comes out blank. Please help.
What is your input file? JPEG? PNG?
Png
You might try the solution here: stackoverflow.com/questions/42990139/tesseract-ocr-how-do-i-improve-result
Thanks . It worked
Good info, but it would be much better if the author could make a condensed video. He has repeated the same info or provided unnecessary info in multiple places.
Thanks!
Using PowerShell? So it's not really for Windows? This is DOS.
Did you ever make a powershell script?
Sir, can OCR extract text from video?
unfortunately no, but if you extract the frames and turn them into individual pictures, you can then execute the program and get the .txt files :3
First, how do you convert the PDF to images?
I've used pdftoppm.exe from poppler. Works very well.
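For anyone scripting that step, here is a rough Python sketch around `pdftoppm`. It assumes poppler's `pdftoppm` is on PATH; the function names and the 300 DPI default are choices made for this example, not something from the video. `-png` selects PNG output and `-r` sets the resolution.

```python
import subprocess


def pdftoppm_command(pdf_path, output_prefix, dpi=300, exe="pdftoppm"):
    # -png selects PNG output, -r sets the resolution in DPI
    return [exe, "-png", "-r", str(dpi), str(pdf_path), str(output_prefix)]


def pdf_to_pngs(pdf_path, output_prefix, dpi=300):
    """Write one PNG per PDF page, named after output_prefix plus a page number."""
    subprocess.run(pdftoppm_command(pdf_path, output_prefix, dpi), check=True)
```

The resulting PNGs can then be fed to Tesseract one by one or in a batch.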
but this is not detecting text from product images
Yes, there are a lot of situations where the current training will not work. You may need to create a training set based on the problems you are working on, and retrain tesseract with your problem set. I'm working on a video to make custom training sets for tesseract.
also one has to set TESSDATA_PREFIX to "installdir\tessdata"
tnx a lot
Btw, the default Windows OCR is better than Tesseract in my language.
Tesseract is crud... Use Tabula and PDF's... You can select your tables also...
It doesn't appear that tesseract is any good
Default models are so-so. You'll definitely need to train on your specific problem. I've used default models for general ocr where high error wasn't a problem.
Korean?
Wonderful Dad!!..lol
So it is easy to use for everyone and I am the only one who is freaking out?!
Suzy!!!!
tesseract 0001.jpg -l eng
Tesseract OCR is terrible.
How do I run it on everything in a directory with one click?