This is a really good tutorial. I appreciate the care you took in going step by step, especially through altering the path.
This is the most helpful tutorial on Tesseract that I've found. Thank you.
OMG. I was watching your video to install Tesseract, and I was amazed that you can read Korean. I thought you had chosen a random non-English language to prove that Tesseract works with different languages. Amazed as a Korean.
I am trying to learn how OCR works because I want to make an app that requires OCR. But since I have no coding experience, or anything even close to programming languages, I am having some difficulties. At least I was able to use Tesseract after watching this video. Thank you so much!
Very, very good Tesseract tutorial for Koreans, with clear pronunciation. Thank you.
Thanks for this tutorial. I have had trouble converting text in a Mayan language here in Guatemala; I followed your steps and voilà!
Next step for me is to figure out how to train a recognition set for our local Mayan alphabets.
Thanks a lot.
Did you get to train it for a different alphabet? Can you help me? I'm trying to get OCR working for IPA character recognition.
Keep making these videos, man! Interesting content.
Your voice makes me happy to browse YouTube. So clear, fuark.
FYI, if we have never added anything to PATH other than the default entries, it will not pop up that edit selection box.
So, following your video, I needed to make the entry manually, separating the new one from the rest with a ";" (semicolon).
Afterwards, if I click the Edit button, I get the same pop-up edit box.
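The manual edit described above boils down to appending one entry with a `;` separator. A minimal Python sketch of that string operation, assuming the common install folder `C:\Program Files\Tesseract-OCR` (substitute your own); `append_to_path` is a hypothetical helper, and it only builds and prints the new value without touching the real PATH:

```python
import os


def append_to_path(path_value, new_dir, sep=";"):
    """Return path_value with new_dir appended as a separate entry.

    Windows separates PATH entries with ";". This only builds the string;
    it does not change the system or user PATH.
    """
    entries = [entry for entry in path_value.split(sep) if entry]  # skip empty entries
    # Windows paths are case-insensitive; ignore a trailing backslash when comparing.
    existing = {entry.rstrip("\\").lower() for entry in entries}
    if new_dir.rstrip("\\").lower() not in existing:
        entries.append(new_dir)
    return sep.join(entries)


if __name__ == "__main__":
    # Preview PATH with the (assumed) default install folder added.
    print(append_to_path(os.environ.get("PATH", ""), r"C:\Program Files\Tesseract-OCR", os.pathsep))
```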
Can you tell us how to train on our own dataset?
Thanks a lot for this, but can I use this for manuscripts as well? And if so, please tell me how :)
Hi, a very good tutorial. But regarding batch folder/file processing, which you mentioned yourself and another commenter asked about, I cannot see or find any uploaded tutorial video?
How did you turn each page of the pdf into pngs? Thank you for this high-quality video.
Alright, alright, I got that to work. Now I am wondering how you write the code to make it run all the PNGs at once, instead of having to do each one line by line, one at a time.
Hey there, you can use Snip & Sketch on Windows. I'm making a guide on just that currently.
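For running every PNG in a folder instead of typing one command per file, here is a minimal Python sketch. It assumes `tesseract` is on PATH and that the images sit in one folder; `build_commands` and `run_all` are hypothetical helper names. Each image gets its own output base, so `page1.png` produces `page1.txt` next to it (Tesseract appends the `.txt` itself):

```python
import subprocess
from pathlib import Path


def build_commands(folder, lang="eng", pattern="*.png", exe="tesseract"):
    """Build one Tesseract command per matching image, sorted by file name."""
    commands = []
    for image in sorted(Path(folder).glob(pattern)):
        output_base = image.with_suffix("")  # "page1.png" -> "page1"
        commands.append([exe, str(image), str(output_base), "-l", lang])
    return commands


def run_all(folder, lang="eng", pattern="*.png", exe="tesseract"):
    """Run the commands in order; raises CalledProcessError if one fails."""
    for cmd in build_commands(folder, lang, pattern, exe):
        print("OCR:", cmd[1])
        subprocess.run(cmd, check=True)
```

Usage: `run_all(r"C:\scans", lang="eng")`. Windows matches `*.png` case-insensitively; on Linux or macOS, files ending in `.PNG` would need a second pattern.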
Thanks for the no-BS tutorial!
I'm kind of skeptical about allowing changes to hardware. Is it completely safe?
So, should I do it one by one? I have complete books; is there no way to do this for several images at once?
What mic are you using? Great video, thanks!
You can convert your PDF to a single TIFF file instead of converting it to several PNG files.
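If you go the single-TIFF route, one way to do the conversion is Ghostscript (an assumption; the comment doesn't name a tool). Tesseract reads every page of a multi-page TIFF into one text file. A sketch that builds and runs the two commands; `gswin64c` is the 64-bit Windows console binary (`gs` elsewhere), and 300 dpi is a typical OCR resolution rather than a requirement:

```python
import subprocess


def pdf_to_tiff_cmd(pdf, tiff_out, dpi=300, gs="gswin64c", device="tiffgray"):
    """Ghostscript command that renders every PDF page into ONE multi-page TIFF."""
    return [
        gs,
        "-dNOPAUSE",           # don't pause between pages
        "-dBATCH",             # exit when finished
        "-dSAFER",             # restrict file access from the PDF itself
        f"-sDEVICE={device}",  # tiffgray = 8-bit grayscale TIFF
        f"-r{dpi}",            # render resolution in dpi
        f"-sOutputFile={tiff_out}",
        pdf,
    ]


def ocr_tiff_cmd(tiff, output_base, lang="eng", exe="tesseract"):
    """Tesseract OCRs all pages of the TIFF into output_base.txt."""
    return [exe, tiff, output_base, "-l", lang]


def pdf_to_text(pdf, work_base, dpi=300, lang="eng"):
    tiff = work_base + ".tif"
    subprocess.run(pdf_to_tiff_cmd(pdf, tiff, dpi), check=True)
    subprocess.run(ocr_tiff_cmd(tiff, work_base, lang), check=True)
```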
That's a good video.
But how do I preprocess the input image and then pass it through Tesseract?
Can you please help with this ASAP?
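One common preprocessing step is global binarization (black text on white) with Otsu's method. This pure-Python sketch works on a flat list of 8-bit grayscale values; loading and saving the image are left out (with Pillow, for example, `Image.open(path).convert("L")` and `list(img.getdata())` give such a list, and `img.putdata(...)` writes one back). Tesseract already binarizes internally, so the gain varies; explicit cleanup tends to matter more for uneven lighting or noisy photos:

```python
def otsu_threshold(pixels):
    """Otsu's method: pick t (0-255) that maximizes between-class variance.

    pixels is a flat sequence of 8-bit grayscale values. Values <= t form one
    class (dark ink), values > t the other (light paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    weight_dark = 0  # number of pixels <= t
    sum_dark = 0     # sum of their gray values
    best_t, best_between = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already on the dark side
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(pixels, threshold=None):
    """Map each pixel to 0 (ink) or 255 (paper) around the Otsu threshold."""
    t = otsu_threshold(pixels) if threshold is None else threshold
    return [0 if p <= t else 255 for p in pixels]
```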
Have you tried whether it also works with handwritten text?
Handwritten text (block letters) will work, but it won't be very accurate. Ideally, Tesseract should be retrained on whatever font you are focused on.
@@DFIRScience I see, thank you very much!
Hi sir
Much-needed video.
Can you tell me how to train Tesseract to identify a specific font?
A video with tips on how to train Tesseract would be great! Anyway, thanks a lot for this video so far! It was helpful for my first steps and is really appreciated!
I'm wondering whether someone has already built (or how one would build) something more like an end-user application than a tool for programmers, that provides: 1) an overlay of the pictured document and the OCR recognition, so that the original document remains displayed as it is but is "highlightable"; or 2) a parallel OCR document that keeps the letter positioning and page layout of the original picture and, for a document, keeps the original cropped image wherever recognition is difficult or has low confidence, for example on graphs, pictures, and drawings.
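Tesseract itself covers part of both ideas through its output formats. The `pdf` output keeps the original image and lays an invisible text layer over it, so the page looks unchanged but the text can be selected and highlighted (idea 1). The `hocr` output is HTML with a bounding box and a confidence value for each word, a possible starting point for a layout-preserving document that falls back to the image where confidence is low (idea 2). A minimal sketch that builds the command; limiting it to four formats is a simplification of this sketch, not the full list Tesseract supports:

```python
import subprocess

# Output "configs" passed after the output base; each one writes
# output_base plus its own extension (.txt, .pdf, .hocr, .tsv).
KNOWN_FORMATS = {"txt", "pdf", "hocr", "tsv"}


def build_layout_cmd(image, output_base, lang="eng", formats=("pdf", "hocr"), exe="tesseract"):
    unknown = set(formats) - KNOWN_FORMATS
    if unknown:
        raise ValueError(f"unsupported output format(s): {sorted(unknown)}")
    return [exe, image, output_base, "-l", lang, *formats]


def make_searchable(image, output_base, lang="eng"):
    """Writes output_base.pdf (image + hidden text) and output_base.hocr."""
    subprocess.run(build_layout_cmd(image, output_base, lang), check=True)
```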
Thanks so much. Easy to understand and so helpful. You're a legend.
Hi, how do I get the link to the batch folder conversion thingy?
Nice tutorial, makes everything nice and simple to handle - On another note, I want to call the tesseract.exe file from a .NET application that has just taken an image of some text, is there a way to get the output of the OCR as a string in the console? Or would I have to wait until the character recognition has completed, then go and read that text file at a later time?
Yeah, I'm pretty sure you have to read the file after. I'll check if you can output to pipe.
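Recent Tesseract versions (4.x and 5.x at least) accept the literal word `stdout` in place of the output base and then print the recognized text instead of writing a `.txt` file, so the calling program can capture it directly. A minimal Python sketch of that idea; in .NET the equivalent is starting a `Process` with `RedirectStandardOutput = true` and reading `StandardOutput`. `ocr_to_string` is a hypothetical name:

```python
import subprocess


def ocr_to_string(image_path, lang="eng", exe="tesseract"):
    """Run Tesseract with "stdout" as the output base and return the text.

    No .txt file is written; the recognized text arrives on standard output.
    """
    result = subprocess.run(
        [exe, image_path, "stdout", "-l", lang],
        capture_output=True,
        encoding="utf-8",  # Tesseract writes UTF-8 text
        check=True,        # raise if Tesseract exits with an error
    )
    return result.stdout
```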
Can we convert a CAPTCHA image into text?
I have photographs of people with the date printed below. Can this solution extract the date? I need to do this for thousands of photos (batch).
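For the dates, one approach is to OCR each photo (or just the cropped strip under it) and then search the text with a regular expression. A sketch of the search step, assuming dates look like `2019-07-14`, `14/07/2019`, `14.07.2019` or `3/5/98`; the pattern is a hypothetical starting point to adjust to whatever is actually printed on your photos:

```python
import re

# Assumed formats: YYYY-MM-DD, or D/M/Y with "-", "/" or "." as separator.
DATE_PATTERN = re.compile(
    r"\b(\d{4}[-/.]\d{1,2}[-/.]\d{1,2}|\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4})\b"
)


def find_dates(ocr_text):
    """Return all date-like strings in the OCR text, in order of appearance."""
    return DATE_PATTERN.findall(ocr_text)
```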
Your instructions are phenomenal. You are amazing at explaining computer commands and tricks. The only problem is that this program sucks and it is a nightmare to use.
It's not your fault. Thanks so much for teaching so many tricks.
Thank you, that was very helpful :-D
Glad it helped!
Nice video, it's what I was looking for. So, thank you very much!😀
Thanks for the information.
How can I install additional languages beyond the ones you show? Maybe you already said it, but my English is not very good and I didn't catch it.
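Each language is a `<code>.traineddata` file (`eng`, `kor`, `ben`, ...) in the `tessdata` folder of the install directory. Adding a language means placing that file there, either by copying it from the tesseract-ocr tessdata repositories on GitHub or, if your Windows installer offers it, by selecting additional languages during setup. A small Python sketch to check what is installed, assuming the header line of `tesseract --list-langs` starts with "List of available languages" (the exact wording differs slightly between versions):

```python
import subprocess


def parse_list_langs(output):
    """Return the language codes from `tesseract --list-langs` output."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header, e.g. 'List of available languages (3):'
        if not line or line.lower().startswith("list of available languages"):
            continue
        codes.append(line)
    return codes


def missing_langs(requested, list_langs_output):
    """Requested codes (e.g. 'kor', 'ben') with no .traineddata installed."""
    installed = set(parse_list_langs(list_langs_output))
    return [code for code in requested if code not in installed]


def installed_langs(exe="tesseract"):
    # Read both streams in case your version prints the list to stderr.
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    return parse_list_langs(result.stdout + result.stderr)
```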
This is the video I was looking for. Thank you. ^^
Interestingly enough, the default install path for the Windows x64 version is:
C:\Users\username\AppData\Local\Programs\Tesseract-OCR
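Because the install folder can differ between machines, a script is safer not hard-coding it. A sketch that checks PATH first, then the per-user folder reported above (`%LOCALAPPDATA%\Programs\Tesseract-OCR`) and the system-wide `Program Files` folders; the candidate list is an assumption based on these common install locations:

```python
import os
import shutil
from pathlib import PureWindowsPath


def candidate_paths(env):
    """Likely tesseract.exe locations on Windows, given an environment mapping."""
    candidates = []
    for var in ("ProgramFiles", "ProgramFiles(x86)"):
        if env.get(var):
            candidates.append(str(PureWindowsPath(env[var], "Tesseract-OCR", "tesseract.exe")))
    if env.get("LOCALAPPDATA"):
        candidates.append(str(PureWindowsPath(
            env["LOCALAPPDATA"], "Programs", "Tesseract-OCR", "tesseract.exe")))
    return candidates


def find_tesseract():
    """Prefer whatever is on PATH, then fall back to the usual install folders."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidate_paths(os.environ):
        if os.path.isfile(path):
            return path
    return None
```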
It helped me a lot. Thank you very much
How do you train Tesseract? Can you point me in the right direction with something I can use?
I'll try to do a video about that shortly. Until then you can check the documentation here: github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract
Can you please help me with training Tesseract, for the sake of public help? I will be very thankful to you.
Hi. Please guide me on how I can retrieve the coordinate positions of the words that I extracted from the image.
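Tesseract 3.05 and later can write tab-separated output with a bounding box per word: `tesseract image.png out -l eng tsv` produces `out.tsv` with the columns `level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text`. A sketch that pulls the word rows (level 5) out of that file; `words_with_boxes` is a hypothetical helper, and coordinates are in pixels from the top-left corner of the image:

```python
import csv
import io


def words_with_boxes(tsv_text, min_conf=0.0):
    """Parse Tesseract TSV into (text, left, top, width, height, conf) tuples.

    Only word-level rows (level 5) with non-empty text and conf >= min_conf
    are kept.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        if row["level"] != "5":
            continue  # page/block/paragraph/line rows carry no word text
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        if not text or conf < min_conf:
            continue
        words.append((text, int(row["left"]), int(row["top"]),
                      int(row["width"]), int(row["height"]), conf))
    return words
```

Usage: `words_with_boxes(open("out.tsv", encoding="utf-8").read(), min_conf=60)`.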
Thanks a lot. How can I add a new language after the installation?
I need help with mixed language pdfs - English and Ancient Greek. Also, I would like to target positions within the image taken from a pdf file.
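For mixed-language pages, Tesseract accepts several language codes joined with `+`, e.g. `-l eng+grc`, where `grc` is the code for Ancient Greek (`ell` is Modern Greek); both `.traineddata` files must be installed. A small sketch that builds that argument; the validation rules and helper names are this sketch's own:

```python
def lang_arg(*codes):
    """Join Tesseract language codes with '+', e.g. ('eng', 'grc') -> 'eng+grc'."""
    if not codes:
        raise ValueError("at least one language code is required")
    for code in codes:
        if not code or "+" in code or " " in code:
            raise ValueError(f"bad language code: {code!r}")
    return "+".join(codes)


def build_mixed_cmd(image, output_base, codes, exe="tesseract"):
    """Command for a page containing several languages/scripts."""
    return [exe, image, output_base, "-l", lang_arg(*codes)]
```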
What happened to the tutorial on making your own data trainer? :(
I usually play trivia games and I want to use it there. Can you please try to make a video on that?
Basically a nice video. But why do you open and use PowerShell ISE, and then not use anything from PowerShell?
Not sure if you will answer this, but I'd love it if you could help me with the PowerShell/batch code you spoke about at the end, to make it work on a whole file. I'm currently trying but haven't succeeded yet. Good video, btw!
Hey there. Sure, I can help with that. I'll post back after recording.
@@DFIRScience Did you make a tutorial on training the OCR for another alphabet? I'm trying to get it to work with IPA.
Really good tutorial, clear.
Hello, I want to use another language in Tesseract.
How do I train a new language that is not in the language list?
Can you please do the batch file video?
Can I actually use this to sort files into different folders? Btw, I'm using PHP, so I don't know if it will work.
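Sorting files into folders by their content is a short step once you have the OCR text. A minimal Python sketch, assuming the OCR text for each image is already available; the keyword rules and folder names are hypothetical placeholders. In PHP the same logic works by calling the `tesseract` command (for example with `shell_exec`) and inspecting the returned text:

```python
import shutil
from pathlib import Path

# Hypothetical rules: the first keyword found in the OCR text picks the folder.
RULES = [("invoice", "invoices"), ("receipt", "receipts"), ("contract", "contracts")]


def choose_folder(ocr_text, rules=RULES, default="unsorted"):
    lowered = ocr_text.lower()
    for keyword, folder in rules:
        if keyword in lowered:
            return folder
    return default


def file_by_text(image_path, ocr_text, dest_root, rules=RULES):
    """Move image_path into dest_root/<folder> chosen from its OCR text."""
    target_dir = Path(dest_root) / choose_folder(ocr_text, rules)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(image_path).name
    return Path(shutil.move(str(image_path), str(destination)))
```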
Hi, please share this PDF file for download.
How can I increase the accuracy?
You will need to retrain the model based on your specific problem. I'm working on a video for training tesseract.
Can this be done with a CAPTCHA image?
Thanks, bro, it is really helpful.
Thanks a lot! I appreciate it.
What a great job.
Hi, we are looking for someone knowledgeable in OCR, specifically for text from a video feed. The text would most often appear distorted, non-horizontal, and sometimes wrapped or partially wrapped. The text to be read is strictly a short sequence of numbers and/or letters. There can be multiple variations of those sequences in the same image. Contact me if that rings a bell :)
I want this same thing using Java. Please help!
Hey, does anyone know how to scan multiple pictures in one go and measure the time taken?
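For timing, one option is to wrap each OCR call with a high-resolution clock. A sketch where `process` is whatever runs Tesseract on one image; it is passed in (a design choice of this sketch) so the timing logic doesn't depend on how you call Tesseract:

```python
import time


def timed_batch(items, process):
    """Call process(item) for each item.

    Returns ({item: seconds}, total_seconds), measured with perf_counter.
    """
    per_item = {}
    start_all = time.perf_counter()
    for item in items:
        start = time.perf_counter()
        process(item)
        per_item[item] = time.perf_counter() - start
    total = time.perf_counter() - start_all
    return per_item, total
```

Usage with Tesseract on PATH: `timed_batch(paths, lambda p: subprocess.run(["tesseract", p, "stdout"], capture_output=True, check=True))`.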
Thanks for the great video
Great tutorial! thx
First, how do I convert a PDF to images?
How do I convert multiple images from a folder without giving each image name one by one?
Is there any command to do it?
Hey there, you can use Snip & Sketch on Windows. I'm making a guide on just that currently.
How did you manage to get such fast results? It is taking me at least 15 seconds to OCR a full page...
The quality of your image will make a difference. Try around 300dpi. That will give you good recognition but should reduce processing time.
lstm_recognizer_->DeSerialize(&fp):Error:Assert failed:in file ../../../../ccmain/tessedit.cpp, line 193
I got the above error when I tried to run:
tesseract.exe 3.jpeg ..\out1.txt -l ben
Please help me out.
Try completely uninstalling and downloading an updated version :v
Hope it helps.
Thank you, big brother.
Sir, can OCR extract text from video?
Unfortunately no, but if you extract the frames and turn them into individual pictures, you can then run the program on them and get the .txt files :3
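Extracting the frames can be done with ffmpeg (an assumption; any frame grabber works). A sketch that saves one frame per second as numbered PNGs, which can then be OCR'd like any folder of images; the helper names are hypothetical:

```python
import subprocess
from pathlib import Path


def frame_extract_cmd(video, out_dir, fps=1, ffmpeg="ffmpeg"):
    """ffmpeg command that saves `fps` frames per second as numbered PNGs."""
    pattern = str(Path(out_dir) / "frame_%05d.png")  # frame_00001.png, frame_00002.png, ...
    return [ffmpeg, "-i", str(video), "-vf", f"fps={fps}", pattern]


def extract_frames(video, out_dir, fps=1):
    Path(out_dir).mkdir(parents=True, exist_ok=True)  # ffmpeg won't create the folder
    subprocess.run(frame_extract_cmd(video, out_dir, fps), check=True)
```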
Please help me find out how I can use it on a Mac.
Pleeeeease
Also, one has to set TESSDATA_PREFIX to "installdir\tessdata".
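Instead of setting TESSDATA_PREFIX system-wide, a script can set it only for the Tesseract process it starts. A sketch; note that the folder the variable should point to has varied between versions (4.x and 5.x expect the `tessdata` folder itself, while some 3.x builds expected its parent), and newer versions also accept a `--tessdata-dir` option on the command line:

```python
import os
import subprocess


def tesseract_env(tessdata_dir, base=None):
    """Copy of the environment with TESSDATA_PREFIX set; the original is untouched."""
    env = dict(os.environ if base is None else base)
    env["TESSDATA_PREFIX"] = str(tessdata_dir)
    return env


def run_with_tessdata(cmd, tessdata_dir):
    """Run a Tesseract command with TESSDATA_PREFIX set for that process only."""
    return subprocess.run(cmd, env=tesseract_env(tessdata_dir), check=True)
```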
Thanks a lot, brother
Thank you very much for your nice tutorial. But I would like your help with how to use Tesseract OCR without PowerShell. Is there a very easy way, where I just take the PNG or image and run Tesseract on it without any complexity? After installation, how can I use it in an easy way to get the text from an image?
But this is not detecting text in product images.
Yes, there are a lot of situations where the current training will not work. You may need to create a training set based on the problems you are working on, and retrain tesseract with your problem set. I'm working on a video to make custom training sets for tesseract.
Congratulations on the video. I'm from Rio de Janeiro - Brazil. Great accent in English! Can we work with tesseract with PHP?
By the way what's your name?
Excellent video, however, my output was dreadful.
English, clear to see, and it rendered about 90% fine; however, there are Wingdings-style artefacts all over the place. A bit pants really.
It can also render to different file formats with some more easily readable formatting (.odt), etc.
Will look for an alternative to compare against
If you'll be using the same types of input, you may want to train a new classifier on your specific dataset. For a random image 90% is not bad. I would make a filter script to clean the text and remove wingdings, etc.
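A minimal version of such a filter script in Python. It keeps letters, digits, punctuation, spaces, and currency and math symbols, and drops Unicode "other symbol" and private-use characters, which is where dingbat-style artefacts usually end up. The character classes are a judgment call: it will also drop legitimate symbols such as © or ™ unless you list them in `allowed_extra`:

```python
import unicodedata


def clean_ocr_text(text, allowed_extra=""):
    """Remove characters unlikely to belong in ordinary prose OCR output.

    Keeps letters (L*), numbers (N*), punctuation (P*), separators (Z*),
    currency (Sc) and math (Sm) symbols, newlines, tabs, and allowed_extra;
    everything else is dropped. Runs of whitespace are collapsed per line.
    """
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if (cat[0] in "LNPZ" or cat in ("Sc", "Sm")
                or ch in "\n\t" or ch in allowed_extra):
            out.append(ch)
    cleaned = "".join(out)
    return "\n".join(" ".join(line.split()) for line in cleaned.split("\n"))
```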
I get "Warning. Invalid resolution 0 dpi. Using 70 instead." and the output text is blank. Please help.
What is your input file? JPEG? PNG?
Png
You might try the solution here: stackoverflow.com/questions/42990139/tesseract-ocr-how-do-i-improve-result
Thanks . It worked
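Separately from the linked answer, Tesseract 4.0 and later have a `--dpi` option for stating the resolution when the image file doesn't record one, which is what the "Invalid resolution 0 dpi" warning complains about. A sketch that adds it to the command; declaring 300 dpi does not add detail, so very small text may still need upscaling:

```python
def build_cmd_with_dpi(image, output_base, dpi=300, lang="eng", exe="tesseract"):
    """Tesseract command that states the image resolution explicitly (4.0+ --dpi)."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return [exe, image, output_base, "--dpi", str(int(dpi)), "-l", lang]
```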
Using PowerShell? So it's not really for Windows? This is DOS.
Did you ever make a powershell script?
Great video, very clear.
Do you have a video on how to train Tesseract?
Please, it could be very useful for me.
The moment I type tesseract.exe --help, it opens the installer exe; I don't know why.
Try uninstalling, and downloading the installer from here: digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.01.exe
Good info, but it would be much better if the author made a condensed video. He repeats the same info or provides unnecessary info in multiple places.
I've used pdftoppm.exe from poppler. Works very well.
Thanks!
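For reference, a sketch of the pdftoppm step (poppler's `pdftoppm` with `-png` for PNG output and `-r` for resolution), returning the page images in order so they can be fed to a batch OCR loop; `pdf_to_pngs` is a hypothetical helper name:

```python
import subprocess
from pathlib import Path


def pdftoppm_cmd(pdf, out_prefix, dpi=300, exe="pdftoppm"):
    """poppler's pdftoppm: one PNG per page, named <out_prefix>-<page>.png."""
    return [exe, "-png", "-r", str(dpi), str(pdf), str(out_prefix)]


def pdf_to_pngs(pdf, out_dir, dpi=300):
    """Render every page and return the PNG paths sorted by file name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / Path(pdf).stem
    subprocess.run(pdftoppm_cmd(pdf, prefix, dpi), check=True)
    # Recent poppler versions zero-pad page numbers to a common width,
    # so a plain sort keeps page order.
    return sorted(out_dir.glob(f"{Path(pdf).stem}-*.png"))
```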
It doesn't appear that tesseract is any good
Default models are so-so. You'll definitely need to train on your specific problem. I've used default models for general ocr where high error wasn't a problem.
Btw, the default Windows OCR is better than Tesseract in my language.
Thanks a lot.
Korean?
Tesseract is crud... Use Tabula and PDFs... You can also select your tables...
So it is easy for everyone to use, and I am the one who is freaking out?!
Wonderful Dad!!..lol
tesseract 0001.jpg 0001 -l eng
Suzy!!!!
Tesseract OCR is terrible.
How do I run it on all files in a directory with one click?