How to Install and Use Tesseract OCR on Windows - Optical Character Recognition

  • Published 3 Dec 2024

COMMENTS • 96

  • @augusthyden1077 9 months ago +18

    Holy you should be the standard for youtube tutorials, never experienced such quick and concise tutoring!

    • @JayMartMedia 9 months ago +2

      Glad you found it helpful!

    • @RoyelPayne 4 months ago +2

      No kidding, this guy has just set the gold standard. Very good, thanks for sharing!

  • @clehil 1 year ago +12

    This tutorial should be the canonical online tutorial for Windows users of Tesseract. The work you did to eliminate distractions, so that the instructions work at such a fast and precise pace, is very apparent.

    • @JayMartMedia 1 year ago +1

      Thanks! Glad to hear that you found the video helpful!

  • @MrBendybruce 2 years ago +39

    Thank you for making this video. I am visually impaired and am currently in the biggest battle of my life to try and save the vision I still have left. This OCR software is a valuable tool in allowing me to be able to read my own physical mail by scanning it into my computer as a JPEG and then converting it into text which can then be read aloud. Thanks again.

    • @TVDaJa 2 years ago +7

      Keep up keeping up, Bendy

    • @maxsilon 1 year ago +1

      Good job! Keep you man!!

  • @yekna459 4 years ago +13

    By far the best tutorial on Tesseract on youtube. Thanks for uploading

  • @Sodomantis 11 months ago +1

    02:03 you can just press the crop button and the image size will conform to the image you pasted in. Thanks for video.

  • @matthewjohnson3610 4 years ago +8

    Thanks so much. I've been looking for a solution for this for years. Every few months I go looking and have never been able to find anything. I got a bit lucky this time that I stumbled upon Tesseract but couldn't figure out how to get started. This was perfect.

    • @JayMartMedia 4 years ago +1

      Great, I'm glad you found this video helpful. Thanks for the encouraging feedback!

  • @falsigo 2 months ago +1

    I could understand without any volume, kudos! Thanks

  • @RogerCooley 9 months ago +4

    Thank you. A complete presentation. Following your video, I was able to extract text from a few PNG files. I wish you spoke a little slower; I had to stop and rewind segments a few times to understand what you were saying. Please enunciate. Thanks again.

  • @vadimcastro332 4 years ago +8

    thank you sir!! this was very helpful and respectful of the viewer's time, really appreciate it!

  • @matthewjohnson3610 4 years ago +7

    Also in case anybody is wondering, put something like this in a batch file if you want to process a folder of files: for %%X in (*.png) do "tesseract.exe" "%%X" "%%X-ocr"
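
    For anyone who would rather use Python than a batch file, the same loop could be sketched roughly as follows. This is only a sketch, assuming `tesseract` is already on the PATH; `build_command` and `ocr_folder` are hypothetical helper names, not part of Tesseract. It keeps the batch file's naming, so `scan.png` produces `scan.png-ocr.txt`.

```python
import subprocess
from pathlib import Path


def build_command(image: Path, tesseract: str = "tesseract") -> list[str]:
    """Return the command that OCRs one image, mirroring the batch loop above."""
    # Tesseract appends ".txt" to the output base, so "scan.png" -> "scan.png-ocr.txt".
    output_base = image.with_name(image.name + "-ocr")
    return [tesseract, str(image), str(output_base)]


def ocr_folder(folder: str, pattern: str = "*.png", tesseract: str = "tesseract") -> list[Path]:
    """Run Tesseract on every matching image in `folder` and return the images processed."""
    images = sorted(Path(folder).glob(pattern))
    for image in images:
        # check=True raises CalledProcessError if Tesseract exits with an error.
        subprocess.run(build_command(image, tesseract), check=True)
    return images
```

    Calling `ocr_folder(".")` from a script saved next to the images should process every PNG in that folder. Passing arguments as a list means paths containing spaces need no extra quoting.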

  • @shreesingh3137 3 years ago +4

    *Mind Blowing Video* on *Tesseract-OCR* 🔥🔥

  • @jacobhadden5407 1 year ago +2

    That worked great. I am not particularly sophisticated on the cmd prompts but was able to sort it out. The tesseract ocr is very accurate

  • @CharlieKelloggPilot 2 years ago +2

    Well done. Thank you for being quick and to the point.

  • @beastexperiments6155 15 days ago +1

    I am very grateful that your guidance helped me a lot. Thank you so much for making this video.

  • @aligeovany4645 2 months ago +1

    Jay Mart, with this video you've made me a fan. Thanks for sharing this quick, perfect and useful video.

  • @betting55555 1 year ago +2

    Awesome video, good step by step. Thank you!

  • @MedoHamdani 1 year ago +2

    Thank you, straight to the point. Is the video about Python integration available yet? Is it possible to process a batch of images, and is it possible to extract text directly from a PDF file of scanned images? Lastly, is it possible to put them in a GUI?
    Thanks mate

  • @Sameh_Abdel-Qawy 11 months ago +1

    This was very helpful. Thanks a lot! I'd like to know if there is a script to extract the text from all the images automatically, without selecting each one. Thanks again.

  • @23498cna 3 months ago +1

    wow, you are absolutely fantastic! Thank you so much!

    • @JayMartMedia 3 months ago

      Thanks for commenting! Glad you found it helpful!

  • @HiPh0Plover1 4 years ago +5

    nice vid , where is the python integration part ?

  • @sahil5124 11 months ago +2

    thank you so much, it is working

  • @LilaGovindaDas 11 months ago +1

    Try restarting your PC if cmd still doesn't recognize the command after you set the new environment path. Worked for me.

  • @legitordont 1 year ago +2

    Thank you are the best

  • @PlanetXtreme 4 months ago +1

    godsend tutorial maker

  • @samrahmazhar2716 3 years ago +2

    how to give pdf input to tesseract?

  • @rudeus8998 14 days ago +1

    1:45 I followed along until this point, then typed "tesseract" in the command prompt, but it's still saying "tesseract not recognized as internal or external command"

    • @JayMartMedia 13 days ago

      Double check that the file path for the tesseract executable has been added to the PATH environment variable.
      Also, you will need to open a NEW command prompt after the file path has been added to the PATH environment variable. This is because the PATH environment variable is loaded when cmd prompt starts, so if PATH is updated while cmd prompt is already open it does not have the new PATH value.

    • @rudeus8998 13 days ago

      @JayMartMedia I double checked. Still didn't work.

    • @bishalsharma9238 6 days ago

      @JayMartMedia Same for me, it didn't work.
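
      When the PATH fix does not seem to take effect, it can help to check what a freshly started process actually sees. The snippet below is a rough diagnostic sketch, not an official tool; `find_tesseract` and `path_entries_mentioning` are hypothetical helper names. It reports which executable would run for `tesseract` and which PATH entries mention it.

```python
import os
import shutil
from typing import Optional


def find_tesseract(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the Tesseract executable this process would run, or None if not found."""
    if explicit_path:
        # A full path bypasses PATH entirely; just confirm the file exists.
        return explicit_path if os.path.isfile(explicit_path) else None
    # shutil.which looks the name up on PATH, much like the command prompt does.
    return shutil.which("tesseract")


def path_entries_mentioning(keyword: str, path_value: Optional[str] = None) -> list[str]:
    """List the PATH entries that contain `keyword` (case-insensitive)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(os.pathsep) if keyword.lower() in entry.lower()]


if __name__ == "__main__":
    print("Executable found:", find_tesseract())
    print("PATH entries mentioning tesseract:", path_entries_mentioning("tesseract"))
```

      Run it from a newly opened command prompt, because a script inherits PATH from the window that started it. If the second line prints an empty list, the folder was probably not saved to PATH; if it prints a folder but no executable is found, that folder likely does not contain `tesseract.exe`.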

  • @ElBart0oo8 4 months ago +1

    Hi, thank you so much for the video I found it highly professional. How could you set up Tesseract to continuously extract text from a given portion of the screen?

    • @JayMartMedia 4 months ago

      There may be a better tool to use for this.
      But in theory you could do something like this by writing a script to capture a screenshot and run it through tesseract every few seconds. It could take a bit of programming though.
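
      The screenshot-and-OCR loop described above might look something like this. It is a sketch under assumptions: Pillow is installed for `ImageGrab`, `tesseract` is on the PATH, and the box coordinates are placeholders. `grab_region`, `run_tesseract` and `watch_region` are hypothetical names.

```python
import subprocess
import tempfile
import time
from pathlib import Path
from typing import Callable, Iterator, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


def grab_region(box: Box, out_path: Path) -> None:
    """Save a screenshot of `box` to `out_path` (assumes Pillow is installed)."""
    from PIL import ImageGrab  # third-party: pip install pillow
    ImageGrab.grab(bbox=box).save(out_path)


def run_tesseract(image_path: Path) -> str:
    """OCR one image; "stdout" as the output base makes Tesseract print the text."""
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout.strip()


def watch_region(
    box: Box,
    interval: float = 2.0,
    cycles: Optional[int] = None,
    grab: Callable[[Box, Path], None] = grab_region,
    ocr: Callable[[Path], str] = run_tesseract,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Capture `box` every `interval` seconds and yield the text whenever it changes."""
    last = None
    count = 0
    with tempfile.TemporaryDirectory() as tmp:
        shot = Path(tmp) / "region.png"
        # cycles=None means run until interrupted.
        while cycles is None or count < cycles:
            grab(box, shot)
            text = ocr(shot)
            if text != last:
                last = text
                yield text
            count += 1
            sleep(interval)
```

      A loop such as `for text in watch_region((0, 0, 800, 200)): print(text)` would then print the region's text each time it changes. Starting a new Tesseract process every couple of seconds is slow, so this only suits low refresh rates.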

  • @manhvo242 5 months ago

    0:45 How did you download it so fast? I tried to download it but it only manages ~250 KB/s (sometimes it drops to 0).
    I know I have a fast network, but it just downloads so slowly.

  • @AshleySheley-zk9rt 6 months ago +1

    When I open the tesseract-result it is opened in Notepad. How do I open it in tesseract like you did?

    • @JayMartMedia 6 months ago

      Are you talking about at 2:22 ? If so, the file is being opened in a text editor called Atom. It's not being opened in Tesseract.
      Atom is a free text editor, but most people just use VS Code by Microsoft nowadays.

  • @guillermocascomiranda 2 years ago +2

    Very interesting video. How can I make it recognize only 100% black text, and discriminate against other colors such as gray, blue, buttons and images?

    • @JayMartMedia 2 years ago +2

      I don't see anything in the tesseract documentation specifically about matching color: github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
      But one option would be to pass the image through an image processor first to filter out any unneeded color before passing the image to tesseract. You could do that manually with photo editing software such as GIMP, or automate it with a script.
      Python script using the opencv library: medium.com/featurepreneur/colour-filtering-and-colour-pop-effects-using-opencv-python-3ce7d4576140
      Or use 'convert' from the command line: stackoverflow.com/questions/29742123/remove-all-except-one-colour-from-an-image-commandline-or-code

    • @guillermocascomiranda 2 years ago +1

      @JayMartMedia You're right, your help was huge!!! I tried this code "threshold(img,T,255,cv.THRESH_BINARY)" and the filter works, where 'T' is the threshold value, below or above which it turns everything black or white. The problem is that I have thousands of screenshots, and the idea is to automate them in such a way that I don't have to manually edit them with an image editor like Paint, Gimp, Ps, Ai, etc. In addition, the additional problem (and advantage) that I have is that the words ALWAYS appear in the same region of the capture, and I would like it to take ONLY that region, not the surrounding images or buttons. It's just black and gray letters, I need only the black ones, and have them list one below the other in a TXT or CSV file for excel. The captures are digital, from a mobile application, so the quality is very good, centered, and the same typography always, but I don't know how to automate it as much as possible (it extracts and lists in a TXT only the words in black listed in the screenshot)

    • @JayMartMedia 2 years ago +2

      This article about using tesseract in python may help, it is definitely a little bit python programming intensive: nanonets.com/blog/ocr-with-tesseract/

    • @guillermocascomiranda 2 years ago +3

      @JayMartMedia You're simply a genius. You have no idea how much you helped me. Kind regards from Argentina. You earned a new subscriber :)
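
      The workflow discussed in this thread (crop a fixed region, keep only the black text, list the results in a CSV) could be automated along these lines. This is a sketch under assumptions, not the method from the linked articles: it assumes `opencv-python` is installed and `tesseract` is on the PATH, and the `REGION` and `THRESHOLD` values are made-up placeholders that would need tuning on real screenshots. All function names are hypothetical.

```python
import csv
import subprocess
import tempfile
from pathlib import Path

REGION = (0, 300, 1080, 1500)  # hypothetical (left, top, right, bottom) in pixels
THRESHOLD = 60                 # hypothetical cutoff: gray text is brighter than this


def keep_black_text(src: Path, dst: Path, region=REGION, threshold=THRESHOLD) -> None:
    """Crop `region` from `src` and turn everything except near-black pixels white."""
    import cv2  # third-party: pip install opencv-python

    image = cv2.imread(str(src))
    if image is None:
        raise FileNotFoundError(f"Could not read image: {src}")
    left, top, right, bottom = region
    gray = cv2.cvtColor(image[top:bottom, left:right], cv2.COLOR_BGR2GRAY)
    # THRESH_BINARY: pixels above the threshold become 255 (white), the rest 0 (black).
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    cv2.imwrite(str(dst), binary)


def text_to_rows(source_name: str, text: str) -> list[list[str]]:
    """Turn OCR output into one [file, line] row per non-empty line."""
    return [[source_name, line.strip()] for line in text.splitlines() if line.strip()]


def screenshots_to_csv(folder: str, csv_path: str) -> None:
    """Clean and OCR every PNG in `folder`, writing the recognized lines to a CSV file."""
    with tempfile.TemporaryDirectory() as tmp, \
            open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "text"])
        cleaned = Path(tmp) / "cleaned.png"
        for shot in sorted(Path(folder).glob("*.png")):
            keep_black_text(shot, cleaned)
            # "stdout" as the output base makes Tesseract print the text instead of saving it.
            result = subprocess.run(
                ["tesseract", str(cleaned), "stdout"],
                capture_output=True, encoding="utf-8", check=True,
            )
            writer.writerows(text_to_rows(shot.name, result.stdout))
```

      With a low threshold, gray letters end up above the cutoff and turn white, so only the black words remain for Tesseract to read. A call such as `screenshots_to_csv("captures", "words.csv")` would then produce a file that opens in Excel.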

  • @AkioEndo197 5 months ago +1

    Can I simply launch it as an executable rather than changing my registry?

    • @JayMartMedia 5 months ago

      You can include the full path to tesseract rather than adding it to the PATH.
      For example: C:\Program Files\...
      You may need to wrap the path to tesseract in quotes if it contains a space.

    • @AkioEndo197 5 months ago

      @JayMartMedia I want to be able to launch it as an executable without changing my files or anything like that. I'm fine with downloading and uninstalling though.

    • @JayMartMedia 5 months ago

      When running it through the command prompt, it is running the executable.
      Are you wondering if there is a graphical user interface, rather than using the command line? If so, here is a web based tesseract tool that you could use: tesseract.projectnaptha.com/
      Video on the web-based project: ua-cam.com/video/tFW0ExG4QZ4/v-deo.htmlsi=NjpporeeM7q07szi

    • @AkioEndo197 5 months ago

      @JayMartMedia I want the Japanese one.

  • @zidouneca 20 days ago +1

    And how do I convert 1000 images to text at the same time? What is the code for that?

    • @JayMartMedia 20 days ago

      This video has an example of how to do that with a python script: ua-cam.com/video/HNCypVfeTdw/v-deo.html

    • @zidouneca 20 days ago +1

      Thank you

  • @lwjunior2 1 year ago

    Can the program conduct OCR on an entire Folder of images?

  • @Midnasv 1 year ago +3

    Thanks. To the point.

    • @Midnasv 1 year ago

      Do you know if it is possible to use OCR with password protected PDF?

  • @QuranicHealingIN 8 months ago

    Hi Jay! I want to use Tesseract OCR to convert bulk image to text. Please help!

  • @cindylloyd306 1 year ago +1

    I cannot see what you're typing in the command lines. Also, I can make it fine up to CD Pictures. How do you tell Tesseract the directory of where my image file is? I have an external drive, where I store image pdfs. I would like to OCR those but all I get is nothing. I've no idea what the output file is, nor how to enter it. I would have preferred seeing the full screen command lines and having directories explained. I know the others are thrilled but I'm a goob as far as this kind of stuff goes. 🤣🤣🤣

  • @tazyeenalam 5 months ago

    Does this also work for PDF files that contain many pages of scanned images?

    • @MOHAMEDIBRAHIM-yw6pt 5 months ago

      Hi, I am also searching for the feasibility to read a pdf file with multiple pages and detect a signature in that file using OCR, if you have done that can you please help me out?

  • @yeahx32p69 10 months ago +1

    thx a lot. Going to automate my grocery list record misery 😂😂

  • @GlobalEconInsights 6 months ago +1

    Best tutorial for that

  • @पापानटोले 4 years ago

    Great.
    Any idea how we can train a special character like checkbox with tick ?

  • @rizaladhi7066 1 year ago

    Please share a tutorial on finding a specific image containing text in a folder that has 500 images.

  • @krishnaagarwal2056 1 year ago

    I can't download the package. It has been flagged as containing a virus. Can you suggest any other software to download?

  • @123LuisX 2 years ago

    I added the path to a variable but it's still not recognized on my Windows machine. Do you know what I need to do?

    • @JayMartMedia 2 years ago +2

      You may need to close all your command prompts, and then reopen the command prompt in order to reload those variables

    • @alexlenhoff6274 2 years ago +2

      @JayMartMedia Thanks! I was having the same problem and that fixed it

  • @suniltiwari4387 3 years ago

    Can you help us install CVAT on a local server, on Windows Server 2019, please?

  • @Farrolet 6 months ago +1

    thank you sir

  • @Black-ie8qz 1 month ago +1

    Thanks God.

  • @drallisimo34 7 months ago +1

    very useful tutorial! 5*

  • @MrBorgj 4 years ago +2

    anyone who just wants a gui interface for tesseract should look for gImageReader :D

  • @AtomicTech37 3 years ago +1

    pretty helpful!

  • @IR_Mediaa 1 year ago

    Try Mort OCR, guys, it works well.

  • @aakashstudyspecials8196 4 years ago +2

    ty
    so much

  • @Papiii_benz 7 months ago

    I'm still having trouble

    • @JayMartMedia 7 months ago

      What are you having trouble with?

  • @bilawalmalik-tm6np 6 months ago

    good video

  • @bouchelligamohamedhedi2747 4 years ago +1

    pretty helpful

  • @sebastianparias1962 7 months ago

    thanks

  • @lopezgladwell2014 4 years ago

    Didn't work.

  • @JF-pl2fh 7 months ago +1

    Best not to install with program files and keep it in users/ because windows messes with program files and was ruining my workflow. Good tutorial still.

  • @papastalin3498 4 months ago

    "The system cannot find the path specified" mf it's Pictures

  • @mohamedkhalith4629 4 years ago +2

    Nice video, but too fast.
    I set the playback speed to 0.5x. Who else did?

  • @cornevanzyl5880 2 years ago

    I dont like the implementation. I want something with 2 clicks. Open the app, snip my content and copy the text and use it. simple

    • @mattg2770 2 years ago +1

      Well there are some of those but you have to pay. This is not that.

  • @MyProfitCodeDotCom 1 year ago +1

    Thanks for the short, effective guide... are you still going to show how to integrate with Python? (This has been asked about more than once by commenters.)

  • @BashfulNuke 4 months ago +1

    everything up to this point works correctly tesseract test.png tesseract-results
    Error; 'tesseract' is not recognized as an internal or external command,
    operable program or batch file.

    • @JayMartMedia 4 months ago +1

      Have you added the full path to the folder with the tesseract.exe file to your PATH environment variable? After adding as an environment variable you will need to close and relaunch the command prompt. (Environment variables such as PATH are loaded when the command prompt starts. So when it is already open and you add or change a variable it doesn't get loaded immediately. The updated variables only get loaded when starting.)
      Another thing you could try is opening the command prompt in the folder where the tesseract.exe is and seeing if the "tesseract" command will work there.
      Let me know if that helps!

    • @BashfulNuke 4 months ago +1

      @JayMartMedia I appreciate the quick response. I'll need to look back and see what I did, but I got it working. You earned my sub for this, I appreciate you.