How To Read PDF Files in Python using PyPDF2

  • Published 12 Nov 2024

COMMENTS • 53

  • @himankshekher8645 · 4 years ago · +5

    Keep up the great work, Mukesh. I have learned a lot from your channel and will continue to do so. You upload short videos, which suits me, since I get bored in hour-long videos and lose interest in the topic.
    Thanks for all your efforts.

    • @Mukeshotwani · 4 years ago · +1

      Hi Himank, thank you so much. I am glad you liked the Python series.

  • @snicket87 · 2 years ago · +1

    Thank you! Helped my project from across the world! Greetings from Brazil!

  • @wendyesquivel8179 · 2 years ago

    This is exactly what I was looking for! Thanks :D

  • @elizabethhanks1042 · 3 years ago · +1

    YESSSS thank you for your help!!

  • @tenzindorjee7689 · 2 years ago · +1

    Thank you so much, brother, it really helped my project 🙏🙏🙏❤️

  • @KrishnaReddy-zz4yu · 3 years ago

    What is the difference between Jupyter Notebook and PyCharm? In Jupyter, we use Pandas to open PDF/CSV/TXT files. Which is more efficient to learn and apply in real-world work?

  • @emanuelalves593 · 1 year ago

    Very useful! Thank you!

  • @nitinkumarshukla6967 · 1 year ago

    Can we do the same thing with a PDF uploaded by a user?
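As far as I know, PyPDF2's `PdfReader` accepts a binary file-like stream as well as a path, so bytes uploaded by a user can be wrapped in `io.BytesIO` instead of being saved to disk first. A minimal sketch, assuming PyPDF2 3.x and that your web framework hands you the upload as `bytes`; the function names `looks_like_pdf` and `read_uploaded_pdf` are hypothetical.

```python
import io


def looks_like_pdf(data):
    """Cheap sanity check: PDF files normally start with the bytes b"%PDF-"."""
    return data[:5] == b"%PDF-"


def read_uploaded_pdf(data):
    """Return a list with the extracted text of each page of an uploaded PDF."""
    if not looks_like_pdf(data):
        raise ValueError("uploaded data does not look like a PDF")
    # Imported here so the check above can run even where PyPDF2 is missing.
    from PyPDF2 import PdfReader

    # Wrap the in-memory bytes in a binary stream; no temporary file needed.
    reader = PdfReader(io.BytesIO(data))
    return [page.extract_text() for page in reader.pages]
```

The header check is only a quick filter for obviously wrong uploads; a damaged PDF can still pass it and fail inside `PdfReader`.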

  • @bigbro1231000 · 2 years ago

    Hello, thank you for the informative video.
    I have a problem running the code: PyPDF2 gives me an error that progressbar is not recognised. How can I solve it, please?

  • @verkar1965 · 3 years ago

    Hi, is there a way to get separate lines instead of just one merged line? Thank you.

  • @KhalilYasser · 4 years ago · +1

    Thanks a lot. Very useful.

  • @Kenneth-f5d · 1 year ago

    How to get the title of the PDF's content such as "A Simple PDF File"?
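If the title is stored in the document's metadata, PyPDF2 3.x exposes it as `reader.metadata.title`; many PDFs leave that field empty, and then a heading such as "A Simple PDF File" is just ordinary text on the first page. A minimal sketch, assuming a PyPDF2 `PdfReader` object is passed in; `pdf_title` and `first_line` are hypothetical helper names, and treating the first non-blank line as the title is only a heuristic.

```python
def pdf_title(reader):
    """Return the Title from the PDF's document information, or None."""
    info = reader.metadata  # None when the PDF has no document information
    if info is None:
        return None
    return info.title  # may also be None if the Title entry is missing


def first_line(reader):
    """Heuristic fallback: first non-blank line of page 1's extracted text."""
    if len(reader.pages) == 0:
        return None
    text = reader.pages[0].extract_text() or ""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None


# Usage (hypothetical file name):
# from PyPDF2 import PdfReader
# reader = PdfReader("sample.pdf")
# print(pdf_title(reader) or first_line(reader))
```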

  • @SandeepKumar-px4kf · 2 years ago

    Please, can you make a lecture that takes an Excel file as input and outputs a .docx file for that Excel file?

  • @sidstarsiddhu9275 · 3 years ago · +2

    Sir, I am getting an AttributeError at line 3:
    reader=PyPDF2.pdfFileReader(file)
    AttributeError: module 'PyPDF2' has no attribute 'pdfFileReader'

    • @Mukeshotwani · 3 years ago

      Hi Sidstar, it seems you have not installed the library properly. Please try installing it again, and if you are working in PyCharm, add it in PyCharm too.

    • @Kmysiak1 · 3 years ago

      It's PdfFileReader(), not pdfFileReader() (names are case-sensitive).

  • @KkdvPrasad · 3 years ago

    Mukesh, can we have similar logic in Eclipse using Java?

  • @en_coded · 2 years ago

    Can you do a simple write? I haven't seen a "hello world" write; it's always "read this PDF and write it to another PDF". What if I just want to start a PDF with strings and images?

  • @LeZinZin95 · 2 years ago · +1

    It worked, thank you

  • @subhransupanda7052 · 4 years ago · +3

    This will work for a simple PDF file, but for a complex PDF with tables, multiple pages, images, or non-English characters it would not work. Could you please show us how to read a complex PDF file?

    • @Mukeshotwani · 4 years ago · +5

      Hi Subhranshu, the current approach will fetch the data from tables as well. Multiple pages are already covered in the video. For images, you can open the PDF in rb (read binary) mode, which returns binary data. For non-English characters you can change the encoding.
      I will try to make a video on this.

    • @subhransupanda7052 · 4 years ago

      @Mukeshotwani Thanks a lot, Mukesh. Please make a video on this. You are a great inspiration for us; we are waiting for that video.

    • @NOTHING-j2h · 4 years ago · +2

      How can we compare two PDFs whose contents are the same but positioned in different locations? We can't compare them line by line.

    • @Mukeshotwani · 4 years ago

      Hi Ankur, the video above is for when you need to validate a specific string or keyword in a PDF. For comparing two PDFs, there are many Python libraries that can help; please explore them. One of them is pypi.org/project/diff-pdf-visually/
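A much cruder alternative to a visual diff, when only the wording matters: extract the text of each PDF (for example with `extract_text()` on every page) and compare which words occur and how often, ignoring where they sit on the page. This is a swapped-in technique, not what diff-pdf-visually does, and it also ignores word order, so two PDFs with shuffled sentences would count as equal. `word_counts`, `same_content` and `content_diff` are hypothetical names.

```python
import re
from collections import Counter


def word_counts(text):
    """Count case-insensitive words, ignoring layout, line breaks and punctuation."""
    return Counter(re.findall(r"\w+", text.lower()))


def same_content(text_a, text_b):
    """True if both texts contain the same words the same number of times."""
    return word_counts(text_a) == word_counts(text_b)


def content_diff(text_a, text_b):
    """Return (only_in_a, only_in_b): Counters of the surplus words on each side."""
    a, b = word_counts(text_a), word_counts(text_b)
    return a - b, b - a


# Same fields, different positions on the page:
print(same_content("Total: 100\nInvoice No 7", "Invoice No 7   Total: 100"))
```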

  • @kamal3777 · 3 years ago · +2

    I try to extract the text, but it just gives an empty string.

    • @Mukeshotwani · 3 years ago

      Please debug your code. I have a dedicated video on how to debug your code.
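Besides bugs in the script, one common cause of an empty string is a scanned PDF: each page is an image with no text layer, and PyPDF2 only returns text that is stored as text (it does not do OCR). A minimal check, assuming PyPDF2 3.x page objects; `pages_without_text` is a hypothetical helper name.

```python
def pages_without_text(pages):
    """Return the 1-based numbers of pages whose extracted text is blank."""
    blank = []
    for number, page in enumerate(pages, start=1):
        # extract_text() yields nothing useful for image-only (scanned) pages
        text = page.extract_text() or ""
        if not text.strip():
            blank.append(number)
    return blank


# Usage (hypothetical file name):
# from PyPDF2 import PdfReader
# print(pages_without_text(PdfReader("scanned.pdf").pages))
```

If every page is reported, the file most likely needs an OCR tool rather than PyPDF2 alone.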

  • @ahmadrahmatulloyev162 · 1 year ago

    Very useful. Thanks

  • @nazishsultana5273 · 3 years ago

    Sir, please help me, my code is not working. It gives the warning "Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]" 😢😢😢😢

  • @suhelmallick · 1 year ago · +2

    A lot of this is the old syntax.
    Mine uses the newer syntax:
    import PyPDF2
    # PyPDF2 3.x: PdfReader replaces the old PdfFileReader
    with open('Ansible+Roles.pdf', 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        print(len(reader.pages))
        page1 = reader.pages[1]  # index 1 is the second page; use [0] for the first
        pdfdata = page1.extract_text()
    assert "PRINCE" in pdfdata
    print("PRINCE" in pdfdata)

  • @srirajid · 3 years ago · +1

    May I know where you are writing the code?

    • @Mukeshotwani · 3 years ago

      Hi Raji, I am using PyCharm: ua-cam.com/play/PL6flErFppaj3FhVG-3RGGQx-Mvj7DXrpX.html

  • @centrodoreforco-aulasderef7743 · 2 years ago

    Perfect!

  • @ayushmittal2754 · 2 years ago · +2

    What should I do to extract all pages of a PDF? I have been searching for a solution for the last 24 hours.

    • @Mukeshotwani · 2 years ago

      Hi Ayush, you can run a loop that iterates over all the pages one by one.

    • @wilianuhlmann5284 · 2 years ago

      @Mukeshotwani Can you give an example, please?
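Following up on the loop suggestion, here is a minimal sketch, assuming PyPDF2 3.x, where the reader class is `PdfReader` and the pages are available as `reader.pages`. The helper names `extract_all_text` and `read_pdf_text`, the page-marker format, and the file name `sample.pdf` are hypothetical.

```python
def extract_all_text(pages):
    """Join the text of every page, in order, with a marker before each page."""
    chunks = []
    for number, page in enumerate(pages, start=1):
        # extract_text() may return an empty string for pages without a text layer
        text = page.extract_text() or ""
        chunks.append(f"--- page {number} ---\n{text}")
    return "\n".join(chunks)


def read_pdf_text(path):
    """Open a PDF with PyPDF2 and return the text of all of its pages."""
    # Imported here so extract_all_text() can be reused with any objects
    # that provide extract_text().
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return extract_all_text(reader.pages)


# Usage (hypothetical file name):
# print(read_pdf_text("sample.pdf"))
```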

  • @freedoom4090 · 2 years ago · +1

    Very nice! Does it work without Java?

  • @lasnroo · 2 years ago · +1

    How can I read it line by line?

    • @lokusok5080 · 2 years ago · +1

      from PyPDF2 import PdfReader

      reader = PdfReader("file.pdf")
      for page in reader.pages:
          # extract_text() returns one string per page; split it into lines
          text = page.extract_text()
          for line in text.splitlines():
              print(line)

  • @logapriyas6911 · 2 years ago

    When I follow the above instructions I get a "superfluous whitespace" error 🙂. Can anyone help me with this issue?

  • @monicalelli5369 · 2 years ago

    The import isn't working. Any hints, please?

    • @Jason-ot6jv · 2 years ago

      Make sure you run 'pip3 install PyPDF2' in the terminal.

    • @thekyreefuller · 2 years ago

      @Jason-ot6jv Hi Jason, I did this, and my terminal says "Requirement already satisfied". I'm still getting the same "No module named PyPDF2" error. Any thoughts?
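"Requirement already satisfied" followed by "No module named PyPDF2" often means pip installed the package into a different Python interpreter than the one running the script (for example, a PyCharm virtual environment versus the system Python). A small diagnostic sketch; `module_available` is a hypothetical helper name. Installing with `<that interpreter> -m pip install PyPDF2` targets the interpreter it prints.

```python
import importlib.util
import sys


def module_available(name):
    """True if `name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None


# Which Python is running this script? Install packages into this one.
print("Interpreter:", sys.executable)
print("PyPDF2 importable:", module_available("PyPDF2"))
```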

  • @tanny_edits · 3 years ago

    Bro, I can't access my PDF using your code.

    • @tanny_edits · 3 years ago

      @Xeno The Strange I literally just copied what he does, bro.

    • @bryanhernandez2861 · 3 years ago

      I had the same issue. I used a raw string, r"\\users\\...", and added "C:" at the start of the path.
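One likely reason a Windows path fails when copied into the code is backslash escaping: in a normal Python string literal, "\U" starts a Unicode escape, so writing "C:\Users\..." without the r prefix is a SyntaxError. A raw string, doubled backslashes, or forward slashes all give a usable path. The path below is hypothetical, for illustration only.

```python
from pathlib import PureWindowsPath

# Hypothetical path for illustration only.
raw = r"C:\Users\me\Documents\sample.pdf"         # raw string: backslashes kept as-is
escaped = "C:\\Users\\me\\Documents\\sample.pdf"  # same path with escaped backslashes
forward = "C:/Users/me/Documents/sample.pdf"      # forward slashes also work on Windows

print(raw == escaped)
print(PureWindowsPath(raw).name)
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```

Any of these strings can then be passed to `open(..., 'rb')` or to `PdfReader`.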

  • @johnalbertson4424 · 2 years ago · +1

    What language are you speaking?!?!
    It isn't English.

    • @Mukeshotwani · 2 years ago · +4

      The main thing is: did you get the concept or not?