Keep doing the great work, Mukesh. I have learnt a lot from your channel and will continue to do so. You upload short videos, which suits me, since I get bored with hour-long videos and lose interest in the topic.
Thanks for all your efforts.
Hi Himank, thank you so much. I am glad you liked the Python series.
Thank you! This helped my project from across the world! Greetings from Brazil!
This is exactly what I was looking for! Thanks :D
YESSSS thank you for your help!!
Thank you so much, brother, this really helped my project 🙏🙏🙏❤️
Thanks, Dorjee.
What is the difference between Jupyter Notebook and PyCharm? In Jupyter we use pandas to open PDF/CSV/TXT files. Which is more efficient to learn and apply in real time?
Very useful! Thank you!
Can we do the same thing with a PDF uploaded by a user?
Hello, thank you for the informative video.
I have a problem compiling the code: pypdf gives me the error "progressbar not recognised". How can I solve it, please?
Hi, is there a way to get separate lines instead of just one merged line? Thank you.
Thanks a lot. Very useful.
You are welcome, Yasser!
How do I get the title of the PDF's content, such as "A Simple PDF File"?
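A title can live in two places: the PDF's metadata, or simply as the first line of text on page one (as with "A Simple PDF File"). Below is a minimal sketch, assuming PyPDF2 3.x, where `reader.metadata` may be `None` and otherwise exposes a `.title` attribute. `pick_title` and `get_title` are hypothetical helper names, and the first-line fallback is a heuristic, not a guarantee that the line is really the title.

```python
from typing import Optional


def pick_title(meta_title: Optional[str], first_page_text: str) -> Optional[str]:
    """Prefer the metadata title; otherwise fall back to the first non-blank text line."""
    if meta_title and meta_title.strip():
        return meta_title.strip()
    for line in first_page_text.splitlines():
        if line.strip():
            return line.strip()
    return None  # no metadata title and no extractable text


def get_title(pdf_path: str) -> Optional[str]:
    """Read a PDF's title with PyPDF2 (assumes PyPDF2 >= 3.0)."""
    from PyPDF2 import PdfReader  # lazy import: pick_title() needs only the stdlib
    reader = PdfReader(pdf_path)
    meta = reader.metadata  # None when the file has no document-information dictionary
    meta_title = meta.title if meta is not None else None
    first_text = reader.pages[0].extract_text() if len(reader.pages) > 0 else ""
    return pick_title(meta_title, first_text or "")
```

Usage: `print(get_title("file.pdf"))`. Many generated PDFs leave the metadata title empty, which is why the text fallback is there.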
Could you please make a lecture that takes an Excel file as input and produces a DOCX file from it?
Sir, I am getting an AttributeError at line 3:
reader=PyPDF2.pdfFileReader(file)
AttributeError: module 'PyPDF2' has no attribute 'pdfFileReader'
Hi Sidstar, it seems you have not installed the library properly. Please try installing it again, and if you are working in PyCharm, add it to PyCharm too.
It's PdfFileReader(), not pdfFileReader().
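Attribute names are case-sensitive, and the right name also depends on the PyPDF2 version: older releases use `PdfFileReader` and `extractText()`, while PyPDF2 3.x uses `PdfReader` and `extract_text()` (using the old names there raises a deprecation error). Below is a minimal sketch, assuming you want code that tolerates either API; `page_text` and `reader_class` are hypothetical helper names.

```python
def page_text(page) -> str:
    """Return a page's text using whichever method name this PyPDF2 version provides."""
    # Newer PyPDF2 releases (including 3.x) spell it extract_text(); older ones used extractText()
    if hasattr(page, "extract_text"):
        return page.extract_text() or ""
    return page.extractText() or ""


def reader_class():
    """Return the PDF reader class, preferring the PyPDF2 3.x name."""
    import PyPDF2  # lazy import so page_text() can be used and tested on its own
    # Note the capital P: the names are PdfReader / PdfFileReader, not pdfFileReader
    return getattr(PyPDF2, "PdfReader", None) or PyPDF2.PdfFileReader
```

Usage: `reader = reader_class()(open("file.pdf", "rb"))`, then `page_text(page)` for each page.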
Mukesh, can we have similar logic in Eclipse using Java?
Can you do a simple write? I haven't seen a 'hello world' write; it's always "read this PDF and write it to another PDF". What if I just want to start a PDF with strings and images?
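PyPDF2 is mainly for reading, splitting and merging existing PDFs rather than drawing new content, which is why the examples always start from another PDF; a library such as reportlab is the usual choice for creating one from scratch. To show what a "hello world" PDF actually contains, here is a minimal sketch using only the standard library. It writes one page with one line of text in the built-in Helvetica font. `hello_pdf` is a hypothetical name, the text must be Latin-1 encodable, and images are left out because embedding them needs considerably more PDF structure.

```python
def hello_pdf(text: str) -> bytes:
    """Build a one-page PDF showing `text` (Latin-1 only) in 24 pt Helvetica."""
    # Escape the characters that delimit PDF string literals
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: begin text, select font F1 at 24 pt, move to (72, 720), show text
    stream = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Usage: `open("hello.pdf", "wb").write(hello_pdf("Hello world"))` should give a file that common PDF viewers open.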
It worked, thank you
You're welcome!
This will work for a simple PDF file, but for a complex PDF with tables, multiple pages, images or non-English characters it would not work. Could you please show us how to read a complex PDF file?
Hi Subhranshu, the current approach will fetch the data from tables as well. Multiple pages are already covered in the video. When it comes to images, you can open the PDF in rb (read binary) mode, which will return binary data. For non-English characters you can change the encoding.
I will try to make a video on this.
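On the encoding point: `extract_text()` already returns a Python `str`, so non-English characters usually get garbled later, when the text is printed or written to a file using the platform's default encoding (which on many Windows setups is not UTF-8). Below is a minimal sketch, assuming you have a list of page texts such as `[p.extract_text() for p in reader.pages]`; `save_text` is a hypothetical helper, and the explicit `encoding="utf-8"` is the part that matters. If the characters are already wrong inside the returned string, the PDF's own font encoding is the problem and no file encoding will fix it.

```python
from pathlib import Path
from typing import Iterable, Optional


def save_text(page_texts: Iterable[Optional[str]], out_path: str) -> Path:
    """Write page texts to `out_path` as UTF-8, one blank line between pages."""
    path = Path(out_path)
    # Explicit UTF-8 instead of the platform default, so e.g. Hindi or Turkish text survives
    path.write_text("\n\n".join(text or "" for text in page_texts), encoding="utf-8")
    return path
```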
@Mukeshotwani Thanks a lot, Mukesh. Please make a video on this. You are a great inspiration for us, and we are waiting for that video.
How can we compare two PDFs whose contents are the same but positioned in different locations? We can't compare them line by line.
Hi Ankur, the above video covers validating a specific string or keyword in a PDF. For comparing two PDFs, there are many Python libraries that can help you; please explore them. One of them is pypi.org/project/diff-pdf-visually/
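When the same text sits at different positions, one rough, swapped-in technique is a bag-of-words comparison: check which words appear and how often, ignoring order and layout. This is not what diff-pdf-visually does (that library compares the rendered pages visually). Below is a minimal sketch, assuming you already have each PDF's text, for example by joining `page.extract_text()` over all pages; `word_counts` and `word_diff` are hypothetical helpers, and this approach will not notice words that merely moved between sentences.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count case-folded words, ignoring punctuation, line breaks and position."""
    return Counter(re.findall(r"\w+", text.casefold()))


def word_diff(text_a: str, text_b: str) -> tuple:
    """Return (surplus words in A, surplus words in B); two empty Counters mean a match."""
    a, b = word_counts(text_a), word_counts(text_b)
    # Counter subtraction keeps only positive counts, i.e. what one side has extra
    return a - b, b - a
```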
I tried to extract the text, but it just gives an empty string.
Please debug your code. I have a dedicated video on how to debug your code.
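Besides bugs in your own code, one common cause worth ruling out: scanned PDFs store each page as an image with no text layer, so `extract_text()` has nothing to return and OCR would be needed instead. Below is a minimal sketch, assuming you have a list of page texts such as `[p.extract_text() for p in reader.pages]`; `empty_pages` and `looks_scanned` are hypothetical helper names, and the "scanned" check is only a heuristic.

```python
from typing import List, Optional, Sequence


def empty_pages(page_texts: Sequence[Optional[str]]) -> List[int]:
    """Return the 1-based numbers of pages whose extracted text is empty or blank."""
    # `text or ""` treats a missing (None) result the same as an empty string
    return [number for number, text in enumerate(page_texts, start=1)
            if not (text or "").strip()]


def looks_scanned(page_texts: Sequence[Optional[str]]) -> bool:
    """Heuristic: if no page yields any text, the PDF is probably image-only."""
    return len(page_texts) > 0 and len(empty_pages(page_texts)) == len(page_texts)
```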
Very useful. Thanks.
Sir, please help me. My code is not working; it gives the warning "Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]" 😢😢😢😢
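A lot of this is old syntax. Mine is the newer syntax:
import PyPDF2

# Open the PDF in binary mode; PdfReader is the PyPDF2 3.x class name
with open('Ansible+Roles.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    print(len(reader.pages))  # total number of pages
    page1 = reader.pages[1]  # index 1 is the SECOND page (indexing starts at 0)
    pdfdata = page1.extract_text()
    assert "PRINCE" in pdfdata  # fails loudly if the keyword is missing
    print("PRINCE" in pdfdata)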
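The keyword check above looks at a single page. Below is a minimal sketch that reports every page containing a keyword, so a failed check can say where the text was expected. `pages_containing` and `assert_keyword` are hypothetical helpers; they take already-extracted page texts (e.g. `[p.extract_text() for p in reader.pages]`) and use a plain case-sensitive substring match, like the `in` test above.

```python
from typing import List, Optional, Sequence


def pages_containing(page_texts: Sequence[Optional[str]], keyword: str) -> List[int]:
    """Return the 1-based page numbers whose text contains `keyword` (case-sensitive)."""
    return [number for number, text in enumerate(page_texts, start=1)
            if keyword in (text or "")]


def assert_keyword(page_texts: Sequence[Optional[str]], keyword: str) -> List[int]:
    """Raise AssertionError if `keyword` is on no page; otherwise return the pages."""
    found = pages_containing(page_texts, keyword)
    if not found:  # explicit raise, so the check survives `python -O`
        raise AssertionError(f"{keyword!r} not found on any of {len(page_texts)} pages")
    return found
```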
May I know where you are writing the code?
Hi Raji, I am using PyCharm: ua-cam.com/play/PL6flErFppaj3FhVG-3RGGQx-Mvj7DXrpX.html
Perfect!
What should I do to extract all pages of a PDF? I have been searching for this solution for the last 24 hours.
Hi Ayush, you can run a loop that iterates over all the pages one by one.
@Mukeshotwani Could you give an example, please?
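Below is a minimal sketch of such a loop, assuming PyPDF2 3.x (`PdfReader`, `reader.pages`, `page.extract_text()`). `join_pages` and `extract_all_text` are hypothetical helper names, and the page markers are just one way to keep track of where each page starts.

```python
from typing import Iterable, Optional


def join_pages(page_texts: Iterable[Optional[str]]) -> str:
    """Combine per-page texts into one string, labelling each page."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"--- page {number} ---\n{text or ''}")
    return "\n".join(parts)


def extract_all_text(pdf_path: str) -> str:
    """Loop over every page of the PDF and return all of its text."""
    from PyPDF2 import PdfReader  # lazy import: join_pages() needs only the stdlib
    reader = PdfReader(pdf_path)
    return join_pages(page.extract_text() for page in reader.pages)
```

Usage: `print(extract_all_text("file.pdf"))`.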
Very nice! Does it work without Java?
Yes. For Java we have a different library.
@Mukeshotwani Thanks!
How can I read line by line?
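from PyPDF2 import PdfReader

reader = PdfReader("file.pdf")
# Iterate over the page objects directly instead of indexing with range()
for page in reader.pages:
    text = page.extract_text()
    # Split the page text on newline characters to get one line at a time
    for line in text.split("\n"):
        print(line)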
When I follow the above instructions I get a superfluous whitespace error 🙂 Can anyone help me with this issue?
The import isn't working. Any hints, please?
Make sure you run 'pip3 install PyPDF2' in the terminal.
@Jason-ot6jv Hi Jason, I did this and my terminal says "Requirement already satisfied", but I'm still getting the same "No module named PyPDF2" error. Any thoughts?
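"Requirement already satisfied" together with "No module named PyPDF2" usually means pip installed the package into a different Python than the one running your script, which is common when PyCharm uses its own project interpreter or virtual environment. Below is a minimal check using only the standard library: it prints which interpreter is running and whether that interpreter can see the module. If it reports `False`, installing with that same interpreter (the printed path followed by `-m pip install PyPDF2`) should fix it. `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if the *current* interpreter can find module `name` (without importing it)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Running interpreter:", sys.executable)
    print("PyPDF2 importable here:", module_available("PyPDF2"))
```

Note that the import name is case-sensitive: `import PyPDF2`, not `import pypdf2`.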
Bro, I can't access my PDF using your code.
@Xeno The Strange I literally just copied what he does, bro.
I had the same issue. I used r"\\users\\... and "C:"
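Path problems like this are common on Windows because a backslash in a normal string literal starts an escape sequence; `"C:\Users\..."` is even a SyntaxError, since `\U` begins a Unicode escape. A raw string (a hypothetical example: `r"C:\Users\you\file.pdf"`) or `pathlib` avoids that. Below is a minimal sketch, assuming the path arrives as a string, that checks the file exists before it is handed to `PdfReader`, so a wrong path fails with a clear message. `checked_pdf_path` is a hypothetical helper name.

```python
from pathlib import Path


def checked_pdf_path(raw_path: str) -> Path:
    """Resolve `raw_path` and fail early with a clear message if it isn't a file."""
    path = Path(raw_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No PDF found at {path} (check the folder and spelling)")
    return path
```

Usage: `reader = PdfReader(checked_pdf_path(r"C:\Users\you\file.pdf"))`.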
What language are you speaking?!
It isn't English.
The main thing is: did you get the concept or not?