How to perform text analytics in R on Multiple PDF Documents

Published 6 Sep 2024

COMMENTS • 73

  • @prometeo34
    2 years ago +2

    Madam, you are one of the best teachers I have seen... well done! Thanks so much for these videos.

  • @saqibwarriach
    1 year ago

    One thing missing from the Data Centric Inc tutorial series is the annotation of function words and content words as a pre-processing step. It would be great to get your insights and a hands-on demonstration of annotating and removing function words before analysis.

  • @ehecatl3830
    2 years ago +1

    Your English is very clear. Thanks!

  • @rebeccadsolson1207
    2 years ago +3

    You are a great teacher! Such clear explanations. Thank you so much!

    • @DataCentricInc
      2 years ago

      You're very welcome Rebecca! Glad it was helpful.

  • @christopherkhaddockphd9511
    1 year ago

    This video is excellent. You have a tremendous talent for teaching!

  • @carlitofernandes5491
    2 years ago +1

    Fantastic, thanks! I was looking for a way to work with many PDFs. Thank you, from Brazil.

  • @kemalgunay
    2 years ago +2

    Very helpful content, thanks for sharing

  • @brianisinga918
    1 year ago

    This is fantastic. Thank you. Could you kindly consider making a video on how to remove the first, say, 5 lines from several PDF files and merge them? Or rather, how to combine data from different PDF files after the 5th line/row?
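A minimal sketch of this request in R. It assumes each PDF has already been read with pdftools::pdf_text(), which returns one string per page. It also assumes that "the first 5 lines" means the first five lines of each document. The helper drop_first_lines() and the two sample documents are hypothetical stand-ins.

```r
# Hypothetical helper: drop the first `n` lines of one document and return
# the remaining lines. `pages` is a character vector with one element per
# page, as returned by pdftools::pdf_text().
drop_first_lines <- function(pages, n = 5) {
  # Split every page into lines and flatten them into one vector per document
  lines <- unlist(strsplit(pages, "\n", fixed = TRUE))
  if (length(lines) <= n) {
    return(character(0))
  }
  lines[-seq_len(n)]
}

# Stand-in for lapply(files, pdf_text): two small "documents"
docs <- list(
  doc1 = c("Header A\nDate\nAuthor\nTitle\nBlank\nBody line 1\nBody line 2"),
  doc2 = c("H\nD\nA\nT\nB\nOther body", "Page two text")
)

# Trim every document, then merge the remaining lines into one vector
trimmed <- lapply(docs, drop_first_lines, n = 5)
combined <- unlist(trimmed, use.names = FALSE)
print(combined)
```

Keeping the result as one character vector of lines makes it easy to paste back together, or to pass to tm's VectorSource(), if the merged text is to be analyzed further.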

  • @affanasif7506
    1 year ago

    How can I find the frequency of particular words? For example, I want to know the frequency of certain terms like "technology", "blockchain", "peer to peer transaction", "new systems", etc.
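In tm, passing control = list(dictionary = c(...)) to TermDocumentMatrix() restricts the matrix to the listed words. However, a phrase such as "peer to peer transaction" is split into separate words during tokenization, so it never appears as one row. A base R sketch that instead counts whole-word or whole-phrase matches directly in the extracted text is shown below. The helper count_terms() and the sample text are hypothetical.

```r
# Hypothetical helper: count case-insensitive, whole-word occurrences of each
# term (single words or multi-word phrases) in each document.
# Note: `terms` are used as regular expressions, so characters such as "."
# or "(" must be escaped if they appear in a term.
count_terms <- function(texts, terms) {
  doc_names <- names(texts)
  # Lower-case and collapse line breaks / repeated spaces so that phrases
  # wrapping onto a new line in the PDF still match
  texts <- gsub("\\s+", " ", tolower(texts), perl = TRUE)
  counts <- matrix(0L, nrow = length(texts), ncol = length(terms),
                   dimnames = list(doc_names, terms))
  for (j in seq_along(terms)) {
    # \\b keeps "technology" from matching inside "biotechnology"
    pattern <- paste0("\\b", tolower(terms[j]), "\\b")
    matches <- gregexpr(pattern, texts, perl = TRUE)
    # gregexpr() returns -1 when there is no match, so count positive positions
    counts[, j] <- vapply(matches, function(m) sum(m > 0), integer(1))
  }
  counts
}

# Sample text standing in for pdf_text() output, one string per document
texts <- c(
  report1 = "Blockchain enables peer to peer\ntransaction. Technology changes fast.",
  report2 = "New systems use technology; biotechnology is different. Blockchain!"
)
terms <- c("technology", "blockchain", "peer to peer transaction", "new systems")

counts <- count_terms(texts, terms)
print(counts)           # one row per document, one column per term
print(colSums(counts))  # totals across all documents
```

With real files, one way to build `texts` would be vapply(files, function(f) paste(pdftools::pdf_text(f), collapse = " "), character(1)). That gives one string per PDF, named by its file path.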

  • @alancelaya3123
    9 months ago

    Thanks for the tutorials! I have a question: I need to apply OCR to PDFs before analyzing them. Do you have a tutorial on this?

  • @pcsksa5
    2 years ago +1

    That's brilliant. Thank you for sharing.

  • @jahanzebtube
    2 years ago +1

    Great explanation; you make the concepts easy to understand. Thank you.
    I was running the same code and came across a problem. I was wondering if you could shed some light on it. Basically, when I run the Corpus function it gives this error:
    Error in file(con, "r") : invalid 'description' argument
    Can you please help?
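This message comes from base R's file() when it receives an invalid path, for example NA or something that is not a single character string. One plausible cause, which cannot be confirmed from the message alone, is that the object passed to URISource() is not a character vector of existing file paths. A hypothetical pre-flight check is shown below. The Corpus line in the comment is an assumed form of the tutorial's call, not a quotation of it.

```r
# Hypothetical pre-flight check to run before building the corpus, e.g.
#   check_pdf_paths(files)
#   pdfdatabase <- Corpus(URISource(files), readerControl = list(reader = readPDF))
# (assumed form of the tutorial's call; requires the tm and pdftools packages)
check_pdf_paths <- function(files) {
  if (!is.character(files)) {
    stop("`files` must be a character vector of paths, not a ", class(files)[1])
  }
  if (length(files) == 0) {
    stop("`files` is empty: list.files() found no PDFs in that folder")
  }
  bad <- files[is.na(files) | !file.exists(files)]
  if (length(bad) > 0) {
    stop("these paths are missing or do not exist: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# Demonstration with a temporary empty file standing in for a real PDF
demo_pdf <- tempfile(fileext = ".pdf")
invisible(file.create(demo_pdf))
check_pdf_paths(demo_pdf)              # passes silently
try(check_pdf_paths(list(demo_pdf)))   # a list, not a character vector
try(check_pdf_paths(c(demo_pdf, NA)))  # contains an NA path
```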

  • @christopherbrown576
    1 year ago

    How do you search for individual specific terms, rather than frequently used terms? Thanks!

  • @vengateshprasathramamurthy2801
    2 years ago +1

    Great Video! Thank you!

  • @andreubrito11
    2 years ago +1

    Very good tutorial!!

  • @shehurufai9273
    2 years ago +1

    I look forward to working with you on my PhD thesis. I hope you will respond soon.

  • @universoflearningacademy9503
    4 months ago

    I tried many times, creating different projects, but I always get an "object not found" error for pdfdatabase. What should I do when I run the pdfdatabase line?

  • @igwegbehenrychinaza7908
    2 years ago

    Thank you ma'am
    Kindly share the link to download the PDF so that I can repeat what you did at home.
    Thanks in anticipation.

  • @dawitzewde6654
    1 year ago

    You're fantastic, as always. Thanks so much for your help.

  • @vincentdepaulsavarimuthu779
    2 years ago +1

    You are really great, madam.

  • @SanjayFuloria
    1 year ago

    Thank you very much. I have a problem. When I run the Corpus function to create the pdfdatabase, I get the following error: PDF error: Unknown Metadata type: 'XMP'. Could you please help me with that?

  • @kats_pajamas6908
    2 years ago +1

    Thank you so much! Amazing video!

  • @MsBambi01
    2 years ago +1

    Thank you for a great video! It has helped me so much :)

  • @lowperformer_berlin
    1 year ago

    Hey, really cool video! Thank you very much! I have one question about the results of line 21 (frequency analysis). If that function does not count the words, what do the numbers in the [...] brackets tell me?
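Assume line 21 calls findFreqTerms(), which returns a character vector of terms rather than their counts. In that case the numbers in square brackets are not frequencies. When R prints a long vector across several lines, each line starts with the position of its first element. A small demonstration with made-up terms:

```r
# Made-up terms standing in for the character vector returned by findFreqTerms()
terms <- c("analysis", "data", "court", "opinion", "judge", "state",
           "law", "case", "evidence", "appeal", "trial", "party")

# Use a narrow console width so the vector wraps over several lines
old <- options(width = 30)
printed <- capture.output(print(terms))
options(old)

cat(printed, sep = "\n")
# Each line begins with [k], where k is the position of that line's first
# element in the vector; a line starting with [5] begins with the fifth term.
```

To see the counts themselves, one common approach is sort(rowSums(as.matrix(opinions.tdm)), decreasing = TRUE). This assumes a term-document matrix named opinions.tdm, the name used in the channel's reply about "cyber".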

  • @josephjohns1336
    2 years ago +1

    Could you please make a video about how to scrape, clean, and visualize data from tables in a PDF using R? Preferably not one that uses the tabulizer library or its family of libraries; only pdftools, please.

    • @DataCentricInc
      2 years ago

      Hi Joseph, I will take a look at this and let you know.

    • @DataCentricInc
      2 years ago

      Hi John, you can look out for this video next Monday.

  • @zachabenz
    2 years ago +1

    Thank you for your interesting video. Could you please tell me where to get the "tm" package?

    • @DataCentricInc
      2 years ago

      Hi Zacha B, you can type the following line to install the tm library: install.packages("tm")

    • @zachabenz
      2 years ago +1

      @DataCentricInc Thank you very much. 👍🙏

  • @justdrawing9207
    2 years ago +1

    Hello, thank you for your videos; they help us so much! How many papers can we analyze? Can we analyze more than 3 papers?

    • @DataCentricInc
      2 years ago +1

      You are welcome JustDrawing. You can analyze more than three. I have done up to 30 and you could probably do more.

    • @justdrawing9207
      2 years ago +1

      @DataCentricInc Thank you so much, professor 🙏🏻🙏🏻🙏🏻

  • @agustincsn
    2 years ago

    I tried and followed the scripts given but when I load command opinion

  • @itumelengmosala5335
    2 years ago

    I am still struggling with the code below; it gives me an error:
    list.files(path = folder , pattern = "pdf$")
    folder
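In the snippet as posted, folder appears on its own line and no assignment to it is shown. If folder is not defined before list.files() runs, R stops with an "object 'folder' not found" error, and any later line that uses files fails in turn. A minimal sketch follows, assuming all the PDFs sit in a single local folder. The temporary folder, the dummy file names and the commented Windows path are hypothetical placeholders.

```r
# Placeholder folder with dummy files; in practice set `folder` to your own
# directory, e.g. folder <- "C:/Users/me/Documents/pdfs" (hypothetical path).
folder <- file.path(tempdir(), "pdf_demo")
dir.create(folder, showWarnings = FALSE)
invisible(file.create(file.path(folder, c("a.pdf", "b.PDF", "notes.txt"))))

# Fail early with a clear message if the folder is wrong
if (!dir.exists(folder)) stop("Folder not found: ", folder)

# Match only names ending in ".pdf" (any case) and return full paths, so the
# result does not depend on the current working directory
files <- list.files(path = folder, pattern = "\\.pdf$",
                    ignore.case = TRUE, full.names = TRUE)
print(basename(files))
```

The pattern "pdf$" would also match a name such as "mypdf" that has no extension; "\\.pdf$" requires the dot. Using full.names = TRUE avoids having to setwd() into the folder before reading the files.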

    • @DataCentricInc
      2 years ago

      Hi itumeleng, I have asked you to send me an email. Check the previous replies I have sent.

  • @17Adamovic
    2 years ago +1

    Thank you for the great work/video! One question: what would be the line to run to search for a specific set of words?

    • @DataCentricInc
      2 years ago +1

      Thanks, 17Adamovic. If you watch parts 2 & 3 of text analytics on PDF, you will see additional ways to analyze the content at the page level and document level, and to filter by words. Kindly see the following titles: How to perform Text Analytics on PDF Documents in R? Multiple PDF Analysis in R

    • @17Adamovic
      2 years ago +1

      @DataCentricInc As I'm brand new to learning R and need it for some research work for a professor, I've been watching and learning from your videos! I did watch the other parts, but I don't believe searching for or counting a specific word was shown, unless I missed it. You show us how to filter or search for the most frequent words, but I was wondering if we could simply count the occurrences of a specific word, like "cyber".

    • @DataCentricInc
      2 years ago +3

      @17Adamovic Kindly see the code below, which you can use to filter the frequency of words in the Term Document Matrix. Hope this helps :).
      inspect(opinions.tdm[c("cyber"), ])  # search for specific words
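The inspect() line above looks up a single word. A base R sketch of the same idea for several words is shown below. It uses a small plain matrix as a stand-in for as.matrix(opinions.tdm), with terms as rows and documents as columns; the counts are made up. Indexing a plain R matrix by a row name that does not exist stops with a "subscript out of bounds" error, so the sketch keeps only the words that are present. The search words also need the same cleaning as the corpus: lower case, and the stemmed form if stemDocument was applied.

```r
# Stand-in for as.matrix(opinions.tdm): terms are rows, documents are columns
tdm_mat <- matrix(c(3, 0, 1,
                    0, 2, 5,
                    1, 1, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(Terms = c("cyber", "court", "law"),
                                  Docs = c("doc1", "doc2", "doc3")))

# Requested words; "blockchain" is deliberately absent from the matrix
wanted <- c("cyber", "law", "blockchain")

# Keep only terms that exist as row names and report the rest
found <- intersect(wanted, rownames(tdm_mat))
not_found <- setdiff(wanted, found)

per_doc <- tdm_mat[found, , drop = FALSE]  # counts per document
totals <- rowSums(per_doc)                 # total count across documents
print(per_doc)
print(totals)
if (length(not_found) > 0) {
  message("Not in the matrix: ", paste(not_found, collapse = ", "))
}
```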

    • @17Adamovic
      2 years ago +1

      @DataCentricInc Ahh!! You are the best... thank you!

    • @17Adamovic
      2 years ago

      @DataCentricInc Do you have a video on the cleaning code needed to avoid missing search words that contain an apostrophe (like "cyber's")?
      When I apply the cleaning code from your current videos (removePunctuation, stopwords, tolower, stemming, removeNumbers, bounds) and then search for a specific word, it still misses words with apostrophes in them. This happens even if I change the search term to "cybers", since the earlier cleaning steps might remove the apostrophe.
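Below is a possible pre-cleaning step, sketched under assumptions. Text extracted from PDFs often uses the typographic apostrophe ’ (U+2019) instead of the ASCII '. Depending on the locale, punctuation removal may leave it in place, producing tokens such as cyber’s. Dropping a possessive 's before the other cleaning steps turns both cyber's and cyber’s into cyber. If removePunctuation does strip a plain apostrophe, cyber's becomes cybers, which stemDocument would usually reduce to cyber anyway. The function name below is hypothetical. It also shortens contractions such as it's to it, which is usually harmless because such words are typically removed with the stopwords.

```r
# Hypothetical pre-cleaning step: turn "cyber's" and "cyber’s" into "cyber"
# before removePunctuation/stemming run.
normalise_possessives <- function(text) {
  # Map the typographic apostrophe (U+2019) and the modifier-letter
  # apostrophe (U+02BC) to a plain ASCII apostrophe
  text <- gsub("[\u2019\u02BC]", "'", text, perl = TRUE)
  # Drop a possessive 's that directly follows a word character
  gsub("(\\w)'s\\b", "\\1", text, perl = TRUE)
}

example <- "The cyber's threat and the cyber\u2019s budget"
print(normalise_possessives(example))

# Possible use inside a tm pipeline (requires the tm package):
# pdfdatabase <- tm_map(pdfdatabase, content_transformer(normalise_possessives))
```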

  • @kripa_dristi
    2 years ago +1

    Can you please make a video on text mining that finds PDFs online using a single keyword?

    • @DataCentricInc
      2 years ago

      Thanks for your feedback, Kripa; however, I need a little more clarity on this request. Do you want to search for a PDF file on the web using R and then perform text mining on the results?

    • @kripa_dristi
      1 year ago

      @DataCentricInc Can you directly use text mining to search for and download any PDF available on the web or from any publisher?

  • @harmandeepsingh8903
    2 years ago

    Great work, Ma'am. There is one thing I'd like your help with: for example, if we take only a specific term from the PDF and then want to analyze that term, is it possible?

    • @DataCentricInc
      2 years ago +1

      Yes it is possible to focus on a term.

    • @harmandeepsingh8903
      2 years ago

      Thank you for your response. Please make a video on that for your subscribers.

  • @itumelengmosala5335
    2 years ago

    Apologies, I copied the code line incorrectly. It refuses to accept the apply function:
    files

    • @DataCentricInc
      2 years ago

      Unfortunately if you do not send the email as per request I will not be able to assist you.

    • @Khomo.96
      1 year ago

      @itumeleng mosala, did you succeed in applying the function?

  • @er2759
    2 years ago

    Hello, thank you for the great videos!! I have an issue with line 4; it's not working. I sent you an email, and hopefully you can help me.
    The error is: Error in lapply(files, pdf_text) : object 'files' not found

    • @DataCentricInc
      2 years ago

      Hi ER, it is difficult to diagnose the problem from this error alone. To be safe, make sure you run the line that creates the files variable, because skipping it could cause this error as well.

  • @agatabreczko6388
    2 years ago

    Hello! When I am writing the code, in the line 9: "pdfdatabase

    • @DataCentricInc
      2 years ago

      Ensure you run the line to require pdftools

  • @jekieraya4000
    1 year ago

    Hey ma'am, can I connect your code to a PHP file?