Getting Word Frequency from a Text File using Python Dictionaries

  • Published Oct 5, 2024

COMMENTS • 25

  • @최주연-x9j 1 year ago +5

    Hi, I am a college student majoring in computer engineering in South Korea. Your video really helped me a lot with my studies. Thank you.😊

  • @comrade_dankbob6876 6 months ago +1

    You are my most beautiful sunshine Adam Gaweda, you give me the light of my tunnel. You make the grey days bright with your wonderful smile. You are my pookie wookie stuffy bear-boy and I want to cherish you for days-on-end. Adam, I love you dai-dai-dai-dai-ski

  • @solomonngare8382 1 year ago +1

    Thanks bro

  • @ammaralamin-z4l 1 year ago

    Hi, thanks! Can I use it to count words in Arabic text?

  • @RRB47tv 1 year ago

    How do I get the least frequent words? Excellent video!! Thank you

    • @AMGaweda 1 year ago

      Where I sort the values with sorted(sorted_values), just omit the [::-1] portion. The [::-1] reverses the list so the largest counts appear first; without it, sorted(sorted_values) puts the least frequent words first. You'll see a lot of words with a count of 1, but that should do what you're looking for.
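A minimal sketch of the ascending-versus-reversed idea, assuming the counts live in a dictionary named word_count that maps each word to its count (the sample data is made up, and the video's own sorting code may differ). Sorting the (word, count) pairs by count, rather than the bare values, keeps each word attached to its count.

```python
# Hypothetical counts standing in for the dictionary built in the video.
word_count = {"alice": 3, "rabbit": 1, "queen": 2, "tea": 1}

# Sort (word, count) pairs by count, smallest first: least frequent words.
least_frequent = sorted(word_count.items(), key=lambda pair: pair[1])

# Reversing the ascending list with [::-1] puts the most frequent words first.
most_frequent = least_frequent[::-1]

print(least_frequent[:2])
print(most_frequent[:2])
```

Because Python's sort is stable, words with the same count keep their original dictionary order in the ascending list.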

  • @bahaminakhtari4997 1 year ago

    Hello, I enjoy watching your videos. This video helped a lot. I do have a question. How would you put the top ten words into a dictionary, where the key would be the word and the count would be the value?

    • @AMGaweda 1 year ago +1

      Around minute 8 there is a function that creates a sorted list of the most frequent words. If you wanted to put the top 10 in a dictionary, you'd need to create a new dictionary and add only the first 10 words from the sorted list, along with their counts.

    • @bahaminakhtari4997 1 year ago

      @AMGaweda I see. Thank you so much for replying!
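A short sketch of the reply's suggestion: rank the pairs, then copy only the first ten into a new dictionary with the word as the key and the count as the value. The word_count contents are hypothetical placeholders, and the ranking step is an assumption about how the video's sorted list is built.

```python
# Hypothetical counts: word1 appears once, word2 twice, ..., word12 twelve times.
word_count = {f"word{i}": i for i in range(1, 13)}

# Rank (word, count) pairs from most to least frequent.
ranked = sorted(word_count.items(), key=lambda pair: pair[1], reverse=True)

# Copy only the first ten pairs into a new dictionary: key = word, value = count.
top_ten = {}
for word, count in ranked[:10]:
    top_ten[word] = count

print(top_ten)
```

Since Python 3.7, dictionaries keep insertion order, so top_ten also stays in ranked order. The loop is equivalent to the one-liner dict(ranked[:10]).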

  • @pradnyakasar614 2 years ago

    Sir, how can I count the unique words across multiple text files at once?

    • @AMGaweda 2 years ago

      I would still recommend using the counting method from this video, but run it across a list of files, updating the same dictionary. Once you've processed every file, the dictionary's keys (from the .keys() method) are the unique words, and len() tells you how many there are.
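A minimal sketch of that approach, assuming plain lowercasing and whitespace splitting (the video's exact cleanup steps aren't reproduced here). The function name count_words_in_files is hypothetical, and two temporary files stand in for real ones so the example runs on its own.

```python
import os
import tempfile

def count_words_in_files(filenames):
    """Count every word across all files in one shared dictionary."""
    word_count = {}
    for filename in filenames:
        with open(filename, "r", encoding="utf-8") as f:
            for line in f:
                for word in line.lower().split():
                    word_count[word] = word_count.get(word, 0) + 1
    return word_count

# Demo: two small temporary files stand in for real text files.
with tempfile.TemporaryDirectory() as folder:
    paths = []
    for name, text in [("a.txt", "the cat sat"), ("b.txt", "the dog sat down")]:
        path = os.path.join(folder, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)
    word_count = count_words_in_files(paths)

# Every key appears once no matter how many files contained it.
unique_words = list(word_count.keys())
print(len(unique_words))
```

With real files, pass your own list of paths; len(word_count) gives the same number as len(word_count.keys()).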

  • @thomaskersig5291 2 years ago

    Thanks for this!
    Using my own file (a .csv that I saved as .txt), I get the following output after running list(word_count.keys())[:10]:
    [' \x00']
    Any suggestions on what to do? Does it make sense to rewrite the code to open the .csv directly, or will I run into the same problem?
    Best
    Thomas

    • @AMGaweda 2 years ago

      You'll most likely still run into the issue, since CSV files are just plain text files; it's mostly programs that treat them differently. I'd recommend doing a little "preprocessing" before you count your words, such as making all letters lowercase and removing excess whitespace. In your case, it might be good to do something like sentence = sentence.replace("\x00", "") to remove these kinds of characters from the analysis.
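A small sketch of the preprocessing described in the reply: remove the "\x00" characters, lowercase, and let split() discard extra whitespace before counting. The helper name preprocess and the sample lines are hypothetical.

```python
def preprocess(sentence):
    """Clean one line of text before counting, as suggested in the reply."""
    sentence = sentence.replace("\x00", "")  # drop stray NUL characters
    sentence = sentence.lower()              # "The" and "the" count as one word
    return sentence.split()                  # no-argument split() also discards extra whitespace

word_count = {}
for line in ["The\x00 cat  sat", "the CAT \x00"]:
    for word in preprocess(line):
        word_count[word] = word_count.get(word, 0) + 1

print(word_count)
```

If the "\x00" characters come from the file being saved as UTF-16 (an assumption about this particular file; some spreadsheet "Unicode text" exports use that encoding), opening it with open(filename, encoding="utf-16") may avoid them in the first place.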

  • @andytamburino1743 2 years ago

    Do you teach a master's class at NCU? I'm about to finish my BA in Comp Sci, and man, you are an awesome teacher.

    • @AMGaweda 2 years ago

      Thanks! I'm finishing up my PhD now, but hopefully I'll be teaching wherever I end up in the fall.

  • @LukaDonesnitch 1 year ago

    Can you explain how to swap out the .txt file for a .csv file? I'm trying to add a user input line so that when the user searches for a word in column 3 of the CSV file, it prints how many times that word occurs in the file. So far, when I make the changes for the CSV and increase the increment by 1, I get the error TypeError: string indices must be integers.

    • @AMGaweda 1 year ago

      It depends on the format of the file, but take a look at my video on using CSVReader ua-cam.com/video/116KWyLc6J8/v-deo.html
      You'll follow the same ideas: get a list of the words, then use a dictionary to count each word. You may not even need the dictionary, since a running total for one word just needs a for loop over a list. One trick I like to use is to load the contents of a file into a "contents" list first, e.g. contents = open(filename, 'r').readlines(). This way, I no longer need to worry about the file-handling aspect of my analysis and can rely on the list instead.
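A sketch of the running-total approach for a CSV column, assuming the file has a header row and that "column 3" means index 2. Python's csv.reader yields each row as a list of strings, so row[2] is the third column; the "string indices must be integers" error generally means a plain string is being indexed with something other than an integer. The function name count_word_in_column and the sample data are hypothetical, and io.StringIO stands in for a real file so the example runs on its own.

```python
import csv
import io

def count_word_in_column(rows, target, column_index=2):
    """Running total of how often target appears in one column (index 2 = column 3)."""
    target = target.lower()
    total = 0
    for row in rows:
        if len(row) <= column_index:
            continue  # skip short or blank rows
        for word in row[column_index].lower().split():
            if word == target:
                total += 1
    return total

# Stand-in for a real file; with a real CSV you would use
# open(filename, newline="") instead of io.StringIO.
sample = io.StringIO(
    "id,name,description\n"
    "1,Fruit,apple and pear\n"
    "2,Snack,apple pie\n"
    "3,Drink,orange juice\n"
)
contents = list(csv.reader(sample))  # each row is a list of strings
rows = contents[1:]                  # assumes the first row is a header

# target = input("Word to search for: ")
target = "apple"
print(count_word_in_column(rows, target))
```

Loading everything into contents first follows the reply's trick: after that line, the rest of the analysis only works with a list.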

  • @Vagabund92 2 years ago

    Thank you. I learned a lot. I also appreciate the comments in the code.
    The only thing is that I didn't get rid of all the punctuation in my text (a dummy text I wrote myself: I mass-copied "thousand.thousand.thousand.thousand" with no spaces in between, and it stayed that way).
    It would be cool if you could share the code and the Alice in Wonderland text.

    • @AMGaweda 2 years ago

      I don't share my code, mostly to encourage students to code along with me, BUT you can download a copy of Alice in Wonderland from Project Gutenberg: www.gutenberg.org/ebooks/11

    • @Vagabund92 2 years ago +1

      @AMGaweda Okay, I already replicated your code and thought that copying it would have been handy. :D
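One likely explanation for the leftover punctuation (an assumption, since the video's code isn't reproduced here) is that punctuation was only trimmed from the ends of each whitespace-separated token, so a run with no spaces stays a single token. A sketch of an alternative: replace every punctuation character with a space before splitting. The helper name words_from is hypothetical.

```python
import string

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def words_from(text):
    """Lowercase text, replace punctuation with spaces, then split into words."""
    return text.translate(PUNCT_TO_SPACE).lower().split()

dummy = "thousand.thousand.thousand.thousand"
print(dummy.strip(string.punctuation))  # strip() only trims the ends, so nothing changes
print(words_from(dummy))
```

The trade-off is that apostrophes and hyphens also become spaces, so "don't" splits into "don" and "t".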

  • @sharma3226 2 years ago

    Sir, could you please guide me on how to sort the most frequent words used in a PDF document? I want to learn the most important words for my exam, so it would be very helpful 🙏🏽🙏🏽

    • @AMGaweda 2 years ago

      There isn't a "clean" built-in way to extract text from a PDF, but you can use some of Python's third-party libraries to do this. For example, PDFPlumber (github.com/jsvine/pdfplumber) lets you extract text. Note that it expects the PDF's text to be actual text: text inside graphics or pictures, or pictures of text, will not be extracted.

    • @sharma3226 2 years ago

      @AMGaweda Okay, I was able to convert the PDF text into a text file. Then what?

    • @AMGaweda 2 years ago

      @sharma3226 Then you can apply the methods shown in the video.
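A sketch combining the two steps from this thread: pull the text out with the third-party pdfplumber package (installed separately, e.g. with pip install pdfplumber), then count and rank words with a dictionary. The helper names extract_pdf_text and most_frequent_words are hypothetical, and the counting step only lowercases and splits on whitespace, which may differ from the video's cleanup.

```python
def extract_pdf_text(path):
    """Pull the text layer out of every page using the third-party pdfplumber package."""
    import pdfplumber  # imported here so the counting part works without it installed
    pages = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Pages without a text layer (e.g. scanned images) contribute no text.
            pages.append(page.extract_text() or "")
    return "\n".join(pages)

def most_frequent_words(text, n=10):
    """Count words with a dictionary and return the n most frequent (word, count) pairs."""
    word_count = {}
    for word in text.lower().split():
        word_count[word] = word_count.get(word, 0) + 1
    return sorted(word_count.items(), key=lambda pair: pair[1], reverse=True)[:n]
```

Usage, with a hypothetical file name: text = extract_pdf_text("lecture_notes.pdf"), then print(most_frequent_words(text)).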