Scraping weather data from the internet with R and the tidyverse (CC231)

  • Published 17 Oct 2024

COMMENTS • 43

  • @MrMandarpriya (22 days ago)

    Thanks a ton, sir. I am in Germany and I was able to get the latitude and longitude for my place. This is incredible.

    • @Riffomonas (21 days ago)

      Wonderful - that's great! 🤓

  • @sven9r (2 years ago, +6)

    For everybody having a hard time matching parentheses, as Pat does at 13:00:
    Tools -> "Global Options" -> "Code" -> the "Display" tab at the top, then tick "Rainbow parentheses"

    • @Riffomonas (2 years ago)

      You don’t like my “see if we get an error message”? 😂

    • @sven9r (2 years ago, +1)

      Not at all! I'm loving it! But beginners often struggle with this stuff!
      Cheers

    • @Riffomonas (2 years ago)

      @sven9r 🤣

    • @yaqinguo8971 (1 year ago)

      It's a good hint, but interestingly, I did not have this option.

  • @davidmantilla1899 (2 years ago, +2)

    Your tutorials are great. I come from a purely wet-lab biology background, and your videos helped me kickstart my computational biology literacy. Thank you for openly sharing your knowledge.

    • @Riffomonas (2 years ago, +1)

      My pleasure! Thanks for watching David 🤓

  • @NdengoMarcel (1 year ago, +1)

    This tutorial is very interesting in practice. I managed to run the entire code using my local latitude and longitude, as you suggested, and it worked. The variables I was interested in were TMAX and PRCP; in Rwanda we do not have SNOW. Thanks a lot.

    • @Riffomonas (1 year ago)

      Wonderful! I'm glad to hear you got it working. Sorry that you all miss out on snow 😂

  • @eric13hill (2 years ago, +1)

    This is my favorite video of yours. It is so useful for what I want to do. Thanks!

    • @Riffomonas (2 years ago)

      That’s awesome to hear! What part do you find most useful?

  • @haraldurkarlsson1147 (2 years ago, +1)

    This is great! I like how you build it up and have a specific goal in mind. This is also a problem any of us can tackle, since the data is readily available.
    I typically write my own code for these sorts of exercises (since at least I can understand my own code); that is how I learn best. I came up with a slightly different way of finding my closest weather station. I wrote a couple of functions to do this, tested the distance on Houston-Chicago, and got pretty close. Here is how I tackled the problem.
    I set up two functions to run inside the tidyverse, so I used rlang (hence the enquo() and the bang-bang !!).
    The first function converts to radians:
    radians_func … %>%
    distinct(station) %>%
    pull(station)
    My closest station was about 500 m from my current location but has only operated for a couple of years. The filter gave me another station about 4 km away with a more extensive record. I decided to filter for stations with a record of over 100 years (although it is not clear what kind of record that is).
    It seems like the search should be more focused, though. What are we after? Temperature, it seems, and that appears to be the variable most often measured.
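The commenter's own code did not survive in the comment, so what follows is not their implementation. It is a minimal sketch, assuming the haversine great-circle formula and a mean Earth radius of 6371 km, of one way to rank stations by distance from a home location with dplyr and then keep only long-running stations. The station table, its coordinates, the `first_year`/`last_year` columns, and the helper names `deg_to_rad()` and `haversine_km()` are all hypothetical.

```r
library(dplyr)

# Hypothetical helper: convert decimal degrees to radians
deg_to_rad <- function(deg) deg * pi / 180

# Hypothetical helper: great-circle distance in km (haversine formula).
# Plain vectorized arithmetic, so it can be called directly inside mutate().
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  phi1 <- deg_to_rad(lat1)
  phi2 <- deg_to_rad(lat2)
  d_phi <- deg_to_rad(lat2 - lat1)
  d_lambda <- deg_to_rad(lon2 - lon1)
  a <- sin(d_phi / 2)^2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical station table; all values are made up for illustration
stations <- tibble(
  station    = c("A", "B", "C"),
  lat        = c(42.28, 42.30, 41.90),
  lon        = c(-83.74, -83.70, -87.60),
  first_year = c(2020, 1890, 1870),
  last_year  = c(2022, 2022, 2022)
)

# Hypothetical home location
my_lat <- 42.27
my_lon <- -83.73

# All stations, nearest first
nearest <- stations %>%
  mutate(dist_km = haversine_km(my_lat, my_lon, lat, lon)) %>%
  arrange(dist_km)

# Nearest station whose record spans more than 100 years
nearest_long_record <- nearest %>%
  filter(last_year - first_year > 100) %>%
  slice_head(n = 1)
```

Because `haversine_km()` only does arithmetic on the vectors it is handed, it works inside `mutate()` without `enquo()`/`!!`; tidy evaluation becomes useful when a function needs to accept bare column names as arguments.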

  • @sven9r (2 years ago, +1)

    Great episode as always! I just finished a course on German raster data with some students :)

    • @Riffomonas (2 years ago)

      Awesome! As always thanks for watching 🤓

  • @djangoworldwide7925 (2 years ago, +1)

    Also, building a Shiny dashboard with time-series plots of this data looks like a fun assignment.

    • @Riffomonas (2 years ago, +1)

      Yeah, I’ve thought about this, but I’d probably build all the plots in the backend using a cron job or something, then serve them up with minimal JavaScript. I don’t think the overhead of Shiny would really be necessary 🤷‍♂️

  • (2 years ago, +1)

    Excellent! There's one station in my city!

  • @haraldurkarlsson1147 (2 years ago, +1)

    I had no trouble pulling up data for the best station in my neighborhood. My question is about the temperature: what is the unit? Kelvin?

    • @Riffomonas (2 years ago, +1)

      I think that was a question that flashed up in the last 5 minutes or so of the episode. I’ll definitely cover it in tomorrow’s episode.

  • @zjardynliera-hood5609 (1 year ago, +2)

    I love this! Use the rainbow parentheses, btw!!

    • @Riffomonas (1 year ago)

      Hah! I try to stick close to the defaults so beginners don't get too freaked out when they see something that looks different from what's on their own computer.

  • @djangoworldwide7925 (2 years ago, +1)

    "I might be wrong, but meh, I'm just gonna make this assumption."
    Science in a nutshell 😅
    Great tutorial, sir. I always enjoy your videos since I learn so much more than what I came for. (Could you elaborate on top_n? I couldn't quite grasp this one.)

    • @Riffomonas (2 years ago)

      Thanks for the question! top_n returns the n rows (plus ties) with the highest values of a particular variable. If you give it a negative number, you’ll get the rows with the smallest values instead. There are also slice_min and slice_max, which are similar.
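To illustrate the reply above, here is a small sketch on a made-up tibble (the `weather` data and its values are hypothetical). It shows `top_n()` keeping ties, a negative `n` selecting the smallest values, and the `slice_max()`/`slice_min()` equivalents. In current dplyr, `top_n()` is marked as superseded in favour of the `slice_*()` functions.

```r
library(dplyr)

# Hypothetical daily maximum temperatures (°C) for one station
weather <- tibble(
  date = as.Date("2022-01-01") + 0:5,
  tmax = c(3.1, 5.4, 5.4, -2.0, 1.7, 0.3)
)

# n = 1 asks for the single highest tmax, but both tied 5.4 rows come back
hottest <- weather %>% top_n(1, tmax)

# A negative n returns the rows with the smallest values
coldest <- weather %>% top_n(-2, tmax)

# slice_max()/slice_min() take the ordering column first; ties are kept by default
hottest_slice <- weather %>% slice_max(tmax, n = 1)
hottest_no_ties <- weather %>% slice_max(tmax, n = 1, with_ties = FALSE)
coldest_slice <- weather %>% slice_min(tmax, n = 2)
```

Setting `with_ties = FALSE` is the way to get exactly `n` rows when there are ties.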

  • @lancesnodgrass8016 (9 months ago)

    I'm having trouble finding the same website shown at 1:45 and beyond. Any info on how the path has changed since a year ago?

    • @Riffomonas (7 months ago)

      I just checked it and everything was working. Perhaps the site was down when you tried.

  • @haraldurkarlsson1147 (2 years ago, +1)

    Pat,
    I used vroom to read in the file; it read it quickly and detected the columns. The only thing I had to do was clean the column names.

    • @Riffomonas (2 years ago)

      Great! I haven’t tried vroom yet.

  • @kmbrahm (2 years ago)

    TMAX looks very high. Is that combining rows?

    • @kmbrahm (2 years ago, +1)

      Answered my own question: TMAX = Maximum temperature (tenths of degrees C)

    • @Riffomonas (2 years ago, +1)

      Good sleuthing! I’ll fix this and the precipitation in the next video 🤓
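Following kmbrahm's finding, a minimal sketch of the unit fix, assuming long-format GHCN-Daily style data in which TMAX and TMIN are stored in tenths of °C and PRCP in tenths of mm (as the GHCN-Daily documentation describes them). The `raw` tibble and its values are hypothetical, and the Fahrenheit column is optional.

```r
library(dplyr)

# Hypothetical raw readings: TMAX/TMIN in tenths of °C, PRCP in tenths of mm
raw <- tibble(
  element = c("TMAX", "TMIN", "PRCP"),
  value   = c(256, -31, 127)
)

clean <- raw %>%
  # Divide the tenths-scaled elements by 10 to get °C and mm
  mutate(value = if_else(element %in% c("TMAX", "TMIN", "PRCP"), value / 10, value)) %>%
  # Optional: Fahrenheit for the temperature rows only; NA elsewhere
  mutate(value_f = if_else(element %in% c("TMAX", "TMIN"), value * 9 / 5 + 32, NA_real_))
```

Dividing by 10 brings a raw TMAX of 256 to 25.6 °C, which is why the undivided values looked implausibly high.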

  • @ahmedmostafaahmedkamel8532 (1 year ago)

    Where is this script, please?

  • @haraldurkarlsson1147 (2 years ago)

    I must add that vroom read it in fast (lazy loading, I suspect), but I'm not so sure about the column allocations. It seems to have created new columns with mixed-type data, so be aware.

    • @Riffomonas (2 years ago)

      Sometimes the simpler packages are good enough.

  • @haraldurkarlsson1147 (2 years ago)

    Pat,
    Web scraping has, at least in my mind, a different meaning than what you are doing here: it uses rvest etc. It might be misleading for those looking for actual web scraping.

    • @Riffomonas (2 years ago)

      🤷‍♂️ I’m getting data from a website. It’s a form of web scraping.

  • @haraldurkarlsson1147 (2 years ago)

    Must be Fahrenheit, with errant readings...