R Basics: How to Use filter() to Select Rows Based on Column Values

  • Published 22 Jan 2025

COMMENTS • 21

  • @tomhenry-datasciencewithr6047  2 years ago

    🎉 Subscribe if you want more videos like this! - ua-cam.com/channels/b5aI-GwJm3ZxlwtCsLu78Q.html
    😃 Comment below if you have questions about how to use filter()!

  • @lefterisparasyris808  1 year ago +2

    How does the code change if the columns have text instead of numbers? I want to select the rows that contain data for specific forest areas, which are marked with text in the data frame.

  • @bukolaadebayo1891  2 years ago

    Thanks! Very explanatory and super simple to understand.

  • @henrytadja7327  2 years ago +1

    Hey sir, I was wondering if you know how to remove only a specific percentage of the rows that have a specific value. For example, we have 10 rows with a value of 1 and we want to remove half of those rows.

    • @tomhenry-datasciencewithr6047  2 years ago

      What is your goal? Depending on what you are aiming to do, there may be more specific methods. However, a general approach works like this:
      your_data %>%
      group_by(specific_value) %>%
      sample_frac(0.5) %>%
      ungroup()

    • @henrytadja7327  2 years ago +1

      @tomhenry-datasciencewithr6047 Hey Tom, thank you for your answer. I tried it but it didn't work. Instead I used the following approach!
      tmp_data <- data %>% filter(x == ma_valeur) %>% slice_sample(prop = 0.5)
      data %>% anti_join(tmp_data)
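
A self-contained sketch of the approach above, for anyone who wants to run it. The toy data and the `row_id` column are assumptions added for illustration: without a unique key, `anti_join()` matches on all shared columns, so it would drop every row identical to a sampled one rather than only the sampled rows.

```r
library(dplyr)

set.seed(42)  # make the random sample reproducible

# Hypothetical toy data: 10 rows with x == 1 and 5 rows with x == 2
data <- data.frame(x = c(rep(1, 10), rep(2, 5))) %>%
  mutate(row_id = row_number())  # assumed unique key, not in the original comment

ma_valeur <- 1  # the value whose rows should be thinned out

# Randomly pick half of the rows that carry the target value
tmp_data <- data %>%
  filter(x == ma_valeur) %>%
  slice_sample(prop = 0.5)

# Keep everything except the sampled rows
result <- data %>%
  anti_join(tmp_data, by = "row_id")
```

Rows with other values of `x` are left untouched; only the rows matching `ma_valeur` are reduced.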

  • @ogclinton4780  1 year ago

    Please, how do I filter rows with character data types? For example, rows in a dataset that have "total" in them.

  • @AgneKif  1 year ago +1

    Hi, how do you filter if you want, for example, rows where depth is BETWEEN 100 and 150?

    • @tomhenry-datasciencewithr6047  1 year ago

      To filter rows in R where depth is between 100 and 150, you can use:
      library(tidyverse) # assuming you've loaded this earlier
      filtered_data <- your_data %>% filter(depth >= 100 & depth <= 150) # if you want to include 150
      filtered_data <- your_data %>% filter(depth >= 100 & depth < 150) # if you don't want to include 150
      For more information on the filter function and other data manipulation tools in R, check out the book "R for Data Science" by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, available online at r4ds.hadley.nz/data-transform#filter :)
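
As a possible shorthand for the inclusive case, dplyr also provides `between()`. The sketch below uses a hypothetical toy data frame; the half-open range still needs the explicit comparison, because `between()` includes both ends.

```r
library(dplyr)

# Hypothetical toy data
your_data <- data.frame(depth = c(90, 100, 125, 150, 160))

# between() is inclusive on both ends: same as depth >= 100 & depth <= 150
inclusive <- your_data %>% filter(between(depth, 100, 150))

# Half-open range: spell out the comparison to exclude 150.
# Comma-separated conditions in filter() are combined with AND.
half_open <- your_data %>% filter(depth >= 100, depth < 150)
```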

  • @JC-wt5fw  2 years ago +1

    Thanks for your help. How would you filter with percentages? Like say if I want the top and bottom 1% of people and discard that data how would I be able to do that? Thank you very much

    • @tomhenry-datasciencewithr6047  2 years ago +1

      It depends on the exact scenario.
      Assuming you have one row per person (i.e., not multiple rows per person), you can do it like this:
      your_data <- your_data %>%
        mutate(variable_of_interest.percentile = percent_rank(variable_of_interest))
      your_data %>% filter(variable_of_interest.percentile < 0.01) # bottom 1%
      your_data %>% filter(variable_of_interest.percentile > 0.99) # top 1%

    • @tomhenry-datasciencewithr6047  2 years ago +1

      If you are interested only in excluding the top and bottom 1%, you could do
      your_data %>% filter(variable_of_interest.percentile >= 0.01 & variable_of_interest.percentile < 0.99)
      **However** - my guess is that you are trying to filter out outliers - is that right?
      If so, depending on your situation I would usually recommend two alternatives:
      1. retaining outliers but using a technique that is more resilient to outliers (e.g., quantile regression using the quantreg package - basically quantile regression optimizes for the median, whereas regular linear regression optimizes for the mean)
      e.g.
      # install.packages("quantreg")
      library(quantreg)
      rq(y ~ height + age, data = ...)
      (same kind of interface as linear regression)
      2. figuring out non-arbitrary absolute cut-offs - e.g. if you know that a customer spending more than a certain amount is unrealistic or a patient will never have a lab value less than a particular amount, then just filter out records where they exceed those values.
      The reason is that in many datasets, the top 1% (say) may contain some invalid records but also a lot of valid ones, so where possible it's best to be explicit about your inclusion/exclusion criteria rather than excluding by percentage.
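
To illustrate the difference between the two kinds of exclusion, here is a minimal sketch. The `spend` column, the toy values, and the 10,000 cut-off are all assumptions made up for the example, not a recommendation for any real dataset.

```r
library(dplyr)

# Hypothetical data: 199 plausible spend values plus one data-entry error
your_data <- data.frame(spend = c(1:199, 1e6))

# Percentage-based exclusion: drops rows at both ends, valid or not
trimmed <- your_data %>%
  mutate(spend.percentile = percent_rank(spend)) %>%
  filter(spend.percentile >= 0.01, spend.percentile < 0.99)

# Explicit cut-off (assumed domain rule: spend above 10,000 is impossible)
max_plausible_spend <- 10000
cleaned <- your_data %>%
  filter(spend <= max_plausible_spend)
```

Here the percentage rule removes several plausible rows along with the bad one, while the explicit cut-off removes only the record that breaks the stated rule.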

    • @JC-wt5fw  2 years ago

      @tomhenry-datasciencewithr6047 Wow, thank you so much for your help! Amazing!!! Love from the UK

    • @JC-wt5fw  2 years ago +1

      @tomhenry-datasciencewithr6047 You got it exactly right: I needed to get rid of outliers, and I will take these better suggestions of yours on board. Thank you so much

    • @tomhenry-datasciencewithr6047  2 years ago

      @JC-wt5fw Hope it goes well! :)

  • @crazyneon285  9 months ago

    My output keeps saying "object __ not found"; Error in "select ()". I don't understand, I rewrote everything.

  • @cheriseregier4729  2 years ago +1

    Thanks for this video, Tom. Do you know how to filter the data with multiple conditions? So if X=1 then filter Y>2000 AND if X=2 then filter Y>3000 AND if X=3 then filter Y>4000. Thanks for your help.

    • @tomhenry-datasciencewithr6047  2 years ago

      You can do this like so:
      your_data %>%
      filter((x == 1 & y > 2000) | (x == 2 & y > 3000) | (x == 3 & y > 4000))
      (OR is the `|` symbol, and AND is the `&` symbol).
      There are other ways of doing it that may work better depending on your use case, but this is the general method.
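
One such alternative, sketched here under assumptions: keep the per-group thresholds in a small lookup table and join it on. The `thresholds` table, the `min_y` column, and the toy data are hypothetical names for illustration; this may suit cases with many `x` values better than a long chain of `|` conditions.

```r
library(dplyr)

# Hypothetical toy data
your_data <- data.frame(
  x = c(1, 1, 2, 2, 3, 3),
  y = c(1500, 2500, 2500, 3500, 3500, 4500)
)

# One row per x value with the minimum y it must exceed
thresholds <- data.frame(
  x = c(1, 2, 3),
  min_y = c(2000, 3000, 4000)
)

filtered <- your_data %>%
  inner_join(thresholds, by = "x") %>%  # rows whose x has no threshold are dropped
  filter(y > min_y) %>%
  select(-min_y)                        # remove the helper column again
```

As with the `|` version, rows whose `x` is not 1, 2, or 3 are excluded, because the inner join finds no threshold for them.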

    • @cheriseregier4729  2 years ago

      @tomhenry-datasciencewithr6047 This is SO helpful. Thanks Tom. I look forward to watching more of your videos. They are super clear!

  • @hornytholigist  2 years ago +1

    Appreciate your videos!