Hadley Wickham's "dplyr" tutorial at useR 2014 (2/2)
- Published 10 Feb 2025
- Part 2/2 of the dplyr workshop held at UCLA during the useR 2014 conference.
dplyr is the premier data manipulation tool for data analysts who work in the R language. This package makes it easier than ever to sort, manage, and clean your dirty data with speed and efficiency.
Visit Hadley's GitHub at github.com/had... for more information, and check out other related packages at www.Rstudio.com.
Topics covered: Grouped Mutate/Filter, Joins, Do, Databases
Scripts and data from this tutorial can be accessed here: www.dropbox.co...
Excited about this video, but I am having two initial problems. The nycflights2013 data.frame does not have a plane column. It has a tailnum column, which appears to be the same thing, but some renaming needs to be done. Also, when I run the code at 1:30 I get the error "Error in n() : This function should not be called directly". I'm not sure what this is about. I am running R 3.1.1 in RStudio 0.98.1028 on OS X.
Dropbox does not work. I also could not find the airports dataset.
Malory Knox It's here: www.dropbox.com/sh/i8qnluwmuieicxc/AAAgt9tIKoIm7WZKIyK25lh6a and you can find all the datasets there.
Darn. Ignore second problem. User error. Very sorry I posted before checking more carefully.
I think the z-score part is not correct.
It should be like this:
planes_z <- flights %>%
  filter(!is.na(arr_delay)) %>%
  group_by(plane) %>%
  filter(n() > 30) %>%
  mutate(z_delay = (arr_delay - mean(arr_delay)) / sd(arr_delay)) %>%
  filter(z_delay >= 3) %>%
  select(plane, z_delay) %>%
  arrange(desc(z_delay))
View(planes_z)
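For anyone without the tutorial data, the same grouped z-score pattern can be tried on a tiny made-up data frame. This is only a sketch: the `toy` data, its values, and the group-size threshold of 2 are assumptions chosen so the numbers are easy to check by hand, not anything from the workshop files.

```r
library(dplyr)

# Hypothetical toy data: two planes, arrival delays in minutes.
toy <- data.frame(
  plane     = c("A", "A", "A", "B", "B", "B"),
  arr_delay = c(10, 20, 30, 0, 5, NA)
)

toy_z <- toy %>%
  filter(!is.na(arr_delay)) %>%   # drop missing delays before computing mean/sd
  group_by(plane) %>%
  filter(n() >= 2) %>%            # keep only groups with enough rows (threshold is arbitrary here)
  mutate(z_delay = (arr_delay - mean(arr_delay)) / sd(arr_delay)) %>%
  ungroup() %>%
  arrange(desc(z_delay))

print(toy_z)
```

Because mean() and sd() run inside group_by(), each delay is standardised against its own plane's flights rather than against all flights, which is the point of the grouped mutate.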