
R packages: dplyr odds and ends

2016-04-18 09:36
dplyr is Hadley Wickham's (HW) workhorse package for data manipulation.

Introduction to dplyr: dplyr

Notes:

distinct: remove duplicate rows

library(dplyr)
library(nycflights13)  # provides the flights data set
distinct(select(flights, tailnum))
#> Source: local data frame [4,044 x 1]
#>
#>    tailnum
#>      (chr)
#> 1   N14228
#> 2   N24211
#> 3   N619AA
#> 4   N804JB
#> ..     ...


sample_n; sample_frac: random sampling (a fixed number of rows, or a fixed fraction of rows)

sample_n(flights, 10)
#> Source: local data frame [10 x 16]
#>
#>     year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
#>    (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
#> 1   2013     6    12     1428        -7     1733       -17      DL  N370NW
#> 2   2013     3    23      600        -7      955        23      UA  N510UA
#> 3   2013     3    29     1814        14     1920         3      B6  N238JB
#> 4   2013     2    25     1957       -10     2156       -10      EV  N19966
#> ..   ...   ...   ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)
sample_frac(flights, 0.01)
#> Source: local data frame [3,368 x 16]
#>
#>     year month   day dep_time dep_delay arr_time arr_delay carrier tailnum
#>    (int) (int) (int)    (int)     (dbl)    (int)     (dbl)   (chr)   (chr)
#> 1   2013     5     7     1156        -4     1321       -17      UA  N432UA
#> 2   2013     4    18     1543        -5     1755       -25      DL  N369NB
#> 3   2013     3    26     1408        -7     1623        -8      DL  N344NB
#> 4   2013     1    24     2001        66     2211        71      MQ  N526MQ
#> ..   ...   ...   ...      ...       ...      ...       ...     ...     ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#>   (dbl), distance (dbl), hour (dbl), minute (dbl)


top_n

Selects the top n rows by rank.
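A minimal sketch of top_n, using a made-up `scores` data frame (hypothetical, not from the flights data). Two things are worth remembering: the result is not sorted, and ties are kept, so more than n rows can come back. In dplyr 1.0.0 and later, top_n is superseded by slice_max, but it still works.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
scores <- data.frame(
  name  = c("a", "b", "c", "d"),
  score = c(10, 40, 30, 20),
  stringsAsFactors = FALSE
)

# Keep the 2 rows with the largest score.
# Rows come back in their original order, not sorted by score.
top2 <- top_n(scores, 2, score)
print(top2)
```

To get the rows in rank order, pipe the result into arrange(desc(score)).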

lag

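Implements diff-like functionality: shifting a vector by one position makes it easy to compute differences between consecutive values.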
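A small sketch of the diff-style use, on a made-up vector `x` (hypothetical). Unlike base diff, subtracting lag(x) keeps the original length and pads the first position with NA, which is convenient inside mutate. The call is written as dplyr::lag here only to make clear that it is not stats::lag.

```r
library(dplyr)

# Hypothetical example vector
x <- c(1, 4, 9, 16)

# Difference from the previous element; the first element has no predecessor
d <- x - dplyr::lag(x)
print(d)

# Base R equivalent, one element shorter
print(diff(x))
```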

tally

Shorthand for summarise(n = n()): counts the rows in each group.
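A quick check of that equivalence on a made-up grouped data frame (the `df` and `g` names are hypothetical):

```r
library(dplyr)

# Hypothetical toy data with one grouping column
df <- data.frame(g = c("x", "x", "y"), stringsAsFactors = FALSE)
by_g <- group_by(df, g)

# Both calls return one row per group with a count column named n
counts_tally     <- tally(by_g)
counts_summarise <- summarise(by_g, n = n())

print(counts_tally)
print(counts_summarise)
```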
Tags: dplyr