Commonly used R functions at work (continuously updated...)
2017-02-15 11:04
1. Generating date sequences
seq(as.Date("2015/12/14"), by = "week", length.out = 62)   # step by one week
seq(as.Date("2015/12/14"), by = "3 days", length.out = 62) # step by 3 days
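As a small illustration of the two calls above, the sketch below checks the spacing of each sequence and shows that an end date can be given instead of a length. The variable names `weekly`, `every_3_days` and `until_jan` are made up for this example.

```r
# Weekly sequence starting on 2015-12-14, 62 values long
weekly <- seq(as.Date("2015/12/14"), by = "week", length.out = 62)

# Same start date, stepping 3 days at a time
every_3_days <- seq(as.Date("2015/12/14"), by = "3 days", length.out = 62)

# Consecutive elements differ by 7 and by 3 days respectively
unique(as.integer(diff(weekly)))
unique(as.integer(diff(every_3_days)))

# An end date can be supplied with `to` instead of `length.out`
until_jan <- seq(as.Date("2015-12-14"), to = as.Date("2016-01-11"), by = "week")
until_jan
```

Using `to` is convenient when the forecast horizon is known as a date rather than as a number of periods.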
2. Functions for inspecting data
> str(tsdata_tmp)
'data.frame': 1116 obs. of 6 variables:
$ corname : chr "日本" "日本" "日本" "日本" ...
$ cityname : chr "东京" "东京" "东京" "东京" ...
$ date : chr "2015-12-21" "2015-12-28" "2016-01-04" "2016-01-11" ...
$ weeknum : int 1 2 3 4 5 6 7 8 9 1 ...
$ ciiquantity: int 9386 8521 5224 7770 10610 12100 11413 15695 10926 309 ...
$ y_stlf : num 8593 8312 6515 7452 7965 ...
> str(advancedbooking_tmp)
'data.frame': 1539 obs. of 5 variables:
$ cityid : int 228 228 228 228 228 228 228 228 228 228 ...
$ cityname : chr "东京" "东京" "东京" "东京" ...
$ date : chr "2015/11/30" "2015/11/30" "2015/11/30" "2015/11/30" ...
$ weeknum : int 3 1 2 5 6 4 8 7 6 3 ...
$ ciiquantity: int 0 0 0 0 0 0 0 0 0 0 ...
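In both str() outputs above the date column is stored as chr, and the two data frames use different separators ("-" and "/"). The following is a minimal sketch, on a made-up data frame named `df`, of a few other base R checks and of parsing such a column into Date. The column values are illustrative only.

```r
# Toy data frame mimicking the structure shown above (values are made up)
df <- data.frame(
  cityname = c("Tokyo", "Tokyo", "Tokyo"),
  date = c("2015/11/30", "2015/12/07", "2015/12/14"),
  ciiquantity = c(0L, 3L, 5L),
  stringsAsFactors = FALSE
)

sapply(df, class)        # class of every column
summary(df$ciiquantity)  # quick numeric summary
colSums(is.na(df))       # number of missing values per column

# Parse the character column into Date, stating the format explicitly
df$date <- as.Date(df$date, format = "%Y/%m/%d")
str(df)
```

Giving `format` explicitly avoids relying on as.Date() guessing the separator, which matters when two sources are later joined on the date.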
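3. Functions commonly used to build time features for modeling in R
library(lubridate)
DataSet$quarter <- quarter(DataSet$date)
DataSet$month <- month(DataSet$date)
DataSet$week <- week(DataSet$date) # week of the year, counted in 7-day blocks from January 1
isoweek('2017-01-01') # ISO 8601 week of the year
?lubridate::week # open the help page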
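week() and isoweek() can disagree around the turn of the year, which matters when the week number is used as a model feature. The sketch below compares them on three dates. The data frame name `features` and the chosen dates are assumptions for illustration, and the lubridate package must be installed.

```r
library(lubridate)

d <- as.Date(c("2016-12-31", "2017-01-01", "2017-01-08"))

features <- data.frame(
  date    = d,
  quarter = quarter(d),
  month   = month(d),
  week    = week(d),     # 7-day blocks counted from January 1
  isoweek = isoweek(d),  # ISO 8601 week (weeks start on Monday)
  isoyear = isoyear(d)   # year that the ISO week belongs to
)
features
```

For 2017-01-01, week() returns 1 while isoweek() returns 52, because that Sunday still belongs to the last ISO week of 2016. Pairing isoweek() with isoyear() keeps such dates from being assigned to the wrong year.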
4. Merging Data
Adding Columns
To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join).
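# merge two data frames by ID
total <- merge(dataframeA, dataframeB, by = "ID") # values in the by column should be unique within each data frame; duplicated IDs yield one output row per matching pair
# merge two data frames by ID and Country
total <- merge(dataframeA, dataframeB, by = c("ID", "Country")) # the ID/Country combination should be unique within each data frame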
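The note about unique keys can be checked before merging. The sketch below uses two made-up data frames, `customers` and `orders`, to show what merge() does when an ID is repeated and how anyDuplicated() can flag it.

```r
# Hypothetical data: ID 2 appears twice in `orders`
customers <- data.frame(ID = c(1, 2, 3), name = c("A", "B", "C"))
orders    <- data.frame(ID = c(1, 2, 2), amount = c(10, 20, 30))

total <- merge(customers, orders, by = "ID")
total
nrow(total)  # one row per matching pair of rows

# Check for duplicated keys before merging
anyDuplicated(customers$ID)  # 0 means no duplicates
anyDuplicated(orders$ID)     # otherwise the index of the first duplicate
```

merge() does not fail on repeated keys; it silently returns every matching combination, so an explicit check is the safer habit when a one-to-one join is expected.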
Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.
Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE) # keep all rows from both data frames
Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE) # keep all rows of df1
Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE) # keep all rows of df2
Cross join: merge(x = df1, y = df2, by = NULL) # Cartesian product; no key is used
For the keyed joins, CustomerId should be unique within each data frame; duplicated keys produce one output row per matching pair.
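The join types listed above can be compared by row count on a small example. The contents of `df1`, `df2` and `df3` below are invented for illustration; only the merge() arguments come from the text.

```r
df1 <- data.frame(CustomerId = 1:4,
                  Product = c("Toaster", "Radio", "Toaster", "Radio"))
df2 <- data.frame(CustomerId = c(2L, 4L, 6L),
                  State = c("Alabama", "Alabama", "Ohio"))

inner <- merge(df1, df2, by = "CustomerId")
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)
cross <- merge(df1, df2, by = NULL)

# Number of rows produced by each join
sapply(list(inner = inner, outer = outer, left = left,
            right = right, cross = cross), nrow)

# by.x / by.y when the key columns have different names
df3 <- data.frame(cust_id = c(2L, 4L, 6L),
                  State = c("Alabama", "Alabama", "Ohio"))
renamed <- merge(df1, df3, by.x = "CustomerId", by.y = "cust_id")
renamed
```

Rows without a match in the other data frame are filled with NA in the outer, left and right joins, and the cross join returns every combination of rows from both inputs.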
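5. Changing values in a data frame
To replace the values in one column that are less than 0 with the value from another column in the same row, you can do the following:
neg <- which(output_results$pred < 0) # rows with a negative prediction
output_results$pred[neg] <- output_results$act_quantity[neg] # handle negative values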
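A minimal sketch of the replacement on made-up data follows. The values in `output_results` are invented, and the use of which() is one possible choice rather than the only way to write it.

```r
# Made-up example data; the last pred is missing on purpose
output_results <- data.frame(
  pred = c(12.5, -3.0, 7.1, -0.4, NA),
  act_quantity = c(11, 4, 8, 1, 6)
)

# which() drops NA, so rows with a missing pred are left untouched
neg <- which(output_results$pred < 0)
output_results$pred[neg] <- output_results$act_quantity[neg]
output_results
```

Indexing with which() rather than the raw logical vector avoids an error in the assignment when `pred` contains NA.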