您的位置:首页 > 其它

Get data out of excel and into R with readxl

2015-04-16 08:44 513 查看
(This article was first published on  RStudio Blog, and kindly contributed to
R-bloggers)     

I’m pleased to announced that the first version of readxl is now available on CRAN. Readxl makes it easy to get tabular data out of excel. It:
Supports both the legacy
.xls
format and the modern xml-based
.xlsx
format.
.xls
support is made possible the with
libxls C library, which abstracts away many of the complexities of the underlying binary format. To parse
.xlsx
, we use the insanely fast
RapidXML C++ library.
Has no external dependencies so it’s easy to use on all platforms.
Re-encodes non-ASCII characters to UTF-8.
Loads datetimes into POSIXct columns. Both Windows (1900) and Mac (1904) date specifications are processed correctly.
Blank columns are automatically dropped.
Returns output with class
c("tbl_df", "tbl", "data.frame")
so if you also use

dplyr you’ll get an enhanced print method (i.e. you’ll see just the first ten rows, not the first 10,000!).

You can install it by running:
install.packages("readxl")

There’s not really much to say about how to use it:
library(readxl)
# Use a excel file included in the package
sample <- system.file("extdata", "datasets.xlsx", package = "readxl")

# Read by position
head(read_excel(sample, 2))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# Or by name:
excel_sheets(sample)
#> [1] "iris"     "mtcars"   "chickwts" "quakes"
head(read_excel(sample, "mtcars"))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

You can see the documentation for more info on the
col_names
,
col_types
and
na
arguments. Readxl is still under active development. If you have problems loading a dataset, please try the
development version, and if that doesn’t work,

file an issue.

内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: 
相关文章推荐