Text Processing in R
2016-05-10 20:56
I recently wrote some code that processes text in R, so here are the string-handling functions I have been using lately.
1, Searching strings
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
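With value = FALSE, grep() returns the positions (indices) of the elements that match; with value = TRUE, it returns the matching elements themselves. perl = TRUE enables Perl-compatible regular expressions.
fixed = TRUE means pattern is a string to be matched as is, using exact matching rather than a regular expression.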
> grep("[aeiou]",c("apple","banana","peak"))
[1] 1 2 3
> grep("[aeio]",c("apple","banana","peak","unit","bbddff"),value = TRUE)
[1] "apple" "banana" "peak" "unit"
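In a regular expression, a metacharacter such as * or . has to be escaped with \\ when you want to match it literally. For example, "\\*" matches a literal *.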
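As a rough sketch of the escaping rule above, the snippet below compares an escaped pattern with fixed = TRUE. The vector x is sample data made up for illustration, not something from my own code.

```r
# Hypothetical sample data: one element has a literal *, one a literal .
x <- c("a*b", "a.b", "ab")

# Escape the metacharacter so the regex engine treats it literally
star_idx <- grep("\\*", x)   # position of "a*b"
dot_idx  <- grep("\\.", x)   # position of "a.b"

# fixed = TRUE turns off regex interpretation, so no escaping is needed
star_val <- grep("*", x, fixed = TRUE, value = TRUE)

# An unescaped . matches any character, so every element here matches
any_idx <- grep(".", x)

print(star_idx)
print(dot_idx)
print(star_val)
print(any_idx)
```

When the pattern is plain text, fixed = TRUE is usually the simpler choice, since nothing in it needs escaping.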
2, String replacement functions
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
sub() replaces only the first match in each element.
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
gsub() replaces every match.
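To make the difference between the two concrete, here is a small sketch. The date string is only an example value; the last line also uses backreferences (\\1, \\2, \\3), which this post does not cover elsewhere but which both functions support.

```r
s <- "2016-05-10"   # example input, chosen only for illustration

# sub() stops after the first match
first_only <- sub("-", "/", s)

# gsub() replaces every match
all_hits <- gsub("-", "/", s)

# Parenthesised groups can be reused in the replacement as \\1, \\2, ...
swapped <- sub("([0-9]+)-([0-9]+)-([0-9]+)", "\\3/\\2/\\1", s)

print(first_only)   # "2016/05-10"
print(all_hits)     # "2016/05/10"
print(swapped)      # "10/05/2016"
```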
3, Counting and translating characters
nchar(x, type = "chars", allowNA = FALSE, keepNA = NA)
type can be "bytes", "chars" (the default), or "width". For example:
> nchar(c("apple","banana","peak","unit","NA"),type = "width",allowNA = TRUE)
[1] 5 6 4 4 2
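Note that "NA" in the example above is an ordinary two-character string, not a missing value. The sketch below shows how a real NA behaves; it assumes R 3.3.0 or later, where the keepNA argument defaults to NA, and the vector words is made up for illustration.

```r
# Second element is the string "NA", third is a genuinely missing value
words <- c("apple", "NA", NA)

# With type = "chars" (the default) a missing value stays NA
n_chars <- nchar(words)

# With type = "width" a missing value counts as 2, the width of the printed NA
n_width <- nchar(words, type = "width")

print(n_chars)
print(n_width)
```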
The other three functions, tolower(), toupper(), and chartr(), are just as simple to use:
> DNA <- "AtGCtttACC"
> tolower(DNA)
[1] "atgctttacc"
> toupper(DNA)
[1] "ATGCTTTACC"
> chartr("Tt", "Uu", DNA)
[1] "AuGCuuuACC"
> chartr("Tt", "UU", DNA)
[1] "AUGCUUUACC"
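Because chartr() maps each character of its first argument to the character at the same position in the second, it can also do several substitutions in one pass. As a sketch (the complement and reverse-complement steps are my own illustration, not part of the code above):

```r
DNA <- "AtGCtttACC"

# Map each base to its complement, keeping the original case
complement <- chartr("ATGCatgc", "TACGtacg", DNA)

# Reverse the complement: split into characters, reverse, and paste back
rev_complement <- paste(rev(strsplit(complement, "")[[1]]), collapse = "")

# Upper-case first, then a single T -> U mapping is enough
rna <- chartr("T", "U", toupper(DNA))

print(complement)       # "TaCGaaaTGG"
print(rev_complement)   # "GGTaaaGCaT"
print(rna)              # "AUGCUUUACC"
```

The two strings passed to chartr() should be the same length; R raises an error if the old string is longer than the new one.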
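4, Concatenating strings
paste(x, y, sep = " ", collapse = NULL)
sep sets the separator placed between corresponding elements, such as x[1] and y[1]. collapse, when it is not NULL, sets the separator used to join the pasted results (x[1] with y[1], x[2] with y[2], and so on) into one single string.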
> paste("A", 1:6,collapse = ".")
[1] "A 1.A 2.A 3.A 4.A 5.A 6"
> paste("A", 1:6, sep = " ")
[1] "A 1" "A 2" "A 3" "A 4" "A 5" "A 6"
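The two arguments can also be combined. A minimal sketch, using separators picked only for illustration:

```r
# sep alone: one result per element
ids <- paste("A", 1:3, sep = "_")

# sep plus collapse: elements are pasted first, then joined into one string
one_string <- paste("A", 1:3, sep = "_", collapse = "+")

# paste0() is shorthand for sep = ""
no_gap <- paste0("A", 1:3)

print(ids)          # "A_1" "A_2" "A_3"
print(one_string)   # "A_1+A_2+A_3"
print(no_gap)       # "A1" "A2" "A3"
```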
5, Splitting strings
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
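A short sketch of how strsplit() might be used; the input strings are invented for illustration. The function returns a list with one character vector per input element, so [[1]] pulls out the pieces for a single string.

```r
# Split on a comma; the result is a list of length 1
parts  <- strsplit("apple,banana,peak", ",")
fruits <- parts[[1]]

# split is a regular expression by default, so use fixed = TRUE for a literal .
dotted <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]

# An empty split string breaks the input into single characters
bases <- strsplit("ATGC", "")[[1]]

print(fruits)   # "apple" "banana" "peak"
print(dotted)   # "a" "b" "c"
print(bases)    # "A" "T" "G" "C"
```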