Python Regular Expressions

2016-01-27 11:05
In Python, regular expressions are supported by the re module. The main functions and methods are:

search()

match()

findall()

compile()

sub()

subn()

group()

groups()

split()

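1.compile

Precompiles a regular expression. If you compile, you call methods on the resulting pattern object; if you do not compile, you call the module-level functions of re. The names are the same either way. For example: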

reg = r'(\d+)-(\d{3})-(\w+)'

pattern = re.compile(reg)
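
As a minimal sketch of the point above, the same match can be reached either through a method of the compiled pattern or through the module-level function of the same name. The sample string is made up for illustration and is not from the original examples.

```python
import re

reg = r'(\d+)-(\d{3})-(\w+)'
pattern = re.compile(reg)

text = "010-123-abc"  # hypothetical sample input

# Compiled: call the method on the pattern object
m1 = pattern.match(text)
# Not compiled: call the module-level function and pass the pattern string
m2 = re.match(reg, text)

print(m1.groups())
print(m2.groups())
```

Compiling once is mainly worthwhile when the same pattern is reused many times.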

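2.match

Tries to match the pattern only at the beginning of the string.

3.search

Scans the whole string and returns the first successful match.

4.findall

Returns all non-overlapping matches as a list.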
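
One detail worth knowing about findall is that the shape of its result depends on the pattern: without capturing groups it returns a list of strings, and with several groups it returns a list of tuples. A small sketch, using a made-up input string:

```python
import re

text = "123-456 and 78-90"  # hypothetical sample input

# No capturing groups: each item is the whole matched string
whole = re.findall(r'\d+-\d+', text)
# Two capturing groups: each item is a tuple of the group contents
pairs = re.findall(r'(\d+)-(\d+)', text)

print(whole)  # ['123-456', '78-90']
print(pairs)  # [('123', '456'), ('78', '90')]
```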

Example:

1. The pattern occurs at the start of the string, so all three functions find it

import re

html = """123-456-abc-ert
1116-66666-hhhh-sdsssssss
"""
reg = r'(\d+)-(\d+)-(\w+)'
pattern = re.compile(reg)
print("match:")
aaa = re.match(pattern, html)  # only tries at the start of the string
if aaa is not None:
    print(aaa.groups())
print("search:")
bbb = re.search(pattern, html)  # first match anywhere in the string
print(bbb.groups())
matches = re.findall(pattern, html)  # every non-overlapping match
print("findall:")
print(matches)


Result:

match:

('123', '456', 'abc')

search:

('123', '456', 'abc')

findall:

[('123', '456', 'abc'), ('1116', '66666', 'hhhh')]

2. The pattern does not occur at the start of the string, so match finds nothing

import re

html = """:123-456-abc-ert
1116-66666-hhhh-sdsssssss
"""
reg = r'(\d+)-(\d+)-(\w+)'
pattern = re.compile(reg)
print("match:")
aaa = re.match(pattern, html)  # returns None: the string starts with ":"
if aaa is not None:
    print(aaa.groups())
print("search:")
bbb = re.search(pattern, html)  # first match anywhere in the string
print(bbb.groups())
matches = re.findall(pattern, html)  # every non-overlapping match
print("findall:")
print(matches)


Result:

match:

search:

('123', '456', 'abc')

findall:

[('123', '456', 'abc'), ('1116', '66666', 'hhhh')]
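
Nothing is printed under "match:" because match() returns None when the pattern does not start at position 0, which is why the result has to be checked before calling groups(). A minimal sketch of that behaviour, using only the first line of the sample text:

```python
import re

html = ":123-456-abc-ert\n"
pattern = re.compile(r'(\d+)-(\d+)-(\w+)')

m = pattern.match(html)   # the leading ":" prevents a match at position 0
print(m)                  # None

s = pattern.search(html)  # search skips ahead until the pattern fits
print(s.start(), s.group())
```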

3.group() and groups()

import re

html = """123-456-abc-ert
1116-66666-hhhh-sdsssssss
"""
reg = r'(\d+)-(\d+)-(\w+)'
pattern = re.compile(reg)
print("match:")
aaa = re.match(pattern, html)
print("group")
print(aaa.group())   # the whole match
print("group 1")
print(aaa.group(1))  # the first parenthesized subgroup
print("groups")
print(aaa.groups())  # all subgroups as a tuple


Output:

match:

group

123-456-abc

group 1

123

groups

('123', '456', 'abc')

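Note: group() returns the entire matched string.

group(num) returns the subgroup with that number, that is, the text matched by the num-th pair of parentheses.

groups() returns a tuple of all the subgroups.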
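
To make the note concrete, here is a short sketch of a few more ways to call group() on the same kind of match. The variable names are only illustrative.

```python
import re

m = re.match(r'(\d+)-(\d+)-(\w+)', "123-456-abc-ert")

whole = m.group(0)        # group(0) is the same as group(): the entire match
second = m.group(2)       # text matched by the second pair of parentheses
pair = m.group(1, 3)      # several numbers at once give a tuple
count = len(m.groups())   # number of subgroups in the pattern

print(whole, second, pair, count)
```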

4.sub() and subn(): search and replace

import re

html = """123-456-abc-ert
1116-66666-hhhh-sdsssssss
"""
print("match:")
# Replace each "digits-three digits" run with "zhu"
aaa = re.sub(r"(\d+?)-(\d{3})", "zhu", html)
print(aaa)


Output:

match:

zhu-abc-ert

zhu66-hhhh-sdsssssss
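
The example above only uses sub(). subn() performs the same replacement but returns a tuple of the new string and the number of substitutions made. A minimal sketch with the same input and pattern:

```python
import re

html = """123-456-abc-ert
1116-66666-hhhh-sdsssssss
"""

# subn() returns (new_string, number_of_substitutions)
new_text, count = re.subn(r"(\d+?)-(\d{3})", "zhu", html)

print(new_text)
print(count)
```

The count is handy when you need to know whether anything was replaced at all.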

5.split

import re

html = """123-456-abc-ert
1116-66666-hhhh-sdsssssss
"""
# Split wherever three consecutive digits occur; the capturing group
# keeps the matched digits in the result list
print(re.split(r"(\d{3})", html))


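Output:

['', '123', '-', '456', '-abc-ert\n', '111', '6-', '666', '66-hhhh-sdsssssss\n']

The string is split wherever the regular expression matches. Because the pattern contains a capturing group, the matched text is also kept in the result list.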
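
The effect of the parentheses is easier to see on a shorter string. This sketch, with a made-up input, compares splitting with and without a capturing group, plus the optional maxsplit argument:

```python
import re

text = "123-456-abc"  # hypothetical sample input

with_group = re.split(r"(-)", text)         # separators are kept
without_group = re.split(r"-", text)        # separators are dropped
limited = re.split(r"-", text, maxsplit=1)  # stop after the first split

print(with_group)     # ['123', '-', '456', '-', 'abc']
print(without_group)  # ['123', '456', 'abc']
print(limited)        # ['123', '456-abc']
```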