Learning Python Libraries: re
2012-10-22 14:57
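The re library (the following content is from the Python v3.2.3 documentation)
I lost a lot of time recently because of experiments, so starting today I am picking up Python again. Next I will work through a series of libraries. To reinforce my memory I am copying the commonly used parts of the documentation below, in the hope of memorizing them on the basis of real understanding.
In a regular expression, "\" is the escape character. To match a literal "\", you normally have to write the pattern as "\\\\", but with Python's raw string notation you only need to write r"\\".
The special characters are as follows: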
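A small sketch of the backslash rule above. The sample string `path` is made up for illustration; both patterns are meant to describe the same regular expression, a single literal backslash.

```python
import re

path = "C:\\temp"   # the actual text is C:\temp, with one backslash

# Ordinary string literal: four backslashes in the source become two in the
# pattern, which the regex engine reads as one literal backslash.
normal = re.search("\\\\", path)

# Raw string literal: the two backslashes reach the regex engine unchanged.
raw = re.search(r"\\", path)

print(normal.start(), raw.start())   # both locate the backslash at index 2
```

Both literals produce the identical two-character pattern string, which is why the results agree.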
'.' Matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
'^' (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
'$' Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.
'*' Causes the resulting RE to match 0 or more repetitions of the preceding RE.
'+' Causes the resulting RE to match 1 or more repetitions of the preceding RE.
*?, +?, ?? The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched.
{m} Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.
{m,n} Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible.
{m,n}? Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible.
'\' Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence.
[ ] Used to indicate a set of characters. In a set: ranges of characters can be indicated by giving two characters and separating them by a '-'. Characters that are not within a range can be matched by complementing the set: if the first character of the set is '^', all the characters that are not in the set will be matched.
'|' A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B.
(...) Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group.
\number Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'the end'.
\A Matches only at the start of the string.
\b Matches the empty string, but only at the beginning or end of a word.
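\B Matches the empty string, but only when it is not at the beginning or end of a word. This is just the opposite of \b.
\d For Unicode (str) patterns: Matches any Unicode decimal digit; this includes [0-9] and also many other digit characters. For 8-bit (bytes) patterns, or if the ASCII flag is used: Matches any decimal digit; this is equivalent to [0-9].
\D Matches any character which is not a decimal digit. This is the opposite of \d.
\s For Unicode (str) patterns: Matches Unicode whitespace characters, which includes [ \t\n\r\f\v]. For 8-bit (bytes) patterns, or if the ASCII flag is used: Matches only [ \t\n\r\f\v].
\w For Unicode (str) patterns: Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. For 8-bit (bytes) patterns, or if the ASCII flag is used: Matches characters considered alphanumeric in the ASCII character set; this is equivalent to [a-zA-Z0-9_].
\W Matches any character which is not a word character. This is the opposite of \w.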
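To make a few of the special characters above concrete, here is a short sketch. The sample strings are made up for illustration. Note that the backreference example uses re.match, which anchors at the start of the string; a search anywhere in 'the end' would find 'e e'.

```python
import re

# Greedy vs. non-greedy: '*' takes as much as it can, '*?' as little as it can.
tags = "<a><b>"
greedy = re.match(r"<.*>", tags).group()        # '<a><b>'
lazy = re.match(r"<.*?>", tags).group()         # '<a>'

# {m,n} prefers the maximum count, {m,n}? prefers the minimum.
most = re.match(r"a{2,4}", "aaaaa").group()     # 'aaaa'
fewest = re.match(r"a{2,4}?", "aaaaa").group()  # 'aa'

# Sets: a range, and the complement of that range.
in_range = re.findall(r"[a-c]", "abcdef")       # ['a', 'b', 'c']
not_in_range = re.findall(r"[^a-c]", "abcdef")  # ['d', 'e', 'f']

# Backreference: \1 must repeat exactly what group 1 captured.
repeated = re.match(r"(.+) \1", "the the")      # a match object
different = re.match(r"(.+) \1", "the end")     # None

# \b matches only at a word boundary, so 'cat' inside 'concat' or 'cats' is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")  # ['cat', 'cat']
```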
Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string.
>>> re.match("c", "abcdef") # No match
>>> re.search("c", "abcdef") # Match
<_sre.SRE_Match object at ...>
Module Contents
re.compile(pattern, flags=0)
Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods.
The sequence
prog = re.compile(pattern)
result = prog.match(string)
is equivalent to
result = re.match(pattern, string)
but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
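As a rough illustration of reusing a compiled object, the sketch below compiles one pattern and applies it to several strings. The pattern and the list `lines` are hypothetical, chosen only to show the calls.

```python
import re

# Compile once, then reuse the same object for every string.
number = re.compile(r"\d+")
lines = ["item 12", "no digits", "42 apples"]

# match() only succeeds when the digits are at the very start of the string.
starts_with_number = [bool(number.match(s)) for s in lines]   # [False, False, True]

# search() succeeds when the digits appear anywhere in the string.
contains_number = [bool(number.search(s)) for s in lines]     # [True, False, True]
```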
Values for flags include:
re.I
re.IGNORECASE
Perform case-insensitive matching.
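A minimal sketch of the flag in use; the sample sentence is made up. re.I is simply the short name for re.IGNORECASE.

```python
import re

text = "I like PYTHON"

without_flag = re.search("python", text)                    # None: case differs
with_flag = re.search("python", text, re.I)                 # matches 'PYTHON'
compiled = re.compile("python", flags=re.IGNORECASE).search(text)

print(with_flag.group(), compiled.group())
```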
re.search(pattern, string, flags=0)
Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding match object.
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.
re.split(pattern, string, maxsplit=0, flags=0)
Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
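The effect of capturing parentheses and of maxsplit is easier to see side by side. A short sketch using the sample sentence from the re documentation:

```python
import re

sentence = "Words, words, words."

# Separators are dropped when the pattern has no capturing group.
plain = re.split(r"\W+", sentence)        # ['Words', 'words', 'words', '']

# With a capturing group, the separators are kept in the result list.
kept = re.split(r"(\W+)", sentence)       # ['Words', ', ', 'words', ', ', 'words', '.', '']

# maxsplit=1 stops after the first split; the rest of the string stays whole.
once = re.split(r"\W+", sentence, 1)      # ['Words', 'words, words.']
```

The trailing empty string appears because the final '.' is itself a separator with nothing after it.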
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings.
re.finditer(pattern, string, flags=0)
Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string.
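A brief sketch contrasting findall and finditer on a made-up string. One detail worth knowing: when the pattern contains more than one group, findall returns a list of tuples rather than a list of plain strings.

```python
import re

text = "a1 b22 c333"

# No groups: a list of the matched strings.
numbers = re.findall(r"\d+", text)              # ['1', '22', '333']

# Two groups: a list of tuples, one tuple per match.
pairs = re.findall(r"([a-z])(\d+)", text)       # [('a', '1'), ('b', '22'), ('c', '333')]

# finditer yields match objects, so positions are available as well.
located = [(m.start(), m.group()) for m in re.finditer(r"\d+", text)]
# [(1, '1'), (4, '22'), (8, '333')]
```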
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.
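A few illustrative calls to re.sub, with made-up inputs. The replacement repl may be a string, which can refer back to groups such as \1, or a function that receives the match object.

```python
import re

# Collapse every run of whitespace into a single space.
squeezed = re.sub(r"\s+", " ", "a   b \t c")                  # 'a b c'

# count limits how many occurrences are replaced (leftmost first).
first_two = re.sub(r"\d", "#", "a1b2c3", count=2)             # 'a#b#c3'

# Backreferences in the replacement swap the two captured words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")     # 'world hello'

# A function as repl: double every number found in the text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
# '6 apples, 20 pears'
```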
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
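A sketch of why re.escape helps, using a made-up literal. Exactly which characters get a backslash differs between Python versions, so the example checks only the matching behaviour, not the escaped text itself.

```python
import re

literal = "1+1=2?"               # contains the metacharacters '+' and '?'
text = "is 1+1=2? yes"

# Used directly, '+' and '?' are read as repetition operators, so nothing matches.
unescaped = re.search(literal, text)               # None

# After escaping, the pattern matches the literal text itself.
escaped = re.search(re.escape(literal), text)      # matches '1+1=2?'

print(escaped.group())
```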
Regular Expression Objects
Compiled regular expression objects support the following methods and attributes:
regex.search(string[, pos[, endpos]])
regex.match(string[, pos[, endpos]])
regex.split(string, maxsplit=0)
regex.findall(string[, pos[, endpos]])
regex.finditer(string[, pos[, endpos]])
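The optional pos and endpos arguments restrict the part of the string that is examined. A small sketch with made-up data:

```python
import re

number = re.compile(r"\d+")
text = "1 22 333"

everything = number.findall(text)           # ['1', '22', '333']
from_index_2 = number.findall(text, 2)      # ['22', '333']
first_four = number.findall(text, 0, 4)     # only looks at '1 22' -> ['1', '22']

# With pos given, match() anchors at that index instead of index 0.
letter_o = re.compile("o")
at_start = letter_o.match("dog")            # None: 'd' is at index 0
at_index_1 = letter_o.match("dog", 1)       # matches the 'o' at index 1
```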
Match Objects
match.expand(template)
match.group([group1, ...])
match.groups(default=None)
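The three match object methods listed above can be tried out as follows. The sample strings come from the examples in the re documentation.

```python
import re

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")

whole = m.group(0)            # 'Isaac Newton' (the entire match)
first = m.group(1)            # 'Isaac'
both = m.group(1, 2)          # ('Isaac', 'Newton')
all_groups = m.groups()       # ('Isaac', 'Newton')

# expand() substitutes group references into a template string.
reordered = m.expand(r"\2, \1")   # 'Newton, Isaac'

# groups(default) fills in groups that did not take part in the match.
n = re.match(r"(\d+)\.?(\d+)?", "24")
with_none = n.groups()        # ('24', None)
with_zero = n.groups("0")     # ('24', '0')
```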
Raw String Notation
Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it.
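One more case where raw strings matter, sketched with a made-up sentence: in an ordinary string literal "\b" is the backspace character, so it never reaches the regex engine as the word-boundary sequence.

```python
import re

text = "a cat sat"

# "\b" in a normal literal is one backspace character (code point 8),
# so this pattern looks for backspace + 'cat' + backspace and finds nothing.
normal = re.search("\bcat\b", text)     # None

# In a raw literal the backslash survives, and \b means "word boundary".
raw = re.search(r"\bcat\b", text)       # matches 'cat'

print(len("\b"), len(r"\b"))            # 1 character vs. 2 characters
```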