Learning Python libraries: re

2012-10-22 14:57

The re library (the content below is taken from the Python v3.2.3 documentation)

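I lost quite a lot of time recently because of experiments I had to run, so starting today I am picking up my Python studies again. Next I will work through a series of libraries. To make them stick, I am copying out the commonly used parts of the documentation below, hoping to memorize them on the basis of actually understanding them.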

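In regular expressions, "\" is the escape character. To match one literal "\", you normally have to write "\\\\" as the pattern, but with Python's raw string notation you only need to write r"\\".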
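
The backslash rule above can be checked directly. The following is a minimal sketch, assuming a made-up Windows-style path as the sample text: both spellings produce the same two-character pattern, and that pattern matches a single literal backslash.

```python
import re

# Without a raw string, each backslash is doubled once for the Python string
# literal and once more for the regex engine: four characters in the source.
plain = "\\\\"
# With a raw string, only the regex-level escaping remains.
raw = r"\\"

# Both literals denote the same two-character pattern.
same_pattern = plain == raw

# Hypothetical sample text containing exactly one backslash.
sample = "C:\\temp"
found = re.search(raw, sample)

print(same_pattern)
print(len(raw), found.start())
```

Here `found.group()` is the single backslash at index 2 of the sample text.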

The special characters are:

'.' (Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.

'^' (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.

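'$' Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.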

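'*' Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.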

'+' Causes the resulting RE to match 1 or more repetitions of the preceding RE.

*?, +?, ?? The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched.

{m} Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.

{m,n} Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible.

{m,n}? Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible.

'\' Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence.

[ ] Used to indicate a set of characters. In a set: ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] matches any lowercase ASCII letter. Characters that are not within a range can be matched by complementing the set: if the first character of the set is '^', all the characters that are not in the set will be matched.

'|' A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B.

(...) Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group.

\number Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group).

\A Matches only at the start of the string.

\b Matches the empty string, but only at the beginning or end of a word.

\B Matches the empty string, but only when it is not at the beginning or end of a word. This is just the opposite of \b

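\d For Unicode (str) patterns: Matches any Unicode decimal digit; this includes [0-9] and also many other digit characters. If the ASCII flag is used, only [0-9] is matched. For 8-bit (bytes) patterns: Matches any decimal digit; this is equivalent to [0-9].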

\D Matches any character which is not a Unicode decimal digit. This is the opposite of \d.

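\s For Unicode (str) patterns: Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters). If the ASCII flag is used, only [ \t\n\r\f\v] is matched.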

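\w For Unicode (str) patterns: Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched. For 8-bit (bytes) patterns: Matches characters considered alphanumeric in the ASCII character set; this is equivalent to [a-zA-Z0-9_].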

\W Matches any character which is not a Unicode word character. This is the opposite of \w.
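
A few of the special characters listed above are easier to remember once seen in action. The following is a small sketch with made-up sample strings, not an exhaustive tour; the variable names are only illustrative.

```python
import re

text = "<b>bold</b> and <i>italic</i>"   # made-up sample text

# Greedy: '.*' runs to the last '>' it can reach.
greedy = re.search(r"<.*>", text).group()
# Non-greedy: '.*?' stops at the first '>' that lets the match succeed.
lazy = re.search(r"<.*?>", text).group()

# {m}: exactly six 'a' characters are required, five are not enough.
six = re.search(r"a{6}", "aaaaaa") is not None
five = re.search(r"a{6}", "aaaaa") is not None

# \1 must repeat whatever group 1 captured; note the space after the group.
doubled = re.search(r"(.+) \1", "the the").group(1)
no_space = re.search(r"(.+) \1", "thethe")

# \b anchors at word boundaries, so the 'cat' inside 'concatenate' is skipped.
words = re.findall(r"\bcat\b", "cat concatenate cat.")

# A complemented set: every character that is not an ASCII digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(greedy)
print(lazy)
print(six, five, doubled, no_space, words, non_digits)
```

The greedy pattern returns the whole sample string, while the non-greedy one returns only `<b>`.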

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string.

>>> import re
>>> re.match("c", "abcdef") # No match
>>> re.search("c", "abcdef") # Match
<_sre.SRE_Match object at ...>
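
In a script, as opposed to the interactive prompt, the difference shows up as None versus a match object. The following is a minimal sketch of how the two results might be inspected; the variable names are only illustrative.

```python
import re

# match() only looks at position 0, so this is None.
at_start = re.match("c", "abcdef")
# search() scans forward and finds the 'c' at index 2.
anywhere = re.search("c", "abcdef")
# Anchoring with '^' makes search() behave like match() for this string.
anchored = re.search("^c", "abcdef")

print(at_start)
print(anywhere.start(), anywhere.group())
print(anchored)
```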

Module Contents

re.compile(pattern, flags=0)

Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods.

The sequence

prog = re.compile(pattern)

result = prog.match(string)

is equivalent to

result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
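
As a rough illustration of the reuse case, the sketch below compiles one pattern and applies it to several lines. The pattern and the input lines are hypothetical, chosen only for the example.

```python
import re

# Hypothetical pattern and inputs, chosen only for illustration.
prog = re.compile(r"\d+")
lines = ["order 66", "no digits here", "room 101"]

# The compiled object is reused for every line instead of passing the
# pattern string to re.search() each time.
numbers = []
for line in lines:
    m = prog.search(line)
    if m:                      # search() returns None when nothing matches
        numbers.append(m.group())

# For a single call, both spellings give an equivalent result.
same = prog.match("42abc").group() == re.match(r"\d+", "42abc").group()

print(numbers, same)
```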

Among the values that flags can take:

re.I

re.IGNORECASE

Perform case-insensitive matching.
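
A short sketch of the flag in use, with a made-up sample sentence. It passes the flag both to a module-level function and to re.compile().

```python
import re

sentence = "I like PYTHON"    # made-up sample text

case_sensitive = re.search("python", sentence)          # no flag: None
ignoring_case = re.search("python", sentence, re.I)     # short spelling
compiled = re.compile("python", flags=re.IGNORECASE)    # long spelling
via_compiled = compiled.search(sentence)

print(case_sensitive)
print(ignoring_case.group(), via_compiled.group())
```

re.I and re.IGNORECASE are two names for the same flag, so either spelling can be used.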

re.search(pattern, string, flags=0)

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern.

re.match(pattern, string, flags=0)

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern.

re.split(pattern, string, maxsplit=0, flags=0)

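Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.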
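
The effect of capturing parentheses and of maxsplit is easiest to see side by side. This sketch uses a short sample sentence; the variable names are only illustrative.

```python
import re

text = "Words, words, words."

# Separators are dropped; the trailing '.' leaves an empty final element.
parts = re.split(r"\W+", text)
# With a capturing group, the separators are kept in the result.
with_groups = re.split(r"(\W+)", text)
# At most one split; the rest of the string is the last element.
limited = re.split(r"\W+", text, maxsplit=1)

print(parts)
print(with_groups)
print(limited)
```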

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings.
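
One detail worth noting is that the shape of the result depends on how many groups the pattern contains: whole matches with no group, the group's text with one group, and tuples with several. A small sketch with a made-up input string:

```python
import re

text = "width=640 height=480"    # made-up sample text

no_groups = re.findall(r"\d+", text)             # whole matches
one_group = re.findall(r"(\w+)=\d+", text)       # only the group's text
two_groups = re.findall(r"(\w+)=(\d+)", text)    # one tuple per match

print(no_groups)
print(one_group)
print(two_groups)
```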

re.finditer(pattern, string, flags=0)

Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string.
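
Compared with findall(), the iterator gives access to each match object, which is handy when positions are needed as well as text. A minimal sketch with a made-up input string:

```python
import re

text = "a1 b22 c333"    # made-up sample text

# Collect the matched text together with its start and end offsets.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(spans)
```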

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.
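
The replacement can refer back to groups, can be limited with count, and can also be a function that receives the match object. The sketch below tries each form on made-up strings.

```python
import re

dates = "2012-10-22 and 2013-01-05"    # made-up sample text

# Backreferences in repl reorder the captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", dates)

# count=1 replaces only the leftmost occurrence.
first_only = re.sub(r"\d+", "#", "1 22 333", count=1)

# repl may be a function taking the match object and returning a string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

# When the pattern is not found, the string comes back unchanged.
unchanged = re.sub("xyz", "-", "abc")

print(swapped)
print(first_only)
print(doubled)
print(unchanged)
```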

re.escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
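
A sketch of why escaping matters, assuming a made-up piece of user-supplied text that happens to contain metacharacters. The exact set of characters that re.escape() backslashes differs between Python versions, so the example checks only the matching behaviour, not the escaped string itself.

```python
import re

literal = "1+1=2 (really?)"    # hypothetical user-supplied text
haystack = "proof that 1+1=2 (really?) holds"

# Used as-is, '+', '(', ')' and '?' are interpreted as regex syntax.
unescaped = re.search(literal, haystack)

# After escaping, the text is matched character for character.
pattern = re.escape(literal)
found = re.search(pattern, haystack)

print(unescaped)
print(found.group())
```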

Regular Expression Objects

Compiled regular expression objects support the following methods and attributes:

regex.search(string[, pos[, endpos]])

regex.match(string[, pos[, endpos]])

regex.split(string, maxsplit=0)

regex.findall(string[, pos[, endpos]])

regex.finditer(string[, pos[, endpos]])
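
The optional pos and endpos arguments restrict where a compiled pattern looks. The following is a minimal sketch with a made-up string; the variable names are only illustrative.

```python
import re

regex = re.compile(r"\d+")
text = "10 20 30"    # made-up sample text

from_start = regex.search(text).group()      # scanning begins at index 0
after_pos = regex.search(text, 2).group()    # scanning begins at index 2
at_pos = regex.match(text, 3).group()        # must match exactly at index 3
windowed = regex.findall(text, 3, 6)         # only indexes 3..5 are considered

# split() on a compiled object takes maxsplit, just like re.split().
pieces = re.compile(r"\s+").split(text, maxsplit=1)

print(from_start, after_pos, at_pos, windowed, pieces)
```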

Match Objects

match.expand(template)

match.group([group1, ...])

match.groups(default=None)
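
The three methods above can be tried on a single match object. This sketch uses a made-up name as input, with an optional third group that does not take part in the match so that the default argument has a visible effect.

```python
import re

# The third group is optional and stays unmatched for this input.
m = re.match(r"(\w+) (\w+)( \w+)?", "Isaac Newton")

whole = m.group(0)                      # the entire match
first = m.group(1)                      # a single group
pair = m.group(1, 2)                    # several groups at once, as a tuple
all_groups = m.groups()                 # unmatched groups appear as None
with_default = m.groups(default="")     # ...or as the given default
expanded = m.expand(r"\2, \1")          # template with backreferences

print(whole, first, pair)
print(all_groups, with_default)
print(expanded)
```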

Raw String Notation

Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it.
