
Perl regex recursion

2011-02-22 12:32
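Today I came across a thread on ChinaUnix asking:

Treating the outermost parentheses as level 1, how can you remove one pair of level-2 parentheses while keeping all the others?

For example:

(((1,2),3),4) => ((1,2),3,4)

((1,2),(3,4)) => ((1,2),3,4) or (1,2,(3,4))

(1,(2,(3,4))) => (1,2,(3,4))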

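Solution 1:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;    # recursion with (?1)/(?2) needs Perl 5.10+

while (my $str = <DATA>)
{
    chomp $str;
    print "$str => ";
    my @stack;
    # Match twice: the 1st match stops at the 1st "(", the 2nd at the 2nd "(".
    foreach (0 .. 1)
    {
        $str =~ /(\()(?=((?:[^()]|(?1)(?2))+(\))))/g;
        # Record the offsets of this "(" ($1) and of its matching ")" ($3).
        push(@stack, [$-[1], $-[3]]);
    }
    # Delete the ")" first so that the offset of the "(" stays valid.
    substr($str, $stack[1][1], 1) = "";
    substr($str, $stack[1][0], 1) = "";
    print "$str\n";
}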
__DATA__
(((1,2),3),4)
((1,2),(3,4))
(1,(2,(3,4)))


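Solution 2:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers and (?-1) need Perl 5.10+

# Zero or more balanced (...) groups; (?-1) recurses into this group itself.
my $balance = qr/(\((?:[^()]++|(?-1))*+\))*/;
# Inner content: plain characters interleaved with balanced groups.
my $innerRe = qr/(?:[^()]*?$balance)*/;
while( <DATA> ){
    chomp;
    print;
    # $1 = outer "(" plus leading plain text, $2 = contents of the first
    # level-2 pair; that pair's own "(" and ")" are dropped.
    if( s/^(\([^()]*?)\(($innerRe)\)/$1$2/ ){
        print " => $_";
    }
    print "\n";
}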
__DATA__
(((1,2),3),4)
((1,2),(3,4))
(1,(2,(3,4)))
(((((1,2),3),4),(5,6)))
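
Solution 2 takes a different route: one anchored substitution deletes the first level-2 pair directly. Because $balance recurses with the relative (?-1), it keeps pointing at its own group after being interpolated into the larger pattern, where it becomes group 3. Below is a sketch of the same idea, slightly simplified and wrapped in a sub; the sub name drop_level2 and the simplified inner pattern are my own, not from the original thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

# One balanced "( ... )" group. (?-1) recurses into the most recently
# opened group, i.e. this group itself, whatever number it ends up with.
my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;

# Hypothetical helper: remove the first pair of level-2 parentheses.
sub drop_level2 {
    my ($s) = @_;
    # $1: the outer "(" plus any plain characters before the first inner "(".
    # $2: everything inside that inner pair (plain chars or balanced groups).
    # The inner "(" and its matching ")" are left out of the replacement.
    $s =~ s/^(\([^()]*)\(((?:[^()]++|$parens)*)\)/$1$2/;
    return $s;
}

for my $in ('(((1,2),3),4)', '((1,2),(3,4))', '(1,(2,(3,4)))',
            '(((((1,2),3),4),(5,6)))') {
    say "$in => ", drop_level2($in);
}
```

If there is no level-2 pair, the substitution does not match and the string is returned unchanged.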


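Solution 1's regex, written out with /x and comments:

$str =~ /
    (\()            # group 1: $1 matches the opening parenthesis
    (?=             # everything else is one lookahead, so only "(" is consumed:
                    # the 1st successful match starts at the 1st "(", the 2nd
                    # at the 2nd "(", and so on
        (           # group 2: the contents of the parentheses, followed by $3
            (?:             # non-capturing group
                [^()]       # either a character that is not a parenthesis
            |
                (?1)(?2)    # or a recursive call to group 1 followed by one to
                            # group 2, i.e. a nested "(" ... ")" pair
            )+
            (\))            # group 3: $3 matches the closing parenthesis
        )
    )
/xg;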
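
Because the lookahead is zero-width, each match consumes only the "(" itself, so pos() moves one character past it and the next //g match continues from there. A minimal sketch that keeps looping instead of stopping after two matches, collecting the [open, close] offsets of every pair (paren_pairs is a hypothetical name; it assumes the input contains no empty "()" pairs, which the + in the pattern cannot match):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: return [open, close] offsets for every
# non-empty parenthesis pair, ordered by the opening parenthesis.
sub paren_pairs {
    my ($str) = @_;
    my @pairs;
    # Same pattern as Solution 1. Only "(" is consumed; the lookahead
    # finds the matching ")" via recursion, so pos() advances by one
    # character and the next iteration starts at the following "(".
    while ($str =~ /(\()(?=((?:[^()]|(?1)(?2))+(\))))/g) {
        push @pairs, [ $-[1], $-[3] ];    # offsets of $1 and $3
    }
    return @pairs;
}

for my $s ('(((1,2),3),4)', '((1,2),(3,4))') {
    say "$s: ", join ' ', map { join('-', @$_) } paren_pairs($s);
}
```

Solution 1 only needs the second entry of this list, which is the first level-2 pair when the whole string is wrapped in one outer pair.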

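----------------------------------------------------------------
The perlre documentation (http://perldoc.perl.org/perlre.html) describes the regex features added in Perl 5.10 and later. The recursion syntax is:

(?PARNO)  (?-PARNO)  (?+PARNO)  (?R)  (?0)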

Similar to (??{ code }) except it does not involve compiling any code,
instead it treats the contents of a capture buffer as an independent
pattern that must match at the current position. Capture buffers
contained by the pattern will have the value as determined by the
outermost recursion.

PARNO is a sequence of digits (not starting with 0) whose value reflects
the paren-number of the capture buffer to recurse to. (?R) recurses to
the beginning of the whole pattern. (?0) is an alternate syntax for (?R).
If PARNO is preceded by a plus or minus sign then it is assumed to be
relative, with negative numbers indicating preceding capture buffers and
positive ones following. Thus (?-1) refers to the most recently declared
buffer, and (?+1) indicates the next buffer to be declared. Note that the
counting for relative recursion differs from that of relative
backreferences, in that with recursion unclosed buffers are included.
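
A small sketch of the three forms, assuming Perl 5.10 or later; the patterns and test strings are my own illustrations, not from perlre. $whole is deliberately left unanchored: (?R) re-enters the whole pattern, so an anchor such as ^ would be re-entered at every level of nesting and fail.

```perl
use strict;
use warnings;
use 5.010;

# (?R) / (?0): recurse into the whole pattern.
my $whole = qr/\((?:[^()]++|(?R))*\)/;

# (?-1): recurse into the most recently opened group, here the
# group that is still open around it (unclosed groups count).
my $back = qr/(\((?:[^()]++|(?-1))*\))/;

# (?+1): recurse into the next group to be opened, i.e. (\d+).
my $fwd = qr/^(?+1)-(\d+)$/;

# Return the text matched by $re in $str, or undef.
sub matched {
    my ($str, $re) = @_;
    return $str =~ $re ? substr($str, $-[0], $+[0] - $-[0]) : undef;
}

say matched('x(a(b)c)y', $whole);    # (a(b)c)
say matched('x(a(b)c)y', $back);     # (a(b)c)
say '12-34' =~ $fwd ? "match, \$1 = $1" : 'no match';
say 'ab-34' =~ $fwd ? 'match' : 'no match';
```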

The following pattern matches a function foo() which may contain
balanced parentheses as the argument.

$re = qr{ (                   # paren group 1 (full function)
            foo
            (                 # paren group 2 (parens)
              \(
                (             # paren group 3 (contents of parens)
                  (?:
                    (?> [^()]+ )  # Non-parens without backtracking
                  |
                    (?2)          # Recurse to start of paren group 2
                  )*
                )
              \)
            )
          )
        }x;

If the pattern was used as follows

'foo(bar(baz)+baz(bop))'=~/$re/
    and print "\$1 = $1\n",
              "\$2 = $2\n",
              "\$3 = $3\n";

the output produced should be the following:

$1 = foo(bar(baz)+baz(bop))
$2 = (bar(baz)+baz(bop))
$3 = bar(baz)+baz(bop)

If there is no corresponding capture buffer defined, then it is a
fatal error. Recursing deeper than 50 times without consuming any input
string will also result in a fatal error. The maximum depth is compiled
into perl, so changing it requires a custom build.
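
To see the first rule in practice: recursing into a group number that does not exist is rejected when the pattern is compiled. The sketch builds the pattern from a string so that the error happens at run time and can be caught with eval. The exact wording of the message may vary between Perl versions, so it only looks for "nonexistent group".

```perl
use strict;
use warnings;
use 5.010;

# (?2) refers to group 2, but the pattern only has group 1.
my $src = '(a)(?2)';
my $re  = eval { qr/$src/ };
my $err = $@;    # empty string if compilation had succeeded
say defined $re ? 'compiled' : "compile error: $err";
```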

The following shows how using negative indexing can make it
easier to embed recursive patterns inside of a qr// construct
for later use:

my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
if (/foo $parens \s+ \+ \s+ bar $parens/x) {
    # do something here...
}
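
The difference matters because an interpolated qr// is compiled again as part of the larger pattern, so absolute group numbers shift. The sketch below (variable names and strings are my own) embeds the $parens pattern from above, and a variant using the absolute (?1), after an extra capture group (foo):

```perl
use strict;
use warnings;
use 5.010;

# Relative: (?-1) always means "the group this pattern opened".
my $rel = qr/(\((?:[^()]++|(?-1))*+\))/;
# Absolute: (?1) means "group 1 of whatever pattern this ends up in".
my $abs = qr/(\((?:[^()]++|(?1))*+\))/;

# After (foo), the balanced-parens group becomes group 2, so the
# (?1) inside $abs now recurses into (foo) instead of itself.
my $str = 'foo(a(b))';
say 'relative: ', ($str =~ /^(foo)$rel\z/ ? 'match' : 'no match');
say 'absolute: ', ($str =~ /^(foo)$abs\z/ ? 'match' : 'no match');
```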

Note that this pattern does not behave the same way as the equivalent
PCRE or Python construct of the same form. In Perl you can backtrack into
a recursed group, in PCRE and Python the recursed into group is treated
as atomic. Also, modifiers are resolved at compile time, so constructs
like (?i:(?1)) or (?:(?i)(?1)) do not affect how the sub-pattern will
be processed.
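
Both points can be checked directly. The sketch below uses my own patterns together with (?(DEFINE)...), also new in 5.10, to declare group 1 without matching it there. It first shows Perl backtracking into a recursive call, then shows that (?i:...) has no effect on the recursed group.

```perl
use strict;
use warnings;
use 5.010;

# Group 1 is only declared inside DEFINE, never matched directly.
# (?1) first lets a+ take "aaa", then backtracks into the recursion so
# a+ gives one "a" back and the literal "a" can match. With the atomic
# behaviour described above for PCRE/Python, this would fail instead.
my $bt = qr/(?(DEFINE)(a+))^(?1)a\z/;
say 'aaa: ', ('aaa' =~ $bt ? 'match' : 'no match');

# Group 1 is compiled case-sensitively, so wrapping the recursive
# call in (?i:...) does not make it match "A".
my $ci = qr/(?(DEFINE)(a))^(?i:(?1))\z/;
say 'A:   ', ('A' =~ $ci ? 'match' : 'no match');
say 'a:   ', ('a' =~ $ci ? 'match' : 'no match');
```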