
Working with regular expressions in .NET

2004-09-10 14:32

What are regular expressions?

If you have ever used wildcards, for example when searching for a file on your computer, then you have already used a simple cousin of regular expressions without knowing it. I like to think of regular expressions as a way to manipulate strings more easily. They offer a lot more than wildcards though; we will look at more complex things in a minute.

Why would I want to use them?

Regular expressions are extremely useful when you want to extract useful information from a load of junk. Say someone wrote a sentence similar to this:

I told Bob that I would text him on his mobile phone. His number is 07589123654 and my number is 07891264853.

And you need to extract all the phone numbers from a sentence like that. If you haven’t met regular expressions before, you would probably think it is either impossible or very complicated, involving lots of calls to Mid() and Int32.Parse().

So how would I go about using them?

First you will need to know the basics. There are two things that you will need to use Regular Expressions in .NET and those are:

An input string (in our case the sentence above about Bob)
A search pattern (the thing that tells the regular expressions class what to look for)

The only tricky bit is the search pattern, which can become quite complex depending on what you’re trying to extract. To build up a search pattern we will have to do some research first.

Finding literal content

This is the easiest thing you will learn, simply because you already know it! To find literal content you just write the literal content into the search pattern and that’s it.
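
As a rough illustration of literal matching, here is a minimal sketch that looks for the word “Bob” in the first half of the sentence from earlier. The module name LiteralDemo is made up for this example; Regex.Match and the Match properties come from System.Text.RegularExpressions.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LiteralDemo
    Sub Main()
        'Input taken from the example sentence earlier in the article
        Dim strInput As String = "I told Bob that I would text him on his mobile phone."

        'A pattern made only of ordinary letters matches exactly that text
        Dim m As Match = Regex.Match(strInput, "Bob")

        If m.Success Then
            'm.Index is the zero-based position where the match starts
            Console.WriteLine("Found '" & m.Value & "' at position " & m.Index.ToString())
            'Prints: Found 'Bob' at position 7
        End If
    End Sub
End Module
```

One caveat: characters such as . [ and { have a special meaning inside a pattern, so literal text that contains them needs escaping first, for example with Regex.Escape.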

Finding numbers

Finding numbers is a little trickier, but it’s easy once you know how. Say our input string was “My phone number begins with a 0 and ends in an 8”. We would use the following search pattern:

“[0-9]”

This would return the following:
0
8

Characters wrapped in [ ] define a set of characters, and the pattern matches any single character from that set. A hyphen gives a range: A to Z is [A-Z], C to F is [C-F], and in our case 0 to 9 is [0-9].

Important: Ranges are case sensitive by default, i.e. [a-z] would only return lower case letters and [A-Z] would only return upper case letters.
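
To see ranges and case sensitivity side by side, here is a small sketch. The input “Room 12B” and the helper ShowMatches are made up for this example; RegexOptions.IgnoreCase is the standard .NET option for switching case sensitivity off.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RangeDemo
    'Hypothetical helper: prints all matches of a pattern on one line
    Sub ShowMatches(ByVal strInput As String, ByVal strPattern As String, ByVal options As RegexOptions)
        Dim strLine As String = strPattern & " ->"
        For Each m As Match In Regex.Matches(strInput, strPattern, options)
            strLine &= " " & m.Value
        Next
        Console.WriteLine(strLine)
    End Sub

    Sub Main()
        'Made-up input string, chosen to mix digits and both letter cases
        Dim strInput As String = "Room 12B"

        ShowMatches(strInput, "[0-9]", RegexOptions.None)        '[0-9] -> 1 2
        ShowMatches(strInput, "[a-z]", RegexOptions.None)        '[a-z] -> o o m
        ShowMatches(strInput, "[A-Z]", RegexOptions.None)        '[A-Z] -> R B

        'RegexOptions.IgnoreCase makes the same range match both cases
        ShowMatches(strInput, "[a-z]", RegexOptions.IgnoreCase)  '[a-z] -> R o o m B
    End Sub
End Module
```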

This approach works perfectly if it’s just a single character, but to match several characters together we need one more technique. Say our input string was “My hotel room is 1045A.” We couldn’t use a range on its own because it would return each digit as a separate match rather than all of them in one match. So we need to use something like this for our search pattern:

“[0-9]{4}”

An integer wrapped in { } specifies how many times the preceding item (here the range) must repeat. To explain further we will use an example:

Input String: “My hotel room is 1045A.”

Search Pattern: “[0-9]”
Returns:
1
0
4
5

Input String: “My hotel room is 1045A.”

Search pattern: “[0-9]{4}”

Returns:
1045

I hope that makes it a bit clearer.
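
The same comparison can be run in code. This is only a sketch: the module name QuantifierDemo is made up, and the {2} and {5} patterns are extra variations that are not part of the example above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuantifierDemo
    Sub Main()
        Dim strInput As String = "My hotel room is 1045A."

        'Without a quantifier each digit is its own match
        Console.WriteLine(Regex.Matches(strInput, "[0-9]").Count)       '4

        '{4} asks for four digits in a row, giving a single match
        Dim objMatches As MatchCollection = Regex.Matches(strInput, "[0-9]{4}")
        Console.WriteLine(objMatches.Count)                             '1
        Console.WriteLine(objMatches(0).Value)                          '1045

        '{2} splits the same digits into two matches
        For Each m As Match In Regex.Matches(strInput, "[0-9]{2}")
            Console.WriteLine(m.Value)                                  '10, then 45
        Next

        '{5} finds nothing, because there are only four digits in a row
        Console.WriteLine(Regex.Matches(strInput, "[0-9]{5}").Count)    '0
    End Sub
End Module
```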

Putting it all together

Now let’s go back to the sentence at the start of the article. To extract the phone numbers we will need to combine all of the preceding techniques.

All valid UK mobile phone numbers begin with ‘07’, so we need to make sure that every match begins with ‘07’. ‘07’ is literal content, so our search pattern so far is simply “07”.

Valid UK mobile phone numbers are 11 digits long (including the ‘07’), so every match needs 9 more digits after the ‘07’. This means our search pattern will be “07[0-9]{9}”.

Sure enough, this extracts both phone numbers from our sentence. Now we will discuss how to use Regex (regular expressions) via .NET.
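
One thing to be aware of: “07[0-9]{9}” will also match 11 digits that sit inside a longer run of digits. The sketch below shows this with a made-up input, and one possible way to tighten the pattern using the \b word-boundary anchor. The anchor is not covered in this article, so treat it as an optional extra rather than part of the method described here.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BoundaryDemo
    Sub Main()
        'Made-up input: a 12-digit order number followed by a mobile number
        Dim strInput As String = "Order 107589123654 was placed by 07891264853."

        'The article's pattern also matches the last 11 digits of the order number
        Console.WriteLine(Regex.Matches(strInput, "07[0-9]{9}").Count)   '2

        '\b requires a word boundary on each side, so digits inside a longer run are skipped
        Dim objMatches As MatchCollection = Regex.Matches(strInput, "\b07[0-9]{9}\b")
        Console.WriteLine(objMatches.Count)                              '1
        Console.WriteLine(objMatches(0).Value)                           '07891264853
    End Sub
End Module
```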

Accessing Regular Expressions in .NET

Regular expressions are used in .NET via the System.Text.RegularExpressions Namespace.

> Create a new console application and import the System.Text.RegularExpressions namespace by adding “Imports System.Text.RegularExpressions” to the top of the module.

Results from our search are returned in a ‘MatchCollection’. We can loop through this ‘MatchCollection’ to retrieve the individual matches. Let’s get our hands on some code now…

Source Code:

Imports System
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        'Make a new Regex provider
        Dim objRegex As Regex

        'This object is what our matches will be put into
        Dim objMatches As MatchCollection

        'This object is used to refer to an individual match
        Dim m As Match

        'This string will be our search pattern
        Dim strSearchPattern As String

        'This string will be our input string
        Dim strInput As String

        'Set the input string (note the space before the line-continuation underscore)
        strInput = "I told Bob that I would text him on his mobile phone. " & _
                   "His number is 07589123654 and my number is 07891264853."

        'Set the search pattern: '07' followed by exactly 9 more digits
        strSearchPattern = "07[0-9]{9}"

        'Tell Regex our search pattern
        objRegex = New Regex(strSearchPattern)

        'Fill the collection with every phone number found in the input
        objMatches = objRegex.Matches(strInput)

        'Loop through the phone numbers that were found
        For Each m In objMatches
            'Write the phone number onto the console (WriteLine adds the line break)
            Console.WriteLine(m.Value)
        Next

        'Wait for Enter so the console window stays open
        Console.ReadLine()
    End Sub

End Module
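
A ‘MatchCollection’ can also be queried directly instead of looped over. The following is a shorter sketch of the same search, assuming you do not need to reuse the Regex object: it calls the shared Regex.Matches method and reads the results through Count and an index. The module name CollectionDemo is made up for this example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CollectionDemo
    Sub Main()
        Dim strInput As String = "I told Bob that I would text him on his mobile phone. " & _
                                 "His number is 07589123654 and my number is 07891264853."

        'Shared (static) overload: no Regex object needs to be created first
        Dim objMatches As MatchCollection = Regex.Matches(strInput, "07[0-9]{9}")

        Console.WriteLine(objMatches.Count)        '2
        Console.WriteLine(objMatches(0).Value)     '07589123654
        Console.WriteLine(objMatches(1).Value)     '07891264853
    End Sub
End Module
```

Creating a Regex object, as in the listing above, is the better fit when the same pattern is applied to many input strings.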

That’s the basics of regular expressions, but there is a lot more to learn. I have only scratched the surface of the syntax. If you search Google for “Regular Expressions” you will find a lot more information than I have provided. I will also post a sample app that demonstrates regular expressions to help you some more.

Hope you have enjoyed it, Martin.

This Article was written on 3/19/2004 3:40:00 PM by Martin