Working with regular expressions in .NET
2004-09-10 14:32
What are regular expressions?
If you have ever used wildcards, e.g. typing *.txt when trying to find a file on your computer, then you have unwittingly used a close relative of regular expressions. I like to think that regular expressions simply make it easier to manipulate strings. Regular expressions consist of a lot more than just wildcards though; we will look at more complex things in a minute.
Why would I want to use them?
Regular expressions are extremely useful when you want to extract useful information from a load of junk. Say someone wrote a sentence similar to this:
I told Bob that I would text him on his mobile phone. His number is 07589123654 and my number is 07891264853.
And you need to extract all the phone numbers from a sentence like that. If you haven’t met regular expressions before, you might assume it’s either impossible or very complicated, involving lots of Mid() and Int32.Parse calls.
So how would I go about using them?
First you will need to know the basics. There are two things that you will need to use Regular Expressions in .NET and those are:
An input string (in our case the sentence above about Bob)
A search pattern (the thing that tells the regular expressions class what to look for)
The only tricky bit is the search pattern, which can sometimes become quite complex (depending on what you’re trying to extract). To build up a search pattern we will have to do some research first.
Finding literal content
This is the easiest thing you will learn, simply because you already know it! To find literal content you just write the literal content into the search pattern and that’s it.
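As a small illustration of literal matching, here is a minimal VB.NET sketch. The module name LiteralDemo and the helper ContainsLiteral are made up for this example; Regex.IsMatch and Regex.Escape are part of System.Text.RegularExpressions. One caveat worth knowing: characters such as . [ and { have a special meaning inside a pattern, so literal text that contains them is safer passed through Regex.Escape first.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LiteralDemo
    ' Hypothetical helper: True when the literal text occurs anywhere in source.
    ' Regex.Escape neutralises characters that would otherwise be pattern syntax.
    Public Function ContainsLiteral(ByVal source As String, ByVal literal As String) As Boolean
        Return Regex.IsMatch(source, Regex.Escape(literal))
    End Function

    Sub Main()
        Dim sentence As String = "I told Bob that I would text him on his mobile phone."

        Console.WriteLine(ContainsLiteral(sentence, "Bob"))    ' True
        Console.WriteLine(ContainsLiteral(sentence, "bob"))    ' False: matching is case sensitive by default
        Console.WriteLine(ContainsLiteral(sentence, "phone.")) ' True: the dot is matched literally
    End Sub
End Module
```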
Finding numbers
Finding numbers is a little trickier but it’s easy once you know how. Say our input string was “My phone number begins with a 0 and ends in a 8”. We would use the following search pattern:
“[0-9]”
This would return the following:
0
8
Characters wrapped in [ ] form a character class, which matches any single character it lists. A hyphen between two characters indicates a range: A to Z would be [A-Z], C to F would be [C-F], and in our case 0 to 9 is [0-9].
Important: Ranges are case sensitive by default, i.e. [a-z] would only return lower case letters and [A-Z] would only return upper case letters.
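To see ranges and their case sensitivity in action, here is a short VB.NET sketch. RangeDemo and JoinMatches are names invented for this example; the second input string, “Room 1045A”, is also just an assumption chosen to show upper and lower case side by side.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RangeDemo
    ' Hypothetical helper: returns every match of pattern in source, joined with commas.
    Public Function JoinMatches(ByVal source As String, ByVal pattern As String) As String
        Dim found As New List(Of String)
        For Each m As Match In Regex.Matches(source, pattern)
            found.Add(m.Value)
        Next
        Return String.Join(",", found.ToArray())
    End Function

    Sub Main()
        Dim sentence As String = "My phone number begins with a 0 and ends in a 8"

        Console.WriteLine(JoinMatches(sentence, "[0-9]"))     ' 0,8
        Console.WriteLine(JoinMatches("Room 1045A", "[A-Z]")) ' R,A
        Console.WriteLine(JoinMatches("Room 1045A", "[a-z]")) ' o,o,m
    End Sub
End Module
```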
This approach works perfectly if it’s just a single character, but to match several characters at once we need one more technique. Say our input string was “My hotel room is 1045A.” We couldn’t use a range on its own because it would return each digit as a separate match. So we need to use something similar to this for our search pattern:
“[0-9]{4}”
An integer wrapped in { } specifies how many times the preceding item (here the range) must repeat in a row. To explain more we will use an example:
Input String: “My hotel room is 1045A.”
Search Pattern: “[0-9]”
Returns:
1
0
4
5
Input String: “My hotel room is 1045A.”
Search pattern: “[0-9]{4}”
Returns:
1045
I hope that makes it a bit clearer.
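The same comparison can be checked from code. This is a minimal VB.NET sketch; QuantifierDemo and FirstMatch are hypothetical names used only here, while Regex.Match and Regex.Matches come from System.Text.RegularExpressions.

```vb
Imports System
Imports System.Text.RegularExpressions

Module QuantifierDemo
    ' Hypothetical helper: returns the first match of pattern, or an empty string if there is none.
    Public Function FirstMatch(ByVal source As String, ByVal pattern As String) As String
        Dim m As Match = Regex.Match(source, pattern)
        If m.Success Then Return m.Value
        Return String.Empty
    End Function

    Sub Main()
        Dim sentence As String = "My hotel room is 1045A."

        Console.WriteLine(Regex.Matches(sentence, "[0-9]").Count)    ' 4 separate one-digit matches
        Console.WriteLine(Regex.Matches(sentence, "[0-9]{4}").Count) ' 1 four-digit match
        Console.WriteLine(FirstMatch(sentence, "[0-9]{4}"))          ' 1045
    End Sub
End Module
```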
Putting it all together
Now, going back to the sentence at the start of the article, we will need to combine all of the preceding techniques to extract the phone numbers.
UK mobile phone numbers written in national format begin with ‘07’, so we need to make sure that all the matches returned begin with ‘07’. ‘07’ is literal content, so our search pattern so far is simply “07”.
Valid UK mobile phone numbers are 11 digits long (including the ‘07’). So now we need to make sure that all the matches have 9 digits following the ‘07’. This means our search pattern will be “07[0-9]{9}”.
Sure enough, this extracts both phone numbers from our sentence. Bear in mind that it matches any run of 11 digits beginning with ‘07’, even one buried inside a longer number. Now we will discuss how to use Regex (regular expressions) via .NET.
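It is worth testing where this pattern stops being reliable. The VB.NET sketch below is an illustration under assumptions, not part of the original article: the “order reference” string is invented, and StrictPattern is one possible tightening that uses lookbehind and lookahead so that a match may not be preceded or followed by another digit.

```vb
Imports System
Imports System.Text.RegularExpressions

Module PhonePatternDemo
    ' The pattern built up in the article.
    Public Const LoosePattern As String = "07[0-9]{9}"
    ' Assumed stricter variant: no digit directly before or after the 11 digits.
    Public Const StrictPattern As String = "(?<![0-9])07[0-9]{9}(?![0-9])"

    ' Hypothetical helper: number of matches of pattern in source.
    Public Function CountMatches(ByVal source As String, ByVal pattern As String) As Integer
        Return Regex.Matches(source, pattern).Count
    End Function

    Sub Main()
        Dim sentence As String = "His number is 07589123654 and my number is 07891264853."
        Dim orderRef As String = "Order reference 9907589123654001"

        Console.WriteLine(CountMatches(sentence, LoosePattern))  ' 2
        Console.WriteLine(CountMatches(sentence, StrictPattern)) ' 2
        Console.WriteLine(CountMatches(orderRef, LoosePattern))  ' 1: matches inside the longer number
        Console.WriteLine(CountMatches(orderRef, StrictPattern)) ' 0
    End Sub
End Module
```

Whether the stricter form is worth the extra syntax depends on how messy the input text is likely to be.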
Accessing Regular Expressions in .NET
Regular expressions are used in .NET via the System.Text.RegularExpressions namespace.
Create a new console application and import the System.Text.RegularExpressions namespace by adding “Imports System.Text.RegularExpressions” to the top of the module.
Results from our search are returned in a ‘MatchCollection’. We can loop through this ‘MatchCollection’ to retrieve the individual matches. Let’s get our hands on some code now…
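A minimal sketch of such a console application might look like the following. The module name Module1 and the helper ExtractMobileNumbers are assumptions made for this example; Regex.Matches, MatchCollection and Match.Value are the actual .NET members being demonstrated.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Module1
    ' Hypothetical helper: returns every substring of source that matches
    ' the article's pattern for UK mobile numbers.
    Public Function ExtractMobileNumbers(ByVal source As String) As String()
        Dim results As MatchCollection = Regex.Matches(source, "07[0-9]{9}")

        ' Copy each Match's text into a plain string array.
        Dim numbers(results.Count - 1) As String
        For i As Integer = 0 To results.Count - 1
            numbers(i) = results(i).Value
        Next
        Return numbers
    End Function

    Sub Main()
        Dim sentence As String = "I told Bob that I would text him on his mobile phone. " & _
            "His number is 07589123654 and my number is 07891264853."

        ' Loop through the MatchCollection directly and print each match.
        Dim results As MatchCollection = Regex.Matches(sentence, "07[0-9]{9}")
        For Each m As Match In results
            Console.WriteLine(m.Value)
        Next
        ' Prints 07589123654 and then 07891264853, one per line.
    End Sub
End Module
```

Each Match also exposes an Index property giving the position of the match in the input string, which helps if you need to know where a number was found.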
Those are the basics of regular expressions, but there is a lot more to learn. I have only scratched the surface of the syntax. If you search Google for “Regular Expressions” you will find a lot more info than I have provided. I will also post a sample app that demonstrates regular expressions to help you some more.
Hope you have enjoyed it, Martin.
This Article was written on 3/19/2004 3:40:00 PM by Martin