POP3 Email Client with full MIME Support (.NET 2.0)
2008-01-22 11:02
Download source files & project - 25 Kb
Introduction
This is part 2 of my articles about receiving email with POP3 and processing MIME. My first article, POP3 Email Client (.NET 2.0), covered the reliable downloading of emails from POP3 servers, which left us with a pure ASCII representation of each email. This was the easier part.

In this article I provide the code to split the raw ASCII email into body, attachments, alternate views, etc. This was much harder to do: while POP3 is simple and specified straightforwardly in a single RFC, there are several MIME-related RFCs, which offer a multitude of ways to send even something as simple as an email's actual text. The MIME specification allows for great flexibility, but Microsoft, being Microsoft, of course supports only a subset in System.Net.Mail.MailMessage (for example, no MIME parts nested within MIME parts). The provided code supports both worlds completely and gives the programmer the flexibility to access information about the received email as needed.
If you are wondering why I wrote this article even though there are various articles on CodeProject about MIME support, here are some of the shortcomings I encountered in them:
some code is not managed
use of DLLs without .NET source code
too limited functionality
no integration with System.Net.Mail.MailMessage
no error reporting
no XML documentation, etc.
My code is based on the following work:
POP3 Email Client (.NET 2.0) by Peter Huber
QuotedPrintable Class by Bill Gearhart
Background
Structure of a simple email
A simple email in pure ASCII might look like this:

Date: Sat, 2 Sep 2006 17:25:15 +0200
From: Sender@NoSpam.com
To: Receiver@NoSpam.com
Subject: simple plain text mail

Just a plain text email
.
The first 4 lines are the header of the email, which is separated from the body by an empty line. When the email is downloaded with POP3, its end is marked by a line containing just a single "." (period). A real email has many more header lines, some defined by the RFCs and others not, like this one from GMail:
X-Gmail-Received: f105c784e77f8b689759558db72ccd07f60387ba
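The header/body split described above is easy to express in code. The following is a minimal Python sketch, assuming `raw` holds the text of a POP3 RETR response without its "+OK" status line, with CRLF line endings. It cuts the message at the terminating "." line, undoes POP3 byte-stuffing (RFC 1939: a line that starts with "." is sent with an extra leading "."), splits at the first empty line, and unfolds header fields that continue on lines starting with whitespace. The name split_pop3_message is hypothetical and not part of the article's code.

```python
def split_pop3_message(raw):
    """Split a raw POP3 RETR response into header fields and body lines."""
    lines = raw.split("\r\n")
    # Everything from the terminating "." line onward is not part of the email.
    if "." in lines:
        lines = lines[:lines.index(".")]
    # Undo POP3 byte-stuffing: a leading "." was sent twice by the server.
    lines = [line[1:] if line.startswith("..") else line for line in lines]
    # The header ends at the first empty line; the body follows it.
    end = lines.index("") if "" in lines else len(lines)
    header_lines, body_lines = lines[:end], lines[end + 1:]
    fields = []  # (field-name, field-body) pairs; names like "Received" may repeat
    for line in header_lines:
        if line[:1] in (" ", "\t") and fields:
            # Folded line: it continues the previous field body (RFC 2822 unfolding).
            name, value = fields[-1]
            fields[-1] = (name, value + line)
        else:
            # field-name ":" field-body; partition splits at the first colon only.
            name, _, value = line.partition(":")
            fields.append((name.strip(), value.strip()))
    return fields, body_lines
```

Keeping the fields as a list of pairs instead of a dictionary preserves repeated header lines and their order.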
Introduction to MIME
In the beginning there were just plain ASCII emails as defined in RFC 822 (later revised as RFC 2822). Plain ASCII soon was not sufficient, though, and the Multipurpose Internet Mail Extensions (MIME) specification was created to support non-US-ASCII text, multipart message bodies, rich text (HTML), images, sounds and attachments. The specification tried to offer great flexibility and to cater to all kinds of possibilities. The result was numerous RFCs (2045, 2046, 2047, 2049, 2231, 2387, 4288, 4289, ...). As often happens in big groups, the whole thing became rather complicated and, even worse, left it to the implementer to decide how precisely the body text etc. is represented.

In order to help you extract information from MIME-based emails, I'm going to explain the basic MIME principles. First let's have a look at a complete MIME email. It might be a bit confusing, but it gives a good overview of the various MIME elements, which I will explain one by one. This email has one email header, followed by the email body text and a .GIF picture. Notice the "--0-494165446-1157210079=:74253" lines, which separate the various parts of the email, called MIME entities.
Date: Sat, 2 Sep 2006 17:25:15 +0200
From: Sender@NoSpam.com
To: Receiver@NoSpam.com
Subject: simple gmail mail
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="0-494165446-1157210079=:74253"
Content-Transfer-Encoding: 8bit

--0-494165446-1157210079=:74253
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Content-Disposition: inline

This is the email body
This email has a smallPic.gif attachment
--0-494165446-1157210079=:74253
Content-Type: image/gif; name="SmallPic.GIF"
Content-Transfer-Encoding: base64
Content-Description: 437081412-SmallPic.GIF
Content-Disposition: inline; filename="SmallPic.GIF"

R0lGODlhQQBBAPcAAAAAAIAAAACAAICAAAAAgIAAgACAgICAgMDAwP8AAAD/
NZWZfpnCck/OeTUXvUdXxdi9/SbDPFS4t+/fwIMLH068uPHjyJMrX868ufPn
0KNLn069uvWVAQEAOw==
--0-494165446-1157210079=:74253--
.
Structure of an email header field
An email header field as defined in RFC 2822 has the following structure:

field-name ":" [ field-body ] CRLF

Example:

MIME-Version: 1.0

"MIME-Version" is the field-name, "1.0" is the field-body. The MIME-Version header field is mandatory for every MIME email. All other MIME header fields start with "Content-".
Content-Type
The most powerful MIME header field is Content-Type, which is defined in RFC 2045; the media types themselves are described in RFC 2046. It can look like this:
Content-Type: text/plain;
Content-Type: text/plain; charset=ISO-8859-1
Content-Type: text/plain; charset=us-ascii
Content-Type: text/plain; charset=utf-8
Content-Type: text/html;
Content-Type: text/html; charset=ISO-8859-1
Content-Type: text/css
Content-Type: image/gif; name=image004.gif
Content-Type: image/jpeg; name="image005.jpg"
Content-Type: message/delivery-status
Content-Type: message/rfc822
Content-Type: audio/x-mpeg
Content-Type: video/mpeg-2
Content-Type: application/msword
Content-Type: application/mspowerpoint
Content-Type: application/zip
Content-Type: multipart/mixed; boundary="----=_Part_3431_12384933.1139387792352"
Content-Type: multipart/alternative; boundary="----=_Part_4088_29304219.1115463798628"
Content-Type: multipart/related; boundary="----=_Part_2067_9241611.1139322711488"
Content-Type: multipart/digest; boundary="----=Next message 15543233913938263541"
Content-Type: multipart/report; report-type=delivery-status; boundary="k04G6HJ9025016.1136391237/carbon.singnet.com.sg"
Content-Type: multipart/parallel
The Content-Type field is used to specify the nature of the data in the body of a MIME entity, by indicating media type and subtype identifiers, and by providing auxiliary information that may be required for certain media types. Some of the media types are:
text
image
message
audio
application
multipart
Each media type defines its own set of subtypes, which may be followed by a set of parameters, each specified as an attribute=value pair. For example:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
The media type is "text", the subtype is "plain", the attribute is "charset" and the attribute value is "ISO-8859-1". There can be more attribute=value pairs, like "format=flowed" in this example.
Content-Type: multipart
The media type "multipart" provides the flexibility to split an email into several parts, like plain text, HTML text and attached files. There are several variants of multipart (subtypes), but all of them have the attribute "boundary". Its value is a string that must not appear anywhere else in the email and is used to mark the boundary delimiter lines between the various parts. Let's look at the previous example again, this time showing only the Content-Type lines:
Header lines
Content-Type: multipart/mixed; boundary="0-494165446-1157210079=:74253"
Header lines

--0-494165446-1157210079=:74253
Content-Type: text/plain; charset=iso-8859-1
Other MIME part header lines

The plain text email body
--0-494165446-1157210079=:74253
Content-Type: image/gif; name="SmallPic.GIF"
Other MIME part header lines

The attachment coded in Base64
--0-494165446-1157210079=:74253--
.
The first 3 lines are part of the email header. The end of the header is marked by an empty line. All other lines are part of the email body, which ends with the line containing only a "." (period). The boundary delimiter lines break the body itself into the email text and the file attachment. Such a line always starts with "--" followed by the boundary string. The last boundary delimiter line additionally ends with a trailing "--".
Each MIME entity has an entity-header and an entity-body, separated by an empty line. Since emails and MIME entities use the same structure and the same kind of header lines, a whole email can become a MIME entity, which is useful for mail systems (Content-Type: message). But of course having an email in another email in another email leads to many complications, so it is no wonder that most mail programs use a different solution for forwarding an email: they just merge it into the text body of the new email. This has the advantage that even mail clients that do not support MIME can handle forwarding. However, even though the MIME specification is recursive, Microsoft's System.Net.Mail.MailMessage is not! More about this later.
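Finding the boundary and splitting the body at its delimiter lines is mostly string handling. The following Python sketch shows one way to do it; parse_content_type and split_multipart are hypothetical helpers, not part of the article's code. parse_content_type lower-cases the type, subtype and attribute names, which RFC 2045 defines as case-insensitive, but keeps parameter values such as the boundary unchanged; it does not handle RFC 2231 parameter continuations or backslash escapes. split_multipart expects the body as a list of lines and returns one list of lines per MIME entity, dropping the preamble before the first delimiter and the epilogue after the closing one.

```python
def parse_content_type(value):
    """Split a Content-Type field body into (type, subtype, {attribute: value})."""
    pieces, current, in_quotes = [], "", False
    for ch in value:  # split at ';' characters that are not inside a quoted string
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            pieces.append(current)
            current = ""
        else:
            current += ch
    pieces.append(current)
    media_type, _, subtype = pieces[0].strip().partition("/")
    params = {}
    for piece in pieces[1:]:
        attribute, sep, val = piece.strip().partition("=")  # split at the first '='
        if not sep:
            continue  # e.g. the empty piece after a trailing ';'
        val = val.strip()
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]  # remove the quotes, keep the value's case
        params[attribute.strip().lower()] = val
    return media_type.strip().lower(), subtype.strip().lower(), params


def split_multipart(body_lines, boundary):
    """Split the body of a multipart entity into the lines of its child entities."""
    delimiter = "--" + boundary          # boundary delimiter line
    close_delimiter = delimiter + "--"   # marks the end of the last part
    parts, current = [], None            # current is None while in the preamble
    for line in body_lines:
        candidate = line.rstrip(" \t")   # trailing whitespace after a delimiter is allowed
        if candidate == close_delimiter:
            if current is not None:
                parts.append(current)
            current = None
            break                        # anything after this is the epilogue
        if candidate == delimiter:
            if current is not None:
                parts.append(current)
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:              # tolerate a missing close delimiter
        parts.append(current)
    return parts
```

Each returned part again starts with an entity-header, followed by an empty line and the entity-body, so the same header/body split can be applied to it, recursively if a part is itself a multipart.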
Content-Type: multipart/mixed
Often the topmost multipart subtype is "mixed". It indicates that the email consists of several MIME entities, without specifying anything more about the kind of entities. "multipart/mixed" is also used as the default if the actual subtype is not recognised by the email client.
Content-Type: multipart/alternative
The subtype "alternative" is used if the same email is sent in plain text and HTML. Both parts have the same content, but in alternative formats. The email client is supposed to display the last alternative part it understands. If an email consists of a plain text entity followed by an HTML entity, the email client is supposed to display the HTML text, even though it also knows how to display plain text, because the HTML version came last. An email with plain text and HTML can look like this:
Some header lines
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----_=_NextPart_001_01C6CEA2.EF9BECF8"

------_=_NextPart_001_01C6CEA2.EF9BECF8
Content-Type: multipart/alternative; boundary="----_=_NextPart_002_01C6CEA2.EF9BECF8"

------_=_NextPart_002_01C6CEA2.EF9BECF8
Content-Type: text/plain; charset="iso-8859-1"

HTML sample email with bold text and attachment.
------_=_NextPart_002_01C6CEA2.EF9BECF8
Content-Type: text/html; charset="iso-8859-1"

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<STYLE> DIV { FONT-SIZE: 10pt; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif } </STYLE>
</HEAD>
<BODY>
<DIV> HTML sample email with <STRONG>bold</STRONG> text and attachment. </DIV>
</BODY>
</HTML>
------_=_NextPart_002_01C6CEA2.EF9BECF8--
------_=_NextPart_001_01C6CEA2.EF9BECF8
Content-Type: image/gif; name="SmallPic.GIF"

R0lGODlhQQBBAPcAAAAAAIAAAACAAICAAAAAgIAAgACAgICAgMDAwP8AAAD/
NZWZfpnCck/OeTUXvUdXxdi9/SbDPFS4t+/fwIMLH068uPHjyJMrX868ufPn
0KNLn069uvWVAQEAOw==
------_=_NextPart_001_01C6CEA2.EF9BECF8--
.
The structure of this email is:
multipart/mixed
| multipart/alternative
| | text/plain; format=flowed; charset=ISO-8859-1
| | text/html; charset=ISO-8859-1
| image/gif; name=SmallPic.GIF
Notice that the picture is part of multipart/mixed, not multipart/alternative.
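Python's standard email package parses nested multiparts like this one recursively, which makes it a convenient way to check such a structure outside .NET. The sketch below is only an illustration, not the article's RxMailMessage.MailStructure(); the helper names structure and preferred_alternative are hypothetical. structure returns one line per MIME entity in the same "| " indentation style, and preferred_alternative applies the rule from above: of the parts of a multipart/alternative, the client should show the last one whose content type it supports.

```python
import email


def structure(msg, depth=0):
    """Return one line per MIME entity, indented with '| ' per nesting level."""
    lines = ["| " * depth + msg.get_content_type()]
    if msg.is_multipart():
        for part in msg.get_payload():  # child entities of a multipart
            lines.extend(structure(part, depth + 1))
    return lines


def preferred_alternative(alternative, supported_types):
    """Return the last part of a multipart/alternative whose type the client supports."""
    chosen = None
    for part in alternative.get_payload():
        if part.get_content_type() in supported_types:
            chosen = part  # later parts win: they are the preferred formats
    return chosen
```

Given the raw text of the example email, structure(email.message_from_string(text)) lists multipart/mixed, then multipart/alternative with text/plain and text/html one level deeper, then image/gif directly under multipart/mixed. A client that understands both text types would pick the text/html part.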
Content-Type: multipart/related
Multipart/related can be used to send HTML text in the same email together with the graphics or other related material it uses. It is beyond the scope of this article to explain the details of the other multipart subtypes.
Content-Transfer-Encoding
The email format (RFC 2822) defines the body of an email as 7bit US-ASCII text. Since the text displayed to the user can be any Unicode text and file attachments are usually arrays of bytes, the email sender must encode this content as ASCII, and we, the receivers of the email, need to decode it. The Content-Transfer-Encoding header field tells us which encoding was used. If its value is "7bit", no encoding was used. "8bit" and "binary" also mean that the content was not encoded, but they are not supported by the .NET Framework. I treat "8bit" like "7bit", i.e. I take the content as it is, whereas "binary" is illegal in POP3, because some character sequences like CRLF "." CRLF have a special meaning in POP3 but might occur in random binary data.

Content-Transfer-Encoding: quoted-printable
If a MIME entity consists mostly of US-ASCII characters, it is enough to encode just some special characters and all bytes not covered by the US-ASCII character set. "quoted-printable" does this by sending a "=" followed by the hex value of the byte as two ASCII characters. A carriage return (hex 0D), for example, becomes "=0D". There are a number of rules dealing with special circumstances. I couldn't find a decoder for quoted-printable in .NET, so I copied the source code of the QuotedPrintable Class by Bill Gearhart from ASP Emporium.
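The core decoding rules are short enough to show directly. This is a minimal Python sketch (decode_quoted_printable is a hypothetical name; Python itself ships a complete decoder in its quopri module) covering the most important rules from RFC 2045, section 6.7: "=" followed by two hex digits stands for one byte, and "=" at the very end of a line is a soft line break that joins the line with the next one. Trailing whitespace on a line is dropped, hard line breaks are kept as CRLF, and the result is returned as bytes, because the charset (e.g. iso-8859-1) is only known from the Content-Type header.

```python
import re

# "=" followed by exactly two hex digits encodes one byte.
_HEX_ESCAPE = re.compile(r"=([0-9A-Fa-f]{2})")


def decode_quoted_printable(text):
    """Decode a quoted-printable body (lines separated by CRLF) into bytes."""
    out = bytearray()
    lines = text.split("\r\n")
    for index, line in enumerate(lines):
        line = line.rstrip(" \t")       # trailing whitespace is not part of the data
        soft_break = line.endswith("=")
        if soft_break:
            line = line[:-1]            # soft line break: continue on the next line
        pos = 0
        for match in _HEX_ESCAPE.finditer(line):
            # Copy literal characters, then the byte encoded by "=XX".
            out += line[pos:match.start()].encode("latin-1", "replace")
            out.append(int(match.group(1), 16))
            pos = match.end()
        out += line[pos:].encode("latin-1", "replace")
        if not soft_break and index != len(lines) - 1:
            out += b"\r\n"              # hard line break is kept
    return bytes(out)
```

For example, "Gr=FC=DFe" decodes to the ISO-8859-1 bytes of "Grüße", and "soft=" CRLF "break" decodes to "softbreak".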
Content-Transfer-Encoding: base64
Base64 uses a limited set of 64 characters ("A"-"Z", "a"-"z", "0"-"9", "+", "/"), each of which expresses a 6-bit value. Any 3 bytes (24 bits) can therefore be expressed with 4 encoding characters; "=" is used as padding at the end of the data. As an example, let's take the first 4 ASCII characters, "R0lG", of the graphic file in our example email:
Character:     R        0        l        G
6-bit value:   010001   110100   100101   000110
Resulting 3 bytes: 01000111 01001001 01000110 (hex 47 49 46, the ASCII characters "GIF", which start every GIF file)
Details can be found in RFC 1421, section 4.3.2.4 ("Step 4: Printable Encoding"), and in RFC 2045, section 6.8.
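The bit arithmetic from the table can be checked with a few lines of Python. decode_group is a hypothetical helper that handles a single 4-character group without "=" padding; for real data, the standard base64.b64decode function does the whole job, and the sketch compares against it.

```python
import base64

# The 64 base64 characters, in the order of their 6-bit values 0..63.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_group(group):
    """Decode one 4-character base64 group (without '=' padding) into 3 bytes."""
    bits = 0
    for ch in group:
        # Multiplying by 64 shifts the collected bits left by 6; then add the new value.
        bits = bits * 64 + ALPHABET.index(ch)
    # The 24 collected bits form 3 bytes, most significant byte first.
    return bits.to_bytes(3, "big")


for ch in "R0lG":
    print(ch, format(ALPHABET.index(ch), "06b"))   # R 010001 / 0 110100 / l 100101 / G 000110
print(" ".join(format(b, "08b") for b in decode_group("R0lG")))  # 01000111 01001001 01000110
print(decode_group("R0lG"), base64.b64decode("R0lG"))            # b'GIF' b'GIF'
```

Decoding the first 8 characters of the example attachment, "R0lGODlh", yields "GIF89a", the complete signature of a GIF89a file.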
Using the code
The best way to get an understanding of a library is to use it. The Main function in the downloadable code does just that. It connects to a POP3 server (don't forget to provide the proper server name, user name and password) and downloads at most 5 emails. The code will not delete the emails from the server, but the server might delete them anyway, depending on its settings. The structure of the downloaded emails is displayed on the console. "Program.cs" also contains the method SendTestmail() to generate some sample emails.
Emails are received by Pop3MimeClient, which is derived from Pop3MailClient. Pop3MailClient is described in POP3 Email Client (.NET 2.0) by Peter Huber and offers all the functionality needed to interact with the POP3 server. Pop3MimeClient adds the method GetEmail, which fetches one particular email from the POP3 server and returns it decoded as an RxMailMessage.
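To experiment with the same workflow without the downloadable .NET project, here is a rough Python analog of Main. It is not the article's code: it uses the standard poplib and email modules instead of Pop3MimeClient, the helper names describe and fetch_and_describe are hypothetical, and it assumes a server that accepts POP3 over SSL on the default port 995 with USER/PASS login. Like Main, it downloads at most 5 emails and never sends DELE, so the client itself deletes nothing.

```python
import email
import poplib


def describe(raw):
    """Summarise one downloaded email: its Subject and the type of every MIME entity."""
    msg = email.message_from_bytes(raw)
    return msg.get("Subject"), [part.get_content_type() for part in msg.walk()]


def fetch_and_describe(host, user, password, limit=5):
    """Download at most `limit` emails over POP3 with SSL, without deleting them."""
    server = poplib.POP3_SSL(host)  # port 995 by default
    try:
        server.user(user)
        server.pass_(password)
        count, _mailbox_size = server.stat()
        summaries = []
        for number in range(1, min(count, limit) + 1):
            # RETR returns the lines without CRLF, the final "." and byte-stuffing.
            _response, lines, _octets = server.retr(number)
            summaries.append(describe(b"\r\n".join(lines)))
        return summaries
    finally:
        server.quit()
```

Calling fetch_and_describe("pop.example.com", "user", "password") would return one (subject, content types) pair per downloaded email; the host name and credentials here are placeholders.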
Mapping MIME to System.Net.Mail.MailMessage
The System.Net.Mail.MailMessage class is used by System.Net.Mail.SmtpClient for sending emails with SMTP. MailMessage includes only the information needed to send an email, but a received email carries some additional information. Therefore, a new class RxMailMessage derives from MailMessage and adds properties like DeliveryDate or DeliveredTo.
The SmtpClient converts a MailMessage into a MIME-conformant email, but MailMessage provides hardly any access to MIME-related functionality. When receiving an email, we would like to store the complete information. Pop3MimeClient receives the email MIME entity by MIME entity and stores the entities as a tree in the new Entities collection property of RxMailMessage. Where possible, the information is also copied to the properties inherited from MailMessage. This gives the user the freedom to choose whether to process the complete email in MIME form or just the simpler, but possibly incomplete, Body, AlternateViews or Attachments as defined by MailMessage. The method decodeEntity can be used as an example of how to loop through all MIME entities of an email.
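The mapping rules mentioned in this article (only children of multipart/alternative become alternate views, and entities with "Content-Disposition: attachment" become attachments) can be illustrated with a small tree walk. This is a hypothetical Python sketch, not the article's decodeEntity: the Entity and Mapped classes merely stand in for the Entities tree and the MailMessage properties, and the remaining choices (the first other text entity becomes the body, everything else an attachment) are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Minimal stand-in for one node of a MIME entity tree (hypothetical)."""
    content_type: str                       # e.g. "text/plain"
    disposition: Optional[str] = None       # "attachment", "inline" or None
    children: List["Entity"] = field(default_factory=list)


@dataclass
class Mapped:
    """Stand-in for the MailMessage properties Body, AlternateViews and Attachments."""
    body: Optional[Entity] = None
    alternate_views: List[Entity] = field(default_factory=list)
    attachments: List[Entity] = field(default_factory=list)


def map_entities(entity, result=None, parent_type=None):
    """Walk the entity tree depth-first and sort every leaf entity into result."""
    if result is None:
        result = Mapped()
    if entity.content_type.startswith("multipart/"):
        for child in entity.children:
            map_entities(child, result, entity.content_type)
    elif entity.disposition == "attachment":
        result.attachments.append(entity)
    elif parent_type == "multipart/alternative":
        result.alternate_views.append(entity)   # only children of multipart/alternative
    elif entity.content_type.startswith("text/") and result.body is None:
        result.body = entity                    # assumption: first remaining text entity
    else:
        result.attachments.append(entity)       # assumption: everything else
    return result
```

Embedded emails (message/rfc822) end up as attachments in this sketch, which fits the fact that MailMessage cannot represent nested MIME structures. Whether the plain-text alternative should also be copied into the body is a policy decision the sketch leaves out.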
History
11.10.2006 Improvements: Constructor, Handling ContentDisposition == null
Proper handling of useSSL in the Pop3MimeClient constructor
Prevent exception when ContentDisposition is null
8.10.2006 Improvements: Attachment Handling
Detecting the Content-Disposition header field and creating an attachment if it looks like "Content-Disposition: attachment"
Only MIME entities with a multipart/alternative parent become alternate views
Added end markers for multiparts in RxMailMessage.MailStructure()
17.9.2006 Original Post
I was not too sure how to map the various multipart entities to the email body, etc. I analysed probably thousands of emails I had received and populated the RxMailMessage properties as appropriate, but it is quite likely that you will receive a differently formatted email. Please provide some feedback here if you do, or if you find any bugs.