Parsing Email Addresses with Regular Expressions
A lenient and strict method along with examples
By Steve on Tuesday, January 09, 2007
Updated Friday, April 22, 2016
Summary
Email validation is a common task in an ASP.NET page where users need to enter their email addresses. Most of the time a@b.c is an accepted email address, but you might like to do better than that.
The RegularExpressionValidator in .NET 1.1 gives a lenient Regex pattern for parsing an email address. If you don't need the strict pattern, use the lenient one; it will stand the test of time better.
Here are the regular expression patterns:
Email Regex from the .NET 1.1 Regular Expression Validator
string patternLenient = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

string patternStrict = @"^(([^<>()[\]\\.,;:\s@\""]+"
    + @"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@"
    + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
    + @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
    + @"[a-zA-Z]{2,}))$";
Use the following method to test the regular expressions. Copy the method into the code-behind of an ASPX page with a Label control on it (lblOutput). Don't forget to add the "using" directive to your file: "using System.Text.RegularExpressions". The method also uses ArrayList, which needs "using System.Collections" if your file does not already have it.
Test Email Regular Expressions
public void TestEmailRegex()
{
    // Lenient pattern from the .NET 1.1 RegularExpressionValidator.
    string patternLenient = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
    Regex reLenient = new Regex(patternLenient);

    // Strict pattern: anchored with ^ and $, so the whole input must match.
    string patternStrict = @"^(([^<>()[\]\\.,;:\s@\""]+"
        + @"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@"
        + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
        + @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
        + @"[a-zA-Z]{2,}))$";
    Regex reStrict = new Regex(patternStrict);

    // Sample addresses, from clearly invalid to borderline cases.
    ArrayList samples = new ArrayList();
    samples.Add("joe");
    samples.Add("joe@home");
    samples.Add("a@b.c");
    samples.Add("joe@home.com");
    samples.Add("joe.bob@home.com");
    samples.Add("joe-bob[at]home.com");
    samples.Add("joe@his.home.com");
    samples.Add("joe@his.home.place");
    samples.Add("joe@home.org");
    samples.Add("joe@joebob.name");
    samples.Add("joe.@bob.com");
    samples.Add(".joe@bob.com");
    samples.Add("joe<>bob@bob.come");
    samples.Add("joe&bob@bob.com");
    samples.Add("~joe@bob.com");
    samples.Add("joe$@bob.com");
    samples.Add("joe+bob@bob.com");
    samples.Add("o'reilly@there.com");

    // Build an HTML table with two rows per sample: one per pattern.
    string output = "<table border=1>";
    output += "<tr><td><b>Email</b></td><td><b>Pattern</b>"
        + "</td><td><b>Valid Email?</b></td></tr>";
    bool toggle = true;
    foreach (string sample in samples)
    {
        // Alternate the background color from one sample to the next.
        string bgcol = "white";
        if (toggle)
            bgcol = "gainsboro";
        toggle = !toggle;

        // Encode the sample so characters such as < and & display literally.
        string cell = Server.HtmlEncode(sample);

        bool isLenientMatch = reLenient.IsMatch(sample);
        if (isLenientMatch)
            output += "<tr bgcolor=" + bgcol + "><td>"
                + cell + "</td><td>Lenient</td><td>Is Valid</td></tr>";
        else
            output += "<tr bgcolor=" + bgcol + "><td>"
                + cell + "</td><td>Lenient</td><td>Is NOT Valid</td></tr>";

        bool isStrictMatch = reStrict.IsMatch(sample);
        if (isStrictMatch)
            output += "<tr bgcolor=" + bgcol + "><td>"
                + cell + "</td><td>Strict</td><td>Is Valid</td></tr>";
        else
            output += "<tr bgcolor=" + bgcol + "><td>"
                + cell + "</td><td>Strict</td><td>Is NOT Valid</td></tr>";
    }
    output += "</table>";
    lblOutput.Text = output;
}
Below is the output of the test method. Most of the time the lenient and strict patterns agree, but in some cases they differ: "a@b.c", for example, passes the lenient test and fails the strict test. Determining which characters can be used in an email address is almost more art than science. Most ASCII characters are allowed, with exceptions such as space, <, >, [, ], " and a few others. In practice, many mail servers and email applications add restrictions of their own.
We know that the lenient pattern will often accept addresses that are NOT valid. However, I think it may also reject some that ARE valid, for example joe$@bob.com.
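One reason the lenient pattern accepts samples such as .joe@bob.com and joe<>bob@bob.come in the test output is that the pattern has no anchors, and Regex.IsMatch succeeds as soon as any substring matches (here, joe@bob.com and bob@bob.come). As far as I know, the RegularExpressionValidator itself only passes when the match spans the entire input, so the validator is less forgiving than the test method suggests. The sketch below compares the two behaviors. The class and method names are made up for illustration, and wrapping the pattern in ^(?:...)$ is my assumption about how to approximate the validator, not something taken from it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LenientEmailCheck
{
    // The lenient pattern from the article, unchanged.
    private const string PatternLenient =
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // Unanchored: IsMatch is true if ANY substring of the input matches.
    private static readonly Regex Unanchored = new Regex(PatternLenient);

    // Anchored (illustrative assumption): the whole input must match.
    private static readonly Regex Anchored =
        new Regex("^(?:" + PatternLenient + ")$");

    public static bool MatchesSubstring(string input)
    {
        return Unanchored.IsMatch(input);
    }

    public static bool MatchesWhole(string input)
    {
        return Anchored.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples = new string[]
        {
            ".joe@bob.com",
            "joe<>bob@bob.come",
            "o'reilly@there.com",
            "joe@home.com"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0}: substring={1}, whole={2}",
                sample, MatchesSubstring(sample), MatchesWhole(sample));
        }
    }
}
```

With the anchored version, the first three samples are rejected and only joe@home.com passes both checks. The rejection of joe$@bob.com is unaffected, because no substring ending just before the @ satisfies the pattern either way.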
In fact, an @ symbol is not even required for a serviceable email address if you're sticking to your local intranet.
So, really, when you use a regular expression to validate an email address, you are trying to make sure you don't get flaky, bizarre addresses which, while technically allowed, may come from malicious sources. After all, if you're a legitimate user, you're going to make sure your email address is standard and compatible with most systems.
I recently had trouble in a system with a customer who had a single quote in their email address, something like o'reilly@there.com. It's technically correct, but many systems won't allow it.
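If some system downstream of yours cannot cope with a character such as the single quote, one option is to layer your own restriction on top of the strict pattern rather than edit the pattern itself. The following is only a rough sketch: EmailPolicy, IsAcceptable and the Unsupported list are hypothetical names introduced here, and whether turning away such addresses is acceptable is a policy decision for your application.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPolicy
{
    // The strict pattern from the article, unchanged.
    private static readonly Regex Strict = new Regex(
        @"^(([^<>()[\]\\.,;:\s@\""]+"
        + @"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@"
        + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
        + @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
        + @"[a-zA-Z]{2,}))$");

    // Hypothetical list: characters that the strict pattern allows but
    // that your own downstream systems are known to reject.
    private static readonly char[] Unsupported = new char[] { '\'' };

    public static bool IsAcceptable(string email)
    {
        if (email == null)
            return false;

        // Pass the strict pattern first, then apply the local restriction.
        return Strict.IsMatch(email) && email.IndexOfAny(Unsupported) < 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsAcceptable("joe@home.com"));
        Console.WriteLine(IsAcceptable("o'reilly@there.com"));
    }
}
```

Keeping the extra characters in a separate list leaves the strict pattern untouched, so the list can be loosened later without retesting the regular expression.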
Output: Email Regex Samples
Email | Pattern | Valid Email? |
joe | Lenient | Is NOT Valid |
joe | Strict | Is NOT Valid |
joe@home | Lenient | Is NOT Valid |
joe@home | Strict | Is NOT Valid |
a@b.c | Lenient | Is Valid |
a@b.c | Strict | Is NOT Valid |
joe@home.com | Lenient | Is Valid |
joe@home.com | Strict | Is Valid |
joe.bob@home.com | Lenient | Is Valid |
joe.bob@home.com | Strict | Is Valid |
joe-bob[at]home.com | Lenient | Is NOT Valid |
joe-bob[at]home.com | Strict | Is NOT Valid |
joe@his.home.com | Lenient | Is Valid |
joe@his.home.com | Strict | Is Valid |
joe@his.home.place | Lenient | Is Valid |
joe@his.home.place | Strict | Is Valid |
joe@home.org | Lenient | Is Valid |
joe@home.org | Strict | Is Valid |
joe@joebob.name | Lenient | Is Valid |
joe@joebob.name | Strict | Is Valid |
joe.@bob.com | Lenient | Is NOT Valid |
joe.@bob.com | Strict | Is NOT Valid |
.joe@bob.com | Lenient | Is Valid |
.joe@bob.com | Strict | Is NOT Valid |
joe<>bob@bob.come | Lenient | Is Valid |
joe<>bob@bob.come | Strict | Is NOT Valid |
joe&bob@bob.com | Lenient | Is Valid |
joe&bob@bob.com | Strict | Is Valid |
~joe@bob.com | Lenient | Is Valid |
~joe@bob.com | Strict | Is Valid |
joe$@bob.com | Lenient | Is NOT Valid |
joe$@bob.com | Strict | Is Valid |
joe+bob@bob.com | Lenient | Is Valid |
joe+bob@bob.com | Strict | Is Valid |
o'reilly@there.com | Lenient | Is Valid |
o'reilly@there.com | Strict | Is Valid |