Optimism is an occupational hazard of programming: feedback is the treatment. -- Kent Beck

Parsing Email Addresses with Regular Expressions

A lenient and a strict method, with examples

Summary

Email validation is a common task on ASP.NET pages where users enter their email addresses. Often any input shaped like a@b.c is accepted as an email address, but you may want to do better than that.

The RegularExpressionValidator in .NET 1.1 comes with a lenient regex pattern for email addresses. If you don't need the strict pattern, use the lenient one; it will stand the test of time better.

Here are the regular expression patterns:

Email Regex Patterns: Lenient (from the .NET 1.1 RegularExpressionValidator) and Strict

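// Lenient: word characters, optionally joined by -, + or ., then @ and a
// domain containing at least one dot. It has no ^ or $ anchors, so
// Regex.IsMatch returns true if any substring of the input matches.
string patternLenient = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

// Strict: anchored with ^ and $. The local part is either dot-separated runs
// of characters other than < > ( ) [ ] \ . , ; : @ " and whitespace, or a
// quoted string. The domain is either a bracketed IPv4-style address or
// dot-separated labels of letters, digits and hyphens ending in a
// top-level domain of two or more letters.
string patternStrict = @"^(([^<>()[\]\\.,;:\s@\""]+"
   + @"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@"
   + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
   + @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
   + @"[a-zA-Z]{2,}))$";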
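The strict pattern is hard to read as one long string. As a sketch, here is the same expression laid out over several lines with RegexOptions.IgnorePatternWhitespace, which makes the .NET regex engine ignore unescaped whitespace outside character classes and treat # as the start of a comment. The StrictEmailPattern class name is hypothetical; the only differences from patternStrict are the layout, the comments, and a plain quote in place of the equivalent escaped quote.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class: the strict email pattern with inline comments.
public class StrictEmailPattern
{
    // IgnorePatternWhitespace: unescaped spaces and newlines outside
    // character classes are ignored, and '#' starts a comment that runs
    // to the end of the line.
    public static readonly Regex Verbose = new Regex(@"
        ^
        (                                    # local part, either:
          ( [^<>()[\]\\.,;:\s@""]+           #   a run of allowed characters
            ( \.[^<>()[\]\\.,;:\s@""]+ )*    #   then optional .run segments
          )
          |
          ( "".+"" )                         #   or a quoted string
        )
        @
        (                                    # domain, either:
          ( \[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\] )  # an IPv4-style literal in brackets
          |
          ( ([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,} )  # or labels and a TLD of two or more letters
        )
        $", RegexOptions.IgnorePatternWhitespace);
}
```

StrictEmailPattern.Verbose.IsMatch(sample) is meant to give the same answer as reStrict.IsMatch(sample) for every sample in the test method; only the presentation changes.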

Use the following method to test the regular expressions. Copy it into the code-behind of an ASPX page that has a Label control named lblOutput. Don't forget the using directives it needs: "using System.Text.RegularExpressions;" for Regex and "using System.Collections;" for ArrayList.

Test Email Regular Expressions

public void TestEmailRegex()
{
   string patternLenient = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
   Regex reLenient = new Regex(patternLenient);
   string patternStrict = @"^(([^<>()[\]\\.,;:\s@\""]+" 
      + @"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@" 
      + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" 
      + @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+" 
      + @"[a-zA-Z]{2,}))$";
   Regex reStrict = new Regex(patternStrict);

   ArrayList samples = new ArrayList();
   samples.Add("joe");
   samples.Add("joe@home");
   samples.Add("a@b.c");
   samples.Add("joe@home.com");
   samples.Add("joe.bob@home.com");
   samples.Add("joe-bob[at]home.com");
   samples.Add("joe@his.home.com");
   samples.Add("joe@his.home.place");
   samples.Add("joe@home.org");
   samples.Add("joe@joebob.name");
   samples.Add("joe.@bob.com");
   samples.Add(".joe@bob.com");
   samples.Add("joe<>bob@bob.come");
   samples.Add("joe&bob@bob.com");
   samples.Add("~joe@bob.com");
   samples.Add("joe$@bob.com");
   samples.Add("joe+bob@bob.com");
   samples.Add("o'reilly@there.com");

   // Build the results table. StringBuilder avoids re-copying the whole
   // string on every append, as += on a string would.
   System.Text.StringBuilder output = new System.Text.StringBuilder();
   output.Append("<table border=1>");
   output.Append("<tr><td><b>Email</b></td><td><b>Pattern</b>"
      + "</td><td><b>Valid Email?</b></td></tr>");
   bool toggle = true;
   foreach (string sample in samples)
   {
      // Alternate the background colour from one sample to the next.
      string bgcol = toggle ? "gainsboro" : "white";
      toggle = !toggle;

      // HTML-encode the sample: inputs such as "joe<>bob@bob.come"
      // contain characters that are special in HTML.
      string rowStart = "<tr bgcolor=" + bgcol + "><td>"
         + Server.HtmlEncode(sample) + "</td><td>";
      string lenient = reLenient.IsMatch(sample) ? "Is Valid" : "Is NOT Valid";
      string strict = reStrict.IsMatch(sample) ? "Is Valid" : "Is NOT Valid";

      output.Append(rowStart + "Lenient</td><td>" + lenient + "</td></tr>");
      output.Append(rowStart + "Strict</td><td>" + strict + "</td></tr>");
   }
   output.Append("</table>");

   lblOutput.Text = output.ToString();
}

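Below is the output of the test method. Most of the time the lenient and strict patterns agree, but in some cases, such as "a@b.c", an address passes the lenient test and fails the strict one (the strict pattern requires a top-level domain of at least two letters). Determining which characters can be used in an email address is almost more art than science. Most printable ASCII characters are allowed, but an unquoted local part (the text before the @) may not contain spaces or any of < > ( ) [ ] \ , ; : @ " and may not start or end with a dot or contain two dots in a row, which is what the strict pattern's local-part check enforces. In practice, many mail servers and email applications add restrictions of their own.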
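When an address fails the strict test, it can help to know which half of it failed. The sketch below uses hypothetical names (StrictEmailDiagnostics, Explain): it splits the address at its last @, since the domain half of patternStrict cannot contain an @, and checks each side against the corresponding half of the pattern. For ordinary single-line input this gives the same verdict as the full strict pattern, plus a reason.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: reports which half of an address the strict
// pattern rejects.
public class StrictEmailDiagnostics
{
    // Local-part half of the strict pattern (everything before the @).
    static readonly Regex LocalPart = new Regex(
        @"^(([^<>()[\]\\.,;:\s@\""]+(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))$");

    // Domain half of the strict pattern (everything after the @).
    static readonly Regex DomainPart = new Regex(
        @"^((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])"
        + @"|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$");

    public static string Explain(string address)
    {
        // The domain half cannot contain '@', so split at the last one.
        int at = address.LastIndexOf('@');
        if (at < 0)
            return "no @ sign";

        bool localOk = LocalPart.IsMatch(address.Substring(0, at));
        bool domainOk = DomainPart.IsMatch(address.Substring(at + 1));

        if (localOk && domainOk) return "valid";
        if (localOk) return "domain rejected";
        if (domainOk) return "local part rejected";
        return "both parts rejected";
    }
}
```

For the samples in the test method, Explain("a@b.c") returns "domain rejected" (the top-level domain is a single letter) and Explain(".joe@bob.com") returns "local part rejected" (a leading dot).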

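We know that the lenient pattern will often accept addresses that are NOT valid. In this test, part of the reason is that it has no ^ or $ anchors, so reLenient.IsMatch succeeds whenever any substring matches: ".joe@bob.com" and "joe<>bob@bob.come" pass because "joe@bob.com" and "bob@bob.come" do. The RegularExpressionValidator itself only accepts a match that covers the entire input, so it would reject both. The lenient pattern can also reject addresses that ARE valid: joe$@bob.com fails because only word characters, joined by -, + or ., may appear before the @, and under the validator's whole-input rule ~joe@bob.com, joe&bob@bob.com and o'reilly@there.com fail as well, even though the strict pattern accepts all three.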
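The difference between a substring search and a whole-input check is easy to miss, so here is a minimal sketch with hypothetical names. ContainsMatch does what the test method's reLenient.IsMatch call does; IsWholeMatch is modelled on the RegularExpressionValidator's server-side check, where the first match must start at index 0 and span the entire input.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper contrasting a substring search with a
// whole-input check for the lenient pattern.
public class LenientEmailCheck
{
    // The lenient pattern as given above: no ^ or $ anchors.
    const string Lenient = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // Substring search: true if ANY part of the input matches.
    public static bool ContainsMatch(string input)
    {
        return Regex.IsMatch(input, Lenient);
    }

    // Whole-input check: the first match must start at index 0
    // and cover every character of the input.
    public static bool IsWholeMatch(string input)
    {
        Match m = Regex.Match(input, Lenient);
        return m.Success && m.Index == 0 && m.Length == input.Length;
    }
}
```

ContainsMatch(".joe@bob.com") is true, but IsWholeMatch(".joe@bob.com") is false because the first match starts at index 1. Wrapping the pattern as ^(?:...)$ is the other common way to require a whole-input match.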

In fact, an @ symbol is not even required for a serviceable email address if you're sticking to your local intranet.

So, really, when you use a regular expression to validate an email address, you are trying to keep out flaky, bizarre addresses which, while technically allowed, may come from malicious sources. After all, a legitimate user will want an address that is standard and compatible with most systems.

I recently had trouble in a system where a customer had a single quote in their email address, something like o'reilly@there.com. It's technically valid, but many systems won't accept it.

Output: Email Regex Samples

Email               | Pattern | Valid Email?
--------------------+---------+-------------
joe                 | Lenient | Is NOT Valid
joe                 | Strict  | Is NOT Valid
joe@home            | Lenient | Is NOT Valid
joe@home            | Strict  | Is NOT Valid
a@b.c               | Lenient | Is Valid
a@b.c               | Strict  | Is NOT Valid
joe@home.com        | Lenient | Is Valid
joe@home.com        | Strict  | Is Valid
joe.bob@home.com    | Lenient | Is Valid
joe.bob@home.com    | Strict  | Is Valid
joe-bob[at]home.com | Lenient | Is NOT Valid
joe-bob[at]home.com | Strict  | Is NOT Valid
joe@his.home.com    | Lenient | Is Valid
joe@his.home.com    | Strict  | Is Valid
joe@his.home.place  | Lenient | Is Valid
joe@his.home.place  | Strict  | Is Valid
joe@home.org        | Lenient | Is Valid
joe@home.org        | Strict  | Is Valid
joe@joebob.name     | Lenient | Is Valid
joe@joebob.name     | Strict  | Is Valid
joe.@bob.com        | Lenient | Is NOT Valid
joe.@bob.com        | Strict  | Is NOT Valid
.joe@bob.com        | Lenient | Is Valid
.joe@bob.com        | Strict  | Is NOT Valid
joe<>bob@bob.come   | Lenient | Is Valid
joe<>bob@bob.come   | Strict  | Is NOT Valid
joe&bob@bob.com     | Lenient | Is Valid
joe&bob@bob.com     | Strict  | Is Valid
~joe@bob.com        | Lenient | Is Valid
~joe@bob.com        | Strict  | Is Valid
joe$@bob.com        | Lenient | Is NOT Valid
joe$@bob.com        | Strict  | Is Valid
joe+bob@bob.com     | Lenient | Is Valid
joe+bob@bob.com     | Strict  | Is Valid
o'reilly@there.com  | Lenient | Is Valid
o'reilly@there.com  | Strict  | Is Valid