Parsing URLs with Regular Expressions and the Regex Object

And the Anatomy of a URI (Uniform Resource Identifier)

By steve on January 09, 2007.
Updated on January 22, 2012.

Example: Regular Expressions for Parsing URIs and URLs


OK, we're finally here. The following method can be copied into the code-behind file of your aspx page. Make sure the page has a Label named lblOutput and that the code-behind file has a using directive for System.Text.RegularExpressions, then call the TestParseURL method.

Example: Parse a URL with C# Regex

public void TestParseURL()
{
   string url = "http://www.cambiaresearch.com"
      + "/Cambia3/snippets/csharp/regex/uri_regex.aspx?id=17#authority";

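   // Named groups: s = scheme, a = authority, p = path, q = query, f = fragment.
   // The "0" variant excludes the delimiter (":", "//", "?", "#"); the "1" variant includes it.
   string regexPattern = @"^(?<s1>(?<s0>[^:/\?#]+):)?(?<a1>" 
      + @"//(?<a0>[^/\?#]*))?(?<p0>[^\?#]*)" 
      + @"(?<q1>\?(?<q0>[^#]*))?" 
      + @"(?<f1>#(?<f0>.*))?";

   // ExplicitCapture: only explicitly named groups are captured.
   Regex re = new Regex(regexPattern, RegexOptions.ExplicitCapture); 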
   Match m = re.Match(url);

   lblOutput.Text = "<b>URL: " + url + "</b><p>";

   lblOutput.Text +=
      m.Groups["s0"].Value + "  (Scheme without colon)<br>"; 
   lblOutput.Text +=
      m.Groups["s1"].Value + "  (Scheme with colon)<br>"; 
   lblOutput.Text +=  
      m.Groups["a0"].Value + "  (Authority without //)<br>"; 
   lblOutput.Text +=  
      m.Groups["a1"].Value + "  (Authority with //)<br>"; 
   lblOutput.Text +=  
      m.Groups["p0"].Value + "  (Path)<br>"; 
   lblOutput.Text +=  
      m.Groups["q0"].Value + "  (Query without ?)<br>"; 
   lblOutput.Text +=  
      m.Groups["q1"].Value + "  (Query with ?)<br>"; 
   lblOutput.Text +=  
      m.Groups["f0"].Value + "  (Fragment without #)<br>"; 
   lblOutput.Text += 
      m.Groups["f1"].Value + "  (Fragment with #)<br>"; 
}

The following is the output you should see on your aspx page when you run the above method.

Example: Output

URL: http://www.cambiaresearch.com/Cambia3/snippets/csharp/regex/uri_regex.aspx?id=17#authority

http (Scheme without colon)
http: (Scheme with colon)
www.cambiaresearch.com (Authority without //)
//www.cambiaresearch.com (Authority with //)
/Cambia3/snippets/csharp/regex/uri_regex.aspx (Path)
id=17 (Query without ?)
?id=17 (Query with ?)
authority (Fragment without #)
#authority (Fragment with #)
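
Every component group in the pattern is optional, so it may help to see what happens when parts of the URI are missing. The following is a minimal console sketch, not part of the original page code: the class name UriRegexDemo and the helper Component are hypothetical names introduced for illustration. It reuses the same pattern and checks Group.Success, because Value is an empty string both for a component that is absent and for one that is present but empty.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UriRegexDemo
{
    // Same pattern as in TestParseURL, joined into one Regex instance.
    private static readonly Regex UriPattern = new Regex(
        @"^(?<s1>(?<s0>[^:/\?#]+):)?(?<a1>//(?<a0>[^/\?#]*))?(?<p0>[^\?#]*)"
        + @"(?<q1>\?(?<q0>[^#]*))?(?<f1>#(?<f0>.*))?",
        RegexOptions.ExplicitCapture);

    // Hypothetical helper: returns the text of the named group,
    // or null when that component did not take part in the match.
    public static string Component(string uri, string groupName)
    {
        Group g = UriPattern.Match(uri).Groups[groupName];
        return g.Success ? g.Value : null;
    }

    public static void Main()
    {
        // A relative reference: no scheme, no authority, no fragment.
        string uri = "/snippets/index.aspx?id=17";

        foreach (string name in new[] { "s0", "a0", "p0", "q0", "f0" })
        {
            string value = Component(uri, name);
            Console.WriteLine(name + ": " + (value ?? "(absent)"));
        }
    }
}
```

For this input the scheme, authority and fragment are reported as absent, the path is /snippets/index.aspx and the query is id=17. A URI that ends in a bare "?" would instead report an empty (but present) query.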

User Comments (3)

Posted 2008 May 21 12:43 PM.
Wonderful post! Just what I was looking for - a practical approach to parsing URL's with regular expressions.

wmhogg
Posted 2008 May 29 03:43 AM.
Is it possible to add in something like the Segments property of the uri class into the regular expression, where the Path is split by "/". Or would you say its better to do this with a string.split?

Peter
Posted 2012 Feb 21 16:08 PM.
VERY helpful, thank you.

ryan