Summary
While the .NET Framework contains a class called Uri, which provides a lot of functionality for dealing with URLs and URIs, it's not always easy to get at the individual pieces of a URL. I present a technique here that uses regular expressions to extract the subparts of a URI in one fell swoop. But first, let's talk about the parts of a URI.
For the code, go straight to Page 2. Read on below about the anatomy of a URI.
Anatomy of a URI
The following information is taken from the official IETF Network Working Group document on URI syntax (RFC 3986). For the whole kit and caboodle, see the full specification.
URI stands for Uniform Resource Identifier and is a "compact sequence of characters which identifies an abstract or physical resource."
URL stands for Uniform Resource Locator. URLs are a subset of URIs that "provide a means of locating a resource by describing its primary access mechanism, for example, its network location."
Now, let's talk about the syntax of the URI, or the various pieces that combine to form the whole. These consist of the scheme, authority, path, query and fragment.
Components of a URI

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
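As a rough illustration of how the five components line up, the sketch below splits a URI with the generic regular expression given in Appendix B of RFC 3986. This is not the expression presented on Page 2; the class name UriParts and the method Split are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UriParts
{
    // Generic pattern from RFC 3986, Appendix B. Groups 2, 4, 5, 7 and 9
    // capture the scheme, authority, path, query and fragment.
    private static readonly Regex Pattern = new Regex(
        @"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
        RegexOptions.Compiled);

    // Returns { scheme, authority, path, query, fragment }.
    // A component that is absent comes back as an empty string.
    public static string[] Split(string uri)
    {
        Match m = Pattern.Match(uri);
        return new[]
        {
            m.Groups[2].Value,
            m.Groups[4].Value,
            m.Groups[5].Value,
            m.Groups[7].Value,
            m.Groups[9].Value
        };
    }
}
```

Calling UriParts.Split on the example URI from the diagram should yield "foo", "example.com:8042", "/over/there", "name=ferret" and "nose". Note that this pattern leaves the "?" and "#" delimiters out of the captured query and fragment.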
Scheme
The scheme of a URL is the first item, such as http, which indicates that this URI uses the Hypertext Transfer Protocol. Some common schemes are:
| http | World Wide Web Server. (http://www.w3.com) |
| https | Secure World Wide Web Server. (https://www.w3.com) |
| ftp | File Transfer Protocol (ftp://www.ftpx.com) |
| gopher | The Gopher Protocol. |
| mailto | Electronic Mail Address. (mailto:info@cambiaresearch.com) |
| news | USENET newsgroups. |
| nntp | USENET newsgroups using NNTP access. |
| telnet | Interactive sessions on server. |
| wais | Wide Area Information Servers. |
| file | File on the local system. |
| prospero | Prospero Directory Services. |
Authority
In a URL the authority is usually the host name (often loosely called the domain) and may include a port number at the end, separated from the host by a colon.
In the following example, the authority is www.cambiaresearch.com
http://www.cambiaresearch.com
In the following example, the authority is www.cambiaresearch.com:81
https://www.cambiaresearch.com:81
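In the following example, info@cambiaresearch.com plays the role of the authority. Strictly speaking, RFC 3986 treats it as the path, because a mailto URI has no "//" to introduce an authority component.
mailto:info@cambiaresearch.com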
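For http and https URLs, the built-in System.Uri class already exposes the authority and its two pieces. The following is a minimal sketch of that; the class AuthorityDemo and the method Describe are hypothetical names used only for illustration.

```csharp
using System;

public static class AuthorityDemo
{
    // Reports the authority of an absolute URL, plus the host and port
    // that System.Uri derives from it.
    public static string Describe(string url)
    {
        var uri = new Uri(url);
        return string.Format("authority={0} host={1} port={2}",
            uri.Authority, uri.Host, uri.Port);
    }
}
```

For https://www.cambiaresearch.com:81 this reports the authority www.cambiaresearch.com:81, the host www.cambiaresearch.com and the port 81. When the URL carries no explicit port, Uri.Authority omits it and Uri.Port falls back to the scheme's default (80 for http).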
Path
The path component of the URL identifies the specific file (or page) at a particular domain. The path is terminated by the end of the URL, by a question mark (?), which signals the beginning of the query string, or by a number sign (#), which signals the beginning of the fragment.
The path of the following URL is "/default.htm"
http://www.cambiaresearch.com/default.htm
The path of the following URL is "/snippets/csharp/regex/uri_regex.aspx"
http://www.cambiaresearch.com/snippets/csharp/regex/uri_regex.aspx
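The termination rule can be spelled out by hand. The sketch below assumes a URL of the form scheme://authority..., and the names PathDemo and PathOf are invented for this example; in practice Uri.AbsolutePath does a more thorough job.

```csharp
using System;

public static class PathDemo
{
    // Extracts the path from a URL of the form scheme://authority/path?query#fragment.
    public static string PathOf(string url)
    {
        int separator = url.IndexOf("://", StringComparison.Ordinal);
        if (separator < 0)
            throw new ArgumentException("Expected a URL of the form scheme://...", "url");

        // The authority ends at the first '/', '?' or '#' that follows "://".
        int pathStart = url.IndexOfAny(new[] { '/', '?', '#' }, separator + 3);
        if (pathStart < 0)
            return "";  // nothing after the authority, so the path is empty

        // The path runs until a '?' or '#', or to the end of the URL.
        int pathEnd = url.IndexOfAny(new[] { '?', '#' }, pathStart);
        return pathEnd < 0
            ? url.Substring(pathStart)
            : url.Substring(pathStart, pathEnd - pathStart);
    }
}
```

With this helper, both http://www.cambiaresearch.com/default.htm and the same URL followed by ?id=241 give "/default.htm".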
Query
The query part of the URL is a way to pass extra information to the page that will handle the web request. The query begins with a question mark (?) and is terminated by the end of the URL or by a number sign (#), which signals the beginning of the fragment.
The query of the following URL is "?id=241"
http://www.cambiaresearch.com/default.htm?id=241
The query of the following URL is "?sourceid=navclient&ie=UTF-8&rls=GGLC,GGLC:1969-53,GGLC:en&q=uri+query"
http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLC,GGLC:1969-53,GGLC:en&q=uri+query
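Once the query string has been isolated, it is usually broken into name/value pairs. Here is a small sketch of that step, assuming the common web-form convention of "&" between pairs, "=" between name and value, and "+" standing for a space. QueryDemo and Parse are hypothetical names, and when a name repeats, this version simply keeps the last value.

```csharp
using System;
using System.Collections.Generic;

public static class QueryDemo
{
    // Splits a query string (with or without the leading '?') into name/value pairs.
    public static Dictionary<string, string> Parse(string query)
    {
        var result = new Dictionary<string, string>();
        string[] pairs = query.TrimStart('?')
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            result[Decode(name)] = Decode(value);  // a repeated name overwrites the earlier value
        }
        return result;
    }

    // Treats '+' as a space (form encoding), then resolves %XX escapes.
    private static string Decode(string text)
    {
        return Uri.UnescapeDataString(text.Replace('+', ' '));
    }
}
```

Applied to the Google query above, Parse returns four entries, including q = "uri query" and rls = "GGLC,GGLC:1969-53,GGLC:en".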
Fragment
In a URL the fragment is used to specify a location within the current page. This is often used in a FAQ, where a list of links at the top of the page points to longer descriptions farther down the page.
The fragment of the following URL is "contact"
http://www.cambiaresearch.com/default.htm#contact
The fragment of the following URL is "scheme"
http://www.cambiaresearch.com/snippets/csharp/regex/uri_regex.aspx#scheme
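One detail to watch when using the built-in class: Uri.Fragment returns the fragment with its leading "#" (just as Uri.Query keeps its leading "?"). The sketch below strips the delimiter so the result matches the examples above; FragmentDemo and FragmentOf are made-up names for illustration.

```csharp
using System;

public static class FragmentDemo
{
    // Returns the fragment of an absolute URL without the leading '#',
    // or an empty string when the URL has no fragment.
    public static string FragmentOf(string url)
    {
        string fragment = new Uri(url).Fragment;  // e.g. "#contact", or "" when absent
        return fragment.Length > 0 ? fragment.Substring(1) : "";
    }
}
```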