Using a C# regex to parse a domain name?

Developer https://www.devze.com 2023-01-08 12:16 Source: web
I need to parse the domain name from a string. The string can vary and I need the exact domain.

Examples of strings:

http://somename.de/
www.somename.de/
somename.de/
somename.de/somesubdirectory
www.somename.de/?pe=12

I need it in the following format, with just the domain name, the TLD, and the www prefix if present:

www.somename.de

How do I do that using C#?


As an alternative to a regex solution, you can let the System.Uri class parse the string for you. You just have to make sure the string contains a scheme.

string uriString = "http://www.google.com/search";

if (!uriString.Contains(Uri.SchemeDelimiter))
{
    uriString = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, uriString);
}

string domain = new Uri(uriString).Host;

This solution also filters out any port number and converts IPv6 addresses to their canonical form.
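To tie this back to the strings in the question, here is a minimal sketch that wraps the same idea in a helper. The class name HostParser and the method GetHost are made up for illustration, and returning null for unparseable input is an assumption, not something the answer specifies. It uses Uri.TryCreate instead of the constructor so that malformed input does not throw.

```csharp
using System;

public static class HostParser
{
    // Hypothetical helper (not part of the original answer):
    // returns the host of the input, or null if it cannot be parsed.
    public static string GetHost(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
        {
            return null;
        }

        string candidate = input.Trim();

        // System.Uri needs a scheme, so prepend "http://" when none is present.
        if (!candidate.Contains(Uri.SchemeDelimiter))
        {
            candidate = Uri.UriSchemeHttp + Uri.SchemeDelimiter + candidate;
        }

        // TryCreate returns false instead of throwing on malformed input.
        return Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri) ? uri.Host : null;
    }

    public static void Main()
    {
        string[] samples =
        {
            "http://somename.de/",
            "www.somename.de/",
            "somename.de/",
            "somename.de/somesubdirectory",
            "www.somename.de/?pe=12"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + GetHost(sample));
        }
    }
}
```

Note that the host comes back exactly as it appears in the input: "somename.de/" yields "somename.de", not "www.somename.de". If the www prefix should always be present, that has to be added separately.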


I simply used:

Uri uri = new Uri("http://www.google.com/search?q=439489");
string host = uri.Host; // "www.google.com"; Host is already a string, so ToString() is not needed

because this way you can be sure the result is correct.


I checked out Regular Expression Library, and it looks like something like this might work for you:

^(([\w][\w\-\.]*)\.)?([\w][\w\-]+)(\.([\w][\w\.]*))?$
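One thing worth checking before relying on this pattern: it is anchored with ^ and $ and none of its character classes include a slash, so it appears to match only a bare host name, not a string that still carries a scheme or a path. The sketch below is only a quick check of that reading; the class name HostNamePatternCheck and the method LooksLikeHostName are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostNamePatternCheck
{
    // The pattern quoted above, unchanged.
    private static readonly Regex HostNamePattern =
        new Regex(@"^(([\w][\w\-\.]*)\.)?([\w][\w\-]+)(\.([\w][\w\.]*))?$");

    // Hypothetical helper: true when the whole input matches the pattern.
    public static bool LooksLikeHostName(string input)
    {
        return HostNamePattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeHostName("www.somename.de"));     // True
        Console.WriteLine(LooksLikeHostName("www.somename.de/"));    // False: "/" is not allowed
        Console.WriteLine(LooksLikeHostName("http://somename.de/")); // False: the scheme is not allowed
    }
}
```

So for the strings in the question, the scheme and path would have to be stripped first, and the pattern would then act as a validation step rather than as the extraction itself.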


Try this:

^(?:\w+://)?([^/?]*)

This is a weak regex: it doesn't validate the string but assumes it is already a URL, and captures everything up to the first slash while ignoring the protocol. To get the domain, look at the first captured group, for example:

string url = "http://www.google.com/hello";
Match match = Regex.Match(url, @"^(?:\w+://)?([^/?]*)");
string domain = match.Groups[1].Value;

As a bonus, the capture also stops at the first ?, so the URL google.com?hello=world works as expected.
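Here is a short sketch that runs the same pattern over the strings from the question. The class name RegexHostDemo and the method GetHost are illustrative names, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexHostDemo
{
    // Same pattern as above: optional scheme, then everything up to the first '/' or '?'.
    private static readonly Regex HostPattern = new Regex(@"^(?:\w+://)?([^/?]*)");

    // Hypothetical helper: returns the first captured group (may be empty).
    public static string GetHost(string input)
    {
        Match match = HostPattern.Match(input);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string[] samples =
        {
            "http://somename.de/",
            "www.somename.de/",
            "somename.de/somesubdirectory",
            "www.somename.de/?pe=12",
            "google.com?hello=world"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + GetHost(sample));
        }
    }
}
```

Unlike Uri.Host, the capture group keeps a port if one is present: for http://somename.de:8080/x it yields somename.de:8080, so a port would need to be trimmed in a separate step.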
