Friday, April 29, 2011

C# substring parsing using a regular expression (ATL >> .NET)?

What C#/.NET regex pattern do I need to get "bar" out of this URL?

http://www.foo.com/bar/123/abc

In ATL regular expressions, the pattern would have been:

http://www\.foo\.com/{[a-z]+}/123/abc
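
For comparison, here is a minimal C# sketch of the direct .NET translation of that ATL pattern. It assumes the only syntax change needed is ATL's `{...}` capture braces becoming .NET's `(...)` parentheses, with `^` and `$` anchoring the whole URL. It uses C# 9 top-level statements, and the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// ATL marks a capture group with braces: {[a-z]+}
// .NET marks it with parentheses instead: ([a-z]+)
// The dots are escaped so they match a literal '.', not any character.
const string pattern = @"^http://www\.foo\.com/([a-z]+)/123/abc$";

string url = "http://www.foo.com/bar/123/abc";
Match m = Regex.Match(url, pattern);

// Group 0 is the whole match; group 1 is the first parenthesized group.
string segment = m.Success ? m.Groups[1].Value : "";
Console.WriteLine(segment); // bar
```

Because the dots are escaped, a host such as `wwwxfooxcom` does not match this pattern.
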
From Stack Overflow:
  • Simply: #http://www\.foo\.com/([a-z]+)/123/abc#

    Use parentheses instead of braces.

    You will also need to put a delimiter character at the start and end of the regular expression to make it work.

    Nick Berardi : Actually, isn't this wrong, because "." matches any character? So yours also evaluates fine for http://wwwxfood.com/bar/123/abc
    Daniel Brückner : The dot is escaped with a backslash - so it works.
    Nick Berardi : # is not a valid delimiter in ASP.NET; I think you mean ^ and $.
    Nick Berardi : Also it wasn't escaped when I posted the comment.
    Erick : @Nick Actually it was escaped, but for some reason you have to double-escape your text on SO ^_^. As for the #s, don't you usually need a delimiter for your regex?
  • Pretty much the same thing

        http://www\.foo\.com/([a-z]+)/123/abc
    
  • This will almost work; it needs just one tiny modification: change the braces to parentheses.

    http://www\.foo\.com/([a-z]+)/123/abc

    But I don't consider this regex very useful, because it hard-codes almost the whole string. Wouldn't it be better to match the first path element independently of the rest?

    ^http://[^/]*/([^/]*).*$
    Jonathan C Dickinson : Maybe ^http://([^/])*/([^/]*).*$
    Jonathan C Dickinson : Ignore the semicolon, SO added it for some reason.
  • Here is a solution that breaks the URL into its component parts: protocol, site, and part. The protocol group is optional, so the expression also matches 'www.foo.com/bar/123/abc'. The part group can hold multiple captures, one for each folder and file under the site.

    ^(?<protocol>.+://)?(?<site>[^/]+)/(?:(?<part>[^/]+)/?)*$
    

    You would use the expression as follows to get 'bar':

    // Requires: using System.Text.RegularExpressions;
    string s = Regex.Match(@"http://www.foo.com/bar/123/abc", @"^(?<protocol>.+://)?(?<site>[^/]+)/(?:(?<part>[^/]+)/?)*$").Groups["part"].Captures[0].Value; // "bar"
    

    The expression breaks the URL down as follows:

    protocol: http://
    site: www.foo.com
    part[0]: bar
    part[1]: 123
    part[2]: abc
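
The answers above can be checked side by side. Below is a minimal C# sketch, assuming C# 9 top-level statements, that runs each proposed pattern against the question's URL. The patterns are copied from the answers, except that `^`/`$` anchors are added to the fixed-path one. The variable names are illustrative. It also shows the point raised in the comments: .NET regexes take no delimiters, so a surrounding `#` is matched literally.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string url = "http://www.foo.com/bar/123/abc";

// .NET has no regex delimiters: a leading/trailing '#' is an ordinary
// literal character, so this fails against a URL that contains no '#'.
bool withHashes = Regex.IsMatch(url, @"#http://www\.foo\.com/([a-z]+)/123/abc#");

// Anchors ^ and $ pin the pattern to the whole input instead.
string fixedPath = Regex.Match(url, @"^http://www\.foo\.com/([a-z]+)/123/abc$").Groups[1].Value;

// First path element only, independent of the host and the rest of the path.
string firstSegment = Regex.Match(url, @"^http://[^/]*/([^/]*).*$").Groups[1].Value;

// Named groups: a repeated group keeps every capture in Group.Captures.
Match m = Regex.Match(url, @"^(?<protocol>.+://)?(?<site>[^/]+)/(?:(?<part>[^/]+)/?)*$");
string protocol = m.Groups["protocol"].Value;
string site = m.Groups["site"].Value;
string[] parts = m.Groups["part"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(withHashes);               // False
Console.WriteLine(fixedPath);                // bar
Console.WriteLine(firstSegment);             // bar
Console.WriteLine($"{protocol} | {site}");   // http:// | www.foo.com
Console.WriteLine(string.Join(", ", parts)); // bar, 123, abc
```

The named-group version is the only one whose `Captures` collection exposes every path element, which is why `Captures[0]` yields the first folder.
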
