1

I asked a similar question recently about using regex to retrieve a URL or folder path from a string. I was looking at this comment by Dour High Arch, where he says:

"I recommend you do not use regexes at all; use separate code paths for URLs, using the Uri class, and file paths, using the FileInfo class. These classes already handle parsing, matching, extracting components, and so on."

I had never tried this, but now that I am looking into it, I can't tell whether his suggestion is actually useful for what I'm trying to accomplish.

I want to be able to parse a string message that could be something like:

"I placed the files on the server at http://www.thewebsite.com/NewStuff, they can also be reached on your local network drives at J:\Downloads\NewStuff"

And extract the two strings http://www.thewebsite.com/ and J:\Downloads\NewStuff. I don't see any methods on the Uri or FileInfo classes that parse a Uri or FileInfo object out of a larger string, as I think Dour High Arch was implying.

Is there something I'm missing about the Uri or FileInfo class that allows this behavior? If not, is there some other class in the framework that does this?

  • I think what that comment was implying is that if you pass it a file path, etc., it should correct back-slashes to forward-slashes and so forth. I myself would use Regex. Commented Oct 7, 2013 at 16:39
  • You were missing Uri.IsWellFormedUriString, which checks whether a string is a well-formed URI of a given kind (including file paths and URLs). This is the matching mentioned. See msdn.microsoft.com/en-us/library/… Commented Oct 7, 2013 at 16:49
  • @Hogan If I'm only passing in a string that is a Uri, that would be fine. However, I'm asking whether there is a method of the Uri or FileInfo class that can accept a string such as the one in the example and retrieve a URI or file path from that string without any further work... Commented Oct 7, 2013 at 16:52
  • @Zack - That is easy to answer - "No". Commented Oct 7, 2013 at 16:56

4 Answers

1

I'd say the easiest way is splitting the strings into parts first.

The first delimiter would be spaces, to get each word; the second would be quotes (double and single).

Then use Uri.IsWellFormedUriString on each token.

So something like:

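// Split on spaces and on single/double quotes, dropping empty tokens.
var delimiters = new[] { '\'', '"', ' ' };
foreach (var part in someRandomText.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
{
    // UriKind.Absolute: RelativeOrAbsolute would also accept ordinary words as relative URIs.
    if (Uri.IsWellFormedUriString(part, UriKind.Absolute))
        doSomethingWith(part);
}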

Looking at Uri.IsWellFormedUriString, I just saw that it may be a bit too strict to suit your needs. It returns false if www.Whatever.com is missing the http://
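A variation on the same idea, as a rough sketch rather than a definitive implementation: collect candidates with a deliberately loose pattern, trim the sentence punctuation they pick up, and only then validate each one. The class name LocationExtractor, the candidate pattern \w+:\S+, the list of trimmed characters, and the restriction to http/https URLs and drive-letter paths are all assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework.
public static class LocationExtractor
{
    // Loose candidate pattern (an assumption): word characters, a colon, then non-whitespace.
    private static readonly Regex Candidate = new Regex(@"\w+:\S+");

    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        foreach (Match m in Candidate.Matches(text))
        {
            // Drop trailing sentence punctuation swallowed by the loose pattern.
            string token = m.Value.TrimEnd(',', '.', ';', ':', '!', '?', '"', '\'', ')');

            // Accept only absolute http/https URLs...
            bool isWebUrl = Uri.TryCreate(token, UriKind.Absolute, out Uri uri)
                            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
            // ...or paths that start with a drive letter such as J:\ (assumed rule).
            bool isDrivePath = Regex.IsMatch(token, @"^[A-Za-z]:\\");

            if (isWebUrl || isDrivePath)
                results.Add(token);
        }
        return results;
    }
}
```

Calling LocationExtractor.Extract on the example message should yield the URL without its trailing comma, followed by the J:\ path. Paths containing spaces and UNC paths are not handled by this sketch.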


2 Comments

I guess you have to trim other punctuation as well, such as commas, periods, exclamation and question marks, colons, semicolons and so on, right?
Yeah, maybe it would be better to use Regex.Matches instead of string.Split. So define a looser regex for what could be a path, then use Uri.IsWellFormedUriString to make sure it is one.

1

It was not clear from your earlier question that you wanted to extract URL and file path substrings from larger strings. In that case, neither Uri.IsWellFormedUriString nor Regex.Match will do what you want. Indeed, I do not think any simple method can do what you want, because you will have to define rules for ambiguous strings like httX://wasThatAUriScheme/andAre/these part/of/aURL or/are they/separate.strings?andIsThis%20a%20Param?

My suggestion is to define a recursive descent parser and create states for each substring you need to distinguish.


1

You can use:

(?<type>[^ ]+?:)(?<path>//[^ ]*|\\.+\\[^ ]*)

That will give you two groups for each match:

type : "http:"

path : //www.thewebsite.com/NewStuff, (the trailing comma is captured too, because [^ ]* stops only at a space)

and

type : "J:"

path : \Downloads\NewStuff

out of the string

"I placed the files on the server at http://www.thewebsite.com/NewStuff, they can also be reached on your local network drives at J:\Downloads\NewStuff"

You can use the "type" group to check whether the type is http: or not, and act on that.


EDIT

Or use the regex below if you are sure there is no whitespace in your file path:

(?<type>[^ ]+?:)(?<path>//[^ ]*|\\[^ ]*)
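For illustration, here is one way the no-whitespace pattern might be applied from C#. It is only a sketch: the class name TypedPathMatcher and the trailing-punctuation trim are assumptions added here. The trim is there because [^ ]* stops only at a space, so a comma directly after a URL would otherwise end up in the path group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework.
public static class TypedPathMatcher
{
    // The no-whitespace pattern from this answer, unchanged.
    private static readonly Regex Pattern =
        new Regex(@"(?<type>[^ ]+?:)(?<path>//[^ ]*|\\[^ ]*)");

    public static List<(string Type, string Path)> Find(string text)
    {
        var results = new List<(string Type, string Path)>();
        foreach (Match m in Pattern.Matches(text))
        {
            string type = m.Groups["type"].Value;
            // Assumed clean-up: strip sentence punctuation captured at the end of the path.
            string path = m.Groups["path"].Value.TrimEnd(',', '.', ';');
            results.Add((type, path));
        }
        return results;
    }
}
```

Each result pairs the "type" group ("http:" or "J:" for the example message) with its path, so the caller can branch on the type as described above.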


-1

Try \w+:\S+ and see how well that fits your purposes.
