Extract URLs and Links from a Word document

Hi all,

I’m working on a project where I need to extract URLs and links from a Word document. The document contains a lot of text and other information, but I need to extract just the URLs and links. Some links are embedded behind text such as “click here”: clicking one redirects the reader to a website or opens an email address.

Does anyone have any suggestions on how to approach this? I’m using UiPath for the project, so any specific activities or methods within UiPath would be especially helpful.

Thanks in advance for your help!

Hi @Muhammad_Anas_Baloch

One way to extract URLs and links from a Word document is to use regular expressions.

Use the “Read Text” activity to read the contents of the Word document into a string variable and then create a regular expression pattern that matches URLs and links.

For example, the following pattern matches HTTP(S) and www-style URLs that appear in the text. It does not match bare email addresses, which need a separate pattern:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s!(){};:'".,<>?«»“”‘’]))
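To try the pattern before wiring it into a workflow, here is a minimal Python sketch that applies it to a string. The function names, the sample text, and the simplified email pattern are my own assumptions for illustration; they are not part of UiPath, and the email pattern is deliberately loose rather than a full address validator.

```python
import re

# The URL pattern from this post, unchanged. The raw triple-quoted string
# lets the pattern contain both ' and " without escaping.
URL_PATTERN = re.compile(
    r"""(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s!(){};:'".,<>?«»“”‘’]))"""
)

# Assumption: a simplified email pattern added for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_urls(text):
    """Return every URL-like match found in the text, in order of appearance."""
    # finditer + group(0) returns whole matches; findall would return
    # tuples here because the pattern contains capture groups.
    return [match.group(0) for match in URL_PATTERN.finditer(text)]


def extract_emails(text):
    """Return every email-like match found in the text, in order of appearance."""
    return [match.group(0) for match in EMAIL_PATTERN.finditer(text)]


# Hypothetical sample text standing in for the string read from the document.
sample = "Visit https://example.com/docs, or www.uipath.com. Mail me at someone@example.com."
print(extract_urls(sample))
print(extract_emails(sample))
```

The URL pattern stops before trailing punctuation such as a comma or full stop, which is usually what you want when links sit inside sentences.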

Use the “Matches” activity to apply the regular expression pattern to the string variable containing the Word document contents.
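One caveat: a regular expression only sees the visible text, so a link hidden behind words such as “click here” will not show up, because its target is not part of the text. In a .docx file (which is a ZIP archive), those targets are normally stored in the relationships part word/_rels/document.xml.rels. Below is a minimal Python sketch, outside UiPath, that reads them. The function name is hypothetical, and it assumes the links are in the main document body and were inserted as ordinary hyperlinks; links in headers, footers, or field codes are stored elsewhere and are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# Location of the main document's relationships inside the .docx archive.
RELS_PART = "word/_rels/document.xml.rels"
# XML namespace used by relationship parts.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Relationship type that marks a hyperlink.
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def extract_hyperlink_targets(docx):
    """Return hyperlink targets (URLs, mailto: addresses) from a .docx file.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        if RELS_PART not in archive.namelist():
            return []  # the document has no relationships part
        root = ET.fromstring(archive.read(RELS_PART))
    return [
        rel.get("Target")
        for rel in root.iter(RELS_NS + "Relationship")
        if rel.get("Type") == HYPERLINK_TYPE
    ]
```

Calling extract_hyperlink_targets("report.docx") (a hypothetical file name) returns a list of strings; email links appear with a mailto: prefix.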

Thanks!!

@Nitya1
I will try it and let you know. Thank you!