Extract the string between two strings when the second string varies

Hi there,
I want to extract a “Description” field from PDFs. The PDFs are text-based, and I can extract their data into Notepad (as a string).
There are two PDF layouts: in one, a field called “Article” follows the Description field; in the other, a field called “Sold Date” (the first column of a table) follows the Description.
Note: Description can contain single-line or multi-line data.
Ex 1:
Description : ABC 123 Welcome message
Article : Hello welcome to UiPath
Ex 2:
Description : ABC 123 Welcome message
Sold Date (a table starts here, and this is its first column)

For the Article PDF, I am able to get the data from Notepad successfully.
Now I want to handle the second PDF format as well: if it contains Sold Date, I still need to extract the Description.

Below is the code I use after reading the PDF data into a string.
It gets the data between Description and Article. How can I handle the second PDF with the existing code?

System.Text.RegularExpressions.Regex.Match(strPDFData,"(?<=Description:)([\S\s]*)(?=Article:)").Value.Trim

Kindly suggest

Hi @avinashy

Wouldn’t it work to just replace “Article:” with “Sold Date” in your regex?

(?<=Description:)([\S\s]*)(?=Sold Date)

Your code would be:

System.Text.RegularExpressions.Regex.Match(strPDFData,"(?<=Description:)([\S\s]*)(?=Sold Date)").Value.Trim
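
One thing to watch with the greedy `[\S\s]*` in this pattern: if “Sold Date” appears more than once in the extracted text (plausible when a table follows), a greedy match runs to the last occurrence rather than the first. The sketch below illustrates this in Python rather than in the VB.NET expression itself; the sample text is made up for illustration and is not taken from the actual PDF.

```python
import re

# Hypothetical text in which the end marker "Sold Date" occurs twice.
text = (
    "Description:ABC 123 Welcome message\n"
    "Sold Date 01-01\n"
    "Footer: Sold Date applies"
)

# Greedy: [\S\s]* grabs as much as possible, so it stops at the LAST "Sold Date".
greedy = re.search(r"(?<=Description:)([\S\s]*)(?=Sold Date)", text).group().strip()

# Lazy: [\S\s]*? grabs as little as possible, so it stops at the FIRST "Sold Date".
lazy = re.search(r"(?<=Description:)([\S\s]*?)(?=Sold Date)", text).group().strip()

print(repr(greedy))
print(repr(lazy))
```

Adding `?` after the `*` is usually the safer choice when the end marker can repeat.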

Could you share the string that is extracted from the second PDF so that I can create a proper regex?

I am not sure how a table appears in a string.

@avinashy,

Use this regex:

(?<=Description\s*:\s*)([\S\s]*?)(?=\b(Article|Sold Date)\b)

This works for both scenarios.

System.Text.RegularExpressions.Regex.Match(strPDFData,"(?<=Description\s*:\s*)([\S\s]*?)(?=\b(Article|Sold Date)\b)").Value.Trim
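
For anyone who wants to check the idea outside UiPath, here is a minimal sketch in Python, not the VB.NET expression itself. Python's `re` module does not accept variable-width lookbehinds such as `(?<=Description\s*:\s*)`, so the sketch uses a capturing group instead. The helper name `extract_description` and the sample strings are assumptions made up for illustration, modeled on the two examples in the question.

```python
import re

# Same idea as the .NET pattern: lazily capture everything after
# "Description :" up to the first "Article" or "Sold Date" marker.
PATTERN = re.compile(r"Description\s*:\s*([\s\S]*?)(?=\b(?:Article|Sold Date)\b)")


def extract_description(pdf_text: str) -> str:
    """Return the Description value, or an empty string if nothing matches."""
    match = PATTERN.search(pdf_text)
    return match.group(1).strip() if match else ""


# Hypothetical sample strings modeled on the two layouts in the question.
article_pdf = "Description : ABC 123 Welcome message\nArticle : Hello welcome to UiPath"
sold_date_pdf = "Description : ABC 123\nWelcome message\nSold Date  Qty  Price"

print(repr(extract_description(article_pdf)))
print(repr(extract_description(sold_date_pdf)))
```

Note that the lazy match stops at the first “Article” or “Sold Date” it meets, so a description that itself contains one of those words would be cut short.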

Hi @avinashy

Adding on to @ashokkarale's solution:

Can I assume that your second PDF data looks like this:

If so, the solution works.

Hi @avinashy

Use this regex to extract the values as per your examples:

System.Text.RegularExpressions.Regex.Match(strPDFData,"(?<=Description\s*:\s*)([\S\s]*?)(?=\b(Article|Sold Date)\b)").Value.Trim

Thanks

Thanks @ashokkarale, it worked.

@avinashy,

Kindly close the thread by marking my answer as the solution so that it helps other community members as well.
