Regex to extract data

Hi,
I'm trying to extract data from a PDF file using the regex-based extractor.

How can I extract a specific value that sits between two dynamic strings?

15-Apr-2021 → (Dynamic Date)
First Name
Email:xyz

From the above, I want to extract only the “First Name”.
There is another case, where I want to extract the text before the number:

Compensation INR 5,45,000 → (Dynamic amount)

Here I need only the INR.

Please help me extract these using regular expressions.

Thanks

@Muthulakshmi_Thangamuthu - For the first name.

Regex Pattern :

(?<=\d+-\w+-\d+\r?\n).*
For INR:

Regex Pattern:

\w+(?=\s[\d.,]+)
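
For anyone who wants to check the two patterns outside the workflow, here is a minimal Python sketch. Assumptions: the sample text is the one from the question, and the variable names are made up for illustration. Python's `re` module only accepts fixed-width lookbehind, so the date lookbehind is swapped for a capture group here; the lookahead pattern for INR is used as written.

```python
import re

# Sample text from the question (no blank line after the date).
sample = "15-Apr-2021\nFirst Name\nEmail:xyz"

# Capture-group stand-in for (?<=\d+-\w+-\d+\r?\n).*
# Group 1 is whatever follows the date line, up to the next newline.
name_match = re.search(r"\d+-\w+-\d+\r?\n(.*)", sample)
first_name = name_match.group(1) if name_match else None

# The lookahead pattern works unchanged: a word followed by
# one whitespace character and then digits, dots or commas.
comp = "Compensation INR 5,45,000"
currency_match = re.search(r"\w+(?=\s[\d.,]+)", comp)
currency = currency_match.group(0) if currency_match else None

print(first_name)
print(currency)
```

On this sample the first search picks up the line after the date and the second picks up the word just before the amount.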

Hi @prasath17,
Thanks for the suggestion, but I could not get the string “First Name” using that regex.

15-Apr-2021

First Name
Email:xyz

At run time the text comes through with a blank line after the date, as shown above, so the regex is not working.

@Muthulakshmi_Thangamuthu … Please give a correct sample text, or use .Trim at the end.

Firstname = regvar(0).Value.ToString.Trim
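
A small sketch of where trimming helps, in Python. Assumption: the PDF text uses Windows line endings (`\r\n`), which is not confirmed in the thread. The dot in `.*` stops at `\n` but still matches `\r`, so the raw match can carry a trailing carriage return. A capture group stands in for the lookbehind because Python's `re` needs fixed-width lookbehind, and `strip()` plays the role of `.Trim`.

```python
import re

# Same sample as in the question, but with \r\n line endings (an assumption).
sample = "15-Apr-2021\r\nFirst Name\r\nEmail:xyz"

# Group 1 is the line after the date; "." does not match "\n" but does match "\r".
match = re.search(r"\d+-\w+-\d+\r?\n(.*)", sample)
raw = match.group(1) if match else ""

# Strip leading and trailing whitespace, including the stray "\r".
first_name = raw.strip()

print(repr(raw))
print(repr(first_name))
```

Trimming only cleans whitespace around a match; it does not help if the match itself landed on an empty line.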

Hi @Muthulakshmi_Thangamuthu

Try this to get only the first name:

Regex: ^F.\w+.[A-Z]\w+

Thanks.

But this is not your original requirement, right? I.e., a blank line after the date…?

15-Apr-2021

First Name
Email:xyz

Hi, this is the actual requirement.

@Muthulakshmi_Thangamuthu - Please check this pattern…

(?<=\d+-\w+-\d+\s+\r?\n).*

Pattern Link

Hope this helps…
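
To see why the extra `\s+` matters, here is a rough Python comparison on the sample with the blank line. Assumptions: a capture group replaces the lookbehind (Python's `re` only allows fixed-width lookbehind), and the `\s*` variant at the end is my own variation, not something from the thread.

```python
import re

# Sample with a blank line between the date and the name.
sample = "15-Apr-2021\n\nFirst Name\nEmail:xyz"

# First pattern: the name must start right after the date's newline,
# so group 1 lands on the empty line.
strict = re.search(r"\d+-\w+-\d+\r?\n(.*)", sample)
strict_name = strict.group(1) if strict else None

# Updated pattern: \s+ absorbs the extra line break before the final newline.
tolerant = re.search(r"\d+-\w+-\d+\s+\r?\n(.*)", sample)
tolerant_name = tolerant.group(1) if tolerant else None

# Hypothetical variation: \s* accepts text with or without the blank line.
flexible = re.compile(r"\d+-\w+-\d+\s*\r?\n(.*)")
with_blank = flexible.search(sample).group(1)
without_blank = flexible.search("15-Apr-2021\nFirst Name\nEmail:xyz").group(1)

print(repr(strict_name), repr(tolerant_name), repr(with_blank), repr(without_blank))
```

Note that the `\s+` form expects at least one extra whitespace character after the date, so it is meant for the blank-line layout rather than the original one.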

Thanks @prasath17

Your Annual Compensation would be INR 5,45,000/- (Amount in Words)

For INR as well, the previous expression matches many words.
Here I want to get only the “INR”.

@Muthulakshmi_Thangamuthu - If there is a space between INR and the amount, the above pattern works; or change \w+ to \w{3}…

I just tested again, and it extracts only the INR…

Please show us what you have tried and how it is selecting “many words”?
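
A quick Python check of the INR patterns against the sentence from the thread. The `page` string is hypothetical text invented only to show one way extra words could get picked up when other "word number" pairs appear in the document; the uppercase-only pattern is likewise an assumption, not something suggested above.

```python
import re

sentence = "Your Annual Compensation would be INR 5,45,000/- (Amount in Words)"

# Hypothetical surrounding text, invented for illustration only.
page = "Page 2 of 5. " + sentence

loose = r"\w+(?=\s[\d.,]+)"           # pattern from the thread
three = r"\w{3}(?=\s[\d.,]+)"         # \w{3} variant from the thread
upper = r"\b[A-Z]{3}\b(?=\s[\d.,]+)"  # assumption: code is 3 uppercase letters

on_sentence_loose = re.findall(loose, sentence)
on_sentence_three = re.findall(three, sentence)
on_page_loose = re.findall(loose, page)
on_page_upper = re.findall(upper, page)

print(on_sentence_loose, on_sentence_three)
print(on_page_loose, on_page_upper)
```

On the single sentence both thread patterns return only INR. The extra matches show up once other words are followed by a space and a number, which is why seeing the full input text would help.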