Hi,
I'm trying to extract data from a PDF file using a regex-based extractor.
How can I extract specific data that sits between two dynamic strings?
15-Apr-2021 → (Dynamic Date)
First Name
Email:xyz
From the above, I want to extract only the “First Name”.
There is another case where I want to extract the data before the number:
Compensation INR 5,45,000 → (Dynamic amount)
Here I need only the INR.
Please help me extract these using a regular expression.
Thanks
@Muthulakshmi_Thangamuthu - For the first name.
Regex Pattern :
(?<=\d+-\w+-\d+\r?\n).*
- For INR
Regex Pattern:
\w+(?=\s[\d.,]+)
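For readers who want to try the two patterns outside the extractor, here is a minimal Python sketch. It is an adaptation, not the activity's own code: Python's `re` module only accepts fixed-width lookbehind, so the date is matched normally and the next line is captured in a group instead of using `(?<=...)`. The function names and the sample string are illustrative assumptions.

```python
import re

# Sample text from the question, assuming a single line break after the date.
sample = "15-Apr-2021\nFirst Name\nEmail:xyz"

# The .NET pattern uses a variable-width lookbehind, which Python's `re`
# rejects, so the date is consumed and the following line goes into group 1.
NAME_AFTER_DATE = re.compile(r"\d+-\w+-\d+\r?\n(.*)")

# The lookahead pattern from the answer works unchanged in Python.
WORD_BEFORE_AMOUNT = re.compile(r"\w+(?=\s[\d.,]+)")


def extract_name(text):
    """Return the line that directly follows the date, or None."""
    m = NAME_AFTER_DATE.search(text)
    # strip() removes a stray "\r" or trailing spaces from the captured line.
    return m.group(1).strip() if m else None


def extract_currency(text):
    """Return the first word that is followed by a space and an amount."""
    m = WORD_BEFORE_AMOUNT.search(text)
    return m.group(0) if m else None


print(extract_name(sample))                           # First Name
print(extract_currency("Compensation INR 5,45,000"))  # INR
```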
Hi @prasath17,
Thanks for the suggestion, but I could not get the string “First Name” using the regex.
15-Apr-2021
First Name
Email:xyz
At run time the text comes through with an extra line break after the date, as in the sample above, so the regex does not work.
@Muthulakshmi_Thangamuthu … please provide an accurate sample text, or use .Trim at the end.
Firstname = regvar(0).Value.ToString.Trim
Hi @Muthulakshmi_Thangamuthu
Try this to get only the first name,
Regex: ^F.\w+.[A-Z]\w+
Thanks.
But this is not your original requirement, right? I.e., the blank line after the date…?
15-Apr-2021
First Name
Email:xyz
Hi, this is the actual requirement.
@Muthulakshmi_Thangamuthu - Please check this pattern…
(?<=\d+-\w+-\d+\s+\r?\n).*
Hope this helps…
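To see why the extra `\s+` matters, here is a small Python sketch comparing a strict and a tolerant version. Both are adaptations with a capture group, since Python's `re` does not allow the variable-width lookbehind used above. The tolerant pattern and the function names are my own assumptions, not something posted in the thread.

```python
import re

# Strict: mirrors the pattern above. It needs at least one whitespace
# character and then a line break, i.e. trailing spaces or a blank line.
STRICT = re.compile(r"\d+-\w+-\d+\s+\r?\n(.*)")

# Tolerant (hypothetical variant): skip any run of whitespace, including
# blank lines, and capture the next line that starts with a non-space.
TOLERANT = re.compile(r"\d+-\w+-\d+\s+(\S.*)")


def name_strict(text):
    m = STRICT.search(text)
    return m.group(1).strip() if m else None


def name_tolerant(text):
    m = TOLERANT.search(text)
    return m.group(1).strip() if m else None


single_break = "15-Apr-2021\nFirst Name\nEmail:xyz"
blank_line = "15-Apr-2021\n\nFirst Name\nEmail:xyz"

print(name_strict(single_break))    # None
print(name_strict(blank_line))      # First Name
print(name_tolerant(single_break))  # First Name
print(name_tolerant(blank_line))    # First Name
```

The tolerant form trades precision for robustness: it would also accept a name on the same line as the date, which may or may not be acceptable for a given document.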
Thanks @prasath17
Your Annual Compensation would be INR 5,45,000/- (Amount in Words)
For INR, the previous expression also matches many other words.
Here I want to get only the “INR”.
@Muthulakshmi_Thangamuthu - if there is a space between INR and the amount, then the above pattern works, or change \w+ to \w{3} …
I just tested again, and it extracts only the INR…
Please show us what you have tried and how it is selecting “many words”?
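One possible explanation for the “many words” result, offered as a guess: when the pattern runs over the whole document, every word that precedes a number qualifies, not just the currency code. The Python sketch below illustrates this with a made-up first sentence (the "Joining date" text is hypothetical; only the compensation sentence comes from the thread). The tighter pattern assumes the currency code is exactly three uppercase letters.

```python
import re

# First sentence is a hypothetical example of other text in the PDF;
# the second sentence is the one quoted in the thread.
text = (
    "Joining date 15 April 2021. "
    "Your Annual Compensation would be INR 5,45,000/- (Amount in Words)"
)

# Pattern from the thread: any word followed by whitespace and an amount.
LOOSE = re.compile(r"\w+(?=\s[\d.,]+)")

# Tighter variant (assumption): a standalone three-letter uppercase code
# that is followed by optional whitespace and a digit.
TIGHT = re.compile(r"\b[A-Z]{3}\b(?=\s*\d)")

loose_hits = LOOSE.findall(text)
tight_hits = TIGHT.findall(text)

print(loose_hits)  # ['date', 'April', 'INR']
print(tight_hits)  # ['INR']
```

If the document can contain other three-letter uppercase words before numbers, anchoring on a nearby label such as "Compensation" would be a safer choice than the character class alone.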