I am using Document Understanding’s Regex Extractor to extract the “total” from multiple PDFs. Every PDF contains multiple copies of the same page, which I cannot remove. The trouble is that the Regex Extractor extracts every instance of “total” from all the pages, so a 3-page PDF returns 3 duplicates of “total”. I only want the value from the first page, not from the other pages. Any help is appreciated!
Hi @shrey.shah
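Try the expression below:
System.Text.RegularExpressions.Regex.Match("Inputstring","Your Pattern").Groups(1).ToString
Groups(1) → returns the first capture group of the match, so the pattern needs a pair of parentheses around the value you want. Regex.Match itself returns only the first match in the input string.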
Regards
Gokul
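To illustrate the difference between taking the first match and collecting every match, here is a minimal sketch in Python, used only as a stand-in for the .NET regex engine: `re.search` behaves like `Regex.Match` (first match only) and `re.findall` like `Regex.Matches`. The sample text, the form-feed page separator, and the pattern are assumptions for illustration, not taken from the actual PDFs, and the sketch says nothing about how the Regex Extractor activity itself applies a pattern.

```python
import re

# Hypothetical text of a 3-page PDF whose pages are copies of each other.
page = "Invoice\nTotal: Rs. 12,500\n"
document = "\f".join([page] * 3)

# Assumed pattern: "Rs." followed by digits and commas, amount in group 1.
pattern = r"Rs\.\s*([\d,]+)"

all_totals = re.findall(pattern, document)  # one hit per page copy
first = re.search(pattern, document)        # stops at the first hit
first_total = first.group(1) if first else None

print(all_totals)
print(first_total)
```

Here `all_totals` holds one value per page copy, while `first_total` holds only the value from the first page.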
Thank you @Gokul001 for your time. I am using the Regex Extractor from the Document Understanding package, so where do I insert the regex you mentioned? Sorry, but I am new to UiPath.
Ok I understood where to input the expression (in the advanced field right?). The field that I want to extract is total amount which has a pattern as “Rs. xxxxxx”. So in the expression you provided, “Your Pattern” will be substituted with “Rs.” and “Inputstring” will be substituted with?
Hi @Gokul001
This is the field I want to extract!
This is the regex builder where I am typing the regex you mentioned!
Hi @Gokul001
That is what I was doing previously as shown below:
But this is returning value from all the pages of the pdf. I only want the value from the first page!
If possible share the Input @shrey.shah
@Gokul001 By Input you mean the pdf files?
Yes @shrey.shah
Hi @shrey.shah
Have you tried selecting this option in the drop-down?
Check → Singleline
Regards
Gokul
@Gokul001 Yes it is working now. So Singleline basically extracts only the value from the first page?
Great @shrey.shah
You can use Singleline only for this particular regex pattern.
If your query is resolved, kindly close this topic by marking the solution, so that it helps others too.
Regards
Gokul
@Gokul001 Thanks a lot!
@Gokul001 Sorry to disturb again but if I select the Singleline option, then along with the amount it is also extracting other details in the page as shown below:
I tried limiting the characters but then it again extracts the value from all pages even with Singleline selected:
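A possible explanation for both results, sketched in Python as a stand-in for .NET (`re.DOTALL` is the counterpart of `RegexOptions.Singleline`): the option does not restrict matching to the first page, it only lets `.` match line breaks. A greedy `.*` then runs from the first "Rs." to the end of the document and yields a single long match, while a length-limited pattern matches once per page again. The text and both patterns below are assumptions, since the real ones are only visible in the screenshots.

```python
import re

# Hypothetical document: three identical pages.
page = "Total: Rs. 12,500\nThank you for your business\n"
document = page * 3

greedy = r"Rs\..*"        # assumed pattern without a character limit
bounded = r"Rs\..{0,7}"   # assumed pattern with a character limit

# Without the flag, "." stops at each line break: one match per page.
no_flag = re.findall(greedy, document)

# With the flag, "." crosses line breaks: one match that swallows the rest.
single_greedy = re.findall(greedy, document, re.DOTALL)

# With the flag and a length limit: one short match per page again.
single_bounded = re.findall(bounded, document, re.DOTALL)

print(len(no_flag), len(single_greedy), len(single_bounded))
```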
Hi @shrey.shah
In this case, use string manipulation to extract the particular amount.
Can you share the data you get after extracting with the Regex Extractor?
Regards
Gokul
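One way such string manipulation could look, sketched in Python for illustration only; in UiPath the same idea would be written as a `Regex.Match` or string expression in an Assign activity. The extracted text and the helper name `first_amount` are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional


def first_amount(extracted: str) -> Optional[Decimal]:
    """Return the first 'Rs.' amount found in the extracted text, or None."""
    match = re.search(r"Rs\.\s*(\d[\d,]*(?:\.\d+)?)", extracted)
    if match is None:
        return None
    # Drop thousands separators before converting to a number.
    return Decimal(match.group(1).replace(",", ""))


# Hypothetical over-captured value: the amount plus other page details.
extracted = "Rs. 12,500\nThank you for your business\nTotal: Rs. 12,500\n"
print(first_amount(extracted))
```

Taking only the first match discards both the extra page details and the repeated totals from the duplicate pages.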
@Gokul001 I have uploaded the image of the data extracted for both the scenarios (Singleline+no character limit) and (Singleline + character limit) in my previous reply