Is there any way to get only the highlighted part from a PDF?


@Needhelp, sure, but I need to know a couple of things first.

What on that line, or at least in that region, doesn’t change? I need something to use as an anchor, and it must be present in all the documents you need to extract this from.

I tried tagged reading and Anchor Base with Find Element and Get Text. However, when I moved to another PDF that looks the same but has the words positioned differently, it didn't work.

Is there any alternative solution?

@Needhelp, there are many ways to do this. I wanted to write a regex that will help you get the text, but I need you to answer the question I asked.


I am not sure what is in the region. However, I have 2 examples.

Am I correct in saying that on the second page we want “ABC Company Pte Ltd”? If so, you can already see that the text we want from both pages begins with 3 uppercase letters. If that’s the same across all documents, then we can get it using regex. I am going to send you the solution shortly.


@Needhelp, so now you need to use the Read PDF activity. The output of that activity will be everything from the document, saved in a string variable. Then use a Matches activity: the input is the string variable from Read PDF and the expression is: ^([A-Z]{3}.*) The output will be a collection, so use a For Each activity and pass in the collection variable from the Matches activity. You can test that it's working by putting a Message Box activity inside the For Each and passing in “item”. It should only print out the line you want. Let me know if you need further assistance.
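To try the expression outside the workflow, here is a minimal sketch in Python. The sample text is made up and only stands in for whatever Read PDF returns. One thing to watch: ^ only matches at the start of each line when the multiline option is on (re.MULTILINE here; in .NET regex, which UiPath uses, the equivalent is RegexOptions.Multiline). Without it, ^ matches only at the very start of the whole string.

```python
import re

# Hypothetical text standing in for the string returned by Read PDF.
pdf_text = """Invoice No: 12345
Bill to:
ABC Company Pte Ltd
Date: 01/02/2020
"""

# Same expression as in the post. MULTILINE lets ^ match at the start of every line.
pattern = re.compile(r"^([A-Z]{3}.*)", re.MULTILINE)

# Equivalent of looping over the Matches output with For Each.
matches = [m.group(1) for m in pattern.finditer(pdf_text)]
for item in matches:
    print(item)
```

With this sample only the company line is returned, because it is the only line that starts with 3 uppercase letters.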


If the 3 letters change from PDF to PDF (e.g., XYZ may be VWXYZ), will this regex still work?

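@Needhelp, it depends. The expression only requires 3 uppercase letters at the beginning of the line, so a longer prefix such as VWXYZ would still match, because the .* picks up the remaining characters. It wouldn’t work if the line starts with fewer than 3 uppercase letters, and it would also pick up any other line that happens to start with 3 uppercase letters. Gather a bigger sample of your documents and see how they all look. If they vary a lot, just comment back here so we can modify the expression.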
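If the prefix length does vary, one possible modification is to accept any run of 2 or more uppercase letters that ends at a word boundary. This is only a sketch in Python: the sample lines are hypothetical, and the minimum of 2 letters is an assumption that should be checked against the real documents.

```python
import re

# Hypothetical lines; the real documents may look different.
pdf_text = """Invoice No: 12345
ABC Company Pte Ltd
Date: 01/02/2020
VWXYZ Holdings Pte Ltd
"""

# [A-Z]{2,} : two or more uppercase letters at the start of the line
# \b        : the uppercase run must end at a word boundary
# .*        : the rest of the line
pattern = re.compile(r"^([A-Z]{2,}\b.*)", re.MULTILINE)

results = [m.group(1) for m in pattern.finditer(pdf_text)]
print(results)
```

The word boundary keeps lines like "Invoice No" or "Date" out. A looser pattern like this can still catch unrelated lines that start with an uppercase word, so it is worth testing against a larger sample.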

