Getting PDF text from a specific position


Is there any way to get only the highlighted part from a PDF?


@Needhelp, sure, but I need to know a couple of things first.

What on that line, or at least in that region, doesn’t change? I need something to use as an anchor, and it must be present in every document you need to extract this from.

I tried tagged reading, and an anchor base to find the element and get its text. However, when I move to another PDF that looks the same but has the words positioned differently, it doesn’t work.

Is there any alternative solution?

@Needhelp, there are many ways to do this. I wanted to write a regex that will help you get the text, but I need you to answer the question I asked.


I am not sure what stays the same in that region. However, I have 2 examples.

Am I correct in saying that on the second page we want “ABC Company Pte Ltd”? If so, you can already see that the text we want on both pages begins with 3 uppercase letters. If that’s the same across all documents, then we can get it using regex. I am going to send you the solution shortly.


@Needhelp, so now you need to use the Read PDF activity. Its output is everything from the document, saved in a string variable. Then use a Matches activity: the input is the string variable from Read PDF, and the expression is ^([A-Z]{3}.*). The output will be a collection. Use a For Each activity, pass in the collection variable from the Matches activity, and test that it’s working by putting a Message Box activity inside the For Each and passing in “item”. It will only print out the line you want. Let me know if you need further assistance.
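For anyone who wants to try the expression outside the workflow, here is a minimal sketch in Python rather than in the activities themselves. The sample text and the function name find_lines are made up for illustration and stand in for the string that comes out of the read step. One thing to watch: in Python, ^ only matches at the start of every line when the multiline flag is set. Whether the Matches activity needs an equivalent option is an assumption worth checking in its properties.

```python
import re

# Hypothetical stand-in for the string produced by the PDF read step.
pdf_text = "\n".join([
    "Invoice No: 12345",
    "ABC Company Pte Ltd",
    "Date: 01 Jan 2020",
    "Total due: 100.00",
])

# Same expression as in the post. re.MULTILINE lets ^ match at the
# start of each line instead of only at the start of the whole text.
LINE_PATTERN = re.compile(r"^([A-Z]{3}.*)", re.MULTILINE)


def find_lines(text):
    """Return every line that begins with three uppercase letters."""
    return [match.group(1) for match in LINE_PATTERN.finditer(text)]


# Equivalent of the For Each + Message Box test: show each match.
for item in find_lines(pdf_text):
    print(item)
```

Without the multiline flag, the same expression finds nothing in this sample, because the text as a whole starts with “Invoice”.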


Thanks a lot.

If the 3 letters change from PDF to PDF (e.g. XYZ may be VWXYZ), will this regex still work?

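@Needhelp, only partly. The expression only checks that the line begins with 3 uppercase letters, so a longer uppercase prefix such as VWXYZ would still match, but a prefix shorter than 3 letters would not, and any other line that happens to start with 3 uppercase letters would be picked up too. Gather a bigger sample of your documents and see what they all look like. If they change a lot, just comment back here so we can modify the expression.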
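If the uppercase prefix does vary in length, one possible modification is to require a whole uppercase word instead of exactly 3 letters. The sketch below is in Python, and the pattern, the sample lines, and the name find_company_lines are assumptions for illustration, not something confirmed against the real documents. It accepts any line whose first word is 2 or more uppercase letters.

```python
import re

# Assumed variant: the first word must be entirely uppercase and at
# least 2 letters long. \b stops a match such as "ABCdef".
FLEXIBLE_PATTERN = re.compile(r"^([A-Z]{2,}\b.*)", re.MULTILINE)


def find_company_lines(text):
    """Return lines whose first word is 2+ uppercase letters."""
    return [m.group(1) for m in FLEXIBLE_PATTERN.finditer(text)]


# Made-up sample lines covering prefixes of different lengths.
samples = "\n".join([
    "Invoice No: 12345",
    "XYZ Company Pte Ltd",
    "Date: 01 Jan 2020",
    "VWXYZ Trading Pte Ltd",
    "Abc lowercase line",
])

print(find_company_lines(samples))
```

This is still loose: an all-caps heading such as “TOTAL” would match too, so it is worth testing against a bigger sample before relying on it.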

@Needhelp, and if this solved your problem, please mark it as the solution.