How to get desired text from a document


I want to extract a person's name from invoices. I used the Get Text activity and indicated a target and an anchor for it, but the anchor text differs from invoice to invoice.

Below I have added some samples.




How can I extract the name?


You can do it like this:
first use Read PDF Text,
or Read PDF With OCR.
Then add an Assign activity with a String variable and
use the expression below:

System.Text.RegularExpressions.Regex.Match(readpdftextoutput, "(?<=Name:).*").Value.Trim
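
Since the label in front of the name can differ between documents, the same regex idea can be extended with an alternation of possible labels. Here is a minimal sketch in Python rather than in a UiPath expression, just to show the pattern. The label list and the sample text are assumptions and are not taken from the attached payslips. The same alternation should port to the .NET Regex class used in the Assign activity.

```python
import re

# Hypothetical labels that might precede the name; adjust to the real documents.
LABELS = ["Employee Name", "Emp Name", "Name"]

# Match any known label, an optional colon, then capture the rest of the line.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(label) for label in LABELS) + r")\s*:?\s*(.+)",
    re.IGNORECASE,
)


def extract_name(text):
    """Return the text after the first matching label, or None if no label is found."""
    match = PATTERN.search(text)
    return match.group(1).strip() if match else None


# Made-up sample text standing in for the Read PDF Text output.
sample = "Payslip for June 2023\nEmployee Name: John Smith\nDepartment: Finance"
print(extract_name(sample))
```

Longer labels are listed first so that "Employee Name" is tried before the shorter "Name".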

Please share an input file so I can clarify further.

Actually, I am opening those documents in the web portal itself. So I used the “Use Application” activity, and I want to extract the name from there.

IN1817_Payslip_Jul2023.pdf (112.9 KB)
Payslip - June 2023 (English).pdf (109.2 KB)
Payslip_Dec_2022.pdf (9.6 KB)


Use the Read PDF Text activity,
then write the output to a text file.
Then you can apply the regex described above to whichever fields you need to extract.

These documents are not downloaded; we just open them from a website. So can I still use the Read PDF Text activity?

The documents you sent are PDFs, right?

Of course. You asked for a sample for reference, so I sent PDFs, but the customer actually uploads these documents to a website. We do not download them or open them as PDF files. Whenever we click a document name, the document opens in a new browser tab on the website. So we need to extract the data from there, as shown in the screenshots I shared earlier.

OK, try the Read PDF Text activity once inside the Use Application activity.

Oh OK, got it.
Then try the Anchor Base activity and check.

anchor must be

I used the modern “Get Text” activity and indicated both the target and the anchor. But whenever we get the second type of document, the element is not recognised because the anchor text has changed, as I mentioned in my earlier post. The target is fine; only the anchor text changes between documents. So I am asking how to make the anchor text dynamic.

Take the anchor selector and pass a variable into it.
The variable holds the anchor text in whichever form it appears in that document.

Can you explain that in more detail, please?

Please share the selector of the anchor.

Use Get Text to read the anchor value you want and store it in a variable.
Then pass that variable into the anchor selector.

In my first post I shared screenshots of the target and anchor, and also the PDF documents. Please check them once. We want to extract the person's name, which is our target, and the anchor is the label beside the name. Please take a look and let me know the solution.

Please share the anchor's selector. You will find it in the Edit Selector panel.

Or you can try screen scraping with the Get Full Text activity and then split out the field you need.
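
The split step could look roughly like the sketch below. It is written in Python only to show the logic; in the workflow the same steps would be done with string operations on the Get Full Text output. The label list and the sample text are assumptions, not values from the real documents.

```python
def name_from_full_text(full_text, labels=("Employee Name", "Emp Name", "Name")):
    """Scan scraped text line by line and return the value after the first known label."""
    for line in full_text.splitlines():
        stripped = line.strip()
        for label in labels:
            if stripped.lower().startswith(label.lower()):
                # Keep whatever follows the label, dropping a leading colon or spaces.
                return stripped[len(label):].lstrip(" :\t").strip()
    return None


# Made-up text standing in for the scraped page content.
scraped = "Payslip\nEmp Name : Asha Rao\nNet Pay: 100"
print(name_from_full_text(scraped))
```

Matching on the start of each line avoids depending on a single fixed anchor, but it would also pick up any other line that happens to begin with one of the labels, so the list needs to be checked against the real documents.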

Please find the anchor selectors.

Not this. You used the Get Text activity, right?
In that activity, click the three lines in the top-right corner, then click Edit Selector.