Hi,
I want to extract a person's name from invoices. I used Get Text and indicated the target and anchor, but the anchor text is different on each invoice.
I have added some samples below.
How can I extract the name?
Regards,
You can first use the Read PDF Text activity (or Read PDF With OCR), then add an Assign activity that sets a String variable using the expression below:
System.Text.RegularExpressions.Regex.Match(readpdftextoutput, "(?<=Name:).*").Value.Trim
Please share an input file for more clarification.
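To show what that expression does (the lookbehind anchors on "Name:", `.*` takes the rest of that line, and Trim removes surrounding spaces), here is the same logic as a small Python sketch. It is only an illustration outside UiPath; in the workflow you would keep the .NET expression. The function name and the sample text are hypothetical:

```python
import re


def extract_after_label(text: str, label: str = "Name:") -> str:
    """Return the rest of the first line after `label`, trimmed.

    Mirrors Regex.Match(text, "(?<=Name:).*").Value.Trim: the lookbehind
    anchors on the label and `.` stops at the end of the line.
    """
    match = re.search("(?<=" + re.escape(label) + ").*", text)
    # .NET's Match(...).Value is "" when nothing matches; mimic that here.
    return match.group(0).strip() if match else ""


# Hypothetical text standing in for the Read PDF Text output.
sample = "Payslip for July 2023\nName: John Smith\nEmployee ID: 1234\n"
print(extract_after_label(sample))  # John Smith
```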
Actually, I am opening those documents in the web portal itself, so I used the “Use Application” activity, and I want to extract the name from there.
IN1817_Payslip_Jul2023.pdf (112.9 KB)
Payslip - June 2023 (English).pdf (109.2 KB)
Payslip_Dec_2022.pdf (9.6 KB)
Regards,
Use the Read PDF Text activity,
then use Write Text File to save the output,
and then apply a regex, as described above, for whichever fields you need to extract.
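A rough Python sketch of that three-step flow, assuming the PDF text has already been read into a string. The temporary file stands in for the Write Text File output (mainly useful for inspecting which labels the text actually contains), and the `net_pay` pattern is a hypothetical example of an extra field; the labels must be adjusted to the real documents:

```python
import re
import tempfile
from pathlib import Path

# Field -> regex with one capture group. "net_pay" is a hypothetical extra field.
FIELD_PATTERNS = {
    "name": r"Name:[ \t]*(.+)",
    "net_pay": r"Net Pay:[ \t]*(.+)",
}


def extract_fields(text: str, patterns: dict) -> dict:
    """Apply each pattern and return the first captured value per field ("" if absent)."""
    results = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        results[field] = match.group(1).strip() if match else ""
    return results


# Step 1 stand-in: text that Read PDF Text might return (made up).
pdf_text = "Payslip\nName: Jane Doe\nNet Pay: 1000\n"

with tempfile.TemporaryDirectory() as tmp:
    # Step 2: write the text to a file.
    path = Path(tmp) / "payslip.txt"
    path.write_text(pdf_text, encoding="utf-8")
    # Step 3: read it back and run the regex for each field.
    fields = extract_fields(path.read_text(encoding="utf-8"), FIELD_PATTERNS)

print(fields)  # {'name': 'Jane Doe', 'net_pay': '1000'}
```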
Actually, these documents are not downloaded; we are just opening them from a website. So can I still use the Read PDF Text activity?
The documents you sent are PDFs, right?
Of course. You asked for samples for reference, so I sent them as PDFs, but the customer actually uploads these documents to a website. We are not downloading them or opening them as PDF files. Whenever we click the document name, it opens in a new tab, and that too in a browser. So we need to extract the data from the page, as shown in the screenshots I shared earlier.
OK, then try the Read PDF Text activity inside the Use Application activity once.
Oh OK, got it.
Then try using the Anchor Base activity and check.
The anchor must be one of these labels:
Name:|EMPNAME|SURNAME AND NAME
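If you go the regex route on the text instead, the same "any of these labels" idea can be applied with an alternation. A minimal Python sketch, assuming the three labels from the thread are the only variants and that the name sits on the same line as its label (the sample lines are made up). Python's `re` only accepts fixed-width lookbehind, so instead of `(?<=Name:|EMPNAME|...)` this version matches the label and captures what follows it:

```python
import re

# Label variants taken from the thread; extend the list if other formats appear.
ANCHOR_LABELS = ["Name:", "EMPNAME", "SURNAME AND NAME"]

# Any one label, then optional spaces/colon, then capture the rest of the line.
NAME_PATTERN = re.compile(
    "(?:" + "|".join(re.escape(label) for label in ANCHOR_LABELS) + r")[ \t]*:?[ \t]*(.+)"
)


def extract_name(text: str) -> str:
    """Return the value after the first matching label, or "" if none is found."""
    match = NAME_PATTERN.search(text)
    return match.group(1).strip() if match else ""


# Hypothetical lines imitating the three document layouts.
for line in ["Name: John Smith", "EMPNAME : Ravi Kumar", "SURNAME AND NAME   Doe Jane"]:
    print(extract_name(line))
```

Note that a label such as "Company Name:" would also match "Name:", so the patterns should be checked against the real documents.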
Actually, I used the modern Get Text activity and indicated both the target and the anchor. But whenever we get the second type of document, the element is not recognized because the anchor text changes, as I mentioned in my earlier post. The target itself is fine; only the anchor text differs between documents. So I am asking how to make the anchor text dynamic.
Take the anchor selector and pass a variable into it.
The variable should contain the anchor text in whichever format it appears on that document.
Can you explain this in more detail, please?
Please share the selector of the anchor.
Use Get Text to read the value of the anchor you want and store it in a variable.
Then pass that variable into the anchor selector.
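A minimal sketch of the "variable in the anchor selector" idea, written in Python only to show the string logic: detect which label variant the page contains, then build the anchor selector from it. The selector shape (`webctrl`, `tag='SPAN'`, `aaname`) is a hypothetical assumption for illustration; the real tag and attributes should be copied from Edit Selector. In UiPath the equivalent would be a String variable concatenated into the selector:

```python
from typing import Optional

# Label variants from the thread, in the order they should be tried.
ANCHOR_LABELS = ["Name:", "EMPNAME", "SURNAME AND NAME"]


def find_anchor_label(page_text: str) -> Optional[str]:
    """Return the first label variant present in the page text, or None."""
    for label in ANCHOR_LABELS:
        if label in page_text:
            return label
    return None


def build_anchor_selector(label: str) -> str:
    # Hypothetical selector shape; tag/attribute names must come from the real page.
    return "<webctrl tag='SPAN' aaname='" + label + "' />"


page_text = "Payslip Dec 2022\nEMPNAME : Ravi Kumar\n"  # made-up page text
label = find_anchor_label(page_text)
if label is not None:
    print(build_anchor_selector(label))  # <webctrl tag='SPAN' aaname='EMPNAME' />
```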
In my first post I shared screenshots of the target and anchor, as well as the PDF documents; please check them once. We want to extract the person's name, so the name is our target and the anchor is the label next to the name. Please take a look and let me know a solution for this.
Please share the selector of the anchor; you can find it in the Edit Selector panel.
Alternatively, you can try screen scraping with the Get Full Text activity and then split out the field you need.
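Get Full Text ends with plain text, so the "split" step could look like this Python sketch. It assumes each label starts its own line and that the name follows either on the same line or, when that is empty, on the next line (common in scraped tables); the function name and sample text are hypothetical:

```python
DEFAULT_LABELS = ("Name:", "EMPNAME", "SURNAME AND NAME")


def name_from_full_text(full_text: str, labels=DEFAULT_LABELS) -> str:
    """Split the scraped text into lines and return the value after the first label found."""
    lines = [line.strip() for line in full_text.splitlines()]
    for i, line in enumerate(lines):
        for label in labels:
            if line.startswith(label):
                # Drop the label and an optional separator colon, e.g. "EMPNAME : X" -> "X".
                value = line[len(label):].strip().lstrip(":").strip()
                # If the label stands alone, take the following line as the value.
                if not value and i + 1 < len(lines):
                    value = lines[i + 1]
                if value:
                    return value
    return ""


print(name_from_full_text("Payslip\nSURNAME AND NAME\nDoe Jane\n"))  # Doe Jane
```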
Not this one. You used the Get Text activity, right?
In that activity, click the three-line menu in the top-right corner and then click Edit Selector.