Can't extract amount from Invoice PDF using Get Text

I am trying to extract the total amount to be paid (“Total de plata”) from the attached invoice: Factură 77179908 10.07.2019.PDF (55.3 KB)

This is the selector I use:

and UiPath highlights it correctly:

but when I run the workflow, the value “5.961,47” is not extracted.

Any idea what I should do differently?

Hi Cezar,

If it’s only a single value that you need to extract, I suggest using Substring or Regex to retrieve it.
You can test the Regex combinations here.

Here is why I prefer Regex/Substring:

  1. The process runs in the background.
  2. There are no environment issues with Adobe…

If you need more details, let me know.

Vasile.


Hi @Cardon_Cezar

I created a sample project based on Mr. Wasea’s advice.
Maybe you can use it as a guideline.

Steps:
1. Get the current project directory
2. Get the PDF file path
3. Use a For Each loop to read the data from each PDF file
4. Use Regex to get Total de plata, Total, TVA 19%
5. Check the output value

*You need to install [UiPath.PDF.Activities.ReadPDFText] to use the Read Full PDF activities.
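
For step 4, a minimal sketch of the Regex idea in VB.NET (the language used for UiPath expressions) could look like the following. The helper name `ExtractAfterLabel` and the sample text are hypothetical, and the attached sample project may do this differently. It assumes each value sits on the same line as its label, for example "Total de plata: 5.961,47".

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module InvoiceRegexSketch
    ' Hypothetical helper: returns the amount that follows a label at the
    ' start of a line, or an empty string when the label is not found.
    Function ExtractAfterLabel(pdfText As String, label As String) As String
        ' Regex.Escape keeps the label literal; ([\d.,]+) captures digits,
        ' dots and commas, so it stops at the end of the amount.
        Dim pattern As String = "^\s*" & Regex.Escape(label) & "\s*([\d.,]+)"
        Dim m As Match = Regex.Match(pdfText, pattern, RegexOptions.Multiline)
        Return If(m.Success, m.Groups(1).Value, String.Empty)
    End Function

    Sub Main()
        ' Made-up sample text; only the last line comes from the invoice in this thread.
        Dim sample As String = "Some header line" & Environment.NewLine &
                               "Total de plata: 5.961,47" & Environment.NewLine

        Console.WriteLine(ExtractAfterLabel(sample, "Total de plata:")) ' prints 5.961,47
    End Sub
End Module
```

Inside a workflow, the same pattern could go directly into an Assign expression, assuming a String variable (here called `PDF_text`) holds the output of the PDF read activity: `System.Text.RegularExpressions.Regex.Match(PDF_text, "Total de plata:\s*([\d.,]+)").Groups(1).Value`.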



Get_Text_fromPDF.zip (71.9 KB)

I hope it will be useful to you :blush:


Hi @Natapong

Thank you for your help.
I installed the package UiPath.PDF.Activities


… but I still can’t use the workflow:

Hi @wasea

Unfortunately I have almost zero programming knowledge, so I can’t really use Regex.

What would the Substring expression look like?

@Cardon_Cezar If you have installed the PDF activities, I suggest reading the input PDF file with the Read PDF Text activity and then writing the result to a Message Box or Write Line to check whether the value appears in it. If it does, we can then use Regex or string manipulation to get the value you need.


Hi @Cardon_Cezar

I created a new sample project compatible with your activity version.
Please try to open it again.

Get_Text_fromPDF.zip (75.3 KB)


Hi Cezar,

In addition to what @Natapong did, here is an approximate Substring expression that gets the same values:

Total_de_plata = PDF_text.Substring(PDF_text.IndexOf("plata:") + "plata:".Length).Split(Environment.NewLine.ToCharArray())(0).Trim()

Total = PDF_text.Substring(PDF_text.IndexOf("Total:") + "Total:".Length).Split(Environment.NewLine.ToCharArray())(0).Trim()

TVA19 = PDF_text.Substring(PDF_text.IndexOf("TVA 19%:") + "TVA 19%:".Length).Split(Environment.NewLine.ToCharArray())(0).Trim()

I hope it helps.

Vasile.


Hi @supermanPunch

I used Read PDF to extract the invoice’s text and then I inserted it in the attached txt file using Write Text File.
Factura.txt (1.7 KB)

All that text is a single string. How would you suggest extracting the amount 5.961,47 in a generic way from the line “Total de plata: 5.961,47”?
As an anchor, "Total de plata: " is constant on a separate line.
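
One way to use that anchor without Regex is to go through the text line by line, find the line that starts with the anchor, and keep whatever follows it. The VB.NET sketch below is only an illustration: the helper names `AmountAfterAnchor` and `ParseAmount` are hypothetical, and it assumes the amount uses "." as the thousands separator and "," as the decimal separator, as in 5.961,47.

```vbnet
Imports System
Imports System.Globalization

Module InvoiceAnchorSketch
    ' Hypothetical helper: returns the text after the anchor on the first
    ' line that starts with it, or an empty string when no line matches.
    Function AmountAfterAnchor(text As String, anchor As String) As String
        Dim separators As Char() = {Convert.ToChar(13), Convert.ToChar(10)} ' CR and LF
        For Each currentLine As String In text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            Dim trimmed As String = currentLine.Trim()
            If trimmed.StartsWith(anchor, StringComparison.OrdinalIgnoreCase) Then
                Return trimmed.Substring(anchor.Length).Trim()
            End If
        Next
        Return String.Empty
    End Function

    ' Hypothetical helper: converts "5.961,47" to a Decimal by stating the
    ' separators explicitly instead of relying on the machine's culture.
    Function ParseAmount(raw As String) As Decimal
        Dim fmt As New NumberFormatInfo With {
            .NumberDecimalSeparator = ",",
            .NumberGroupSeparator = "."
        }
        Return Decimal.Parse(raw, NumberStyles.Number, fmt)
    End Function

    Sub Main()
        ' Made-up sample text; only the anchor line comes from this thread.
        Dim sample As String = "Some header line" & Environment.NewLine &
                               "Total de plata: 5.961,47" & Environment.NewLine

        Dim raw As String = AmountAfterAnchor(sample, "Total de plata:")
        Console.WriteLine(raw)                                                  ' prints 5.961,47
        Console.WriteLine(ParseAmount(raw).ToString(CultureInfo.InvariantCulture)) ' prints 5961.47
    End Sub
End Module
```

Keeping the raw string is enough if the value is only copied elsewhere; the conversion to Decimal matters only when the amount has to be compared or summed.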

@Cardon_Cezar Is the format of the PDF fixed? That is, do the other PDF files you want to extract the value from have the same format? If so, we can use the same Regex expression to get the value.
