Extract info from a changing PDF

Hello, I have a PDF file that has a lot of values, and I need to extract some elements. The only problem is that these elements are not always the same value; they can differ from PDF to PDF.

I need to extract the following:

  • “6569ACDCIMVZ1C”
  • “3904”
  • “PCON12096A”
  • “249”

Thanks in advance.

invoice_9.pdf (42.7 KB)

Hello @langsem, can you attach 2-3 more PDF files?
It will help us build the logic.

Yes :)

invoice_17.pdf (42.7 KB) invoice_35.pdf (42.7 KB) invoice_41.pdf (42.6 KB)

Hey @langsem,

You can use the ReadPDFText activity and store the entire result in a string variable.
Then prepare a regular expression for each field you want to extract.
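To make the regular-expression step concrete, here is a minimal sketch in Python (the same pattern idea carries over to the regex activities in a workflow). It assumes every line item in the extracted text has the shape `CODE DESCRIPTION QTY UNIT UNIT_PRICE TOTAL`, as in the `U11` lines from the attached invoices. The pattern, the group names, and `parse_line_items` are illustrative, not taken from the actual PDFs, so adjust them to the real text output.

```python
import re

# Assumed line layout: CODE DESCRIPTION QTY UNIT UNIT_PRICE TOTAL
# e.g. "U11 MONTERINGSGRUPPE 11 1,0 STK 1750,00 1750,00"
LINE_ITEM = re.compile(
    r"^(?P<code>[A-Z0-9]+)[ \t]+"      # item code, e.g. U11
    r"(?P<description>.+?)[ \t]+"      # free text, matched lazily
    r"(?P<qty>\d+,\d+)[ \t]+"          # quantity with decimal comma
    r"(?P<unit>[A-Z]+)[ \t]+"          # unit, e.g. STK
    r"(?P<unit_price>\d+,\d{2})[ \t]+" # unit price with decimal comma
    r"(?P<total>\d+,\d{2})[ \t]*$",    # line total with decimal comma
    re.MULTILINE,
)


def parse_line_items(pdf_text):
    """Return one dict per line that matches the assumed layout."""
    return [m.groupdict() for m in LINE_ITEM.finditer(pdf_text)]


sample = "U11 MONTERINGSGRUPPE 11 1,0 STK 1750,00 1750,00\n"
items = parse_line_items(sample)
print(items[0]["code"], items[0]["total"])
```

Lines that do not fit the assumed layout are simply skipped, so it is worth printing the raw extracted text first and checking how the items you need actually appear.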

Tried it, but didn't get the exact values I wanted :/

@langsem sorry for the late response (I was stuck with an urgent project).
Can you tell us what the logic is for choosing 6569ACDCIMVZ1C and PCON12096A over the others?

The other values never change, but these values do.

But if you check, the amount for U11 is changing:
invoice_35.pdf: U11 MONTERINGSGRUPPE 11 1,0 STK 1750,00 1750,00
invoice_9.pdf: U11 MONTERINGSGRUPPE 11 1,0 STK 1674,00 1674,00
invoice_17.pdf: U11 MONTERINGSGRUPPE 11 1,0 STK 1350,00 1350,00

Or do you have any logic to identify which lines to pick for processing?

I have another program which stores these values.

I mean, how can I identify that I have to process only 6569ACDCIMVZ1C and PCON12096A and not the others?

Yes, that's the issue. If the PDF doesn't contain them, I have another function for that, but if it does contain them, I need those values.
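One way to express that rule ("the other values never change, these ones do") is to keep a list of the codes that appear on every invoice and treat any other code as a value to extract. A minimal Python sketch follows. `STATIC_CODES` is a hypothetical list (only `U11` is known from this thread to appear on several invoices), and `variable_codes` is an illustrative helper name, not part of any workflow here.

```python
# Hypothetical set of item codes that appear on every invoice.
# Only U11 is confirmed in this thread; extend it from the real PDFs.
STATIC_CODES = {"U11"}


def variable_codes(codes, static_codes=STATIC_CODES):
    """Return the codes that are not in the known static set, in order."""
    return [code for code in codes if code not in static_codes]


codes_on_invoice = ["U11", "6569ACDCIMVZ1C", "PCON12096A"]
to_process = variable_codes(codes_on_invoice)

if to_process:
    print("Extract:", to_process)
else:
    # Nothing variable on this invoice: hand over to the other function.
    print("No variable codes found, use the fallback")
```

An empty result means the PDF does not contain any of the changing values, which is the case the separate fallback function is meant to handle.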

Main.xaml (22.9 KB)

Let us know if this works for you.
Adjust the image and Adobe selector as per your configuration.

update: output is

Thanks, let me check.

I just want to add a comment here: if the process changes frequently and there are no rules, check whether the process is a good fit for RPA automation or whether it needs some changes first. Often you will find that writing an RPA project for that kind of process is harder than changing the process a little.