Hi everyone, I have a set of invoices combined in one PDF file. I want to extract the invoice number from every invoice. The issue I’m facing is that the invoice formats are not consistent. Can anyone guide me on how to solve this?
I’m attaching a sample PDF below.
Perhaps you should define a selector for each type of invoice.
Hence, you could:
Open the invoice .pdf file
Use the Check App State activity to identify your type of invoice: https://docs.uipath.com/activities/docs/n-check-state . Note that each type of invoice should have its own Check App State activity and its own activity for reading/extracting the invoice number.
Finally, extract your invoice number from the .pdf.
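To make the idea concrete outside of UiPath, the steps above boil down to "detect the format, then run the extractor for that format". Here is a minimal Python sketch of that pattern. It assumes the text of each invoice has already been read from the PDF; the supplier markers and patterns in FORMATS are made up for illustration and are not taken from the sample file.

```python
import re

# Hypothetical formats: each entry pairs a marker that identifies the layout
# with a pattern that pulls the invoice number out of that layout.
FORMATS = [
    ("ACME Corp", re.compile(r"Invoice No\.\s*(\w+)")),
    ("Globex Ltd", re.compile(r"INV#\s*([\w-]+)")),
]


def extract_invoice_number(page_text):
    """Return the invoice number for the first matching format, else None."""
    for marker, pattern in FORMATS:
        if marker in page_text:  # step 1: identify the type of invoice
            match = pattern.search(page_text)  # step 2: format-specific read
            if match:
                return match.group(1)
    return None  # unknown format, needs a new entry in FORMATS
```

Every new invoice layout means one more entry in the list, which is where this approach starts to hurt once the number of formats grows.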
@Marius_Puscasu But the problem here is that I will have 40+ formats in each PDF. How can I write code to get the invoice number for all of them combined, when I don’t know which format I will get? How can I achieve this?
40 possible formats looks a bit complex to automate in the traditional way.
Do you by any chance have access to UiPath’s Document Understanding?
If so, UiPath already has some trained Machine Learning Skills focused on invoices, which should extract the invoice numbers from most of the examples you sent with a pretty good success rate.
The second option I’d use here is regex extraction. Use the Read PDF Text activity for native (digitally generated) PDFs or the Read PDF With OCR activity for scanned documents, and then identify the keywords that appear next to the invoice number to build a regex that extracts it.
This second option takes longer to build and can be more complex because of the number of variations in the invoices.
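As a rough illustration of the keyword-based regex idea, here is a small Python sketch. It assumes the PDF text has already been read into a string, and the label variants it looks for ("Invoice No", "Invoice Number", "INV#") are only guesses at what the invoices might contain, so the pattern would need tuning against the real files.

```python
import re

# Assumed label variants followed by the number itself. The lookahead
# requires at least one digit so that ordinary words are not captured.
INVOICE_NUMBER = re.compile(
    r"\binv(?:oice)?\s*"              # "Invoice" or "INV"
    r"(?:(?:number|num|no)\b|#)\s*"   # label keyword or "#"
    r"[.:#-]?\s*"                     # optional separator
    r"((?=[A-Z0-9/-]*\d)[A-Z0-9][A-Z0-9/-]*)",
    re.IGNORECASE,
)


def find_invoice_numbers(text):
    """Return every invoice number found in the text, in document order."""
    return [m.group(1) for m in INVOICE_NUMBER.finditer(text)]


sample = (
    "Invoice No. 12345\nTotal: 10.00\n"
    "INV# GX-2024-17\n"
    "Invoice Number: INV-0042\nInvoice Notes: none\n"
)
print(find_invoice_numbers(sample))
```

In a UiPath workflow the same pattern could be tried in a regex-matching activity, but expect to keep adding label variants as new formats show up.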
In my personal opinion, it’s difficult to extract the invoice number with a single regex or any other rule-based approach.
However, it might be possible with a machine-learning-based extractor. (Sorry, but I’m not very familiar with it.)