Hi all,
I'm working on invoice extraction across multiple types of invoices.
I want to extract these details from each PDF: Description, Price, Quantity.
But many PDFs use an alternative word for the same field (for example, Quantity may appear as qty or qte).
With Read PDF With OCR I can extract the details from only one type of invoice.
I created this PDF just to post an image in the UiPath forum; I can't share my real invoices.
My actual task is to extract data from different types of invoices.
For example, the quantity column may be labelled QTY, qte, or quantite, so I can't extract the exact quantity value using Document Understanding.
If every column has data and the column positions are the same in each invoice, you can read the PDF as text, use a split activity to isolate the table text, and then pass that text to Generate Data Table with the column and row separators. Because the column positions are fixed, you can refer to each column by its index instead of its name.
To extract only the table text, try identifying a static string that appears just before the table and another just after it in the PDF.
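The marker-and-index idea above can be sketched outside UiPath as follows. This is only an illustration in Python, not the UiPath activities themselves; the markers "Order details" and "Total", the sample text, and the helper name `extract_table_rows` are all assumptions made up for the example.

```python
def extract_table_rows(text, start_marker, end_marker, column_sep=None):
    """Return the lines between two static markers, split into columns.

    column_sep=None splits on any run of whitespace, which only works when
    no cell contains spaces; pass an explicit separator otherwise.
    """
    start = text.index(start_marker) + len(start_marker)
    end = text.index(end_marker, start)
    rows = []
    for line in text[start:end].splitlines():
        line = line.strip()
        if line:  # skip blank lines around the table
            rows.append(line.split(column_sep))
    return rows


# Hypothetical text as it might come out of a PDF read step.
sample = """Invoice No: 123
Order details
Description  Qte  Price
Widget  2  9.50
Gadget  10  1.25
Total  31.50
"""

rows = extract_table_rows(sample, "Order details", "Total")
header, data = rows[0], rows[1:]

# The quantity column is read by position, so its label (Qte, Qty, ...) does not matter.
quantities = [row[1] for row in data]
print(quantities)
```

Reading by index only holds while every layout keeps the same column order; a description containing spaces would also need a stricter separator than plain whitespace.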
Hi @krishna_priya1, use Read PDF, save the output to a .txt file, paste the content into regexstorm.net, and start building the regex patterns there. Working with tables isn't always easy; you may need to remove the “junk” from the invoice text before extracting the relevant data.
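As a rough illustration of the regex approach, here is a Python sketch (regexstorm.net uses the .NET flavour, which accepts this particular pattern in much the same form, but that is worth verifying there). The pattern assumes each table row is laid out as description, then an integer quantity, then a price with two decimals; that layout and the sample text are assumptions, not something taken from the real invoices.

```python
import re

# One table row: free-text description, integer quantity, price with 2 decimals.
# [ \t]+ is used instead of \s+ so that a match can never span two lines.
ROW = re.compile(
    r"^(?P<description>.+?)[ \t]+(?P<quantity>\d+)[ \t]+(?P<price>\d+[.,]\d{2})[ \t]*$",
    re.MULTILINE,
)


def parse_rows(text):
    """Return one dict per line that looks like a table row; other lines are ignored."""
    return [m.groupdict() for m in ROW.finditer(text)]


# Hypothetical text as it might come out of a PDF read step.
sample = """Invoice No: 123
Description  Qte  Price
Blue widget  2  9.50
Gadget  10  1.25
Total  31.50
"""

for row in parse_rows(sample):
    print(row)
```

Because only lines shaped like a row match, the header and the surrounding "junk" drop out without a separate cleanup step, and the header wording (Qty, Qte, Quantite) never enters the pattern.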
Alternatively, try defining a separate document type for each layout, check which one is present, and use a Switch to extract from the matching quantity column. That might work as well.
In the workflow, before sending the document to extraction, read the PDF and check whether it contains "quantity" or "qte". Then use a Switch to segregate the documents and run a separate extraction for each type.
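The check-then-switch step could look roughly like this in Python. The alias list, the extractor names, and the helper functions are hypothetical and only stand in for the Switch activity and the per-type extraction workflows.

```python
import re

# Known spellings of the quantity header (an assumed list; extend as needed).
QUANTITY_ALIASES = ("quantity", "quantite", "quantité", "qty", "qte", "qté")


def detect_quantity_label(text):
    """Return the first alias found as a whole word in the text, else None."""
    for alias in QUANTITY_ALIASES:
        if re.search(rf"\b{re.escape(alias)}\b", text, re.IGNORECASE):
            return alias
    return None


def choose_extractor(text, extractors, default=None):
    """Dictionary lookup playing the role of the Switch: label -> extraction."""
    return extractors.get(detect_quantity_label(text), default)


# Hypothetical names for the per-layout extraction workflows.
extractors = {
    "qty": "extract_layout_a",
    "qte": "extract_layout_b",
    "quantite": "extract_layout_b",
}

print(choose_extractor("Description  Qte  Price", extractors, default="manual_review"))
```

Documents whose header matches none of the aliases fall through to the default, which is a reasonable place to route them for manual review.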