I need to read from here:
"Invoice No. 7334/461
A.B.N 26 008 672 179 Invoice Date 21/03/2021
Level 3, 25 Rowe Avenue, Rivervale WA 6103
Ext.
SEQ97318 Account No. 31177
08:07 Order No. 994 HUNTERS HILL
J4SvTAfaeP Date Order Received 21/03/2022
Job
994 HUNTERS HILL Page 1 of 1
PRODUCT DESCRIPTION QUANTITY UNIT RATE DISCOUNT AMOUNT GST AMOUNT
CODE EXCL EXCL PAYABLE INCL
OR SIZE GST GST GST
4470390 CLEANER GLASS SIMPLE GREEN 750ML RTU 00168 1 EACH 6.00 D 6.00 0.60 6.60
0065280 GLOVES RUBBER SABCO 3PK LATEX MED SAB80001 1 EACH 5.00 5.00 0.50 5.50
4471041 BRUSH NAIL SABCO NAIL BRUSH 25007 1 EACH 2.91 2.91 0.29 3.20
4460431 BRUSH MR CLEAN SHOE PB477 1 EACH 2.99 2.99 0.30 3.29
9036011974504599473 TOBY MERRELL
TOTAL TOTAL TOTAL
AMOUNT GST AMOUNT
EXCL GST PAYABLE INC GST
16.90 1.69 18.59"
and enter it into Excel: product code, description, quantity, rate, and the detail on the last line that gives someone’s name (for instance, TOBY MERRELL here).
I’m not good with Regex and don’t know how to loop through the table and get only the needed info. Please, any help would be great!
Also, I’m writing the data to Excel columns while looping through the invoices in a folder. Some columns will have many rows and some only one. Will UiPath start each new invoice on a new row, even if the previous invoice did not fill every column?
Could you provide the expected output for this data as an Excel file?
This would help us work out which fields to extract and in what order.
Could you also tell us more about the PDF files being used? If the PDFs are always digital, make sure you set the PreserveFormatting property to True in the Read PDF Text activity, and then share the input text data.
Also, to be clear about the table structure: if you could provide the PDF document, we could suggest alternatives in case Regex doesn’t work.
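For reference, here is a minimal sketch of how a regex could pick out the product rows from the text posted above. It is written in Python rather than as a UiPath workflow, and it assumes every product row has the same shape as the sample: a 7-digit product code, a free-text description, a whole-number quantity, a unit such as EACH, the rate, an optional one-letter discount flag, and three trailing amounts. The names `ROW` and `extract_products` are made up for illustration.

```python
import re

SEP = r"[ \t]+"        # spaces or tabs between columns, never a line break
MONEY = r"\d+\.\d{2}"  # amounts such as 6.00

# Assumed row layout (taken from the sample invoice only):
# code, description, quantity, unit, rate, [discount flag], excl, GST, incl
ROW = re.compile(
    rf"^(?P<code>\d{{7}}){SEP}(?P<description>.+?){SEP}(?P<quantity>\d+){SEP}"
    rf"(?P<unit>[A-Z]+){SEP}(?P<rate>{MONEY}){SEP}(?:[A-Z]{SEP})?"
    rf"{MONEY}{SEP}{MONEY}{SEP}{MONEY}[ \t]*$",
    re.MULTILINE,
)

def extract_products(text):
    """Return one dict per product row found in the invoice text."""
    return [m.groupdict() for m in ROW.finditer(text)]

sample = """
4470390 CLEANER GLASS SIMPLE GREEN 750ML RTU 00168 1 EACH 6.00 D 6.00 0.60 6.60
0065280 GLOVES RUBBER SABCO 3PK LATEX MED SAB80001 1 EACH 5.00 5.00 0.50 5.50
9036011974504599473 TOBY MERRELL
TOTAL TOTAL TOTAL
"""

products = extract_products(sample)
for p in products:
    print(p["code"], "|", p["description"], "|", p["quantity"], "|", p["rate"])
```

Because the description is matched lazily and everything after it is anchored to the end of the line, trailing part numbers such as 00168 stay inside the description. The name line is skipped because its leading number is longer than 7 digits. If other invoices use different code lengths or decimal quantities, the pattern would need adjusting.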
Because each invoice has more than one product! If you look at the PDF, there are 4 products but only one name. Other invoices can have any number of products, from one to 100, but only one name at the end and one invoice number. I made the desired Excel file myself. My question was: is it even possible?
@natasha6, we don’t know the structure of all the possible data, keeping in mind that there might be multiple names like “TOBY MERRELL” that we need to capture. So unless we have the full picture of the data format when there are multiple products, I do not think we can conclude whether it is possible or not just yet.
Especially since the PDF is digital, we should be able to find some way to extract the relevant data.
However, below is a workflow that performs the extraction for the PDF provided. It doesn’t capture the name, but I do think that is possible using string manipulation.
You could try the workflow on different PDFs and let us know whether it works the same for all of them, assuming the columns are always in the same format. Extract_Table_Regex.xaml (10.7 KB)
If you need further help with multi-product data, please provide sample data for us to work on.
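As a rough illustration of the string-manipulation idea for the name, the sketch below (Python, not part of the attached workflow) takes the line directly above the first line starting with TOTAL, splits off the leading number, and keeps the rest. It assumes the name line always sits immediately above the totals header and looks like a long number followed by the name, which is only confirmed for the one sample invoice. `extract_name` is a hypothetical helper.

```python
def extract_name(text):
    """Return the name found just above the TOTAL header, or None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, ln in enumerate(lines):
        if ln.startswith("TOTAL") and i > 0:
            # Assumption: the previous line is "<digits> <NAME>"
            ref, _, name = lines[i - 1].partition(" ")
            if ref.isdigit() and name and not any(ch.isdigit() for ch in name):
                return name.strip()
            return None  # previous line is not a name line (e.g. a product row)
    return None

sample = """
4460431 BRUSH MR CLEAN SHOE PB477 1 EACH 2.99 2.99 0.30 3.29
9036011974504599473 TOBY MERRELL
TOTAL TOTAL TOTAL
AMOUNT GST AMOUNT
"""

print(extract_name(sample))
```

The digit check on the remainder is there so that a product row sitting directly above the totals is not mistaken for a name when an invoice has no name line.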
Thank you! It is working for all PDFs! But it is not exactly what I’m after. Everything I need:
Invoice number, date, job, product code, description, quantity, rate, and the detail on the last line that gives someone’s name. And I need to loop through all invoices in the folder. I have already developed reading the invoice number, date, and job from all invoices and am successfully entering them into Excel. Now I just need the rest.
@natasha6, the above fields were also considered and were extracted using Regex.
As mentioned before, we would like to get a PDF that contains multiple products; then we will be able to create the correct logic for identifying the names.
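On the question of mixing fields that occur once per invoice (invoice number, date, job, name) with fields that occur once per product, one common layout is one Excel row per product, with the invoice-level values repeated on every row. The Python sketch below shows only that reshaping step; it assumes the fields have already been extracted, and the second invoice is a made-up placeholder, not real data. In UiPath the same idea would mean adding one row to a DataTable per product and writing the table once at the end.

```python
def invoice_rows(header, products):
    """Build one output row per product, repeating the invoice-level fields."""
    rows = []
    for p in products:
        rows.append([
            header["invoice_no"], header["date"], header["job"],
            p["code"], p["description"], p["quantity"], p["rate"],
            header["name"],
        ])
    return rows

invoices = [
    # Values from the sample invoice in this thread
    ({"invoice_no": "7334/461", "date": "21/03/2021",
      "job": "994 HUNTERS HILL", "name": "TOBY MERRELL"},
     [{"code": "4470390", "description": "CLEANER GLASS SIMPLE GREEN 750ML RTU 00168",
       "quantity": "1", "rate": "6.00"},
      {"code": "0065280", "description": "GLOVES RUBBER SABCO 3PK LATEX MED SAB80001",
       "quantity": "1", "rate": "5.00"}]),
    # Hypothetical second invoice, placeholder values only
    ({"invoice_no": "INV-2", "date": "DATE-2", "job": "JOB-2", "name": "NAME-2"},
     [{"code": "0000000", "description": "ITEM-2", "quantity": "1", "rate": "0.00"}]),
]

all_rows = []
for header, products in invoices:
    all_rows.extend(invoice_rows(header, products))  # next invoice starts on a new row

for row in all_rows:
    print(row)
```

With this layout every column is filled on every row, so the next invoice always starts on the row after the previous invoice's last product.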