Extract data from PDF (vertically)

I want to extract the address and table data from the PDF. Please help. Here is my PDF: bristanGroup.pdf (103.2 KB)

–Use a Start Process activity and pass the file path of the PDF as input.
–This will open the PDF file in the foreground.
–Now use a SCREEN SCRAPING method from the Design tab and scrape the region that we need. Use the OCR method in screen scraping.
–After scraping the address and getting the output, use a SEND HOTKEY activity with the key pgdn, which will bring the table in the PDF into view. Use as many SEND HOTKEY activities with the down key as needed, then use the SCREEN SCRAPING method again to get the table structure as a string variable.
–Now use a GENERATE DATA TABLE activity, pass the string variable as input, and store the output in a variable of type DataTable. That variable can be passed as input to a WRITE RANGE activity, which will write the data to an Excel file.
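The last step (string in, table out) can be sketched outside UiPath to show what kind of splitting is involved. This is only an illustration in Python, not the GENERATE DATA TABLE activity itself. It assumes the scraped text has one table row per line and that columns are separated by a tab or by two or more spaces. The sample text and the names `text_to_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import re


def text_to_rows(scraped):
    """Split scraped text into rows (one per line) and cells.

    Assumption: cells are separated by a tab or by 2+ spaces, so that
    single spaces inside a description stay in one cell.
    """
    rows = []
    for line in scraped.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from scraping
        rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical scraped text; the real invoice layout may differ.
SAMPLE = """Part Number   Description      Qty
AB123         Basin mixer      2
CD456         Shower valve     1
"""

if __name__ == "__main__":
    print(rows_to_csv(text_to_rows(SAMPLE)))
```

If OCR returns only single spaces between columns, this kind of splitting will not work and a per-line pattern is needed instead.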

Cheers @Oyndrila_Chowdhury

Thank you @Palaniyappan. But I can’t take the PDF table as a data table, because there is no specific format that puts each column's data in a specific column.

Convert the PDF to Excel and then extract the data.

Can you tell me the steps? I converted the PDF to Excel, but I can’t find the logic to pick out specific data according to its header. Here is my Excel file:
Untitled.xml (8.1 KB)

Read the PDF using UiPath.PDF.Activities,
then try applying a regular expression to extract the data.
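As a rough idea of what such a regular expression could look like, here is a small Python sketch. It assumes the PDF text has already been read into a string, and that each line item is a part number made of capital letters, digits, or hyphens, followed by a description and a quantity at the end of the line. The sample text, the pattern, and the name `extract_line_items` are hypothetical and would need adjusting to the real invoice.

```python
import re

# Assumed line shape: <PART> <description> <quantity>
LINE_ITEM = re.compile(
    r"^(?P<part>[A-Z0-9-]+)[ \t]+(?P<desc>.+?)[ \t]+(?P<qty>\d+)[ \t]*$",
    re.MULTILINE,
)


def extract_line_items(pdf_text):
    """Return one dict per line that matches the assumed line-item shape."""
    return [m.groupdict() for m in LINE_ITEM.finditer(pdf_text)]


# Hypothetical text, standing in for the output of reading the PDF.
SAMPLE_TEXT = """Invoice Address
1 Example Street
Sampletown
Part Number Description Qty
AB123 Basin mixer 2
CD456 Shower valve 1
Paymentof this invoice is due in 30 days
"""

if __name__ == "__main__":
    for item in extract_line_items(SAMPLE_TEXT):
        print(item["part"], "|", item["desc"], "|", item["qty"])
```

Because every matching line becomes its own row, the number of part numbers on the invoice does not need to be known in advance.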

Please share the Excel file.

Here is my Excel file: Untitled.xml (7.6 KB)
and the PDF for that Excel file:
bristanGroup.PDF (85.3 KB)

1. The address can be extracted by finding the index of the cell containing “Invoice Address”, adding one to that index, and reading up to the index of the cell containing “Part Number”.
2. Line items can be extracted by taking the cells from the index of “Part Number” to the index of “Paymentof this” and then applying a regular expression to them.
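The two steps above can be sketched as follows. This is a Python illustration of the index logic only, assuming the converted sheet has been read into a plain list with one cell per entry. The cell values and the helper names `find_index` and `slice_between` are hypothetical.

```python
def find_index(cells, marker, start=0):
    """Index of the first cell at or after `start` that contains `marker`."""
    needle = marker.lower()
    for i in range(start, len(cells)):
        if needle in cells[i].lower():
            return i
    raise ValueError(f"marker not found: {marker!r}")


def slice_between(cells, start_marker, end_marker):
    """Cells strictly between the start marker cell and the end marker cell."""
    start = find_index(cells, start_marker)
    end = find_index(cells, end_marker, start + 1)
    return cells[start + 1:end]


# Hypothetical cells, one per row of the converted sheet.
CELLS = [
    "Invoice Address",
    "1 Example Street",
    "Sampletown",
    "Part Number Description Qty",
    "AB123 Basin mixer 2",
    "CD456 Shower valve 1",
    "Paymentof this invoice is due in 30 days",
]

if __name__ == "__main__":
    print(slice_between(CELLS, "Invoice Address", "Part Number"))
    print(slice_between(CELLS, "Part Number", "Paymentof this"))
```

In this sketch the header cell containing “Part Number” is left out of the line items; whether to keep it depends on what the later regular expression expects.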

Note that I am not able to access the XML file.

But the data is in unstructured form, because even after conversion there are some characters which are not present in the PDF. So it is better to use ABBYY FlexiCapture here, because the PDF file will not be the same every time.

invoicetest1.7z (6.3 KB)
This is my zipped Excel file. I can find the “Invoice Address” cell, but there is also “Deliver address” data. How can I remove that?

Why don’t you use screen scraping for this purpose?

Here the delivery address is extracted separately.

Yes, it works. But how can I take the table part? Each “part no.” should go in a specific column, and if the number of “part no.” entries increases, those should be inserted too.