Scrape Table from PDF Image into Data Table

Hello,

I have several PDF files containing invoices (tables), but the pages are images, and I have to scrape the data into Excel…

How is that possible? I’ve tried several solutions, but none worked!

Hey @MHarakeh

Could you please mention the solutions you have already tried?

Thanks
#nK

Hey @Nithinkrishna

I tried the recorder, screen scraping and OCR, but none worked as needed…

The most accurate OCR engine for scraping the data was UiPath Screen OCR.

But all the data in the PDF, including the table, comes out on one line.
Example: Invoice No:, Invoice Date, Table headers, table rows…

Hey @MHarakeh

Could you please show a sample?

Thanks
#nK

Hi @Nithinkrishna

Here’s an example from UiPath : 01 4Tech keyboard black 1 600.00 600.00 02 A4Tech HS-800 headphone 1 900.00 900.00 03 Asus Memo Pad Tablet 1 7800.00 7800.00 04 HP Desktop C2500 Keyboard+Mouse 1 1500.00 1500.00 05 Logitech B170 Wireless Mouse (Black) 2 600.00 1200.00 06 Beng G2020HDA Screen 2 1500.00 3000.00 07 Logitech B525 Commercial HD Webcam 1 2000.00 2000.00 Sub Total 17000.00 GST 8% 1360.00 Total 18360.00

I guess we could use some string manipulation, but I don’t think that’ll work because each PDF might have a different amount of data…
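
For what it's worth, a varying number of rows does not rule out string manipulation as long as every row follows the same pattern. Below is a minimal Python sketch (not a UiPath workflow) that assumes each line item is a two-digit row number, a description, a quantity, a unit price and an amount, which is only inferred from the sample above. The names `ROW_RE`, `parse_rows` and `parse_totals` are made up for illustration.

```python
import re
from decimal import Decimal

# OCR output copied from the sample above (everything arrives on one line).
SAMPLE = (
    "01 4Tech keyboard black 1 600.00 600.00 "
    "02 A4Tech HS-800 headphone 1 900.00 900.00 "
    "03 Asus Memo Pad Tablet 1 7800.00 7800.00 "
    "04 HP Desktop C2500 Keyboard+Mouse 1 1500.00 1500.00 "
    "05 Logitech B170 Wireless Mouse (Black) 2 600.00 1200.00 "
    "06 Beng G2020HDA Screen 2 1500.00 3000.00 "
    "07 Logitech B525 Commercial HD Webcam 1 2000.00 2000.00 "
    "Sub Total 17000.00 GST 8% 1360.00 Total 18360.00"
)

# Assumed row layout: 2-digit number, description, quantity, unit price, amount.
ROW_RE = re.compile(
    r"\b(\d{2})\s+(.+?)\s+(\d+)\s+(\d+\.\d{2})\s+(\d+\.\d{2})(?=\s|$)"
)

# Assumed footer labels, taken from the sample text only.
TOTALS_RE = {
    "sub_total": re.compile(r"Sub Total\s+(\d+\.\d{2})"),
    "gst": re.compile(r"GST\s+\d+%\s+(\d+\.\d{2})"),
    "total": re.compile(r"(?<!Sub )Total\s+(\d+\.\d{2})"),
}


def parse_rows(ocr_text):
    """Return (no, description, qty, unit_price, amount) for every line item."""
    return [
        (no, desc, int(qty), Decimal(price), Decimal(amount))
        for no, desc, qty, price, amount in ROW_RE.findall(ocr_text)
    ]


def parse_totals(ocr_text):
    """Return whichever of the footer values can be found in the text."""
    totals = {}
    for key, pattern in TOTALS_RE.items():
        match = pattern.search(ocr_text)
        if match:
            totals[key] = Decimal(match.group(1))
    return totals


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row)
    print(parse_totals(SAMPLE))
```

Because the pattern is applied repeatedly, it does not matter how many items an invoice has. Comparing the sum of the parsed amounts with the Sub Total value is a cheap way to catch OCR misreads. A description that itself ends in a bare number could still confuse the pattern, so treat this as a starting point only.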

Yep, I need to see the PDFs to get a better idea.


This is a sample, but sometimes it might have fewer or more items, and the table can run over more than a single page…

Hey @MHarakeh

This looks like a clear-cut digitally scanned invoice (since you confirmed it’s an image).

The Read PDF with OCR activity should handle the extraction, followed by some string manipulation and the Generate Data Table activity.

But when you say the format is not constant, do you mean it may have more rows in the table while the headers and the top content stay the same?

Thanks
#nK
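
To illustrate the idea in plain Python (not the UiPath activity itself): a table generator that works from text generally needs one record per line and a fixed column separator. The sketch below assumes the rows have already been split into fields. The column names and the `;` separator are assumptions chosen for the example, and `rows_to_delimited` is a hypothetical helper.

```python
import csv
import io

# Assumed column names; the real invoice headers are not shown in this thread.
HEADER = ["No", "Description", "Qty", "Unit Price", "Amount"]

# Two rows taken from the OCR sample posted earlier in the thread.
ROWS = [
    ("01", "4Tech keyboard black", "1", "600.00", "600.00"),
    ("02", "A4Tech HS-800 headphone", "1", "900.00", "900.00"),
]


def rows_to_delimited(rows, header, delimiter=";"):
    """Render rows as delimited text: one record per line, fields split by delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_delimited(ROWS, HEADER))
```

A separator that cannot occur inside a description keeps the columns from shifting. The same text can also be saved as a .csv file and opened in Excel directly.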

@MHarakeh ,

Can you try using these methods?

or

Did you try Document Understanding?

Please find the link below for your reference.

Data Scraping solved this; it wasn’t working earlier because of the accessibility settings in Adobe Acrobat Reader…

Thanks @Nithinkrishna @muhamed_fasil, much appreciated.

Cool @MHarakeh 🙂👍

@Nithinkrishna

I’m facing an issue when trying to make my selector dynamic…
I assigned filename = currentfile.FullName
I’m using filename in the selector, but I’m getting an error.

@muhamed_fasil any idea on this?

Hey @MHarakeh

Just remove .pdf from the selector & use filename = currentfile.Name

Hope this helps.

Thanks
#nK
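
The reason for the suggestion, sketched in Python rather than in the workflow itself: the full path and the bare file name are different strings, and a window title normally contains only the name. The folder path, the `acrord32.exe` process name and the selector layout below are assumptions for illustration; copy the real selector from UI Explorer. `file_name_for_selector` and `build_selector` are hypothetical helpers.

```python
from pathlib import PureWindowsPath


def file_name_for_selector(full_path):
    """Return only the file name (with extension), dropping the folder part."""
    return PureWindowsPath(full_path).name


def build_selector(full_path, app="acrord32.exe"):
    """Build a window selector whose title starts with the file name.

    The trailing * is a wildcard, so whatever the application appends
    to the title after the file name does not have to be hard-coded.
    """
    name = file_name_for_selector(full_path)
    return f"<wnd app='{app}' title='{name}*' />"


if __name__ == "__main__":
    path = r"C:\Invoices\invoice_01.pdf"  # hypothetical path
    print(file_name_for_selector(path))
    print(build_selector(path))
```

Since the name already ends in .pdf, the literal .pdf in the selector has to go, otherwise the title would contain the extension twice.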

I did as mentioned, but I’m still getting the error "Variable or argument “filename” is not defined in the current scope".

Should there be anything in the default value? If so, what should it be?

The selector cannot be validated because of the variable, but please run it and check!

This didn’t work.
I’m using an Open Application activity and indicating the PDF file, then editing the selector and putting my variable name in the title attribute…
But when it reaches that step, it still opens Acrobat Reader and I have to select the file it should open. When I select the file, I get another error.

As you can see in the closest match, the file name passed through the variable is wrong: it has an extra _1 at the end of the name.

Yeah, you’re right, but it’s still not opening the PDF file automatically. It opens Adobe Acrobat Reader and I have to open the file from there…