PDF that has multiple pages but same structure to Excel

bidfood_IT · May 23, 2024, 8:25am

Hi all,

Can any of you help me find a solution? I have been trying for 3 days now, but I can’t seem to find one. I tried almost all the methods from Text to documents learning, but I am still failing.

So the real question here is: how can I extract the tables from the PDF file, from page 1 to page 10 (same format), and then transfer them over to Excel?

One thing I would like to point out: each page represents a place, so I was thinking of creating a separate tab for each.

Solution I have tried:

Using taxonomy and building the template along with OCR reading tools, but I am still failing.

This is only page 1, but pages 2-10 are the same, with the same format.

Fields to extract:

Address
PO number
The tables (all of them)

Main.xaml.json (141 Bytes)

supermanPunch · May 23, 2024, 11:11am

Hi @bidfood_IT,

Could you let us know whether the PDF document will always be digital/native, or whether it could contain scanned documents as well?

vincenthoh.capital · May 23, 2024, 11:18am

@supermanPunch,

Hi, this is my “home” account. Sorry for the confusion; I didn’t bring my work laptop home. Anyway, to answer your question, the PDF will always be a digital copy and will not contain scanned documents.

In fact, this is a computer-generated PDF from one of our customers’ portals, so the format will always be the same. I have hidden some of the details, but the format will be the same as shown in the picture.

I would appreciate it if you could help me!

supermanPunch · May 23, 2024, 11:25am

@vincenthoh.capital,

In that case, we could perform a first check by using regular expressions to get the details of each row in the table.

Check whether this is feasible and whether the regular expression derived is applicable to all the different example files that you would have.

This should be our first analysis, and then we could move on to DU (Document Understanding) if this does not work.

To check the feasibility with RegEx, would you be able to provide a sample document? We would also need to understand the constraints of the table columns (whether all values will always be present or whether some values are optional).

If you want to use DU without the RegEx check, you could use the Invoice model that UiPath provides out of the box and check if it works.
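To give an idea of what such a regular-expression check could look like, here is a minimal sketch in Python. The row layout (item code, description, quantity, unit price, line total), the pattern `ROW_PATTERN`, the function `parse_rows` and the sample text are all assumptions made up for illustration, since the real document has not been shared in this thread. The idea is to anchor on the numeric columns at the end of each line, so that a description containing spaces can still be captured as a single column.

```python
import re

# Hypothetical row layout (the real PDF is not shown in this thread):
#   <item code> <description, may contain spaces> <quantity> <unit price> <line total>
ROW_PATTERN = re.compile(
    r"^(?P<code>\S+)\s+"               # item code: first token on the line
    r"(?P<description>.+?)\s+"         # description: lazy, so it stops before the numbers
    r"(?P<qty>\d+)\s+"                 # quantity: whole number
    r"(?P<unit_price>\d+\.\d{2})\s+"   # unit price with two decimals
    r"(?P<total>\d+\.\d{2})$"          # line total with two decimals, end of line
)


def parse_rows(page_text):
    """Return one dict per line that matches the assumed row layout."""
    rows = []
    for line in page_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:  # header lines and free text simply do not match
            rows.append(match.groupdict())
    return rows


# Made-up sample text standing in for the text of one page.
sample_page = """\
PO number: 12345
Code Description Qty Price Total
A100 Fresh milk 2L 10 3.50 35.00
B200 Butter 5 4.20 21.00
"""

for row in parse_rows(sample_page):
    print(row)
```

If some columns are optional, or the amounts use thousands separators, the pattern would need to be adjusted, which is why the column constraints matter before settling on this approach.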

vincenthoh.capital · May 23, 2024, 1:19pm

Well, it is a little sensitive because this is the cost price we are giving out to another company, and this forum is public, right? Is there any other way we could go about this? As I mentioned in my first post, the template is exactly the same except for the figures. The table headings are all the same.

A few things are different:

PO number
Invoice to:
Deliver to:

The rest is all the same, except for the figures and descriptions in the tables.

vincenthoh.capital · May 23, 2024, 1:21pm

Just for your additional information, the PDF itself technically consists of 10 outlets. Meaning:

Page 1: Outlet A
Page 2: Outlet B

Which is why the “Deliver to” and PO number are different on each page. I tried to convert it to .txt format, but the table columns are separated by only a single space, so I can’t use another method to turn the text into tables.
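The one-tab-per-outlet idea could be sketched roughly as below, in Python. This assumes the text of each page has already been extracted into its own string (one string per page), that the PO number appears on a line like `PO number: 1001`, and that the third-party `openpyxl` package is available for writing the workbook. The names `PO_PATTERN`, `build_sheets` and `write_workbook`, and the sample page texts, are hypothetical. By default each text line is kept as a single cell; a row parser that splits the line into real columns can be passed in as `parse_line`.

```python
import re

# Assumed label format; adjust to how the PO number actually appears on the page.
PO_PATTERN = re.compile(r"PO number:?\s*(\S+)", re.IGNORECASE)


def build_sheets(pages, parse_line=lambda line: [line]):
    """Return {sheet title: rows}, one sheet per page, in page order."""
    sheets = {}
    for number, text in enumerate(pages, start=1):
        po = PO_PATTERN.search(text)
        # First row of each sheet holds the PO number found on that page.
        rows = [["PO number", po.group(1) if po else ""]]
        # Remaining rows: every non-empty line, split by the supplied parser.
        rows += [parse_line(line) for line in text.splitlines() if line.strip()]
        sheets[f"Page {number}"] = rows
    return sheets


def write_workbook(sheets, path):
    """Write the sheets to an .xlsx file (needs the third-party openpyxl package)."""
    from openpyxl import Workbook

    workbook = Workbook()
    workbook.remove(workbook.active)  # drop the default empty sheet
    for title, rows in sheets.items():
        worksheet = workbook.create_sheet(title=title)
        for row in rows:
            worksheet.append(row)
    workbook.save(path)


# Made-up page texts standing in for two pages of the PDF.
pages = [
    "PO number: 1001\nDeliver to: Outlet A\nA100 Fresh milk 2L 10 3.50 35.00",
    "PO number: 1002\nDeliver to: Outlet B\nB200 Butter 5 4.20 21.00",
]

sheets = build_sheets(pages)
print(list(sheets))
# write_workbook(sheets, "outlets.xlsx")  # uncomment once openpyxl is installed
```

Keeping the per-page grouping separate from the file writing makes it possible to check the extracted rows before anything is saved to Excel.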

vincenthoh.capital · May 23, 2024, 1:34pm

Anyway, I am very serious about learning this, so I would like to apologize in advance for not being able to show you the full PDF, but please let me know what else I could do in this case. As I mentioned earlier, everything stays the same; nothing changes except for the information.

Please
