Identifying PDF Page Containing Tables & Extracting Table

Vaishnavi_RP · February 10, 2025, 10:52am

I want to extract a table from a PDF file without using Document Understanding. The page containing the table differs from file to file, and the total page count can exceed 150.

Approach I Followed:

Split the PDF into individual pages.
Check each page for a specific table-related keyword or string.
Extract and save the identified page as a separate PDF.
Open the extracted table page and extract the table using CV Scope & CV Extract Table.
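If the PDF has a text layer (that is, it is not a scanned image), steps 1 to 3 can also be done without opening the file on screen. Below is a minimal Python sketch, not the approach from this post: it swaps in the third-party pdfplumber and pypdf libraries (assumed to be installed), and the helper names are hypothetical. Only the keyword search is plain Python, so it can be checked without a PDF.

```python
import re
from typing import Iterable, List, Optional


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so a keyword split across lines still matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_pages_with_keyword(page_texts: Iterable[Optional[str]], keyword: str) -> List[int]:
    """Return the 0-based indexes of pages whose text contains the keyword."""
    needle = normalize(keyword)
    return [i for i, text in enumerate(page_texts) if needle in normalize(text or "")]


def read_page_texts(pdf_path: str) -> List[str]:
    """Extract the text layer of every page (assumes pdfplumber is installed)."""
    import pdfplumber  # third-party; imported lazily so the search logic works without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer
        return [page.extract_text() or "" for page in pdf.pages]


def save_pages(pdf_path: str, page_indexes: List[int], out_path: str) -> None:
    """Copy the selected pages into a new PDF (assumes pypdf is installed)."""
    from pypdf import PdfReader, PdfWriter  # third-party

    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    for i in page_indexes:
        writer.add_page(reader.pages[i])
    with open(out_path, "wb") as f:
        writer.write(f)
```

For example, `find_pages_with_keyword(read_page_texts("input.pdf"), "Summary of Charges")` (hypothetical file name and keyword) returns the matching page indexes, which can then be passed to `save_pages`. Whitespace is normalized because extracted text often breaks a heading across lines.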

Challenges I Faced:

Extraction accuracy is poor.
The last row of the table is missed, so I have to shrink the view until the whole table is visible before extracting it.
Extraction depends on the visible screen area, so minimizing or resizing the window changes the results.

Need Suggestions to Handle This Situation:

How to Find the Page That Contains the Table?
How to Extract a Table from a PDF File?

sharazkm32 · February 10, 2025, 10:58am

See whether opening the PDF in Excel works for you. I was facing a similar issue, and @Anil_G suggested this solution to me. It’s worth a try.

Vaishnavi_RP · February 10, 2025, 11:17am

Thank you for your quick response. I have tried the steps you suggested, and it works. Can we automate these steps in UiPath?

Anil_G · February 10, 2025, 12:06pm

@Vaishnavi_RP

Yes, you can. It’s better to record a macro and then call it from UiPath with the Execute Macro activity. You can add arguments to the macro to pass the file details dynamically.

cheers

Jon_Smith · February 10, 2025, 1:12pm

Why do you have this limitation, and what other limitations do you have?

Doing it this way isn’t ideal, which is why you’re facing these issues. Could you reconsider using the appropriate tools?

Vaishnavi_RP · February 10, 2025, 1:45pm

What alternative approach or tool would you suggest for this? I’d like to understand if there’s a more effective way to handle it.

Jon_Smith · February 10, 2025, 3:06pm

If you can explain why Document Understanding is a hard no, I can suggest some alternatives.

If, for example, it’s a cost issue, then there’s no point in me suggesting things that also cost money.

Vaishnavi_RP · February 12, 2025, 7:33am

Because this process runs infrequently (only once or twice a year), we decided not to get a Document Understanding license. I tried using regular expressions but couldn’t get them to work. Then I tried the CV Scope and CV Extract Table approach, but the accuracy is still lacking. I’m now looking for a more effective solution.
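Regex on raw PDF text often fails because table columns are separated only by spacing. One common workaround, sketched here in Python under stated assumptions: read the text with its layout spacing preserved (for example, pdfplumber’s `extract_text(layout=True)`), treat two or more spaces or a tab as a column break, and keep only lines with the expected number of columns. The same test can help locate the table page, since a page with several such rows probably holds the table. The separator rule, the column count, and the `min_rows` threshold are assumptions to tune per document.

```python
import re
from typing import List

# Assumption: columns are separated by two or more spaces, or by a tab
COLUMN_BREAK = re.compile(r" {2,}|\t")


def parse_rows(page_text: str, expected_columns: int) -> List[List[str]]:
    """Split each line on column breaks; keep only lines with the expected number of cells."""
    rows = []
    for line in page_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = COLUMN_BREAK.split(line.strip())
        if len(cells) == expected_columns:
            rows.append(cells)
    return rows


def looks_like_table_page(page_text: str, expected_columns: int, min_rows: int = 3) -> bool:
    """Heuristic: treat a page as the table page if enough lines parse as table rows."""
    return len(parse_rows(page_text, expected_columns)) >= min_rows
```

Lines with single spaces between words, such as headings or totals, stay in one piece and are filtered out by the column count.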

adilhassanpost · February 12, 2025, 11:20am

Your approach is great! To find the right page, try using text extraction with Regex to detect table-like patterns instead of just keywords. For better accuracy, avoid CV Scope and use UiPath’s PDF activities or Python libraries (like pdfplumber) to extract tables properly. If rows are getting cut off, test different OCR engines or adjust PDF processing settings.
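A minimal sketch of the pdfplumber route, assuming the PDF has a text layer and pdfplumber is installed. Because pdfplumber reads the table from the PDF’s own text and line objects rather than from the screen, window size and zoom do not affect which rows are found. The `clean_table` helper and its cleanup rules are illustrative choices, not part of pdfplumber.

```python
from typing import Any, Dict, List, Optional

Table = List[List[str]]


def clean_table(raw_table: List[List[Optional[str]]]) -> Table:
    """Turn None cells into "", collapse whitespace inside cells, and drop fully empty rows."""
    cleaned: Table = []
    for row in raw_table:
        cells = [" ".join((cell or "").split()) for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def extract_tables_from_page(
    pdf_path: str, page_index: int, table_settings: Optional[Dict[str, Any]] = None
) -> List[Table]:
    """Return every table pdfplumber finds on one page (assumes pdfplumber is installed)."""
    import pdfplumber  # third-party; imported lazily so clean_table works without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]  # 0-based index
        # extract_tables() returns one list of rows per table; cells are strings or None
        raw_tables = page.extract_tables(table_settings or {})
    return [clean_table(t) for t in raw_tables]
```

For example, `extract_tables_from_page("input.pdf", 41)` (hypothetical file and page index) returns each detected table as a list of rows. For tables without ruling lines, pdfplumber’s text-based detection can be tried by passing `{"vertical_strategy": "text", "horizontal_strategy": "text"}` as `table_settings`. Scanned PDFs have no text layer, so they would still need OCR first.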

Jon_Smith · February 12, 2025, 12:16pm

Hmm, interesting. As far as I’m aware, Document Understanding is licensed through AI Units rather than as a one-off licence cost, so even if you only run it once or twice a year, you only use AI Units for those runs, and AI Units are shared across many features.

Beyond that, an AI Agent would be a good way to handle this, but of course it isn’t in public preview yet. Not sure if you can wait?
