Identifying PDF Page Containing Tables & Extracting Table

I want to extract a table from a PDF file (without using Document Understanding). The table’s page number varies, and it differs for each file. The total page count can exceed 150.

Approach I Followed:

  • Split the PDF into individual pages.
  • Check each page for a specific table-related keyword or string.
  • Extract and save the identified page as a separate PDF.
  • Open the extracted table page and extract the table using CV Scope & CV Extract Table.
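
For reference, the keyword check in the steps above can also be done without splitting the file first. This is only a minimal sketch, assuming the PDF has a real text layer (not scanned images) and that the Python library pdfplumber is available; the helper names `pages_with_keyword` and `read_page_texts` are made up for illustration.

```python
from typing import Iterable, List, Optional


def pages_with_keyword(page_texts: Iterable[Optional[str]], keyword: str) -> List[int]:
    """Return the 1-based page numbers whose text contains keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in (text or "").casefold()
    ]


def read_page_texts(pdf_path: str) -> List[str]:
    """Read the text of every page. Requires `pip install pdfplumber`."""
    import pdfplumber  # imported lazily so the search helper works without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None when a page has no text layer
        return [page.extract_text() or "" for page in pdf.pages]
```

Something like `pages_with_keyword(read_page_texts("report.pdf"), "Summary of Charges")` would then return the candidate page numbers; the file name and keyword here are placeholders, not values from the actual documents.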

Challenges I Faced:

  • Accuracy issues in extraction.
  • The last row of the table is missing, so I have to minimize the screen and extract it.
  • Extraction is based on the visible screen area, so minimizing or adjusting the screen affects results.

Need Suggestions to Handle This Situation:

  • How to Find the Page That Contains the Table?
  • How to Extract a Table from a PDF File?

See whether opening the PDF in Excel works for you. I was facing a similar issue, and @Anil_G suggested this solution to me. It’s worth a try.

Thank you for your quick response. I have tried the steps you suggested, and it works. Can we automate these steps in UiPath?

@Vaishnavi_RP

Yes, you can. It’s better to record a macro and then call that macro from UiPath using the invoke macro activity. You can add arguments to the macro to pass the file details dynamically.

cheers

Why do you have this limitation, and what other limitations do you have?

Doing it the way you want is not ideal, hence the issues you face. Can you reconsider using the appropriate tools?

What alternative approach or tool would you suggest for this? I’d like to understand if there’s a more effective way to handle it.

If you can explain why you have decided Document Understanding is a hard no, I can explain some alternatives.

If, for example, it’s a cost thing, then there is no point in me suggesting things that also cost money.

As the frequency is low (once or twice a year), we decided not to go with the Document Understanding license. I tried using regular expressions, but couldn’t get them to work. Then I tried the CV Scope & CV Extract Table approach to extract the table, but the accuracy is still lacking. I’m now looking for a more effective solution.

Your approach is great! To find the right page, try using text extraction with Regex to detect table-like patterns instead of just keywords. For better accuracy, avoid CV Scope and use UiPath’s PDF activities or Python libraries (like pdfplumber) to extract tables properly. If rows are getting cut off, test different OCR engines or adjust PDF processing settings.
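
As a rough illustration of the Regex and pdfplumber ideas, here is a minimal sketch, not a tested solution for your files. It assumes a text-based PDF and treats a line as "table-like" when it has at least two numeric tokens, which is purely an assumed heuristic you would need to tune; the function names are hypothetical.

```python
import re
from typing import List, Optional

# Assumed heuristic: a numeric token looks like 12, 1,200.00, -3.5, $40 or 15%
_NUMBER = re.compile(r"[-+]?\$?\d[\d,]*(?:\.\d+)?%?")


def count_table_rows(text: str, min_numbers: int = 2) -> int:
    """Count lines that contain at least min_numbers numeric tokens."""
    rows = 0
    for line in text.splitlines():
        numeric = sum(1 for token in line.split() if _NUMBER.fullmatch(token))
        if numeric >= min_numbers:
            rows += 1
    return rows


def looks_like_table(text: str, min_rows: int = 3, min_numbers: int = 2) -> bool:
    """Guess whether a page's text contains a table (several numeric rows)."""
    return count_table_rows(text, min_numbers) >= min_rows


def extract_table_from_page(pdf_path: str, page_number: int) -> Optional[List[list]]:
    """Extract the largest table on a 1-based page. Requires pdfplumber."""
    import pdfplumber  # imported lazily; only needed for the actual extraction

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]
        # Returns a list of rows (each a list of cell strings), or None
        return page.extract_table()
```

Because this reads the PDF content directly rather than what is visible on screen, window size should not affect whether the last row is captured. It will not help with scanned PDFs, which have no text layer to read.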

Hmm, interesting. As far as I was aware, Document Understanding is licenced by AI units and not as a one-off licence cost, so even if you do it once or twice a year you’ll still get the same cost / usage of AI units, which are used across many features.

After this, an AI Agent would be a good way to handle it, but it’s of course not in public preview yet. Dunno if you can wait?