PDF table row count

Hi,

I have multiple PDFs, each with multiple tables, and I need to count the rows of one particular table.

Please guide me on how to count the rows of a table in a PDF file.

Hello @yashashwini2322, you could try this:

  1. **Read PDF Files**: Use the “Read PDF Text” activity to extract the text content from the PDF file. Make sure that the tables you want to count rows from are correctly recognized as text.
  2. **Identify Table Sections**: Depending on the structure of the PDF, tables might be represented as text or characters in a specific pattern. Use regular expressions or string manipulation techniques to identify the sections that correspond to the tables you want to count.
  3. **Extract Table Content**: Once you’ve identified the sections containing the tables, extract the relevant text that belongs to each table. This could involve using string manipulation to isolate the relevant part of the text.
  4. **Count Rows**: For each table’s extracted text, split it by newline characters to separate rows. The count of rows would be the number of elements in the resulting array after splitting.
  5. **Repeat for Multiple PDFs**: If you have multiple PDF files with similar tables, you can loop through each PDF and apply the same process to count rows for each table.
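
This is not a UiPath workflow, but the logic of steps 2 to 5 can be sketched in Python to show the idea. Everything here is an assumption for illustration: the file names, the sample text (standing in for the output of “Read PDF Text”), the `Items` and `Total` markers, and the helper name `count_table_rows`. Your PDFs will need their own markers.

```python
import re

def count_table_rows(pdf_text, start_marker, end_marker):
    """Count non-empty lines between two marker strings (markers excluded)."""
    # Non-greedy match so the section stops at the first end marker.
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    match = re.search(pattern, pdf_text, flags=re.DOTALL)
    if match is None:
        return 0  # table section not found in this text
    lines = match.group(1).splitlines()
    return sum(1 for line in lines if line.strip())

# Hypothetical extracted text for two PDFs (assumed, not real data).
sample_texts = {
    "invoice_a.pdf": "Header\nItems\nPen 2 1.50\nBook 1 7.00\nTotal 10.00\nFooter",
    "invoice_b.pdf": "Header\nItems\nLamp 1 20.00\nTotal 20.00\nFooter",
}

# Step 5: apply the same process to every PDF.
row_counts = {
    name: count_table_rows(text, "Items", "Total")
    for name, text in sample_texts.items()
}
print(row_counts)
```

Blank lines are skipped on purpose, since extracted PDF text often has empty lines between rows. If a row wraps onto two lines in the extracted text, this simple split would count it twice.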

Cheers!! :slight_smile:

@yashashwini2322

Use the Read PDF Text activity.

If the PDF text contains unique values such as an invoice number, specific keywords, or other distinctive patterns, look for them.

Find each occurrence of the pattern with regex or string manipulation,
and increment the count every time a new match is found.
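
As a rough illustration of the pattern-counting idea (in Python rather than a UiPath flow), the sketch below counts lines that start with an invoice number. The sample text and the `INV-` number format are assumptions made up for this example.

```python
import re

# Hypothetical extracted text; each table row starts with an invoice number.
pdf_text = (
    "Invoice No  Date        Amount\n"
    "INV-1001    2023-08-01  150.00\n"
    "INV-1002    2023-08-02  75.50\n"
    "INV-1003    2023-08-03  310.25\n"
    "Grand total             535.75\n"
)

# Assumed row pattern: a line that begins with "INV-" followed by digits.
row_pattern = re.compile(r"^INV-\d+\b", flags=re.MULTILINE)

# Every match corresponds to one table row.
row_count = len(row_pattern.findall(pdf_text))
print(row_count)  # prints 3
```

The header and total lines are ignored because they do not match the pattern, so no separate step is needed to cut the table out of the text.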

Can you please provide a sample flow or code?
That would be helpful for me.

I’m currently on my phone, but I believe if you try following the steps you’ll reach the solution.

  1. Read PDF Text (Output: pdfText) - Read the text from the PDF file.
  2. Assign (Output: tableContent) - Use string manipulation to extract the content of the specific table section.
  3. Assign (Output: rows) - Split the table content into rows using line breaks or other delimiters.
  4. Assign (Output: rowCount) - Count the number of rows using the Count property of the ‘rows’ array.
  5. Log Message - Output the row count using the ‘rowCount’ variable.
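
Until someone can share an actual workflow, here is a minimal Python sketch that mirrors the five steps one by one, using plain string search instead of regex. The sample text, the `Item Qty Price` header line, and the `Subtotal` end marker are all assumed for illustration. In UiPath the same expressions would go into Assign activities.

```python
# Step 1: hypothetical stand-in for the Read PDF Text output (pdfText).
pdf_text = (
    "Order summary\n"
    "Item Qty Price\n"
    "Pen 2 1.50\n"
    "Book 1 7.00\n"
    "Bag 1 25.00\n"
    "Subtotal 35.00\n"
    "Thank you\n"
)

# Step 2: isolate the text between an assumed header line and end marker.
header = "Item Qty Price"
end_marker = "Subtotal"
start = pdf_text.find(header)
end = pdf_text.find(end_marker, start)
if start == -1 or end == -1:
    table_content = ""  # one of the markers is missing
else:
    table_content = pdf_text[start + len(header):end]

# Step 3: split into rows on line breaks, dropping blank lines.
rows = [line for line in table_content.splitlines() if line.strip()]

# Step 4: count the rows.
row_count = len(rows)

# Step 5: log the result.
print(f"Row count: {row_count}")
```

The header line itself is excluded from the count here. If you want it counted as a row, start the slice at `start` instead of `start + len(header)`.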