How to get a table from a PDF

How can I get the highlighted text from a PDF document in table format?

Things I have tried:
1. Data scraping from Adobe Acrobat: it does not capture the whole table.
2. Epsilon AI: it takes too much time, does not pick up the desired table, and does not return a table.
3. Feat System (PDF to Excel): it does not give structured information.

And I don’t want to use Document Understanding.

Which part do you want to extract from the table?

The whole highlighted text, in tabular format.

Can you share that PDF file, if possible?

It’s confidential, bro. Can you please suggest a solution here?

I don’t have a PDF like that, so if possible, please share a demo PDF by email.

@shobhit.sachan… a couple of options…

  1. CV Extract Table
  2. Document Understanding…

Can you name the NuGet package for option 1?

@shobhit.sachan - I don’t think there is a special package…


Try updating your automation package and check…

@shobhit.sachan - This was my sample workflow…

and output

@shobhit.sachan If it is not a scanned PDF, can you try the Read PDF Text activity and check whether the data you want to extract comes out in a table-like layout? If it does, we may be able to use regex to group the columns of the table according to the pattern they follow. We would need a sample PDF in the same format to be sure that regex can handle it.
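
To illustrate the regex idea above outside of UiPath, here is a minimal Python sketch. It assumes the extracted text keeps columns separated by two or more spaces, which may not hold for every PDF. The sample text and the `split_columns` helper are made up for illustration.

```python
import re

# Hypothetical text, standing in for what a PDF text-extraction step might return.
sample_text = """Item        Qty    Price
Widget A    2      10.50
Widget B    10     3.25"""

def split_columns(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            # A single space stays inside a cell ("Widget A");
            # two or more spaces are treated as a column gap.
            rows.append(re.split(r" {2,}", line))
    return rows

rows = split_columns(sample_text)
for row in rows:
    print(row)
```

If the real text uses tabs or irregular spacing between columns, the separator pattern would need to be adapted to that layout.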

Read PDF Text is not giving the data in table format.

Can you share this workflow too?

@shobhit.sachan Just to confirm, can you show us a screenshot of the text extracted from the PDF?

Here it is. I have tried regex, but for the same format I’m getting different text before the table starts, as shown below.


second text
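
One possible way to cope with varying text before the table is to anchor on the table's header row instead of on the preamble. The Python sketch below assumes the header row is the same in every file; the column names `Item`, `Qty`, `Price` and both sample documents are hypothetical.

```python
import re

# Hypothetical header pattern; replace the column names with the real ones.
HEADER = re.compile(r"^[ \t]*Item[ \t]+Qty[ \t]+Price[ \t]*$", re.MULTILINE)

def table_part(text):
    """Return the text from the header row onward, or None if no header is found."""
    match = HEADER.search(text)
    return text[match.start():] if match else None

# Two made-up documents with different text before the same table layout.
doc_a = "Invoice 001\nSome note\nItem  Qty  Price\nWidget  2  10.50"
doc_b = "Different intro text\nItem  Qty  Price\nGadget  1  4.00"

print(table_part(doc_a))
print(table_part(doc_b))
```

Everything before the header is dropped, so the preamble can differ from file to file without affecting the rows that follow.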

It worked for one file and gives the proper output. But how can I apply this in a loop? I mean, how can I apply it to multiple files?

@shobhit.sachan …good to know…

Say you have multiple files in the folder …
1. Use Directory.GetFiles(path) to get all the file names
2. Use a For Each loop over the file names
3. Use CV Screen Scope and CV Extract Table inside the loop
4. Use Write Range to write the output (if you want the output in different Excel files, just use your filename.xlsx in the Write Range …)
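
The same loop can be sketched in Python to show the shape of the logic. This is not the UiPath workflow itself: `extract_table` is a hypothetical placeholder that just reads the file as comma-separated text so the sketch runs without any PDF library, and the output is written as CSV rather than Excel. The `process_folder` name and the demo files are also assumptions.

```python
import csv
import tempfile
from pathlib import Path

def extract_table(pdf_path):
    """Placeholder for the real extraction step (hypothetical).

    It only reads the file as plain text and splits each line on commas.
    """
    lines = pdf_path.read_text(encoding="utf-8").splitlines()
    return [line.split(",") for line in lines if line.strip()]

def process_folder(folder, out_dir):
    """Extract a table from every *.pdf in folder and write one CSV per input."""
    written = []
    # Steps 1 and 2: list the files and loop over them.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        rows = extract_table(pdf_path)  # step 3: extraction inside the loop
        # Step 4: name the output after the input file.
        out_path = Path(out_dir) / (pdf_path.stem + ".csv")
        with out_path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(out_path)
    return written

# Demo with two dummy files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.pdf").write_text("Item,Qty\nWidget,2\n", encoding="utf-8")
    (Path(tmp) / "b.pdf").write_text("Item,Qty\nGadget,1\n", encoding="utf-8")
    outputs = process_folder(tmp, tmp)
    output_names = [p.name for p in outputs]
    first_csv = outputs[0].read_text(encoding="utf-8")
    print(output_names)
```

Deriving the output name from the input name keeps one result per source file, which mirrors using the file name in Write Range.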

I already faced a similar challenge… A good OCR engine and regex together can do the job :))

But CV Screen Scope will open the document from which I generated the selector for the CV Screen Scope. Have you tried it with For Each?