Table from PDF text

INPUT :

Output Report******* page 1
–Table of Contents–
Name Item1 Item2 Item3 Amount Page
ABC T-14 1 1 1 4
ABC T-2 1 1 8 8
ABC T-2 1 1 15 9

Output Report ******* page 2
–Table of Contents–
ABC T-88 99 1 18 12
ABC T-21 999 1 28 17
ABC T-50 999 1 5 21

Output Report******* page 3
–Table of Contents–
ABC T-8 99 1 18 24
ABC T-098 999 1 28 27
ABC T-21 999 1 85 31
** some random text info till the last page ****

This is the table text extracted from a PDF. I need the output for this in a single DataTable (DT), as in the image below.
Note: in some PDFs the table text is on only one page, and in others the table spans multiple pages. The given input PDF has 3 pages of table for reference. After extracting the text, the lines "Output Report ******* page *" and "–Table of Contents–" appear in between the table data.

@ppr
@Yoichi
@ushu
@vishal.kp
@supermanPunch
@Anil_G
@rlgandu
@jose.ordonez1
@srinivasmarneni
@Parvathy
@Jayesh_678
@sonaliaggarwal47
Happy Automation,

regards,
@kmaddikatla

I am able to do this if the table is on a single page, but it fails if the table spans multiple pages.
The input is dynamic: the table may be on a single page or on multiple pages.

Hi @kmaddikatla

Please follow the link below; it might be helpful.

@kmaddikatla

First thing: what did you do for one page?

Ideally, if you know what will be there at the end, then it's easy to remove the random text.

Coming to the extraction, it looks like regex would work for this.

A sample is here; refine it as needed, but the logic remains the same:

(?<=Table of Contents–(\r?\n)*)(Name Item1 Item2 Item3 Amount Page\r?\n)*(\w* T-\d+ \d+ \d+ \d+ \d+\r?\n)+
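To illustrate the row-matching idea behind the regex above, here is a minimal sketch in Python rather than UiPath. It is not the exact pattern from this post: Python's `re` module does not support the variable-length lookbehind, so only the row part is kept and anchored to whole lines. The sample text, the `ROW` pattern, and the `extract_rows` helper are assumptions made for the illustration.

```python
import re

# Sample text in the same shape as the INPUT above (shortened to two pages).
pdf_text = """Output Report******* page 1
–Table of Contents–
Name Item1 Item2 Item3 Amount Page
ABC T-14 1 1 1 4
ABC T-2 1 1 8 8
Output Report ******* page 2
–Table of Contents–
ABC T-88 99 1 18 12
** some random text info till the last page ****
"""

# One table row: a name, a T-number, then four numeric columns.
# MULTILINE makes ^ and $ match at line boundaries, so page headers,
# the column header line, and trailing random text are skipped.
ROW = re.compile(
    r"^(\w+) (T-\d+) (\d+) (\d+) (\d+) (\d+)[ \t]*\r?$",
    re.MULTILINE,
)

def extract_rows(text):
    """Return every table row in the text as a list of six column values."""
    return [list(m.groups()) for m in ROW.finditer(text)]

rows = extract_rows(pdf_text)
```

Because each row is matched on its own, the page banners between the pages do not matter, and the same approach covers both single-page and multi-page input.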

cheers

Hello @Anil_G,
thanks for your response.
If the table is on a single page, I split the data using a double split:
Table_Text = Split(Split(PDFText," Amount Page")(1),"Output Report")(0)
After this string manipulation, I use the Generate Data Table activity to convert the text to a DT.
PDFText is the output variable from the Read PDF Text activity.
Table_Text is the string variable created to store the text after splitting.
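A rough sketch of why the double split above stops after the first page, written in Python only to make the behaviour easy to check. The `double_split` helper and the shortened sample text are assumptions for the illustration; `str.split` stands in for the VB `Split` function.

```python
# Two-page sample in the same shape as the INPUT at the top of the thread.
pdf_text = (
    "Output Report******* page 1\n"
    "–Table of Contents–\n"
    "Name Item1 Item2 Item3 Amount Page\n"
    "ABC T-14 1 1 1 4\n"
    "ABC T-2 1 1 8 8\n"
    "Output Report ******* page 2\n"
    "–Table of Contents–\n"
    "ABC T-88 99 1 18 12\n"
)

def double_split(text):
    """Mirror Split(Split(PDFText," Amount Page")(1),"Output Report")(0)."""
    after_header = text.split(" Amount Page")[1]   # everything after the column header
    return after_header.split("Output Report")[0]  # cut at the NEXT page banner

table_text = double_split(pdf_text)
first_page_rows = table_text.strip().splitlines()
```

The second split cuts at the first "Output Report" that follows the header, which is the banner of page 2, so rows from later pages (here `ABC T-88 ...`) never reach the Generate Data Table step.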

happy automation
regards,
@kmaddikatla

@kmaddikatla

Please try the regex above to get the data. Then, as mentioned, modify the regex to fit your requirement if needed, and use the same Generate Data Table activity.

Cheers

Hi @kmaddikatla

Please check the workflow file below and let me know if any changes are required.
Sequence1.xaml (13.0 KB)

Output:

Hope it helps!!