Extract PDF Table Using Form Extractor

Hi Team,

I have multiple PDF files from which I need to extract specific text. The text is not always in the same format, but a specific keyword identifies what to extract. Since the formats vary, how can we achieve this?

Here, Milestone is the keyword. Sometimes the text is in tabular format and sometimes it is not. It is usually under the Performance and Milestone section, but sometimes it is just a text description.

First PDF Milestone Details:

I need to capture each milestone's details (amount, hours, due date):

Second PDF Milestone Details:

Hi @dutta.marina, if the keyword is fixed, you can use string manipulation and pass the fixed keyword as a reference.
For example: pdfOutput.Split({"ReferenceKeyword"}, StringSplitOptions.None)(1)
If you have to extract the data between two references:
pdfOutput.Split({"ReferenceKeyword"}, StringSplitOptions.None)(1).Split({"ReferenceSecond"}, StringSplitOptions.None)(0).Trim
Passing each keyword as a string array makes sure the whole keyword is used as the separator. Change the references and index positions as needed.
cheers
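
A minimal sketch of the same "text between two keywords" idea, written in Python rather than as a UiPath expression so the logic can be run on its own. The function name `text_between` and the sample string are made up for illustration. Unlike an unguarded Split(...)(1), it returns None when the start keyword is missing instead of failing with an index error:

```python
def text_between(text, start_keyword, end_keyword=None):
    """Return the text after start_keyword, up to end_keyword if given.

    Returns None when start_keyword does not occur in the text.
    """
    _, found, after = text.partition(start_keyword)
    if not found:
        return None
    if end_keyword is not None:
        # If end_keyword is absent, partition returns `after` unchanged.
        after = after.partition(end_keyword)[0]
    return after.strip()


# Made-up sample text, only to show the behaviour.
sample = "Scope of work ... Milestone: Design review, 40 hours Payment terms: net 30"
print(text_between(sample, "Milestone:", "Payment terms:"))
```

In a workflow, the None case is where you would log the file name and route the document for manual review.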

Hi All,

I have PDF files from which I need to extract specific information (milestone details) from the Milestone table. The information is in the Performance and Milestone section. The milestone details come in the two formats shown below. How can I achieve this using Document Understanding or the Regex Extractor?

I need to capture the brief description, amount, and due date.
First format of PDF files:

Second format of PDF files:

@Anil_G Any help on this?

@dutta.marina

First, regex is not the best choice for this kind of table data.

But if you still need it, we need some actual PDF samples so that we can try to find a pattern that matches.

cheers
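
For reference, a rough Python sketch of what a regex approach could look like once real samples are available. The row layout assumed here (description, then amount, then an MM/DD/YYYY due date on one line) and the sample text are invented for illustration and do not come from the actual PDFs. It also shows why regex is fragile for tables: any row that wraps or reorders its columns will be missed.

```python
import re

# Assumed layout: "<description> <amount> <MM/DD/YYYY>" on a single line.
ROW = re.compile(
    r"^(?P<description>.+?)[ \t]+"
    r"\$?(?P<amount>\d[\d,]*(?:\.\d{2})?)[ \t]+"
    r"(?P<due_date>\d{1,2}/\d{1,2}/\d{4})[ \t]*$",
    re.MULTILINE,
)

# Made-up text standing in for the output of a PDF text-reading step.
sample = """Performance and Milestone
Design review $5,000.00 03/15/2024
Prototype delivery $12,500.00 06/30/2024
Notes: dates are tentative
"""

# One dict per line that matches the assumed layout; other lines are skipped.
rows = [m.groupdict() for m in ROW.finditer(sample)]
for row in rows:
    print(row)
```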

@Anil_G

What is the best solution for this kind of data?

@dutta.marina

As this is a properly segregated table, you can try training a DU model to extract the table data. You can also classify the documents before extraction to handle the different formats.

cheers

@Anil_G

Can we use Form Extractor for two different templates?

@dutta.marina

Use a classifier and create a separate template for each format in Form Extractor.

cheers
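
The classify-then-extract flow can be sketched in plain Python as below. This is only an illustration of the routing logic, not UiPath code: the marker phrases, format names, and extractor functions are all hypothetical placeholders, and the real markers would have to come from the actual PDFs.

```python
# Hypothetical marker phrases per format; replace with phrases from real files.
FORMAT_MARKERS = {
    "format_a": ["Milestone No", "Due Date"],
    "format_b": ["Milestone Description", "Payment Amount"],
}


def classify(text):
    """Return the first format whose markers all appear in the text, else None."""
    lowered = text.lower()
    for name, markers in FORMAT_MARKERS.items():
        if all(marker.lower() in lowered for marker in markers):
            return name
    return None


def extract_format_a(text):
    # Placeholder for the template-specific extraction of the first format.
    return {"template": "format_a"}


def extract_format_b(text):
    # Placeholder for the template-specific extraction of the second format.
    return {"template": "format_b"}


EXTRACTORS = {"format_a": extract_format_a, "format_b": extract_format_b}


def process(text):
    """Classify the document first, then run the matching extractor."""
    doc_type = classify(text)
    if doc_type is None:
        return None  # unknown layout: route to manual review
    return EXTRACTORS[doc_type](text)


print(process("Milestone No 1 ... Due Date 01/01/2025"))
```

Keeping one extractor per format means a new layout only needs a new marker list and a new template, without touching the existing ones.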

@Anil_G

I sent you two PDF samples. Can you help with those?