Multiple field data extraction from PDF

Hi All,

I have a PDF from which I need to extract details such as the part number, amount, shipping cost, and so on. These details can appear multiple times in a single PDF, on different pages of the same file, and I need to extract every instance of them. Can I achieve this?

Yes @vishal_nachankar
You can achieve this by using regex or OCR methods!

Hi @vishal_nachankar

Use Regex

I hope it helps!!

No, it is not possible through regex, because when we convert the PDF to text there is no unique identifier for each value.

@vishal_nachankar

Use the Read PDF Text activity.
Then use regex or string manipulation to pull the data from the string.
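
To illustrate the regex step, here is a minimal sketch in Python (not a UiPath workflow). It assumes the extracted text labels each value as "Part Number:", "Amount:" and "Shipping Cost:"; those labels, the sample text, and the name `extract_all` are hypothetical and need adjusting to the real PDF. The point is that `findall` returns every occurrence, not just the first.

```python
import re

# Hypothetical labels and value formats; adjust them to the wording in your PDF text.
FIELD_PATTERNS = {
    "part_number": re.compile(r"Part\s*Number\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "amount": re.compile(r"(?<!\w)Amount\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
    "shipping_cost": re.compile(r"Shipping\s*Cost\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_all(text):
    """Return every match for each field, in the order it appears in the text."""
    return {name: pattern.findall(text) for name, pattern in FIELD_PATTERNS.items()}


# Made-up sample standing in for the string produced by reading the PDF.
sample = (
    "Part Number: AB-1001\nAmount: 250.00\nShipping Cost: 12.50\n"
    "Part Number: CD-2002\nAmount: 1,099.99\nShipping Cost: 30.00\n"
)
result = extract_all(sample)
print(result)
```

The same patterns can be used in any regex engine that returns all matches, so the idea carries over to a matches-style activity as long as the labels are consistent in the text.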

@vishal_nachankar

Use Document Understanding

How can we do it for multiple items that share the same keyword? For example, suppose my PDF has the part number multiple times on different pages. How can I achieve it?

@vishal_nachankar

In Document Understanding, we build a workflow to classify the files and extract the data from the PDF. In the Data Extraction Scope, add the Form Extractor; it will then extract the fields based on the template.

Hope it helps!!

You can achieve this using regex
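
For the case raised earlier in the thread, where the same keyword such as the part number repeats on different pages, one possible approach is to treat each "Part Number" label as the start of a record and keep the page number with it. The sketch below is Python and rests on assumptions: the text is available one string per page, each record starts with a "Part Number" label, and the labels, sample pages, and the name `extract_records` are hypothetical.

```python
import re

# Assumption: every record begins with a "Part Number" label.
PART_LABEL = re.compile(r"Part\s*Number", re.IGNORECASE)

# Hypothetical labels and value formats; adjust them to the real document.
FIELDS = {
    "part_number": re.compile(r"Part\s*Number\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "amount": re.compile(r"(?<!\w)Amount\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
    "shipping_cost": re.compile(r"Shipping\s*Cost\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_records(pages):
    """Build one record per "Part Number" occurrence, tagged with its page number."""
    records = []
    for page_no, page_text in enumerate(pages, start=1):
        starts = [m.start() for m in PART_LABEL.finditer(page_text)]
        for i, start in enumerate(starts):
            # A record runs until the next "Part Number" label or the end of the page.
            end = starts[i + 1] if i + 1 < len(starts) else len(page_text)
            chunk = page_text[start:end]
            record = {"page": page_no}
            for name, pattern in FIELDS.items():
                match = pattern.search(chunk)
                record[name] = match.group(1) if match else None
            records.append(record)
    return records


# Made-up text, one string per PDF page.
pages = [
    "Invoice\nPart Number: AB-1001\nAmount: 250.00\nShipping Cost: 12.50\n",
    "Part Number: CD-2002\nAmount: 75.00\nShipping Cost: 5.00\n"
    "Part Number: EF-3003\nAmount: 19.99\n",
]
records = extract_records(pages)
for record in records:
    print(record)
```

Missing fields come back as `None` rather than being borrowed from a neighbouring record. A record that is split across a page break is not handled by this sketch and would need the pages joined first.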

Document Understanding is useful only for PDF files that share the same format. If the PDF format is not consistent, the bot will not be able to fetch the right data.

@rushikeshlanke2

So you have to train a model on the PDF files to extract the data, using AI Center and an ML Skill in Document Understanding.