I have a PDF from which I need to extract details such as the part number, amount, shipping cost, and so on. These details can appear multiple times in a single PDF, on different pages of the same file, and I need to extract every instance. How can I achieve this?
You can achieve this using regex or OCR methods!
No, it is not possible through regex, because when we convert the PDF to text there is no unique identifier for these fields.
Use the Read PDF Text activity to get the text
Use regex or string manipulation to pull the data out of that string
Use Document Understanding
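A minimal sketch of the regex step in Python, assuming the extracted text labels each field as `Part Number:`, `Amount:` and `Shipping Cost:` (these labels and the value formats are hypothetical; adjust them to what your PDFs actually produce). In UiPath the text would come from Read PDF Text; in plain Python, `pypdf`'s `PdfReader(path).pages[i].extract_text()` is one option. `findall` returns every match, so a field repeated on later pages is captured each time.

```python
import re

# Hypothetical label/value formats; tune these to your documents' extracted text.
PATTERNS = {
    "part_number": re.compile(r"Part\s*Number\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "amount": re.compile(r"Amount\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
    "shipping_cost": re.compile(r"Shipping\s*Cost\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}

def extract_all(text):
    """Return every captured value for each field, in order of appearance."""
    return {field: pattern.findall(text) for field, pattern in PATTERNS.items()}

sample = (
    "Page 1\nPart Number: PN-1001\nAmount: $1,250.00\nShipping Cost: $35.50\n"
    "Page 2\nPart Number: PN-2002\nAmount: $980.00\nShipping Cost: $20.00\n"
)
print(extract_all(sample))
```

This gives parallel lists per field; it does not pair values into records, so it only works when every record has all three fields.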
How can we do it for multiple items that share the same keyword? For example, suppose my PDF contains the part number multiple times on different pages; how can I achieve that?
In Document Understanding, we build a workflow that classifies the files and extracts the data from the PDF. In the Data Extraction Scope, add the Form Extractor; it will then extract the fields based on the template.
Hope it helps!!
You can achieve this using regex
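For the repeated-keyword case, one regex approach is to treat each `Part Number` match as the start of a record and read the other fields from the text between it and the next part number. A sketch, assuming the text has already been extracted page by page into a list of strings and that each record's fields follow its part number on the same page; the labels and sample values are hypothetical.

```python
import re

# Hypothetical label/value formats, not taken from any real document.
PART_RE = re.compile(r"Part\s*Number\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)
AMOUNT_RE = re.compile(r"Amount\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)
SHIPPING_RE = re.compile(r"Shipping\s*Cost\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)

def first_group(pattern, text):
    """Return the first captured value of pattern in text, or None if absent."""
    m = pattern.search(text)
    return m.group(1) if m else None

def extract_records(pages):
    """Build one record per 'Part Number' occurrence across all pages."""
    records = []
    for page_no, text in enumerate(pages, start=1):
        starts = list(PART_RE.finditer(text))
        for i, m in enumerate(starts):
            # The record's text runs until the next part number (or end of page).
            end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
            block = text[m.end():end]
            records.append({
                "page": page_no,
                "part_number": m.group(1),
                "amount": first_group(AMOUNT_RE, block),
                "shipping_cost": first_group(SHIPPING_RE, block),
            })
    return records

pages = [
    "Part Number: PN-1001\nAmount: $1,250.00\nShipping Cost: $35.50\n"
    "Part Number: PN-1002\nAmount: $75.00\nShipping Cost: $5.00",
    "Invoice continued\nPart Number: PN-2002\nAmount: $980.00\nShipping Cost: $20.00",
]
for record in extract_records(pages):
    print(record)
```

Splitting per page keeps the page number with each record, but a record whose amount spills onto the next page will come back with `None` for that field.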
Document Understanding is useful only for PDF files that share the same format; if the PDF format is not consistent, the bot will not be able to fetch the right data.
In that case, you have to train on your PDF files using AI Center and an ML skill in Document Understanding to extract the data.
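Training an ML skill is the robust route for mixed layouts. Short of that, a rough regex-side mitigation is to accept several label variants per field. A sketch; the variant lists below are made up for illustration and would have to come from your actual documents.

```python
import re

# Hypothetical label variants for different layouts.
LABELS = {
    "part_number": ["Part Number", "Part No", "Item Code"],
    "shipping_cost": ["Shipping Cost", "Freight", "Delivery Charge"],
}
ID = r"([A-Z0-9-]+)"
MONEY = r"\$?\s*([\d,]+\.\d{2})"

def build_pattern(labels, value):
    """Compile one regex that accepts any of the labels before the value."""
    # Escape each word and allow flexible whitespace between words.
    alternatives = "|".join(
        r"\s*".join(re.escape(word) for word in label.split()) for label in labels
    )
    # \b stops a short label like "Part No" from matching inside "Part Number".
    return re.compile(rf"(?:{alternatives})\b\.?\s*[:#]?\s*{value}", re.IGNORECASE)

def find_all(text, field, value):
    """Return every value found for the field under any of its label variants."""
    return build_pattern(LABELS[field], value).findall(text)

text = "Part No.: PN-7\nFreight: $12.00\nItem Code: AB-9\nDelivery Charge: 8.50"
print(find_all(text, "part_number", ID))
print(find_all(text, "shipping_cost", MONEY))
```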