Multiple field data extraction from PDF

vishal_nachankar · July 20, 2023, 9:14am

Hi All,

I have a PDF from which I need to extract details such as the part number, amount, shipping cost, and so on. These details can appear multiple times in a single PDF, on different pages of the same file, and I need to extract every instance. How can I achieve this?

Vikas_M · July 20, 2023, 9:16am

Yes @vishal_nachankar
You can achieve this by using regex or OCR methods!

lrtetala · July 20, 2023, 9:16am

Hi @vishal_nachankar

Use Regex

I hope it helps!!

vishal_nachankar · July 20, 2023, 9:17am

No, it is not possible through regex: when we convert the PDF to text, there is no unique identifier for each field.

rlgandu · July 20, 2023, 9:17am

@vishal_nachankar

Use the Read PDF Text activity.
Use regex or string manipulation to pull the data from the resulting string.
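
A minimal sketch of the regex step, written in Python only for illustration. It assumes the extracted text labels each value on its own line as "Label: value"; the sample text, the label names, and the `extract_all` helper are hypothetical and not taken from the actual PDF. In a UiPath workflow the same patterns could be applied with .NET's `Regex.Matches`, which also returns every occurrence rather than only the first.

```python
import re

# Hypothetical text standing in for the string returned by Read PDF Text.
pdf_text = """\
Part Number: AB-1001
Amount: 250.00
Shipping Cost: 12.50
--- page 2 ---
Part Number: CD-2002
Amount: 99.90
Shipping Cost: 5.00
"""

# Assumed layout: "<Label>: <value>" on a single line.
PATTERNS = {
    "part_number": re.compile(r"Part Number:\s*(\S+)"),
    "amount": re.compile(r"Amount:\s*([\d.,]+)"),
    "shipping_cost": re.compile(r"Shipping Cost:\s*([\d.,]+)"),
}

def extract_all(text):
    """Return every match for each field, in order of appearance."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

fields = extract_all(pdf_text)
print(fields)
```

Because `findall` collects all matches, a field that repeats across pages comes back as a list with one entry per occurrence.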

lrtetala · July 20, 2023, 9:18am

@vishal_nachankar

Use Document Understanding

vishal_nachankar · July 20, 2023, 9:20am

How can we do it for multiple items that share the same keyword? For example, suppose my PDF has the part number multiple times on different pages. How can I achieve it?

lrtetala · July 20, 2023, 9:23am

@vishal_nachankar

In Document Understanding, we build a workflow to classify the files and extract the data from the PDF. In the Data Extraction Scope, use the Form Extractor; it will extract the fields based on the template.

Hope it helps!!

tazunnisa.badavide · July 20, 2023, 10:12am

You can achieve this using regex
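
As a rough sketch of how repeated fields could be kept apart without a unique identifier, the example below groups values into one record per item by order of appearance and tags each record with its page number. It is written in Python for illustration and rests on assumptions: each page's text is available as a separate string, every item starts with a "Part Number" line, and the labels shown are hypothetical.

```python
import re

# Hypothetical per-page text, one string per page of the PDF.
pages = [
    "Part Number: AB-1001\nAmount: 250.00\nShipping Cost: 12.50\n"
    "Part Number: AB-1002\nAmount: 40.00\nShipping Cost: 3.00\n",
    "Part Number: CD-2002\nAmount: 99.90\nShipping Cost: 5.00\n",
]

# Assumed layout: one "<Label>: <value>" pair per line.
LINE = re.compile(
    r"^(Part Number|Amount|Shipping Cost):[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)

def extract_records(pages):
    """Build one dict per item; a "Part Number" line starts a new item."""
    records = []
    for page_no, text in enumerate(pages, start=1):
        current = None
        for label, value in LINE.findall(text):
            if label == "Part Number":
                current = {"page": page_no, "Part Number": value}
                records.append(current)
            elif current is not None:
                # Attach the value to the most recently started item.
                current[label] = value
    return records

records = extract_records(pages)
for record in records:
    print(record)
```

The position of each match relative to the preceding "Part Number" line acts as the grouping key, so identical keywords on different pages end up in separate records. This only holds if the fields of one item appear together and in a consistent order.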

rushikeshlanke2 · July 20, 2023, 12:14pm

Document Understanding is useful only for PDF files that share the same format. If the format varies between files, the bot will not be able to fetch the right data.

rlgandu · July 20, 2023, 12:20pm

@rushikeshlanke2

So you have to train a model on the PDF files to extract the data, using AI Center and an ML skill in Document Understanding.
