How to use regular expressions on a PDF document

Sunitha_Bist · September 18, 2017, 6:34am

Hi,

I have three PDF files, and I need to use regular expressions to extract the data below the field called "Description", "PRODUCT NAME", or "Description & Specification of Goods" in those files.
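A minimal Python sketch of the regex part, assuming the PDF text has already been extracted to a plain string and that the items sit on the lines directly under the header, ending at the first blank line (both are assumptions about the invoice layout). The function name `extract_below_header` and the sample text are hypothetical. The longest label is listed first in the alternation so that "Description & Specification of Goods" is not matched as just "Description".

```python
import re

# Hypothetical header pattern: the longest label comes first so that
# "Description & Specification of Goods" is not cut short at "Description".
HEADER = r"(?:Description & Specification of Goods|PRODUCT NAME|Description)"

# Match the header (rest of its line included), then capture every following
# non-blank line up to the first blank line or the end of the text.
BLOCK_RE = re.compile(
    HEADER + r"[^\n]*\n((?:[^\n]*\S[^\n]*(?:\n|$))+)",
    re.IGNORECASE,
)


def extract_below_header(text: str) -> list[str]:
    """Return the non-blank lines that follow the first matching header."""
    match = BLOCK_RE.search(text)
    if match is None:
        return []
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


sample = """Invoice No: 123
Description & Specification of Goods
Steel bolts M8
Copper wire 2mm

Total: 500
"""
print(extract_below_header(sample))  # ['Steel bolts M8', 'Copper wire 2mm']
```

Because the three invoices use different labels, a single alternation keeps one pattern for all of them instead of one pattern per file.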

One of the three files is a scanned-image PDF. I used ABBYY OCR to read it, but the output is not accurate since some of the words come out misspelled…

Does anyone know how to solve both problems?

These are the PDF files:
Invoice 5.pdf (51.2 KB)
Invoice 8.pdf (159.6 KB)
Invoice 7.pdf (47.4 KB)

Florent_Salendres · September 18, 2017, 7:46am

Hello,

I believe the replies in those posts could be relevant to you too.

Cheers

lissynikkytha · September 18, 2017, 8:11am

You need to identify the start of the tabular data based on some keywords and, for each product, split the row into columns based on a delimiter and place them in a DataTable. You can then extract the necessary columns easily.
As far as I know, ABBYY OCR extracts text with better accuracy. You may have faced issues while processing the Invoice 7 document because of its poor legibility.
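The keyword-then-split approach can be sketched in Python as follows. It is an illustration under assumptions, not a UiPath workflow: a list of dicts stands in for the DataTable, columns are assumed to be separated by two or more spaces, and the table is assumed to end at a line starting with "Total". The sample text and the `parse_table` helper are hypothetical.

```python
import re

# Hypothetical text output of one invoice. Assumption: columns are
# separated by runs of two or more spaces.
TEXT = """INVOICE 42
Sl  Description        Qty   Amount
1   Steel bolts M8     100   250.00
2   Copper wire 2mm    20    250.00
Total                        500.00
"""

COLUMN_SEP = re.compile(r"\s{2,}")


def parse_table(text: str, start_keyword: str, stop_keyword: str) -> list[dict[str, str]]:
    """Split the rows between the header line and the stop line into columns."""
    lines = text.splitlines()
    # The header row is the first line containing the start keyword.
    start = next((i for i, line in enumerate(lines) if start_keyword in line), None)
    if start is None:
        return []
    columns = COLUMN_SEP.split(lines[start].strip())
    rows = []
    for line in lines[start + 1:]:
        if line.strip().startswith(stop_keyword):
            break  # end of the tabular section
        cells = COLUMN_SEP.split(line.strip())
        # Skip lines without one cell per column (blank or wrapped lines).
        if len(cells) == len(columns):
            rows.append(dict(zip(columns, cells)))
    return rows


rows = parse_table(TEXT, "Description", "Total")
print([row["Description"] for row in rows])  # ['Steel bolts M8', 'Copper wire 2mm']
```

Splitting on two or more spaces keeps multi-word product names such as "Steel bolts M8" in one cell; if the real OCR output uses a different separator, only `COLUMN_SEP` needs to change.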

Sunitha_Bist · September 18, 2017, 8:16am

@lissynikkytha I tried using ABBYY, but it is not extracting the words properly…

lissynikkytha · September 18, 2017, 8:26am

Hi,

Did you try modifying the scale? Are you facing issues with Invoice 5 and Invoice 8 samples as well?

Sunitha_Bist · September 18, 2017, 8:53am

@lissynikkytha Thank you… it is working. But how can I apply a regular expression to the output text file to extract only the product name?

cvent978 · September 19, 2017, 2:26pm

To use regex, use the “Matches” activity, which you can find in the Activities panel by searching for “Matches”.
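As far as I know, the Matches activity returns every match of the pattern in the input text rather than only the first one. A rough Python equivalent is `re.finditer`. The sketch below assumes, hypothetically, that the OCR output contains lines of the form `PRODUCT NAME: <name>`; the sample text is made up, and in practice it would first be read from the output text file.

```python
import re

# Hypothetical OCR output; in practice, read it from the output text file.
ocr_text = """PRODUCT NAME: Steel bolts M8
Qty: 100
PRODUCT NAME: Copper wire 2mm
Qty: 20
"""

# MULTILINE makes ^ and $ match at each line, so every labelled line is
# found; the product name is captured in the named group "name".
pattern = re.compile(r"^PRODUCT NAME:\s*(?P<name>.+)$", re.MULTILINE)

# Collect all matches, similar to looping over the Matches activity's result.
product_names = [m.group("name").strip() for m in pattern.finditer(ocr_text)]
print(product_names)  # ['Steel bolts M8', 'Copper wire 2mm']
```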

