Help with extracting from pdf

grish · March 6, 2019, 10:18am

Hi, I am trying to extract some details from a PDF file.
Here is how I do it:
I open the PDF
I use Scrape Relative to extract the data

But the problem is that some of the data is on the first page and some is on the middle and last pages. Is there a better way to get this data without moving through the pages manually (with hotkeys)?

thanks

anil5 · March 6, 2019, 10:22am

Hi,

You can use regular expressions to extract the text. Use Read PDF Text and share the resulting text so that we can help you extract the information with regular expressions.
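As a rough illustration of this suggestion: once the whole document has been read as text, it no longer matters whether a value sits on the first, middle or last page, because each pattern is searched across all of the text. A minimal Python sketch, assuming the text of each page is already available as a string. The page contents, the labels and the `find_fields` helper are made up for illustration and are not part of UiPath:

```python
import re

# Hypothetical page texts, standing in for the output of a PDF text reader.
pages = [
    "ACME LTD\nCUSTOMER NO : C-1001\nSummary of charges\n",
    "Line items\nWidget 4 x 10.00\n",
    "TOTAL DUE : 40.00\nThank you\n",
]

# One pattern per field: a fixed label, then the rest of that line.
patterns = {
    "customer_no": re.compile(r"CUSTOMER NO :\s*(.+)"),
    "total_due": re.compile(r"TOTAL DUE :\s*(.+)"),
}


def find_fields(page_texts, field_patterns):
    """Return {field: (page_number, value)} for the first match of each pattern."""
    found = {}
    for page_number, text in enumerate(page_texts, start=1):
        for name, pattern in field_patterns.items():
            if name in found:
                continue  # keep the first occurrence only
            match = pattern.search(text)
            if match:
                found[name] = (page_number, match.group(1).strip())
    return found


fields = find_fields(pages, patterns)
print(fields)
```

The page number is kept only to show where each value was found; no scrolling or hotkeys are involved at any point.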

grish · March 6, 2019, 1:42pm

I can't send you the document, it belongs to my company.
Can you explain with an example?

anil5 · March 6, 2019, 1:44pm

@grish,

You have to use regular expressions to extract the values; search Google for how to use regular expressions.
Or
you can use the Computer Vision activities.

anil5 · March 6, 2019, 2:46pm

Hi @grish,

Suppose you have the text below and you want to extract the invoice number and the date, both of which are in the middle of the text.

[DISBURSEMENT]
MAERSK LINE A/S INVOICE NO : SHVDA064505
ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018
1263 COPENHAGEN DENMARK BILL FORM : SHBFN@38112

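To extract the invoice number, use the regular expression below:

(?<=INVOICE NO :)(.+?)(?=\n)

(?<=INVOICE NO :) - a lookbehind on the label INVOICE NO :, which I treat as a constant because it doesn't change
(.+?) - takes everything after the constant, as few characters as possible; the captured value starts with the space after the colon, so trim it
(?=\n) - a lookahead that ends the value just before the newline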

To extract the date, use the regular expression below:

(?<=DATE :)(.+?)(?=\n)
It works the same way as the one above.

You can use https://regex101.com/ to test this.
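For readers who prefer to try the two patterns in code rather than in a browser, here is a minimal sketch. It uses Python's `re` module purely as a stand-in (UiPath itself runs on the .NET regex engine, where these particular patterns are written the same way), and the variable names are my own:

```python
import re

# Sample text from the post above; every line ends with a newline.
text = (
    "[DISBURSEMENT]\n"
    "MAERSK LINE A/S INVOICE NO : SHVDA064505\n"
    "ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018\n"
    "1263 COPENHAGEN DENMARK BILL FORM : SHBFN@38112\n"
)

invoice_pattern = re.compile(r"(?<=INVOICE NO :)(.+?)(?=\n)")
date_pattern = re.compile(r"(?<=DATE :)(.+?)(?=\n)")

invoice_match = invoice_pattern.search(text)
date_match = date_pattern.search(text)

# The captured group begins with the space after the colon, so strip it.
invoice_no = invoice_match.group(1).strip() if invoice_match else None
invoice_date = date_match.group(1).strip() if date_match else None

print(invoice_no)
print(invoice_date)
```

The date comes back exactly as it appears in the text, so any validation or correction of the value has to happen in a separate step.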

Try this out and let me know if anything is hard to understand.

grish · March 6, 2019, 3:05pm

Thanks for the reply.
[screenshot: regexp]
I may be wrong, but wasn't this supposed to match?

anil5 · March 6, 2019, 3:11pm

Hi,

For me it matches

pavanh003 · March 6, 2019, 3:11pm

Hi @anil5,

I don't think the Computer Vision activities are able to get the required values. For example, the CV Get Text activity does not return the value corresponding to a reference.

I would like @Ryan_Rush to look into the issue.

Regards,
Pavan H.

anil5 · March 6, 2019, 3:15pm

@pavanh003,

To try it out, in CV.getText, click Indicate and make a box selection in the wizard instead of clicking a word. After that you will need to pick an anchor for it. At runtime, it will scrape the whole area of the selection, much like Scrape Relative.

pavanh003 · March 6, 2019, 3:18pm

Hi,

I tried exactly what you described, but it didn't work for me.
If you have an example, could you please share it?

Regards,
Pavan H

grish · March 6, 2019, 3:19pm

It was the newline (\n), I forgot to add it.
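One plausible reading of this fix, stated as an assumption: the lookahead `(?=\n)` only succeeds when a newline actually follows the value, so a test string that ends right after the invoice number gives no match. A small Python sketch of that behaviour, together with a hypothetical, more tolerant variant (not from the thread) that also accepts a Windows line ending or the end of the text:

```python
import re

# Pattern from the thread: the value must be followed by a newline.
strict = re.compile(r"(?<=INVOICE NO :)(.+?)(?=\n)")
# Hypothetical variant: stop at \n, at \r\n, or at the end of the text.
tolerant = re.compile(r"(?<=INVOICE NO :)(.+?)(?=\r?\n|$)")

without_newline = "MAERSK LINE A/S INVOICE NO : SHVDA064505"
with_newline = without_newline + "\n"
windows_newline = without_newline + "\r\n"


def first_value(pattern, text):
    """Return the trimmed first capture, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


results = {
    "strict, no newline": first_value(strict, without_newline),
    "strict, with newline": first_value(strict, with_newline),
    "tolerant, no newline": first_value(tolerant, without_newline),
    "tolerant, windows newline": first_value(tolerant, windows_newline),
}

for label, value in results.items():
    print(label, "->", value)
```

Only the first case returns nothing, which is consistent with a pattern that works for one person and fails for another when the pasted test text differs by a trailing newline.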

anil5 · March 6, 2019, 3:31pm

@pavanh003,

Please find the workflow attached.

I used a scanned PDF to extract the information. I will send the scanned PDF separately by personal message, as it is confidential.

Sequence3.xaml (27.5 KB)

In the image below, I am trying to extract the shipping value, using Kanoo as the anchor.

anil5 · March 6, 2019, 3:45pm

@grish, did you understand the regex concept, and were you able to extract values such as the invoice number and date?

grish · March 7, 2019, 7:31am

Hi @anil5, I'm still trying to figure it out, but I understand a little more now.
I am not able to extract the values yet, but I hope I will be.
