Help with extracting data from a PDF

Hi, I am trying to extract some details from a PDF file.
Here is how I do it:
I open the PDF.
I use Scrape Relative to extract the data.

But the problem is that some of the data is on the first page and some is on the middle and last pages. Is there a better way to get this data without manually moving through the pages (with hotkeys)?

Thanks

Hi,

You can use regular expressions to extract the values. Use Read PDF Text and share the text with us, and we can help you extract the information with regular expressions.
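A minimal Python sketch of that approach (not a UiPath workflow), assuming the PDF's text has already been read into strings, for example by a Read PDF Text step. Because the search runs over the text of all pages joined together, it does not matter which page a value is on. The page contents and the `find_value` helper are hypothetical.

```python
import re

# Hypothetical page texts; in practice these would come from a
# PDF-to-text step that covers every page of the document.
pages = [
    "INVOICE NO : A100\nCUSTOMER : ACME\n",
    "line items...\n",
    "TOTAL : 250.00\n",
]
full_text = "\n".join(pages)


def find_value(text, label):
    """Return the text after `label` up to the end of its line, or None."""
    # re.escape makes the label match literally; [ \t]* skips spaces after
    # the label, and [^\r\n]* stops at the end of that line.
    match = re.search(re.escape(label) + r"[ \t]*([^\r\n]*)", text)
    return match.group(1).strip() if match else None


print(find_value(full_text, "INVOICE NO :"))  # A100 (first page)
print(find_value(full_text, "TOTAL :"))       # 250.00 (last page)
```

The key design choice is to search the whole text once instead of navigating page by page; the label acts as the anchor, much like the anchor in Scrape Relative.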


I can’t send you the document; it belongs to my company.
Can you explain with an example?

@grish,

You need to use regular expressions to extract the values; search online for how to use regular expressions.
Or
You can use the Computer Vision activities.

Hi @grish,

Suppose you have text like the sample below and you want to extract the invoice number and date, which are in the middle of the text.

[DISBURSEMENT]
MAERSK LINE A/S INVOICE NO : SHVDA064505
ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018
1263 COPENHAGEN DENMARK BILL FORM : SHBFN@38112

To extract the invoice number, use the regular expression below:

(?<=INVOICE NO :)(.+?)(?=\n)

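(?<=INVOICE NO :) - a lookbehind: I have made INVOICE NO : a constant because this label doesn’t change.
(.+?) - take everything after the constant (lazily, as few characters as needed).
(?=\n) - a lookahead: stop at the newline, so the value is everything before it. The match starts with the space after the colon, so trim it.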
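To see the pattern above at work outside UiPath, here is a minimal Python sketch that runs it on the sample text. It assumes each line of the extracted text ends with "\n"; Python's re module accepts the same lookbehind, lazy group and lookahead.

```python
import re

# Sample text from the post above, one "\n" at the end of each line.
text = (
    "[DISBURSEMENT]\n"
    "MAERSK LINE A/S INVOICE NO : SHVDA064505\n"
    "ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018\n"
    "1263 COPENHAGEN DENMARK BILL FORM : SHBFN@38112\n"
)

# Lookbehind on the constant label, lazy capture, lookahead on the newline.
invoice_pattern = r"(?<=INVOICE NO :)(.+?)(?=\n)"

match = re.search(invoice_pattern, text)
raw_value = match.group(1)      # ' SHVDA064505' (starts with a space)
invoice_no = raw_value.strip()  # remove the leading space
print(invoice_no)  # SHVDA064505
```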

To extract the date, use the regular expression below:

(?<=DATE :)(.+?)(?=\n)
It works the same way as the invoice number pattern.
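Since the invoice and date patterns differ only in the constant label, one option is to build them from a list of labels. This is a minimal Python sketch, not the UiPath implementation; the `labels` mapping and its field names are hypothetical, and re.escape is used so the label is matched literally inside the lookbehind.

```python
import re

text = (
    "MAERSK LINE A/S INVOICE NO : SHVDA064505\n"
    "ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018\n"
)

# Hypothetical field names mapped to the constant labels in the text.
labels = {"invoice_no": "INVOICE NO :", "date": "DATE :"}

fields = {}
for name, label in labels.items():
    # Same shape as the patterns above: (?<=LABEL)(.+?)(?=\n)
    pattern = r"(?<=" + re.escape(label) + r")(.+?)(?=\n)"
    match = re.search(pattern, text)
    fields[name] = match.group(1).strip() if match else None

print(fields)  # {'invoice_no': 'SHVDA064505', 'date': '27/89/2018'}
```

Adding another field is then a matter of adding its label, as long as the value sits on the same line after the label.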

You can use https://regex101.com/ to test this.

Try this out and let me know if anything is unclear.


Thanks for the reply.
[screenshot: regexp]
Maybe I am wrong, but wasn’t this supposed to match?

Hi,

It matches for me.


Hi @anil5,

I don’t think the Computer Vision activities can get the required values. For example, getting the value corresponding to a reference with the CV Get Text activity isn’t working.

I would like @Ryan_Rush to look into the issue.

Regards,
Pavan H.

@pavanh003,

To try it out, in CV.getText, click Indicate and draw a box selection in the wizard instead of clicking a single word. After that, you will need to pick an anchor for it. At runtime, it will scrape the whole selected area, very similar to Scrape Relative.

Hi,

I tried exactly what you said, but it didn’t work for me.
If you have an example, could you please share it?

Regards,
Pavan H

It was the \n (newline) at the end; I forgot to include it.
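For anyone hitting the same issue, a small Python sketch of why the \n matters. Without a lookahead, the lazy (.+?) is satisfied after a single character. The Windows line ending ("\r\n") case is an assumption about how the PDF text might come out, not something confirmed in this thread: there, (?=\n) leaves a trailing "\r" in the capture, and a class such as [^\r\n]+ is one way to stop before either character.

```python
import re

unix_line = "MAERSK LINE A/S INVOICE NO : SHVDA064505\n"
windows_line = "MAERSK LINE A/S INVOICE NO : SHVDA064505\r\n"

# 1. No lookahead: the lazy (.+?) stops after the first character.
no_lookahead = re.search(r"(?<=INVOICE NO :)(.+?)", unix_line).group(1)
print(repr(no_lookahead))  # ' '

# 2. With (?=\n): the capture runs up to the newline.
with_lookahead = re.search(r"(?<=INVOICE NO :)(.+?)(?=\n)", unix_line).group(1)
print(repr(with_lookahead))  # ' SHVDA064505'

# 3. With "\r\n" endings, (?=\n) keeps the "\r" in the capture.
crlf_capture = re.search(r"(?<=INVOICE NO :)(.+?)(?=\n)", windows_line).group(1)
print(repr(crlf_capture))  # ' SHVDA064505\r'

# 4. Excluding both "\r" and "\n" avoids that.
safe_capture = re.search(r"(?<=INVOICE NO :)([^\r\n]+)", windows_line).group(1)
print(repr(safe_capture.strip()))  # 'SHVDA064505'
```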

@pavanh003,

Please find the workflow attached.

I used a scanned PDF to extract the information. I will send the scanned PDF separately in a personal message, as it is confidential.

Sequence3.xaml (27.5 KB)

In the image below, I am trying to extract the shipping value using Kanoo as the anchor.

[image]


@grish, did you understand the regex concept, and were you able to extract values such as the invoice number and date?

Hi @anil5, I’m still trying to figure it out, but I understand it a little better now.
I am not able to extract the values yet, but I hope I will be.
