Hi, I am trying to extract some details from a PDF file.
Here is how I do it:
I open the PDF.
I use Scrape Relative to extract the data.
The problem is that some of the data is on the first page and the rest is on the middle and last pages. Is there a better way to get this data without manually moving through the pages (with hotkeys)?
You can use regular expressions to extract the values. Use the Read PDF Text activity to get the full text of the document, then share that text with us so we can help you write the regular expressions for the fields you need.
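Here is a minimal Python sketch of why this avoids page navigation: once the whole document is a single string, a regex search does not care which page a value came from. The three "pages" below are just the sample invoice lines split up for illustration, and `extract_fields` is a hypothetical helper, not a UiPath activity. It uses plain capture groups rather than lookbehinds, and the same patterns should work on the Read PDF Text output.

```python
import re

# Hypothetical stand-in for the full text of a PDF: each page's text,
# joined in page order into one string (as a "read PDF text" step would give).
pages = [
    "[DISBURSEMENT]\nMAERSK LINE A/S INVOICE NO : SHVDA064505\n",  # first page
    "ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018\n",     # middle page
    "1263 COPENHAGEN DENMARK BILL FORM : SHBFN@38112\n",            # last page
]
full_text = "\n".join(pages)


def extract_fields(text, patterns):
    """Return {field: first captured value or None} for each regex pattern."""
    results = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        # group(1) is the value after the label; strip removes the leading space.
        results[field] = match.group(1).strip() if match else None
    return results


# Label-based patterns (hypothetical field names); adjust to the real PDF labels.
patterns = {
    "invoice_no": r"INVOICE NO :(.+?)(?=\n)",
    "date": r"DATE :(.+?)(?=\n)",
    "bill_form": r"BILL FORM :(.+?)(?=\n)",
}

result = extract_fields(full_text, patterns)
print(result)
```

Because every field is looked up by its label, it does not matter whether the value sits on the first, middle, or last page.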
Suppose you have text like the sample below and want to extract the invoice number and the date, which appear in the middle of the text.
[DISBURSEMENT]
MAERSK LINE A/S INVOICE NO : SHVDA064505
ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018
1263 COPENHAGEN DENMARK BILL FORM : SHBFN@38112
To extract the invoice number, use the regular expression below:
(?<=INVOICE NO : )(.+?)(?=\n)
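(?<=INVOICE NO : ) - A lookbehind on the label "INVOICE NO : ", used as a constant because this text doesn't change between invoices. It anchors the match but is not included in the result.
(.+?) - Lazily captures everything after the constant.
(?=\n) - A lookahead that stops the capture right before the line break.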
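A quick way to check the invoice pattern is to run it against the sample text. This is a Python sketch (UiPath itself uses .NET regex, but this fixed-width lookbehind behaves the same in Python's `re`). The `\r?\n` in the lookahead is an assumption added in case the extracted text uses Windows line endings; with a plain `\n` lookahead a trailing `\r` would end up in the captured value.

```python
import re

sample = (
    "[DISBURSEMENT]\n"
    "MAERSK LINE A/S INVOICE NO : SHVDA064505\n"
    "ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018\n"
    "1263 COPENHAGEN DENMARK BILL FORM : SHBFN@38112\n"
)

# Lookbehind on the fixed label (including its trailing space), lazy capture,
# then a lookahead on the line break; \r?\n also tolerates \r\n endings (assumption).
invoice_re = re.compile(r"(?<=INVOICE NO : )(.+?)(?=\r?\n)")

match = invoice_re.search(sample)
invoice_no = match.group(1) if match else None
print(invoice_no)  # SHVDA064505
```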
To extract the date, use the same approach with "DATE : " as the constant.
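A possible date pattern, sketched in Python under two assumptions: the label is always "DATE : ", and the value has a dd/mm/yyyy shape. Restricting the capture to digits and slashes means it stops after the date even if more text follows on the same line. Note that this only checks the shape, not the calendar: the sample value 27/89/2018 still matches, so real validation would need a separate date-parsing step.

```python
import re

line = "ATT GROUP PROCUREMENT ESPLANADEN 52@ DATE : 27/89/2018\n"

# Fixed label in a lookbehind, then a dd/mm/yyyy-shaped value (assumed format).
date_re = re.compile(r"(?<=DATE : )(\d{2}/\d{2}/\d{4})")

match = date_re.search(line)
date_text = match.group(1) if match else None
print(date_text)  # 27/89/2018
```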
I don't think the Computer Vision activities can get the required values. For example, using the CV Get Text activity to get the value that corresponds to a reference (anchor) does not work.
To try it, in CV Get Text, click Indicate and draw a box selection in the wizard instead of clicking a single word. Then pick an anchor for it. At runtime, it scrapes the whole selected area, which is very similar to Scrape Relative.