Get OCR Text not working on the second PDF

Hi, I am extracting data from a PDF with the Get OCR Text activity. It works fine for a single PDF, but when I try to iterate over multiple PDFs it does not work. I tried Get PDF Text, but I don't know how to extract specific data from its output variable, so I tried Get OCR Text instead.

Can anyone help, please?

@dipon1112000 ,

If your PDF files are native, meaning the text can be selected and copied to the clipboard or another application, the best practice is to use Read PDF Text (Activities - Read PDF Text).

If the PDF is scanned, meaning the text cannot be copied, use Read PDF With OCR (Activities - Read PDF With OCR).

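When using these activities, make the input dynamic so that each file is picked up in turn: pass the file path as a variable set inside the loop rather than a hard-coded value. If you stay with Get OCR Text, the selector has to be made dynamic instead.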
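
The "one dynamic path per file" idea can be sketched outside UiPath. This is only an illustration in Python, not UiPath code: `read_all_pdfs` and the `read_pdf` callback are hypothetical names standing in for the loop and for whichever Read PDF activity is used.

```python
from pathlib import Path
from typing import Callable, Dict


def read_all_pdfs(folder: str, read_pdf: Callable[[str], str]) -> Dict[str, str]:
    """Call read_pdf once per PDF in `folder`; return {file name: extracted text}.

    `read_pdf` is a hypothetical stand-in for the activity that reads one PDF.
    """
    results: Dict[str, str] = {}
    # sorted() keeps the processing order stable between runs
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # The path is a loop variable, so every file is read in turn;
        # nothing about the first file is hard-coded.
        results[pdf_path.name] = read_pdf(str(pdf_path))
    return results
```

In UiPath terms this would roughly correspond to a For Each over the files in a folder, with the current file path passed to the read activity.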

Thanks,
Ashok

Hi @dipon1112000

1. Use Read PDF With OCR and store the output in a String variable.
2. Use the Find Matching Patterns activity: pass the regular expression in Pattern, pass the String variable in Text to search in, and store the output in FirstMatch, since you expect only one match.
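
As a rough illustration of the two steps above (text in, pattern applied, first match out), here is a small Python sketch. It is not UiPath code; the sample text, the `INV-\d+` pattern and the `first_match` helper are made up for the example, and .NET regex syntax can differ in details from Python's.

```python
import re
from typing import Optional


def first_match(text: str, pattern: str) -> Optional[str]:
    """Return the first substring of `text` matching `pattern`, or None."""
    match = re.search(pattern, text)
    # group(0) is the whole matched text, already a plain string
    return match.group(0) if match else None


# Hypothetical text, standing in for the output of the PDF read step
sample_text = "Invoice No: INV-1042\nTotal: 250.00"
print(first_match(sample_text, r"INV-\d+"))  # INV-1042
```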

The screenshots below should help:

[screenshot]

Hope it helps!!

@dipon1112000

Try reading the PDF as text and then use regex or Split to get the data.

Cheers

Sorry for my ignorance: is the input in Pattern going to be the extract I get from the PDF? If yes, do I have to repeat the activity (Configure Regular Expression) multiple times to extract multiple pieces of data?

Please advise.

@dipon1112000

The Pattern input is the regular expression used to extract the data from the PDF text.
Pass the regex in Pattern.
Pass the Read PDF output in Text to search in.
I think the data you need to extract is a single match, so pass a variable of type String in FirstMatch.

If you need to extract more than one value, create a variable in Result instead; it will be a collection type.
Then iterate over it with a For Each loop and access the elements.
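
To show the difference between a single first match and a collection of matches, here is a hedged Python sketch (again not UiPath code). The `all_matches` helper, the sample text and the amount pattern are assumptions made up for the example.

```python
import re
from typing import List


def all_matches(text: str, pattern: str) -> List[str]:
    """Return every non-overlapping match of `pattern` in `text`, in order."""
    # finditer yields match objects; group(0) turns each into a plain string
    return [match.group(0) for match in re.finditer(pattern, text)]


# Hypothetical text with several values to extract
sample_text = "Item A 10.50\nItem B 7.25\nItem C 3.00"
for amount in all_matches(sample_text, r"\d+\.\d{2}"):
    print(amount)
```

One pattern applied once returns all the values, so the activity does not have to be repeated per value unless the values need different patterns.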

Hope it helps!!

Thank you. The last thing I need help with is getting the output into Notepad.

I get the desired output when I check it in a Message Box.

But in Notepad the output is not what I expect.

@dipon1112000

[screenshot]

How are you passing the output? The output of the regex is not a string, so if I understand the screenshot correctly, am I supposed to add ".ToString" to convert the regex output? Can you share a XAML for better understanding?

*My Notepad file has a .csv extension (it's not a text file), so I am using Add Data Row, and as such I need to pass a string value.

@dipon1112000

If you are storing the value in FirstMatch, you can follow the process above.
As you have many values, you can use string manipulation to join all the current items inside the For Each, and then write the text after the For Each, as below:

String.Join(",", {JoinedString, currentitem})
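
The same join-then-write idea, sketched in Python rather than UiPath: each match is converted to its string value first, and the values are then written as one comma-separated row. `matches_to_csv_row` is a hypothetical helper and the sample input is made up.

```python
import csv
import io
import re


def matches_to_csv_row(text: str, pattern: str) -> str:
    """Collect all matches of `pattern` in `text` and format them as one CSV row."""
    # Convert each match object to its string value before writing it out
    values = [match.group(0) for match in re.finditer(pattern, text)]
    buffer = io.StringIO()
    # csv.writer adds the separators and quotes values that contain commas
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue()


print(matches_to_csv_row("A 10.50 B 7.25", r"\d+\.\d{2}"), end="")  # 10.50,7.25
```

Letting a CSV writer build the row, instead of joining by hand, avoids broken columns when an extracted value itself contains a comma.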