Hi, I am extracting data from PDFs with the Get OCR Text activity. It works fine for a single PDF, but when I try to iterate over multiple PDFs it doesn’t work. I tried Get PDF Text, but since I don’t know how to extract specific data from its output variable, I used Get OCR Text instead.
If your PDF files are native, meaning the text can be selected and copied to the clipboard or another application, the best practice is to use the Read PDF Text activity.
1. Use Read PDF With OCR and store the output in a String variable.
2. Use the Find Matching Patterns activity: pass the regular expression in Pattern and the String variable in Text to search in, then store the output in FirstMatch, since you have only one value to extract.
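The two steps above can be illustrated outside UiPath with a rough Python analogue. This is not UiPath code: the PDF text is a hypothetical hard-coded string standing in for the Read PDF With OCR output, the pattern INV-\d+ is a made-up example, and Python's re.search stands in for Find Matching Patterns (assumed to behave similarly for simple patterns like this). Note that re.search returns a match object rather than a string, so .group(0) is used to get the matched text.

```python
import re
from typing import Optional


def first_match(text: str, pattern: str) -> Optional[str]:
    """Return the first substring of text matching pattern, or None.

    Rough analogue of the FirstMatch output of Find Matching Patterns.
    """
    match = re.search(pattern, text)  # a re.Match object (or None), not a str
    return match.group(0) if match else None


# Hypothetical text standing in for the String output of the PDF read step.
pdf_text = "Invoice Number: INV-1042\nDate: 2023-05-01\nTotal: 250.00"

# Hypothetical pattern: "INV-" followed by one or more digits.
print(first_match(pdf_text, r"INV-\d+"))
```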
Sorry for my ignorance: is the input in Pattern going to be the data I will be extracting from the PDF? If yes, am I going to repeat the activity (Configure Regular Expression) multiple times to extract multiple pieces of data?
Yes, Pattern determines which data is extracted from the PDF.
You pass a regular expression that matches that data in Pattern,
and the Read PDF output in Text to search in.
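On the question of extracting several values: one approach, assumed here, is to use a separate regular expression per value and apply each one to the same text. The Python sketch below also loops over several texts, one per PDF, which mirrors running the read and match steps inside a loop over the files. The names FIELD_PATTERNS and extract_fields, the patterns, and the sample texts are all hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical field names and patterns: one regex per value to extract,
# assumed to mirror one Find Matching Patterns step per field.
FIELD_PATTERNS: Dict[str, str] = {
    "invoice_no": r"INV-\d+",
    "date": r"\d{4}-\d{2}-\d{2}",
    "total": r"(?<=Total: )[\d.]+",  # lookbehind: the number after "Total: "
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Apply every pattern to the same text and keep the first match of each."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        result[field] = match.group(0) if match else None
    return result


# Hypothetical texts, one per PDF (e.g. collected while looping over files).
pdf_texts: List[str] = [
    "Invoice Number: INV-1042\nDate: 2023-05-01\nTotal: 250.00",
    "Invoice Number: INV-2077\nDate: 2023-06-15\nTotal: 99.50",
]

for text in pdf_texts:
    print(extract_fields(text))
```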
I think the data you need to extract is only a single match, so you can pass a String variable in FirstMatch.
If you need to extract more than one value, create a variable in Result, so it will be of a collection type.
Then iterate through it with a For Each loop and access the elements.
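A minimal Python sketch of the multiple-match case, using re.finditer as an assumed stand-in for the Result collection: each item it yields is a match object, and the loop converts each one to its text, much as you would access each element inside the For Each. The function name, sample text, and pattern are hypothetical.

```python
import re
from typing import List


def all_matches(text: str, pattern: str) -> List[str]:
    """Collect the text of every match, like iterating Result in a For Each."""
    values: List[str] = []
    for match in re.finditer(pattern, text):  # yields re.Match objects
        values.append(match.group(0))  # convert each match to its text
    return values


# Hypothetical text with several line-item amounts.
text = "Item A 10.00\nItem B 15.50\nItem C 4.25"
print(all_matches(text, r"\d+\.\d{2}"))
```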
How are you passing the output? The output of the regex is not a string, so if I understand the screenshot properly, am I supposed to add “.ToString” to convert the regex output? Can you share a XAML for better understanding?
*My Notepad file has a .CSV extension (it’s not a text file), so I am using Add Data Row, which means I need to pass a String value.
If you are storing it in FirstMatch, you can follow the process above.
As you have many matches, you can use string manipulation to join all the current items, and after the For Each you can write the joined text to the file.
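Joining the matches and writing them to a CSV file can be sketched in Python with the standard csv module. The file names, the file/amounts header, the "; " separator, and the build_csv helper are all hypothetical; the CSV is built in memory here so the result is easy to inspect.

```python
import csv
import io
from typing import List, Tuple


def build_csv(rows: List[Tuple[str, List[str]]], separator: str = "; ") -> str:
    """Join each PDF's matches into one string and write one CSV row per PDF.

    rows holds hypothetical (file name, matches) pairs.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "amounts"])  # hypothetical header row
    for file_name, matches in rows:
        joined = separator.join(matches)  # all matches as a single string
        writer.writerow([file_name, joined])
    return buffer.getvalue()


csv_text = build_csv([
    ("invoice1.pdf", ["10.00", "15.50", "4.25"]),
    ("invoice2.pdf", ["99.50"]),
])
print(csv_text)
```

To write to disk instead, open the file with open(path, "w", newline="") and pass that file object to csv.writer in place of the StringIO buffer.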