Problems with Extract Screen

Hello, I am trying to use the extract screen to select a word in a PDF file and read it with OCR, but I get this (see the image). If someone could help me, I would be very grateful. Thank you very much.

I attach the files:

Pedido_A484512.pdf (195.0 KB)
RobotOrganizador.xaml (10.0 KB)

Ideally, you do not need to open the PDF and extract the text from the screen.

You can use the Read PDF with OCR activity to get the text, and then use regex or a simple split function to get the required text.

For your current approach, you can consider changing the OCR engine and trying to get the complete text.

The screen is just there to validate whether the text you need is being extracted. If it is, simply click the Finish button and you will get the text as output.

The text in Output will be the selected text shown in the attached screen.
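To illustrate the regex idea outside of UiPath, here is a minimal Python sketch. It assumes the OCR text is already in a string and that the order number looks like the one in the attached file name (one capital letter followed by six digits). The sample text and the name `find_order_number` are made up for illustration; the real PDF text will differ.

```python
import re

# Assumed pattern: one capital letter followed by six digits, e.g. A484512.
ORDER_PATTERN = re.compile(r"\b[A-Z]\d{6}\b")

def find_order_number(text):
    """Return the first order-number-like token in the text, or None."""
    match = ORDER_PATTERN.search(text)
    return match.group(0) if match else None

# Hypothetical OCR output, not the real content of the PDF.
sample_text = "Pedido: A484512\nFecha: 01/02/2020"
order_number = find_order_number(sample_text)
```

The same pattern should carry over to a regex activity or expression in the workflow, but check it against the text your OCR engine actually returns.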

I want to obtain information from a specific table in many PDFs within a folder through OCR.

Yes, so you can read the PDF using OCR. Look at the text and analyse which keywords can be used to split it and get the table content. Then, using a Build Data Table activity, you can create a datatable with the required columns and add the data to it after the string operations.
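As a rough sketch of that keyword approach (written in Python only to show the logic, not as UiPath code), the function below keeps the text between a start keyword and an end keyword and turns each remaining line into a row. The sample text, the column names and the function name `extract_table` are all assumptions; a list of dictionaries stands in for the datatable.

```python
def extract_table(text, start_keyword, end_keyword, columns):
    """Return the rows found between two keywords as a list of dicts."""
    if start_keyword not in text:
        return []
    # Keep only what comes after the start keyword and before the end keyword.
    after_start = text.split(start_keyword, 1)[1]
    body = after_start.split(end_keyword, 1)[0]
    rows = []
    for line in body.splitlines():
        parts = line.split()
        # Skip blank lines and lines that do not match the column count.
        if len(parts) == len(columns):
            rows.append(dict(zip(columns, parts)))
    return rows

# Hypothetical OCR output; the real invoice layout will differ.
sample_text = """Pedido A484512
Item Qty Price
Widget 2 10.00
Gadget 1 25.50
Total 45.50"""

rows = extract_table(sample_text, "Item Qty Price", "Total",
                     ["Item", "Qty", "Price"])
```

Splitting on whitespace only works if the cell values contain no spaces, so a real table may need a different separator or a regex per row.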

Your PDF is quite a simple invoice, so string manipulation won’t be complicated. I’d suggest splitting the text you read on the newline character; this way each line of the PDF becomes one element. After the split you’ll have an array that you can loop through to get the required values. For example, if you want to get the total, loop through the array with an If condition that checks whether the line contains the Total keyword; if it does, get the value of Total by splitting that array element.

Generic split works like… Split(strYourStringToBeSplitted,strKeywordToSplit)
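The line-by-line loop described above could look roughly like this in Python. It is only a sketch: the sample text and the name `find_value` are invented, and it assumes the value sits on the same line as the keyword.

```python
def find_value(text, keyword):
    """Return the text that follows the keyword on the first matching line."""
    # splitlines() splits on newlines and also copes with Windows line endings.
    for line in text.splitlines():
        if keyword in line:
            # Split the line on the keyword and keep what comes after it,
            # trimming spaces, tabs and a separating colon.
            return line.split(keyword, 1)[1].strip(" :\t")
    return None

# Hypothetical OCR output, not the real content of the PDF.
sample_text = "Pedido A484512\nWidget 2 10.00\nTotal: 45.50"
total = find_value(sample_text, "Total")
```

In the workflow, the equivalent would be a For Each over the split array with an If on whether the line contains the keyword.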

Give it a try, that will work…