Best way to capture data from a PDF generated on the webpage

Hi guys,
I am very grateful for the help I have received here in the forum; with each day that passes, my understanding of the tool increases!
I have a workflow that enters some information and, at the end, generates a PDF ticket, as shown in the image:

Inspecting the element, I get this code:
image

I’m trying to get the PDF content via Get Text, but I have not yet succeeded.
Can you give me some guidance?
get_PDF_File.zip (119.3 KB)

If you are trying to get particular information from the PDF, you can use relative scraping, or get all the text and then use string manipulation to extract the required information.

Regards,
Karthik Byggari

This is what I’m trying to do:

Using Get Value, I try to store the text in a variable (vlr_DadosPDF), but a System.NullReferenceException error occurs when I try to view the value of the variable in a message box.

This is because the selector is failing to locate the UI element, so the output variable is never assigned.

Also, make sure the PDF document is visible on screen while scraping.

Can you please validate the selector using UI Explorer?

Regards,
Karthik Byggari

I have not yet succeeded in locating the element.
I am uploading a flow (Flowchart.xaml) that illustrates my goal:
get_PDF_File.zip (119.3 KB)


Hi buddy @Rafaeloneil

Kindly follow the steps below; they should help you sort this out.
This can be handled in several ways, so let me go through them one by one:
–If we are trying to extract specific terms from the PDF and the PDF has selectable text, meaning the words can be selected as individual elements, we can use the Get Text activity. Before using this activity, we need a Start Process activity, passing the file path of the PDF to its FileName property. Start Process opens the PDF and brings it to the front of the screen; only then can the bot see the elements and read them with Get Text.
–If we are trying to get a specific piece of text but the PDF is image-based (for example, scanned), so the words cannot be selected as individual elements, we can use the Screen Scraping option with an OCR method, found in the Design menu of Studio, which reads the text with an OCR engine. The PDF must be on screen for this as well, so we need a Start Process activity before the screen scraping.
–Or, if we are trying to extract the whole content of the PDF as text, we can use the Read PDF Text activity when the text is not in image form, that is, when the words can be selected as individual elements. If the content is in image form (for example, the whole page gets selected when you try to indicate a single word), we can use the Read PDF With OCR activity instead. Neither of these two activities needs Start Process, as they read the content without opening the application.
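
The choice between the last two activities can be pictured as "try the text layer first, fall back to OCR if nothing comes back". Below is a rough Python sketch of that idea only; it is not UiPath code. The names `read_pdf_text`, `read_text_layer`, `read_with_ocr` and `min_chars` are made up for illustration, and the two reader functions are stand-ins you would supply yourself.

```python
from typing import Callable


def read_pdf_text(
    path: str,
    read_text_layer: Callable[[str], str],
    read_with_ocr: Callable[[str], str],
    min_chars: int = 1,
) -> str:
    """Return the text layer if it has content, otherwise fall back to OCR.

    Both reader callables are hypothetical placeholders for whatever
    actually reads the PDF (a text-layer reader and an OCR-based reader).
    """
    text = read_text_layer(path)
    # An image-only PDF typically yields an empty or whitespace-only text layer.
    if len(text.strip()) >= min_chars:
        return text
    return read_with_ocr(path)


# Stand-in readers so the sketch runs without any PDF library installed.
def fake_text_layer(path: str) -> str:
    return "" if path.endswith("scanned.pdf") else "Ticket No: 12345"


def fake_ocr(path: str) -> str:
    return "Ticket No: 12345 (via OCR)"


native_result = read_pdf_text("ticket.pdf", fake_text_layer, fake_ocr)
scanned_result = read_pdf_text("scanned.pdf", fake_text_layer, fake_ocr)
```

Here `native_result` comes from the text layer and `scanned_result` comes from the OCR stand-in, because the fake text layer returns nothing for the scanned file.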

Kindly try this and let me know if you have any questions, buddy.
You were almost there.
Cheers @Rafaeloneil

Thanks for the great explanation. I solved the problem by saving the PDF file, reading it, and extracting the values with some substring operations.
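
For anyone landing here later, the substring step looks roughly like the sketch below. It is written in Python just to show the logic, not as the actual workflow; the helper `text_between`, the labels, and the sample ticket text are all invented for illustration, and the real text would come from reading the saved PDF.

```python
def text_between(text: str, start_label: str, end_label: str) -> str:
    """Return the trimmed text between two labels, or "" if either is missing."""
    start = text.find(start_label)
    if start == -1:
        return ""
    start += len(start_label)
    # Search for the end label only after the start label.
    end = text.find(end_label, start)
    if end == -1:
        return ""
    return text[start:end].strip()


# Hypothetical ticket text standing in for the content read from the saved PDF.
sample = "Ticket No: 12345\nCustomer: Jane Doe\nAmount: 99.50\n"

ticket_no = text_between(sample, "Ticket No:", "\n")
customer = text_between(sample, "Customer:", "\n")
missing = text_between(sample, "Due Date:", "\n")
```

Returning an empty string when a label is not found avoids the kind of null-reference error discussed earlier in the thread, at the cost of having to check for the empty value afterwards.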

Fantastic
Cheers @Rafaeloneil
