Attended vs Unattended screen scraping / getting PDF field data

I have a fundamental question, but can’t seem to get a straight answer. When screen scraping with UiPath (or, more generally, any RPA tool), must the application associated with the document be open on the client machine for scraping to occur, and does that mean the process is always considered ‘attended’?

I would like to have a UiPath process determine which PDF file to scrape, and automatically put certain fields from the PDF into UiPath variables. I note that the PDF fields are not captured by Get PDF Text, since the text is contained in a ‘field’ container, so screen scraping appears to be the only option apart from writing a Custom Activity in Visual Studio. Any referrals to a relevant example NuGet package or VS project code would be greatly appreciated.

Just FYI, if the robot must open an application to do its work, that doesn’t necessarily make it attended. It can still do its work unattended. A robot is only attended if human interaction is required for it to complete its work.

@mwiseman your issue seems a data extraction one, and not a screen scraping one… You could try to use the document processing framework in the IntelligentOCR package, and write a custom extractor for the values you need.

You can also use the Digitize Document activity to read the text from the PDF file and also get all the coordinates of each word in it, to base your custom extractor search on.
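To illustrate the kind of coordinate-based search a custom extractor could do, here is a minimal sketch in Python. It is not UiPath code and does not use the Digitize Document output types: the `Word` class, the `value_right_of_label` function and the `max_gap` parameter are all hypothetical names made up for this example. It assumes you already have a flat list of words with bounding boxes and that the value sits on the same line, to the right of its label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One digitized word with its bounding box (hypothetical shape, not a UiPath type)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def value_right_of_label(words: List[Word], label: str,
                         max_gap: float = 200.0) -> Optional[str]:
    """Return the text on the same line to the right of `label`, or None if not found."""
    # Find the label word, ignoring case and a trailing colon.
    anchor = next(
        (w for w in words if w.text.rstrip(":").lower() == label.lower()), None
    )
    if anchor is None:
        return None

    anchor_mid = anchor.y + anchor.height / 2   # vertical middle of the label
    anchor_right = anchor.x + anchor.width      # right edge of the label

    # Keep words whose box spans the label's vertical middle and that start
    # no more than max_gap to the right of the label's right edge.
    same_line = [
        w for w in words
        if w is not anchor
        and w.y <= anchor_mid <= w.y + w.height
        and 0 <= w.x - anchor_right <= max_gap
    ]
    same_line.sort(key=lambda w: w.x)  # read left to right
    return " ".join(w.text for w in same_line) or None
```

In a real custom extractor the same idea would be applied to whatever word and box structure the digitization step gives you. The label match and `max_gap` threshold would likely need tuning per document layout.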

Please see:
https://activities.uipath.com/docs/about-the-uipathdocumentprocessingcontracts for what to use and which abstract classes to implement in order to build a custom extractor,
https://activities.uipath.com/docs/data-extraction-scope for the Data Extraction Scope activity (where a custom extractor would be drag-and-dropped), and
the UiPath/Document-Processing-Code-Samples repository on GitHub (code samples for document processing activities) for how to build your own classifier and data extractor custom activities.

Also, I strongly recommend that you enroll in the Academy Course with the 19.4 Updates, where you can select to watch the Intelligent OCR part only :slight_smile:

Hope this helps!


Ok thanks! I note the Insider Preview release has an Advanced PDF activity, has anyone looked at this new activity?

@mwiseman do have a look at the PDF activity library, version 2.0.0. There are 5 new activities that will definitely help in handling PDF files, + the Read PDF Text and Read PDF with OCR activities have been improved.

I have exactly the same issue and the same question in 2020. Read PDF Text doesn’t work even with the Preserve Formatting flag set to true, so I am forced to use the Get Full Text screen scraping activity. When I run this with an unattended bot on a machine that is locked, the job faults with a timeout exception. I was able to run Read Text under Word Application Scope without any issues, even when the system was locked, so I’m not sure what is causing the problem with screen scraping. When I unlock my system to see what went wrong, I only see a PDF document open and the job failure…