Attended vs Unattended screen scraping / getting PDF field data

mwiseman · May 15, 2019, 7:59pm

I have a fundamental question but can’t seem to get a straight answer. When screen scraping with UiPath (or, more generally, any RPA tool), must the application associated with the document be open on the client machine for scraping to occur, and would that mean the process is always considered ‘attended’?

I would like a UiPath process to determine which PDF file to scrape and automatically put certain fields from the PDF into UiPath variables. I note that the PDF fields are not captured by Get PDF Text, since their text is contained in a ‘field’ container, so screen scraping appears to be the only option apart from writing a Custom Activity in Visual Studio. Any referrals to a relevant example NuGet package or VS project code would be greatly appreciated.

DanielMitchell · May 16, 2019, 12:33pm

Just FYI: if the robot must open an application to work, that doesn’t necessarily make it attended. It can still do its work unattended. A robot is only attended if human interaction is required for it to complete items.

Ioana_Gligan · May 16, 2019, 12:48pm

@mwiseman your issue seems to be a data extraction one, not a screen scraping one… You could try to use the document processing framework in the IntelligentOCR package, and write a custom extractor for the values you need.

You can also use the Digitize Document activity to read the text from the PDF file and also get all the coordinates of each word in it, to base your custom extractor search on.
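To illustrate the idea of basing a search on word coordinates, here is a minimal Python sketch. It is not UiPath code and does not use the Digitize Document output types; the `Word` structure, the `value_right_of` helper, and the sample coordinates are all hypothetical, assumed only for illustration. It shows one possible lookup: find a label word, then collect the words on roughly the same line to its right.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One word plus its bounding box (hypothetical structure, not a UiPath type)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def value_right_of(words: List[Word], label: str,
                   y_tolerance: float = 3.0) -> Optional[str]:
    """Return the text on the same line, to the right of the first word equal to `label`."""
    anchor = next((w for w in words if w.text == label), None)
    if anchor is None:
        return None
    # Keep words whose top edge is within the tolerance of the label's top edge
    # and that start at or after the label's right edge.
    same_line = [
        w for w in words
        if w is not anchor
        and abs(w.y - anchor.y) <= y_tolerance
        and w.x >= anchor.x + anchor.width
    ]
    same_line.sort(key=lambda w: w.x)  # left-to-right reading order
    return " ".join(w.text for w in same_line) or None


# Made-up sample data standing in for digitized words.
words = [
    Word("Invoice", 10, 20, 40, 10),
    Word("Name:", 10, 50, 30, 10),
    Word("Jane", 45, 50, 25, 10),
    Word("Doe", 75, 51, 20, 10),
    Word("Total:", 10, 80, 30, 10),
    Word("42.00", 45, 80, 30, 10),
]
print(value_right_of(words, "Name:"))
print(value_right_of(words, "Total:"))
```

A real custom extractor would apply the same kind of lookup to the words and coordinates produced by digitization, with tolerances tuned to the document layout.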

Please see this: https://activities.uipath.com/docs/about-the-uipathdocumentprocessingcontracts for what to use / what abstract classes to implement in order to build a custom extractor,
see this: https://activities.uipath.com/docs/data-extraction-scope for the Data Extraction Scope activity (where a custom extractor would be drag-and-dropped), and
see the UiPath/Document-Processing-Code-Samples repository on GitHub (code samples for document processing activities) for how to build your own classifier and data extractor custom activities.

Also, I strongly recommend that you enroll in the Academy Course with the 19.4 Updates, where you can choose to watch only the Intelligent OCR part.

Hope this helps!

mwiseman · May 17, 2019, 3:00pm

Ok, thanks! I note the Insider Preview release has an Advanced PDF activity. Has anyone looked at this new activity?

Ioana_Gligan · May 31, 2019, 7:39am

@mwiseman do have a look at the PDF activity library, version 2.0.0. There are 5 new activities that will definitely help in handling PDF files, and the Read PDF Text and Read PDF with OCR activities have been improved.

mc00476004 · January 30, 2020, 12:12pm

Exactly, I have the same issue and the same question in 2020. Read PDF Text doesn’t work even with the Preserve Formatting flag set to true, so I am forced to use the Get Full Text screen scraping activity. When I try to run this with an unattended bot on a locked machine, the job faults with a timeout exception. I was able to run Read Text under Word Application Scope without any issues, even when the system was locked. I am not sure what is causing the problem with screen scraping. When I unlock my system to see what went wrong, I only see a PDF document open and the job failure…
