Can the UiPath community confirm whether I am on the right path for this project? I want to extract a policy number from different scanned PDFs, and the PDFs are not tagged. Each PDF uses a different font, color, and text size, and the policy number appears in a different location on each one.
I tried relative scraping, but because it is image-based, it works for the first PDF and not the others: the font changes, so it cannot find an exact image match for the anchor.
I think the only two solutions to this are:
Create a separate robot for each type of PDF, where every PDF of that type has the same font, color, and text size, and the policy number is always in the same place.
Extract the text from the PDF with OCR, load it into a data table, Word document, or similar structure, and then extract the information from it using string manipulation.
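A minimal Python sketch of the second option, assuming the OCR step has already produced plain text (OCR itself is not shown): each OCR line becomes one row of a simple table, and the value is taken from the row whose label before the colon is exactly "Policy". The function names and the default label are hypothetical; in UiPath the same logic would live in the workflow itself.

```python
from typing import List, Optional, Tuple


def to_rows(ocr_text: str) -> List[Tuple[int, str]]:
    """Turn raw OCR text into (line_number, text) rows, skipping blank lines."""
    return [
        (number, line.strip())
        for number, line in enumerate(ocr_text.splitlines(), start=1)
        if line.strip()
    ]


def find_value(rows: List[Tuple[int, str]], label: str = "Policy") -> Optional[str]:
    """Return the first token after the colon on the first row labelled `label`."""
    for _, text in rows:
        head, sep, tail = text.partition(":")
        # Require an exact label match so that, e.g., "Policyholder:" is not picked up.
        if sep and head.strip().lower() == label.lower():
            tokens = tail.split()
            if tokens:
                return tokens[0]
    return None
```

For example, `find_value(to_rows("ACME Insurance\nPolicy : 12345678Test"))` returns `"12345678Test"`. Keeping the line numbers in each row makes it easy to look at neighbouring rows later if a layout puts the value on the line below its label.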
As of now, UiPath has no built-in capability to extract information from scanned PDFs whose fonts, sizes, and field locations vary.
However, the recent Document Understanding release adds these capabilities, though they are currently limited to invoice and receipt fields.
To answer your first question: yes, you would have to create a separate robot, or you can use a Switch expression based on the file name or another attribute of the file and control the flow of execution from it. For example:
PDF A, PDF B, PDF C
These will flow through a Switch based on the name, so the cases would be Case A, Case B, and Case C.
Based on the case, the bot will try to extract the information.
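The Switch idea can be sketched in Python as a dictionary that maps a case key to a per-type extractor. The naming convention (type letter before an underscore, e.g. `A_0001.pdf`) and the three extractor functions are hypothetical placeholders; each real extractor would hold the logic tuned to one layout.

```python
from pathlib import Path
from typing import Callable, Dict


# Hypothetical per-type extractors; each would contain the logic for one PDF layout.
def extract_type_a(pdf: Path) -> str:
    return f"type A extractor ran on {pdf.name}"


def extract_type_b(pdf: Path) -> str:
    return f"type B extractor ran on {pdf.name}"


def extract_type_c(pdf: Path) -> str:
    return f"type C extractor ran on {pdf.name}"


# The "Switch": each case key maps to the extractor for that PDF type.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {
    "A": extract_type_a,
    "B": extract_type_b,
    "C": extract_type_c,
}


def route(pdf: Path) -> str:
    """Pick an extractor from the file name, assuming names like 'A_0001.pdf'."""
    case = pdf.stem.split("_", 1)[0].upper()
    extractor = EXTRACTORS.get(case)
    if extractor is None:
        # Default case: unknown type, so fail loudly instead of guessing.
        raise ValueError(f"No extractor for case {case!r} ({pdf.name})")
    return extractor(pdf)
```

Raising on an unknown case mirrors giving the Switch an explicit Default branch, so a new PDF type shows up as an error rather than a silently wrong extraction.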
On your second question: string manipulation is also possible, but there would be many challenges, like…
Assume one PDF has the policy number like this:
Policy : 12345678Test
Another might have:
Policy : 12345678Test
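A minimal Python sketch of one such challenge, assuming the difference between PDFs is spacing and capitalization around the label (the sample lines are hypothetical variants, not taken from the thread): an exact-substring split breaks as soon as the layout shifts, while a whitespace- and case-tolerant regular expression still finds the value.

```python
import re
from typing import List, Optional

# Hypothetical variants of the same line as OCR might return it from different PDFs.
SAMPLES: List[str] = [
    "Policy : 12345678Test",
    "Policy:12345678Test",
    "POLICY  :   12345678Test",
]


def naive_extract(line: str) -> Optional[str]:
    """Exact-substring split: only works when spacing and case match exactly."""
    marker = "Policy : "
    if marker not in line:
        return None
    return line.split(marker, 1)[1]


# Tolerates any amount of whitespace around the colon and any capitalization.
TOLERANT = re.compile(r"\bpolicy\s*:\s*(\S+)", re.IGNORECASE)


def tolerant_extract(line: str) -> Optional[str]:
    match = TOLERANT.search(line)
    return match.group(1) if match else None
```

The naive version recovers the value only from the first sample; the tolerant one recovers it from all three. Even so, it only handles the variations it was written for: an OCR misread such as a letter O in place of a zero would pass through unchanged.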
Is there a way to classify the different PDFs into several types and then use a specific extractor for each type? This is basically the multiple-robots idea, but without actually having one robot per PDF type.
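One way to sketch classify-then-extract in Python, under stated assumptions: each document type is recognized by a few keywords in its OCR text, and each type has its own label pattern for the policy number. The type names, keywords, and label patterns below are hypothetical; real ones would come from inspecting the actual PDFs.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical document types: keywords that identify each layout in OCR text.
SIGNATURES: Dict[str, List[str]] = {
    "carrier_a": ["acme insurance", "policy :"],
    "carrier_b": ["globex mutual", "policy no"],
}

# Hypothetical label pattern that precedes the policy number on each layout.
LABELS: Dict[str, str] = {
    "carrier_a": r"policy\s*:",
    "carrier_b": r"policy\s*no\.?",
}


def classify(ocr_text: str) -> Optional[str]:
    """Return the type with the most matching keywords, or None if none match."""
    text = ocr_text.lower()
    scores = {
        doc_type: sum(keyword in text for keyword in keywords)
        for doc_type, keywords in SIGNATURES.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else None


def extract_policy(ocr_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Classify the document, then apply that type's label pattern."""
    doc_type = classify(ocr_text)
    if doc_type is None:
        return None, None
    match = re.search(LABELS[doc_type] + r"\s*(\S+)", ocr_text, re.IGNORECASE)
    return doc_type, (match.group(1) if match else None)
```

Because every type shares one pipeline and differs only in its table entries, supporting a new PDF layout means adding a keyword list and a label pattern rather than building a new robot.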