I have a workflow that needs to:
- save all the PDF attachments from automated emails and read each attachment (part 1, completed)
- scrape the report name, submission date, and form number from each attachment and populate them in an Excel sheet
- print the forms out
I’ve been doing this manually every day and am now thinking of creating a bot that handles it for me.
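The steps above can be sketched as a loop that turns each saved PDF into one spreadsheet row. This is a minimal Python sketch of the control flow, not UiPath code: `extract_fields` is a hypothetical placeholder for the OCR/scraping step, and a CSV file written with the standard `csv` module stands in for the Excel sheet. In UiPath the same shape would be a For Each over the files, building a DataTable (Build Data Table / Add Data Row) and writing it with Write Range.

```python
import csv
from pathlib import Path

COLUMNS = ["file_name", "report_name", "submission_date", "form_number", "organization"]


def extract_fields(pdf_path: Path) -> dict:
    """Hypothetical stand-in for the OCR/scraping step.

    A real version would read the PDF; here only the file name is filled in.
    """
    row = dict.fromkeys(COLUMNS, "")
    row["file_name"] = pdf_path.name
    return row


def build_rows(paths: list[Path]) -> list[dict]:
    """Return one row per PDF in sorted order; non-PDF files are skipped."""
    pdfs = sorted(p for p in paths if p.suffix.lower() == ".pdf")
    return [extract_fields(p) for p in pdfs]


def write_sheet(rows: list[dict], output: Path) -> None:
    """Write the rows to a CSV file, standing in for the Excel sheet."""
    with output.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Usage, assuming the attachments from part 1 were saved to ./attachments:
# write_sheet(build_rows(list(Path("attachments").iterdir())), Path("reports.csv"))
```

Keeping the per-file scraping in one function means each of the problems below (misread digits, moving boxes, dates in file names) can be solved and tested separately before being plugged into the loop.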
I’ve tried all types of OCR engines and the Computer Vision (CV) activities to scrape the data. The workflow works for a single PDF, but it fails to recognize the 2nd and 3rd ones (for example, the digit 8 is read as 3 because they were scanned pretty badly). The PDFs are not identical: they are different types of forms, scanned and sent by different organizations.
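When engines misread badly scanned digits in different ways (such as 8 read as 3), one common mitigation is to read the same field with several engines and keep only a value that a strict majority agrees on, sending everything else to manual review. This is a minimal Python sketch; `vote_on_field` is a hypothetical helper, and the list of readings is assumed to come from whichever OCR engines are already in the workflow. Voting does not repair the scan itself, and if every engine makes the same mistake it will happily agree on the wrong value.

```python
import re
from collections import Counter
from typing import Optional


def vote_on_field(readings: list[str], pattern: Optional[str] = None) -> tuple[Optional[str], bool]:
    """Pick the value most OCR engines agree on.

    readings: the same field as read by several engines (assumed input).
    pattern:  optional regex the value must fully match, such as a form-number format.
    Returns (value, needs_review). needs_review is True unless more than
    half of the usable readings agree, so a person can check the scan.
    """
    cleaned = [r.strip() for r in readings if r and r.strip()]
    if pattern is not None:
        # Drop readings that cannot be valid for this field at all.
        cleaned = [r for r in cleaned if re.fullmatch(pattern, r)]
    if not cleaned:
        return None, True
    best, best_count = Counter(cleaned).most_common(1)[0]
    # Strict majority required; ties and pluralities are flagged.
    return best, best_count * 2 <= len(cleaned)
```

For example, `vote_on_field(["F48", "F43", "F48"])` keeps `"F48"` without flagging it, while two engines that disagree produce a flagged result.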
I need to scrape the organization name from each form, but the field is in a different place on each form type: on form A it is in box 2a, while on form B it is in box 1a. So the anchors are not fixed.
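Because the box that holds the organization name depends on the form type, one option is to identify the form type first and then look up which box label to anchor on from a small table. This is a minimal Python sketch over OCR output that is assumed to be plain text with each box label at the start of a line. Form types "A" and "B" and boxes 2a/1a come from the question; `ORG_NAME_BOX`, `find_box_value`, and `organization_name` are hypothetical names, and how the form type is detected (a keyword, the form number, the sending organization) is left open. In UiPath, a Dictionary variable could hold the same mapping.

```python
import re
from typing import Optional

# Hypothetical configuration: which box holds the organization name on each
# form type. Add a row here whenever a new form layout arrives.
ORG_NAME_BOX = {
    "A": "2a",
    "B": "1a",
}


def find_box_value(ocr_text: str, box: str) -> Optional[str]:
    """Return the text after a box label such as '2a' or '2a.' at the start of a line."""
    pattern = rf"^\s*{re.escape(box)}\.?\s+(.+?)\s*$"
    match = re.search(pattern, ocr_text, flags=re.MULTILINE | re.IGNORECASE)
    return match.group(1) if match else None


def organization_name(ocr_text: str, form_type: str) -> Optional[str]:
    """Look up the right box for this form type, then read its value."""
    box = ORG_NAME_BOX.get(form_type)
    if box is None:
        return None  # unknown layout: route the file to manual review
    return find_box_value(ocr_text, box)
```

The design choice is to keep layout knowledge in data rather than in separate workflows, so supporting a new form means adding one entry instead of another set of anchors.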
Another issue I’m having: the report submission date is included in the attachment file name, for example:
ammended F44 incident 121519.pdf
January 2020 F55.pdf
SCC untitled_02122020.pdf
I need to populate the submission date in an Excel sheet, but as the examples above show, the dates are formatted differently. Can my bot still recognize them? I’m not an expert in UiPath, so please help.
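Mixed formats can usually be handled by trying a short list of known patterns in order. This is a minimal Python sketch; the conventions below are inferred from the three example names only and are assumptions: 8 digits are read as MMddyyyy, 6 digits as MMddyy (121519 only makes sense as month 12, day 15), and "Month yyyy" becomes the first day of that month because no day is given. `date_from_filename` is a hypothetical name. In UiPath, the same idea could be written in Assign activities with .NET's `Regex.Match` and `DateTime.ParseExact`.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hard-coded English month names so parsing does not depend on the OS locale.
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def date_from_filename(file_name: str) -> Optional[date]:
    """Pull a submission date out of an attachment name, or return None."""
    stem = file_name.rsplit(".", 1)[0]  # drop the ".pdf" extension

    # "January 2020": month name followed by a 4-digit year (assumed day 1).
    month_year = re.search(rf"\b({'|'.join(MONTHS)})\s+(\d{{4}})\b", stem, re.IGNORECASE)
    if month_year:
        month = MONTHS.index(month_year.group(1).lower()) + 1
        return date(int(month_year.group(2)), month, 1)

    # Standalone digit runs: try 8 digits (MMddyyyy) before 6 (MMddyy).
    for digits, fmt in ((8, "%m%d%Y"), (6, "%m%d%y")):
        for token in re.findall(rf"(?<!\d)\d{{{digits}}}(?!\d)", stem):
            try:
                return datetime.strptime(token, fmt).date()
            except ValueError:
                continue  # right length, but not a valid date
    return None  # no recognizable date: flag for manual entry
```

Applied to the three example names, this yields 2019-12-15, 2020-01-01, and 2020-02-12. A name that matches none of the patterns returns `None`, which is safer than guessing and can be highlighted in the sheet for a manual check.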