Scraping specific data from multiple scanned PDF docs and populating an Excel sheet

Zoe_1020 · February 20, 2020, 4:18pm

I have a workflow that needs to:

1. Save all the PDF attachments from automatic emails and read each attachment (this part is completed).
2. Scrape the report name, submission date, and form number and populate them in an Excel sheet.
3. Print the forms out.

I’ve been doing this manually every day and am now thinking of creating a bot that handles it for me.

I’ve tried all types of OCR engines and CV activities to scrape the data. The workflow works for a single PDF but does not read the 2nd and 3rd correctly (the number 8 is recognized as 3 because they are scanned pretty badly). The PDFs I have are not identical: they are different types of forms, scanned and sent by different organizations.

I need to scrape the organization name from each form, but my issue is that on form A the organization name field is in box 2a, while on form B it is in box 1a. So the anchors are not fixed.
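One way to think about the moving anchor is a per-form lookup: decide the form type first, then search the OCR text for that form's box label. The following is a minimal Python sketch of the idea, not UiPath code. The `ORG_NAME_BOX` mapping and the `extract_org_name` helper are hypothetical names, and the sketch assumes the OCR output puts the box label and its value on the same line, which may not hold for badly scanned pages.

```python
import re

# Hypothetical mapping from form type to the box label that holds the
# organization name (from the question: form A uses 2a, form B uses 1a).
ORG_NAME_BOX = {"A": "2a", "B": "1a"}


def extract_org_name(ocr_text, form_type):
    """Return the text following the form's box label, or None if not found.

    Assumes the OCR text has the label and value on one line,
    e.g. "2a. <organization name>".
    """
    box = ORG_NAME_BOX.get(form_type)
    if box is None:
        return None  # unknown form type: no anchor to search for
    # Line starts with the box label, an optional period, then the value.
    pattern = re.compile(
        rf"^\s*{re.escape(box)}\b\.?\s*(.+)$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(ocr_text)
    return match.group(1).strip() if match else None
```

Keeping the anchors in a single table means a new form type only needs a new entry, not a new branch in the workflow.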

Another issue I’m having is that the report submission date is included in the attachment file name, for example:

ammended F44 incident 121519.pdf
January 2020 F55.pdf
SCC untitled_02122020.pdf

I need to populate the submission date in an Excel sheet, but as shown above these dates are formatted differently. Can my bot still recognize them? I’m not an expert in UiPath, so please help.

Pradeep_Shiv · February 20, 2020, 4:54pm

Did you try the Matches activity?
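The Matches activity is regex-based, so the three file-name styles in the question can each be covered by a pattern. Here is a minimal sketch of that logic in Python rather than UiPath, only to illustrate the patterns. It assumes the 6-digit form is MMDDYY, the 8-digit form is MMDDYYYY, two-digit years belong to the 2000s, and a month-and-year name maps to the first day of that month. `parse_submission_date` is a hypothetical helper name.

```python
import re
from datetime import date

MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november",
               "december"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}


def _safe_date(year, month, day):
    """Build a date, returning None for impossible values such as month 13."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def parse_submission_date(filename):
    """Extract a submission date from a file name, or None if none is found."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension

    # MMDDYYYY, e.g. "SCC untitled_02122020" (assumed month-first order).
    m = re.search(r"(?<!\d)(\d{2})(\d{2})(\d{4})(?!\d)", stem)
    if m:
        month, day, year = map(int, m.groups())
        return _safe_date(year, month, day)

    # MMDDYY, e.g. "incident 121519" (assumed to be in the 2000s).
    m = re.search(r"(?<!\d)(\d{2})(\d{2})(\d{2})(?!\d)", stem)
    if m:
        month, day, short_year = map(int, m.groups())
        return _safe_date(2000 + short_year, month, day)

    # Month name plus year, e.g. "January 2020"; day defaults to 1.
    m = re.search(r"(?i)\b(" + "|".join(MONTH_NAMES) + r")\s+(\d{4})\b", stem)
    if m:
        return _safe_date(int(m.group(2)), MONTHS[m.group(1).lower()], 1)

    return None
```

The digit patterns use lookarounds instead of `\b` because an underscore counts as a word character, so `\b` would not match before the digits in `untitled_02122020`. Note that this only handles the file names; it does not help with the OCR quality of the scanned pages themselves.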

Divyashreem · February 20, 2020, 5:31pm

If the PDFs are not all in the same format, you can’t achieve 100% accuracy in PDF data scraping using UiPath alone. You can integrate ABBYY FlexiCapture with UiPath and then expect results of up to 90%. But keep in mind that the results always depend on the quality of the PDF file.

Zoe_1020 · February 20, 2020, 5:45pm

Yes. Sadly, it did not work because of the quality of the PDFs.
