PDF to Excel help

Hello everyone,
I am pretty new to automation, and I am trying to create a process that extracts data from PDFs, creates a new Excel file, and pastes the data there.
The PDF will always be two pages long and formatted as in the sample below. The fields I want to extract are highlighted in yellow. I have read many forum topics and watched a lot of videos, but I can't seem to get it working. Thanks in advance for your time and any help.
sample.pdf (215.3 KB)

Hi @DimiG

Can you create a sample Excel file, manually enter the data from the PDF, and share it here so we can see the expected output structure?

Regards
Sudharsan

Hello @Sudharsan_Ka, and thanks for replying.
Here is a sample Excel file:
sampleExcel.xlsx (8.5 KB)

@DimiG, have you tried the Form Extractor? Since the layout is fixed, you should be able to extract the information.
You can find more details in the Form Extractor documentation and in the step-by-step guide "Extracting data from forms".

Regards,
Balram

@balaraman.ramiya
Thank you for replying. I can't seem to make it work, but thanks for the useful info.

OK, so you mean you are not able to extract the information from the PDF using the extractor activity, right?
Regards,
Balram

You can try Computer Vision automation, since your PDF layout is fixed. Give it a try. You can find the details here: https://docs.uipath.com/activities/docs/using-the-computer-vision-activities
Regards,
Balram

I have tried some of these, for example Get Text and Screen Scope. Unfortunately, I can't get the data. These were all good tips, and thanks for taking the time to reply.

@DimiG, I am not sure whether you used the CV local server. I tried it and it worked. Here is a screenshot. You can use a label as an anchor to extract the values.

It also recognizes almost all of the fields.

Since you mentioned the format is fixed, you can try CV. Give it a try.
Regards,
Balram

@balaraman.ramiya
Hello again,
I tried the Form Extractor method you posted again and was able to get results.
So thank you for that.
The only problem is that every time I run it, it makes me manually check and submit the values, as shown below.
Any ideas on how to automate that step?

@DimiG, you are using Validation Station. If you don't want to validate, you can export the results directly to Excel using the Export Extraction Results activity.
Note: I would suggest going through the Document Understanding (DU) course in the Academy to learn the fundamentals of DU.

Regards,
Balram

@balaraman.ramiya
Your advice helped me a lot, and you are right, I really need to do the courses.
When you ran it, did you get these weird B5/U /L HS values? I am getting them in all the Excel files I created while testing.
Screenshot

You might notice that in Validation Station this field shows 3, correct? Expand it and check whether it is extracting similar values from other pages. If so, apply an anchor to extract the values. Check the documentation here: docs.uipath.com/activities/docs/anchorbased-data-extraction-using-intelligent-form-extractor#setting-anchors-in-the-template
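
To illustrate the anchor idea outside of the Form Extractor, here is a minimal Python sketch. It is not UiPath code: it assumes the PDF text has already been extracted into one string per page, and the label `Invoice No:`, the sample page texts, and the helper `value_after_anchor` are all hypothetical names made up for the example.

```python
import re
from typing import Optional


def value_after_anchor(page_text: str, anchor: str) -> Optional[str]:
    """Return the text that follows `anchor` on the same line, or None."""
    # `.` does not match newlines, so the capture stops at the end of the line.
    pattern = re.escape(anchor) + r"[ \t]*(.+)"
    match = re.search(pattern, page_text)
    return match.group(1).strip() if match else None


# Hypothetical page texts: the same label appears on both pages.
pages = [
    "Customer: ACME Ltd\nInvoice No: 12345\n",
    "Terms and conditions\nInvoice No: 99999\n",
]

# Search page one only, so look-alike values on page two are ignored.
first_page_value = value_after_anchor(pages[0], "Invoice No:")
print(first_page_value)
```

The point is the same as with a template anchor: tie the value to a nearby label and to a specific page, so that similar-looking text elsewhere in the document is not picked up.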

Regards,
Balram

I found that the value comes from the second page of the PDF. The weird thing is that I already used anchors for the 3 test values, and I am still getting this output when I don't use validation.

I worked my way around it, and it seems to work for the 3 test fields I have used. Thanks to everyone for replying, and especially to @balaraman.ramiya for his patience and useful documentation.

Good that you were able to solve it. For the multiple-value extraction, you can look at how it can be addressed using the confidence score, which requires some programming steps (Get Confidence for a particular field - DU - #2 by Lahiru.Fernando). This is typically used to pick the best-match value when several similar values are extracted. The problem is also discussed in this thread (Definition & Execution of Business Rules for Fields (mandatory, possible values & fixed format rules) - #12 by sharon.palawandram), where you can also report this issue.
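
As a rough illustration of the confidence-score idea, here is a minimal Python sketch. It is not the UiPath API (in a workflow the equivalent logic would work on the extraction results object); the `Candidate` class, the `best_match` helper, the threshold, and the sample values are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    value: str         # extracted text for the field
    confidence: float  # extractor confidence, assumed to be in the range 0.0 to 1.0
    page: int          # page the value was found on


def best_match(candidates: List[Candidate],
               min_confidence: float = 0.0) -> Optional[Candidate]:
    """Return the highest-confidence candidate at or above the threshold, or None."""
    eligible = [c for c in candidates if c.confidence >= min_confidence]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.confidence)


# Hypothetical candidates for one field: one good hit and two look-alikes from page two.
candidates = [
    Candidate("12345", 0.93, 1),
    Candidate("B5/U", 0.41, 2),
    Candidate("L HS", 0.37, 2),
]

chosen = best_match(candidates, min_confidence=0.5)
print(chosen.value if chosen else "needs manual review")
```

When no candidate clears the threshold, the sketch returns `None`, which is the case you would route to manual validation instead of writing a guess to Excel.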

Regards,
Balram
