How to extract data from a financial PDF document and save it to Excel without using Document Understanding

How can I extract data from a financial PDF document and save it to Excel without using Document Understanding or regular expressions?

Hi @Lalitha_Selvaraj

I think there are no other good options for this without a third-party solution. Extracting information from a PDF requires exception handling, validation, and so on.

You can try this activity:

Extract Tables from PDF - RPA Component | UiPath Marketplace | Overview

Hi @riku_silva, thanks for the response. I have a financial PDF document (there are no tables in it). I need to extract data and save it to Excel. Is there any way to extract the data without using Document Understanding?

Why do you keep saying "without Document Understanding"? DU is how you do this kind of thing.

The only other option is text manipulation, which is tedious and requires the PDF to be a text PDF (i.e. not a scanned document).
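To show what "text manipulation" involves, here is a minimal Python sketch (outside UiPath) that uses only plain string operations, no regex. It assumes the PDF text has already been read into a string (for example with Read PDF Text) and that each value sits on the same line as its label, as in `Label: value`. The labels and sample text are hypothetical.

```python
from typing import Dict, List


def extract_by_labels(text: str, labels: List[str]) -> Dict[str, str]:
    """Return {label: value} for lines of the form '<label>: <value>'.

    Uses only plain string operations (no regex). The first occurrence of
    each label wins; labels that are not found are left out of the result.
    """
    found: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        for label in labels:
            prefix = label + ":"
            if label not in found and line.startswith(prefix):
                found[label] = line[len(prefix):].strip()
    return found


# Hypothetical text, as it might come out of a text (non-scanned) PDF.
sample = """ACME Corp Quarterly Statement
Account Number: ACC-1001
Statement Date: 2023-03-31
Total Revenue: 1,250,000.00
"""

print(extract_by_labels(sample, ["Account Number", "Total Revenue", "Net Profit"]))
```

This breaks as soon as the layout changes (for example, the value moves to the line below its label), which is why the approach gets tedious across many document variants.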

Hi @Lalitha_Selvaraj

Thank you for the clarification.

To extract text from a PDF you can use UiPath.PDF.Activities, but you need to organize the information before sending it to Excel, right? So you need string manipulation or a powerful tool to handle it for you.

There is no activity that extracts exactly what you want and saves it to Excel without some customization effort.

No, @postwick. Our client is not accepting Document Understanding, since it requires a license.

Thanks, @rikulsilva. I'll try text manipulation.

No, it doesn't. You can do it all locally.

@postwick

Just to check:

the mentioned activities are only for OCR purposes, right? That is, to transform the PDF into machine-readable text. To train a model and make predictions you still need a license, right?

No, the only thing that is cloud-based is the OCR/digitization, but that can be done locally with the link I shared. Everything else you need to extract data is done locally with the DU activities.

@Lalitha_Selvaraj

Welcome to the community

you can use regex-based extractors; regex is itself a form of string manipulation.
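A minimal Python sketch of the regex idea (outside UiPath), with one pattern per field and one capturing group for the value. It assumes the text has already been extracted from a text PDF; the field names and patterns are hypothetical and would need to be adapted to the real document.

```python
import re
from typing import Dict, Optional

# Hypothetical field patterns: each has exactly one capturing group for the value.
FIELD_PATTERNS: Dict[str, str] = {
    "statement_date": r"Statement Date:\s*(\d{4}-\d{2}-\d{2})",
    "total_revenue": r"Total Revenue:\s*([\d,]+\.\d{2})",
    "net_profit": r"Net Profit:\s*(-?[\d,]+\.\d{2})",
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Apply each pattern to the text; a field with no match is set to None."""
    results: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        results[field] = match.group(1) if match else None
    return results


sample = """Statement Date: 2023-03-31
Total Revenue: 1,250,000.00
Net Profit: 310,500.75
"""

print(extract_fields(sample))
```

Returning `None` for missing fields makes it easy to route incomplete documents to a validation or exception step instead of writing empty data to Excel.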

cheers

Thanks, @Anil_G. I will check on that.


Thanks, @postwick. For example, if I need to extract data from around 500 PDFs, should I use Document Understanding for that?
There's no need for a license, right?

You can use DU for that, yes. Set up your taxonomy (at least one document type) and then use the Classify Document scope with Keyword Classifier to identify the document type. Put your fields in the taxonomy under each document type and use Regex to identify where each data point is.
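To illustrate the general idea behind keyword-based classification, here is a simplified Python analogue (outside UiPath, and not how UiPath's Keyword Classifier is implemented): each document type in a hypothetical taxonomy has a list of keywords, and the type whose keywords occur most often in the text wins. The document types and keywords are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: document type -> keywords that suggest that type.
# Illustration only, not UiPath's Keyword Classifier.
KEYWORDS: Dict[str, List[str]] = {
    "balance_sheet": ["total assets", "total liabilities", "equity"],
    "income_statement": ["revenue", "net profit", "operating expenses"],
}


def classify(text: str, keywords: Dict[str, List[str]] = KEYWORDS) -> Optional[str]:
    """Return the document type with the most keyword hits, or None if no keyword occurs."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(word) for word in words)
        for doc_type, words in keywords.items()
    }
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] > 0 else None


print(classify("Revenue: 1,250,000.00\nOperating Expenses: 900,000.00\nNet Profit: 350,000.00"))
```

Once the type is known, only that type's field patterns need to be applied, which keeps the per-field extraction rules small when processing many PDFs.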

I suggest creating a new project from the Document Understanding template to see how to do these things. Just add the local server package, and in Project Settings set local server to true.

Thanks, @postwick. I will check on that.

@Lalitha_Selvaraj
try this:

Use the “Read PDF Text” activity in UiPath to extract text from the PDF, and then use the “Write Range” activity to save the data to an Excel file.
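Write Range expects a DataTable, so the raw text from Read PDF Text still has to be split into rows and columns first. A minimal Python sketch of that last step (outside UiPath), assuming each PDF's text has already been parsed into a dict of fields (for instance with string manipulation or regex). It writes CSV with the standard `csv` module, which Excel can open; producing a native .xlsx file would need a third-party library such as openpyxl and is left out here. The file names, columns, and values are hypothetical.

```python
import csv
import io
from typing import Dict, List


def rows_to_csv(records: List[Dict[str, str]], columns: List[str]) -> str:
    """Serialize one record per PDF into CSV text, one column per field.

    Missing fields become empty cells; keys not listed in `columns` are ignored.
    Values containing commas (e.g. '1,250,000.00') are quoted automatically.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {"file": "statement_01.pdf", "statement_date": "2023-03-31", "total_revenue": "1,250,000.00"},
    {"file": "statement_02.pdf", "statement_date": "2023-06-30"},
]
columns = ["file", "statement_date", "total_revenue"]

csv_text = rows_to_csv(records, columns)
print(csv_text)
```

To save to disk, write `csv_text` to a file opened with `open("output.csv", "w", newline="", encoding="utf-8")`.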

cheers…!
