Fetching data from PDF files

I have around 30 PDF files and want to fetch the amount from each one. If a PDF does not contain any amount, I need to ignore that file; if it does contain an amount, I need to take the amount and write it into an Excel file.

Can anyone please help me with this automation?

Thank you in advance.

Hi @HeartCatcher

Can you send a sample PDF, or just an image of the amount in the PDF file?

Hi @HeartCatcher,

A simpler approach is to read the PDF files using the Read PDF Text activity (https://docs.uipath.com/activities/docs/read-pdf-text) and check whether the required text is present using String.Contains.
If the amount is a number, you can use Regex to search for it.
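
As a rough illustration of the Regex idea, here is a minimal Python sketch (not UiPath code). It assumes amounts are written with comma-grouped thousands and exactly two decimals, like the values quoted later in this thread; the names AMOUNT_RE, find_amounts and has_amount are made up for this example. The pattern itself should carry over to a .NET Regex with little or no change, but that is untested here.

```python
import re

# Assumed amount format: optional minus sign, comma-grouped thousands,
# exactly two decimals (e.g. 1,404,716.48 or -7,266,377.52).
# The lookarounds stop the pattern from matching inside a longer number.
AMOUNT_RE = re.compile(r"(?<![\d.,])-?\d{1,3}(?:,\d{3})*\.\d{2}(?!\d)")


def find_amounts(text):
    """Return every amount-like substring in text, in order of appearance."""
    return AMOUNT_RE.findall(text)


def has_amount(text):
    """Return True if text contains at least one amount-like substring."""
    return AMOUNT_RE.search(text) is not None
```

has_amount plays the role of the "is the amount present" check, and find_amounts returns the matched strings so they can be written out later.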

Thanks,
Saranya K R

Hi @HeartCatcher
The general structure for solving your issue is:

  1. Use an Assign activity to get the list of all PDF files in the folder:

files_list = Directory.GetFiles(folder, "*.pdf")

  2. Use a Build Data Table activity to create the columns you need for the amount data, and store the result in a DataTable variable dt1.

  3. Use a For Each activity to loop through each file (for file in files_list) with the type argument set to String. Inside it, use the activities below:

a. Use the Read PDF Text activity to read the PDF and store the result in a String variable input_text.

b. Use Regex IsMatch or the String.Contains method to check whether the amount is present.

c. If the expression is true, extract the amount using Regex or string manipulation and add it to dt1 using the Add Data Row activity.

d. If the expression is false, use a Continue activity to skip the current iteration and move to the next file.
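
For readers who want to see the control flow outside UiPath, the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: read_text is a hypothetical caller-supplied function standing in for the Read PDF Text activity, the amount format is assumed to be comma-grouped with two decimals, and the rows are written to a CSV file (which Excel can open) rather than a real workbook.

```python
import csv
import re
from pathlib import Path

# Assumed amount format: optional minus, comma-grouped thousands, two decimals.
AMOUNT_RE = re.compile(r"(?<![\d.,])-?\d{1,3}(?:,\d{3})*\.\d{2}(?!\d)")


def collect_amounts(folder, read_text):
    """Return (file name, amount) rows for each PDF in folder that has an amount.

    read_text is a hypothetical function that takes a path and returns the
    PDF's text; it stands in for the Read PDF Text activity.
    """
    rows = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        amounts = AMOUNT_RE.findall(read_text(pdf))
        if not amounts:
            continue  # no amount found: skip this file
        for amount in amounts:
            rows.append((pdf.name, amount))
    return rows


def write_rows_csv(rows, out_path):
    """Write the collected rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "amount"])
        writer.writerows(rows)
```

Passing the text reader in as a parameter keeps the loop independent of any particular PDF library, which mirrors how the workflow separates reading the file from checking and storing the amount.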

Hope it helps
Mark it as the solution if it solves your query.
Regards,
Nived N

Hi @HeartCatcher

Here is a sample template:

PDF.xaml (6.6 KB)

Let me know if you need help

@Gokul001 please find the attachments.

Hi @HeartCatcher

Could you send a sample file? I need a sample PDF file to work out how to extract the amount.



@Gokul001 please find the attachment.

Hi @HeartCatcher

Does the marked value remain constant?

@Gokul001 it's not constant.

For each session there will be 2 files:
  • Presentation (presented column) file
  • Return (received) settlement file
A file name with _R is a return settlement file, and a file name without _R is the presentation file.
ACHCR is the credit settlement file and ACHDR is the debit settlement file.

Hi @HeartCatcher

Could you send a sample text file with the formatting preserved?

Hi @Gokul001

There is no text file; I have the PDFs and a master Excel file. May I please know which text file you are looking for?

Create a sample workflow:

  • Use the Read PDF Text activity (in Properties, set preserve format to "True")
  • Write the text to a Notepad (.txt) file using the Write Text File activity
  • Send me the text file

PPdf.txt (672 Bytes)
@Gokul001

Hi @HeartCatcher

Which values do you want to extract from the file? Can you please tell me?

@Gokul001
8,785,915.00
-7,266,377.52
1,404,716.48

These are the values under Presented, Received, and Net Bilateral, respectively.

Sorry for the inconvenience

Hi @HeartCatcher
Here is the sample workflow:
PPdf.txt (672 Bytes)
Regex.xaml (7.4 KB)
Let me know if you face any issues.
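
For anyone reading without the attachments, here is a minimal Python sketch of how three values like the ones quoted above could be pulled out of the extracted text and converted to numbers. It does not reproduce the contents of Regex.xaml or the layout of PPdf.txt, which are not shown in this thread. It assumes the first three amount-like strings in the text are the Presented, Received, and Net Bilateral values in that order; parse_settlement and the dictionary keys are names invented for this example.

```python
import re
from decimal import Decimal

# Assumed amount format: optional minus, comma-grouped thousands, two decimals.
AMOUNT_RE = re.compile(r"(?<![\d.,])-?\d{1,3}(?:,\d{3})*\.\d{2}(?!\d)")


def parse_settlement(text):
    """Return the first three amounts in text as Decimals, or None if fewer exist.

    Assumption: the amounts appear in the order presented, received,
    net bilateral. Adjust if the real file layout differs.
    """
    # Strip the thousands separators before converting to Decimal.
    values = [Decimal(m.replace(",", "")) for m in AMOUNT_RE.findall(text)]
    if len(values) < 3:
        return None
    presented, received, net_bilateral = values[:3]
    return {
        "presented": presented,
        "received": received,
        "net_bilateral": net_bilateral,
    }
```

Decimal is used instead of float so that monetary values keep their exact two-decimal representation when written to a spreadsheet.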