Extracting text from multiple PDF files

Hi there,
New to UiPath and the community, so please bear with me in case I come back with questions :smiley:
I am using 23.10, and most of the questions and answers I found relate to older versions, so the shared projects use old activities that are no longer available or have since been updated or changed.
I have more than 100 invoices each month and need to get 2 pieces of info from each of them: invoice # and amount. I am unsure how to:

  1. create the flow for reading all PDF files (structured, though some have more than 1 page) - the # of invoices varies each month
  2. create an individual txt file for each PDF
  3. get the data (invoice # and amount) from each txt file and add it to an Excel file

Please help, as I have been stuck on this step for 2 days now :frowning:
Thank you!

Hi @arapeanu ,

Thanks for reaching out.

For your use case, you can follow the steps below.

  1. Get the path of the folder where all of your files are located.
  2. Use a For Each activity to iterate over the files one by one.
  3. Inside the For Each activity, use the Read PDF with OCR activity. (Install the UiPath.PDF.Activities package to get the PDF activities.)
  4. Store the output of the Read PDF with OCR activity in a text file.
  5. Apply a regex to get the desired values from the text file and store them in variables.
  6. Create a data table with the Build Data Table activity, with the column names “Invoice #” and “Amount”, then use the Add Data Row activity to add a row to the data table for each invoice.
  7. Use the Append Range activity to append the “Invoice #” and “Amount” values to Excel.
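
To make steps 5 to 7 more concrete, here is a rough sketch of the same logic in Python rather than as UiPath activities. It is only an illustration: the two regex patterns, the function names and the extra "File" column are assumptions, since the thread does not show what the invoices look like, and a CSV file stands in for the Excel sheet. The patterns would need adjusting to the real invoice layout.

```python
import csv
import re
from pathlib import Path

# Hypothetical patterns: they assume lines such as "Invoice # INV-1001"
# and "Total: $1,234.50". Real invoices need patterns matched to their layout.
INVOICE_RE = re.compile(r"Invoice\s*(?:#|No\.?|Number)\s*:?\s*(\S+)", re.IGNORECASE)
AMOUNT_RE = re.compile(
    r"(?:Total|Amount)\s*(?:Due)?\s*:?\s*[^\d\s]?\s*([\d.,]*\d)", re.IGNORECASE
)


def extract_fields(text):
    """Step 5: pull the invoice number and amount out of one invoice's text."""
    invoice = INVOICE_RE.search(text)
    amount = AMOUNT_RE.search(text)
    return {
        "Invoice #": invoice.group(1) if invoice else "",
        "Amount": amount.group(1) if amount else "",
    }


def build_rows(txt_folder):
    """Step 6: build one table row per text file found in the folder."""
    rows = []
    for txt_path in sorted(Path(txt_folder).glob("*.txt")):
        fields = extract_fields(txt_path.read_text(encoding="utf-8"))
        rows.append({"File": txt_path.name, **fields})
    return rows


def write_csv(rows, out_path):
    """Step 7: write all rows to a CSV file (standing in for the Excel sheet)."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["File", "Invoice #", "Amount"])
        writer.writeheader()
        writer.writerows(rows)
```

Empty strings are written when a pattern does not match, so invoices that need a closer look are easy to spot in the result.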

Hope this helps you.
Happy Automation,
@Vinit_Kawle


Hi, @Vinit_Kawle,
Thank you so much for the steps.
I am struggling with setting the variable for which folder/files to read :thinking: (as I said, I am new to all this but eager to understand and learn :smiley: )
I am getting this error


Also, I am unsure what to actually use as variables for the “write to text” activity.

Thank you for your help and patience :pray:

Hi @arapeanu ,

  1. In the FileName field you can pass currentItem.ToString, since you are iterating over each file in the folder.

  2. Add an OCR engine to the Read PDF with OCR activity.

  3. Create an output variable, “strOutputText”, to store the extracted data.

  4. Write the extracted text to the file.
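
To get an individual text file for each PDF, the output file name has to change on every iteration. A rough sketch of that idea in Python (not UiPath code): the function names and the output folder are assumptions for illustration, and `extracted_text` plays the role of “strOutputText”.

```python
from pathlib import Path


def text_path_for(pdf_path, out_folder):
    """Return the .txt path that pairs with one PDF (same base name)."""
    return Path(out_folder) / (Path(pdf_path).stem + ".txt")


def save_extracted_text(pdf_path, extracted_text, out_folder):
    """Write the text extracted from one PDF to its own .txt file."""
    out_folder = Path(out_folder)
    out_folder.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = text_path_for(pdf_path, out_folder)
    target.write_text(extracted_text, encoding="utf-8")
    return target
```

If a fixed file name were used instead, every iteration would write to the same file.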

Hope this works for you.

Happy Automation.
@Vinit_Kawle


Thank you, @Vinit_Kawle.
I don’t get an error, but I don’t get a result file either :sweat_smile:
This is what the output says; no new file was created in any of the folders.
Feel like I am missing something, although they look the same :thinking:

Hi @arapeanu,

There is no Log Message between the processing steps.
Can you navigate to the project folder and check it?

All the text files will be created there.
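
The files probably end up in the project folder because only a file name, not a full path, was given for the output; this is an assumption, as the screenshots are not visible here. A relative name is resolved against the folder the process runs from. A small Python sketch of that behaviour, where `resolve_output` is a made-up helper name:

```python
from pathlib import Path


def resolve_output(file_name, base_folder=None):
    """Show where a file name ends up once it is turned into a full path.

    With no base folder, a relative name resolves against the current
    working folder; with one, the file lands in that folder instead.
    """
    base = Path(base_folder) if base_folder is not None else Path.cwd()
    return (base / file_name).resolve()
```

Passing a full folder path together with the file name puts the text files wherever you want them.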

Happy Automation,
@Vinit_Kawle
