Extracting data from PDF Invoices from ma

Happy New Year to all.
I am new to RPA and still learning, and I am stuck on extracting
data from PDFs. I have PDF invoices in different formats from
different suppliers, and I want to read them and extract fields
such as Invoice No., Date, Amount, Supplier Name and
so on. I would appreciate it if anyone could help me with this.

Can you send one of your PDFs?

Hello Manish
Thanks for your reply. One sample invoice is attached;
I have hundreds of invoices from many suppliers.
wordpress.pdf (42.6 KB)

Hi @syed1980 ,

Happy New Year :slight_smile:

You can extract all of the PDF text with the “Read PDF Text” activity and then use Regex to retrieve the pieces of information you need.

Alternatively, you can use the “Anchor Base” activity to retrieve the required data, and reuse the same workflow for the remaining PDFs that share the same format.

Regards,
Sasidhar
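
To illustrate the regex route, here is a minimal Python sketch rather than a UiPath workflow. It assumes the PDF text has already been read into a string; the sample text, the field names and the patterns are all made up for illustration, and each supplier layout will likely need its own patterns.

```python
import re

# Hypothetical text, standing in for the output of a PDF text-reading step.
SAMPLE_TEXT = """\
ACME Supplies Ltd
Invoice No: INV-2041
Date: 03/01/2019
Total Amount: 1,250.00
"""

# Illustrative patterns only (assumptions, not taken from any real invoice).
PATTERNS = {
    "invoice_no": r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*([A-Za-z0-9-]+)",
    "date": r"Date\s*:?\s*(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})",
    "amount": r"Total(?:\s+Amount)?\s*:?\s*([\d,]+\.\d{2})",
}


def extract_fields(text):
    """Return the first match for each pattern, or None when a field is absent."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields


result = extract_fields(SAMPLE_TEXT)
print(result)
```

A field that is not found comes back as None, which makes it easy to spot invoices whose layout the patterns do not cover yet.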

Check this workflow. I have extracted the invoice number from the PDF; you can do the same for the other fields.
Main (3).xaml (6.8 KB)

Thanks for the reply @Manish540.
Why am I getting this missing-activity error?
Which activity do I have to put here?

That’s probably because you don’t have the PDF activities package installed. Once it is installed, use the read activities from the PDF package as needed.

Oh, sorry. Thanks Max, yes, the PDF activities were not installed.
Now I am getting this second error. Also, after I installed the
PDF activities, the Main workflow picked up the “Anchor Base” activity.

Not sure how it picked up the Anchor Base activity. But the reason you got that error is that the Get Text activity inside the Anchor Base did not work. I have had similar issues with PDF extraction; I sought help from the community and was able to get it done using regex and string manipulation on the output of Get Full Text or Read PDF Text.
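
As a rough idea of what the string-manipulation route can look like, here is a small Python sketch (not UiPath code). It assumes the full text is already available as a string and that each value sits on the same line as its label; `value_after_label` and the sample text are hypothetical.

```python
def value_after_label(text, label):
    """Return the text that follows `label` on the same line, or None."""
    for line in text.splitlines():
        pos = line.lower().find(label.lower())
        if pos != -1:
            # Drop the label, then trim separators such as ':' and spaces.
            value = line[pos + len(label):].strip(" :\t")
            return value or None
    return None


# Hypothetical text standing in for the extracted PDF content.
sample = "Supplier Name: ACME Supplies Ltd\nInvoice Date : 03/01/2019\n"
supplier = value_after_label(sample, "Supplier Name")
invoice_date = value_after_label(sample, "Invoice Date")
print(supplier, invoice_date)
```

This works like a text-only anchor: the label is the anchor and the value is whatever follows it, so it breaks when a supplier puts the value on the next line.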

I am using the “Anchor Base” workflow provided by @Manish540 and
trying to solve this, but I am still facing errors. :slightly_frowning_face: :frowning_face:

Check this thread

I suggest using the Extract Semi Structured Document Activity – it’s easy to use and understand.

Honestly @Sinan_Bolel_DoIT, I don’t understand how to use
this activity, but I really appreciate your reply. I would be glad
if you could help me further with using it.

Hi Syed,

  1. Put your PDF invoices into a folder on your computer. These PDFs should not be longer than 2 pages.
  2. Download the activity (on GitHub, press the download ZIP button) and extract the project folder
  3. Create a new folder in the project directory called “Out” (for some reason, it’s missing in the project)
  4. Open the UiPath project in Studio
  5. Run the process (press the green arrow debug button)
  6. You should see a dialog box, asking you to pick a folder. Find the folder you created in step 1 and select it. The activity will resume.
  7. Once the activity completes, you will see an excel file in the Out folder with the results.
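
The overall flow of those steps (pick a folder, process every PDF, write one results file into “Out”) can be sketched in Python as below. This is not the linked project’s code: `extract_fields` is a hypothetical placeholder for the real extraction call, the column names are made up, and a CSV file stands in for the Excel output.

```python
import csv
from pathlib import Path


def extract_fields(pdf_path):
    """Hypothetical placeholder: a real version would call the extraction step."""
    return {"file": pdf_path.name, "invoice_no": None, "total": None}


def process_folder(input_dir, out_dir):
    """Run the placeholder extraction on every PDF and write one CSV of results."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # mirrors creating the "Out" folder
    rows = [extract_fields(p) for p in sorted(Path(input_dir).glob("*.pdf"))]
    out_file = out_dir / "results.csv"
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "invoice_no", "total"])
        writer.writeheader()
        writer.writerows(rows)
    return out_file
```

Calling `process_folder("Invoices", "Out")` (both folder names are examples) would produce `Out/results.csv` with one row per PDF found in the input folder.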

Here’s the full post: Receipt and Invoice AI - Now available in Public Preview!

Sorry, I must sound dumb to you, but I am just new to
RPA.
I haven’t created any project yet. Do you want me to create the
project and download the “Extract Semi Structured Document Activity”
from UiPath Go!? I’ve already downloaded this file from “https://go.uipath.com/component/extract-semi-structured-document-activity-5abc47”
and saved it in the UiPath folder. Do you want me to save it in the project
folder once I create it?
This is the file that was downloaded:

Sorry, I meant to post the example on GitHub: GitHub - uipath-antoine/ExtractSemiStructuredDocumentExamples: Examples for the Extract semi-structured document custom activity

This is a full process utilizing that activity that I linked earlier. :slight_smile: Don’t worry, no such thing as sounding dumb – we’re all here to learn

OMG, thanks Sinan, it works, though with an error. I have
gotten somewhere, and I did everything as per your instructions.
This is the error: “Extract semi-structured document: The server cannot or will not process the request due to an apparent client error”

Thanks for everything Sinan, it’s working absolutely fine,
but the results are not quite as desired. I have attached
one of the invoices; if you compare it with the attached result,
there are some mistakes, for example in the description and
the total amount. Going forward, if I want more details from
the invoice, such as Invoice Number, Order Number and Invoice Date,
how can I get them?

Thanks.
The invoice and the Excel output are both attached. :arrow_double_down:

wordpress.pdf (42.6 KB)

You need to generate an API key for this endpoint