Capturing Invoice Details from PDF received as Email attachment

studio

#1

This is my situation and what I’m hoping to achieve with UiPath. I have an e-mail address on which I expect to receive e-mails with attachments. The attachments are expected to be PDF files containing invoices. The job is for a UiPath robot to process the received e-mails and capture the basic invoice details from each PDF attachment.

What I have managed to do so far is the following:

  1. I am receiving E-mails via IMAP - good
  2. Saving all attachments as individual PDF files - good
  3. Enumerating saved files one by one - good

Now for the missing bit: what needs to happen next. For each PDF file, the following needs to take place:

  4. The PDF file is opened by some kind of UiPath reader. I don’t want Acrobat Reader to appear on the screen. In fact, a “headless” server installation is desired.
  5. Certain areas in the PDF are identified by keyword (e.g. Invoice Number, Total Amount, etc.)
  6. The values corresponding to the keyword-identified areas are captured into UiPath variables.

Once the above is achieved, I think I know what to do next. Steps 4, 5 and 6 are the mystery to me. So far, following the video tutorials, I have been unable to achieve them.

Can anyone offer an example of the described solution, or ideas on how to approach it?


#2

@Gnum
Hi Gnum, there is an activity called Read PDF Text. If you install the UiPath.PDF.Activities package, you will see it. You have to provide the input PDF file path, including the .pdf extension, and an output string variable.
This activity reads the entire text of the PDF into the output string.

You can then use the InStr() function to check whether the word you are looking for appears in the PDF text. Based on the result, you can validate the PDF and process it accordingly. For details on InStr, please look up a VBScript reference such as W3Schools.
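
To illustrate the idea outside UiPath, here is a minimal Python sketch of pulling a value that follows a keyword out of already-extracted PDF text. The sample text, the labels and the helper name extract_field are assumptions made up for illustration; real invoice text will rarely be this tidy. Inside a workflow the same logic would be written as a VB.NET expression instead.

```python
import re

def extract_field(text, label):
    """Return the text that follows `label` on the same line, or None if absent."""
    # Allow an optional ':' or '#' separator and spaces/tabs, then capture the rest of the line.
    pattern = re.escape(label) + r"[ \t]*[:#]?[ \t]*(.+)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None

# Hypothetical text, standing in for the output string of a PDF text reader.
sample_text = """ACME Ltd
Invoice Number: INV-2041
Invoice Date: 12/03/2018
Total Amount: 1,250.00 EUR
"""

invoice_number = extract_field(sample_text, "Invoice Number")
total_amount = extract_field(sample_text, "Total Amount")
missing = extract_field(sample_text, "PO Number")  # label not present in the text
```

A plain substring search (what InStr does) only tells you that the keyword is present and where; the regular expression additionally captures the value next to it.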

Hope my inputs are useful.


#3

Thank you Krishna,
the solution you propose is the one I was trying to avoid. The reality is that reading the entire PDF as a single text produces content that cannot be reliably processed to extract the required bits of information. The heuristics for detecting the values of the target data fields seem very complex - too many permutations to consider.

I was hoping for a more intelligent solution. One of the tutorials shows keywords being detected within a PDF when it is opened in a PDF reader. I was hoping a similar approach could be taken without having to open the file in a PDF viewer. So far it looks like opening a PDF viewer is the only option, which is far from ideal.


#4

@Gnum,
If you can predict what kinds of text to expect from the PDFs, you can count the variants and build switch-case logic around them. If not, please study the format of the PDF text and build your logic from there.
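
One way to read the switch-case idea, sketched in Python under heavy assumptions: keep a small table of known invoice layouts, pick the layout whose labels appear most often in the text, then extract fields using that layout's labels. The layout names, labels and sample text below are all hypothetical.

```python
import re

# Hypothetical layouts: each maps a field name to the label that supplier prints.
LAYOUTS = {
    "supplier_a": {"invoice_number": "Invoice Number", "total": "Total Amount"},
    "supplier_b": {"invoice_number": "Inv. No", "total": "Amount Due"},
}

def detect_layout(text):
    """Return the layout name whose labels occur most often in the text, or None."""
    lowered = text.lower()
    best, best_hits = None, 0
    for name, labels in LAYOUTS.items():
        hits = sum(1 for label in labels.values() if label.lower() in lowered)
        if hits > best_hits:
            best, best_hits = name, hits
    return best

def extract_fields(text):
    """Return (layout_name, fields); layout_name is None when nothing matched."""
    layout = detect_layout(text)
    if layout is None:
        return None, {}
    fields = {}
    for field, label in LAYOUTS[layout].items():
        # Capture the rest of the line after the label and an optional separator.
        pattern = re.escape(label) + r"[ \t]*[:#]?[ \t]*(.+)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[field] = match.group(1).strip() if match else None
    return layout, fields

# Hypothetical extracted text for the second layout.
text_b = "Inv. No: 7781\nAmount Due: 99.50\n"
result_b = extract_fields(text_b)
result_unknown = extract_fields("no recognisable labels here")
```

This only scales to a handful of known suppliers; every new layout needs a new table entry.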

Without seeing consistent behaviour in the PDF text, I can’t give any application-specific logic.
Hope you will share your logic soon.


#5

Hi, to extract information from different types of invoices, including scanned images, I would suggest trying the Rossum invoice capture technology.
It can extract data from invoices without any template setup and returns the extracted fields plus metadata (as JSON, XML or CSV), which you can store in a UiPath variable for further processing by the robot. The robot can then manage the rest of the invoice workflow. A developers’ edition is available here: https://rossum.ai/elis. The API documentation is here: https://api.elis.rossum.ai/docs. It can be used together with UiPath through the HTTP Request activity. The robot can also send the PDF as an e-mail attachment and receive back an e-mail with the extracted data.