Capturing Invoice Details from PDF received as Email attachment

Gnum · September 13, 2017, 1:16am

This is my situation and what I’m hoping to achieve with UiPath. I have an e-mail address on which I expect to receive e-mails with attachments. The attachments are expected to be PDF files that contain invoices. The job is for a UiRobot to process the received e-mails and capture the basic invoice details from each PDF received as an attachment.

What I have managed to do so far is the following:

1. I am receiving e-mails via IMAP - good
2. Saving all attachments as individual PDF files - good
3. Enumerating saved files one by one - good

Now for the missing bit, which is what needs to happen next. For each PDF file the following needs to take place:
4. The PDF file is opened by some kind of UiPath reader. I don’t want Acrobat Reader to appear on the screen. In fact, a “headless” server installation is desired.
5. Certain areas in the PDF are identified by keyword (e.g. Invoice Number, Total Amount, etc.)
6. Values corresponding to the keyword-identified areas are captured into UiPath variables.

Once the above is achieved I think I know what I can do next. The part above, steps 4, 5 and 6, is the mystery to me. So far, by following the video tutorials, I have been unable to achieve it.

Can anyone offer an example of the described solution or ideas on how to approach it?

rkelchuri · September 13, 2017, 8:55am

@Gnum
Hi Gnum, there is an activity called Read PDF Text. If you install the UiPath.PDF.Activities package you will see this Read PDF Text activity. In this activity you have to provide an output string and the input PDF file path with its extension (.pdf).
This activity reads the entire text and puts it into the output string.

You can then use the InStr() function to check whether the word you are looking for appears in the PDF text. Based on the result of InStr you can validate your PDF and process it accordingly. For the InStr function, please refer to VB Script on Google/W3Schools.
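
As a rough illustration of the keyword-search idea, here is a minimal sketch in Python rather than a UiPath workflow. The sample text, the keywords and the helper name `value_after_keyword` are all made up for the example, and it assumes each value sits on the same line as its keyword, which real invoices may not guarantee.

```python
# Hypothetical sample of the string a PDF text-extraction step might return.
SAMPLE_TEXT = """ACME Ltd
Invoice Number: INV-2017-0042
Date: 13/09/2017
Total Amount: 1,250.00 EUR
"""


def value_after_keyword(text, keyword):
    """Return the rest of the line that follows `keyword`, or None if absent."""
    pos = text.find(keyword)  # similar to InStr, but 0-based and -1 when missing
    if pos == -1:
        return None
    rest = text[pos + len(keyword):]
    # Keep only what remains on the keyword's own line.
    line = rest.splitlines()[0] if rest else ""
    # Drop the separator (colon, spaces, tabs) between keyword and value.
    return line.lstrip(" :\t").strip() or None


invoice_number = value_after_keyword(SAMPLE_TEXT, "Invoice Number")
total_amount = value_after_keyword(SAMPLE_TEXT, "Total Amount")
print(invoice_number, total_amount)
```

This only works when the extracted text keeps keyword and value together; if the text comes out in a different reading order, a keyword search alone may pick up the wrong value.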

Hope my inputs are useful.

Gnum · September 13, 2017, 6:34pm

Thank you Krishna,
the solution you propose is the one I was trying to avoid. The reality is that reading the entire PDF as a single text produces content that cannot be reliably processed to extract the required bits of information. The heuristics for detecting the values of the target data fields seem very complex, with too many permutations to consider.

I was hoping for a more intelligent solution. One of the tutorials shows the detection of keywords within a PDF when it is opened in a PDF reader. I was hoping that a similar approach could be taken for a PDF file without having to open it in a PDF viewer. So far it looks like opening a PDF viewer is the only option, which is far from ideal.

rkelchuri · September 14, 2017, 7:23am

@Gnum,
if you can predict what kinds of text you expect from the PDFs, you can enumerate them and create switch-case logic to validate each one. If not, please observe the format of the PDF text and use that to build your logic.
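
One way such switch-case logic might look is sketched below in Python. The layout names, marker strings and regular expressions are invented for illustration; real invoices would need patterns derived from the actual extracted text.

```python
import re

# Hypothetical per-layout rules: a marker that identifies the layout,
# plus one pattern per field to capture.
LAYOUTS = {
    "vendor_a": {
        "marker": "ACME Ltd",
        "invoice_number": re.compile(r"Invoice Number:\s*(\S+)"),
        "total": re.compile(r"Total Amount:\s*([\d.,]+)"),
    },
    "vendor_b": {
        "marker": "Globex GmbH",
        "invoice_number": re.compile(r"Rechnung Nr\.\s*(\S+)"),
        "total": re.compile(r"Gesamtbetrag\s*([\d.,]+)"),
    },
}


def detect_layout(text):
    """Return the name of the first layout whose marker occurs in the text."""
    for name, layout in LAYOUTS.items():
        if layout["marker"] in text:
            return name
    return None


def extract_fields(text):
    """Pick the rules for the detected layout and apply them to the text."""
    name = detect_layout(text)
    if name is None:
        return None  # unknown layout: hand over to manual review
    layout = LAYOUTS[name]
    fields = {"layout": name}
    for key in ("invoice_number", "total"):
        match = layout[key].search(text)
        fields[key] = match.group(1) if match else None
    return fields
```

Keeping the rules in a table rather than in nested conditions means a new invoice format is added as one more entry, though the number of entries still grows with every sender.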

Without seeing how consistently the PDF text behaves, I can’t give any application-specific logic.
Hope … you will share your logic…soon

TobiasR · June 28, 2018, 3:22pm

Hi, to extract information from different types of invoices, including scanned images, I would suggest trying the Rossum invoice capture technology.
It can extract data from invoices without any template setup and will return the extracted fields plus metadata (as JSON, XML or CSV), which you can store in a UiPath variable for further processing by the robot. The robot can then manage the rest of the invoice workflow. The developers’ edition is available here: https://rossum.ai/elis. The API documentation is here: Rossum API Reference. It can be used together with UiPath via the HTTPRequest activity. The robot can also send the PDF as an e-mail attachment and receive back an e-mail with the extracted data.
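
To show what storing returned fields in variables could look like, here is a small Python sketch. The JSON shape below is entirely hypothetical and is not the schema of Rossum or any other service; the real structure has to be taken from the API documentation.

```python
import json

# Hypothetical response body, made up for this example only.
RESPONSE_BODY = """
{"fields": [
  {"name": "invoice_number", "value": "INV-2017-0042"},
  {"name": "total_amount", "value": "1250.00"}
]}
"""


def fields_to_dict(body):
    """Flatten a list of name/value pairs into a plain dictionary."""
    data = json.loads(body)
    return {item["name"]: item["value"] for item in data.get("fields", [])}


invoice = fields_to_dict(RESPONSE_BODY)
print(invoice["invoice_number"], invoice["total_amount"])
```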

savitha.kumari · December 14, 2018, 3:57am

Hi @Gnum,

Did you come up with a solution to process the PDF and extract the keywords?
Can you share how you solved this?
We are also trying to solve the same issue. Your reply would be much appreciated.

Looking forward to your response!

Thanks,
Savitha
