Read the pdf, Fetch the invoice values and Email them

pdf
scraping
studio

#1

I found this exercise online but got stuck partway through. Can someone guide me through it?

  1. Read the PDF
  2. Fetch the invoice values
  3. Create a new folder named with the date
  4. Copy the invoice PDFs into the new folder and create a new Excel sheet (the file name should be date-specific)
  5. Create a header row and then feed the invoice data into it
  6. Send an email (with the Excel file attached)

Note: I have used screen scraping, but sometimes it does not capture the correct values. Could anyone tell me which activities I should use for these 6 steps?
Thank you in advance.
Regards,
Pranav


#2

Hi

  1. It depends on the type of PDF you are extracting from. If it is purely text-based, you can use Read PDF Text and apply string manipulation functions to the resulting string to fetch the data you need. If it is a scanned PDF, you can use Read PDF With OCR.
  2. As mentioned above, you can apply string manipulation functions like substr/strlen, or use a regex, to fetch the right data.
  3. For the third point, see the thread "To create new folder with date and copy the files into it".
  4. You can use the Copy File activity to copy the PDF to the new folder.
  5. You can use Write Range to write the data table into Excel, and Add Data Row for the header.
  6. There are multiple mail activities you can use (such as Send SMTP Mail Message or Send Outlook Mail Message).
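
For anyone who wants to see the logic of steps 2 to 5 outside Studio, here is a rough Python sketch. It is not the UiPath workflow itself. It assumes the text has already been extracted from the PDF, the labels ("Invoice No", "Total") and the function names are hypothetical, and a CSV file stands in for the Excel sheet so that only the standard library is needed.

```python
import csv
import re
import shutil
from datetime import date
from pathlib import Path

# Hypothetical label patterns; adjust them to the wording on your invoices.
INVOICE_NO = re.compile(r"Invoice\s*(?:Number|No)\.?\s*[:#]?\s*(\S+)", re.IGNORECASE)
TOTAL = re.compile(r"\bTotal\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)


def extract_fields(text):
    """Step 2: pull the invoice number and total out of already-extracted text."""
    number = INVOICE_NO.search(text)
    total = TOTAL.search(text)
    return {
        "invoice_no": number.group(1) if number else "",
        "total": total.group(1).replace(",", "") if total else "",
    }


def archive_and_report(pdf_paths, rows, base_dir, today=None):
    """Steps 3-5: make a dated folder, copy the PDFs, write header plus data rows."""
    stamp = (today or date.today()).isoformat()  # ISO date, YYYY-MM-DD
    folder = Path(base_dir) / stamp
    folder.mkdir(parents=True, exist_ok=True)
    for pdf in pdf_paths:
        shutil.copy(pdf, folder / Path(pdf).name)
    report = folder / f"invoices_{stamp}.csv"
    with report.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["invoice_no", "total"])
        writer.writeheader()  # the header row
        writer.writerows(rows)  # one row per invoice
    return report
```

The `\b` before "Total" keeps the pattern from matching inside "Subtotal", which is the kind of near-miss that is easy to overlook with regex extraction.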

#3

Hi qwerty123,
Thank you for the quick reply. For (1), these documents are scanned and I did use the Read PDF With OCR activity, but for some PDFs it does not capture the exact value. Is there anything else you can suggest for this issue?
And by the way, thank you very much for the guidance. I look forward to hearing from you.
Regards,
Pranav


#4

Hi
That is indeed a genuine issue; no OCR engine gives 100% correct results. Besides that, other constraints such as the quality of the PDF and the visibility of the text can affect the result.
You can try Google Cloud / ABBYY Cloud OCR, as they tend to give better results than the standard Google/Microsoft OCR engines. But they are licensed versions, so you won't be able to use them for free.


#5

Exactly. One last question: there are 8 PDFs in total, so do I need to record a sequence for each PDF, or is there some activity that can handle all of them?


#6

I think if all 8 PDFs have the same layout, then you can have one sequence carrying the logic to read the required data. You can make it dynamic enough that it works on all the PDFs.
If the 8 PDFs follow different layouts, then you might have to work on the data extraction for each one individually.
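
To illustrate the idea in plain Python (only a sketch, not UiPath activities): the shared logic lives in one function, and only the patterns differ per layout. The layout names, marker phrase and patterns below are all made up for the example.

```python
import re

# Hypothetical layouts: each maps a field name to the regex that finds it.
LAYOUTS = {
    "vendor_a": {
        "invoice_no": r"Invoice No[:.]?\s*(\S+)",
        "total": r"\bTotal[:.]?\s*([\d.,]+)",
    },
    "vendor_b": {
        "invoice_no": r"Ref\s*#\s*(\S+)",
        "total": r"Amount Due\s*([\d.,]+)",
    },
}


def detect_layout(text):
    """Pick a layout by a marker phrase; the marker here is an assumption."""
    return "vendor_b" if "Amount Due" in text else "vendor_a"


def extract(text):
    """Shared logic for every PDF; only the patterns change between layouts."""
    patterns = LAYOUTS[detect_layout(text)]
    result = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    return result
```

With this shape, supporting another layout means adding one more entry to `LAYOUTS` rather than building a whole new sequence.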


#7

Hi qwerty123,
Actually, they do have different formats. So I need to record each one separately, but I can copy and paste the email activity, can't I?
Regards,
Pranav


#8

Yeah. You can build one email activity as a reusable component and use it everywhere.
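
As a rough analogy in Python (a sketch only, with a hypothetical function name and placeholder addresses), the reusable component is a single function that every per-PDF workflow calls with its own attachment:

```python
import mimetypes
from email.message import EmailMessage
from pathlib import Path


def build_report_email(sender, recipient, subject, body, attachment):
    """Build one message with a file attached; every workflow reuses this."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    path = Path(attachment)
    # Guess the MIME type from the file name, falling back to generic binary.
    ctype, _ = mimetypes.guess_type(path.name)
    maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
    message.add_attachment(
        path.read_bytes(), maintype=maintype, subtype=subtype, filename=path.name
    )
    return message
```

Sending is left out on purpose because the server details are unknown here; with the standard library it would be something like `smtplib.SMTP(host).send_message(message)`, where `host` is whatever SMTP server you have access to.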


#9

Alright. Thanks.