Hi guys, I have a list of invoices in my mail. I have to read each and every mail, extract some fields and put them in a database. It would be easy if all the invoices used the same template, but if some invoices are of a different type, how can we do that? Thanks in advance.
- First get all the PDFs out of the mail: use the Get Outlook Mail Messages activity, then the Save Attachments activity to save the files in the folder you want
- Then use Read PDF Text if the fields and wordings can be selected as text in the PDF, or use Read PDF With OCR if it is a scanned PDF
- If it's possible to get the element, it is much better to use the Anchor Base activity. Keep terms like invoice, date, quantity and similar wordings as the anchor with a Find Element activity, and use a Get Text activity to get the text next to it. For example, in ( invoice : 1235 ) the word invoice comes from the Find Element activity and 1235 from the Get Text activity
- Or if it is a scanned PDF you can use the Scrape Relative activity, which you can get with Alt+Ctrl+C. The Citrix wizard will open, and Scrape Relative is under the Image or Text option
- That's all buddy, you are done. One thing I would point out: if only the type of PDF changes, that is some files are scanned and some are native, you can always use the OCR-based activities like Read PDF With OCR and Scrape Relative for all of them. But if the format changes, like the fields being in a table in some PDFs and in a paragraph in others, that can't be handled in one bot. In short, a change in type can be handled, while a change in format can't, buddy.
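To make the anchor idea a bit more concrete, here is a rough Python sketch of the same logic outside UiPath. It is not a UiPath API, just an illustration: the function names `pick_text` and `extract_fields`, the field names and the label patterns are all assumptions made up for this example, and real invoices will likely need more label variants. It assumes you already have the text of each PDF as a string, from the native text layer or from OCR.

```python
import re

# Hypothetical anchor labels (assumed for illustration); extend per invoice layout.
ANCHORS = {
    "invoice_no": r"invoice(?:[ \t]*(?:number|no|#))?\.?",
    "date": r"date",
    "quantity": r"(?:quantity|qty)",
}


def pick_text(native_text: str, ocr_text: str) -> str:
    """Prefer the native text layer; fall back to OCR output when it is empty."""
    return native_text if native_text.strip() else ocr_text


def extract_fields(text: str) -> dict:
    """For each anchor label, return the first token after it on the same line.

    Fields whose anchor is not found are returned as None.
    """
    fields = {}
    for name, label in ANCHORS.items():
        # label, optional ':' or '-', then the value token (no line breaks allowed)
        pattern = rf"\b{label}[ \t]*[:\-]?[ \t]*(\S+)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    sample = "Invoice : 1235\nDate: 12/03/2019\nQty: 4"
    # Empty native text simulates a scanned PDF, so the OCR text is used.
    print(extract_fields(pick_text("", sample)))
```

This only covers the change in type (native or scanned). A change in format, such as values sitting in a table instead of next to a label, would still need its own patterns.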
Kindly try this and let me know whether it works or not, buddy.
@Palaniyappan thanks for the reply buddy. Let me try it and I’ll get back to you if I get stuck anywhere.
Sure buddy anytime…
Did that work, buddy? @venkatmalla6
Hi @Palaniyappan, I have a combination of scanned and digitized PDFs here. I want to extract key information like invoice no, date, amount, line items etc., and each invoice has a different pattern. How do we handle this situation? Is there a way to fetch information from unstructured PDFs?