I have a requirement where staff submit medical claims through a portal. The medical receipts will be uploaded as PDF, image, or Excel files, and each clinic and hospital uses a different receipt format. Is there a solution for this?
You can check the file extension and choose the extraction method on that basis.
For example:
Use a Switch with three cases:
1. filename.Contains(".pdf")
Then use the 'Read PDF' activity to get the data.
2. filename.Contains(".png")
Then use an 'OCR' activity to get the data.
3. filename.Contains(".xlsx")
Then use the 'Read Range' activity to get the data from Excel.
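The same routing idea can be sketched outside the workflow designer. This is only an illustration in Python, not the workflow itself: the route names and the extra extensions (.jpg, .jpeg, .xls) are assumptions, and unknown files fall back to a hypothetical manual-review route. Comparing the real extension in lower case avoids false matches that a plain "contains" check can produce (for example "receipt.pdf.png" or "SCAN.PDF").

```python
from pathlib import Path

# Hypothetical route names standing in for the Read PDF, OCR and Read Range steps.
ROUTES = {
    ".pdf": "read_pdf",
    ".png": "ocr",
    ".jpg": "ocr",         # assumed extra image type
    ".jpeg": "ocr",        # assumed extra image type
    ".xlsx": "read_range",
    ".xls": "read_range",  # assumed extra Excel type
}


def pick_route(filename: str) -> str:
    """Return the extraction route for a file, based on its final extension."""
    ext = Path(filename).suffix.lower()  # ".PDF" and ".pdf" are treated the same
    # Anything unrecognised is sent to a (hypothetical) manual-review queue.
    return ROUTES.get(ext, "manual_review")


if __name__ == "__main__":
    for name in ["Receipt.PDF", "scan.png", "claims.xlsx", "notes.txt"]:
        print(name, "->", pick_route(name))
```

Using the last extension only means a file named "receipt.pdf.png" is treated as an image, which matches how it would actually open.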
Excel - Use the Read Range activity; you can extract specific data or any tables.
Image - To extract information, use one of the OCR activities; if that doesn't work, convert the image to text with the Tesseract engine.
PDF - If the PDF is native (it has a text layer), read the text and use regular expressions to pull out the fields; if the PDF is scanned, try OCR techniques.
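As a rough illustration of the regular-expression step, here is a minimal Python sketch that pulls a total and a date out of text already read from a native PDF. The field labels and patterns are assumptions made for the example; since every clinic and hospital uses a different layout, you would likely need one set of patterns per layout.

```python
import re

# Hypothetical patterns for one receipt layout; adjust per clinic or hospital.
# "Total", optional label text or currency code, then an amount such as 85.50.
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+(?:[.,]\d{2})?)", re.IGNORECASE)
# Dates written like 12/03/2023 or 1-4-23.
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")


def extract_fields(text: str) -> dict:
    """Return the first total and date found in the text, or None if missing."""
    total = TOTAL_RE.search(text)
    date = DATE_RE.search(text)
    return {
        "total": total.group(1) if total else None,
        "date": date.group(1) if date else None,
    }


if __name__ == "__main__":
    sample = "City Clinic\nDate: 12/03/2023\nConsultation 50.00\nTotal: RM 85.50\n"
    print(extract_fields(sample))
```

A field that comes back as None is a useful signal that the receipt does not match the expected layout and should be checked by a person.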