Extracting data from multiple files with different formats

Hi,

I have a requirement where staff submit medical claims through a portal. The medical receipts will be uploaded as PDF, image or Excel files, and each clinic and hospital uses a different receipt format. Is there a solution for this?


We can check the file's extension and pick the extraction method on that basis.

For example:
Use a Switch on the extension with three cases:
1. filename.ToLower.EndsWith(".pdf")
Then use the ‘Read PDF Text’ activity to get the data.
2. filename.ToLower.EndsWith(".png") (or another image extension such as .jpg)
Then use an OCR activity to get the data.
3. filename.ToLower.EndsWith(".xlsx")
Then use the ‘Read Range’ activity to get the data from Excel.
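
Outside the workflow designer, the same routing idea can be sketched in Python. This is only an illustration: the route names (read_pdf, ocr, read_range) are made-up labels standing in for the activities above, and the extra extensions (.jpg, .jpeg, .xls) are my assumption, not part of the original requirement.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the extraction route
# described above. The route names are plain labels, not real API calls.
ROUTES = {
    ".pdf": "read_pdf",
    ".png": "ocr",
    ".jpg": "ocr",
    ".jpeg": "ocr",
    ".xlsx": "read_range",
    ".xls": "read_range",
}


def pick_route(filename: str) -> str:
    """Return the extraction route for a file, based on its final extension."""
    # Path.suffix only looks at the last extension, so "a.pdf.png" is an image.
    ext = Path(filename).suffix.lower()
    return ROUTES.get(ext, "unsupported")
```

Matching on the final extension instead of a "contains" check avoids misrouting names like receipt.pdf.png, and unknown types fall through to "unsupported" so they can be sent for manual review.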

Hope this helps!


You can also use a regular expression for this,
like (?i)\.(jpg|pdf)$ (the dot is escaped so it matches a literal period).
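
A quick sketch of the regex approach in Python, in case it helps. The pattern is my own extension of the one above (png and xlsx added as an assumption), with the dot escaped so it matches a literal period, and extension_of is a hypothetical helper name.

```python
import re

# Case-insensitive match on the final extension only.
# The alternatives beyond jpg and pdf are assumptions for illustration.
EXT_PATTERN = re.compile(r"\.(jpe?g|png|pdf|xlsx?)$", re.IGNORECASE)


def extension_of(filename):
    """Return the lower-cased extension if it is a supported type, else None."""
    match = EXT_PATTERN.search(filename)
    return match.group(1).lower() if match else None
```

With the escaped dot, a name like "mypdf" (no period) is not treated as a PDF, which the unescaped version would allow.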

Hi,

If the reports are in:

Excel - use the Read Range activity; you can then extract specific values or any tables.
Image - use an OCR activity to extract the information; if that doesn’t work, convert the image to text with the Tesseract engine.
PDF - if the PDF is native (it has a text layer), read the text and use regular expressions to extract the fields; if the PDF is scanned, try OCR.
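
One rough way to tell a native PDF from a scanned one is to try the normal text read first and fall back to OCR when almost nothing comes back. A minimal Python sketch of that check, assuming you already have the text from the native read as a string; needs_ocr and the 20-character threshold are hypothetical and would need tuning on real receipts.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Guess whether a PDF is scanned from what a native text read returned.

    Heuristic only: a scanned PDF usually yields little or no text,
    so very short output is treated as a signal to fall back to OCR.
    """
    # Drop all whitespace so blank pages and stray newlines do not count.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```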