Extract specific data from multiple PDFs with different structures


#1

I am looking for a reliable way to build a workflow that extracts specific text from PDF documents that have different structures. I tried the Get Text and Anchor Base activities, but they seem to work only on PDFs with similar structures.

My problem is finding a way to read invoices (PDF, TIF) from different vendors, each with a different layout, and process them further in my workflow. Will the screen scraping method help in this situation?


#2

Hey, did you find a solution, please?


#3

Hey @ashok_sharma,
did you find a solution to this?


#4

@hemchandu2000
Any solution for this?


#5

I think you will need to extract the entire document as raw text and then use https://code.google.com/archive/p/graph-expression/ or IBM Watson's NLP to extract meaningful information from it.
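
To make the "raw text first, then extract" idea concrete, here is a minimal sketch in Python using plain regular expressions rather than either of the tools above. It assumes the raw text has already been pulled out of the PDF or TIF by some other step. The field names, label variants, and sample invoices are all made up for illustration, and real vendors will almost certainly need more patterns.

```python
import re

# Hypothetical label variants per field. Each vendor words its labels
# differently, so every field gets a list of patterns tried in order.
FIELD_PATTERNS = {
    "invoice_number": [
        r"invoice\s*(?:number|no\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)",
    ],
    "date": [
        r"(?:invoice\s+date|date)\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})",
    ],
    "total": [
        r"(?:grand\s+total|total\s+due|amount\s+due|total)"
        r"\s*[:\-]?\s*[$€£]?\s*([\d,]+\.\d{2})",
    ],
}


def extract_fields(raw_text):
    """Return a dict of field -> first matching value, or None if not found."""
    result = {}
    for field, patterns in FIELD_PATTERNS.items():
        result[field] = None
        for pattern in patterns:
            match = re.search(pattern, raw_text, flags=re.IGNORECASE)
            if match:
                result[field] = match.group(1)
                break
    return result


if __name__ == "__main__":
    # Two invented invoices with different layouts and label wording.
    vendor_a = "ACME Corp\nInvoice No: INV-1001\nInvoice Date: 12/03/2019\nTotal Due: $1,250.00\n"
    vendor_b = "Globex\nDate 03.04.2019\nINVOICE # 88421\nAmount due 310.50\n"
    for text in (vendor_a, vendor_b):
        print(extract_fields(text))
```

Because the patterns look for labels instead of positions on the page, the same function handles both sample layouts. Fields that come back as None are a signal that a vendor uses wording the patterns do not cover yet, which is where an NLP service would be more robust than hand-written rules.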

If anyone knows of a better way to do this using solely UiPath, please let us know.


#6

I also have the same problem. Did you find a solution for this, @hemchandu2000?