I’m sorry to ask what may be a basic question, but what kind of solution would you recommend for extracting data from a PDF?
I have been looking for a good way to solve this for a month.
So far, I have tried the approaches below:
Convert PDF to Excel (3rd party)
Reason: Not every file can be converted to Excel completely; some data goes missing.
Read PDF Text (PDF Activities)
Result: Failed, but I may keep working on this approach to solve the problem.
Reason: It can read all the text from the PDF, but separating that text into CSV columns with “Generate Data Table” is a problem.
Although I asked the users to do their best to create the files in a rule-based format (the PDF files are invoices), they still leave a lot of human mistakes, such as extra spaces, so the text can’t easily be split into CSV columns using a space as the separator.
Reason: It’s useful but incorrect.
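For the stray-space problem described under Read PDF Text, here is a minimal Python sketch of one possible workaround. It assumes that columns are separated by two or more spaces (or a tab) while a single space only appears inside a field; that assumption, the function names, and the sample text are all hypothetical and would need checking against the real invoices.

```python
import csv
import io
import re

# Assumption: a column break is a tab or a run of 2+ whitespace characters;
# a single space is treated as part of the field (e.g. "Blue pen").
COLUMN_BREAK = re.compile(r"\s{2,}|\t")

def lines_to_rows(text):
    """Split each non-empty line into a list of trimmed fields."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from extraction
        rows.append([field.strip() for field in COLUMN_BREAK.split(line)])
    return rows

def rows_to_csv(rows):
    """Serialize rows to CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Hypothetical sample standing in for text read from a PDF.
sample = (
    "Item   Qty    Unit Price\n"
    "Blue pen    10   1.50\n"
    "\n"
    "A4 paper  2     4.00 \n"
)
rows = lines_to_rows(sample)
csv_text = rows_to_csv(rows)
```

If a user types only one space between two columns, this rule would merge them, so it may be worth validating the field count of every row before writing the CSV.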
So I would like to know whether anyone has experience with this kind of task and has a good solution.
Any advice would be helpful.
Thanks for your advice, and sorry for the late reply.
I tried your solution by throwing some old data at it, and it did work!
I must say it’s a really useful feature, but unfortunately the data is highly confidential, so I can’t use the feature directly because of the concern that the data would be sent to the server.
I might start with Python and see whether it can solve the problem.
I’ll report back after a while if I make any progress.
Thanks for your help, and sorry for the late reply.
I have now solved the problem with your suggestion.
Fortunately, my data wasn’t very complex to arrange, so I could use regex to get the specific data that I wanted.
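For anyone with a similar case, the regex approach can be sketched roughly as below. This is only an illustration: the field labels ("Invoice No", "Date", "Total"), the patterns, and the sample text are hypothetical and not taken from my actual invoices.

```python
import re

# Hypothetical field patterns; real invoices will need their own labels.
# \b keeps "Total" from matching inside "Subtotal".
PATTERNS = {
    "invoice_no": re.compile(r"\bInvoice\s*No\.?\s*[:#]?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"\bTotal\s*:?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}

def extract_fields(text):
    """Return the first match for each field, or None when it is missing."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result

# Hypothetical sample standing in for text read from a PDF.
sample_text = (
    "Invoice No:  INV-0042\n"
    "Date : 2021-03-15\n"
    "Subtotal: 90.00\n"
    "Total:   1,234.50\n"
)
fields = extract_fields(sample_text)
```

Returning None for a missing field makes it easy to flag files that need manual review instead of silently writing incomplete rows.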
However, a team member has a similar issue to overcome, so I’m still working on it, and fetching that data is quite difficult.
The situation is that when the PDF is read as text, there are 2 pieces of “address” information that each need to be read correctly. But the 2 addresses are read and combined together, since they are written on the left part and the right part of the same row in the PDF.
The problem I now face is that I can’t find a rule to separate the combined text of the 2 addresses.
Do you have any idea how to solve this problem?
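One idea I am considering is sketched below in Python. It only works under a strong assumption: the text extraction keeps the page layout, so the left and right addresses on a row are separated by a wide gap of at least 3 spaces. If the extractor collapses that gap to a single space, this rule cannot work. The gap width, function name, and sample addresses are hypothetical.

```python
import re

# Assumption: the two columns are separated by a gap of 3 or more spaces.
GAP = re.compile(r" {3,}")

def split_two_columns(text):
    """Split layout-preserved text into (left_column, right_column)."""
    left, right = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split only at the first wide gap; a line that starts with the gap
        # yields an empty left part, so it is treated as right-column only.
        parts = GAP.split(line.rstrip(), maxsplit=1)
        left_part = parts[0].strip()
        right_part = parts[1].strip() if len(parts) == 2 else ""
        if left_part:
            left.append(left_part)
        if right_part:
            right.append(right_part)
    return "\n".join(left), "\n".join(right)

# Hypothetical sample: two addresses printed side by side on the same rows.
sample = (
    "Bill To: ACME Ltd        Ship To: ACME Warehouse\n"
    "12 Main Street           48 Dock Road\n"
    "Springfield\n"
    "                         Harbor City\n"
)
left_address, right_address = split_two_columns(sample)
```

A line with no wide gap and no leading spaces is assumed to belong to the left column, which may be wrong for some layouts, so the result still needs checking on real files.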
Thank you for your help. I’ll follow your advice and try to write a custom extractor; hopefully it can solve our problem.
I have also tried the invoice/receipt extractor feature, and it fetched all the information we need perfectly. However, since the information is highly confidential, we couldn’t use your feature directly, because the data would be sent to your server. We’re sorry, but we look forward to using this function when your team officially releases the new feature.