We have a scenario in which we want to extract standard information from a PDF. We are able to get the information into a text file, and it preserves the layout too.
The problem we are facing is with the scenario below:
aws gds ppt tyt iop eqw
0       1   0   8   9
How should we extract the above information?
We have tried regex and splitting by space, neither of which helps. As you can see, gds above has no value, so splitting by space shifts every value one column to the left: the 1 that belongs to ppt ends up under gds, and so on.
Can anybody advise on a suitable approach?
(aws has 0
gds has nothing below it
and so on)
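One possible way around the left shift is to slice each row by character position instead of splitting on spaces. The sketch below is only an illustration and assumes the text file really keeps every value starting under its header; the function names `header_spans` and `parse_row` are made up for this example, and the sample row is spaced by hand to mimic a preserved layout.

```python
import re


def header_spans(header):
    """Return (name, start, end) for each header token.

    A column runs from the start of its header to the start of the next
    header; the last column runs to the end of the line (end is None).
    """
    matches = list(re.finditer(r"\S+", header))
    spans = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else None
        spans.append((match.group(), match.start(), end))
    return spans


def parse_row(header, row):
    """Slice one data row by the header's character offsets."""
    return {name: row[start:end].strip() for name, start, end in header_spans(header)}


# Hypothetical sample: the value under gds is missing but the spacing is kept.
header = "aws gds ppt tyt iop eqw"
row = "0       1   0   8   9"
record = parse_row(header, row)
```

With this input, `record["gds"]` is an empty string and `record["ppt"]` is `"1"`, so nothing shifts left. If the values are right-aligned or start before their header, the offsets would need adjusting.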
After trying this option we see the same issue: everything comes out in one column, which is not the required result.
The extracted data is in a tabular format that is visible in the PDF but not in the text file.
Please find the attached text file. Let me know if each value can be output in a separate cell. experiment.txt (3.5 KB)
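If the text file keeps the column spacing, one way to get each value into its own cell is to treat every character position that is blank in all lines of a table as a column gap, then write the slices out as CSV. This is a rough sketch using only the Python standard library; `column_slices` and `text_table_to_csv` are hypothetical names, and the two sample lines are made up rather than taken from experiment.txt.

```python
import csv
import io


def column_slices(lines):
    """Return (start, end) slices for runs of positions used by at least one line."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    used = [any(line[i] != " " for line in padded) for i in range(width)]
    slices = []
    start = None
    for i, flag in enumerate(used):
        if flag and start is None:
            start = i  # a column begins
        elif not flag and start is not None:
            slices.append((start, i))  # a blank position closes the column
            start = None
    if start is not None:
        slices.append((start, width))
    return slices


def text_table_to_csv(lines):
    """Render one aligned text table as CSV text, one cell per detected column."""
    width = max(len(line) for line in lines)
    slices = column_slices(lines)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        padded = line.ljust(width)
        writer.writerow([padded[a:b].strip() for a, b in slices])
    return out.getvalue()


# Hypothetical sample with an empty cell under gds.
sample = ["aws gds ppt tyt iop eqw", "0       1   0   8   9"]
csv_text = text_table_to_csv(sample)
```

The empty cell survives as an empty CSV field instead of shifting the row. The gap detection has to be run on one table at a time, and it can still split a column by mistake if every line of that table happens to have a space at the same position inside a cell.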
I am facing an issue opening the project: some of the activities are shown as missing, but I am still able to see the structure of the flow.
The major issue we will face is this:
we have PDF files, each with more than 8 pages, each having multiple tables (the format is fixed and varies only in the places where an address appears).
We would have to use multiple regular expressions plus preprocessing to extract the data, so we are looking for an option that converts a PDF directly into a CSV.
Any suggestions? Also, it will be difficult to share the data, as masking it takes a lot of manual effort on my side.
I have already implemented the solution in Python using the tabula wrapper, but I am looking for something similar in UiPath.
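For anyone comparing approaches, the Python route mentioned here could look roughly like the sketch below. It is not the actual implementation from this thread, just a minimal illustration that assumes the third-party tabula-py package and a Java runtime are installed. The helper names `csv_path_for` and `pdf_tables_to_csv` and the output file naming are made up.

```python
from pathlib import Path


def csv_path_for(pdf_path, table_index):
    """Build an output path such as report_table3.csv next to the PDF (naming is an assumption)."""
    pdf = Path(pdf_path)
    return pdf.with_name(f"{pdf.stem}_table{table_index}.csv")


def pdf_tables_to_csv(pdf_path):
    """Write every table tabula-py detects in the PDF to its own CSV file."""
    import tabula  # third-party: pip install tabula-py (requires Java)

    # pages="all" scans every page; multiple_tables=True gives one DataFrame per table.
    tables = tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)
    written = []
    for index, table in enumerate(tables, start=1):
        target = csv_path_for(pdf_path, index)
        table.to_csv(target, index=False)
        written.append(target)
    return written
```

Calling `pdf_tables_to_csv("statement.pdf")` (a placeholder file name) would write one CSV per detected table. How well the tables are detected depends on the PDF, so the result still needs checking against the original pages.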