I am new to UiPath and I am trying to extract data from a PDF that contains a form, and then turn that data into a Dictionary of strings.
The problem I am facing: I tried to use the Read PDF Text activity and then capture the data I want to keep with Regex, but I can’t make it work at all. There also seems to be some kind of invisible data in the form’s input fields that I can’t delete or control.
This is the PDF I have to read (since I am a new member, I can’t upload anything).
You could use the ‘Read PDF Text’ activity and then paste the output into a text file. As the document is structured as a form, it may take some time, but I believe it can be done.
Then, in the text file, you could use Regex to find the text you want.
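To make the Regex step more concrete, here is a minimal sketch in Python of pulling labelled values out of extracted text into a dictionary of strings. The sample text, the field labels and the "Label: value" layout are all assumptions made for illustration, since the real form was not posted in this thread, and `extract_fields` is a hypothetical helper name. The pattern uses ordinary regex syntax, so the same idea should carry over to UiPath's regex activities, although that has not been tested here.

```python
import re

# Hypothetical sample of text as it might come out of a PDF-to-text step.
# The real form's labels and layout are not shown in this thread.
sample_text = """\
Name: Jane Doe
ID number: 12345678Z
City: Madrid
"""

# Hypothetical field labels; replace them with the labels printed on your form.
FIELD_LABELS = ["Name", "ID number", "City"]


def extract_fields(text, labels):
    """Return a dict mapping each label to the text that follows it on the same line."""
    result = {}
    for label in labels:
        # re.escape guards against labels that contain regex metacharacters.
        # [ \t]* (rather than \s*) keeps the match on the label's own line.
        pattern = re.compile(
            r"^" + re.escape(label) + r"[ \t]*:[ \t]*(.*)$", re.MULTILINE
        )
        match = pattern.search(text)
        # Missing fields become empty strings so every value stays a string.
        result[label] = match.group(1).strip() if match else ""
    return result


fields = extract_fields(sample_text, FIELD_LABELS)
print(fields)
```

If the extracted text puts the values on the line below the labels instead of next to them, the pattern would need to be adapted to that layout.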
I converted the PDF to text myself and could not find any ‘invisible data’ like you mentioned.
Could you please explain it in more detail?
Well, I already did what you explained. In the sequence, I read the PDF as it is; I don’t modify anything or fill in any blanks, so I would expect it to return just the form and no other information.
If you have an output txt, you can see there’s a part where it says “NIF Apellidos o Razón social Nombre” and then, just below, some numbers and some names. That information is hidden in the PDF and I don’t know how to handle it; when I write in the form, it gets mixed with what I write.
I hope I have explained it well, given that I can’t upload any documents!
Hi, I found the hidden text you mentioned; it is very odd indeed.
As an alternative, you could open the file using the ‘Start Process’ activity and then use ‘Get Text’ to extract the text from the file. I managed to test it successfully.