Hi Team,
When I try to read a PDF file (text format), I get a really weird result.
Please find the attachments:
Please help me with the same.
Thanks and Regards,
@hacky
I checked the output using Message Box, Log Message, Write Line, and also Write Text File.
In all cases it shows the same junk text.
Kindly suggest some options.
Thanks and Regards,
@hacky
@hacky It looks like the PDF is a digital one, so the text should have been extracted. Can you try setting the PreserveFormatting property to True and check the output?
Thanks for your response, mate!
I tried that before, but unfortunately it shows the same result.
Regards,
@hacky
It is difficult to analyse the issue without the PDF document.
Is it possible to share the PDF so we can have a better look and see what is happening?
Regards,
Karthik Byggari
If possible, could you please share that file with me personally, so that I can check and help you?
Yes, I tried the Read PDF Text activity and it gives a gibberish result.
Use the Read PDF with OCR activity with the Tesseract OCR engine inside it, and try again. I checked and it works fine.
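If it helps to decide when to switch to OCR, here is a rough Python sketch of the idea (outside UiPath, e.g. for a script step). It assumes the junk shows up as control, private-use, unassigned, or replacement characters; text that is mis-mapped to ordinary letters would not be caught. The names `junk_ratio` and `needs_ocr_fallback` and the 0.3 threshold are made up for illustration and are not part of any UiPath activity.

```python
import unicodedata

# Unicode categories treated as junk in this sketch (an assumption):
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
JUNK_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def junk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like junk."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # Nothing was extracted, so treat the read as failed.
        return 1.0
    junk = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in JUNK_CATEGORIES
    )
    return junk / len(chars)


def needs_ocr_fallback(text: str, threshold: float = 0.3) -> bool:
    """Suggest OCR when too much of the extracted text is junk."""
    return junk_ratio(text) > threshold
```

You would pass the string returned by the normal text read to `needs_ocr_fallback` and only run the OCR read when it returns `True`, so clean digital PDFs keep the exact text.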
As per my understanding, OCR engines don't give 100% accurate results.
Do you think it's the recommended approach?
Also, are there any other options to try?
For now, I just want the text from this PDF. I am getting the gibberish text shown in the question.
Can you share your PDF so that I can try?
As I mentioned before, I am getting unwanted differences when using Read PDF with OCR and Tesseract.
Please have a look at it. Can you please re-check the values and get back to me?
Can I share the text file with you?