I have used the UiPath Document OCR engine in the Read PDF With OCR activity since May 2021. The resulting text was very good. But since October 2021, the words in the resulting text have been coming out in the wrong order.
For example, if the PDF contains “That is a good idea”, the output is “That good is a idea”. Everything is correct except the word order.
I have attached the PDF file; the first few lines of the result are:
“Prasmatic Bookshelf of Many the designations used manufacturers by and sellers to distinguish their prod- ucts claimed are trademarks. as Where those designations in this appear book, and The Pragmatic Programmers, LLC was aware of trademark a claim, the designations have been printed in initial letters capital all in or capitals…”
You can see that the words are in the wrong order in many places.
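To confirm that an output differs from the source only in word order (and not in the recognized words themselves), a small check like the one below may help. It is only a sketch, independent of UiPath: the function names are made up for illustration, and it assumes you have the expected text and the OCR output available as plain strings.

```python
import re
from collections import Counter


def words(text):
    """Return lowercase word tokens; other punctuation is dropped."""
    return re.findall(r"[\w'-]+", text.lower())


def same_words_different_order(expected, actual):
    """True when both texts contain exactly the same words
    (with the same counts) but in a different sequence."""
    exp, act = words(expected), words(actual)
    return Counter(exp) == Counter(act) and exp != act


# The example from this post: same words, different order.
print(same_words_different_order("That is a good idea",
                                 "That good is a idea"))  # True
```

If this returns True for a page, the recognition itself is fine and only the ordering of the words is off.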
I chose the UiPath Document OCR engine because it is nearly 100% accurate on my documents. I have tested other engines such as UiPath Screen OCR, Microsoft OCR, Google OCR, Tesseract, ABBYY, and OmniPage many times on many of my documents, and the results are not as good as with UiPath Document OCR.
Has anyone else run into this error? Or can you recommend another engine? Thank you.
Test.pdf (80.8 KB)
Test.xaml (11.6 KB)
I tried your sequence and it worked for me.
Test.xaml (11.8 KB)
I don’t know what’s going on.
Thank you for trying my case. I am using Community Edition, and I’m in Vietnam. Does this only happen with Community Edition?
I really need to fix this problem. I don’t think it is a bug, because fernando_zuluaga does not see it. So what other settings or configuration should I try? Thank you.
Sorry, can someone from UiPath provide me an official answer for this case?
I see this is not a scanned PDF. Have you tried the Read PDF Text activity instead of Read PDF With OCR?
No, this is just a sample file. Many of my documents are scanned and need OCR, so I cannot use Read PDF Text. The problem is that the results used to be very good until recently (October). I have tried different machines to see what is going on, but I really can’t understand this strange error. I don’t have a clue.
Hi, I used Community Edition and had no problems. Have you tried your workflow on another machine?
Try changing the “preserve format” parameters.
Hi @Gabriel_Wisniewski, can you please explain more about “preserve format” in the UiPath Document OCR engine? I really don’t know about it. Thank you.
Yes, I’ve tried the other engines (Google, Tesseract…). They get the word order right, but their accuracy is lower than that of the UiPath Document OCR engine, so I can’t use them.