I am using the Document Understanding framework to read a PDF file.
After running the Digitize Document activity, I saved the digitized text of the PDF to a Notepad (.txt) file.
Issue: many underscores appear in the Notepad data.
When I read the same PDF file with Read PDF With OCR, this issue doesn't occur and the text in the Notepad file is extracted correctly.
Can anyone tell me how I can improve the results of the Digitize Document activity?
I tried to edit the results manually in Notepad and replace the underscores with spaces, but I was not able to do that.
Can you please suggest how I can clean up the result so that it is usable?
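As a possible workaround while the OCR output itself is not fixed, the text could be cleaned after digitization instead of in Notepad. The sketch below is only an illustration under assumptions: it assumes the underscores are literal `_` characters (U+005F) standing in for spaces, and the function name `clean_digitized_text` is hypothetical. If Notepad's Replace could not match them, the character may be something other than a plain underscore, and this pattern would not catch it either. In a UiPath workflow the same idea would usually be written in VB.NET or C# with `System.Text.RegularExpressions.Regex.Replace`, and the same pattern should apply there.

```python
import re

def clean_digitized_text(text: str) -> str:
    """Replace each run of literal underscores with a single space.

    Assumption: the underscores in the Digitize Document output stand in
    for spaces; underscores that are genuinely part of the data will also
    be replaced.
    """
    return re.sub(r"_+", " ", text)

if __name__ == "__main__":
    print(clean_digitized_text("Invoice_Number_:_AZ12"))  # Invoice Number : AZ12
```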
Also, when I use the Form Extractor in the Data Extraction Scope, every value I want to extract has an underscore between each character.
For example, Invoice Number: AZ12
is visible as A_Z_1_2
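For the extracted field values, one hedged post-processing option is to remove underscores only where they separate single characters, so `A_Z_1_2` becomes `AZ12` while a value such as `MY_VAR` is left alone. This is a sketch, not a Document Understanding feature: the helper name `strip_interleaved_underscores` is hypothetical, and it assumes each affected value is a whitespace-delimited token made of single characters joined by `_`.

```python
import re

# A whole whitespace-delimited token made of single characters joined by "_",
# e.g. "A_Z_1_2". (?<!\S) and (?!\S) stop it from matching inside longer tokens.
_INTERLEAVED = re.compile(r"(?<!\S)[^_\s](?:_[^_\s])+(?!\S)")

def strip_interleaved_underscores(value: str) -> str:
    """Remove underscores that separate single characters ("A_Z_1_2" -> "AZ12").

    Tokens with multi-character segments, such as "MY_VAR", are unchanged.
    """
    return _INTERLEAVED.sub(lambda m: m.group(0).replace("_", ""), value)

if __name__ == "__main__":
    print(strip_interleaved_underscores("Invoice Number : A_Z_1_2"))  # Invoice Number : AZ12
```

This only treats the symptom; if the field value should never contain underscores, checking the digitization or OCR engine settings is likely the better long-term fix.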