I am currently working on a project where I need to extract four specific elements from PDF files: company name, title, document type and date. I have used regex expressions to extract the elements, but because of the OCR applied, some of them are not extracted correctly. Is there an If function I can use to handle this? For example, I expect the document type to be ABC, but it comes back as Bac; when that happens I would like to write the value as XXXX. Is there any If function that would allow me to do so?
You need to tweak your OCR settings so it’s more accurate. It’s basically trial and error, unfortunately.
Oh, what if, for instance, I am trying to extract the company name, and in my flow I use the match function to extract the name before LTD. But for one of the files I scanned, it was captured as LTA, so the regex expression does not match. Is it possible to use an If function to return this empty value as “XXXX”?
I tried the above workflow but it did not seem to work. My variable for Company is of IEnumerable type.
Then in your logic, Company.Count is never 0.
I thought that Company.Count = 0 means the regex expression matched nothing? Sorry, I am relatively new to UiPath and this is my first time dealing with an If function, so I am not really sure how to go about it.
Company is an array (IEnumerable) so Count tells you how many elements are in the array. You need to show us how you’re setting the value of Company.
Is this what you mean by how I am setting the value of Company? I have saved the output as the variable Company. My input is taken from doctext, which is what I get from my digitization flow at the top when I scan my PDF files. I am sorry, but what do array and elements refer to? Do elements refer to characters in the word?
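As a rough illustration of the Count discussion: in a collection of regex matches, each element is one whole match (for example one company name), not a single character, and the count is the number of matches found. The sketch below is in Python rather than UiPath expressions, purely to show the logic; the pattern, the `extract_company` helper and the `PLACEHOLDER` name are assumptions made for this example and are not taken from the original workflow.

```python
import re

# Hypothetical pattern: capture whatever precedes "LTD" on a line.
COMPANY_PATTERN = re.compile(r"^(.+?)\s+LTD\b", re.MULTILINE)

# Value to write when nothing could be extracted.
PLACEHOLDER = "XXXX"


def extract_company(doc_text: str) -> str:
    """Return the first company name found before 'LTD', or the placeholder."""
    # findall returns a list; each element is one captured company name.
    matches = COMPANY_PATTERN.findall(doc_text)

    # An empty list plays the role of "Count = 0": the regex matched nothing.
    if len(matches) == 0:
        return PLACEHOLDER

    return matches[0].strip()


good_scan = "ACME TRADING LTD\nInvoice"
bad_scan = "ACME TRADING LTA\nInvoice"  # OCR misread "LTD" as "LTA"

print(extract_company(good_scan))
print(extract_company(bad_scan))
```

The same idea should carry over to an If condition on the match collection: check whether the collection is empty first, and only read its first element when it is not.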
Here are my tips for dealing with OCR problems:
- Like @postwick said, tweak your OCR settings! Trial and error. Most important in my experience is the “Scale” setting.
- Try to identify common mistakes. Example: in one of my projects the OCR engine kept mistaking certain character combinations (like “-/”) for something else (“V”). We used a dictionary loaded from an Excel file to identify and correct the common mistakes.
- Optimize the process, so if you’re lucky you don’t need to deal with OCR. (Problem solved?)
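
The dictionary idea from the tips above can be sketched roughly as follows. This is a minimal Python illustration, not the actual project code: the `OCR_CORRECTIONS` entries are hypothetical (borrowed from the LTA and Bac examples earlier in this thread), and in a real workflow the table would be loaded from a file such as the Excel sheet mentioned above.

```python
import re

# Hypothetical table of known OCR misreads and their corrections.
OCR_CORRECTIONS = {
    "LTA": "LTD",
    "Bac": "ABC",
}


def correct_ocr_text(text: str, corrections: dict) -> str:
    """Replace known OCR misreads, matching whole words only."""
    for wrong, right in corrections.items():
        # \b keeps "LTA" inside a longer word such as "DELTA" untouched.
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function replacement inserts the correction literally.
        text = re.sub(pattern, lambda _match, value=right: value, text)
    return text


print(correct_ocr_text("ACME TRADING LTA", OCR_CORRECTIONS))
print(correct_ocr_text("Document type: Bac", OCR_CORRECTIONS))
```

Running the correction step before the regex extraction means the existing patterns can stay unchanged, at the cost of maintaining the table as new misreads turn up.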