Hello, I am scraping text from scanned PDFs using OCR, and then I’d like to check whether that text output contains a specific term. I can use the Contains function like so:
VarPdfOutput.Contains(VarSpecificTerm)
This returns TRUE or FALSE. However, Contains requires an exact match, and I need a partial match (at least 80% similar) because OCR makes mistakes with scanned documents.
In my case the VarSpecificTerm is usually multiple words and not a single word.
Here is an example of what I’d like to do:
VarPdfOutput = "This is an example of some text that has been scraped from a pdf"
VarSpecificTerm = "has been scraped"
If we used VarPdfOutput.Contains(VarSpecificTerm) it will return TRUE.
Now let’s say the OCR has made some mistakes in VarPdfOutput, like so:
VarPdfOutput = "Thos in an example of S0me texL that has beon scrapen fron a pdg"
Now VarPdfOutput.Contains(VarSpecificTerm) will return FALSE. However, as a human I can tell that “has beon scrapen” is just a few OCR errors away from “has been scraped”, which is what the original scanned PDF says, so I’d want it to return TRUE.
How can I build this fuzzy or partial contains function like so:
VarPdfOutput.FuzzyContains(VarSpecificTerm)
Then it will return TRUE.
It doesn’t necessarily have to be a function like the “FuzzyContains” example above, but how can I build this fuzzy contains functionality within UiPath? Or is there perhaps an activity for this?
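To make the behaviour I’m after concrete, here is a rough sketch of the logic in Python. It is only one possible approach, not something UiPath provides: it slides a window of the same number of words as the term across the text and compares each window to the term using Levenshtein (edit) distance, returning true if any window is at least 80% similar. The names levenshtein, similarity and fuzzy_contains are hypothetical, and I’m assuming the logic would have to be ported to VB.NET (for example inside an Invoke Code activity) to run in a workflow. It also assumes the OCR keeps word boundaries intact, which may not always hold.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def fuzzy_contains(text: str, term: str, threshold: float = 0.8) -> bool:
    """True if some run of words in text is at least `threshold` similar to term."""
    text_words = text.lower().split()
    term_words = term.lower().split()
    if not term_words:
        return True  # an empty term is trivially contained
    target = " ".join(term_words)
    n = len(term_words)
    # Compare every window of n consecutive words against the term.
    for start in range(len(text_words) - n + 1):
        window = " ".join(text_words[start:start + n])
        if similarity(window, target) >= threshold:
            return True
    return False


if __name__ == "__main__":
    noisy = "Thos in an example of S0me texL that has beon scrapen fron a pdg"
    print(fuzzy_contains(noisy, "has been scraped"))
```

With the example above, “has beon scrapen” differs from “has been scraped” by 2 edits over 16 characters, so the similarity is 0.875 and the check passes the 0.8 threshold. The comparison is case-insensitive here, which is a choice I made for the sketch and may not suit every document.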