I used the Read PDF Text activity to read my PDF file. In the next step, I try to capture data with a regex. The regex works fine in the UiPath regex builder, the Regex Storm .NET tester, and regex101.com, but in my workflow it does not extract the required data from the text.
Here are some possible reasons why your regex may not match the text returned by UiPath Studio's Read PDF Text activity, even though it works in other regex testers:
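Whitespace Encoding: The PDF text may use characters other than a plain space where you expect whitespace, or no whitespace at all. Note that \s+ and \s* match the same set of characters; they differ only in quantity (\s+ requires at least one, \s* allows none). Use \s* wherever the space may be missing entirely, and be aware that some space-like or invisible characters, such as the zero-width space (U+200B), are not matched by \s at all.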
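Hidden Characters: The PDF might contain invisible characters, such as zero-width spaces or soft hyphens, that do not appear in the text preview but still break the match. To tolerate them between two parts of the pattern, you can use a class that matches any character, such as [\s\S] or [\w\W], with a lazy quantifier (for example [\s\S]*?) instead of \s.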
Text Extraction Issues: Read PDF Text may not extract text exactly the way other tools do; spacing and line breaks can differ. Log or save the exact string your workflow receives and paste that into the tester, rather than text copied from a PDF viewer.
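Newline Characters: The target data may be preceded or followed by a line break, and the extracted text may use Windows line endings (\r\n). In .NET, . does not match \n unless RegexOptions.Singleline is set, and with RegexOptions.Multiline, $ matches before \n but not before \r, so adjust the pattern accordingly (for example with \s* or \r?$).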
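To check which of the causes above applies, it helps to look at the exact characters in the extracted string. The sketch below is Python rather than a UiPath workflow and assumes you have copied the Read PDF Text output into a string (for example, by saving it to a text file first); describe_hidden_chars is a hypothetical helper, not a UiPath activity. It lists every character outside printable ASCII with its position and Unicode name:

```python
import unicodedata


def describe_hidden_chars(text: str) -> list[str]:
    """Describe every character in text that is not printable ASCII.

    Tabs, carriage returns, line feeds, non-breaking spaces and zero-width
    characters all show up here, which makes them easy to spot before
    writing a regex.
    """
    report = []
    for index, ch in enumerate(text):
        if " " <= ch <= "~":  # ordinary printable ASCII, including the space
            continue
        # Control characters such as \r have no Unicode name, hence the default.
        name = unicodedata.name(ch, "<unnamed>")
        report.append(f"{index}: U+{ord(ch):04X} {name}")
    return report


if __name__ == "__main__":
    # Hypothetical sample standing in for the string returned by Read PDF Text.
    sample = "Usage Charges:\u00a0$\u200b12.50\r\n"
    for line in describe_hidden_chars(sample):
        print(line)
```

For this sample the report flags a no-break space after the colon, a zero-width space after the dollar sign, and a \r\n line ending, none of which are visible in a normal text preview.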
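For example, the following pattern captures the amount after "Usage Charges:" in group 1, allowing one or more whitespace characters after the colon and optional whitespace after the dollar sign. It relies on a variable-length lookbehind, which .NET supports but some other regex flavors do not:
(?<=Usage Charges:\s+)\$\s*(\d+\.\d+)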
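One way to make the match robust is to normalize the extracted text before applying the pattern. The following is a minimal Python sketch of that idea, not UiPath code; normalize and extract_usage_charges are hypothetical names, and the set of characters being cleaned up is an assumption about what a PDF might contain. Because Python's re module only supports fixed-width lookbehind, the .NET pattern is rewritten with the label matched normally; group 1 still holds the amount:

```python
import re
from typing import Optional

# Zero-width characters are invisible and are not matched by \s, so they are
# deleted outright (str.translate drops keys mapped to None).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"), None)

# Python's re only supports fixed-width lookbehind, so the .NET pattern
# (?<=Usage Charges:\s+)\$\s*(\d+\.\d+) is rewritten with the label matched
# normally; group 1 still holds the amount.
USAGE_CHARGES = re.compile(r"Usage Charges:\s+\$\s*(\d+\.\d+)")


def normalize(text: str) -> str:
    """Delete zero-width characters, unify line endings, turn NBSP into spaces."""
    text = text.translate(ZERO_WIDTH)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\u00a0", " ")


def extract_usage_charges(text: str) -> Optional[str]:
    """Return the amount after 'Usage Charges:', or None if there is no match."""
    match = USAGE_CHARGES.search(normalize(text))
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical sample standing in for the Read PDF Text output.
    raw = "Usage Charges:\u00a0$\u200b12.50\r\n"
    print(USAGE_CHARGES.search(raw))   # None: the zero-width space blocks \d+
    print(extract_usage_charges(raw))  # 12.50
```

In a UiPath workflow, a similar cleanup could presumably be done with .NET string methods such as String.Replace in an Assign activity before the text reaches the Matches activity; this is a suggestion rather than something tested in this thread.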
Please share a sample file and your code so the UiPath Community can help you.
I tried OmniPage OCR, but it took a long time just to read the PDF and the workflow would not move forward. I then tested Tesseract OCR with its Profile changed from Legacy (the default) to a different profile, and after that change it extracted the data correctly.