Document Data recognition

Good day, everyone. Is there an activity in UiPath (or, failing that, any other approach outside UiPath) for extracting a date from a PDF document? The date is not always in the same format as in the other documents, but it still needs to be recognized and extracted.

Hi @Anelisa_Bolosha1

Use Read PDF Text (or Read PDF With OCR for scanned documents) to read the PDF.
Use regular expressions to extract the date.
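Here is a minimal sketch of the regex step in VB.NET, the same language as the expression later in this thread. It assumes the PDF text is already in a string and that the date is numeric, with /, . or - as separators. The pattern and the name FindFirstDate are illustrative only and are not part of any UiPath package. The Regex.Match call itself could go in an Assign activity.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FirstDateSketch
    ' Assumed numeric layout: 1-2 digits, a separator, 1-2 digits, the same
    ' separator again (\1 back-reference), then a 4-digit year.
    Private ReadOnly NumericDate As New Regex("\b\d{1,2}([/.-])\d{1,2}\1\d{4}\b")

    ' Returns the first numeric date in the text, or an empty string if none is found.
    Function FindFirstDate(text As String) As String
        Dim m As Match = NumericDate.Match(text)
        Return If(m.Success, m.Value, String.Empty)
    End Function

    Sub Main()
        ' Prints 03/15/2023
        Console.WriteLine(FindFirstDate("Invoice Date: 03/15/2023  Ref: 7781"))
    End Sub
End Module
```

The back-reference means a mixed string like 03/15-2023 is not accepted. Drop the group and use [/.-] twice if you want looser matching.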


Hi @Anelisa_Bolosha1

Check whether the PDF is scanned or not.
If it is scanned, use Read PDF With OCR.
Otherwise, use Read PDF Text.
Then use the Find Matching Patterns activity.

Check the different date formats and cover each of them in the regex pattern.
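Find Matching Patterns returns every match, not just the first. The sketch below does not show how that activity is implemented. It only illustrates the same idea with .NET's Regex.Matches, and it assumes two hypothetical numeric layouts: day/month first with a four-digit year, or ISO-style yyyy-MM-dd. FindAllDates is a made-up helper name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module AllDatesSketch
    ' Hypothetical pattern with two alternatives:
    '   d/M/yyyy-style numeric dates (separator /, . or -)
    '   yyyy-MM-dd (ISO-like, year first)
    Private ReadOnly DatePattern As New Regex("\b(?:\d{1,2}[/.-]\d{1,2}[/.-]\d{4}|\d{4}-\d{2}-\d{2})\b")

    ' Returns every date-like token in the order it appears in the text.
    Function FindAllDates(text As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In DatePattern.Matches(text)
            results.Add(m.Value)
        Next
        Return results
    End Function

    Sub Main()
        Dim sample As String = "Issued 2023-03-01" & Environment.NewLine & "Due Date: 03/31/2023"
        ' Prints 2023-03-01, then 03/31/2023
        For Each d As String In FindAllDates(sample)
            Console.WriteLine(d)
        Next
    End Sub
End Module
```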


Hope it helps!!


Hi @Anelisa_Bolosha1

Install the UiPath.PDF.Activities package from Manage Packages.
→ If the PDF is not scanned, use Read PDF Text to extract its text and store the output in a variable, say str_Text.
→ If the PDF is a scanned copy, use Read PDF With OCR instead and store the output in the same variable, str_Text.
→ Use Write Text File to write the data to a text file. (Note: Write Text File can be removed at the end, since it is only there to check how the data has been written.)
→ After that, you can apply regular expressions to extract the date.
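The "is it scanned?" decision in the steps above can also be automated. One common heuristic is to run Read PDF Text first and fall back to Read PDF With OCR when the result contains almost no letters or digits. This is an assumption, not a UiPath rule. LooksScanned and the 20-character threshold are hypothetical and would need tuning on real documents. A scanned PDF that already has an embedded text layer would also pass this check as "not scanned".

```vbnet
Imports System
Imports System.Linq

Module ScannedCheckSketch
    ' Heuristic (assumption): if the Read PDF Text output has fewer than
    ' minChars letters or digits, treat the PDF as scanned and re-read it
    ' with Read PDF With OCR.
    Function LooksScanned(str_Text As String, Optional minChars As Integer = 20) As Boolean
        If String.IsNullOrWhiteSpace(str_Text) Then Return True
        Dim meaningful As Integer = str_Text.Count(Function(c) Char.IsLetterOrDigit(c))
        Return meaningful < minChars
    End Function

    Sub Main()
        Console.WriteLine(LooksScanned(""))                                          ' True
        Console.WriteLine(LooksScanned("Invoice Date: 03/15/2023  Total: 1,250.00")) ' False
    End Sub
End Module
```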

A few sample formats of the text around the date would help me write a common regular expression for extracting it. Post the formats and I will help you with the regex.
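As an example of what a common regular expression might look like, this sketch joins several assumed layouts into one case-insensitive alternation: numeric dates, "15 March 2023" and "March 15, 2023". The layouts, MonthWord and FindDates are hypothetical, and the real list should come from the actual document samples. The month part is deliberately loose and would also accept a word like "Market", so it may need tightening.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CommonDatePatternSketch
    ' Month name or abbreviation, e.g. "Mar", "March", "Sept." (matched case-insensitively).
    Private Const MonthWord As String = "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"

    ' Assumed layouts, in order:
    '   03/15/2023, 15-03-23, 15.03.2023
    '   15 March 2023
    '   March 15, 2023
    Private ReadOnly Layouts As String() = {
        "\d{1,2}[/.-]\d{1,2}[/.-](?:\d{4}|\d{2})",
        "\d{1,2}\s+" & MonthWord & "\s+\d{4}",
        MonthWord & "\s+\d{1,2},?\s+\d{4}"
    }

    ' One alternation built from the layouts, wrapped in word boundaries.
    Private ReadOnly CommonDate As New Regex(
        "\b(?:" & String.Join("|", Layouts) & ")\b",
        RegexOptions.IgnoreCase)

    ' Returns every date-like token in reading order.
    Function FindDates(text As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In CommonDate.Matches(text)
            results.Add(m.Value)
        Next
        Return results
    End Function

    Sub Main()
        For Each d As String In FindDates("Date: 15 March 2023; Due: March 31, 2023; Paid 04/02/23")
            Console.WriteLine(d)
        Next
    End Sub
End Module
```

Keeping the layouts in a list makes it easy to add a new one when another document format turns up.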




Please share some samples.


Hello, please see an example of one of the documents below. We are also trying ChatGPT to achieve this, but it is not very safe for the company's data.
Sometimes it looks like this:


Or it can be in MM/dd/yyyy format.
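Once the raw date text has been extracted, DateTime.TryParseExact can take an array of formats. That is one way to handle documents that use different layouts. The format list here is an assumption: only MM/dd/yyyy comes from this thread. A string like 03/04/2023 cannot be read as both MM/dd and dd/MM, so putting both orders in one list is ambiguous. TryParseExact tries the formats in array order and returns the first one that fits. ParseDocumentDate is a hypothetical name.

```vbnet
Imports System
Imports System.Globalization

Module ParseDateSketch
    ' Assumed layouts; only MM/dd/yyyy comes from the thread.
    ' "M/d/yyyy" also accepts single-digit months and days such as 3/5/2023.
    Private ReadOnly KnownFormats As String() = {
        "MM/dd/yyyy",
        "M/d/yyyy",
        "d MMMM yyyy",
        "MMMM d, yyyy",
        "yyyy-MM-dd"
    }

    ' Parses the extracted text with the first format that fits,
    ' or returns Nothing when none does.
    Function ParseDocumentDate(raw As String) As DateTime?
        Dim result As DateTime
        If DateTime.TryParseExact(raw.Trim(), KnownFormats, CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, result) Then
            Return result
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim parsed As DateTime? = ParseDocumentDate("March 15, 2023")
        ' Prints 2023-03-15
        If parsed.HasValue Then Console.WriteLine(parsed.Value.ToString("yyyy-MM-dd"))
    End Sub
End Module
```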


Based on the examples, one thing they have in common is the word Date followed by a colon (:). Try reading the PDF data with Read PDF Text and then apply this regex:


RequiredDate = System.Text.RegularExpressions.Regex.Match(str,"(?<=Date.*:).*",System.Text.RegularExpressions.RegexOptions.Multiline).Value.Trim
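For reference, here is the expression above as a small VB.NET program, alongside a tighter variant. Because .* stops only at the end of the line, the original returns everything after "Date...:", for example "03/15/2023 Time: 10:30". RegexOptions.Multiline does not change that, since the pattern contains no ^ or $. The tighter variant assumes the value is in the MM/dd/yyyy form mentioned in the thread and sits right after the label on the same line. Both function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DateAfterLabelSketch
    ' The expression from the post: text after the first colon that follows
    ' "Date" on the same line, trimmed (Trim also drops a trailing \r).
    Function RestOfDateLine(text As String) As String
        Return Regex.Match(text, "(?<=Date.*:).*", RegexOptions.Multiline).Value.Trim()
    End Function

    ' Tighter variant (assumption: MM/dd/yyyy value right after the label):
    ' lookbehind for "Date", any non-colon text on the same line, ":" and
    ' optional spaces/tabs; the match itself is only the date.
    Function DateValueAfterLabel(text As String) As String
        Return Regex.Match(text, "(?<=Date[^:\r\n]*:[ \t]*)\d{1,2}/\d{1,2}/\d{4}").Value
    End Function

    Sub Main()
        Dim sample As String = "Invoice Date: 03/15/2023 Time: 10:30"
        Console.WriteLine(RestOfDateLine(sample))      ' 03/15/2023 Time: 10:30
        Console.WriteLine(DateValueAfterLabel(sample)) ' 03/15/2023
    End Sub
End Module
```

Because the tighter pattern excludes line breaks between the label and the value, a date printed on the line after "Date:" will not be picked up. Relax [ \t]* to \s* if your PDFs lay it out that way.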

