Good day everyone. Is there an activity in UiPath, or a way outside UiPath, to extract a date from a PDF document? The date format is not the same from one document to the next, but it still needs to be recognized and extracted.
Use Read PDF Text, or Read PDF with OCR (for scanned docs), to read the PDF
Use regular expressions to extract the date
Check whether the PDF is scanned or not
If scanned, use Read PDF with OCR
Else use Read PDF Text
Use the Find Matching Patterns activity
Check the different patterns and use them in the regex. Below is one of the formats.
Hope it helps!!
→ Download UiPath.PDF.Activities package from Manage Packages
→ Use Read PDF Text to extract the text from PDF if it’s not scanned and store that Output in a variable say
→ If the PDF is a scanned copy then use Read PDF with OCR to extract the text from PDF and store that Output in a variable say
→ Use Write Text File to write the data into a Notepad file. (Note: Write Text File can be removed at the end, since it's used only to check how the data has been written.)
→ After that you can apply regular expressions to extract date.
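Since the question also allows a solution outside UiPath, the regex step could be sketched in Python roughly as follows. This is only a sketch: the four date formats below are assumptions for illustration, not formats confirmed in this thread, and `extract_date` and `CANDIDATES` are hypothetical names. It assumes the PDF text has already been read into a string.

```python
import re
from datetime import datetime

# Each entry pairs a regex with the strptime format used to parse its match.
# These formats are illustrative assumptions; adjust them to the real documents.
CANDIDATES = [
    (r"\b\d{4}-\d{2}-\d{2}\b", "%Y-%m-%d"),        # e.g. 2023-11-02
    (r"\b\d{1,2}/\d{1,2}/\d{4}\b", "%m/%d/%Y"),    # e.g. 03/15/2024
    (r"\b\d{1,2}-\d{1,2}-\d{4}\b", "%d-%m-%Y"),    # e.g. 15-03-2024
    (r"\b\d{1,2} [A-Za-z]+ \d{4}\b", "%d %B %Y"),  # e.g. 15 March 2024
]

def extract_date(text):
    """Return the first parseable date in text as a datetime, or None.

    Patterns are tried in list order, so an earlier pattern wins even if
    a later pattern would match nearer the start of the text.
    """
    for pattern, fmt in CANDIDATES:
        for match in re.finditer(pattern, text):
            try:
                return datetime.strptime(match.group(0), fmt)
            except ValueError:
                continue  # looked like a date but was not valid (e.g. month 13)
    return None
```

Parsing every match into a `datetime` gives one common output format regardless of how the date was written. Note that a value such as 03/04/2024 is ambiguous between MM/dd and dd/MM, so the format order has to be chosen per document source.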
Maybe some samples of the text with the date will help me write a common regular expression for extracting it. Please provide the formats and I will help you with the regular expression.
Please show some samples
Hello, please see an example of one of the documents below. We are also trying ChatGPT to achieve this, but it is not too safe for the company's data.
Sometimes like this:
Or it can be in MM/dd/yyyy format
As per the examples, the one commonality I see is the word Date followed by a colon. Try using a regex on the PDF data after reading it with Read PDF Text:
RequiredDate = System.Text.RegularExpressions.Regex.Match(str,"(?<=Date.*:).*",RegexOptions.MultiLine).Value.Trim
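For anyone trying the same idea outside UiPath, here is a rough Python equivalent of that expression. Python's `re` module only allows fixed-width lookbehind, so this sketch uses a capture group instead of `(?<=Date.*:)`. The function name `date_after_label` is hypothetical.

```python
import re

def date_after_label(text):
    """Return the text that follows 'Date ... :' on the same line, or None."""
    # Python's re module requires fixed-width lookbehind, so a capture group
    # stands in for the (?<=Date.*:) lookbehind of the .NET pattern.
    # [^:\n]* keeps the label and its colon on one line.
    match = re.search(r"Date[^:\n]*:(.*)", text)
    return match.group(1).strip() if match else None
```

Like the original expression, this returns everything after the colon up to the end of the line, so the result may still need to be parsed or validated as a date afterwards.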