Extract specific text from multiple PDFs

Hi,

I have multiple PDFs, and from each of them I want to extract only the “Invoice Date” and “Total” values.
PDF samples are attached below.



Can anyone guide me on how to achieve this?

Thanks in advance

Hi @rsr.chandu,

Could you use the Read PDF Text activity with the Preserve Format property enabled, write the extracted data to a text file, and check how the data is arranged in that file? Alternatively, you could post the text file here.

Hi @rsr.chandu,

You can extract the selected values using Regex.

  1. Use the Read PDF Text activity and store the extracted text in a variable such as strExtractedText.
  2. To extract the Total value, use an Assign activity as follows:
strTotal = System.Text.RegularExpressions.Regex.Match(strExtractedText, "(?:\s*|^)(\$\s*\d+(?:,\d+)*\.\d+)\s*(?:$|\s)").Groups(1).Value
  3. Do the same for Invoice Date:
strInvoiceDate = System.Text.RegularExpressions.Regex.Match(strExtractedText, "\s*(0?[1-9]|1[0-2])\/(0?[1-9]|[12][0-9]|3[01])\/\d{2}\s*").Value
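If you want to check the two patterns outside UiPath before wiring them into the workflow, here is a minimal Python sketch. The patterns are the same as in the Assign activities above; the function name extract_fields and the sample text are hypothetical, made up for illustration and not taken from the attached PDFs.

```python
import re

# Same patterns as in the Assign activities above; Python's re module
# accepts this syntax unchanged.
TOTAL_PATTERN = re.compile(r"(?:\s*|^)(\$\s*\d+(?:,\d+)*\.\d+)\s*(?:$|\s)")
DATE_PATTERN = re.compile(r"\s*(0?[1-9]|1[0-2])\/(0?[1-9]|[12][0-9]|3[01])\/\d{2}\s*")


def extract_fields(text):
    """Return (invoice_date, total) for one PDF's text; "" when not found."""
    total_match = TOTAL_PATTERN.search(text)
    date_match = DATE_PATTERN.search(text)
    # Group 1 holds the dollar amount without the surrounding whitespace.
    total = total_match.group(1) if total_match else ""
    # The date pattern also captures surrounding whitespace, so trim it.
    invoice_date = date_match.group(0).strip() if date_match else ""
    return invoice_date, total


# Hypothetical text standing in for the output of Read PDF Text.
sample_text = "Invoice Date 03/15/21\nTotal $1,234.56\n"
print(extract_fields(sample_text))  # ('03/15/21', '$1,234.56')
```

Two things to keep in mind: the Total pattern returns the first dollar amount found in the text, so if a PDF lists a subtotal or tax before the total, that earlier amount is what you get. Also, the date pattern only takes two digits for the year, so a four-digit year would be cut short.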

Cheers

Thanks, it worked for me.
