PDF Extract Text

I read the PDF and then write its text to a .txt file. The text file contains many fields.

Sample:

Document No: 1111
Invoice No: 222
Price: 4545

I just want to get the “Invoice No” value. How can I do that?


Hi D Ulutas,

  • The Read PDF Text activity extracts the entire text of a document.
  • To extract a specific portion of data from a native PDF, use the Get Full Text activity.
  • If you are trying to extract data from a scanned PDF, use the Get OCR Text activity.

This won’t work, because the position of the “Invoice No” line can change from one document to the next.

Example: Invoice1

Document No: 1111
Invoice No: 222
Price: 4545

Example: Invoice2

Document No: 1111
Price: 4545
Invoice No: 222

@d.ulutas
Assuming pdfText is the string output of Read PDF Text,

you can get the Invoice No using:

System.Text.RegularExpressions.Regex.Match(pdfText, "Invoice No: (\d+)").Groups(1).Value
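
To show why the regex approach does not depend on where the line appears, here is a minimal sketch of the same idea in Python rather than in a UiPath expression. The function name extract_invoice_no and the two sample strings are assumptions made for illustration; the strings simply mirror the Invoice1 and Invoice2 examples earlier in the thread.

```python
import re

# Pattern equivalent to the one in the expression above:
# the digits following the "Invoice No: " label are captured in group 1.
INVOICE_PATTERN = re.compile(r"Invoice No: (\d+)")


def extract_invoice_no(pdf_text):
    """Return the invoice number as a string, or None if the label is absent."""
    match = INVOICE_PATTERN.search(pdf_text)
    return match.group(1) if match else None


# Hypothetical sample texts mirroring the two layouts from the question.
invoice1 = "Document No: 1111\nInvoice No: 222\nPrice: 4545"
invoice2 = "Document No: 1111\nPrice: 4545\nInvoice No: 222"

print(extract_invoice_no(invoice1))
print(extract_invoice_no(invoice2))
```

Because the pattern searches for the label itself, both layouts return the same value. In the .NET expression, checking the Success property of the returned Match should play the same role as the None check here.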


Try this:

Sequence2.xaml (7.5 KB)


Hey @d.ulutas

Take a look at my Regex MegaPost.

It’s a good starting point for learning Regex.

Cheers

Steve
