Extracting a portion of text from a PDF file

I don’t know if this is the right place to ask, but I can’t work out the code for extracting specific text from a PDF file. It is a native, text-based PDF, not a scanned document or an image, so OCR is not needed.

I would like to know how to extract text from a PDF file so that I can append it to an existing text file. I have attached screenshots for reference.

This is the sequence in question.
[Screenshot: the sequence in question]

And this is the text I want to extract from an existing PDF file. It is on page 3 of 15.
[Screenshot: the text to be extracted]

I found the steps on this page unclear: https://docs.uipath.com/activities/docs/read-pdf-files#read-a-pdf-file-using-the-read-pdf-with-ocr-activity
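For readers working outside UiPath, the two steps described above (read the text of page 3, then append it to an existing text file) can be sketched in Python. This is a minimal sketch under stated assumptions, not the UiPath approach: it assumes the third-party pypdf package for reading the PDF's text layer, and the function names and file paths are hypothetical. Because the PDF is text-based, no OCR is involved.

```python
from pathlib import Path


def extract_page_text(pdf_path: str, page_number: int) -> str:
    """Return the text layer of one page, using a 1-based page number.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    It is imported inside the function so the rest of this module works without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    if not 1 <= page_number <= len(reader.pages):
        raise ValueError(f"page {page_number} is outside 1..{len(reader.pages)}")
    # pypdf pages are 0-indexed, so page 3 of 15 is reader.pages[2]
    return reader.pages[page_number - 1].extract_text() or ""


def append_to_text_file(txt_path: str, text: str) -> None:
    """Append text to a UTF-8 text file, adding a trailing newline if the text lacks one.

    Mode "a" creates the file if it does not exist yet.
    """
    with Path(txt_path).open("a", encoding="utf-8") as f:
        f.write(text)
        if not text.endswith("\n"):
            f.write("\n")
```

Usage, with hypothetical file names: `append_to_text_file("notes.txt", extract_page_text("report.pdf", 3))`. In Studio, this roughly corresponds to a Read PDF Text activity with its Range property set to "3", followed by an Append Line activity.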

Hello

You can use Regex.

But first you need to build a Regex pattern. To do this, we need:

  • A sample
  • Expected output
  • What's consistent
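To show how those three ingredients turn into a pattern, here is a small Python sketch. The sample text, the label "Reference No:" and the value format are hypothetical stand-ins, since the real page 3 text is only visible in the screenshot. The consistent part (the label) becomes a fixed anchor, and the part that changes (the value) becomes a capture group. Python's re syntax is used here; the features involved (\s, character classes, \d, {n} quantifiers, numbered groups) behave the same way in .NET's Regex for plain ASCII text.

```python
import re

# Hypothetical sample: "Reference No:" is assumed to be the consistent label,
# and the value after it changes from document to document.
SAMPLE = """Project summary
Reference No: AB-2041
Status: Approved"""

# Expected output for this sample: "AB-2041".
# Anchor on the consistent label, allow optional whitespace, then capture
# two capital letters, a hyphen and four digits.
PATTERN = re.compile(r"Reference No:\s*([A-Z]{2}-\d{4})")


def extract_reference(text: str) -> str:
    """Return the captured value, or an empty string if the label is absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else ""
```

Checking the pattern against a few samples, including ones that should not match, is the quickest way to confirm that the "consistent" part really is consistent.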

Cheers

Steve

Okay. Since the project is in the Classic view, how do I use a regex pattern there? If there is a walkthrough in the docs, please point me to it.

Hi

Take a look here:

All you need is an Assign activity:

Assign Left:
str_Result

Assign Right:
System.Text.RegularExpressions.Regex.Match(yourStr, "INSERTxREGEXxPATTERN").ToString

where yourStr is the string variable holding the text read from the PDF.
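The Assign expression above returns the whole matched text as a string. In .NET, calling ToString on a Match returns its Value, and a failed match has an empty Value, so str_Result ends up as an empty string rather than Nothing when the pattern is not found. A rough Python equivalent, for testing a pattern outside Studio, is sketched below; regex_match_to_string is a hypothetical helper name, not a UiPath or .NET API.

```python
import re


def regex_match_to_string(your_str: str, pattern: str) -> str:
    """Mimic .NET Regex.Match(input, pattern).ToString in Python.

    Like Regex.Match, re.search returns the first match anywhere in the input.
    .NET's Match.ToString yields the matched text, or "" when there is no
    match, so a None result is mapped to "" here.
    """
    match = re.search(pattern, your_str)
    return match.group(0) if match else ""
```

For example, regex_match_to_string("Total: 42 units", r"\d+") returns "42". If the pattern contains a capture group and only that group is wanted, the VB.NET expression can end in .Groups(1).Value instead of .ToString.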

Take a look at this sample regex pattern:

[Screenshot: sample regex pattern]

You can learn more about regex here:

Hopefully this helps

Cheers

Steve

Will do so. Thanks.
