Search for a word in scanned PDF text and display the text from that word up to another specified word

Good afternoon, I am building an automation that has to read the text of a scanned PDF. In that text I need to find a specific word and show everything that follows it, up to another word that I specify. Can anybody help me, please? For example, I have a text that says “Please read our Forum FAQ - Beginner’s Guide before creating a new post.” What I need is everything that comes after “Guide” and before “post.”

Hi @Melanny_Herrera_Cruz

  1. Use the “Read PDF with OCR” activity to read the PDF and write the result to a text file.
  2. After that, you can use string manipulation or a Regex expression to extract the text between one word and another.
Assign:- strinput: "Please read our Forum FAQ - Beginner’s Guide before creating a new post."
Assign:- Matches: System.Text.RegularExpressions.Regex.Matches(strinput,"(?<=Guide\s).*(?=\spost)")
(Datatype of Matches: System.Text.RegularExpressions.MatchCollection, which is what Regex.Matches returns)

Show it in a Message Box using the expression below:

Matches(0).Value
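
To sanity-check the pattern outside UiPath, here is a minimal sketch in Python (not UiPath code; the variable names are only for illustration). It uses the lazy form `.*?` so the match stops at the first " post", and it guards against the case where nothing matches, which is what would make `Matches(0)` fail in the workflow.

```python
import re

# Sample text from the question.
text = "Please read our Forum FAQ - Beginner’s Guide before creating a new post."

# Lookbehind: start right after "Guide" plus one whitespace character.
# Lookahead: stop right before a whitespace character followed by "post".
match = re.search(r"(?<=Guide\s).*?(?=\spost)", text)

# re.search returns None when nothing matches, so check before reading it.
extracted = match.group(0) if match else ""
print(extracted)  # before creating a new
```

In the workflow, the equivalent guard would be checking `Matches.Count > 0` before reading `Matches(0).Value`.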


Hope it helps!!
Regards,

@Melanny_Herrera_Cruz
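Read PDF With OCR: Output the scanned PDF text to a variable (let’s call it “pdfText”). Read PDF Text only returns text from PDFs that have a text layer, so a scanned PDF needs the OCR activity.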

Matches Activity:

  • Input: pdfText
  • Pattern: (?<=Guide\s)(.*?)(?=\spost)

Hi @Melanny_Herrera_Cruz

Try this

1. Read the file with the Read PDF with OCR activity
2. Use a Regex to get the required data

Example:

(?<=Guide ).*(?= post)
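
One thing to watch with real OCR output: the text usually spans several lines, and the end word may appear more than once. The sketch below (Python, with a made-up two-sentence sample, not text from an actual PDF) compares the greedy `.*` with the lazy `.*?`, and shows that `.` does not cross line breaks unless the "dot matches newline" option is on. In Python that option is `re.DOTALL`; in .NET the counterpart is `RegexOptions.Singleline`.

```python
import re

# Hypothetical OCR output: a line break inside the wanted text, and "post" twice.
ocr_text = "Beginner's Guide before creating\na new post. Edit your post later."

# Greedy: runs to the LAST " post" in the text.
greedy = re.search(r"(?<=Guide\s).*(?=\spost)", ocr_text, re.DOTALL)

# Lazy: stops at the FIRST " post" after "Guide".
lazy = re.search(r"(?<=Guide\s).*?(?=\spost)", ocr_text, re.DOTALL)

# Without DOTALL, "." cannot cross the line break, so nothing matches.
no_dotall = re.search(r"(?<=Guide\s).*?(?=\spost)", ocr_text)

print(repr(greedy.group(0)))  # 'before creating\na new post. Edit your'
print(repr(lazy.group(0)))    # 'before creating\na new'
print(no_dotall)              # None
```

If the extracted text should be on one line, replacing the line breaks with spaces afterwards is a simple follow-up step.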

I hope it helps!!

@Melanny_Herrera_Cruz
Second Method:
Read PDF With OCR: Output the scanned PDF text to a variable (let’s call it “pdfText”).

Assign Activity:

  • Int32 variable “startIndex” = pdfText.IndexOf("Guide") + "Guide".Length
  • Int32 variable “endIndex” = pdfText.IndexOf("post", startIndex) (searching from startIndex, so an earlier “post” in the text is ignored)

Assign Activity:

  • String variable “extractedText” = pdfText.Substring(startIndex, endIndex - startIndex).Trim()
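
The index-based method breaks if either word is missing, because IndexOf returns -1 and Substring then throws. A minimal sketch of the same logic with those checks added, written in Python for illustration (`extract_between` is a hypothetical helper name, and `str.find` plays the role of IndexOf):

```python
def extract_between(text, start_word, end_word):
    """Return the text between start_word and the next end_word, or None."""
    start = text.find(start_word)
    if start == -1:
        return None  # start word not found
    start += len(start_word)  # move past the start word itself

    # Search for the end word only after the start word.
    end = text.find(end_word, start)
    if end == -1:
        return None  # end word not found after the start word

    return text[start:end].strip()


sample = "Please read our Forum FAQ - Beginner’s Guide before creating a new post."
print(extract_between(sample, "Guide", "post"))  # before creating a new
```

In the workflow, the same checks could be an If activity testing `startIndex` and `endIndex` against -1 before the Substring assign.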