Good afternoon. I am building an automation in which I have to read the text of a scanned PDF, search that text for a specific word, and return everything that follows it up to another word that I specify. Can anybody help me, please? For example, given the text “Please read our Forum FAQ - Beginner’s Guide before creating a new post.”, I need everything that comes after “Guide” and before “post”.
- Use the “Read PDF with OCR” activity to read the PDF and write the result to a text file.
- After that, you can use string manipulation or a regular expression to extract the text between one word and another.
Assign:- strinput: "Please read our Forum FAQ - Beginner’s Guide before creating a new post."
Assign:- Matches: System.Text.RegularExpressions.Regex.Matches(strinput,"(?<=Guide\s).*(?=\spost)")
(Datatype of Matches: System.Text.RegularExpressions.MatchCollection, which is what Regex.Matches returns)
Display it in a Message Box using the expression below (check Matches.Count > 0 first, since Matches(0) throws when nothing matched):
Matches(0).Value
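To check what this pattern returns before wiring it into the workflow, here is a minimal sketch of the same lookbehind/lookahead expression in Python. Python is used only as a quick test harness outside UiPath; the variable names are illustrative.

```python
import re

text = "Please read our Forum FAQ - Beginner’s Guide before creating a new post."

# Same pattern as in the Assign above:
# (?<=Guide\s) requires "Guide" plus one whitespace character just before the match,
# (?=\spost)   requires one whitespace character plus "post" just after it.
pattern = r"(?<=Guide\s).*(?=\spost)"

match = re.search(pattern, text)

# Fall back to an empty string when the start or end word is missing.
extracted = match.group(0) if match else ""

print(extracted)  # before creating a new
```

Because the surrounding whitespace is consumed by the lookarounds, the result comes back without leading or trailing spaces.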
Check the below workflow:
Hope it helps!!
Regards,
@Melanny_Herrera_Cruz
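Read PDF with OCR: Output the scanned PDF text to a variable (let’s call it “pdfText”). Read PDF Text only returns text when the PDF has a text layer, so a scanned document needs the OCR activity.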
Matches Activity:
- Input: pdfText
- Pattern: (?<=Guide\s)(.*?)(?=\spost)
- Result: Output the matches to a variable (let’s call it “matchedText”).
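The lazy quantifier (.*?) in this pattern matters once the end word appears more than once, and OCR output often contains line breaks that `.` does not match by default. The sketch below, in Python for quick testing outside UiPath, compares the greedy and lazy forms on a made-up OCR string. `re.DOTALL` is the Python counterpart of .NET's `RegexOptions.Singleline`; whether your scanned text actually contains line breaks inside the span is an assumption to verify.

```python
import re

# Hypothetical OCR output: a line break inside the span and a second "post" later on.
ocr_text = "Beginner's Guide before creating\na new post. Then reply to the post again."

# Greedy: runs to the LAST " post" in the text.
greedy = re.search(r"(?<=Guide\s).*(?=\spost)", ocr_text, re.DOTALL)

# Lazy: stops at the FIRST " post" after "Guide".
lazy = re.search(r"(?<=Guide\s).*?(?=\spost)", ocr_text, re.DOTALL)

# Without DOTALL, "." cannot cross the line break, so nothing matches here.
no_dotall = re.search(r"(?<=Guide\s).*?(?=\spost)", ocr_text)

# Collapse the line break and any repeated spaces into single spaces.
cleaned = " ".join(lazy.group(0).split())

print(repr(greedy.group(0)))
print(repr(lazy.group(0)))
print(no_dotall)
print(cleaned)
```

In the Matches activity, the equivalent setting would be the Singleline regex option.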
Try this
1. Use the Read PDF with OCR activity
2. Use a regex to extract the required data
Example:
(?<=Guide ).*(?= post)
I hope it helps!!
@Melanny_Herrera_Cruz
Second Method:
Read PDF with OCR: Output the scanned PDF text to a variable (let’s call it “pdfText”).
Assign Activity:
- Int32 variable “startIndex” = pdfText.IndexOf("Guide") + "Guide".Length
- Int32 variable “endIndex” = pdfText.IndexOf("post", startIndex)
Assign Activity:
- String variable “extractedText” = pdfText.Substring(startIndex, endIndex - startIndex)
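The index-based method can be sketched as follows, again in Python as a quick check outside UiPath. The helper name `between` is hypothetical. It adds two safeguards that the plain Substring call lacks: it returns nothing when either word is missing (IndexOf returns -1 in that case, which would make Substring throw), and it searches for the end word only after the start word.

```python
def between(text, start_word, end_word):
    """Return the text between start_word and the next end_word, or None."""
    start = text.find(start_word)
    if start == -1:
        return None  # start word not found
    start += len(start_word)  # skip past the start word itself

    # Search for the end word only after the start word.
    end = text.find(end_word, start)
    if end == -1:
        return None  # end word not found after the start word

    # Trim the spaces left over around the extracted span.
    return text[start:end].strip()


sample = "Please read our Forum FAQ - Beginner’s Guide before creating a new post."
print(between(sample, "Guide", "post"))  # before creating a new
```

In the workflow, the same checks would be an If activity testing that both indexes are greater than or equal to 0 before the final Assign, plus a .Trim() on the result.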