Grabbing text from a Word document and a PDF document, then comparing them to make sure they are equal

Hello, I need some suggestions on how to handle a specific workflow.

I currently have a workflow that downloads a Word doc and a PDF. Both need to be compared with a “master” copy to make sure the text is generating properly. So far my workflow can download and open both successfully. My next step is comparing the Word doc to the “master” Word doc. What would be the best method for doing this? As I said, I need specific text to be captured. (What would be the best way to even start capturing the text?)

Thanks in advance!


We can read the Word document with the Read Text activity and store the output in a variable of type String.
– I hope the specific term will have an anchor term next to it, such as a field name, so we can get it with either string manipulation or a regex.
– Meanwhile, we can read the PDF with the Read PDF Text or Read PDF With OCR activity, store the output in a String variable, and get the specific term with the same regex or string manipulation method.
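
A minimal sketch of the anchor-term approach, written in Python to show the logic rather than as UiPath activities. It assumes the Word and PDF text have already been read into two strings; the sample strings and the helper name `value_after_anchor` are hypothetical. The regex takes whatever follows the anchor on the same line, and the two extracted values are then compared.

```python
import re
from typing import Optional


def value_after_anchor(text: str, anchor: str) -> Optional[str]:
    """Return the text after `anchor` up to the end of that line, or None if the anchor is absent."""
    # re.escape makes the anchor match literally; [ \t]* skips spaces/tabs (not line breaks)
    # after it, and [^\r\n]* captures everything up to the next line break.
    match = re.search(re.escape(anchor) + r"[ \t]*([^\r\n]*)", text)
    return match.group(1).strip() if match else None


# Hypothetical text as it might come back from reading the Word doc and the PDF.
word_text = "Policy Number: 123\r\nLimit: 10,000\r\nDeductible: 500"
pdf_text = "Policy Number: 123\nLimit: 10,000\nDeductible: 500"

word_value = value_after_anchor(word_text, "Limit:")
pdf_value = value_after_anchor(pdf_text, "Limit:")
print(word_value, pdf_value, word_value == pdf_value)  # 10,000 10,000 True
```

In a UiPath expression, the same pattern could be passed to .NET's `System.Text.RegularExpressions.Regex.Match` instead.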

Hope this helps.
Kindly try this and let me know if you have any queries or need clarification.
Cheers @P_Harry


Hi @Palaniyappan, thank you for the quick response!
This narrows down my ideas a little. In my Word doc there will be specific terms, for example:

Limit: 10,000

But instead of grabbing only the 10,000, I want to grab the whole line. I am struggling to come up with a string expression that would let me do this.

This is my current expression:
wordDoc.Substring(wordDoc.IndexOf("Limit: ")+"Limit: ".Length).Split(Environment.NewLine.ToCharArray)(0)

but it is grabbing all the text in the document.
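
One possible reason (an assumption to check against the actual Read Text output) is that the lines in the extracted text are separated by a character that `Environment.NewLine.ToCharArray` (carriage return and line feed) does not include; a manual line break in Word, for example, may come through as a vertical tab (`\v`). Note also that the expression starts after "Limit: ", so it would return only the value, not the whole line. A minimal Python sketch of grabbing the whole line that contains the anchor, splitting on every line-boundary character `str.splitlines()` recognises; the helper name `line_containing` and the sample text are hypothetical.

```python
from typing import Optional


def line_containing(text: str, anchor: str) -> Optional[str]:
    """Return the first whole line that contains `anchor`, or None if no line does."""
    # splitlines() breaks on \n, \r and \r\n, and also on \v, \f, \x1c-\x1e, \x85,
    # \u2028 and \u2029, so a vertical-tab line break counts as a line boundary too.
    for line in text.splitlines():
        if anchor in line:
            return line.strip()
    return None


# Hypothetical extracted text whose lines are separated by vertical tabs (\v),
# which splitting on carriage return / line feed alone would not break apart.
word_doc = "Insured: ACME Corp\vLimit: 10,000\vDeductible: 500"
print(line_containing(word_doc, "Limit:"))  # Limit: 10,000
```

In a VB.NET expression, the equivalent idea would be to add the vertical-tab character to the characters passed to `Split`.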


@P_Harry If you have a fixed format, then you can use a regular expression to extract the data.

If possible, could you share a few more words around Limit: 10,000 as they appear in the sentence, so that we can come up with a substring method?
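
If every field follows a fixed `Label: value` format, as in `Limit: 10,000`, a regular expression can capture each whole line at once, and the generated document can then be checked against the master field by field. A minimal Python sketch, assuming one field per line; the pattern, the helper names `extract_fields` and `compare_fields`, and the sample texts are all hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# A "Label: value" line: the label is the text before the first colon, the value is
# the rest of the line. The \s* pieces trim surrounding spaces.
FIELD_PATTERN = re.compile(r"\s*([^:]+?)\s*:\s*(.*?)\s*")


def extract_fields(text: str) -> Dict[str, str]:
    """Map each label to its value for every line of `text` that matches FIELD_PATTERN."""
    fields: Dict[str, str] = {}
    # splitlines() copes with \r\n, \n and \r line endings alike.
    for line in text.splitlines():
        match = FIELD_PATTERN.fullmatch(line)
        if match:
            label, value = match.groups()
            fields[label] = value  # if a label repeats, the last occurrence wins
    return fields


def compare_fields(master: str, generated: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {label: (master_value, generated_value)} for every label whose values differ."""
    expected = extract_fields(master)
    actual = extract_fields(generated)
    differences: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for label in expected.keys() | actual.keys():
        if expected.get(label) != actual.get(label):
            differences[label] = (expected.get(label), actual.get(label))
    return differences


# Hypothetical master text and generated text (one value deliberately wrong).
master_text = "Insured: ACME Corp\nLimit: 10,000\nDeductible: 500"
generated_text = "Insured: ACME Corp\r\nLimit: 1,000\r\nDeductible: 500"
print(compare_fields(master_text, generated_text))  # {'Limit': ('10,000', '1,000')}
```

An empty result means every captured field matches the master; a label present in only one document shows up with `None` on the missing side.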

Cheers @P_Harry