Grabbing text from a Word document and a PDF document, then comparing them to make sure they match

P_Harry · September 9, 2019, 2:45pm

Hello, I need some suggestions on how to handle a specific workflow.

I currently have a workflow that downloads a Word document and a PDF. Both need to be compared with a “master” copy to make sure the text is generating properly. So far my workflow downloads and opens both successfully. My next step is comparing the Word document to the “master” Word document. What would be the best way to do this? I need specific text to be captured. (What would be the best way to even start capturing the text?)

Thanks in advance!

Palaniyappan · September 9, 2019, 2:51pm

Fine.
We can read the Word document with the Read Text activity and store the output in a variable of type String.
– I hope the specific term has an anchor term beside it, such as a field name, so we can extract it with either string manipulation or a regex.
– Meanwhile, we can read the PDF with the Read PDF Text or Read PDF With OCR activity, store the output in a String variable, and extract the specific term with the same regex or string manipulation.

I hope this helps.
Kindly try it and let me know if you have any queries or need clarification.
Cheers @P_Harry
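
To make the anchor-term idea concrete, here is a minimal sketch of extracting labelled values from two extracted texts and comparing them. It is written in Python only to show the logic; it is not UiPath or VB.NET code, and the helper names (`normalize`, `field_value`, `compare_fields`), the sample text, and the `Deductible:` label are all made up for illustration. It assumes each value sits on the same line as its label, and it collapses whitespace before comparing because Word and PDF extraction often differ in spacing.

```python
import re

def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not count as mismatches."""
    return re.sub(r"\s+", " ", text).strip()

def field_value(text: str, label: str):
    """Return the rest of the line after `label`, or None if the label is absent."""
    match = re.search(re.escape(label) + r"[ \t]*([^\r\n]*)", text)
    return normalize(match.group(1)) if match else None

def compare_fields(master: str, generated: str, labels):
    """Map each label to (master value, generated value) wherever the two differ."""
    mismatches = {}
    for label in labels:
        expected = field_value(master, label)
        actual = field_value(generated, label)
        if expected != actual:
            mismatches[label] = (expected, actual)
    return mismatches

# Made-up sample text standing in for the strings read from the two documents.
master_text = "Policy summary\r\nLimit: 10,000\r\nDeductible: 500\r\n"
pdf_text = "Policy summary\nLimit:  10,000 \nDeductible: 250\n"

print(compare_fields(master_text, pdf_text, ["Limit:", "Deductible:"]))
```

An empty result means every listed field matched the master copy. Anything else shows the expected and actual value side by side, which is usually easier to act on than a single true/false for the whole document.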

P_Harry · September 9, 2019, 3:03pm

Hi @Palaniyappan, thank you for the quick response!
This narrows down my ideas a little. In my Word document there will be specific terms, for example:

Limit: 10,000

But instead of grabbing just the 10,000, I want to grab the whole line. I am struggling to come up with a string expression that does this.

This is my current expression:
wordDoc.Substring(wordDoc.IndexOf("Limit: ")+"Limit: ".Length).Split(Environment.NewLine.ToCharArray)(0)

But it is grabbing all the text in the document.
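
Two things about the expression above are worth checking. First, it starts after the label, so even when it works it returns only the value, not the whole line. Second, if IndexOf does not find "Limit: " it returns -1, and the Substring then starts at index 6, which gives back nearly the whole document. The thread does not confirm which of these is the actual cause, and the line separator in the extracted text is also unknown. The sketch below shows the line-based alternative in Python, purely to illustrate the logic; `line_containing` and the sample text are hypothetical.

```python
def line_containing(text: str, label: str):
    """Return the first whole line that contains `label`, or None if none does."""
    # str.splitlines() breaks on \r\n, \r and \n, and also on \v and \f,
    # which some document extractors emit instead of a normal line ending.
    for line in text.splitlines():
        if label in line:
            return line.strip()
    return None

# Made-up sample that uses bare carriage returns as line endings.
sample = "Coverage details\rLimit: 10,000\rDeductible: 500\r"

print(line_containing(sample, "Limit:"))    # whole line, label included
print(line_containing(sample, "Premium:"))  # label absent, so None
```

Returning None for a missing label keeps the "not found" case separate from a genuine empty line, so the workflow can flag it instead of silently comparing the wrong text.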

indra · September 9, 2019, 3:19pm

@P_Harry If you have the format, then you can use a regular expression to extract the data.
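
As a rough illustration of the regular-expression route, the pattern below captures the whole line that contains the label and, separately, the value after it. It uses character classes that exclude line breaks rather than the ^ and $ anchors, since those anchors treat carriage returns differently from one regex engine to another. The example is in Python only to show the pattern; `LIMIT_LINE`, `find_limit`, and the sample text are assumptions for illustration, not something from the thread.

```python
import re

# Everything on the same line as the label, without relying on ^ or $.
# Group 1 is the value that follows the label.
LIMIT_LINE = re.compile(r"[^\r\n]*Limit:[ \t]*([^\r\n]*)")

def find_limit(text: str):
    """Return (whole line, value after the label), or None if nothing matches."""
    match = LIMIT_LINE.search(text)
    if match is None:
        return None
    return match.group(0).strip(), match.group(1).strip()

# Made-up sample text with Windows-style line endings.
sample = "Coverage details\r\nLimit: 10,000\r\nDeductible: 500\r\n"

print(find_limit(sample))
```

The pattern uses only basic syntax (character classes and one numbered group), so it should carry over to other regex engines with little or no change, but it is worth testing against real extracted text first.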

Palaniyappan · September 9, 2019, 3:32pm

Fine.
If possible, can I have a few more of the words that appear around Limit: 10,000 in the same sentence,
so that we can come up with a substring method?

Cheers @P_Harry
