Find word in a PDF (Compare PDF)


#1

Hello, I have been trying to build a robot that compares some specific information across several PDFs, but without success. Some of the PDFs need OCR. So far I have been able to extract the data and split it into strings by word, but I haven’t been able to compare the data.

Any help?


#2

How do you want to compare them? Using the string outputs?


#3

Honestly, I’m not sure. What I did was read the PDF with OCR, store the text as a string, then split it and store every word as a row in a CSV table. What I want to do next is find a word in that list, for example a date (which is stored in another Excel file).


#4

This might help solve your issue


#5

Thank you, but what I need is a way to go through the table to find if any of the cells contain the string I have stored in the other table. I’ve been trying with the for each row but then I get lost and can’t figure out what to do.


#6

Not sure how your tables are formatted, but I can think of the approach below. Hope it points in the right direction.

Foreach1 - each row in table1

Foreach2 - each row in table2 (inside Foreach1)

If condition stringtable1 = stringtable2 (inside Foreach2)
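
In case it helps to see the logic outside of the workflow designer, here is a minimal Python sketch of the same nested loop. It is only an illustration of the comparison, not the workflow activities themselves; the function name and the sample data are made up for the example.

```python
def find_matches_nested(words, targets):
    """Return every target value that equals one of the extracted words."""
    matches = []
    for target in targets:              # Foreach1: each row in table1
        for word in words:              # Foreach2: each row in table2
            # If condition: the two strings are equal (ignoring stray spaces)
            if word.strip() == target.strip():
                matches.append(target)
                break                   # one hit is enough for this target
    return matches


# Hypothetical sample data: words split out of the OCR text,
# and the values to look for (e.g. dates from the Excel file).
pdf_words = ["Invoice", "date:", "12/03/2019", "Total", "450.00"]
excel_values = ["12/03/2019", "01/01/2020"]

print(find_matches_nested(pdf_words, excel_values))
```

Note that this only finds values that survive the split as a single word; a value containing a space would be broken across several rows.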


#7

A slightly more complex solution, but one that avoids a Foreach within a Foreach, would be to use:

For each row in dataTable1 (i.e. each word)
   arrWords = dataTable2.Select("word = '" + row("Word").ToString + "'")   - an array of the rows in dataTable2 that match the word

   If arrWords.Length > 0 Then   - the word matched
      row("Found") = True   - i.e. mark the row as found

Next
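
For anyone who wants to try the idea outside the workflow, here is a rough Python sketch of the same single-loop approach. It swaps the table select for a plain set lookup, which is a different mechanism but gives the same "at least one match" answer. The function name, the "Word" and "Found" keys, and the sample rows are assumptions for the example.

```python
def mark_found(word_rows, lookup_words):
    """Set row["Found"] to True when row["Word"] exists in lookup_words."""
    # Build the lookup once so each row needs only a single membership test
    lookup = {w.strip() for w in lookup_words}
    for row in word_rows:               # one loop only, no inner loop
        row["Found"] = row["Word"].strip() in lookup
    return word_rows


# Hypothetical sample data
rows = [{"Word": "12/03/2019"}, {"Word": "Total"}]
other_table_words = ["Invoice", "12/03/2019"]

for row in mark_found(rows, other_table_words):
    print(row["Word"], row["Found"])
```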


#8

@richarddenton how do you iterate the currentWord variable?


#9

Thanks,
However, just yesterday I tried something different. Instead of using a data table, I extracted the text from the PDF as a single string and then, for each row in the Excel table, checked whether row.ToString matches anywhere in that string. Then, if the match is true, continue…
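
Roughly, the idea looks like this as a Python sketch. This is not the actual workflow, just the logic; the function name and the sample text are made up. Collapsing the whitespace first is an extra step I am assuming is useful, since OCR output often contains odd line breaks and repeated spaces.

```python
def values_present_in_text(pdf_text, values):
    """Map each value to True/False depending on whether it occurs in the text."""
    # Collapse runs of whitespace (including line breaks) into single spaces
    normalized = " ".join(pdf_text.split())
    return {value: value in normalized for value in values}


# Hypothetical sample data
ocr_text = "Invoice date: 12/03/2019\nTotal   450.00"
excel_values = ["12/03/2019", "Total 450.00", "01/01/2020"]

print(values_present_in_text(ocr_text, excel_values))
```

Because this is a substring check, a short value can also match inside a longer word or number, so it works best for distinctive values such as full dates.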


#10

The dtSelect table would be a different table from the datatable that you are looping through. Sorry, lazy writing on my part. Assuming all you care about is that at least one instance of the word exists in the other table, this should work. I’ve edited the original example.


#11

Understood!