Hi guys,
I have few PDF files.
For each file i am read OCR Text and this text i am adding to data row.
Next i am creating two excel files (one for unique rows, other one for duplicates)
let’s say i have 3 PDF and data which i got using OCR is:
pdf1: [AA, BB, CC] unique one → write in ExcelUnique
pdf2: [AA, BB, RR] unique → write in ExcelUnique
pdf3: [AA, BB, CC] duplicated with pdf1 - write in ExcelDuplicates
Now if i check for each row in DT_ExcelUnique and compare it with row from DT dons’t work as i expected because:
First run, excel is empty (0 rows) and row from DT(pdf1) i unique so row is saving in ExcelUnique.xlsx as first row
Second run, excel contains 1 row but row from DT(pdf2) is also unique so row is saving in ExcelUnique.xslx as second row.
Third run, excel contains 2 rows (from pdf1 and pdf2) (so for each loop will execute twice):
- for first row in ExcelUnique, row from DT(pdf3) is duplicated with pdf1 - so duplicate will write in ExcelDuplicates.xlsx
for second row in ExcelUnique, row from DT(pdf3) is different so it means its unique - so this row will save to ExcelUnique - which is wrong because first row from excel already is e qual to this value and i don’t need it.
How to fix it? Any clue?