Matching the name of a pdf file with a value in an excel column

hey, how can i compare the names of different pdf files in a folder (more then one) with the values within an excel column, so that if they are equal the pdf’s are read?

thank you in advance

For each file in the folder (you could use snippet “For Each FIle in Folder”)
1/ Read file names from excel into a datatable (ReadRange activity)
2/ Search the PDF file name in the datatable (Lookup Data Table activity)

Cheers

1 Like
  1. Check whether the folder where pdf files reside exists?
  2. If exists, use Directory.GetFiles(filepath, “*.pdf”) to read names of all pdf files. The result will be an array of strings, arPdfFiles.
  3. Use read range activity and read the data from excel to a datatable variable, dtExcelData.
  4. Use assign activity and use the below linq query to get the files that exists in excel (Considering “FileName” is the column header in excel

dtExcelData.AsEnumerable().where(function(x) arPdfFiles.Any(function(y) x.Field(of String)(“FileName”).ToLower.Contains(Path.GetFileName(y).ToLower))).Select(function(x) x.Field(of String)(“FileName”)).ToArray

To get the files that doesn’t exists in excel, use

dtExcelData.AsEnumerable().where(function(x) not arPdfFiles.Any(function(y) x.Field(of String)(“FileName”).ToLower.Contains(Path.GetFileName(y).ToLower))).Select(function(x) x.Field(of String)(“FileName”)).ToArray

Hope this helps.

2 Likes

Ok first thank you very much.

I followed steps 2 and 3, do i have to read the complete excel table at step 3 or only the column to be compared?

At step 4 i am not sure how to implement it, because the bot should only read the pdf files whose names are listed in the column. then it should read certain information from the pdf and post it into the cells of the corresponding line. If the name does not exist, it should simply write something into another cell of the row. I would have thought that I would have to solve this somehow with an if condition.

It depends on your requirement. if you need any other information from excel table, then use read range and read entire table. Otherwise you can read only the column to be compared.

Step 4 I have already wrote the entire query. You just need to replace the datatable name, file array name and the column name to be compared.

Implement as much as possible and send me your workflow. will review and provide inputs accordingly.

1 Like

Unfortunately I can’t post the workflow I have so far, because I am a new user. Is there a possibility to send it private or something like that?

You can send it as message

,
i’m getting following error,
Please guide

You copy pasted my code. the website changed “” to special characters. Delete it and and type “”.

i did,giving me same error

CheckPdfNameInExcel.xaml (6.5 KB)

GetFileName function is part of System.IO library. and it is not added in your system. So refer complete path as below.
DT.AsEnumerable().where(Function(x) arrPdf.Any(Function(y) x.Field(Of String)(“FileName”).ToLower.Contains(System.IO.Path.GetFileName(y).ToLower))).Select(Function(x)x.Field(Of String)(“FileName”)).ToArray

and the variable PdfList should be of type String array. Change that as well.

@Mayyur
Updated your workflow. Please check
@lucky-lukes This should work for you as well. check it
CheckPdfNameInExcel.xaml (6.7 KB)