Looking up numbers in PDF, cross-referencing in excel to retrieve another number and renaming the file with that number in a different folder

Hi All,

I am trying to extract numbers from a PDF text that are in this format:

20220101_1234
They will always be 8 digits + “_” + 4 digits (different in each PDF). There is only one in each PDF in the folder.

After I retrieve that number, I will need to look for that number in an excel, where it will be under “ColumnA”. From there I need to get the data that is in “ColumnB” and rename the file with that in a different folder.

For example:
Step 1: Retrieve the 8digits_4digits from PDF - In this example: 20220101_1234 from the PDF

Step 2: Search this number in the Excel under column name “ColumnA” (the excel is constant) to retrieve the numbers from “ColumnB”: In this example I find numbers 0002 under “ColumnB”

Step3: Rename the file with the numbers retrieved from “ColumnB”: In this example the file will be renamed 0002 and save it in another folder

Could someone please help? Thanks in advance!

first read the pdf and use regex for extract 8digit_4digit

(?=[^\d]\d){8}.(?=_).(([^\d]\d){4}) - this regex will give you extract required formats
and from that each collection try var.split("“c)toarray(0) and var.split(”"c)toarray(1) for add to different columns

Hi Veera,

Should this be the variable code because it is not working:

System.Text.RegularExpressions.Regex.Match(pdfTextStr," (?=[^\d] \d){8}. (?=_). (([^\d] \d){4})",RegexOptions.IgnoreCase).Value.ToString.Trim

Also could you please explain to me how to use that number that it will get, to search in the excel that I already have for the number I need in another column?

you cant use like that

use matches activity and paste that regex code and its output will be collections

then you can iterate over that.

can you please tell actually what you want ?

You would need to take the 8 digits found using the regex and store that into a variable.

You can use the simplified regex (\d{8}_\d{4}) for your particular usecase if the string format is always the same.

Assign - MatchFound = System.Text.RegularExpressions.Regex.Match(pdfTextStr,"(\d{8}_\d{4})",RegexOptions.IgnoreCase).Value.ToString.Trim

Then read the excel spreadsheet into a datatable, and use the filter activity or other methods to identify the correct row, and then retrieve the data from column B of that row.

You can then use the Move File activity to rename and move at the same time.

Hi TimK and Veera,

I am a beginner so thank you for being patient. I attached for you two documents to make things clearer of what I need to make this work.

I need the automation to find the digits in the SamplePDF.pdf (203.8 KB)
and then cross reference the excel ExcelData.xlsx (24.4 KB) by using the number obtained which will be located in column name “P-ID”, to get the number from column name “X-ID”. Then it should create a new copy of the file in a new folder and rename it with that number.

To clarify: In this example, the SamplePDF shows the number 20220101_1234

From there we will need to get the number 12345 under column “X-ID” by cross referencing from column “P-ID”

image

Then copy the file into a new folder with the name of what was extracted from the excel which will be “12345” in this case.

I tried using the MatchFound in Assign as you stated TimK but it gives an error of “protection level”.
I would much appreciate it if you walk me through all the steps to reach my desired output please by showing me how to actually create the workflow if that is possible

image

Thanks in advance

Before the for each loop, read the spreadsheet into a datatable variable (use Read Range)- i.e. MappingTable

Your output of Read PDF Text should be pdfTextStr

So you need to create the variable - matchfound as String variable within the Body scope.

You can then use an assign to pull back the X-ID value.
Assign - XID = MappingTable.Select("[P-ID] = ‘"+matchfound+"’")(0)(“X-ID”).ToString

You can then use XID to save the file (using move file)
Source = original PDF filepath
Destination = NewFolderPath + XID +".pdf"