Check column header to see if it matches, if it does then populate the column


So my dilemma is as follows:

  1. I have an Excel Sheet with each column having a different date.
  2. The idea is to check a PDF file’s date and see if it matches the date on the Excel Sheet.
  3. If it does, populate the table with data from that particular PDF file.
  4. If it does not, go to the next one until the right date is found.

So far I have tried to to do:
For each Row → For each col in dt(columns), but that does not seem to be working out that well.

Can you provide sample excel sheet? How many date columns are expected?
Are you only getting one piece of data from the pdf?

I attached an excel sample based on fruits in a fruit shop. Also, the date columns can be expanded but for now there are 31 dates. And also, I created a whole process to get my data from the PDF it is more than 1 ( for populating the column). But for this case to check if the dates match up, it is only the date data.

I do not see any attachments. Maybe this will work
Read the header row in the excel into an array ARR
Do for each pdf file
for each item in ARR
check date. when date matches, update the DT using row and column indexes corresponding to the column name = to the array value and structure of your rows.

Sorry about that! Sample_excel.xlsx (8.3 KB)

Can you fill in the example what you expect it to look like once complete?

Sure! Sample_excel.xlsx (8.9 KB)

So the way I wanted to go about it is, letting the data extraction process run and place the results on one sheet, and then populate the main sheet. When the next date/column comes, run the data put it on that same sheet, then populate the main sheet’s next column

Also, where does the DT come from?

It is easier to manipulate a datatable than trying to manipulate in excel in my opinion. So I would read the master table as a DT, then write the new data to it, then create a new excel file to replace the original.

So when you mean by indexing also based on reading range of an empty excel sheet, it takes into account the rows and columns when they are blank right? Then could I used a counter on index?

It should. Play with a sample datatable, use build datatable to build a small 3 column table and put some default data in there, then play with the read and write activities. Once you get the desired results, apply it to your use case.

I may have almost gotten it just a quick question how do you fill up a datatable column based on a read column activity (I used that to get the data needed to fill the column)

Hi, so this is the current workflow I am working with and the excel file. Sheet2 basically contains information that needs to be put into the column.Main.xaml (13.9 KB) Sample_excel (1).xlsx (10.9 KB) Also how would you start from the 3rd row? In a sense.

Sorry, I still do not understand your workflow. I do not want to take you down the the wrong path not fully understanding it. Let’s see if anyone else has some ideas.

Hello Kayla,
In this video, I have 17 use-cases for extracting tables from PDF and write data in Excel:

2:00 GitHub free code for all the files
2:20 Logic of general workflow
4:40 File 1 simple PDF
9:50 File 2 PDF with a column with multiple lines
20:10 File 3 PDF with a column with multiple words ON the LAST column
27:00 File 5 PDF with a column with multiple words ON inside column (2 columns)
31:40 File 6 PDF with a column with multiple lines
39:10 File 8 simple PDF
42:15 File 9 PDF with multiple spaces on that need to be correct
45:50 File 10 PDF with multiple columns that have multiple lines + multiple pages
55:50 File 11 simple PDF with protection empty Cells
58:35 File 12 Big PDF with an empty line and Empty columns and partial total
1:02:25 File 13 PDF with multiple columns that have multiple words and hard to define a rule
1:10:15 File 15 PDF with multiple columns that have multiple lines
1:12:50 File 17 simple PDF remove spaces from headers also remove space from Data
1:16:05 File 18 simple PDF
1:17:10 File 19 PDF with multiple pages and columns with multiple lines
1:22:10 File 20 PDF with multiple columns that have multiple lines
1:25:00 File 21 PDF with empty columns and subtotal


Cristian Negulescu