Filter Datatable rows on yesterday's date - large excel

Hi,

I have a very large excel sheet with tens of thousands of rows. Column A contains dates in the format dd/mm/yyyy. I want to filter out the rows that have todays date -1 day. While this is simple to do it takes a long time for the ‘read range’ activity to read through the whole excel sheet and then to do a ‘for each row’.

Is there any other ways of doing this that would be quicker?

Hi @Automater999

Try with filter Datatable to.filter out the rows based on condition

Hope it helps

Nived N :robot:

Happy Automation :relaxed:

Thanks for the reply. Still very slow. Takes about 30 mins.

@Automater999
Find starter help here:
DT_FilterDates_Yesterday.xaml (7.9 KB)

Hi Could you please explain the workflow?

First step about generating dates is just for demo purpose. You will use your datatable e.g. from read range Excel.

grafik
filtering the datatable With the help of LINQ - returning the filtered rows as a list
If result is not empty then copytodatatable else create empty datatble by cloning origin datatable

LINQ Statement:
(From d In dtData.AsEnumerable

  • iterate over all rows, reference looped row with d

Where now.AddDays(-1).Date = dateTime.ParseExact(d(0).toString,“dd/MM/yyyy”,
CultureInfo.InvariantCulture).Date

  • evalutate if the parsed date from current row is equal to yesterdays date

Select d).toList

  • if the row passed the filter then add it to a list

Thanks for that. I tried it but am getting an error.

image

I created a sample file with just a few rows of data and it worked fine. Would blank rows in the spreadsheet affect it?

1 Like

@Automater999
Yes empty string values are not a valid datetime string. We can handle (block it) with following statement

(From d In dtData.AsEnumerable
Where Not If(IsNothing(d(0)), True, String.IsNullOrEmpty(d(0).toString.Trim))
Where now.AddDays(-1).Date = dateTime.ParseExact(d(0).toString,"dd/MM/yyyy", CultureInfo.InvariantCulture).Date
Select d).toList

Thank you this works.

It is just the read range part now that is taking a lot of time. If there is a faster way to do this that would be great!

ensure preserve format is not set to true (read range activity)

There are blank rows in the excel. If i do not select preserve format will it not pick up all rows?

preserve format / not preserve format should not ommit rows. However give a try and verify the result within a debug / breakpoint run. Maybe the date pattern will be changed and requires an adoption. But you will see it within the debugging

Okay i tried that and it brought back the same error as above with the filter.

Not sure what would have changed?

As mentioned above when changing the preserve format option the datetime parse pattern is to check and maybe to adopt. Just check it while debugging and inspecting the datatable content

Hi,

Yes it looks like its bringing in a timestamp as well to the column with the date. Any idea how i would work around this?

yes after ticking off preserve format the date will be passed in this format
just change the format when parsing it.

find reworked demo here:
DT_FilterDates_Yesterday.xaml (8.3 KB)

Thank you again. Just a bit confused. Do i need to bring in the add data row ?

The below error appears

Just ensure that the right column is referenced and check the pattern on the right format e.g. that dd and MM is in the right order/position

Yes i checked the excel and date is in format dd/MM/yyyy

is there a way to specifically reference a column?

@Automater999
in the LINQ statement every place where d(0) is used the column is referenced with the index and does mean that the first column is used. As common used it can be referenced by name or column index.

Just check the excel on date strings that will not fit to the defined format pattern