Getting row item using regex


#1

Hello!

I need to get a dates from multiple excel files, but he date isn’t in the same column in each file.
Right now I’m using a “complicated” loop to find the correct column for each excel file.

I was wondering if it’s possible to get a value that matches a regex from a row instead of using the column.


#2

Hi,

I think you are following a sensible procedure. What I would do personally:

  1. Loop over the items in one row once, either by integer or as follows, with DataColumn as TypeArgument: For each col in dataTable.Columns;
  2. Locate the first column that looks like a date, either by regex match or DateTime.ParseExact, and save it in a variable;
  3. Now loop over all rows and read out all the values from this column. You can use a DataColumn variable in place of an integer index, e.g. dataTable(0)(dateCol) or row(dateCol).

It’s theoretically also possible to do what you ask, but it would involve converting every row of every table to a string and searching it. This procedure will quickly (meaning with increasing table size) become much slower than the above, which is almost optimal and certainly short and easy to follow.


#3

@sfranzen couldn’t he just use Output Datatable to a string with a comma delimiter, then either split it and find the index of the date or just simply extract the date out with Regex? This method wouldn’t be too complicated.

For example with strTable as text from Output Datatable,
Foreach line In strTable.Split(vblf(0))
strdate = System.Text.RegularExpressions.Regex.Match(strTable, “[0-9]{2}/[0-9]{2}/[0-9]{4}”).Value

It might be more organized to figure out the column where the date is located though like what was suggested.

I’m also fairly sure you can use .IndexOf or .FindIndex to return the column index which would simplify things a lot. Just not completely sure on the syntax to use with Columns and short on time.

Maybe something like this,
Array.IndexOf(strTable.Split(vblf(0))(0).Split(",“c),(From item In strTable.Split(vblf(0))(0).Split(”,"c) Where System.Text.RegularExpressions.Regex.Match(item, “[0-9]{2}/[0-9]{2}/[0-9]{4}”).Value Select item)(0))

The above .Splits the table as a string to get 1st row and .Splits it by comma to use as the array of strings in .IndexOf, then using LINQ to Select out the date with Regex it uses that string to return the final Index or Column Index in your case that you can use.

Anyway, just throwing out multiple options.

EDIT

Here is the FindIndex alternative (also using Output Datatable):
Array.FindIndex(strTable.Split(vblf(0))(0).Split(",“c), Function(x) System.Text.RegularExpressions.Regex.Match(x.ToString, “[0-9]{2}/[0-9]{2}/[0-9]{4}”).Value<>”").ToString


#4

Indeed, he could and it may not even be noticeably slower than the other method. My consideration about slowness is merely theoretical: for some table size, searching every row instead of just one for a date will become slower and this difference will then grow with increasing table size. However, because of overhead and delays in the rest of the workflow, I will hazard the guess that it will probably be insignificant compared to the total execution time, unless the tables are really large (say dozens of columns and many hundreds of rows). Only a direct comparison test would give conclusive evidence.

Anyway, that was mostly my inner programmer speaking. I would definitely always perform the column lookup only once for each table, subsequently accessing its values directly by index is a very fast operation.