Compare two lists of strings for partial matches without for each loop

Hi there, I have two lists of strings. One contains validated terms (product types); the other contains product names.

I would like to compare these two lists to find whether the product names contain any of the validated terms - these would be partial matches within the longer product names.

Sample product names:
Amazing Industries, furry dog coat
Amazing Industries shiny collar - M
Tasty Pet Food

Validated Product Types
Dog Coat
Collar
Pet Food

Result:
Amazing Industries, furry dog coat, Product Type = Dog Coat
Amazing Industries shiny collar - M, Product Type = Collar
Tasty Pet Food, Product Type = Pet Food

Once it has found a match I then want to assign that product type to a variable that I can then add to a spreadsheet, but my current concern is how to find the partial match in an efficient way. Ideally avoiding loops if possible.

The nearest thing I’ve found is this thread using LINQ, but the data involved is fixed in length so they are able to pattern match using substrings.

My data is highly variable, so I need something more like a partial match - string.contains etc.

Any thoughts greatly appreciated.

Thanks,

Jake

@ppr hi there, I think your previous solution is quite close to what I’m trying to do. Do you have any further advice on how to compare two lists of strings for partial matches?

Many thanks,

Jake

@jakegill
I will have a look at this later.

How long is it taking with For Each loops? I would imagine it would take less than a minute if you are doing fewer than a few million String.Contains() comparisons (product types * product names). Just make sure you use break statements so the inner loop stops at the first match, which reduces the number of iterations.

In all honesty, I’d go this route as it’s easy to understand and anyone can update/maintain it easily. Unless it’s taking 1hr+ it seems like that’d be the route to go. Keep in mind that LINQ is still using for each loops as well, it just does it in the background :slight_smile:

@jakegill here is an example of what I’m talking about. Can you give this a quick try on some of your data and let me know how long it takes? I really think it’d take less than a few minutes unless you are looking at millions or tens of millions of products.

jakegill.xaml (11.6 KB)
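
For readers who cannot open the attachment, the loop-with-break idea described above might look roughly like the sketch below. This is not the contents of jakegill.xaml; it is a plain VB.NET console version under my own assumptions (the module name, the variable names and the "(no match)" placeholder are all made up for illustration). In a UiPath workflow the Exit For would be a Break activity inside the inner For Each.

```vbnet
Imports System
Imports System.Collections.Generic

Module PartialMatchSketch
    Sub Main()
        ' Sample data taken from the original question
        Dim productNames As New List(Of String) From {
            "Amazing Industries, furry dog coat",
            "Amazing Industries shiny collar - M",
            "Tasty Pet Food"
        }
        Dim productTypes As New List(Of String) From {"Dog Coat", "Collar", "Pet Food"}

        For Each productName As String In productNames
            ' Stays Nothing when no validated type is found in this name
            Dim matchedType As String = Nothing

            For Each productType As String In productTypes
                ' Case-insensitive "contains" check
                If productName.IndexOf(productType, StringComparison.OrdinalIgnoreCase) > -1 Then
                    matchedType = productType
                    Exit For ' first match wins, skip the remaining product types
                End If
            Next

            ' "(no match)" is a hypothetical placeholder for unmatched names
            Console.WriteLine(productName & ", Product Type = " & If(matchedType, "(no match)"))
        Next
    End Sub
End Module
```

The worst case is still product names * product types comparisons, but every name that matches early skips the rest of the type list.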

Thanks so much for your time Dave. Will fire this up when I’m back in tomorrow & let you know how it goes.

@jakegill
A statement could look like this:
(From pn In ProductNames
From pt In ProductTypes
Where pn.ToLower.Contains(pt.ToLower)
Select pn + " = " + pt).ToList

Result:
List(3)
{
"Amazing Industries, furry dog coat = Dog Coat",
"Amazing Industries shiny collar - M = Collar",
"Tasty Pet Food = Pet Food"
}

it’s easy to understand and anyone can update/maintain it easily.

I agree with @Dave. The common skills of a team are more important than a super cool LINQ statement that not all team members can handle. LINQ should never replace the essentials.

I understood your requirement this way:
if the product name (e.g. Tasty Pet Food) contains the whole product type Pet Food, then it is a match.

Another kind of match could be: if any single word of the product type (Pet, Food) is contained in the product name Tasty Pet Food, then it is a match. For this I would suggest counting how many of the individual words of a product type occur in a product name; if the count is > 0, then it's a match.

Here a combination of For Each and a little bit of LINQ could be an approach that combines the good things from both areas.
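
A rough sketch of that word-count idea, assuming the product types are split on spaces and compared case-insensitively. CountWordMatches and the module name are hypothetical names of mine, not anything built into UiPath or .NET:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module WordCountMatchSketch
    ' Hypothetical helper: counts how many words of productType
    ' occur somewhere in productName (case-insensitive).
    Function CountWordMatches(productName As String, productType As String) As Integer
        Dim words As String() = productType.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
        Return words.Count(Function(word) productName.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
    End Function

    Sub Main()
        ' Sample data taken from the original question
        Dim productNames As New List(Of String) From {
            "Amazing Industries, furry dog coat",
            "Amazing Industries shiny collar - M",
            "Tasty Pet Food"
        }
        Dim productTypes As New List(Of String) From {"Dog Coat", "Collar", "Pet Food"}

        ' For Each over the names, a little LINQ inside the helper
        For Each pn As String In productNames
            For Each pt As String In productTypes
                If CountWordMatches(pn, pt) > 0 Then
                    Console.WriteLine(pn & " = " & pt)
                End If
            Next
        Next
    End Sub
End Module
```

Keep in mind that this rule is looser than a whole-phrase Contains: a name such as "Tasty Dog Food" would match both Dog Coat and Pet Food, so it only fits if single-word hits are really what you want.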

Thanks so much ppr, really appreciate your time. I’ll be trying this out and will let you know.

Best,

Jake

Hi Dave, would this If condition need to be amended if I were working from datatables instead of string arrays?

ProductName.IndexOf(ProductType,StringComparison.OrdinalIgnoreCase) > -1

@jakegill Yes, that statement assumes we are working with a collection instead of a datatable. If you were using a datatable, I would recommend using a Select or Where statement to return the datarows matching your criteria, and then checking whether the number of datarows returned is > 0.

So if you’re searching a datatable’s ProductName column for a word containing the ProductType and you want it to NOT be case sensitive, then put the following into 2 assign activities. If you want it to be case sensitive, then you can ignore the first assign activity below:

Assign dt1.CaseSensitive = False
Assign RowsReturned (this is a variable of type datarow array) = dt1.Select("[ProductName] like '%" + ProductType + "%'")

Then your if condition would be: If RowsReturned.Count > 0
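
Outside of UiPath, the same two assigns plus the If could be sketched in plain VB.NET roughly as follows. The sample rows come from the original question; EscapeLikeValue is a hypothetical helper of my own, added on the assumption that a product type might contain characters that are special in a filter expression (a single quote, or the LIKE wildcards and brackets). You can drop it if your product types are plain words.

```vbnet
Imports System
Imports System.Data
Imports System.Text

Module DataTableSelectSketch
    ' Hypothetical helper: wraps LIKE wildcards and brackets in [] and
    ' doubles single quotes so the value is treated as literal text.
    Function EscapeLikeValue(value As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In value
            Select Case c
                Case "["c, "]"c, "%"c, "*"c
                    sb.Append("[").Append(c).Append("]")
                Case "'"c
                    sb.Append("''")
                Case Else
                    sb.Append(c)
            End Select
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Build a small table with the sample product names
        Dim dt1 As New DataTable()
        dt1.Columns.Add("ProductName", GetType(String))
        dt1.Rows.Add("Amazing Industries, furry dog coat")
        dt1.Rows.Add("Amazing Industries shiny collar - M")
        dt1.Rows.Add("Tasty Pet Food")

        ' First assign: make the LIKE comparison case-insensitive
        dt1.CaseSensitive = False

        ' Second assign: rows whose ProductName contains the product type
        Dim productType As String = "Collar"
        Dim rowsReturned As DataRow() =
            dt1.Select("[ProductName] like '%" + EscapeLikeValue(productType) + "%'")

        ' The If condition: at least one row came back
        If rowsReturned.Length > 0 Then
            Console.WriteLine(rowsReturned(0)("ProductName").ToString() & ", Product Type = " & productType)
        End If
    End Sub
End Module
```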

Hi Dave, you have my gratitude! Your previous suggestion worked for string arrays and this datatable version is working exactly as I’d hoped. I’ve now switched to directly comparing datatables instead of lists. Many thanks for your advice :slight_smile:

In case it’s useful for anyone else reading - I had to make sure I referenced the relevant columns in the datatables e.g. -

productNamesDT.Select("[columnName1] like '%" + ProductType("columnName2").ToString + "%'")

and the If statement I used was - RowsReturned.Count > 0
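
Written out as plain VB.NET, the datatable-to-datatable comparison might look something like this sketch. The column names are the placeholders from the line above, productTypesDT is an assumed name for the table of validated types, and the sample values contain no quotes or wildcard characters, so nothing is escaped here:

```vbnet
Imports System
Imports System.Data

Module TwoTableMatchSketch
    Sub Main()
        ' Product names table (placeholder column name from the post)
        Dim productNamesDT As New DataTable()
        productNamesDT.Columns.Add("columnName1", GetType(String))
        productNamesDT.Rows.Add("Amazing Industries, furry dog coat")
        productNamesDT.Rows.Add("Amazing Industries shiny collar - M")
        productNamesDT.Rows.Add("Tasty Pet Food")

        ' Validated product types table (table name is an assumption)
        Dim productTypesDT As New DataTable()
        productTypesDT.Columns.Add("columnName2", GetType(String))
        productTypesDT.Rows.Add("Dog Coat")
        productTypesDT.Rows.Add("Collar")
        productTypesDT.Rows.Add("Pet Food")

        productNamesDT.CaseSensitive = False

        ' One Select per product type row
        For Each ProductType As DataRow In productTypesDT.Rows
            Dim RowsReturned As DataRow() =
                productNamesDT.Select("[columnName1] like '%" + ProductType("columnName2").ToString + "%'")

            ' Length is the array equivalent of the Count check used in the workflow
            If RowsReturned.Length > 0 Then
                For Each match As DataRow In RowsReturned
                    Console.WriteLine(match("columnName1").ToString() & ", Product Type = " & ProductType("columnName2").ToString())
                Next
            End If
        Next
    End Sub
End Module
```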

Thanks again :slight_smile:
