Hi there, I have two lists of strings: one contains validated terms (product types), the other contains product names.
I would like to compare these two lists to find whether the product names contain any of the validated terms. These would be partial matches within the longer product names.
Sample product names:
Amazing Industries, furry dog coat
Amazing Industries shiny collar - M
Tasty Pet Food
Validated Product Types:
Dog Coat
Collar
Pet Food
Result:
Amazing Industries, furry dog coat, Product Type = Dog Coat
Amazing Industries shiny collar - M, Product Type = Collar
Tasty Pet Food, Product Type = Pet Food
Once a match is found, I want to assign that product type to a variable that I can then add to a spreadsheet, but my current concern is how to find the partial matches efficiently, ideally avoiding loops if possible.
The nearest thing I’ve found is this thread using LINQ, but the data there is fixed in length, so they are able to pattern match using substrings.
My data is highly variable, so I need something more like a partial match (String.Contains etc.).
@ppr Hi there, I think your previous solution is quite close to what I’m trying to do. Do you have any further advice on how to compare two lists of strings for partial matches?
How long is it taking with For Each loops? I would expect it to take less than a minute if you are doing fewer than a few million String.Contains() comparisons (number of product types × number of product names). Just make sure you use Break statements to exit the inner loop as soon as a match is found; that reduces the number of iterations.
In all honesty, I’d go this route as it’s easy to understand and anyone can update/maintain it easily. Unless it’s taking an hour or more, that seems like the route to go. Keep in mind that LINQ still uses loops as well; it just does the iterating in the background.
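For readers outside UiPath, the nested For Each with Break described above can be sketched in Python (a swapped-in language; the thread itself uses UiPath/VB.NET). The function name assign_product_types is hypothetical, and the sketch assumes matching should ignore case, as the sample where "furry dog coat" maps to "Dog Coat" implies.

```python
def assign_product_types(product_names, product_types):
    """Return (name, type) pairs; type is None when nothing matches."""
    # Lower-case (casefold) the validated types once, outside the loops.
    lowered_types = [(pt, pt.casefold()) for pt in product_types]
    results = []
    for name in product_names:              # outer For Each: product names
        lowered_name = name.casefold()
        matched = None
        for original, lowered in lowered_types:  # inner For Each: types
            if lowered in lowered_name:     # partial, case-insensitive match
                matched = original
                break                       # first match wins; skip the rest
        results.append((name, matched))
    return results


product_names = [
    "Amazing Industries, furry dog coat",
    "Amazing Industries shiny collar - M",
    "Tasty Pet Food",
]
product_types = ["Dog Coat", "Collar", "Pet Food"]

for name, ptype in assign_product_types(product_names, product_types):
    print(f"{name}, Product Type = {ptype}")
```

Because of the break, each product name gets at most one type (the first one listed that matches), and a name with no match gets None.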
@jakegill here is an example of what I’m talking about. Can you give this a quick try on some of your data and let me know how long it takes? I really think it’d take less than a few minutes unless you are looking at millions or tens of millions of products.
@jakegill A statement could look like this:
(From pn In ProductNames
From pt In ProductTypes
Where pn.ToLower.Contains(pt.ToLower)
Select pn + " = " + pt).ToList
Result:
List(3)
{
“Amazing Industries, furry dog coat = Dog Coat”,
“Amazing Industries shiny collar - M = Collar”,
“Tasty Pet Food = Pet Food”
}
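The LINQ query above is a cross join filtered by a case-insensitive Contains. A minimal Python sketch of the same query as a nested list comprehension (an illustrative translation, not UiPath code) makes it visible that every name/type pair is still visited:

```python
product_names = [
    "Amazing Industries, furry dog coat",
    "Amazing Industries shiny collar - M",
    "Tasty Pet Food",
]
product_types = ["Dog Coat", "Collar", "Pet Food"]

# Mirrors: From pn In ProductNames From pt In ProductTypes
#          Where pn.ToLower.Contains(pt.ToLower) Select pn + " = " + pt
matches = [
    pn + " = " + pt
    for pn in product_names          # outer From
    for pt in product_types          # inner From: every name x every type
    if pt.lower() in pn.lower()      # case-insensitive partial match
]
print(matches)
# ['Amazing Industries, furry dog coat = Dog Coat', 'Amazing Industries shiny collar - M = Collar', 'Tasty Pet Food = Pet Food']
```

Unlike the Break version, this returns one entry per matching pair, so a product name containing two product types would appear twice.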
"it’s easy to understand and anyone can update/maintain it easily."
I agree with @Dave. The common skills of a team are more important than a super cool LINQ statement that not all team members can handle. LINQ should never replace the essentials.
I understood your requirements this way: if "Tasty Pet Food" contains "Pet Food", then it is a match.
Put another way, it is a match if any of the product type strings (such as "Pet Food") is contained in "Tasty Pet Food". For this I would suggest simply counting how many of the single product type strings occur in a product name; if the count is > 0, then it’s a match.
Here, a combination of For Each and a little bit of LINQ could be an approach that combines the good things from both areas.
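One reading of the For Each plus LINQ combination, sketched in Python: the outer loop walks the product names, and a counting expression counts the product types found in each name. In VB.NET that counting step would roughly correspond to ProductTypes.Count(Function(pt) pn.ToLower.Contains(pt.ToLower)). The helper name matching_type_count is hypothetical.

```python
def matching_type_count(product_name, product_types):
    """Count how many validated product types occur in one product name."""
    name = product_name.lower()
    # The "little bit of LINQ": a generator expression summed into a count.
    return sum(1 for pt in product_types if pt.lower() in name)


product_types = ["Dog Coat", "Collar", "Pet Food"]
for pn in ["Tasty Pet Food", "Cat Bowl"]:   # the For Each part
    count = matching_type_count(pn, product_types)
    if count > 0:
        print(pn, "is a match")
    else:
        print(pn, "has no validated product type")
```

Counting (rather than stopping at the first hit) also tells you when a name contains more than one product type, which may be worth flagging before writing to the spreadsheet.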
@jakegill Yes, that statement assumes we are working with a collection instead of a DataTable. If you were using a DataTable, I would recommend using a Select or Where statement to return the DataRows matching your criteria, then checking whether the number of DataRows returned is > 0.
So if you’re searching a DataTable’s ProductName column for values containing the ProductType and you want the search to be case-insensitive, put the following into 2 Assign activities. If you want it to be case sensitive, set dt1.CaseSensitive = True in the first Assign instead (a DataTable created outside a DataSet is already case-insensitive by default, so the first Assign mainly makes the intent explicit):
Assign dt1.CaseSensitive = False
Assign RowsReturned (a variable of type DataRow array, i.e. DataRow()) = dt1.Select("[ProductName] like '%" + ProductType + "%'")
Then your If condition would be: RowsReturned.Length > 0
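DataTable.Select is a .NET/UiPath API, so outside that environment the same LIKE '%term%' filter can be sketched with Python's built-in sqlite3 module. This is a swapped-in analogue, not the DataTable API: the table name products is hypothetical, and SQLite's LIKE happens to be case-insensitive for ASCII letters by default, similar to the CaseSensitive = False setting. Binding the term as a parameter also sidesteps a pitfall of building the filter by string concatenation, where a product type containing a single quote would break the expression unless the quote is doubled.

```python
import sqlite3

# In-memory table standing in for dt1 (hypothetical table name "products").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (ProductName TEXT)")
conn.executemany(
    "INSERT INTO products (ProductName) VALUES (?)",
    [("Amazing Industries, furry dog coat",),
     ("Amazing Industries shiny collar - M",),
     ("Tasty Pet Food",)],
)


def rows_containing(product_type):
    """Return rows whose ProductName contains product_type (LIKE '%term%')."""
    # Note: % and _ inside product_type still act as LIKE wildcards.
    cur = conn.execute(
        "SELECT ProductName FROM products "
        "WHERE ProductName LIKE '%' || ? || '%'",
        (product_type,),
    )
    return cur.fetchall()


for product_type in ["Dog Coat", "Collar", "Pet Food", "Cat Bowl"]:
    rows_returned = rows_containing(product_type)
    if len(rows_returned) > 0:  # same check as RowsReturned.Length > 0
        print(product_type, "->", [row[0] for row in rows_returned])
```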
Hi Dave, you have my gratitude! Your previous suggestion worked for string arrays, and this DataTable version is working exactly as I’d hoped. I’ve now switched to comparing DataTables directly instead of lists. Many thanks for your advice!
In case it’s useful for anyone else reading: I had to make sure I referenced the relevant columns in the DataTables, e.g.:
productNamesDT.Select("[columnName1] like '%" + ProductType("columnName2").ToString + "%'")
and the If condition I used was: RowsReturned.Count > 0