Can anyone help me with the following problem?
I open an Excel workbook and store it in a data table. This data table can contain a large number of rows. Each row contains a code in one of the columns; let's call it the product code. The catch is that we don't know what a product code means, so we have to look it up. Since we have 5k+ rows, we cannot look up one code at a time, but there is a trick: when codes have a similar beginning or ending, we can be 99% confident they are the same product, just some variation of it. Hence we can say that 90% of the data consists of repeated products with slightly different codes, so we can look up one code and apply the same product name to all similar product codes:
234256756 - we can look this up and find that this is an iPhone
------------------------- - and say with 99% confidence that the codes below are the same product
x3432343332 - except this one, which starts a new sequence of codes for the same product
So you can see that there will be a similarity in the first 50% or the last 50% of the characters, as well as in the overall product code length.
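To make the rule concrete, here is a rough sketch of the check in Python (not the actual workflow code, which uses a VB expression). The function name `is_variation` is made up for illustration. It assumes that "similar" means the same length plus an identical first half or an identical last half, and that the half length is rounded down. The second code in the first call is invented for the example.

```python
def is_variation(previous_code: str, current_code: str) -> bool:
    """Return True if the two codes look like variations of one product.

    Assumption: same overall length AND (same first half OR same last half).
    """
    # Codes of different lengths are treated as different products.
    if len(previous_code) != len(current_code):
        return False

    # Half of the code length, rounded down (an assumption of this sketch).
    half = len(previous_code) // 2
    if half == 0:
        # Codes too short to split: fall back to exact equality.
        return previous_code == current_code

    same_start = previous_code[:half] == current_code[:half]
    same_end = previous_code[len(previous_code) - half:] == current_code[len(current_code) - half:]
    return same_start or same_end


print(is_variation("234256756", "234256999"))    # True
print(is_variation("234256756", "x3432343332"))  # False
```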
My approach was to:
- Create a data table that stores only the unique codes.
- Get the code from the first row, assign it to a variable called previousCode, and add it to the data table called unique codes.
- In a For Each Row, move to the second row, assign its code to a variable called currentCode, and check whether the two strings match, like:
if previousCode.Substring(0, CInt(0.5 * previousCode.Length)) = currentCode.Substring(0, CInt(0.5 * currentCode.Length)) And - similar logic to check the last part as well (you get the point, I guess).
- If this criterion is met, we can say that this is not a unique code but a variation of the previous one. If there is no match, this is a new code that we add to the unique codes data table.
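Roughly, the whole pass looks like the Python sketch below (again not the real workflow, just the logic). The names `shares_half` and `unique_codes` are hypothetical. It assumes that previousCode is updated to the current row's code on every iteration, and that the first-half and last-half checks are combined with "or". Only the first and fourth codes in the sample list come from my data; the others are invented to show the grouping.

```python
from typing import Iterable, List, Optional


def shares_half(a: str, b: str) -> bool:
    # Assumed similarity rule: equal length and identical first or last half.
    half = len(a) // 2
    if len(a) != len(b) or half == 0:
        return a == b
    return a[:half] == b[:half] or a[len(a) - half:] == b[len(b) - half:]


def unique_codes(codes: Iterable[str]) -> List[str]:
    """Single pass over the rows, keeping the first code of each run."""
    unique: List[str] = []
    previous: Optional[str] = None
    for current in codes:
        # The first row, or a code that does not resemble the previous one,
        # starts a new product and goes into the unique list.
        if previous is None or not shares_half(previous, current):
            unique.append(current)
        # Assumption: always compare against the immediately preceding row.
        previous = current
    return unique


rows = ["234256756", "234256757", "234299999", "x3432343332", "x3432349999"]
print(unique_codes(rows))  # ['234256756', 'x3432343332']
```

Only the codes left in the unique list need to be looked up; the resulting product name can then be applied to every row in the same run.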
This works, but it's ridiculously slow. Is there any way to filter data tables based on similar but not identical strings?