String Parsing

Hi folks,

I need a good approach to this problem. I have many strings, for example “LOT 37 DISTRICT PLAN BCS321 NW SEC 1”. From each one I want to extract the value that follows LOT|LT|L (here 37), the value that follows PLAN|PLN|PL (here BCS321), and the value that follows SECTION|SEC|SECT|S (here 1). The difficulty is that the strings come in many different layouts. Some sample scenarios with their expected results are listed below. Any help from the community in coming up with a good approach would be much appreciated. Thanks.

  1. “LOT11PLANVAS1326DESSEC184” → LOT=11, PLAN=VAS1326, SEC=184
  2. “LTB DL7199 KDP” → LT=B
  3. “LOT 1 SECTION 84 VICTORIA PLAN 26161” → LOT=1, SECTION=84, PLAN=26161
  4. “PLBCP2211LT31DL280LD36” → PL=BCP2211, LT=31
  5. “L4 S12 T23 NWD PLAN LMP3458” → L=4, S=12, PLAN=LMP3458

I am stuck here and counting on the community to help me out. Thanks in advance.

Hello @Umer_Shahid

Thank you for the samples,

Can you tell us exactly what text you need from each sample?

Cheers

Steve

Hi,

Can you also share expected output for each sample?

But it may be difficult, because we cannot always tell the header (keyword) and the data apart. For example, if PL can also appear inside the data, a string such as LotAPLB can be read either as (Lot A) (PL B) or as (Lot APLB).
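
To make the ambiguity concrete, here is a minimal Python sketch (Python is used only for illustration; the thread itself is about .NET regex). Both patterns are hypothetical, and both fit the same string, which is why the string alone cannot say which reading is right:

```python
import re

text = "LotAPLB".upper()

# Reading 1: PL is a keyword, so the string holds a lot and a plan.
reading_1 = re.fullmatch(r"LOT(\w+?)PL(\w+)", text).groups()

# Reading 2: PL is part of the data, so the whole tail is the lot value.
reading_2 = re.fullmatch(r"LOT(\w+)", text).groups()

print(reading_1)  # lot and plan as two separate values
print(reading_2)  # a single lot value
```

Some extra rule about what a value may look like is needed to choose between the two.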

Regards,

@Yoichi @Steven_McKeering. I updated the post with the expected results too.

Hi,

Is there any rule for identifying each piece of data?

Why doesn’t PLAN contain DES?
Why is SEC 184? By the rule above, it could also be read as S = EC184.

Regards,

Hi @Umer_Shahid

Are you using OCR to collect these values?

Cheers

Steve

@Yoichi the plan value never has letters after the number. It can only have letters before the number.
And yes, there are different possibilities for the keyword: it can be SECTION; if not, then SEC; if not, then S; or the string may not contain a section at all.
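
The plan rule (optional letters, then digits, nothing after the digits) can be sketched like this. It is a minimal Python sketch for illustration, not the pattern used later in this thread; `PLAN_VALUE` and `plan_of` are made-up names, and upper-casing the input is an assumption:

```python
import re

# Keyword, optional whitespace, then letters followed by digits.
# The capture stops at the last digit, so a trailing "DES" is left out.
PLAN_VALUE = re.compile(r"(?:PLAN|PLN|PL)\s*([A-Z]*\d+)")

def plan_of(text):
    """Return the first plan value found, or None if there is none."""
    m = PLAN_VALUE.search(text.upper())
    return m.group(1) if m else None

print(plan_of("LOT11PLANVAS1326DESSEC184"))
print(plan_of("PLBCP2211LT31DL280LD36"))
```

Because the value must end in a digit, sample 1 gives VAS1326 rather than VAS1326DES.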

@Steven_McKeering No, I’m getting it from Excel.

Hello @Umer_Shahid

Give these patterns a go but test test test them…

Use the 1st match only on the below patterns:
LOTS
PLAN
SECTION

They may or may not work as your data has a difficult structure…

Cheers

Steve

@Steven_McKeering your solution is good. I am just brainstorming here. Taking Lot as an example: we check whether LOT exists and, if so, run the regex for LOT only. If not, and LT exists, we run the regex for LT. And so on. This should work, right?
One more thing: can this regex be modified to pick only the first element found instead of multiple? For example, the second sample matches two items. Can you modify it to return just the first match?

Also do comment on the brainstorming I did with you.
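
The idea of trying each keyword in turn can be sketched as follows. This is a minimal Python sketch, not the patterns posted in this thread: the names `KEYWORDS`, `VALUES` and `extract` are made up for illustration, and the value shapes are assumptions read off the five samples in the first post. The one-letter keywords L and S are only accepted when no letter precedes them, which is also an assumption.

```python
import re

# Keywords per field, in priority order (longest first).
KEYWORDS = {
    "lot": ["LOT", "LT", "L"],
    "plan": ["PLAN", "PLN", "PL"],
    "section": ["SECTION", "SECT", "SEC", "S"],
}

# Assumed value shapes: a lot is a number or a single letter,
# a plan is optional letters then digits, a section is a number.
VALUES = {
    "lot": r"\d+|[A-Z](?![A-Z])",
    "plan": r"[A-Z]*\d+",
    "section": r"\d+",
}

def extract(text):
    """Return the first value per field, trying keywords in priority order."""
    text = text.upper()
    result = {}
    for field, keywords in KEYWORDS.items():
        result[field] = None
        for kw in keywords:
            # A one-letter keyword must not be the tail of another word.
            guard = r"(?<![A-Z])" if len(kw) == 1 else ""
            m = re.search(guard + kw + r"\s*(" + VALUES[field] + ")", text)
            if m:
                result[field] = m.group(1)  # first match only
                break
    return result

print(extract("LOT11PLANVAS1326DESSEC184"))
print(extract("L4 S12 T23 NWD PLAN LMP3458"))
```

Trying SECTION, SECT and SEC before S is what keeps the S inside VAS1326 in sample 1 from being read as a section keyword. Real data will likely contain layouts that break these assumptions, so the output still needs checking.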

You can achieve this yes,

Use a ‘Regex.IsMatch’ call in an If condition to check for a match, like this:
System.Text.RegularExpressions.Regex.IsMatch(INSERTxTEXT,INSERTxPATTERN)

To get the 1st match only, use ‘Regex.Match’:
System.Text.RegularExpressions.Regex.Match(INSERTxTEXT,INSERTxPATTERN)

To split a combined pattern into one pattern per keyword, break it at the pipe symbols “|” and remove them.
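
For comparison, the difference between checking for a match, taking the first match, and taking all matches looks like this in Python's `re` module. This is only an illustrative sketch: the pattern below is hypothetical, not the one posted in this thread, and the .NET calls above remain the ones to use in the workflow.

```python
import re

# Hypothetical pattern for illustration only.
PATTERN = r"(?:LOT|LT|L)\s*([A-Z0-9]+)"
text = "LTB DL7199 KDP"

# Roughly what Regex.IsMatch answers: is there any match at all?
has_match = re.search(PATTERN, text) is not None

# Every match in the string: this is where the second item comes from.
all_values = re.findall(PATTERN, text)

# Roughly what Regex.Match gives: the first match only.
first = re.search(PATTERN, text)
first_value = first.group(1) if first else None

print(has_match, all_values, first_value)
```

Here the pattern finds both B (after LT) and 7199 (after the L in DL), and taking only the first match keeps B.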

Let us know how it goes.

Cheers

Steve

@Steven_McKeering Sure, let me implement it and I’ll update you. I guess your solution will work with a few small changes.

@Steven_McKeering man, your regex rocked. I made a few changes to it and it works now. I am keeping this thread open for a day or two. If I have any questions I’ll post them here, and if not I’ll mark your answer as the solution. Again, big thanks.

Good to hear. Your samples have a difficult structure, and you will need to keep monitoring to make sure everything works :)

Cheers

Steve

Yes, that’s what I am handling in my code and regex. @Steven_McKeering

@Steven_McKeering can you modify this regex: (?<=(PLAN\s*))([A-Za-z]\d+|\d+|[A-Za-z0-9]\s*[0-9]*)

Test runs:
STRATA LOT 142, BLOCK 5N, PLAN BCS3 444, SECTION 22, RANGE 2W, NEW WEST (Result: BCS3 444)

There can be cases where the plan value contains a space followed by more digits, as in the test run above. Can you modify the pattern to handle that?
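
One possible shape for such a pattern, as a minimal Python sketch. It assumes the wanted value is “BCS3 444” as listed in the test run, and it uses a capture group instead of the `(?<=(PLAN\s*))` lookbehind, because Python's `re` module only accepts fixed-width lookbehind. `PLAN_WITH_SPACE` and `plan_value` are made-up names.

```python
import re

# Letters then digits, optionally followed by one whitespace
# character and a further run of digits.
PLAN_WITH_SPACE = re.compile(r"PLAN\s*([A-Z]*\d+(?:\s\d+)?)")

def plan_value(text):
    """Return the first plan value found, or None if there is none."""
    m = PLAN_WITH_SPACE.search(text.upper())
    return m.group(1) if m else None

sample = ("STRATA LOT 142, BLOCK 5N, PLAN BCS3 444, "
          "SECTION 22, RANGE 2W, NEW WEST")
print(plan_value(sample))
```

The optional group has a cost: if a plan value is directly followed by a space and an unrelated number, that number is captured too, so this needs testing against real rows.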

Hello

Take a look at this updated pattern

Cheers

Steve
