I need some help with a regexp and hope you can help me.
Example rows (extracted from a PDF) have 4 columns and look like this.
In the following the brackets are there just to show the columns:
[2019-09-20] [2019-10-26] [940 C] [65565656]
The columns are always separated by exactly ONE space.
I need to get the value “940 C”.
The rows can also look like this:
2019-09-20 2019-10-26 940 A65565656
** Should give the value “940”
2019-09-20 2019-10-26 930 C 65565656
** Should give the value “930 C”
2019-09-20 2019-10-26 940 B B655656CC
** Should give the value “940 B”
2019-09-20 2019-10-26 B655656CC
** Here we have 3 columns. Bonus if this could give the value “” or empty, otherwise this can be handled afterwards.
Are you set on regex? I think Split might be better here. First split by newline, then for each line you can split on spaces, ignore empty values, then grab index 2, check if index 3 contains exactly 1 character and append that to index 2 value if so.
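The split approach described above can be sketched roughly as follows. This is a minimal Python sketch rather than the .NET code the thread is presumably about; the function name `extract_value` is made up for illustration, and the "fewer than 4 parts means empty" rule is an added assumption to cover the 3-column row.

```python
def extract_value(line: str) -> str:
    """Return column 3, plus a one-character column 4 if there is one."""
    # str.split() with no argument splits on whitespace and drops empty strings.
    parts = line.split()
    # Assumption (not stated in the reply): the last part is always the final
    # column, so a row needs at least 4 parts to contain the wanted value.
    if len(parts) < 4:
        return ""
    value = parts[2]
    # Append index 3 only when it is exactly one character long.
    if len(parts[3]) == 1:
        value += " " + parts[3]
    return value


rows = """2019-09-20 2019-10-26 940 C 65565656
2019-09-20 2019-10-26 940 A65565656
2019-09-20 2019-10-26 930 C 65565656
2019-09-20 2019-10-26 940 B B655656CC
2019-09-20 2019-10-26 B655656CC"""

# First split by newline, then handle each line on its own.
for row in rows.splitlines():
    print(repr(extract_value(row)))
```

For the five example rows from the question this prints '940 C', '940', '930 C', '940 B' and ''.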
If you’re absolutely set on regex then I can help with an expression on that as well, but it will rely on you always getting 2 dates, and always having the dates in the yyyy-MM-dd format & it is a bit tougher to get the optional single character after the number.
I’d still recommend using split, but below is a regex solution that would work. Get the value and use .Trim() to get rid of any excess spaces that could be present. Note that this regex cannot return an empty value for the fourth variant (the row with only 3 columns), because it simply will not match. However, you could run the regex on each line individually and, if no match is found, use String.Empty instead.
Assumptions:
You are looking for digits only in column 3 (switch the \d+ with .+ or the specific characters within [square brackets] if that isn’t true)
The 2 dates always come in the format of 4 digits - 2 digits - 2 digits
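A pattern built on these two assumptions might look something like the Python sketch below. It is one possible expression, not necessarily the one posted in this reply, and the names `FIRST_TRY` and `match_or_empty` are hypothetical. It is applied to one line at a time so that a non-match can be turned into an empty string.

```python
import re

# Two yyyy-MM-dd dates, one space each, then digits and optionally a single
# letter that stands on its own (the \b rejects the "A" in "A65565656").
FIRST_TRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{4}-\d{2}-\d{2} (\d+(?: [A-Za-z]\b)?)"
)


def match_or_empty(line: str) -> str:
    """Return the captured value, or an empty string when nothing matches."""
    m = FIRST_TRY.match(line)
    return m.group(1) if m else ""


for line in (
    "2019-09-20 2019-10-26 940 A65565656",
    "2019-09-20 2019-10-26 930 C 65565656",
    "2019-09-20 2019-10-26 B655656CC",
):
    print(repr(match_or_empty(line)))
```

Because the value is taken from a capture group, no trimming is needed here. A pattern like this only checks what comes before the value, so a row whose single remaining column is all digits would still match.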
I’ll try with split as well and see how that works.
Your RegEx seems to work fine but not for an example like this (where it will find “655656” which I don’t want):
2019-09-20 2019-10-26 655656
How can we add a condition so that the value we are looking for is not the last thing on the line (i.e. it must not be followed directly by $)?
Then I think we can have a solution with Regexp for this.
The Multiline option must be used. This assumes the value you want to capture can contain any of the characters 0-9, A-Z, or a-z. It will grab column 3 and column 4 (if it exists), and it will not grab the final column.
I tested the below Input and received the following matches: 940, 930 C, 940 B, 904C
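For reference, a pattern with the behaviour described here could be sketched in Python as follows. It is an illustration under assumptions, not a copy of the posted expression: column 4 is assumed to be a single character, the lookahead `(?= \S)` is one way to require that another column follows, and `REFINED` and `find_values` are hypothetical names. The sample text reuses the rows from the question rather than the input tested above.

```python
import re

REFINED = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{4}-\d{2}-\d{2} "  # the two leading dates
    r"([0-9A-Za-z]+(?: [0-9A-Za-z])?)"        # column 3, optional 1-char column 4
    r"(?= \S)",                               # must be followed by another column
    re.MULTILINE,                             # let ^ match at the start of every line
)


def find_values(text: str) -> list:
    """Return the captured value from every line that matches."""
    return [m.group(1) for m in REFINED.finditer(text)]


text = """2019-09-20 2019-10-26 940 C 65565656
2019-09-20 2019-10-26 940 A65565656
2019-09-20 2019-10-26 930 C 65565656
2019-09-20 2019-10-26 940 B B655656CC
2019-09-20 2019-10-26 B655656CC
2019-09-20 2019-10-26 655656"""

print(find_values(text))
```

With these rows the sketch returns 940 C, 940, 930 C and 940 B. The last two rows produce no match, because nothing follows their only remaining column.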