Hello All,
I am reading a PDF and trying to extract several pieces of text at different stages. In the case where I am stuck, I am trying to split a string on delimiters such as “>= 10 - < 20” or “>= 0,1 - < 1”… I use a regex for the delimiter in my String.Split call.
I have tried different regex combinations, but none of them seems to work reliably.
Any advice would be appreciated.
Assuming text is a variable containing your entire string, something like this might work: text.Split({System.Text.RegularExpressions.Regex.Match(text, "(>= 10 - < 20)|(>= 0,1 - < 1)").Value}, System.StringSplitOptions.RemoveEmptyEntries)
So basically, Regex.Match pulls out the text that matches the pattern, and that text is then used as the delimiter to split the string into an array of parts.
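To make the two steps concrete, here is a small Python sketch of the same idea. It uses Python's re module rather than UiPath/VB.NET, and the function name split_on_first_match and the sample text are made up for illustration. It finds the first match, then splits on that literal text and drops empty parts the way RemoveEmptyEntries does.

```python
import re


def split_on_first_match(text: str, pattern: str) -> list[str]:
    """Split `text` on the literal text of the FIRST regex match only."""
    match = re.search(pattern, text)
    if match is None:
        # No delimiter found: keep the whole text as one part.
        return [text]
    delimiter = match.group(0)
    # Drop empty parts, like StringSplitOptions.RemoveEmptyEntries.
    return [p for p in text.split(delimiter) if p != ""]


# Hypothetical sample text; the real PDF text is not shown in the thread.
sample = "Row A >= 10 - < 20 Row B >= 10 - < 20 Row C"
print(split_on_first_match(sample, r"(>= 10 - < 20)|(>= 0,1 - < 1)"))
# ['Row A ', ' Row B ', ' Row C']
```

Only the first match becomes the delimiter, so any other range text in the string stays unsplit.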
[This suggestion has not been tested.]
As for the regex pattern you are using and the one in my suggestion, I would need to see your string and every form the delimiter can take to come up with an ideal pattern.
Thanks. Hope this sparks some ideas for you.
EDIT: I was thinking you could use a pattern like ">=.*- < [0-9]{2}" so that the numbers can be anything; however, I’m no expert on regex patterns.
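One caveat with ".*" is that it is greedy: on a line holding two delimiters it stretches to the last one. Below is a rough Python sketch (Python's re, not VB.NET). The sample line and the RANGE pattern are assumptions, not something tested against the real PDF text. It compares that pattern with one that spells out "number - < number" and allows a decimal comma as in 0,1.

```python
import re

# Hypothetical one-line sample with two delimiters in it.
line = "x >= 10 - < 20 y >= 30 - < 40 z"

# Greedy ".*" runs to the LAST "- < " on the line, so both delimiters
# and the "y" between them come back as ONE match.
greedy = re.findall(r">=.*- < [0-9]{2}", line)

# Assumed alternative: a number is digits with an optional decimal comma,
# so each range is matched on its own.
RANGE = r">=\s*[\d,]+\s*-\s*<\s*[\d,]+"
explicit = re.findall(RANGE, line)

print(greedy)    # ['>= 10 - < 20 y >= 30 - < 40']
print(explicit)  # ['>= 10 - < 20', '>= 30 - < 40']
print(re.findall(RANGE, "a >= 0,1 - < 1 b"))  # ['>= 0,1 - < 1']
```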
Thanks @ClaytonM for your advice. Your suggestions gave me some more constructive ideas and I got the code working, but I see a discrepancy. Could you please advise?
My string is very long and consists of data extracted from a PDF. It contains a table whose rows I am trying to extract with the Split function, using a regex match as the delimiter.
My Assign activity reads like this: Temp_1.Split({System.Text.RegularExpressions.Regex.Match(Temp_1, ">=.*- < [0-9\D]{1,3}").Value}, System.StringSplitOptions.RemoveEmptyEntries), where Temp_1 is my text. I store the result in a String array and loop over each item. To my surprise, the split extracts the first item correctly, but the second iteration returns the entire remaining output.
Could you please take a look at the attached workflow file and suggest a fix? forum_1.xaml (12.5 KB)
I found part of the problem: the regex match included a trailing space, which threw the split off. Adding .Trim made it work, like this: Temp_1.Split({System.Text.RegularExpressions.Regex.Match(Temp_1, ">=.*- < [0-9\D]{1,3}").Value.Trim}, System.StringSplitOptions.RemoveEmptyEntries)
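Here is a Python sketch of why the trailing character matters. It uses a hypothetical two-row sample, and Python's re module stands in for .NET's Regex; both treat [0-9\D] as "any character", so {1,3} can swallow a space or line break after the number. If the first delimiter is followed by a space and a later one by a line break, the raw match only fits the first.

```python
import re

# Hypothetical sample: the first delimiter is followed by a space,
# the second by a line break.
text = "a >= 10 - < 20 b\nc >= 10 - < 20\nd"
pattern = r">=.*- < [0-9\D]{1,3}"

# [0-9\D] matches digits OR non-digits (i.e. anything), so the match
# also grabs the space after "20": ">= 10 - < 20 ".
raw = re.search(pattern, text).group(0)

# Splitting on the raw match finds only the first occurrence, because
# the second one is followed by "\n", not " ".
before = [p for p in text.split(raw) if p]

# Trimming the delimiter (like .Trim in VB.NET) lets every copy match.
after = [p for p in text.split(raw.strip()) if p]

print(repr(raw))  # '>= 10 - < 20 '
print(before)     # ['a ', 'b\nc >= 10 - < 20\nd']
print(after)      # ['a ', ' b\nc ', '\nd']
```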
The next issue is that Regex.Match returns only the first instance of the pattern. That means if you also need to split on “- < 10” and not just “- < 20”, you will need some adjustments.
You might need to run the split through some sort of loop. I would probably use a LINQ or lambda expression to build the array. I will try to reply with that solution if I manage to solve it.
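Here is a rough Python sketch of that loop idea. The sample text and the pattern ">= \d+ - < \d+" are assumptions for illustration. It collects every distinct match instead of only the first, then splits each part on each delimiter in turn.

```python
import re

# Hypothetical sample with two DIFFERENT delimiters.
text = "r1 >= 10 - < 20 r2 >= 1 - < 10 r3"
# Assumed pattern for "whole number - < whole number" ranges.
pattern = r">= \d+ - < \d+"

# Using only the first match leaves the second range unsplit.
first = re.search(pattern, text).group(0)
first_only = [p for p in text.split(first) if p]

# Collect every distinct delimiter text, longest first, so a short
# delimiter cannot cut into a longer one that contains it.
delimiters = sorted({m.group(0) for m in re.finditer(pattern, text)},
                    key=len, reverse=True)

# Split every current part on each delimiter in turn.
parts = [text]
for d in delimiters:
    parts = [piece for part in parts for piece in part.split(d)]
parts = [p for p in parts if p]  # drop empty entries

print(first_only)  # ['r1 ', ' r2 >= 1 - < 10 r3']
print(parts)       # ['r1 ', ' r2 ', ' r3']
```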
So it looks like Regex has its own .Split function. Sorry I didn’t realize that before, lol.
Here’s my solution now: System.Text.RegularExpressions.Regex.Split(Temp_1, ">=.*- < [0-9\D]{1,3}", System.Text.RegularExpressions.RegexOptions.None).Where(Function(x) x <> "").ToArray()
Regex.Split splits the string on every match of the pattern. I then used .Where to get rid of the empty entries, because I didn’t see a RemoveEmptyEntries option for Regex.Split, unless I missed it.
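For comparison, Python's re.split does the same job in one call (the sample text and pattern are assumptions). One thing to watch: if the pattern contains capturing parentheses, like the "(...)|(...)" pattern suggested earlier in the thread, the captured text is returned as extra items in the result. .NET's Regex.Split does the same, so a non-capturing group (?:...) avoids it.

```python
import re

# Hypothetical sample text and assumed range pattern.
text = "r1 >= 10 - < 20 r2 >= 1 - < 10 r3"

# Split on every match, then drop empty entries (like the .Where filter).
parts = [p for p in re.split(r">= \d+ - < \d+", text) if p != ""]

# With a capturing group, the matched delimiters come back as items too.
with_group = re.split(r"(>= \d+ - < \d+)", text)

# A non-capturing group keeps them out again.
non_capturing = re.split(r"(?:>= \d+ - < \d+)", text)

print(parts)       # ['r1 ', ' r2 ', ' r3']
print(with_group)  # ['r1 ', '>= 10 - < 20', ' r2 ', '>= 1 - < 10', ' r3']
```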
Hi, I want to split PDF text on a date, but the date changes in every document. How do I write a regex for it?
I have tried this: readText.Split({"[0-9]{2}/[0-9]{2}/[0-9]{4}"}, StringSplitOptions.None)
For reference, you can check the PDFs I have attached: 05.pdf (177.3 KB) 06.pdf (14.4 KB)
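You can use a normal Split if the string or character that you split by is static, i.e. it does not change. String.Split treats its separator as literal text, so "[0-9]{2}/[0-9]{2}/[0-9]{4}" is only split on if those exact characters appear in the text. If you need a pattern or wildcard, you need Regex, for example System.Text.RegularExpressions.Regex.Split(readText, "[0-9]{2}/[0-9]{2}/[0-9]{4}").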
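Here is a minimal Python sketch of the date case. The text below is invented, not taken from 05.pdf or 06.pdf, and \d{2}/\d{2}/\d{4} assumes dates like 05/01/2020. Splitting on the pattern throws the dates away, while splitting on a zero-width lookahead cuts just before each date, so each chunk keeps its own date.

```python
import re

# Hypothetical text standing in for the PDF contents.
text = "Header 05/01/2020 first entry 06/01/2020 second entry"
DATE = r"\d{2}/\d{2}/\d{4}"

# Splitting ON the date removes the dates from the output.
no_dates = re.split(DATE, text)

# A zero-width lookahead splits just BEFORE each date, so every chunk
# keeps its own date (empty-match splits need Python 3.7+).
with_dates = re.split(r"(?=" + DATE + r")", text)

print(no_dates)    # ['Header ', ' first entry ', ' second entry']
print(with_dates)  # ['Header ', '05/01/2020 first entry ', '06/01/2020 second entry']
```

The same lookahead idea should carry over to Regex.Split in .NET; treat it as a sketch, not something tested against the attached PDFs.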
@arivu96
I have another question. What if the string contains something like this:
kira he 234.23
kirad he1w huwe 423.342
kirasdasd df2sh fhsjf dhsj 7838.09
Kiran hi h2i hi gdhs 95321.098
In practice there may be many more lines. For each line, I want the number at the end in one variable and the rest of the line in another variable.
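Here is one possible approach, sketched in Python. The regex and the helper name split_line are illustrative assumptions. It matches each line against "text, whitespace, number at the end" and takes the two groups.

```python
import re
from typing import Optional, Tuple

# Group 1: the text part, ending on a non-space character.
# Group 2: the number at the very end (digits with optional decimals).
LINE = re.compile(r"^(.*\S)\s+(\d+(?:\.\d+)?)$")


def split_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (text, number) for one line, or None if it has no trailing number."""
    m = LINE.match(line.strip())
    return (m.group(1), m.group(2)) if m else None


lines = [
    "kira he 234.23",
    "kirad he1w huwe 423.342",
    "kirasdasd df2sh fhsjf dhsj 7838.09",
    "Kiran hi h2i hi gdhs 95321.098",
]
results = [split_line(l) for l in lines]
for r in results:
    print(r)
# ('kira he', '234.23') ... ('Kiran hi h2i hi gdhs', '95321.098')
```

If the last item is always separated by a space, line.rsplit(maxsplit=1) would also work. The regex additionally checks that the last item really is a number.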