I have a dynamic .txt file containing job advertisements that follow a specific pattern: "Company Name has a vacancy for the occupation of DRIVER , suitably qualified applicants can contact 1717171717 or CompanyNAme@GMAIL.COM". The company name and the other variable details (occupation, phone number, email) differ in each advertisement, and each advertisement ENDS with an email address. The .txt file contains continuous text without line breaks or delimiters. I need to break the text after every email address so that each advertisement stands alone, and output the results to an XLS file. Can anyone help, please?
jobs-test.txt (22.2 KB)
Hi @Anwar_Mirza
Please try the approach below and see if it helps.
Let's say your string is in a variable "jobs" and Arr is an array of strings.
Arr = jobs.Split({".com"}, StringSplitOptions.None)
Then you can loop through the array and refer to each value. Split removes the delimiter, so you may have to concatenate ".com" back onto the end of each value.
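To make the "split, then put the suffix back" step concrete, here is a minimal sketch. It is written in Python only so it can be run outside UiPath; the sample text and company names are made up, and the real input would be the content of the .txt file. Note that it is case-sensitive, so it does not yet handle ".COM".

```python
# Hypothetical sample text; the real input is the content of the .txt file.
jobs = (
    "Acme has a vacancy for the occupation of DRIVER , suitably qualified "
    "applicants can contact 1717171717 or acme@gmail.com "
    "Beta has a vacancy for the occupation of COOK , suitably qualified "
    "applicants can contact 1818181818 or beta@gmail.com "
)

# split() drops the ".com" delimiter, so append it again to every
# non-empty piece; the blank remainder after the last email is skipped.
ads = [part.strip() + ".com" for part in jobs.split(".com") if part.strip()]

for ad in ads:
    print(ad)
```

Each item in `ads` is then one advertisement ending in its email address.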
Looking at your text, some emails contain .COM and others .com, so in this case we want to ignore case and split after .com irrespective of case. For this you can use the regex below:
System.Text.RegularExpressions.Regex.Matches(strValue, ".*?[\w.]+@[\w.]+", System.Text.RegularExpressions.RegexOptions.IgnoreCase)
I will give a sample flow snapshot that you can build along with.
The lstvar variable should be of type MatchCollection.
Output in Excel
Please mark it as solution if you find it helpful!!
Happy Automation!!
You can use this to get all the values:
System.Text.RegularExpressions.Regex.Split(str, "(?<=\.com)\s*", System.Text.RegularExpressions.RegexOptions.IgnoreCase).ToArray
This will give you an array of strings in which each item represents one part.
cheers
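As a rough illustration of how a lookbehind split like the one above behaves, here is a Python sketch (Python is used only as a runnable stand-in for the .NET call, and the sample text is invented). The lookbehind keeps ".com" attached to the preceding piece, so nothing needs to be concatenated back.

```python
import re

# Hypothetical sample; the real input is the text read from the .txt file.
text = (
    "Acme has a vacancy for the occupation of DRIVER , suitably qualified "
    "applicants can contact 1717171717 or acme@GMAIL.COM "
    "Beta has a vacancy for the occupation of COOK , suitably qualified "
    "applicants can contact 1818181818 or beta@gmail.com"
)

# Split on the (possibly empty) whitespace that directly follows ".com",
# ignoring case. The empty string produced after the final email is dropped.
parts = [p for p in re.split(r"(?<=\.com)\s*", text, flags=re.IGNORECASE) if p]

for p in parts:
    print(p)
```

This still only recognises addresses ending in .com, in either case.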
Hi Anwar,
To split the text, use the expression below and store the result in a MatchCollection:
System.Text.RegularExpressions.Regex.Matches(
strInput,
"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\s+(.*?)(?=[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}|\z)",
System.Text.RegularExpressions.RegexOptions.Singleline
)
Use this inside a For Each over the matches:
New String() { item.Groups(2).Value.Trim + Environment.NewLine + item.Groups(1).Value }
Then you can loop through the collection, add each item to a data table, and write it to Excel.
Thank you all for sharing your knowledge. I will test all of them and share the results. Although the majority of emails end with .com, I do have emails ending with .bh, .au, .sa and so forth. I will see what I can do about this, and would appreciate it if you could share your thoughts on it as well.
System.Text.RegularExpressions.Regex.Matches(strValue, ".*?[\w.]+@[\w.]+", System.Text.RegularExpressions.RegexOptions.IgnoreCase)
This pattern should work for all of your scenarios. Give it a try and see whether it works.
Here we key on the @ symbol rather than the suffix, so it works whether the address ends in .com, .ah, .in, or anything else.
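A small sketch of why keying on the @ symbol covers every suffix, using the same pattern in Python (a stand-in for the .NET Regex.Matches call; the sample advertisements are invented):

```python
import re

# Hypothetical sample with two different suffixes (.COM and .bh).
text = (
    "Acme has a vacancy for the occupation of DRIVER , suitably qualified "
    "applicants can contact 1717171717 or acme@GMAIL.COM "
    "Beta has a vacancy for the occupation of COOK , suitably qualified "
    "applicants can contact 1818181818 or beta@jobs.bh"
)

# ".*?" lazily consumes the advertisement text, then "[\w.]+@[\w.]+"
# consumes the address, whatever its suffix. Each match is one advertisement.
ads = [m.strip() for m in re.findall(r".*?[\w.]+@[\w.]+", text, flags=re.IGNORECASE)]

for ad in ads:
    print(ad)
```

Because `[\w.]` does not include whitespace, an address that contains a stray space is cut at that space.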
Thank you, I just tested it and it works fine. However, I struggled with quite a few cases where there is a space inside the email address itself, which causes the line to break in the wrong place. Is there a way to trim these stray spaces from the string?
Thanks @sonaliaggarwal47, however this will only work with emails ending with .COM; other emails, ending with .in for instance, will not be captured.
Thanks @Anil_G, but this will only capture emails ending with .com; other emails will not be captured.
Can you show me an example of the scenario where you are getting the error?
Hi @Anwar_Mirza
In that case, use the syntax below:
Arr = str.Split({".com", ".bh", ".au", ".sa"}, StringSplitOptions.None)
If there are more suffixes, such as .in, simply add them to the list above and it should work.
Can you give the syntax below a try and see?
System.Text.RegularExpressions.Regex.Matches(strValue, ".*?[\w\s.]+@\s*[\w\s.]+\.\s*\w+", System.Text.RegularExpressions.RegexOptions.IgnoreCase)
Thanks I will try it first thing tomorrow and let you know. Thank you so much @sonaliaggarwal47
Mainly .bh, .au, .in, .org, .net, and .sa. I think I will define this as a variable in the project and update it each time something new pops up.
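One possible way to keep the suffixes in a single variable is sketched below in Python (a stand-in for the UiPath project; `SUFFIXES` and `split_ads` are hypothetical names and the sample text is invented). The list is turned into a regex alternation, and the text is cut after every address that ends in a listed suffix, ignoring case.

```python
import re

# Assumed list of suffixes; extend it when a new one shows up.
SUFFIXES = [".com", ".bh", ".au", ".in", ".org", ".net", ".sa"]

def split_ads(text, suffixes=SUFFIXES):
    """Cut the text after every email address ending in a listed suffix."""
    alternation = "|".join(re.escape(s) for s in suffixes)
    # "@", the domain, one listed suffix, and no further word character.
    email_end = re.compile(r"@[\w.-]*(?:" + alternation + r")(?![\w-])",
                           re.IGNORECASE)
    ads, start = [], 0
    for match in email_end.finditer(text):
        ads.append(text[start:match.end()].strip())
        start = match.end()
    return ads  # text after the last matched address is ignored

sample = (
    "Acme has a vacancy for the occupation of DRIVER , suitably qualified "
    "applicants can contact 1717171717 or acme@GMAIL.COM "
    "Beta has a vacancy for the occupation of COOK , suitably qualified "
    "applicants can contact 1818181818 or beta@jobs.bh"
)
result = split_ads(sample)
```

An address with a suffix that is not in the list is not treated as a break point, so its advertisement ends up merged with the next one.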
Try the pattern below; it was giving the correct output:
System.Text.RegularExpressions.Regex.Matches(strValue, ".*?(?:[\w.-]+@\s*[\w.-]+\s*\.\s*\w+)", System.Text.RegularExpressions.RegexOptions.IgnoreCase)
Happy Automation!!
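For the stray spaces inside addresses, a sketch along the lines of the pattern above might look like this in Python (a stand-in for the .NET call; `split_ads` is a hypothetical helper and the sample text is invented). It tolerates whitespace after the @ and around the last dot, then strips that whitespace from the captured address. A space before the @ or inside the name part is not handled.

```python
import re

# Group 1: advertisement text; group 2: the address, allowing spaces
# after "@" and around the final dot.
AD = re.compile(r"(.*?)([\w.-]+@\s*[\w.-]+\s*\.\s*\w+)", re.IGNORECASE)

def split_ads(text):
    ads = []
    for body, email in AD.findall(text):
        clean_email = re.sub(r"\s+", "", email)  # drop spaces inside the address
        ads.append(body.strip() + " " + clean_email)
    return ads

# Hypothetical sample: the first address contains stray spaces.
sample = (
    "Acme has a vacancy for the occupation of DRIVER , suitably qualified "
    "applicants can contact 1717171717 or acme@ gmail .com "
    "Beta has a vacancy for the occupation of COOK , suitably qualified "
    "applicants can contact 1818181818 or beta@jobs.bh"
)
cleaned = split_ads(sample)
```

Each entry in `cleaned` ends with a whitespace-free address and can be written to one row of the output sheet.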
You can use it like this; it is more generic and handles all suffix types:
System.Text.RegularExpressions.Regex.Split(str, "(?<=\.[A-Za-z]{2,3})\s+", System.Text.RegularExpressions.RegexOptions.IgnoreCase).ToArray
cheers








