Looking for assistance with regex

lisa_R · March 10, 2023, 11:51am

Hi Brilliant Team,
I need your assistance with a regex. I have tried, but the text is not extracted properly. Could someone please help me?
With the partition of India in 1947, it became the Pakistani province of East Bengal (later renamed East Pakistan), one of five provinces of Pakistan, separated from the other four by 1,100 miles (1,800 km) of Indian territory. In 1971 it became the independent country of India, with its capital at.

look With the partition of India in 1947, it became the Pakistani province of East Bengal (later renamed East Pakistan), one of five provinces of Pakistan, separated from the other four by 1,100 miles (1,800 km) of Indian territory. In 1971 it became the independent country of india, with its capital at.

With the partition of India in 1947, it became the Pakistani province of East Bengal (later renamed East Pakistan), one of five provinces of Pakistan, separated from the other four by 1,100 miles (1,800 km) of Indian territory. In 1971 it became the independent country of India, with its capital at.

We read the news and, after data massaging, publish it in a portal. The data is very confidential, so we need to check it one by one.

The output should be a data table like the one below:

[image: expected output data table]

I have tried with regex, but it’s not working properly.
I am looking for your help.
Thanks

supermanPunch · March 10, 2023, 12:24pm

Hi @lisa_R ,

Could you try the suggestion below:

As multiple items to capture are present within the same text, we can use the Split method to separate each section and then use the Title and Description regexes to capture the data.

System.Text.RegularExpressions.Regex.Split(strInput1,"(?=Title:)" )

Here, strInput1 is a string variable containing the whole text. The output of the above expression is an Array of String, which we can loop through with a For Each activity.

The expression above is used as the input of the For Each activity, as in the screenshot below. We then capture the Title and Description for each split section.

[image: For Each activity using the Split expression]

Expressions to Capture Title and Description :

title = System.Text.RegularExpressions.Regex.Match(currentItem,"(?<=Title:).*").Value.ToString.Trim

Description = System.Text.RegularExpressions.Regex.Match(currentItem,"(?<=Description:)[\s\S]+").Value.ToString.Trim

Next, we add this data to the DataTable using the Add Data Row activity.
At the end, outside the For Each loop, we can use the Write Range activity to write the DataTable to an Excel sheet.

Note: The NormalDT used was built at the start using the Build Datatable Activity containing the columns Title and Description
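
The steps above can be sketched as a small standalone VB.NET console program. This is only an illustration under assumptions: the sample text is made up, `Rows.Add` stands in for the Add Data Row activity, and the final loop prints the rows instead of writing them with Write Range. It also skips blank items, because a zero-width split can return an empty first element when the text starts with "Title:".

```vbnet
Imports System
Imports System.Data
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SplitThenCapture
    Sub Main()
        ' Made-up sample input; the thread's real text is not reproduced here.
        Dim strInput1 As String =
            "Title: First item" & vbLf & "Description:" & vbLf & "Line A" & vbLf & "Line B" & vbLf &
            "Title: Second item" & vbLf & "Description:" & vbLf & "Line C"

        ' Stand-in for the table created with the Build Data Table activity.
        Dim NormalDT As New DataTable()
        NormalDT.Columns.Add("Title", GetType(String))
        NormalDT.Columns.Add("Description", GetType(String))

        ' Split in front of every "Title:" so each item holds exactly one section.
        For Each currentItem As String In Regex.Split(strInput1, "(?=Title:)")
            ' Skip the empty element that the zero-width split can produce.
            If String.IsNullOrWhiteSpace(currentItem) Then Continue For

            Dim title As String = Regex.Match(currentItem, "(?<=Title:).*").Value.Trim()
            Dim description As String = Regex.Match(currentItem, "(?<=Description:)[\s\S]+").Value.Trim()

            ' Same role as the Add Data Row activity.
            NormalDT.Rows.Add(title, description)
        Next

        ' Print the rows; in the workflow this is where Write Range would be used.
        For Each row As DataRow In NormalDT.Rows
            Console.WriteLine(row("Title").ToString() & " => " & row("Description").ToString())
        Next
    End Sub
End Module
```

In a workflow, the same blank-item check could be an If activity at the top of the For Each body.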

lisa_R · March 10, 2023, 12:43pm

Hi @supermanPunch, thanks for your quick reply. The Description regex is not working, which is where I got stuck too. It should extract only the paragraphs under Description, but it extracts all of the text below Description, including the next Title.

supermanPunch · March 10, 2023, 12:50pm

@lisa_R ,

Were the suggested steps followed?

First, we split the text into sections (from one Title up to the character before the next Title). The Regex Split does this and gives us an array of split sections. We then apply the regexes mentioned above to each split item, so that we don’t get all of the data at once.
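
To illustrate why the order matters, here is a minimal VB.NET sketch. The sample text and the names `wholeText` and `part` are made up for illustration. Applied to the whole text, the greedy `[\s\S]+` runs to the end of the input and so also captures the next Title; applied to one split section, it stops where that section ends.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module GreedyVersusSplit
    Sub Main()
        ' Made-up sample; not the poster's data.
        Dim wholeText As String = "Title: A" & vbLf & "Description: first" & vbLf &
                                  "Title: B" & vbLf & "Description: second"
        Dim pattern As String = "(?<=Description:)[\s\S]+"

        ' On the whole text: a single match that runs to the end of the input,
        ' so it also contains "Title: B" and its description.
        Console.WriteLine("[" & Regex.Match(wholeText, pattern).Value.Trim() & "]")

        ' On each split section: the match ends where the section ends.
        For Each part As String In Regex.Split(wholeText, "(?=Title:)")
            If part.Trim().Length = 0 Then Continue For
            Console.WriteLine("[" & Regex.Match(part, pattern).Value.Trim() & "]")
        Next
    End Sub
End Module
```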

I do get the output this way.

Yoichi · March 10, 2023, 12:56pm

Hi,

Another approach:

Can you try the following sample?

dt = System.Text.RegularExpressions.Regex.Matches(yourString,"Title:[\s\S]+?(?=Title:|$)").Cast(Of System.Text.RegularExpressions.Match).SelectMany(Function(m) System.Text.RegularExpressions.Regex.Matches(System.Text.RegularExpressions.Regex.Match(m.Value,"(?<=Description:\s*)[\s\S]+").Value,"[^\r\n]+").Cast(Of System.Text.RegularExpressions.Match).Select(Function(m2) dt.LoadDataRow({m.Value.Split(chr(10)).First,m2.Value},False))).CopyToDataTable

Sample20230310-6L.zip (3.3 KB)

Regards,

lisa_R · March 10, 2023, 1:19pm

Hi @Yoichi, thanks a lot. It’s working perfectly. Could you please explain how it works, so that I can modify it to extract the expected output?
Could you please suggest a tutorial where I can learn this functionality?

Yoichi · March 10, 2023, 1:42pm

Hi,

To improve readability, import “System.Text.RegularExpressions” in the Imports tab in advance. I also added line breaks, as in the following expression.

Regex.Matches(yourString,"Title:[\s\S]+?(?=Title:|$)").Cast(Of Match) _
    .SelectMany(Function(m) _
        Regex.Matches(Regex.Match(m.Value,"(?<=Description:\s*)[\s\S]+").Value,"[^\r\n]+").Cast(Of Match) _
            .Select(Function(m2) _
                dt.LoadDataRow({m.Value.Split(chr(10)).First,m2.Value},False) _
            ) _
    ).CopyToDataTable

The first Regex.Matches call extracts each block of text that starts with “Title:”. Each block is assigned to m on the 2nd line.
Next, the second and third regexes extract each line of the text after “Description:”. Each line is assigned to m2 on the 4th line.
Then the dt.LoadDataRow method returns a DataRow that holds the first line of m and the value of m2.
Finally, these DataRows are converted to a DataTable by CopyToDataTable.
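
For readers who are new to LINQ, the same logic can be sketched with two nested loops in a standalone VB.NET console program. This is an illustration under assumptions, not a replacement for the expression above: the sample text and the column names are made up, `Rows.Add` is used in place of `LoadDataRow`, and `Trim` is added to drop a possible trailing carriage return. Two points carry over to the one-line expression: `dt` is used inside it, so it should already exist with two columns before the assignment, and `CopyToDataTable` throws an exception when there are no rows at all.

```vbnet
Imports System
Imports System.Data
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LoopEquivalent
    Sub Main()
        ' Made-up sample input for illustration.
        Dim yourString As String =
            "Title: First item" & vbLf & "Description:" & vbLf & "Line A" & vbLf & "Line B" & vbLf &
            "Title: Second item" & vbLf & "Description:" & vbLf & "Line C"

        ' dt needs its two columns before any row is added (column names are assumed).
        Dim dt As New DataTable()
        dt.Columns.Add("Title", GetType(String))
        dt.Columns.Add("Description", GetType(String))

        ' Outer regex: one match (m) per block that starts with "Title:".
        For Each m As Match In Regex.Matches(yourString, "Title:[\s\S]+?(?=Title:|$)")
            ' First line of the block; like the original, it keeps the "Title:" prefix.
            Dim firstLine As String = m.Value.Split(Chr(10))(0).Trim()

            ' Everything after "Description:" within this block.
            Dim body As String = Regex.Match(m.Value, "(?<=Description:\s*)[\s\S]+").Value

            ' Inner regex: one match (m2) per non-empty line of the description.
            For Each m2 As Match In Regex.Matches(body, "[^\r\n]+")
                dt.Rows.Add(firstLine, m2.Value)
            Next
        Next

        ' Print one line per row: the title line and one description line.
        For Each row As DataRow In dt.Rows
            Console.WriteLine(row(0).ToString() & " | " & row(1).ToString())
        Next
    End Sub
End Module
```

Each description line becomes its own row, with the title line repeated. To get one row per title instead, the inner loop could be replaced by a single `dt.Rows.Add(firstLine, body.Trim())`.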

> Could you please suggest the tutorial where i can learn this functionality?

The above uses Regex and LINQ. If you are not very familiar with these features, it may be good to start by checking the following documents.

Regards,

lisa_R · March 10, 2023, 2:07pm

Thanks @Yoichi, it’s a great help.

@supermanPunch, thanks too.
You are both geniuses.

system · March 13, 2023, 2:07pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
