How to extract percentage from text

I have data in the following format. When I extract it to a text or Excel file, it gets stored in a single cell. I need to validate whether the returns are 100% based on the time period. If the returns are exactly 100%, I should mark it as ‘Pass’; otherwise, it should be ‘Fail’. I was able to extract the percentage after ‘Return USD’ using line-by-line reading with regex. However, I am unable to extract the data based on the time period. Could you please help me with this?

We have different types of accounts, and the data will vary. Some accounts may have only one entry (Refer-1 and 2).

Sometimes, the data can be extensive, containing more than 16 or 17 lines. Additionally, within the same period, you may see more than 5-6 indices contributing returns on same period(Refer-3)

Refer-1
benchmakdetails
Time Period BMK
Jun 30 2024 - Jan 31 2025 HDFC Bank return USD 100%
Jun 30 1958 - Jun 30 2025 ICC Bank return USD 100%

Refer-2
benchmakdetails
Time Period BMK
Jan 31 2013 - Jan 31 2025 HDFC Bank return USD 100%

Refer-3
benchmakdetails
Time Period
Jan 31 2023 - Dec 31 2024 Axex 500 return USD 50%
N&P 500 return USD 40%
MCCC Net index return USD 5%
ITC Gross return USD 5%
Aug 31 2000 - Jan 31 2023 S&P 500 return USD 15%
Bloomberg zyd 500 return USD 5%
Blk MUCCSS return USD 10%
UR smallCAP return USD 10%
Blk MUCCSS return USD 10%
JAPAN INDex return USD 10%
Axex 500 return USD 10%
HDFV 500 return USD 10%
SBI 500 return USD 20%
Aug 31 1995 - Aug 31 2020 S&P 500 return USD 15%
Bloomberg zyd 500 return USD 5%

1 Like

@Raghu_km

It would be good if you can attach a sample excel or datatable to know how data looks

And how you want to validate

Cheers

sorry for the late response, please refer the below data in text form. I want to store this in the excel.

ICCsamplefile.txt (515 Bytes)

expecting in this format

Hi @Raghu_km

Check this workflow
RegexMatch.zip (9.6 KB)

Get the data to required group data using the regex

MatchList = System.Text.RegularExpressions.Regex.Matches(inputTextData, "[A-Za-z]{3} \d{1,2} \d{4} - [A-Za-z]{3} \d{1,2} \d{4}(\s*|\S)*(?=[A-Za-z]{3} \d{1,2} \d{4} - [A-Za-z]{3} \d{1,2}|close)").Cast(Of System.Text.RegularExpressions.Match)().Select(Function(m) m.Value).ToList()

And each group get the date and sum of the percentage

timePeriod = System.Text.RegularExpressions.Regex.Match(currentText,"[A-Za-z]{3} \d{1,2} \d{4} - [A-Za-z]{3} \d{1,2} \d{4}").Value


sum = System.Text.RegularExpressions.Regex.Matches(currentText, "(?<=USD )\d+(?:\.\d+)?(?=%)") _
    .Cast(Of System.Text.RegularExpressions.Match)() _
    .Select(Function(m) CDbl(m.Value)) _
    .Sum()

Sample input and output

benchmakdetails
Time Period
Jan 31 2023 - Dec 31 2024 Axex 500 return USD 50%
N&P 500 return USD 40%
MCCC Net index return USD 5%
ITC Gross return USD 5%
Aug 31 2000 - Jan 31 2023 S&P 500 return USD 15%
Bloomberg zyd 500 return USD 5%
Blk MUCCSS return USD 10%
UR smallCAP return USD 10%
Blk MUCCSS return USD 10%
JAPAN INDex return USD 10%
Axex 500 return USD 10%
HDFV 500 return USD 10%
SBI 500 return USD 20%
Aug 31 1995 - Aug 31 2020 S&P 500 return USD 15%
Bloomberg zyd 500 return USD 5%
close

Hope this helps!

Hi Sanjay, I tried to open this workflow in my current UiPath setup, but we are getting a dependency error. We are using Studio 2023.10.8. I tried to repair it, but it is affecting my existing workflow.

@Raghu_km

Step 1: use build data table activity with columns Time period and Percentage

Step2: Use assign activity and create a list variable
MatchList = System.Text.RegularExpressions.Regex.Matches(inputTextData, "[A-Za-z]{3} \d{1,2} \d{4} - [A-Za-z]{3} \d{1,2} \d{4}(\s*|\S)*(?=[A-Za-z]{3} \d{1,2} \d{4} - [A-Za-z]{3} \d{1,2}|close)").Cast(Of System.Text.RegularExpressions.Match)().Select(Function(m) m.Value).ToList()

Step 3: Use for each activity with MatchList as expression

Inside for each

Use assign activity with the below Time period expression with timePeriod as string variable
timePeriod = System.Text.RegularExpressions.Regex.Match(currentText,"[A-Za-z]{3} \d{1,2} \d{4} - [A-Za-z]{3} \d{1,2} \d{4}").Value```

Use assign activity with sum variable with system.double as datatype

sum = System.Text.RegularExpressions.Regex.Matches(currentText, "(?<=USD )\d+(?:\.\d+)?(?=%)") _
    .Cast(Of System.Text.RegularExpressions.Match)() _
    .Select(Function(m) CDbl(m.Value)) _
    .Sum()

Use add data row activity and pass the array {timePeriod, sum.tostring+“%”}

Hope this helps

thanks let me try this.

Could you please share your workflow diagram here…

@Raghu_km


Hope this helps!

Since you are already extracting percentages with regex, the next step is linking them to their respective time periods. Try using regex or a script to group returns by date ranges before validation. In Excel, you could use Power Query or VBA for structured extraction.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.