Extract specific String

ysshin.temp · January 8, 2021, 12:23am

Hi. I extracted all the text from a PDF file with a UiPath activity,
and I want to use regex to extract these items:

In each row, the first item is the index.
The second item, which is underlined, is the product name.
The third item is the product quantity, and the fourth is the price per product.
The fifth item, which is underlined, is the total price. The rest are extra.

I need the second and fifth items.
If you have any good ideas, please share them. Thanks.

prasath17 · January 8, 2021, 12:29am

@ysshin.temp - Is it possible for you to share the text file?

prasath17 · January 8, 2021, 12:48am

@ysshin.temp - Please try this…

To extract the 2nd Item

To extract the 5th Item

ysshin.temp · January 8, 2021, 3:59am

Thanks, but the uploaded image was just one example.
To run this on an RPA machine, I need a common format for extracting the items.

The second, third, fourth, fifth, and sixth items are variable (the sixth is an explanatory note, so it may or may not exist).
Only the first item (the index) is constant.

1 VMWare vCenter Server 6 SupportSubscription 1 1,800,000 1,800,000
2 VMWare vSphere 6 Enterprise Plus SupportSubscription 6 2,150,000 12,900,000
3 VMWare vSphere 6 Standard SupportSubscription 6 390,000 2,340,000
4 VMWare vSphere 6 Standard SupportSubscription 6 6 390,000 2,340,000 비고내용입니다. 2,340,000
100 VMWare vSphere 6 Standard SupportSubscription 6 6 100 100 2,340,000 비고입니다.

I've attached the example text again. Could you take another look?
Thanks.

park363 · January 8, 2021, 4:43am

For the 2nd item:
(?<=^\d{1,}\s).*?(?=\s\d{1,}\s)
For the 5th item:
[\d,]+\r

You can get string lists like this:

txt ← source text

Assign activity
pattern1 = "(?m)(?<=^\d{1,}\s).*?(?=\s\d{1,}\s)"
pattern2 = "[\d,]+\r"

list1 = Regex.Matches(txt,pattern1).Cast(Of Match).Select(Function(m) m.Value).ToList
list2 = Regex.Matches(txt,pattern2).Cast(Of Match).Select(Function(m) m.Value).ToList
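
For anyone who wants to try the two patterns outside UiPath, here is a rough Python sketch rather than the VB.NET expressions above. Two things are adapted and are my assumptions, not part of the original answer: Python's re module does not accept the variable-width lookbehind, so the name is taken from a capture group, and the \r is checked with a lookahead so it stays out of the matched value. The sample text assumes CRLF line endings.

```python
import re

# First three sample rows from the thread; CRLF endings are assumed,
# because the 5th-item pattern relies on the trailing \r.
txt = (
    "1 VMWare vCenter Server 6 SupportSubscription 1 1,800,000 1,800,000\r\n"
    "2 VMWare vSphere 6 Enterprise Plus SupportSubscription 6 2,150,000 12,900,000\r\n"
    "3 VMWare vSphere 6 Standard SupportSubscription 6 390,000 2,340,000\r\n"
)

# Adapted pattern1: index at line start, then a lazy capture that stops
# before the first whitespace-delimited number.
pattern1 = re.compile(r"(?m)^\d+\s(.*?)(?=\s\d+\s)")
# Adapted pattern2: a run of digits and commas right before the \r.
pattern2 = re.compile(r"[\d,]+(?=\r)")

list1 = pattern1.findall(txt)
list2 = pattern2.findall(txt)

print(list1)  # ['VMWare vCenter Server', 'VMWare vSphere', 'VMWare vSphere']
print(list2)  # ['1,800,000', '12,900,000', '2,340,000']
```

On these rows the lazy match stops at the first standalone number, which is the 6 inside the product name, so the name comes back truncated if it is supposed to run through SupportSubscription.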

prasath17 · January 8, 2021, 4:54am

Hi @park363 - With all due respect, it didn't pick up the value in the last row… the amount we need to capture there is 100 (as per the screenshot)…

I am still scratching my head over this regex… it's very, very tricky…

Plus, for the first pattern, in the 4th and 5th rows we need to capture up to the 6, which is another tricky spot.

I came very close with the first pattern, but the 5th line is not perfect… check this…

moenk · January 9, 2021, 11:00am

Actually, a simple Split function in an Assign would do it.

prasath17 · January 9, 2021, 3:13pm

Hi @moenk… which character would you split on in this case? See the user requirement above… I am interested to see…

moenk · January 10, 2021, 8:49am

It's obviously separated by spaces, so if you split, you grab the 2nd, 3rd, and 5th items from the resulting array. Do this in a loop over an array of the lines; for this you need to split the entire thing by CrLf first, of course.

prasath17 · January 10, 2021, 12:49pm

@moenk… I am afraid it won't work… please look closely again at the 2nd element he asked for… it has so many spaces… see below

1 VMWare vCenter Server 6 SupportSubscription 1 1,800,000 1,800,000…
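
To make the problem concrete, here is a small Python sketch of the split idea (lines by CrLf, then each line by spaces) on two of the sample rows. It is only an illustration; the variable names are made up for this example.

```python
# Two sample rows from the thread, joined with CRLF.
txt = (
    "1 VMWare vCenter Server 6 SupportSubscription 1 1,800,000 1,800,000\r\n"
    "2 VMWare vSphere 6 Enterprise Plus SupportSubscription 6 2,150,000 12,900,000"
)

picked = []
for line in txt.split("\r\n"):   # one entry per row
    tokens = line.split(" ")     # naive split on single spaces
    # "2nd" and "5th" item by position in the token array
    picked.append((tokens[1], tokens[4]))

print(picked)  # [('VMWare', '6'), ('VMWare', 'Enterprise')]
```

Because the product name itself contains spaces, fixed positions land inside the name instead of on the total price.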

moenk · January 10, 2021, 8:55pm

Sorry, you are right, so we have to split on "SupportSubscription" first, but this makes it not so beautiful any more. I'd now check whether the process can provide another format.
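
A rough Python sketch of what splitting on that marker first could look like. The helper name split_row is hypothetical, and the hard-coded marker is an assumption that only fits rows whose product name ends in SupportSubscription.

```python
MARKER = "SupportSubscription"  # assumption: every product name ends with this word

def split_row(row):
    """Return (index, name, remaining fields), or None if the marker is missing."""
    head, sep, tail = row.partition(MARKER)
    if not sep:
        return None
    index, _, name_start = head.partition(" ")
    name = (name_start + sep).strip()
    return index, name, tail.split()

row1 = "1 VMWare vCenter Server 6 SupportSubscription 1 1,800,000 1,800,000"
row4 = ("4 VMWare vSphere 6 Standard SupportSubscription 6 6 390,000 "
        "2,340,000 비고내용입니다. 2,340,000")

print(split_row(row1))
# ('1', 'VMWare vCenter Server 6 SupportSubscription', ['1', '1,800,000', '1,800,000'])
print(split_row(row4)[2])
# ['6', '6', '390,000', '2,340,000', '비고내용입니다.', '2,340,000']
```

For the regular rows the total is simply the third remaining field. For rows like the 4th, the remaining fields are still ambiguous, so this only moves the problem.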

park363 · January 10, 2021, 11:38pm

I did not check the last 2 lines.
It is very tricky.

ysshin.temp · January 11, 2021, 5:41am

Thanks for your help.

I will keep only the first through fifth items, using a regex like this:
[0-9]+ [a-zA-Z0-9가-힣 ]+ [0-9]+ ([0-9]{1,3},)*([0-9]{1,3}) ([0-9]{1,3},)*([0-9]{1,3})

Then I will use your idea.
It is tedious, but I don't know how to extract everything at once.
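
A minimal Python sketch of this trimming step, using a variant of the pattern above. One assumption on my side: each comma group is allowed to repeat (the `*`), so that amounts such as 1,800,000 match, and the groups are non-capturing. The helper name keep_first_five is made up for the example.

```python
import re

# Index, name, quantity, price, total. "(?:...,)*" is an assumption that lets
# each amount carry any number of comma-separated groups.
FIRST_FIVE = re.compile(
    r"[0-9]+ [a-zA-Z0-9가-힣 ]+ [0-9]+ "
    r"(?:[0-9]{1,3},)*[0-9]{1,3} (?:[0-9]{1,3},)*[0-9]{1,3}"
)

def keep_first_five(line):
    """Return the part of the row up to the total price, or None if no match."""
    m = FIRST_FIVE.search(line)
    return m.group(0) if m else None

row1 = "1 VMWare vCenter Server 6 SupportSubscription 1 1,800,000 1,800,000"
row4 = ("4 VMWare vSphere 6 Standard SupportSubscription 6 6 390,000 "
        "2,340,000 비고내용입니다. 2,340,000")

print(keep_first_five(row1))
# 1 VMWare vCenter Server 6 SupportSubscription 1 1,800,000 1,800,000
print(keep_first_five(row4))
# 4 VMWare vSphere 6 Standard SupportSubscription 6 6 390,000 2,340,000
```

Since the name class also accepts digits and spaces, rows whose prices have no comma (like the last sample row) may still be segmented differently from what is intended.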

Thanks for your help.

ysshin.temp · January 11, 2021, 5:46am

Thanks for your help.

prasath17 · January 11, 2021, 5:56am

@ysshin.temp - If the 2nd string ended consistently, as in the first 3 rows, I think there is a possibility. Because it ends with a 6 followed by another integer, the pattern gets tougher.
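
A minimal sketch of that possibility, written in Python rather than as a UiPath expression. It assumes every usable row ends with exactly quantity, price, and total, as the first 3 rows do; the helper name name_and_total is hypothetical.

```python
import re

# Index, name, quantity, price, total. The last three fields are anchored to
# the end of the line, so a row with a trailing note does not match.
ROW = re.compile(r"^(\d+) (.+) (\d+) ([\d,]+) ([\d,]+)$")

def name_and_total(line):
    """Return (product name, total price), or None for irregular rows."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return m.group(2), m.group(5)

row1 = "1 VMWare vCenter Server 6 SupportSubscription 1 1,800,000 1,800,000"
row5 = ("100 VMWare vSphere 6 Standard SupportSubscription 6 6 100 100 "
        "2,340,000 비고입니다.")

print(name_and_total(row1))
# ('VMWare vCenter Server 6 SupportSubscription', '1,800,000')
print(name_and_total(row5))
# None
```

Rows that return None, like the 4th and 5th here, would still need separate handling.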

system · January 14, 2021, 5:57am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
