How to extract the last 4-5 words from a string

I want to extract the last 4 or 5 words from a string.
For example:
“Siddhi vinayak society 6900 gulmarg road,main road Navi Mumbai IN 89L”

I want the output to be ‘Navi Mumbai IN 89L’. How can I extract this part?

Note: this is only an example; the string will be different each time.

I also want to find the first 4 or 5 words of the above string, i.e.
Siddhi vinayak society 6900

What changes do we need to make in the following regex?

Hi,

The following expression returns the last 4 words.

System.Text.RegularExpressions.Regex.Match(yourString,"(\s+\S+){4}$").Value.Trim
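For readers who want to try the pattern outside UiPath, here is a minimal Python sketch of the same idea. The function names `last_words` and `first_words` are made up for illustration. The first-words pattern `^(\S+\s+){n}` is my assumption of how the expression could be mirrored for the start of the string; it is not something confirmed elsewhere in this thread. Both patterns use only `\s`, `\S`, `^`, `$` and `{n}`, which .NET regex also supports.

```python
import re


def last_words(text, n=4):
    """Return the last n whitespace-separated words, or "" if the pattern does not match."""
    # Same pattern as the expression above: n groups of "whitespace + word" at the end.
    match = re.search(r"(\s+\S+){%d}$" % n, text)
    return match.group(0).strip() if match else ""


def first_words(text, n=4):
    """Return the first n whitespace-separated words, or "" if the pattern does not match."""
    # Mirrored pattern (an assumption): n groups of "word + whitespace" at the start.
    match = re.search(r"^(\S+\s+){%d}" % n, text)
    return match.group(0).strip() if match else ""


address = "Siddhi vinayak society 6900 gulmarg road,main road Navi Mumbai IN 89L"
print(last_words(address))   # Navi Mumbai IN 89L
print(first_words(address))  # Siddhi vinayak society 6900
```

Note that each group needs whitespace next to the word, so a string with exactly n words and no surrounding whitespace does not match and yields an empty result.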

Can you share details if you have a rule for deciding between 4 and 5 words?

Regards,


Here is another approach without regex

LAST_N_WORDS.xaml (7.3 KB)

Hi @dokumentor,

I love reading threads here, but when solutions are only uploaded as .xaml files, readers like me often miss out on novel solutions.

  • Could you kindly update your post with a screenshot showing the contents of the .xaml?
  • A brief explanation of the approach you use instead of regex would also help readers and the person asking.
  • Bonus points for explaining the advantages and disadvantages of your approach as well :)

Of course, @jeevith. Mine is a basic solution using a structure:

[two workflow screenshots]

If the user needs to evaluate each word, that can be done inside the iteration.
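Since the attached workflow is not readable in the thread, here is a rough Python sketch of what a non-regex approach of this kind could look like: split the string on whitespace and walk the words from the end. This is an assumption about the general idea, not a transcription of LAST_N_WORDS.xaml, and the function name `last_n_words` is invented for the example.

```python
def last_n_words(text, n):
    """Collect up to n words from the end of text by iterating in reverse."""
    picked = []
    for word in reversed(text.split()):
        if len(picked) == n:
            break
        # A per-word check (for example a keyword lookup) could be added here.
        picked.append(word)
    # The words were collected back to front, so restore the original order.
    return " ".join(reversed(picked))


address = "Siddhi vinayak society 6900 gulmarg road,main road Navi Mumbai IN 89L"
print(last_n_words(address, 4))  # Navi Mumbai IN 89L
```

Unlike the regex, this version returns whatever words exist when the string has fewer than n of them, and it ignores leading or trailing whitespace.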


Hi @Yoichi and @dokumentor,

I first thought the quantifier (\s+\S+){4,5}$ would take care of 4 or 5 words, but as @Mansi_Mhatre will probably confirm, parsing long addresses like this is quite tricky. Here the ending string could also be just "Mumbai IN 89L".
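To illustrate why the range quantifier alone does not settle the question, here is a small Python sketch (the helper name `tail_4_or_5` is made up). Because the regex engine takes the leftmost match, `{4,5}` returns 5 words whenever 5 are available, so it cannot decide on its own that only "Navi Mumbai IN 89L" or "Mumbai IN 89L" is wanted.

```python
import re


def tail_4_or_5(text):
    """Apply (\\s+\\S+){4,5}$ and return the trimmed match, or "" if there is none."""
    match = re.search(r"(\s+\S+){4,5}$", text)
    return match.group(0).strip() if match else ""


address = "Siddhi vinayak society 6900 gulmarg road,main road Navi Mumbai IN 89L"
print(tail_4_or_5(address))  # road Navi Mumbai IN 89L
```

Picking the right number of trailing words therefore needs an extra rule, such as a list of known city or country keywords.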

@Mansi_Mhatre, please share more of the scenarios you expect.


Address parsing is very difficult, and the patterns change from one country to another.

It would be great to have a fuzzy database extractor inside UiPath that helps you locate address parts based on the possible relations between them. For example, if you can collect a database of street names with zip codes and provinces/states, you could use this feature to locate the full address accurately. Variable parts such as the street number could then be extracted using regex, relative to the position of the previous result.

This is a great feature available in document processing software like Kofax Transformation.

I will create a feedback post expanding on this topic.

I have keywords for 200+ countries.
Here are some sample keywords:

India IN,Mumbai,Delhi,Mysore,Surat,Hyderabad,Kolkata,Punjab,Bangalore
Iraq IQ,BAGHDAD,BASRA,NAFJAF,KIRKUK,KARBALA,NASIRIYAS,AMARA
China CN,SHANGHAI,beijing,chongqing,guangzhou,tianjin,shenzhen,hangzhou,wuhan
France FRA,Paris,Lyon,Nice,Nantes,Reims,Lille
England UK,London,Manchester,Bristol,Liverpool,Southampton
Russia Moscow,Kazan,Omsk,Volgograd,Krasnoyarsk,Samara
Afghanistan AF,Kabul,kandahar,jalalabad,kandahar,herat,ghazni,khanabad
Zimbabwe Harare, Bulawayo, Manicaland, Midlands, Mashonaland West, Masvingo
Germany Berlin,Munich,Hamburg,stuttgart,essen,Leipzig
Morocco MA,Casablanca,Rabat,fez,Tangier,Agadir,kenitra,oujda,meknes
Pakistan Karachi,Lahore,Punjab,Islamabad,Hyderabad,Faisalabad,Peshawar,Multan,Bahawalpur,Sargodha
Jamaica JM,Kingston,Portmore,Montego,saint catherine,mandeville,maypen,holdharbour
Mali bamako,gao,sikasso,kalabancoro,koutiala,segou,kayes,kati,mopti,niono
Iran Hamadan, Ilam,Karaj, Tehran, East Azerbaijan, Semnan, Birjand, Bushehr
Spain Madrid,Barcelona,Seville,bilbao,valencia,cadiz,Girona

I have stored the country keywords together with the country name in a dictionary (key, value).

I want to check the address string from the back, i.e. last word first, to find the correct country.
Basically, I want to check the string in reverse order; if keywords for multiple countries are found, it should return the country of the last keyword in the string.

Input example 1:
“786 apartment, Shanghai building, Mumbai 400710”

It contains 2 keywords: 1 from China (Shanghai) and 1 from India (Mumbai).

Output:
India(Mumbai)

Input example 2:
" Kingston apartment, near Uk ambassy , Madrid 7896 ES"

It contains 3 keywords: 1 from Jamaica (Kingston), 1 from England (UK) and 1 from Spain (Madrid).

Output:
Spain
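One possible way to get these outputs, sketched in Python under a few assumptions: the dictionary below holds only a small subset of the sample keywords above, matching is case-insensitive on whole words, and the function name `find_country` is invented for the example. Instead of literally reversing the string, it records where each keyword occurs and keeps the one that starts furthest to the right, which gives the same "last keyword wins" result and also allows multi-word keywords.

```python
import re

# Subset of the sample keywords from this thread: country name -> keywords.
COUNTRY_KEYWORDS = {
    "India": ["IN", "Mumbai", "Delhi"],
    "China": ["CN", "Shanghai", "Beijing"],
    "England": ["UK", "London"],
    "Jamaica": ["JM", "Kingston"],
    "Spain": ["Madrid", "Barcelona"],
}


def find_country(address, keywords=COUNTRY_KEYWORDS):
    """Return (country, matched keyword) for the right-most keyword, or None."""
    best = None  # (start position, country, matched text)
    for country, words in keywords.items():
        for word in words:
            # Whole-word match: no word character directly before or after.
            pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
            for m in re.finditer(pattern, address, flags=re.IGNORECASE):
                if best is None or m.start() > best[0]:
                    best = (m.start(), country, m.group(0))
    return None if best is None else (best[1], best[2])


print(find_country("786 apartment, Shanghai building, Mumbai 400710"))
# ('India', 'Mumbai')
print(find_country(" Kingston apartment, near Uk ambassy , Madrid 7896 ES"))
# ('Spain', 'Madrid')
```

One caveat with case-insensitive matching: short codes such as IN or MA can collide with ordinary words in an address, so those may need a case-sensitive check or a rule that they only count near the end of the string.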
